OpenAI is reportedly struggling to improve its next big AI model. It’s a warning for the entire AI industry.
- OpenAI’s next model is showing a slower rate of improvement, The Information said.
- It’s prompted a Silicon Valley debate about whether AI models are hitting a performance plateau.
- The AI boom has moved at pace because new releases have wowed users with huge leaps in performance.
OpenAI’s next flagship artificial-intelligence model is showing smaller improvements compared with previous iterations, The Information reported, in a sign that the booming generative-AI industry may be approaching a plateau.
The ChatGPT maker’s next model, Orion, showed only a moderate improvement over GPT-4, The Information said, citing some employees who have used or tested it. The leap in Orion has been smaller than that between GPT-3 and GPT-4, especially in coding tasks, the report added.
The report reignites a debate about the feasibility of developing ever more advanced models and about AI scaling laws, the theoretical rules describing how models improve as they grow.
OpenAI CEO Sam Altman said on X in February that “scaling laws are decided by god; the constants are determined by members of the technical staff.”
The "laws" Altman cited suggest AI models get smarter as they grow larger and gain access to more data and computing power.
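The scaling-law idea can be sketched with a toy power law in the style popularized by AI-research papers: loss falls predictably as parameters and training data grow, but each doubling buys a smaller gain. The constants and exponents below are illustrative assumptions, not any lab's measured values.

```python
# Toy illustration of an AI scaling "law": loss falls as a power law
# of model parameters (N) and training tokens (D). All constants here
# are illustrative placeholders, not OpenAI's or anyone's real numbers.

def toy_loss(n_params: float, n_tokens: float) -> float:
    """Power-law form: L = E + A / N**alpha + B / D**beta."""
    E, A, B = 1.7, 400.0, 400.0   # assumed irreducible loss and scale constants
    alpha, beta = 0.34, 0.28      # assumed exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Each 10x jump in scale still lowers loss, but by a smaller amount:
small = toy_loss(1e9, 2e10)    # ~1B params
medium = toy_loss(1e10, 2e11)  # ~10B params
large = toy_loss(1e11, 2e12)   # ~100B params
print(small - medium > medium - large)  # True: diminishing returns
```

Under this toy curve the gain from each jump shrinks even though loss keeps falling, which is the shape of the plateau debate in miniature.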
Altman may still subscribe to the view that a preordained formula decides how much smarter AI can get, but The Information’s report showed technical staff were questioning those laws amid a fierce debate in Silicon Valley over growing evidence that leading models are hitting a performance wall.
OpenAI did not immediately respond to a request for comment from Business Insider.
Have scaling laws hit a dead end?
While Orion’s training is not yet complete, OpenAI has nonetheless resorted to additional measures to boost performance, such as baking in post-training improvements based on human feedback, The Information said.
The model, first unveiled a year ago, could still improve dramatically ahead of its release. But it’s a sign that the future generations of AI models that have helped companies raise billions of dollars and command lofty valuations may deliver less impressive gains with each iteration.
There are two main reasons this could happen.
Data, one vital element of the scaling-law equation, has been harder to come by as companies have quickly exhausted available data online.
They have scraped vast amounts of human-created data — including text, videos, research papers, and novels — to train the models behind their AI tools and features, but the supply is limited. The research firm Epoch AI predicted in June that firms could exhaust usable textual data by 2028. Companies are trying to overcome constraints by turning to synthetic data generated by AI itself, but that, too, comes with problems.
“For general-knowledge questions, you could argue that for now we are seeing a plateau in the performance of LLMs,” Ion Stoica, a cofounder and the executive chair of the enterprise-software firm Databricks, told The Information, adding that “factual data” was more useful than synthetic data.
Computing power, the other factor that has historically boosted AI performance, is also not limitless. In a Reddit “ask me anything” thread last month, Altman acknowledged that his company faced “a lot of limitations and hard decisions” about allocating its computing resources.
It’s no wonder that some industry experts have started to note that AI models released this year, and those still to come, show signs of smaller performance leaps than their predecessors.
‘Diminishing returns’
Gary Marcus, a New York University professor emeritus and outspoken critic of AI hype, argues AI development is destined to hit a wall. He has been vocal about his belief that it shows signs of “diminishing returns” and reacted to The Information’s reporting with a Substack post headlined “CONFIRMED: LLMs have indeed reached a point of diminishing returns.”
When OpenAI’s rival Anthropic released its Claude 3.5 model in June, Marcus dismissed an X post touting the model’s marginal improvements over competitors in areas such as graduate-level reasoning, coding, and multilingual math, saying its performance was in the “same ballpark as many others.”
AI companies have spent billions of dollars trying to upend the competition, only to deliver evidence of “convergence, rather than continued exponential growth,” Marcus said.
Ilya Sutskever, a cofounder of OpenAI and of Safe Superintelligence, has voiced a similar view. On Monday, following The Information’s report, he told Reuters that results from scaling up pretraining had plateaued, adding: “Scaling the right thing matters more now than ever.”
The AI industry will keep looking for ways to spark huge jumps in performance. Anthropic CEO Dario Amodei has predicted that AI-model training runs will enter a new era next year in which they could cost $100 billion. Altman previously said it cost more than $100 million to train GPT-4. It remains to be seen how smart an AI model can get when it has that much capital thrown at it.
Scaling optimism
Other Silicon Valley leaders, including Altman, are still publicly optimistic about AI’s scaling potential. In July, Microsoft’s chief technology officer, Kevin Scott, dismissed concerns that AI progress had plateaued. “Despite what other people think, we’re not at diminishing marginal returns on scale-up,” Scott said during an interview with Sequoia Capital’s “Training Data” podcast.
There could also be strategies to make AI models smarter by enhancing inference, the stage after training in which a model is run on new inputs, data it hasn’t seen before, to produce its outputs.
The model OpenAI released in September, called OpenAI o1, leaned more heavily on inference, spending extra computing to reason through problems before answering. It outperformed its predecessors on complex tasks, achieving a level of intelligence similar to that of Ph.D. students on benchmark tasks in physics, chemistry, and biology, OpenAI said.
Still, it’s clear that, like Altman, much of the industry remains firm in its conviction that scaling laws are the driver of AI performance. If future models underwhelm, expect a reassessment of the boom.