OpenAI’s o1 model is inching closer to humanlike intelligence — but don’t get carried away
Each time OpenAI shows off new research, it’s worth considering how much closer it inches the company toward its mission statement: achieving artificial general intelligence.
The coming of AGI, a type of AI that can emulate humans’ ingenuity, judgment, and reasoning, has been an industry obsession since Alan Turing proposed his namesake test of machine intelligence in 1950. Three months after it released ChatGPT, OpenAI reaffirmed its ambition to deliver AGI.
So how does its latest release stack up?
On Thursday, after much anticipation, the San Francisco-based company, led by Sam Altman, unveiled OpenAI o1, a series of AI models “designed to spend more time thinking before they respond.”
OpenAI’s big claims about the models suggest the company is entering a new paradigm in the generative-AI boom. Some experts agree. But do the models put the industry on the cusp of AGI? Not yet.
AGI is still a long way off
OpenAI has tried to strike a balance between managing expectations and generating hype about its new models.
In a blog post on Thursday, OpenAI said the current GPT-4o model behind the chatbot was better for “browsing the web for information.” But it added that while the newer models don’t have “many of the features that make ChatGPT useful,” they represent a “significant advancement” for complex reasoning tasks.
The company is so confident in this claim that it said it was “resetting the counter back to 1” with the release of these new models, limited to a preview version for now, and naming them “o1” as a symbol of the new paradigm they represent.
In some ways, the o1 models do usher OpenAI into a new paradigm.
The company said the models emulated the capabilities of doctoral students on “challenging benchmark tasks in physics, chemistry, and biology,” adding that they could excel in tough competitions like the International Mathematical Olympiad and the Codeforces programming contest.
There seem to be a few reasons for this boost in performance. OpenAI said it “trained these models to spend more time thinking through problems before they respond, much like a person would.”
“Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes,” the company said.
Noam Brown, a research scientist at OpenAI, offered a useful way to think about it. The models, he wrote on X, were trained to have a “private chain of thought” before responding, which essentially means they spend more time “thinking” before they speak.
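Brown’s framing maps onto an idea researchers have explored publicly for years: chain-of-thought prompting, in which a model is asked to reason step by step before committing to an answer. The sketch below is a hedged illustration of that prompt-level version, not OpenAI’s method; the model name and prompt wording are our assumptions, and o1’s chain of thought is generated internally during inference rather than requested in the prompt.

```python
# A minimal, illustrative sketch of prompt-level chain of thought using
# the official OpenAI Python SDK (pip install openai). This only
# approximates the idea: in o1, the chain of thought is produced
# internally and hidden from the user, not supplied via the prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Direct answer: the model commits to a response immediately.
direct = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": question}],
)

# Prompted chain of thought: the model is nudged to reason first,
# the prompt-level analogue of "thinking before responding."
chain_of_thought = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question
        + " Think through the problem step by step, check your work, "
          "then state the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(chain_of_thought.choices[0].message.content)
```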
Whereas earlier AI models were bottlenecked by the data fed to them during the “pretraining” phase, Brown wrote, o1 models showed that “we can now scale inference,” meaning the compute a model spends while generating a response, rather than only the compute spent training it.
Jim Fan, a senior research scientist at Nvidia, said this technical shift was what made the breakthrough of OpenAI’s o1 models possible: a huge amount of the computing power once reserved for training an AI model has been “shifted to serving inference instead.”
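OpenAI hasn’t disclosed exactly how o1 spends that extra inference compute. One published technique in the same family is self-consistency (Wang et al., 2022): sample several independent reasoning chains at answer time and take a majority vote over their final answers, so that more compute buys more reliability. The sketch below illustrates that idea only; the model name, prompt format, and vote-on-the-last-line heuristic are our assumptions, not o1’s actual mechanism.

```python
# A hedged illustration of spending more compute at inference time.
# This implements self-consistency: sample n reasoning chains, then
# majority-vote on the final answers. It is NOT how o1 works
# internally; OpenAI has not published that. The model name, prompt,
# and last-line voting heuristic are illustrative assumptions.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def self_consistent_answer(question: str, n: int = 8) -> str:
    """Sample n reasoning chains and return the most common final answer.

    Raising n spends more inference compute and typically yields a
    more reliable majority answer on reasoning problems.
    """
    completion = client.chat.completions.create(
        model="gpt-4o",   # illustrative model choice
        n=n,              # n independent samples in one request
        temperature=1.0,  # keep the chains diverse
        messages=[{
            "role": "user",
            "content": question
            + " Reason step by step, then put the final answer alone "
              "on the last line.",
        }],
    )
    # Vote over the last line of each sampled chain.
    finals = [
        choice.message.content.strip().splitlines()[-1]
        for choice in completion.choices
    ]
    return Counter(finals).most_common(1)[0][0]


print(self_consistent_answer(
    "If 3 workers build a wall in 12 hours, how long do 9 workers take?"
))
```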
But it’s not clear that this takes OpenAI much closer to AGI.
In response to an X post on Thursday from Will Depue, an OpenAI staffer who highlighted how far large language models had come in the past four years, Altman wrote, “stochastic parrots can fly so high…”
It was a subtle reference to “On the Dangers of Stochastic Parrots,” a research paper published in 2021 that argued that large language models like OpenAI’s appear to understand the language they generate but do not. Is Altman suggesting the o1 models are stochastic parrots?
Meanwhile, others have said the models appear to suffer from some of the same issues as earlier models, and uncertainty hovers over how the o1 models will perform more broadly.
Ethan Mollick, a professor of management at Wharton who spent some time experimenting with the o1 models before their unveiling on Thursday, said that despite the clear jump in reasoning capabilities, “errors and hallucinations still happen.”
Fan also said that applying o1 to products was “much harder than nailing the academic benchmarks” OpenAI used to showcase the reasoning capabilities of its new models.
How OpenAI and the wider AI industry work toward solving these problems remains to be seen.
While the reasoning capabilities of the o1 models shift OpenAI into a new era of AI development, the company this summer placed its technology at just stage two on a five-stage scale of intelligence.
If it’s serious about reaching its end goal of AGI, it’s got a lot more work to do.