OpenAI announced earlier this week that most users would have to wait until the fall to get access to the Advanced Voice feature of GPT-4o, but it seems some lucky people received a sneak peek at just what is possible with the next-generation voice assistant.
Reddit user RozziTheCreator was one of the lucky few. They shared a recording of a new GPT-4o voice we haven’t heard before telling a horror story, complete with sound effects tied to the story such as thunder and footsteps. AI writer Sambhav Gupta first highlighted the clip on X, bringing it to wider attention.
It seems Rozzi getting access was a mistake. OpenAI told me in a statement that some users were given access to the model by accident but that this has now been corrected.
What can we hear in the leaked video?
Every video we’ve seen of GPT-4o Advanced Voice so far has been under OpenAI’s control, and while the demos have sounded amazing, they have been restricted to tailored use cases.
The new video by RozziTheCreator seems to show the capability in a more natural way, including a sound effects feature we haven’t heard before.
I messaged RozziTheCreator about the experience and they said: “It just suddenly came up, it did look the same the only difference was the voice.” The discovery happened late at night when RozziTheCreator was trying to ask the chatbot a question: “Boom I discovered the change.”
It only lasted a few minutes and, according to RozziTheCreator, “it was very buggy,” so there wasn’t time to get much out of it, but they managed to record a snippet of this remarkable story.
“It started going insane repeating and replying to things I didn’t say,” according to RozziTheCreator, before the assistant reverted to the normal basic voice everyone else can already use.
In the video, you can hear GPT-4o eagerly telling the tale in a casual way, backed by sound effects: “Picture this, there’s this small town, everybody knows everybody kind of vibe, and there is this small house at the end of the street.”
It continues the tale of two teens checking out the house during a storm with “nothing but a flashlight and their phones for light.”
So what went wrong with the rollout?
OpenAI is rolling out a whole host of new features slowly. The first Plus users were supposed to get GPT-4o Advanced Voice this month, but the launch was delayed due to safety issues and concerns over whether the hardware infrastructure was in place.
I asked OpenAI what happened that led to RozziTheCreator getting access, and a spokesperson told me: “While testing the feature, we inadvertently sent invites to a small number of ChatGPT users. This was a mistake and we’ve fixed it.”
They confirmed that the first few Plus users will get access next month, but most people will have to wait a while longer. The spokesperson explained that the initial rollout is intended to “gather feedback, and plan to expand based on what we learn.”
So, no GPT-4o voice yet, but this is the latest in a series of examples of GPT-4o seemingly straining against its restraints and serving up its full capabilities. I’ve seen examples myself of it analyzing audio files directly one minute, then running them through code the next.
All of this has made me even more excited for its full capabilities, and even more annoyed at the delay, however understandable it might be.