Tech
Google’s Genie 2 “world model” reveal leaves more questions than answers
As podcaster Ryan Zhao put it on Bluesky, “The design process has gone wrong when what you need to prototype is ‘what if there was a space.'”
Gotta go fast
When Google revealed the first version of Genie earlier this year, it also published a detailed research paper outlining the specific steps taken behind the scenes to train the model and how that model generated interactive videos. No such research paper has been published detailing Genie 2’s process, leaving us guessing at some important details.
One of the most important of these details is model speed. The first Genie model generated its world at roughly one frame per second, a rate that was orders of magnitude slower than would be tolerably playable in real time. For Genie 2, Google only says that “the samples in this blog post are generated by an undistilled base model, to show what is possible. We can play a distilled version in real-time with a reduction in quality of the outputs.”
Reading between the lines, it sounds like the full version of Genie 2 operates at something well below the real-time interactions implied by those flashy GIFs. It’s unclear how much “reduction in quality” is necessary to get a diluted version of the model to real-time controls, but given the lack of examples presented by Google, we have to assume that reduction is significant.
Real-time, interactive AI video generation isn’t exactly a pipe dream. Earlier this year, AI model maker Decart and hardware maker Etched published the Oasis model, showing off a human-controllable, AI-generated video clone of Minecraft that runs at a full 20 frames per second. However, that 500 million parameter model was trained on millions of hours of footage of a single, relatively simple game, and focused exclusively on the limited set of actions and environmental designs inherent to that game.
When Oasis launched, its creators fully admitted the model “struggles with domain generalization,” showing how “realistic” starting scenes had to be reduced to simplistic Minecraft blocks to achieve good results. And even with those limitations, it’s not hard to find footage of Oasis degenerating into horrifying nightmare fuel after just a few minutes of play.