AI is a genie that refuses to go back in the bottle—no matter how much I plead, "Put that thing back where it came from, or so help me!" Enter Google's Genie 2, another generative model. However, rather than merely guessing which word might come next à la an LLM, or proffering a kind of sludgy-looking still image, this AI instead outputs interactive 3D environments.
Well, at least it's not telling me I'm 'a stain on the universe', as Gemini AI allegedly told one user recently. Whereas Genie 1, revealed back in February, could only cobble together 2D scenes, the just-announced Genie 2 is a step up, offering somewhat explorable 3D game environments (via PC World). I say 'somewhat' because it isn't long before the wheels come off.
Introducing 🧞Genie 2 🧞 – our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠. pic.twitter.com/AfL3EbOMeB (December 4, 2024)
For one thing, the player character model struggles to look consistent throughout movement. For another, playable worlds generated by Genie 2 don't last long; Google writes, "Genie 2 can generate consistent worlds for up to a minute," though it admits that most of the examples it shows lasted between 10 and 20 seconds. So, no, you won't feed Genie 2 a prompt for that long-awaited sequel and be satisfied by the results.
Google’s prompting process is also not as straightforward as typing, “One Cyberpunk game, please,” and diving into a Night City knockoff either. Genie 2 generates its game worlds in response to still images that were themselves generated by Imagen 3, a text-to-image model also from Google. That’s a whole lot of snakes eating their own tail.
Genie 2 itself is "an autoregressive latent diffusion model" that draws from "a large-scale video dataset" whose content and source Google doesn't really detail. Google has also experimented with feeding photos of the real world to Genie 2, highlighting responses that "model grass blowing in the wind or water flowing in a river." The GIFs it shares of these responses are about as muddy-looking as you'd expect, though.
Credit where some credit may be due, though: Genie 2 does make strides in a number of areas generative models often struggle with. While the player character model warps and blurs like a water-damaged printout, the environments one can trundle through remain surprisingly consistent—definitely bland and generic, but consistently so.
For instance, Genie 2's "Long Horizon Memory" remembers aspects of the environment after they disappear from view. Say you're running between pyramids—when you look back, the pyramids will stand in the same spot you last saw them, rather than teleporting around behind you like it's a game of 'What's the time, Mr. Wolf?' Similarly, Google touts Genie 2's 'counterfactual' capabilities, really just a fancy way of saying that multiple players can play the same generated level and have a consistent experience.
Google has shared a mix of examples controlled by humans and by SIMA, its own AI player "designed to complete tasks in a range of 3D game worlds by following natural-language instructions." Many of the shared excerpts show digital avatars controlled with typical WASD keyboard controls. Google writes, "Genie 2 responds intelligently to actions taken by pressing keys on a keyboard, identifying the character and moving it correctly. For example, our model has to figure out that arrow keys should move the robot and not the trees or clouds." Good for you, Genie 2.
But beyond letting you walk around these occasionally wonky-looking levels, Genie 2 can also generate object interactions and even NPCs. Naturally, the NPCs have nothing of note to say, but Genie 2 can conjure balloons to burst and barrels to explode—though that's hardly the most compelling gameplay loop.
While no one outside of Google can yet play around with Genie 2's output, the company is eager to tout its potential use cases, such as rapid prototyping based on concept art. This may sound appealing—right up until you ask what happens when a developer wants to slightly adjust absolutely anything about their AI-generated prototype.
While Genie 2’s snatches of game may spark the imagination of AI-defenders, I’m far from convinced—and honestly, I’m just a wee bit concerned about its potential labour implications for game development. Earlier this year, Take-Two CEO Strauss Zelnick dismissed the claim that AI would take away jobs, arguing, “It’s not going to make people irrelevant. It’s going to change the nature of certain forms of employment. And that’s a good thing.” But as Andy Chalk points out in his news piece, industry transitioning to large scale automation is not always a painless process. It’s my sincere hope that Google’s little experiment doesn’t end up motivating even more layoffs.