DeepMind reveals Genie 3 “world model” that creates real-time interactive simulations

https://arstechnica.com/ai/2025/08/deepmind-reveals-genie-3-world-model-that-creates-real-time-interactive-simulations/

Ryan Whitwam Aug 05, 2025 · 4 mins read

While no one has figured out how to make money from generative artificial intelligence, that hasn't stopped Google DeepMind from pushing the boundaries of what's possible with a big pile of inference. The capabilities (and costs) of these models have been on an impressive upward trajectory, a trend exemplified by the reveal of Genie 3. A mere seven months after showing off the Genie 2 "foundational world model," which was itself a significant improvement over its predecessor, Google now has Genie 3.

With Genie 3, all it takes is a prompt or image to create an interactive world. Since the environment is continuously generated, it can be changed on the fly. You can add or change objects, alter weather conditions, or insert new characters—DeepMind calls these "promptable events." The ability to create alterable 3D environments could make games more dynamic for players and offer developers new ways to prove out concepts and level designs. However, many in the gaming industry have expressed doubt that such tools would help.

It's tempting to think of Genie 3 simply as a way to create games, but DeepMind sees this as a research tool, too. Games play a significant role in the development of artificial intelligence because they provide challenging, interactive environments with measurable progress. That's why DeepMind previously turned to games like Go and StarCraft to expand the bounds of AI.

World models take that to the next level, generating an interactive world frame by frame. This provides an opportunity to refine how AI models—including so-called "embodied agents"—behave when they encounter real-world situations. One of the primary limitations as companies work toward the goal of artificial general intelligence (AGI) is the scarcity of reliable training data. After piping basically every webpage and video on the planet into AI models, researchers are turning toward synthetic data for many applications. DeepMind believes world models could be a key part of this effort, as they can be used to train AI agents with essentially unlimited interactive worlds.
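The training loop this describes, an agent acting inside a continuously generated world, can be sketched in miniature. Everything below (the class names, the integer "frames," the toy dynamics) is a hypothetical illustration of the general idea, not Genie 3's actual interface:

```python
import random

class ToyWorldModel:
    """Stand-in for a generative world model: produces the next
    'frame' (here just an integer state) from the current state and
    the agent's action. A real world model would emit video frames."""
    def step(self, state, action):
        # Hypothetical dynamics: the action nudges the state, plus noise.
        return state + action + random.choice([-1, 0, 1])

class ToyAgent:
    """Stand-in for an embodied agent being trained in simulation."""
    def act(self, state):
        # Trivial policy: move toward a goal state of 0.
        return -1 if state > 0 else 1

def rollout(world, agent, state=10, steps=50):
    """Generate one episode of synthetic experience, frame by frame."""
    trajectory = [state]
    for _ in range(steps):
        state = world.step(state, agent.act(state))
        trajectory.append(state)
    return trajectory

traj = rollout(ToyWorldModel(), ToyAgent())
```

The point of the sketch is the shape of the loop: because the world is generated on demand, `rollout` can be called indefinitely to produce fresh training data, which is what makes world models attractive as a synthetic-data source.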

DeepMind says Genie 3 is an important advancement because it offers much higher visual fidelity than Genie 2, and it's truly real-time. Using keyboard input, it's possible to navigate the simulated world in 720p resolution at 24 frames per second. Perhaps even more importantly, Genie 3 can remember the world it creates.
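For a sense of scale, the raw pixel throughput implied by those stated specs is simple arithmetic (this says nothing about Genie 3's internals, which DeepMind hasn't detailed):

```python
width, height, fps = 1280, 720, 24  # 720p at 24 frames per second

pixels_per_frame = width * height
pixels_per_second = pixels_per_frame * fps

print(pixels_per_frame)   # 921600 pixels in every generated frame
print(pixels_per_second)  # 22118400 pixels generated each second
```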

One of the most glaring limitations of Genie 2 was its limited memory, which topped out at around 10 seconds in most simulations. Like a chatbot that exceeds its context window, the model would forget what parts of the world looked like after they were out of view for a brief time. Google called Genie 2's meager retention "long horizon memory" when it unveiled that model. How quickly things change. The horizon for Genie 3 is much longer, pushing the bounds of world models with multiple minutes of visual consistency.
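The context-window analogy can be made concrete with a toy sliding buffer: frames older than the window simply fall out, so anything off-screen longer than that is "forgotten." The numbers and names here are illustrative only, using Genie 2's reported ~10-second horizon:

```python
from collections import deque

FPS = 24
MEMORY_SECONDS = 10            # roughly Genie 2's reported horizon
WINDOW = FPS * MEMORY_SECONDS  # 240 frames of retained context

# A deque with maxlen evicts the oldest frames automatically,
# mimicking a model that can only attend to a fixed window.
memory = deque(maxlen=WINDOW)

# Simulate 30 seconds of frames; only the last 10 seconds survive.
for frame_id in range(30 * FPS):
    memory.append(frame_id)
```

After the loop, `memory` holds exactly 240 frames, and the earliest one it still "remembers" is frame 480 (the 20-second mark): everything before that has been evicted, just as Genie 2 would lose track of scenery that stayed out of view too long.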

An imperfect world

Genie 3 is not a perfect world builder yet. The ability to retain details for multiple minutes could unlock more uses, but the team acknowledges that you'd ideally want a model to remain consistent for hours at least. The model also can't simulate real-world locations—everything it generates is unique and non-deterministic. That means it's also prone to the typical AI hallucinations. The team says Genie 3 has made great strides in accuracy, but it does still produce incorrect video elements. For example, the nuance of human locomotion sometimes gets lost in the generative shuffle, producing people who appear to walk backward. Text in these AI worlds is also a jumble unless the prompt supplies the exact strings the model should render.

The way AI agents integrate into world models is limited, too. While you can create worlds and promptable events with realistic conditions, agents don't have a role in that. Their interaction with the simulated world is limited to moving around inside it, as current agents lack the high-level reasoning necessary to alter the simulation. DeepMind is also still experimenting with ways to allow multiple AI agents to interact with each other inside a shared environment. So maybe we'll see that in Genie 4 in a few more months?

Even those willing to pay hundreds of dollars per month for premium AI subscriptions have learned there are limits on usage for the largest and most expensive models. Genie 3 is essentially rendering a very long video so quickly that it appears interactive, which surely uses a ton of processing power. Google DeepMind isn't offering any specifics on this, but the fact that you can't use it speaks volumes.

Genie 3 remains a research tool, but one with capabilities DeepMind clearly wants to show off. The team plans to grant access to a group of experts and researchers who will help refine the model. DeepMind suggests, however, that it eventually plans to open its Genie world models to a wider audience.