Google-spinoff Waymo is in the midst of expanding its self-driving car fleet into new regions. Waymo touts more than 200 million miles of driving that informs how the vehicles navigate roads, but the company’s AI has also driven billions of miles virtually, and there’s a lot more to come with the new Waymo World Model. Based on Google DeepMind’s Genie 3, Waymo says the model can create “hyper-realistic” simulated environments that train the AI on situations that are rarely (or never) encountered in real life—like snow on the Golden Gate Bridge.
Until recently, the autonomous driving industry relied entirely on training data collected from real cars and real situations. That means rare, potentially dangerous events are not well represented in training data. The Waymo World Model aims to address that by allowing engineers to create simulations with simple prompts and driving inputs.
Google revealed Genie 3 last year, positioning it as a significant upgrade over other world models by virtue of its long-horizon memory. In Google’s world model, you can wander away from a given object, and when you look back, the model will still “remember” how that object is supposed to look. In earlier attempts at world models, the simulation would lose that context almost immediately. With Genie 3, the model can remember details for several minutes.
Autoregressive world models like Genie don’t actually create 3D spaces, but instead render video quickly enough that it feels like an explorable world. Naturally, video games are cited as a prime application for world models, so much so that gaming company stocks dropped when Google recently expanded access to the technology as Project Genie. However, the latency and still rather short memory of Genie make gaming uses far from a certainty. Nevertheless, Waymo says Genie 3 is actually ideal for simulating the kind of data it needs to train self-driving cars.
On the road with AI
The Waymo World Model is not just a straight port of Genie 3 with dashcam videos stuffed inside. Waymo and DeepMind used a specialized post-training process to make the new model generate both 2D video and 3D lidar outputs of the same scene. While cameras are great for visualizing fine details, Waymo says lidar is necessary to add critical depth information to what a self-driving car “sees” on the road—maybe someone should tell Tesla about that.
Using a world model allows Waymo to take video from its vehicles and use prompts to change the route the vehicle takes, which it calls driving action control. These simulations, which come with lidar maps, reportedly offer greater realism and consistency than older reconstructive simulation methods.
This model can also help improve the self-driving AI even without adding or removing anything from a scene. There are plenty of dashcam videos available for training self-driving vehicles, but they lack the multimodal sensor data of Waymo’s vehicles. Dropping such a video into the Waymo World Model generates matching sensor data, showing how the driving AI would have perceived that situation.
While the Waymo World Model can create entirely synthetic scenes, the company seems mostly interested in “mutating” the conditions in real videos. The blog post contains examples of changing the time of day or weather, adding new signage, or placing vehicles in unusual places. Or, hey, why not an elephant in the road?
Waymo’s early test cities, like Phoenix, were consistently sunny with little inclement weather. Simulations like these could help the cars adapt to more varied conditions, and the company’s new markets include places with more challenging weather, like Boston and Washington, D.C.
Of course, the benefit of the new AI model will depend on how accurately Genie 3 can simulate the real world. The test videos we’ve seen of Genie 3 run the gamut from pretty believable to uncanny valley territory, but Waymo believes the technology has improved to the point that it can teach self-driving cars a thing or two.