Google DeepMind introduces SIMA 2 agent for 3D virtual worlds
Google DeepMind introduced SIMA 2, a research agent designed to play, reason, converse, and learn in 3D virtual environments.
SIMA, short for Scalable Instructable Multiworld Agent, was originally built to follow language instructions across different virtual worlds. DeepMind says the first version learned more than 600 language-following skills, such as turning left, climbing a ladder, and opening a map. It interacted with games through screen input and virtual keyboard and mouse controls rather than through direct access to game internals.
SIMA 2 adds Gemini as the agent’s core reasoning model. DeepMind says this lets the system understand higher-level goals, explain what it intends to do, describe the steps it is taking, and carry out more complex actions across games.
The company trained SIMA 2 with a mix of human demonstration videos, language labels, and Gemini-generated labels. DeepMind says the new version generalizes better to games and tasks it was not trained on, including the Viking survival game ASKA and MineDojo, a Minecraft research environment.
DeepMind also tested SIMA 2 with Genie 3, its world-model system for generating real-time 3D simulated worlds from a single image or text prompt. In those generated environments, SIMA 2 was able to orient itself, interpret user instructions, and take goal-directed actions, according to DeepMind.
The system remains a research project. DeepMind frames SIMA 2 as part of its work on embodied AI agents, with possible future relevance for robotics and general-purpose agent systems.