Google DeepMind introduces AI models for robotics

Google DeepMind Robot
Source: Google DeepMind

Google announces new robotics AI models built on Gemini 2.0. Gemini Robotics and Gemini Robotics-ER are intended to form the foundation for a new generation of helpful robots.

Google DeepMind has announced Gemini Robotics and Gemini Robotics-ER, two new AI models that advance the application of AI in the physical world. Gemini Robotics combines vision, language, and action to control robots directly, while Gemini Robotics-ER focuses on spatial understanding and programmability.

Physical Interaction

Google DeepMind builds on its Gemini 2.0 model with Gemini Robotics, a vision-language-action (VLA) model. This model adds physical action as a new output mode, enabling robots to perform tasks in the real world. Gemini Robotics is designed to control various robot types and can adapt to new objects, instructions, and environments. The model is being tested on the bi-arm robot ALOHA 2 and the humanoid Apollo robot from Apptronik, among others.
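
To make the idea of a vision-language-action model more concrete, the sketch below shows one way such a perception-to-action loop could be structured. It is purely illustrative: the DummyVLAModel class, its predict_actions method, and the control loop are hypothetical stand-ins, not part of any published Gemini Robotics interface.

```python
# Illustrative sketch only: all class and method names below are hypothetical
# and do not reflect Google DeepMind's actual Gemini Robotics interface.
from dataclasses import dataclass


@dataclass
class Action:
    """One low-level command, e.g. target joint positions for a bi-arm robot."""
    joint_targets: list[float]


class DummyVLAModel:
    """Stand-in for a VLA model: observation plus text in, actions out."""

    def predict_actions(self, image: bytes, instruction: str) -> list[Action]:
        # A real VLA model would infer a short horizon of actions from the
        # camera image and the natural-language instruction.
        return [Action(joint_targets=[0.0] * 7)]


def control_loop(model: DummyVLAModel, instruction: str, steps: int = 3) -> None:
    """Repeatedly query the model and hand its proposed actions to the robot."""
    for _ in range(steps):
        image = b"...camera frame..."                 # current observation
        for action in model.predict_actions(image, instruction):
            print("execute", action.joint_targets)    # placeholder for robot control


control_loop(DummyVLAModel(), "pick up the banana and place it in the bowl")
```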

Additionally, Google DeepMind introduces Gemini Robotics-ER (embodied reasoning), a vision-language model (VLM) that gives robots better spatial understanding. This model can, for example, automatically determine a suitable grip for an object and plan safe movement patterns. Gemini Robotics-ER integrates perception, spatial reasoning, planning, and code generation, making the interaction between robots and their environment more efficient.
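
The grip-and-approach idea can be illustrated with a short sketch. The GraspPose type and the propose_grasp and plan_safe_approach functions are hypothetical placeholders for the kind of output an embodied-reasoning model could produce; they do not reflect an actual API.

```python
# Hedged sketch: hypothetical embodied-reasoning outputs, not a published API.
from dataclasses import dataclass


@dataclass
class GraspPose:
    """A candidate grip: gripper position (metres) and yaw angle (radians)."""
    x: float
    y: float
    z: float
    yaw: float


def propose_grasp(image: bytes, object_name: str) -> GraspPose:
    # A real embodied-reasoning model would localise the object in 3D and
    # return a grip suited to its shape; here we return a fixed placeholder.
    return GraspPose(x=0.42, y=-0.10, z=0.05, yaw=1.57)


def plan_safe_approach(grasp: GraspPose, clearance: float = 0.10) -> list[GraspPose]:
    """Plan a simple two-waypoint approach: hover above the object, then descend."""
    hover = GraspPose(grasp.x, grasp.y, grasp.z + clearance, grasp.yaw)
    return [hover, grasp]


waypoints = plan_safe_approach(propose_grasp(b"...frame...", "coffee mug"))
for wp in waypoints:
    print(wp)
```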

Collaboration and Safety

Google DeepMind is collaborating with companies such as Apptronik, Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools to further develop the models. Safety remains a key pillar: Gemini Robotics-ER can work alongside existing low-level safety mechanisms to minimize risks. In addition, a dataset is being released to help evaluate the safety of AI-driven robots.
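
As a rough illustration of how model-proposed commands might be routed through an existing safety layer before execution, the sketch below rejects targets outside a conservative workspace box. The function names and bounds are assumptions for illustration only, not part of any Google DeepMind system.

```python
# Illustrative sketch: a generic safety gate in front of the robot controller.
# All names and numeric bounds are hypothetical.


def within_workspace(target: tuple[float, float, float],
                     bounds: tuple[float, float, float] = (0.8, 0.8, 1.2)) -> bool:
    """Reject targets outside a conservative box around the robot base."""
    return all(abs(coord) <= limit for coord, limit in zip(target, bounds))


def execute_if_safe(target: tuple[float, float, float]) -> None:
    if within_workspace(target):
        print("executing move to", target)    # placeholder for the real controller
    else:
        print("blocked unsafe target", target)


execute_if_safe((0.4, 0.2, 0.3))   # inside the workspace -> executed
execute_if_safe((2.0, 0.0, 0.3))   # outside the workspace -> blocked
```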