Nvidia Launches Cosmos Reason for Robotic Reasoning

Nvidia Launches Cosmos Reason for Robotic Reasoning

Nvidia has launched Cosmos Reason, an open vision-language model that combines video and text input for enhanced reasoning and decision-making in robotics and physical AI applications.

Nvidia has released Cosmos Reason, an open and fully customizable vision-language model (VLM) designed for robotics and physical AI applications. The model combines image and text processing to help robots and AI agents reason with prior knowledge, physical insight, and common sense, enabling decision-making in the real world. Developers can already download the model via Hugging Face.

Step-by-step Reasoning

Cosmos Reason converts video images into tokens via a vision encoder and projector, combines them with text input, and processes both in a core model using various LLM techniques. This results in step-by-step reasoning and logical answers for physical tasks.

Robot planning and reasoning. Source: Nvidia

The model is refined with supervised fine-tuning and reinforcement learning. Fine-tuning increases performance by over ten percent, while reinforcement learning adds another five percent. In benchmark tests for robotics and autonomous vehicles, Cosmos Reason achieves an average score of 65.7.

Applications in Robotics and AI

Nvidia also shares some potential applications, such as automated data analysis and annotation, robot planning where complex tasks are broken down into executable steps, and video analysis for sectors like urban transport, manufacturing, and logistics. AI agents can, for example, analyze traffic flows or detect malfunctions in factories.

read also

Google DeepMind introduces AI models for robotics

Developers can download the model via Hugging Face, with accompanying inference scripts and post-training tools on GitHub. The system supports various video formats and resolutions and operates based on text prompts specifying the desired task. An optional prompt upsampling model can refine text instructions.