Nvidia launches Cosmos World Foundation Model platform to accelerate physical AI
Nvidia launched its Cosmos World Foundation Model platform to accelerate physical AI development.
During Nvidia CEO Jensen Huang's keynote at CES 2025, the company said the platform includes state-of-the-art generative world foundation models, advanced tokenizers, guardrails and an accelerated video processing pipeline built to advance the development of physical AI systems such as autonomous vehicles (AVs) and robots.
Physical AI models are costly to develop, and require vast amounts of real-world data and testing. Cosmos world foundation models, or WFMs, offer developers an easy way to generate massive amounts of photoreal, physics-based synthetic data to train and evaluate their existing models. Developers can also build custom models by fine-tuning Cosmos WFMs.
Cosmos models will be available under an open model license to accelerate the work of the robotics and AV community. Developers can preview the first models on the Nvidia API catalog, or download the family of models and fine-tuning framework from the Nvidia NGC catalog or Hugging Face.
“It is trained on 20 million hours of video,” Huang said. “Nvidia Cosmos. It’s about teaching the AI to understand the physical world.”
Leading robotics and automotive companies, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi and XPENG, along with ridesharing giant Uber, are among the first to adopt Cosmos.
“The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own,” said Jensen Huang, founder and CEO of Nvidia, in a statement. “We created Cosmos to democratize physical AI and put general robotics in reach of every developer.”
Open World Foundation Models to Accelerate the Next Wave of AI
Nvidia Cosmos’ suite of open models means developers can customize the WFMs with datasets, such as video recordings of AV trips or robots navigating a warehouse, according to the needs of their target application.
Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data. The models are built for physically based interactions, object permanence, and high-quality generation of simulated industrial environments — like warehouses or factories — and of driving environments, including various road conditions.
In his opening keynote at CES, Huang showcased ways physical AI developers can use Cosmos models, including for:
- Video search and understanding, enabling developers to easily find specific training scenarios, like snowy road conditions or warehouse congestion, from video data.
- Controllable 3D-to-real synthetic data generation, using Cosmos models to generate photoreal videos from controlled 3D scenarios developed in the Nvidia Omniverse platform.
- Physical AI model development and evaluation, whether building a custom model on the foundation models, improving the models using Cosmos for reinforcement learning or testing how they perform given a specific simulated scenario.
- Foresight — the ability to predict the results of a physical AI model’s next potential actions — to help it select the best action to follow.
- Multiverse simulation, using Cosmos and Omniverse to generate every possible future outcome an AI model could take to help it select the best and most accurate path.
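The foresight and multiverse ideas in the list above can be illustrated with a toy planner. This is only a minimal sketch under stated assumptions, not Cosmos code: `toy_predict` and `toy_score` are hypothetical stand-ins for a world model and an evaluation function, and the "world" is a single integer position with a goal at 10. The planner enumerates every action sequence up to a short horizon, predicts where each one ends up, and returns the first action of the best-scoring sequence.

```python
from itertools import product
from typing import Callable, Sequence


def toy_predict(position: int, action: int) -> int:
    """Hypothetical stand-in for a world model: predict the next state."""
    return position + action


def toy_score(position: int, goal: int = 10) -> float:
    """Hypothetical scoring rule: states closer to the goal score higher."""
    return -abs(goal - position)


def best_first_action(
    state: int,
    actions: Sequence[int],
    horizon: int,
    predict: Callable[[int, int], int] = toy_predict,
    score: Callable[[int], float] = toy_score,
) -> int:
    """Enumerate every action sequence of length `horizon`, roll each one
    forward with `predict`, and return the first action of the best one."""
    best_plan = None
    best_value = float("-inf")
    for plan in product(actions, repeat=horizon):
        predicted = state
        for action in plan:
            predicted = predict(predicted, action)
        value = score(predicted)
        if value > best_value:
            best_value, best_plan = value, plan
    return best_plan[0]


chosen = best_first_action(state=7, actions=[-1, 0, 1], horizon=2)
print(f"Chosen action from position 7: {chosen}")
```

Exhaustive enumeration grows exponentially with the horizon, so it only works for toy problems like this one; the article does not describe how Cosmos and Omniverse explore outcomes at scale.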
Building physical AI models requires petabytes of video data and tens of thousands of compute hours to process, curate and label that data. To help save enormous costs in data curation, training and model customization, Cosmos features:
- An Nvidia AI and CUDA-accelerated data processing pipeline, powered by Nvidia NeMo Curator, that enables developers to process, curate and label 20 million hours of video in 14 days using the Nvidia Blackwell platform, instead of 3.4 years using a CPU-only pipeline.
- Nvidia Cosmos Tokenizer, a state-of-the-art visual tokenizer for converting images and videos into tokens. It delivers 8x more total compression and 12x faster processing than today’s leading tokenizers.
- The Nvidia NeMo framework for highly efficient model training, customization and optimization.
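To put the data pipeline figures above in perspective, the quoted numbers can be turned into an implied speedup and throughput. The short calculation below is a back-of-the-envelope sketch that uses only the figures stated in the article (20 million hours of video, 14 days, 3.4 years); the 365.25-day year and the function names are assumptions made for illustration.

```python
DAYS_PER_YEAR = 365.25  # assumption: average calendar year length


def speedup(cpu_years: float, gpu_days: float) -> float:
    """Ratio of CPU-only processing time to accelerated processing time."""
    return cpu_years * DAYS_PER_YEAR / gpu_days


def hours_per_day(total_hours: float, days: float) -> float:
    """Video hours processed per day at a constant rate."""
    return total_hours / days


cpu_years = 3.4            # CPU-only pipeline, as quoted
gpu_days = 14              # accelerated pipeline, as quoted
video_hours = 20_000_000   # dataset size, as quoted

ratio = speedup(cpu_years, gpu_days)
throughput = hours_per_day(video_hours, gpu_days)

print(f"Implied speedup: about {ratio:.0f}x")
print(f"Implied throughput: about {throughput:,.0f} video hours per day")
```

Under these assumptions the quoted figures work out to roughly an 89x speedup and about 1.4 million hours of video processed per day.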
World’s Largest Physical AI Industries Adopt Cosmos
Pioneers across the physical AI industry are already adopting Cosmos technologies.
1X, an AI and humanoid robot company, launched the 1X World Model Challenge dataset using Cosmos Tokenizer. XPENG will use Cosmos to accelerate the development of its humanoid robot. And Hillbot and Skild AI are using Cosmos to fast-track the development of their general-purpose robots.
“Data-scarcity and variability are key challenges to successful learning in robot environments,” said Pras Velagapudi, chief technology officer at Agility, in a statement. “Cosmos’ text-, image- and video-to-world capabilities allow us to generate and augment photorealistic scenarios in a variety of tasks that we can use to train models without needing as much expensive, real-world data capture.”
Transportation leaders are also using Cosmos to build physical AI for AVs.
Waabi, a company pioneering generative AI for the physical world, will use Cosmos for the search and curation of video data for AV software development and simulation.
Wayve, which is developing AI foundation models for autonomous driving, is evaluating Cosmos as a tool to search for edge and corner case driving scenarios used for safety and validation.
AV toolchain provider Foretellix will use Cosmos, alongside Nvidia Omniverse Sensor RTX APIs, to evaluate and generate high-fidelity testing scenarios and training data at scale.
Global ridesharing giant Uber is partnering with Nvidia to accelerate autonomous mobility. Rich driving datasets from Uber, combined with the features of the Cosmos platform and Nvidia DGX Cloud, will help AV partners build stronger AI models even more efficiently.
“Generative AI will power the future of mobility, requiring both rich data and very powerful compute,” said Dara Khosrowshahi, CEO of Uber. “By working with Nvidia, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry.”
Developing Open, Safe and Responsible AI
Nvidia Cosmos was developed in line with Nvidia’s trustworthy AI principles, which prioritize privacy, safety, security, transparency and reducing unwanted bias.
Trustworthy AI is essential for fostering innovation within the developer community and maintaining user trust. Nvidia is committed to safe and trustworthy AI, in line with the White House’s voluntary AI commitments and other global AI safety initiatives.
The open Cosmos platform includes guardrails designed to mitigate harmful text and images, and features a tool to enhance text prompts for accuracy. Videos generated with Cosmos autoregressive and diffusion models on the Nvidia API catalog include invisible watermarks to identify AI-generated content, helping reduce the chances of misinformation and misattribution.
Nvidia encourages developers to adopt trustworthy AI practices and further enhance guardrail and watermarking solutions for their applications.
Availability
Cosmos WFMs are now available under Nvidia’s open model license on Hugging Face and the Nvidia NGC catalog. Cosmos models will soon be available as fully optimized Nvidia NIM microservices.
Developers can access Nvidia NeMo Curator for accelerated video processing and customize their own world models with Nvidia NeMo. Nvidia DGX Cloud offers a fast and easy way to deploy these models, with enterprise support available through the Nvidia AI Enterprise software platform.
Nvidia also announced new Nvidia Llama Nemotron large language models and Nvidia Cosmos Nemotron vision language models that developers can use for enterprise AI use cases in healthcare, financial services, manufacturing and more.