Skip to content
← Back to explorer

World Simulation with Video Foundation Models for Physical AI

NVIDIA, :, Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, Prithvijit Chattopadhyay, Mike Chen, Yongxin Chen, Yu Chen, Shuai Cheng, Yin Cui, Jenna Diamond, Yifan Ding, Jiaojiao Fan, Linxi Fan, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Ruiyuan Gao, Yunhao Ge, Jinwei Gu, Aryaman Gupta, Siddharth Gururani, Imad El Hanafi, Ali Hassani, Zekun Hao, Jacob Huffman, Joel Jang, Pooya Jannaty, Jan Kautz, Grace Lam, Xuan Li, Zhaoshuo Li, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Yen-Chen Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Seungjun Nah, Yashraj Narang, Abhijeet Panaskar, Lindsey Pavao, Trung Pham, Morteza Ramezanali, Fitsum Reda, Scott Reed, Xuanchi Ren, Haonan Shao, Yue Shen, Stella Shi, Shuran Song, Bartosz Stefaniak, Shangkun Sun, Shitao Tang, Sameena Tasmeen, Lyne Tchapmi, Wei-Cheng Tseng, Jibin Varghese, Andrew Z. Wang, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Jiashu Xu, Dinghao Yang, Xiaodong Yang, Haotian Ye, Seonghyeon Ye, Xiaohui Zeng, Jing Zhang, Qinsheng Zhang, Kaiwen Zheng, Andrew Zhu, Yuke Zhu · Oct 28, 2025 · Citations: 0

Data freshness

Extraction: Fresh

Check recency before relying on this page for active eval decisions. Use stale pages as context and verify against current hub results.

Metadata refreshed

Feb 24, 2026, 9:52 PM

Stale

Extraction refreshed

Apr 13, 2026, 6:40 AM

Fresh

Extraction source

Persisted extraction

Confidence 0.40

Abstract

We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, [Cosmos-Predict2.5] unifies Text2World, Image2World, and Video2World generation in a single model and leverages [Cosmos-Reason1], a Physical AI vision-language model, to provide richer text grounding and finer control of world simulation. Trained on 200M curated video clips and refined with reinforcement learning-based post-training, [Cosmos-Predict2.5] achieves substantial improvements over [Cosmos-Predict1] in video quality and instruction alignment, with models released at 2B and 14B scales. These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems. We further extend the family with [Cosmos-Transfer2.5], a control-net style framework for Sim2Real and Real2Real world translation. Despite being 3.5$\times$ smaller than [Cosmos-Transfer1], it delivers higher fidelity and robust long-horizon video generation. Together, these advances establish [Cosmos-Predict2.5] and [Cosmos-Transfer2.5] as versatile tools for scaling embodied intelligence. To accelerate research and deployment in Physical AI, we release source code, pretrained checkpoints, and curated benchmarks under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-predict2.5 and https://github.com/nvidia-cosmos/cosmos-transfer2.5. We hope these open resources lower the barrier to adoption and foster innovation in building the next generation of embodied intelligence.

Low-signal caution for protocol decisions

Use this page for context, then validate protocol choices against stronger HFEPX references before implementation decisions.

  • Extraction confidence is 0.40 (below strong-reference threshold).
  • No benchmark/dataset or metric anchors were extracted.

HFEPX Relevance Assessment

This paper is adjacent to HFEPX scope and is best used for background context, not as a primary protocol reference.

Best use

Background context only

Use if you need

A secondary eval reference to pair with stronger protocol papers.

Main weakness

Extraction confidence is 0.40 (below strong-reference threshold).

Trust level

Low

Eval-Fit Score

12/100 • Low

Treat as adjacent context, not a core eval-method reference.

Human Feedback Signal

Not explicit in abstract metadata

Evaluation Signal

Detected

HFEPX Fit

Adjacent candidate

Extraction confidence: Low

Field Provenance & Confidence

Each key protocol field shows extraction state, confidence band, and data source so you can decide whether to trust it directly or validate from full text.

Human Feedback Types

missing

None explicit

Confidence: Low Source: Persisted extraction missing

No explicit feedback protocol extracted.

Evidence snippet: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.

Evaluation Modes

partial

Simulation Env

Confidence: Low Source: Persisted extraction evidenced

Includes extracted eval setup.

Evidence snippet: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.

Quality Controls

missing

Not reported

Confidence: Low Source: Persisted extraction missing

No explicit QC controls found.

Evidence snippet: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.

Benchmarks / Datasets

missing

Not extracted

Confidence: Low Source: Persisted extraction missing

No benchmark anchors detected.

Evidence snippet: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.

Reported Metrics

missing

Not extracted

Confidence: Low Source: Persisted extraction missing

No metric anchors detected.

Evidence snippet: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.

Rater Population

missing

Unknown

Confidence: Low Source: Persisted extraction missing

Rater source not explicitly reported.

Evidence snippet: We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: Coding, Multilingual
  • Extraction source: Persisted extraction

Evaluation Lens

  • Evaluation modes: Simulation Env
  • Agentic eval: Long Horizon
  • Quality controls: Not reported
  • Confidence: 0.40
  • Flags: ambiguous

Protocol And Measurement Signals

Benchmarks / Datasets

No benchmark or dataset names were extracted from the available abstract.

Reported Metrics

No metric terms were extracted from the available abstract.

Research Brief

Deterministic synthesis

We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI. HFEPX signals include Simulation Env, Long Horizon with confidence 0.40. Updated from current HFEPX corpus.

Generated Apr 13, 2026, 6:40 AM · Grounded in abstract + metadata only

Key Takeaways

  • We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.
  • These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems.

Researcher Actions

  • Treat this as method context, then pivot to protocol-specific HFEPX hubs.
  • Identify benchmark choices from full text before operationalizing conclusions.
  • Verify metric definitions before comparing against your eval pipeline.

Caveats

  • Generated from title, abstract, and extracted metadata only; full-paper implementation details are not parsed.
  • Extraction confidence is probabilistic and should be validated for critical decisions.

Research Summary

Contribution Summary

  • We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI.
  • These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems.
  • To accelerate research and deployment in Physical AI, we release source code, pretrained checkpoints, and curated benchmarks under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-predict2.5 and…

Why It Matters For Eval

  • These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems.
  • To accelerate research and deployment in Physical AI, we release source code, pretrained checkpoints, and curated benchmarks under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-predict2.5 and…

Researcher Checklist

  • Gap: Human feedback protocol is explicit

    No explicit human feedback protocol detected.

  • Pass: Evaluation mode is explicit

    Detected: Simulation Env

  • Gap: Quality control reporting appears

    No calibration/adjudication/IAA control explicitly detected.

  • Gap: Benchmark or dataset anchors are present

    No benchmark/dataset anchor extracted from abstract.

  • Gap: Metric reporting is present

    No metric terms extracted.

Related Papers

Papers are ranked by protocol overlap, extraction signal alignment, and semantic proximity.

Need human evaluators for your AI research? Scale annotation with expert AI Trainers.