Cosmos-Predict2 is NVIDIA’s next-generation physical world foundation model, specifically designed for high-quality visual generation and prediction tasks in physical AI scenarios. The model features exceptional physical accuracy, environmental interactivity, and detail reproduction capabilities, enabling realistic simulation of complex physical phenomena and dynamic scenes.

Cosmos-Predict2 supports various generation methods including Text-to-Image (Text2Image) and Video-to-World (Video2World), and is widely used in industrial simulation, autonomous driving, urban planning, scientific research, and other fields.

GitHub: Cosmos-predict2 | Hugging Face: Cosmos-Predict2

This guide will walk you through completing a text-to-image workflow in ComfyUI.

For video generation, please refer to the companion guide:

Cosmos Predict2 Video Generation — Using Cosmos-Predict2 for video generation

If you find missing nodes when loading the workflow file below, it may be due to one of the following:

  1. You are not using the latest Development (Nightly) version of ComfyUI.
  2. You are using the Stable (Release) version or Desktop version of ComfyUI (which does not include the latest feature updates).
  3. You are using the latest Commit version of ComfyUI, but some nodes failed to import during startup.

Please make sure you have successfully updated ComfyUI to the latest Development (Nightly) version. See the How to Update ComfyUI section to learn how to update ComfyUI.
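For a Git-based installation, updating to the latest nightly commit can be sketched as below. The `COMFYUI_DIR` path is an assumption — point it at your own installation.

```shell
# Minimal sketch: update a Git checkout of ComfyUI to the latest nightly
# commit. COMFYUI_DIR is an assumption -- adjust it to your install path.
COMFYUI_DIR="${COMFYUI_DIR:-$HOME/ComfyUI}"
if [ -d "$COMFYUI_DIR/.git" ]; then
  git -C "$COMFYUI_DIR" pull                      # pull the latest commits
  pip install -r "$COMFYUI_DIR/requirements.txt"  # refresh Python dependencies
else
  echo "No Git checkout found at $COMFYUI_DIR"
fi
```

Desktop and portable builds update differently; this only applies if you cloned the repository yourself.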

Cosmos Predict2 Text2Image Workflow

1. Workflow File

Please download the image below and drag it into ComfyUI to load the workflow. The workflow already has embedded model download links.

2. Manual Model Installation

If the automatic model download fails, you can download the models manually as described in this section.

Diffusion model

cosmos_predict2_2B_t2i.safetensors

For other weights, please visit Cosmos_Predict2_repackaged to download.

Text encoder

oldt5_xxl_fp8_e4m3fn_scaled.safetensors

VAE

wan_2.1_vae.safetensors

File Storage Location

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── cosmos_predict2_2B_t2i.safetensors
│   ├── 📂 text_encoders/
│   │   └── oldt5_xxl_fp8_e4m3fn_scaled.safetensors
│   └── 📂 vae/
│       └── wan_2.1_vae.safetensors
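After placing the files, a quick check like the following confirms each model sits where the loader nodes expect it. `COMFYUI_DIR` is an assumption — adjust it to your installation path.

```shell
# Sanity check: verify each model file is in the expected subdirectory.
# COMFYUI_DIR is an assumption -- adjust it to your installation path.
COMFYUI_DIR="${COMFYUI_DIR:-$HOME/ComfyUI}"
missing=0
for f in \
  models/diffusion_models/cosmos_predict2_2B_t2i.safetensors \
  models/text_encoders/oldt5_xxl_fp8_e4m3fn_scaled.safetensors \
  models/vae/wan_2.1_vae.safetensors
do
  if [ -f "$COMFYUI_DIR/$f" ]; then
    echo "OK      $f"
  else
    echo "MISSING $f"
    missing=$((missing + 1))
  fi
done
echo "$missing file(s) missing"
```

Any file reported as MISSING will cause the corresponding loader node to fail when the workflow runs.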

3. Complete Workflow Step by Step

Please follow the steps in the image to run the workflow:

  1. Ensure the Load Diffusion Model node has loaded cosmos_predict2_2B_t2i.safetensors
  2. Ensure the Load CLIP node has loaded oldt5_xxl_fp8_e4m3fn_scaled.safetensors
  3. Ensure the Load VAE node has loaded wan_2.1_vae.safetensors
  4. Set the image size in EmptySD3LatentImage
  5. Modify the prompts in the CLIPTextEncode node
  6. Click the Run button, or use the shortcut Ctrl(cmd) + Enter, to run the workflow
  7. Once generation is complete, the image will be saved automatically to the ComfyUI/output/ directory. You can also preview it in the Save Image node.
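As an alternative to clicking Run, the same generation can be queued programmatically through ComfyUI's HTTP API, which listens on port 8188 by default. This is a minimal sketch: export your workflow with "Save (API Format)" to get the real node graph; the helpers below only wrap the request, and the example node graph in the test is illustrative, not taken from the actual workflow.

```python
# Sketch: queue a generation through ComfyUI's /prompt HTTP endpoint
# (the same endpoint the web UI's Run button uses). Assumes a ComfyUI
# server running locally on its default port.
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Wrap an API-format workflow graph in the body /prompt expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI instance and return its reply."""
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Generated images still land in ComfyUI/output/ exactly as when running from the UI.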