This guide demonstrates how to complete Cosmos-Predict2 Video2World workflows in ComfyUI.
Cosmos-Predict2 is NVIDIA’s next-generation physical world foundation model, specifically designed for high-quality visual generation and prediction tasks in physical AI scenarios. The model features exceptional physical accuracy, environmental interactivity, and detail reproduction capabilities, enabling realistic simulation of complex physical phenomena and dynamic scenes.
Cosmos-Predict2 supports various generation methods including Text-to-Image (Text2Image) and Video-to-World (Video2World), and is widely used in industrial simulation, autonomous driving, urban planning, scientific research, and other fields. It serves as a crucial foundational tool for promoting deep integration of intelligent vision and the physical world.
GitHub: Cosmos-Predict2
Hugging Face: Cosmos-Predict2
This guide will walk you through completing Video2World generation in ComfyUI.
For the text-to-image section, please refer to the following part:
Using Cosmos-Predict2 for text-to-image generation
If nodes are missing when you load the workflow file below, make sure you have updated ComfyUI to the latest Development (Nightly) version. See the How to Update ComfyUI section for instructions.
In testing, the 2B version used around 16 GB of VRAM.
Please download the video below and drag it into ComfyUI to load the workflow. The workflow already has embedded model download links.
Download the workflow file in JSON format
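When ComfyUI reports missing nodes, it can help to see which node types the downloaded workflow expects. The sketch below reads the JSON file directly. It assumes the UI export layout, in which a top-level "nodes" list holds entries that each carry a "type" string; the function names are illustrative and are not part of ComfyUI.

```python
import json
from collections import Counter


def node_type_counts(workflow):
    """Count how often each node type appears in a workflow dict."""
    # Assumption: UI export format with a top-level "nodes" list,
    # where every entry has a "type" string.
    return Counter(node["type"] for node in workflow.get("nodes", []))


def load_workflow(path):
    """Read a workflow JSON file from disk and return it as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling node_type_counts(load_workflow(path)) with the path of the file you downloaded returns a counter of node types, which you can compare against the node names ComfyUI lists as missing.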
Please download the following image as input:
If the models did not download successfully, you can download them manually as described in this section.
Diffusion model
cosmos_predict2_2B_video2world_480p_16fps.safetensors
For other weights, visit Cosmos_Predict2_repackaged.
Text encoder
oldt5_xxl_fp8_e4m3fn_scaled.safetensors
VAE
wan_2.1_vae.safetensors
File Storage Location
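The three files are read by the Load Diffusion Model, Load CLIP, and Load VAE nodes. In a default ComfyUI install these nodes look in models/diffusion_models, models/text_encoders, and models/vae respectively. A minimal sketch for checking that each file is in place, assuming that default layout; the function name missing_models and the "ComfyUI" root path are placeholders:

```python
from pathlib import Path

# Expected folder -> file name, relative to <ComfyUI root>/models.
# Assumes the default folder layout; adjust if your install uses
# custom model paths.
EXPECTED_MODELS = {
    "diffusion_models": "cosmos_predict2_2B_video2world_480p_16fps.safetensors",
    "text_encoders": "oldt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae": "wan_2.1_vae.safetensors",
}


def missing_models(comfy_root):
    """Return the expected model paths that do not exist under comfy_root."""
    models_dir = Path(comfy_root) / "models"
    return [
        models_dir / folder / filename
        for folder, filename in EXPECTED_MODELS.items()
        if not (models_dir / folder / filename).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder for the path to your own install.
    for path in missing_models("ComfyUI"):
        print(f"missing: {path}")
```

An empty result means all three files were found where the loader nodes are expected to look.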
Please follow the steps in the image to run the workflow:
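1. Make sure the Load Diffusion Model node has loaded cosmos_predict2_2B_video2world_480p_16fps.safetensors
2. Make sure the Load CLIP node has loaded oldt5_xxl_fp8_e4m3fn_scaled.safetensors
3. Make sure the Load VAE node has loaded wan_2.1_vae.safetensors
4. Load the input image in the Load Image node
5. (Optional) Use the shortcut Ctrl(cmd) + B to enable last frame input
6. (Optional) Edit the prompts in the ClipTextEncode node
7. (Optional) Adjust the video settings in the CosmosPredict2ImageToVideoLatent node
8. Click the Run button or use the shortcut Ctrl(cmd) + Enter to run the workflow
9. When generation finishes, the video is saved to the ComfyUI/output/ directory; you can also preview it in the save video node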
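To locate the result on disk after a run, you can look for the most recently modified video file under ComfyUI/output/. The sketch below is one way to do that. The function name latest_output and the list of file extensions are assumptions, since the container format depends on how the save video node is configured; the search includes subfolders in case the node writes into one.

```python
from pathlib import Path


def latest_output(output_dir, suffixes=(".mp4", ".webm", ".webp")):
    """Return the newest video file under output_dir, or None if none exist."""
    # The suffix list is an assumption; extend it to match your save settings.
    files = [
        p for p in Path(output_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    ]
    # Pick the file with the latest modification time.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

For example, latest_output("ComfyUI/output") returns the path of the newest matching file, or None when the directory holds no video files yet.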