Wan 2.2 is a new-generation multimodal generative model launched by WAN AI. It adopts an innovative MoE (Mixture of Experts) architecture consisting of a high-noise expert and a low-noise expert; the experts are assigned by denoising timestep, which produces higher-quality video content. Wan 2.2 has three core features: cinematic-level aesthetic control, which integrates professional film-industry aesthetic standards and supports multi-dimensional visual control over lighting, color, and composition; large-scale complex motion, which smoothly reproduces a wide range of complex movements with improved controllability; and precise semantic compliance, which handles complex scenes and multi-object generation while staying faithful to the user's creative intent. The model supports multiple generation modes, such as text-to-video and image-to-video, and suits application scenarios including content creation, artistic creation, and education and training.
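Conceptually, the expert split works like a router over the denoising schedule: early, high-noise timesteps go to one expert, and later, low-noise timesteps go to the other. The sketch below is purely illustrative; the boundary value and function names are assumptions, not Wan 2.2's actual implementation.

```python
# Illustrative sketch only: routing denoising timesteps between two experts.
# The boundary value (900) and the expert functions are assumptions for
# illustration, not Wan 2.2's real code.

def high_noise_expert(latent, t):
    # Hypothetical: handles early, high-noise steps (overall layout, motion).
    return latent

def low_noise_expert(latent, t):
    # Hypothetical: handles late, low-noise steps (detail, texture).
    return latent

def denoise(latent, timesteps, boundary=900):
    for t in timesteps:  # e.g. counting down from 1000 toward 0
        expert = high_noise_expert if t >= boundary else low_noise_expert
        latent = expert(latent, t)
    return latent
```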

Model Highlights

  • Cinematic-level Aesthetic Control: Professional camera language, supports multi-dimensional visual control such as lighting, color, and composition
  • Large-scale Complex Motion: Smoothly restores various complex motions, enhances motion controllability and naturalness
  • Precise Semantic Compliance: Complex scene understanding, multi-object generation, better restoring creative intentions
  • Efficient Compression Technology: 5B version with high compression ratio VAE, memory optimization, supports mixed training

Wan2.2 Open Source Model Versions

The Wan2.2 series models are based on the Apache 2.0 open source license and support commercial use. The Apache 2.0 license allows you to freely use, modify, and distribute these models, including for commercial purposes, as long as you retain the original copyright notice and license text.
| Model Type | Model Name | Parameters | Main Function | Model Repository |
| --- | --- | --- | --- | --- |
| Hybrid Model | Wan2.2-TI2V-5B | 5B | Hybrid version supporting both text-to-video and image-to-video; a single model meets both core task requirements | 🤗 Wan2.2-TI2V-5B |
| Image-to-Video | Wan2.2-I2V-A14B | 14B | Converts static images into dynamic videos, maintaining content consistency and a smooth dynamic process | 🤗 Wan2.2-I2V-A14B |
| Text-to-Video | Wan2.2-T2V-A14B | 14B | Generates high-quality videos from text descriptions, with cinematic-level aesthetic control and precise semantic compliance | 🤗 Wan2.2-T2V-A14B |
This tutorial will use the 🤗 Comfy-Org/Wan_2.2_ComfyUI_Repackaged version.
If you find missing nodes when loading the workflow file below, it may be due to the following situations:
  1. You are not using the latest Development (Nightly) version of ComfyUI.
  2. You are using the Stable (Release) version or Desktop version of ComfyUI (which does not include the latest feature updates).
  3. You are using the latest Commit version of ComfyUI, but some nodes failed to import during startup.
Please make sure you have updated ComfyUI to the latest Development (Nightly) version. See the How to Update ComfyUI section for instructions.
Wan2.2 template

Wan2.2 TI2V 5B Hybrid Version Workflow Example

The Wan2.2 5B version should fit well on 8GB of VRAM with ComfyUI's native offloading.

1. Download Workflow File

Please update your ComfyUI to the latest version, and through the menu Workflow -> Browse Templates -> Video, find “Wan2.2 5B video generation” to load the workflow.

Download JSON Workflow File

2. Manually Download Models

Download the following diffusion model, text encoder, and VAE files and save them to the locations shown below:
ComfyUI/
├───📂 models/
│   ├───📂 diffusion_models/
│   │   └───wan2.2_ti2v_5B_fp16.safetensors
│   ├───📂 text_encoders/
│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors 
│   └───📂 vae/
│       └── wan2.2_vae.safetensors
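If you prefer scripting the download, here is a minimal sketch using huggingface_hub. It assumes the files live under split_files/ in the Comfy-Org/Wan_2.2_ComfyUI_Repackaged repository and that ComfyUI is installed at ./ComfyUI; verify both against the model page and your setup.

```python
# A sketch, not an official installer. The repo-internal "split_files/..."
# paths and the ComfyUI location are assumptions -- adjust as needed.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

REPO = "Comfy-Org/Wan_2.2_ComfyUI_Repackaged"
COMFY = Path("ComfyUI")  # change to your ComfyUI install path

files = {
    "split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors": "models/diffusion_models",
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
    "split_files/vae/wan2.2_vae.safetensors": "models/vae",
}

for repo_path, target_dir in files.items():
    cached = hf_hub_download(repo_id=REPO, filename=repo_path)
    dest = COMFY / target_dir / Path(repo_path).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)  # copy the file out of the HF cache
    print("saved", dest)
```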

3. Follow the Workflow Steps

Step Diagram
  1. Ensure the Load Diffusion Model node loads the wan2.2_ti2v_5B_fp16.safetensors model.
  2. Ensure the Load CLIP node loads the umt5_xxl_fp8_e4m3fn_scaled.safetensors model.
  3. Ensure the Load VAE node loads the wan2.2_vae.safetensors model.
  4. (Optional) If you need to perform image-to-video generation, you can use the shortcut Ctrl+B to enable the Load Image node, then upload an image.
  5. (Optional) In the Wan22ImageToVideoLatent node, you can adjust the size settings and the total number of video frames (length); a quick duration check appears after this list.
  6. (Optional) If you need to modify the prompts (positive and negative), please do so in the CLIP Text Encoder node at step 5.
  7. Click the Run button, or use the shortcut Ctrl(cmd) + Enter to execute video generation.
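When adjusting length in the Wan22ImageToVideoLatent node (step 5 above), it helps to think in seconds. The sketch below assumes the template defaults of 121 frames at 24 fps for the 5B model and the common 4n+1 frame constraint of video latents; verify both against your actual workflow.

```python
# Quick sanity check for the "length" setting. The 24 fps default and the
# 4n+1 frame rule are assumptions based on typical Wan video latents --
# confirm them in your own workflow.

def video_duration(length: int, fps: float = 24.0) -> float:
    if (length - 1) % 4 != 0:
        print(f"warning: {length} is not of the form 4n+1; the node may round it")
    return length / fps

print(video_duration(121))  # roughly 5 seconds at 24 fps
```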

Wan2.2 14B T2V Text-to-Video Workflow Example

1. Workflow File

Please update your ComfyUI to the latest version, and through the menu Workflow -> Browse Templates -> Video, find “Wan2.2 14B T2V” to load the workflow. Alternatively, download the JSON workflow file below and drag it into ComfyUI to load the workflow.

Download JSON Workflow File

2. Manually Download Models

Download the following diffusion model, text encoder, and VAE files and save them to the locations shown below:
ComfyUI/
├───📂 models/
│   ├───📂 diffusion_models/
│   │   ├─── wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors
│   │   └─── wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors
│   ├───📂 text_encoders/
│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors 
│   └───📂 vae/
│       └── wan_2.1_vae.safetensors
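The download sketch shown after the 5B file tree above works here as well; swap in the two t2v high/low-noise fp8 filenames and wan_2.1_vae.safetensors, keeping the same target folders.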

3. Follow the Workflow Steps

Step Diagram
  1. Ensure the first Load Diffusion Model node loads the wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors model.
  2. Ensure the second Load Diffusion Model node loads the wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors model; the sketch after this list shows how the two experts split a single sampling run.
  3. Ensure the Load CLIP node loads the umt5_xxl_fp8_e4m3fn_scaled.safetensors model.
  4. Ensure the Load VAE node loads the wan_2.1_vae.safetensors model.
  5. (Optional) In the EmptyHunyuanLatentVideo node, you can adjust the size settings and the total number of video frames (length).
  6. (Optional) If you need to modify the prompts (positive and negative), please do so in the CLIP Text Encoder node at step 5.
  7. Click the Run button, or use the shortcut Ctrl(cmd) + Enter to execute video generation.
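Why does this workflow load two diffusion models? The template splits one denoising run between the two experts: the high-noise model handles the early steps and hands its partially denoised latent to the low-noise model, which finishes the run. A rough sketch of that handoff is below; the step counts and function names are illustrative, not the template's exact sampler settings.

```python
# Illustrative handoff between the two 14B experts over one sampling run.
# total_steps and the split point are assumptions; in ComfyUI this is
# configured with two sampler nodes sharing start/end step settings.

def sample(model, latent, start_step, end_step, leftover_noise):
    # Stand-in for one sampler pass; a real run would denoise the latent here.
    return latent

def run_t2v(high_noise_model, low_noise_model, latent, total_steps=20, split=10):
    # First portion: high-noise expert, keeping leftover noise for the handoff.
    latent = sample(high_noise_model, latent, 0, split, leftover_noise=True)
    # Remaining steps: low-noise expert finishes denoising to a clean latent.
    latent = sample(low_noise_model, latent, split, total_steps, leftover_noise=False)
    return latent
```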

Wan2.2 14B I2V Image-to-Video Workflow Example

1. Workflow File

Please update your ComfyUI to the latest version, and through the menu Workflow -> Browse Templates -> Video, find “Wan2.2 14B I2V” to load the workflow. Alternatively, download the JSON workflow file below and drag it into ComfyUI to load the workflow.

Download JSON Workflow File

You can use the following image as input: Input Image

2. Manually Download Models

Download the following diffusion model, text encoder, and VAE files and save them to the locations shown below:
ComfyUI/
├───📂 models/
│   ├───📂 diffusion_models/
│   │   ├─── wan2.2_i2v_low_noise_14B_fp16.safetensors
│   │   └─── wan2.2_i2v_high_noise_14B_fp16.safetensors
│   ├───📂 text_encoders/
│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors 
│   └───📂 vae/
│       └── wan_2.1_vae.safetensors
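Likewise, the earlier download sketch can be reused here with the two i2v fp16 filenames and wan_2.1_vae.safetensors.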

3. Follow the Workflow Steps

Step Diagram
  1. Make sure the first Load Diffusion Model node loads the wan2.2_i2v_high_noise_14B_fp16.safetensors model.
  2. Make sure the second Load Diffusion Model node loads the wan2.2_i2v_low_noise_14B_fp16.safetensors model.
  3. Make sure the Load CLIP node loads the umt5_xxl_fp8_e4m3fn_scaled.safetensors model.
  4. Make sure the Load VAE node loads the wan_2.1_vae.safetensors model.
  5. In the Load Image node, upload the image to be used as the initial frame.
  6. If you need to modify the prompts (positive and negative), do so in the CLIP Text Encoder node at step 6.
  7. (Optional) In EmptyHunyuanLatentVideo, you can adjust the size settings and the total number of video frames (length).
  8. Click the Run button, or use the shortcut Ctrl(cmd) + Enter to execute video generation.
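Once any of these workflows runs in the UI, you can also queue it programmatically. Below is a minimal sketch against ComfyUI's HTTP API, assuming the server is listening on its default 127.0.0.1:8188 and that you exported the workflow in API format (not the regular workflow JSON); the filename is hypothetical.

```python
# A sketch only: queue an exported API-format workflow on a local ComfyUI
# server. The server address and the exported filename are assumptions
# about your setup.
import json
import urllib.request

with open("wan2_2_i2v_api.json", "r", encoding="utf-8") as f:  # hypothetical filename
    prompt = json.load(f)

payload = json.dumps({"prompt": prompt}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # returns a prompt_id on success
```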