About Wan2.1-Fun-Control

Wan2.1-Fun-Control is an open-source video generation and control project developed by the Alibaba team. It introduces an innovative Control Codes mechanism that combines deep learning with multimodal conditional inputs to generate high-quality videos conforming to preset control conditions, precisely guiding the generated video content through those control signals.

Currently, the Fun Control model supports various control conditions, including Canny (line art), Depth, OpenPose (human pose), MLSD (geometric edges), and trajectory control. It also supports multi-resolution video prediction at 512, 768, and 1024 resolution, generating videos at 16 frames per second with lengths up to 81 frames (81 frames ÷ 16 fps ≈ 5 seconds).

Model versions:

  • 1.3B Lightweight: Suitable for local deployment and quick inference with lower VRAM requirements
  • 14B High-performance: Model size reaches 32GB+, offering better results but requiring more VRAM

Here are the relevant code repositories:

ComfyUI now natively supports the Wan2.1 Fun Control model. Before starting this tutorial, please update your ComfyUI to ensure you’re using a version after this commit.

In this guide, we’ll provide two workflows:

  1. A workflow using only native Comfy Core nodes
  2. A workflow using custom nodes

Due to current limitations in native nodes for video support, the native-only workflow ensures users can complete the process without installing custom nodes. However, we’ve found that providing a good user experience for video generation is challenging without custom nodes, so we’re providing both workflow versions in this guide.

Model Installation

You only need to install these models once. The workflow images also contain model download information, so you can choose your preferred download method.

The following models can be found at Wan_2.1_ComfyUI_repackaged and Wan2.1-Fun.

Click the corresponding links to download. If you’ve used Wan-related workflows before, you only need to download the Diffusion models.

Diffusion models - choose 1.3B or 14B. The 14B version has a larger file size (32GB) and higher VRAM requirements:

Text encoders - choose one of the following models (fp16 precision has a larger size and higher performance requirements):

VAE

CLIP Vision

File storage location:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── wan2.1_fun_control_1.3B_bf16.safetensors
│   ├── 📂 text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   ├── 📂 vae/
│   │   └── wan_2.1_vae.safetensors
│   └── 📂 clip_vision/
│       └── clip_vision_h.safetensors
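If you prefer scripted downloads, here is a minimal, hedged sketch using huggingface_hub. The repo id matches the Wan_2.1_ComfyUI_repackaged repository mentioned above, but the split_files/... paths are assumptions about that repo’s layout; verify the exact file paths on the model pages before running.

```python
import os
import shutil

from huggingface_hub import hf_hub_download

# Remote file paths (assumed split_files/ layout) mapped to local subfolders.
FILES = {
    "split_files/diffusion_models/wan2.1_fun_control_1.3B_bf16.safetensors": "diffusion_models",
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "text_encoders",
    "split_files/vae/wan_2.1_vae.safetensors": "vae",
    "split_files/clip_vision/clip_vision_h.safetensors": "clip_vision",
}

for remote_path, subdir in FILES.items():
    cached = hf_hub_download("Comfy-Org/Wan_2.1_ComfyUI_repackaged", remote_path)
    target = os.path.join("ComfyUI", "models", subdir)
    os.makedirs(target, exist_ok=True)
    shutil.copy(cached, target)  # copies using the original file name
```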

ComfyUI Native Workflow

In this workflow, we use a control video converted to WebP format, since the Load Image node doesn’t currently support mp4. We also use Canny Edge to preprocess the original video. Because many users encounter installation failures and environment issues when installing custom nodes, this version of the workflow uses only native nodes to ensure a smoother experience.
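If you want to prepare your own control input, the following is a minimal sketch of what this preprocessing amounts to outside ComfyUI: read an mp4, run Canny edge detection on each frame, and save the result as an animated WebP that the Load Image node can ingest. It assumes opencv-python and Pillow are installed; the file names and Canny thresholds are placeholders.

```python
import cv2
from PIL import Image

cap = cv2.VideoCapture("control_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 16  # fall back to 16 fps if unknown

frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # illustrative low/high thresholds
    frames.append(Image.fromarray(edges).convert("RGB"))
cap.release()

# Save all frames as a single animated WebP for the Load Image node.
frames[0].save(
    "control_canny.webp",
    save_all=True,
    append_images=frames[1:],
    duration=int(1000 / fps),  # per-frame duration in milliseconds
    loop=0,
)
```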

Thanks to the ComfyUI community authors who provide feature-rich custom nodes. If you want to jump straight to that version, see Workflow Using Custom Nodes.

1. Workflow File Download

1.1 Workflow File

Download the image below and drag it into ComfyUI to load the workflow:

1.2 Input Images and Videos Download

Please download the following image and video for input:

2. Complete the Workflow Step by Step

  1. Ensure the Load Diffusion Model node has loaded wan2.1_fun_control_1.3B_bf16.safetensors
  2. Ensure the Load CLIP node has loaded umt5_xxl_fp8_e4m3fn_scaled.safetensors
  3. Ensure the Load VAE node has loaded wan_2.1_vae.safetensors
  4. Ensure the Load CLIP Vision node has loaded clip_vision_h.safetensors
  5. Upload the starting frame to the Load Image node (renamed to Start_image)
  6. Upload the control video to the second Load Image node. Note: This node currently doesn’t support mp4, only WebP videos
  7. (Optional) Modify the prompt (both English and Chinese are supported)
  8. (Optional) Adjust the video size in WanFunControlToVideo, avoiding overly large dimensions
  9. Click the Run button or use the shortcut Ctrl(cmd) + Enter to execute video generation (or queue the workflow programmatically, as sketched below)
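For users who script their runs, here is a minimal, hedged sketch of queueing a workflow through ComfyUI’s HTTP API instead of clicking Run. It assumes ComfyUI is running locally on the default port 8188 and that the workflow was exported via Save (API Format); workflow_api.json is a placeholder name.

```python
import json
import urllib.request

# Load a workflow exported via "Save (API Format)" in ComfyUI.
with open("workflow_api.json") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # response includes the queued prompt_id
```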

3. Usage Notes

  • Since the WanFunControlToVideo node needs to receive the same number of frames as the control video, any frames you request beyond the control video’s actual length may show scenes that don’t conform to the control conditions (see the frame-counting snippet after this list). We’ll address this issue in the Workflow Using Custom Nodes
  • Avoid setting overly large dimensions, as this can make the sampling process very time-consuming. Try generating smaller images first, then upscale
  • Use your imagination to build upon this workflow by adding text-to-image or other types of workflows to achieve direct text-to-video generation or style transfer
  • Use tools like comfyui_controlnet_aux for richer control options
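To check how many frames your control video actually provides before setting the length, a quick sketch (the file name is a placeholder):

```python
from PIL import Image

with Image.open("control_canny.webp") as im:
    print(getattr(im, "n_frames", 1))  # total frames available for control
```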

Workflow Using Custom Nodes

We’ll need to install the following two custom node packages:

  • ComfyUI-VideoHelperSuite
  • comfyui_controlnet_aux

You can use ComfyUI Manager to install missing nodes or follow the installation instructions for each custom node package.

1. Workflow File Download

1.1 Workflow File

Download the image below and drag it into ComfyUI to load the workflow:

Due to the large size of video files, you can also click here to download the workflow file in JSON format.

1.2 Input Images and Videos Download

Please download the following image and video for input:

2. Complete the Workflow Step by Step

The model setup is essentially the same. If you’ve already run the native-only workflow, you can directly upload the corresponding images and run it.

  1. Ensure the Load Diffusion Model node has loaded wan2.1_fun_control_1.3B_bf16.safetensors
  2. Ensure the Load CLIP node has loaded umt5_xxl_fp8_e4m3fn_scaled.safetensors
  3. Ensure the Load VAE node has loaded wan_2.1_vae.safetensors
  4. Ensure the Load CLIP Vision node has loaded clip_vision_h.safetensors
  5. Upload the starting frame to the Load Image node
  6. Upload an mp4 format video to the Load Video(Upload) custom node. Note that the workflow has adjusted the default frame_load_cap
  7. For the example input, the DWPose Estimator has only the detect_face option enabled
  8. (Optional) Modify the prompt (both English and Chinese are supported)
  9. (Optional) Adjust the video size in WanFunControlToVideo, avoiding overly large dimensions
  10. Click the Run button or use the shortcut Ctrl(cmd) + Enter to execute video generation

3. Workflow Notes

Thanks to the ComfyUI community authors for their custom node packages:

  • This example uses Load Video(Upload) to support mp4 videos
  • The video_info obtained from Load Video(Upload) allows us to maintain the same fps for the output video
  • You can replace the DWPose Estimator with other preprocessors from the comfyui_controlnet_aux node package
  • Prompts support multiple languages

Usage Tips

  • You can combine multiple image preprocessing techniques, then use the Image Blend node to apply multiple control methods simultaneously (see the sketch at the end of this section).

  • You can use the Video Combine node from ComfyUI-VideoHelperSuite to save videos in mp4 format

  • We use the SaveAnimatedWEBP node because ComfyUI doesn’t currently support embedding workflow metadata into mp4 files, and some other custom nodes may not support embedding the workflow either. Saving as animated WebP preserves the workflow inside the output video.

  • In the WanFunControlToVideo node, control_video is not mandatory, so you can skip the control video: first generate at a very small size such as 320x320, then use the result as control video input to achieve consistent output.

Related custom node packages for Wan video workflows:

  • ComfyUI-WanVideoWrapper

  • ComfyUI-KJNodes
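As referenced in the tips above, here is a minimal sketch of blending two control maps outside ComfyUI, mirroring what the Image Blend node does: combine a Canny edge map with a depth map at equal weight. File names and the blend factor are illustrative.

```python
from PIL import Image

canny = Image.open("frame_canny.png").convert("RGB")
depth = Image.open("frame_depth.png").convert("RGB").resize(canny.size)

# alpha=0.5 weighs both control maps equally; tune it to favor one signal.
blended = Image.blend(canny, depth, alpha=0.5)
blended.save("frame_control_blend.png")
```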