ComfyUI Wan2.1 Fun Control Video Examples
This guide demonstrates how to use Wan2.1 Fun Control in ComfyUI to generate videos guided by control videos.
About Wan2.1-Fun-Control
Wan2.1-Fun-Control is an open-source video generation and control project developed by the Alibaba team. It introduces an innovative Control Codes mechanism that, combined with deep learning and multimodal conditional inputs, generates high-quality videos conforming to preset control conditions. The project focuses on precisely guiding generated video content through multimodal control conditions.
Currently, the Fun Control model supports various control conditions, including Canny (line art), Depth, OpenPose (human posture), MLSD (geometric edges), and trajectory control. The model also supports multi-resolution video prediction with options for 512, 768, and 1024 resolutions at 16 frames per second, generating videos up to 81 frames (approximately 5 seconds) in length.
Model versions:
- 1.3B Lightweight: Suitable for local deployment and quick inference with lower VRAM requirements
- 14B High-performance: Model size reaches 32GB+, offering better results but requiring higher VRAM
Here are the relevant code repositories:
- Wan2.1-Fun-1.3B-Control
- Wan2.1-Fun-14B-Control
- Code repository: VideoX-Fun
ComfyUI now natively supports the Wan2.1 Fun Control model. Before starting this tutorial, please update your ComfyUI to ensure you’re using a version after this commit.
In this guide, we’ll provide two workflows:
- A workflow using only native Comfy Core nodes
- A workflow using custom nodes
Due to current limitations in native nodes for video support, the native-only workflow ensures users can complete the process without installing custom nodes. However, we’ve found that providing a good user experience for video generation is challenging without custom nodes, so we’re providing both workflow versions in this guide.
Model Installation
You only need to install these models once. The workflow images also contain model download information, so you can choose your preferred download method.
The following models can be found at Wan_2.1_ComfyUI_repackaged and Wan2.1-Fun
Click the corresponding links to download. If you’ve used Wan-related workflows before, you only need to download the Diffusion models.
Diffusion models - choose 1.3B or 14B. The 14B version has a larger file size (32GB) and higher VRAM requirements:
- wan2.1_fun_control_1.3B_bf16.safetensors
- Wan2.1-Fun-14B-Control: rename to `Wan2.1-Fun-14B-Control.safetensors` after downloading
Text encoders - choose one of the following models (fp16 precision has a larger file size and higher performance requirements):
- umt5_xxl_fp16.safetensors
- umt5_xxl_fp8_e4m3fn_scaled.safetensors
VAE
- wan_2.1_vae.safetensors
CLIP Vision
- clip_vision_h.safetensors
File storage location:
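For reference, these models go into the standard ComfyUI model folders. A minimal sketch of the expected layout, assuming a default ComfyUI installation:

```
ComfyUI/
└── models/
    ├── diffusion_models/
    │   └── wan2.1_fun_control_1.3B_bf16.safetensors   # or Wan2.1-Fun-14B-Control.safetensors
    ├── text_encoders/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    ├── vae/
    │   └── wan_2.1_vae.safetensors
    └── clip_vision/
        └── clip_vision_h.safetensors
```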
ComfyUI Native Workflow
In this workflow, we use videos converted to WebP format, since the `Load Image` node doesn’t currently support mp4. We also use Canny Edge to preprocess the original video.
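If your source video is an mp4, you can convert it to an animated WebP before loading it. Below is a minimal sketch using ffmpeg through Python's subprocess module; it assumes ffmpeg is installed and on your PATH, and the filenames are hypothetical. The 16 fps and 512-pixel width match the model's supported settings, but adjust them to your control video:

```python
import subprocess

# Convert an mp4 control video into an animated WebP that the Load Image node accepts.
# fps=16 matches Wan2.1-Fun's native frame rate; scale=512:-1 resizes to 512 px wide
# while keeping the aspect ratio.
subprocess.run(
    [
        "ffmpeg",
        "-i", "control_video.mp4",     # input mp4 (hypothetical filename)
        "-vf", "fps=16,scale=512:-1",  # resample to 16 fps and resize
        "-c:v", "libwebp",             # animated WebP encoder
        "-q:v", "80",                  # quality, 0-100
        "-loop", "0",                  # loop the animation forever
        "control_video.webp",          # output animated WebP
    ],
    check=True,
)
```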
Because many users encounter installation failures and environment issues when installing custom nodes, this version of the workflow uses only native nodes to ensure a smoother experience.
Thanks to the ComfyUI community authors who provide feature-rich custom nodes. If you want to jump directly to that version, see Workflow Using Custom Nodes.
1. Workflow File Download
1.1 Workflow File
Download the image below and drag it into ComfyUI to load the workflow:
1.2 Input Images and Videos Download
Please download the following image and video for input:
2. Complete the Workflow Step by Step
- Ensure the `Load Diffusion Model` node has loaded `wan2.1_fun_control_1.3B_bf16.safetensors`
- Ensure the `Load CLIP` node has loaded `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
- Ensure the `Load VAE` node has loaded `wan_2.1_vae.safetensors`
- Ensure the `Load CLIP Vision` node has loaded `clip_vision_h.safetensors`
- Upload the starting frame to the `Load Image` node (renamed to `Start_image`)
- Upload the control video to the second `Load Image` node. Note: this node currently doesn’t support mp4, only WebP videos
- (Optional) Modify the prompt (both English and Chinese are supported)
- (Optional) Adjust the video size in `WanFunControlToVideo`, avoiding overly large dimensions
- Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter`, to execute video generation (for a scripted alternative, see the sketch after this list)
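The Run button queues the workflow on ComfyUI's built-in HTTP API, which you can also call yourself. A minimal sketch, assuming ComfyUI is running on the default 127.0.0.1:8188 and that you have exported the workflow in API format (the filename is hypothetical):

```python
import json
import urllib.request

# Load a workflow that was exported from ComfyUI in API format.
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Queue the workflow on a locally running ComfyUI instance.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # response includes the prompt_id of the queued job
```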
3. Usage Notes
- Since we need to input the same number of frames as the control video into the `WanFunControlToVideo` node, if the specified frame count exceeds the actual control video frames, the excess frames may display scenes that don’t conform to the control conditions. We address this issue in the Workflow Using Custom Nodes; a sketch for counting control-video frames follows this list
- Avoid setting overly large dimensions, as this can make the sampling process very time-consuming. Try generating smaller videos first, then upscale
- Use your imagination to build upon this workflow by adding text-to-image or other workflows to achieve direct text-to-video generation or style transfer
- Use tools like ComfyUI-comfyui_controlnet_aux for richer control options
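For the first note above: you can count the frames of your WebP control video before setting the frame count in `WanFunControlToVideo`. A minimal sketch using Pillow (the filename is hypothetical):

```python
from PIL import Image

# Open the animated WebP control video and count its frames, so the frame
# count requested in WanFunControlToVideo never exceeds what the control
# video actually contains.
with Image.open("control_video.webp") as im:
    n_frames = getattr(im, "n_frames", 1)  # 1 if the image is not animated

print(f"Control video has {n_frames} frames")
```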
Workflow Using Custom Nodes
We’ll need to install the following two custom node packages:
- ComfyUI-VideoHelperSuite
- ComfyUI-comfyui_controlnet_aux
You can use ComfyUI Manager to install missing nodes or follow the installation instructions for each custom node package.
1. Workflow File Download
1.1 Workflow File
Download the image below and drag it into ComfyUI to load the workflow:
Due to the large size of video files, you can also click here to download the workflow file in JSON format.
1.2 Input Images and Videos Download
Please download the following image and video for input:
2. Complete the Workflow Step by Step
The model setup is essentially the same. If you’ve already tried the native-only workflow, you can upload the corresponding images and run it directly.
- Ensure the `Load Diffusion Model` node has loaded `wan2.1_fun_control_1.3B_bf16.safetensors`
- Ensure the `Load CLIP` node has loaded `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
- Ensure the `Load VAE` node has loaded `wan_2.1_vae.safetensors`
- Ensure the `Load CLIP Vision` node has loaded `clip_vision_h.safetensors`
- Upload the starting frame to the `Load Image` node
- Upload an mp4 format video to the `Load Video(Upload)` custom node. Note that the workflow has adjusted the default `frame_load_cap`
- For the current input image, the `DWPose Estimator` uses only the `detect_face` option
- (Optional) Modify the prompt (both English and Chinese are supported)
- (Optional) Adjust the video size in `WanFunControlToVideo`, avoiding overly large dimensions
- Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter`, to execute video generation
3. Workflow Notes
Thanks to the ComfyUI community authors for their custom node packages:
- This example uses `Load Video(Upload)` to support mp4 videos
- The `video_info` obtained from `Load Video(Upload)` allows us to maintain the same `fps` for the output video (see the sketch after this list)
- You can replace `DWPose Estimator` with other preprocessors from the `ComfyUI-comfyui_controlnet_aux` node package
- Prompts support multiple languages
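If you ever need the source fps outside of ComfyUI, for example to set `frame_rate` on `Video Combine` by hand instead of wiring up `video_info`, here is a minimal sketch with OpenCV (the filename is hypothetical):

```python
import cv2

# Read the source video's frame rate and frame count so the output video
# can match the input timing.
cap = cv2.VideoCapture("control_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

print(f"fps={fps}, frames={frame_count}")
```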
Usage Tips
- You can combine multiple image preprocessing techniques and then use the `Image Blend` node to apply several control methods simultaneously (see the sketch after this list)
- You can use the `Video Combine` node from `ComfyUI-VideoHelperSuite` to save videos in mp4 format
- We use the `SaveAnimatedWEBP` node because embedding the workflow into mp4 isn’t currently supported, and some custom nodes may not support workflow embedding either; saving as animated WebP preserves the workflow in the output video
- In the `WanFunControlToVideo` node, `control_video` is not mandatory. You can skip the control video at first, generate at a very small size such as 320x320, and then use that output as the control video input to achieve consistent results
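The first tip above amounts to a per-pixel blend of two control maps. A minimal offline sketch with Pillow that mirrors what the `Image Blend` node does inside ComfyUI, assuming you have two exported preprocessor frames (the filenames are hypothetical):

```python
from PIL import Image

# Blend two control maps (e.g., a Canny edge map and a depth map) into a
# single control image, similar to the Image Blend node.
canny = Image.open("canny_frame.png").convert("RGB")
depth = Image.open("depth_frame.png").convert("RGB").resize(canny.size)

blended = Image.blend(canny, depth, alpha=0.5)  # alpha weights the second image
blended.save("blended_control.png")
```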