About Wan2.1-Fun-InP

Wan2.1-Fun-InP is an open-source video generation model released by Alibaba as part of the Wan2.1-Fun series. It generates videos from images with first- and last-frame control.

Key features:

  • First and last frame control: Accepts both a first-frame and a last-frame image and generates the transitional video between them, improving coherence and creative freedom. Compared to earlier community versions, Alibaba’s official model produces more stable, significantly higher-quality results.
  • Multi-resolution support: Generates videos at 512×512, 768×768, 1024×1024, and other resolutions to fit different use cases.

Model versions:

  • 1.3B Lightweight: Suitable for local deployment and fast inference, with lower VRAM requirements
  • 14B High-performance: The model weighs in at 32GB+, offering better results but requiring more VRAM

The relevant model weights and code repositories are linked in the manual model installation section below.

ComfyUI now natively supports the Wan2.1 Fun InP model. Before starting this tutorial, please update your ComfyUI to a version that includes this commit.

Wan2.1 Fun InP Workflow

1. Workflow File Download

Download the image below and drag it into ComfyUI to load the workflow:

2. Manual Model Installation

If automatic model downloading fails, download the models manually and save them to the corresponding folders.

The following models can be found in the Wan_2.1_ComfyUI_repackaged and Wan2.1-Fun repositories.

Diffusion models - choose 1.3B or 14B. The 14B version has a larger file size (32GB) and higher VRAM requirements:

Text encoders - choose one of the following models (the fp16 version is larger and requires more powerful hardware):

VAE

CLIP Vision

File storage location:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── Wan2.1-Fun-1.3B-InP.safetensors
│   ├── 📂 text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   ├── 📂 vae/
│   │   └── wan_2.1_vae.safetensors
│   └── 📂 clip_vision/
│       └── clip_vision_h.safetensors
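
If you prefer scripting the downloads, the sketch below uses huggingface_hub to fetch the shared models and copy them into the folders shown above. The repo_id values and in-repo file paths are assumptions based on the repositories named earlier; verify them on the repository pages before running, and add the Fun InP diffusion model entry from the Wan2.1-Fun repository the same way.

```python
# Minimal sketch: fetch the shared models and copy them into the ComfyUI
# folder layout shown above. The repo_id values and in-repo paths are
# assumptions; confirm them on the repository pages before running.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your installation path

MODELS = [
    # (repo_id, file path inside the repo, target ComfyUI subfolder)
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
     "models/text_encoders"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/vae/wan_2.1_vae.safetensors",
     "models/vae"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/clip_vision/clip_vision_h.safetensors",
     "models/clip_vision"),
    # Add the Wan2.1-Fun-1.3B-InP diffusion model here once you have
    # confirmed its repo_id and filename in the Wan2.1-Fun repository.
]

for repo_id, filename, subfolder in MODELS:
    cached = hf_hub_download(repo_id=repo_id, filename=filename)
    dest = COMFYUI_ROOT / subfolder / Path(filename).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)
    print(f"Saved {dest}")
```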

3. Complete the Workflow Step by Step

  1. Ensure the Load Diffusion Model node has loaded Wan2.1-Fun-1.3B-InP.safetensors
  2. Ensure the Load CLIP node has loaded umt5_xxl_fp8_e4m3fn_scaled.safetensors
  3. Ensure the Load VAE node has loaded wan_2.1_vae.safetensors
  4. Ensure the Load CLIP Vision node has loaded clip_vision_h.safetensors
  5. Upload the starting frame to the Load Image node (the one renamed Start_image)
  6. Upload the ending frame to the second Load Image node
  7. (Optional) Modify the prompt (both English and Chinese are supported)
  8. (Optional) Adjust the video size in the WanFunInpaintToVideo node, avoiding overly large dimensions
  9. Click the Run button, or use the shortcut Ctrl (Cmd) + Enter, to run the video generation; a scripted alternative is sketched below
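
If you would rather queue the run from a script than from the UI, the sketch below submits the workflow through ComfyUI's HTTP API. It assumes a default local instance at 127.0.0.1:8188 and a workflow exported via "Save (API Format)" to a file named wan_fun_inp_api.json; both the address and the filename are assumptions about your setup.

```python
# Minimal sketch: queue the Wan Fun InP workflow via ComfyUI's HTTP API.
# Assumes a local ComfyUI at 127.0.0.1:8188 and a workflow exported with
# "Save (API Format)" as wan_fun_inp_api.json (hypothetical filename).
import json
import urllib.request

with open("wan_fun_inp_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # the response includes the queued prompt_id
```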

4. Workflow Notes

Please make sure to use the correct model: Wan2.1-Fun-1.3B-InP.safetensors and Wan2.1-Fun-1.3B-Control.safetensors are stored in the same folder and have very similar names. A quick way to verify which files are installed is sketched below.
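
As a quick check, this small sketch (assuming the default ComfyUI/models layout from the installation section) lists the installed diffusion models and flags the InP file:

```python
# Quick sanity check: list the diffusion models and flag the InP variant,
# assuming the default ComfyUI/models folder layout shown earlier.
from pathlib import Path

model_dir = Path("ComfyUI/models/diffusion_models")
for p in sorted(model_dir.glob("*.safetensors")):
    marker = "  <-- InP model used by this workflow" if "InP" in p.name else ""
    print(p.name + marker)
```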

  • When using Wan Fun InP, you may need to iterate on the prompt several times to get accurate scene transitions.