Wan FLF2V (First-Last Frame Video Generation) is an open-source video generation model developed by Alibaba's Tongyi Wanxiang team and released under the Apache 2.0 license. Given just two images as the starting and ending frames, the model automatically generates the intermediate transition frames and outputs a logically coherent, naturally flowing 720p high-definition video.

Core Technical Highlights

  1. Precise First-Last Frame Control: The generated first and last frames match the input images at a reported rate of 98%; the starting and ending scenes define the video's boundaries, and the model intelligently fills in the intermediate motion, enabling scene transitions and object-morphing effects (a scripted usage sketch follows this list).
  2. Stable and Smooth Video Generation: By injecting CLIP semantic features through a cross-attention mechanism, the model reduces video jitter by a reported 37% compared to similar models, ensuring natural, smooth transitions.
  3. Multi-functional Creative Capabilities: Supports dynamic embedding of Chinese and English subtitles, generation of anime/realistic/fantasy and other styles, adapting to different creative needs.
  4. 720p HD Output: Directly generates 1280×720 resolution videos without post-processing, suitable for social media and commercial applications.
  5. Open-source Ecosystem Support: Model weights, code, and training framework are fully open-sourced, supporting deployment on mainstream AI platforms.
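For a sense of how the two-frame interface looks in code, here is a minimal scripted sketch using the Hugging Face diffusers port of the model. The Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers checkpoint id, the last_image argument, and all sample settings are assumptions based on the diffusers integration, not part of this ComfyUI-focused guide.

```python
# Minimal sketch (assumptions noted above): first/last-frame video generation
# with the diffusers port of Wan FLF2V.
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

model_id = "Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers"  # assumed checkpoint id
image_encoder = CLIPVisionModel.from_pretrained(
    model_id, subfolder="image_encoder", torch_dtype=torch.float32
)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
).to("cuda")

first_frame = load_image("start.png")  # hypothetical input file names
last_frame = load_image("end.png")

video = pipe(
    image=first_frame,
    last_image=last_frame,            # the ending frame that bounds the clip
    prompt="smooth cinematic transition between the two scenes",
    height=720, width=1280, num_frames=81,
    guidance_scale=5.5,
).frames[0]
export_to_video(video, "flf2v_output.mp4", fps=16)
```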

Technical Principles and Architecture

  1. DiT Architecture: Built on diffusion models with a Diffusion Transformer (DiT) backbone, combined with a Full Attention mechanism to model spatiotemporal dependencies and keep the video coherent.
  2. 3D Causal Variational Autoencoder: The Wan-VAE compresses high-definition frames to 1/128 of their original size while preserving subtle dynamic details, significantly reducing memory requirements (see the latent-shape sketch after this list).
  3. Three-stage Training Strategy: Training starts with pre-training at 480P and is gradually upgraded to 720P; this phased optimization balances generation quality against computational cost.
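
As a rough illustration of point 2, the sketch below computes the latent tensor shape for a 720p clip. The 4× temporal / 8×8 spatial downsampling factors and 16 latent channels are figures commonly reported for Wan-VAE and are assumptions here; the 1/128 figure above may count compression differently.

```python
# Back-of-envelope latent shape for a 3D causal VAE (assumed Wan-VAE factors:
# 4x temporal, 8x8 spatial downsampling, 16 latent channels).
frames, height, width = 81, 720, 1280          # a typical 720p clip
t_down, s_down, z_channels = 4, 8, 16          # assumed compression factors

latent_t = (frames - 1) // t_down + 1          # causal: frame 0 alone, rest in groups of 4
latent_h, latent_w = height // s_down, width // s_down
print(latent_t, latent_h, latent_w)            # 21 90 160

raw = 3 * frames * height * width              # RGB elements in pixel space
compressed = z_channels * latent_t * latent_h * latent_w
print(f"{raw / compressed:.1f}x fewer elements")  # ~46.3x fewer elements
```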

Wan2.1 FLF2V 720P ComfyUI Native Workflow Example

We currently provide only the fp16 version of the model, which may be difficult to run on GPUs with limited VRAM.

1. Workflow File Download

Please download the WebP file below and drag it into ComfyUI to load the corresponding workflow. Information on the required model files is embedded in the workflow.

Please download the two images below; we will use them as the starting and ending frames of the video.

2. Manual Model Installation

If the corresponding models are not downloaded automatically when you load the workflow, you can download them manually as described below.

All models involved in this guide can be found here.

diffusion_models (choose one version based on your hardware):

- wan2.1_flf2v_720p_14B_fp16.safetensors

If you have previously tried Wan Video related workflows, you may already have the following files.

Text encoders (choose one version to download):

- umt5_xxl_fp8_e4m3fn_scaled.safetensors

VAE:

- wan_2.1_vae.safetensors

CLIP Vision:

- clip_vision_h.safetensors

File Storage Location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── wan2.1_flf2v_720p_14B_fp16.safetensors
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors   # or your chosen version
│   ├── vae/
│   │   └── wan_2.1_vae.safetensors
│   └── clip_vision/
│       └── clip_vision_h.safetensors
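
If you prefer to script the downloads instead of clicking links, here is a minimal sketch using huggingface_hub. The Comfy-Org/Wan_2.1_ComfyUI_repackaged repo id and the split_files/... paths are assumptions; substitute the actual links from this guide if they differ.

```python
# Minimal sketch: fetch the four model files into the ComfyUI folders above.
# Repo id and in-repo paths are assumptions; adjust them to the guide's links.
from pathlib import Path
from shutil import copy2
from huggingface_hub import hf_hub_download

REPO = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"  # assumed repackaged-model repo
FILES = {
    "split_files/diffusion_models/wan2.1_flf2v_720p_14B_fp16.safetensors": "ComfyUI/models/diffusion_models",
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "ComfyUI/models/text_encoders",
    "split_files/vae/wan_2.1_vae.safetensors": "ComfyUI/models/vae",
    "split_files/clip_vision/clip_vision_h.safetensors": "ComfyUI/models/clip_vision",
}

for repo_path, dest_dir in FILES.items():
    cached = hf_hub_download(REPO, repo_path)       # download into the HF cache
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    copy2(cached, dest_dir)                         # copy to where ComfyUI looks
```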

3. Complete Workflow Execution Step by Step

  1. Ensure the Load Diffusion Model node has loaded wan2.1_flf2v_720p_14B_fp16.safetensors
  2. Ensure the Load CLIP node has loaded umt5_xxl_fp8_e4m3fn_scaled.safetensors
  3. Ensure the Load VAE node has loaded wan_2.1_vae.safetensors
  4. Ensure the Load CLIP Vision node has loaded clip_vision_h.safetensors
  5. Upload the starting frame to the Start_image node
  6. Upload the ending frame to the End_image node
  7. (Optional) Modify the positive and negative prompts, both Chinese and English are supported
  8. (Optional) Modify the video size in WanFirstLastFrameToVideo; use a small size for the first test run, then switch to the larger target size once everything works, to cut down on waiting time
  9. Click the Run button, or use the shortcut Ctrl(Cmd) + Enter, to execute video generation (a scripted alternative is sketched below)
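
For batch runs, the same workflow can be queued through ComfyUI's HTTP API instead of clicking Run. The sketch below assumes a local server on the default port 8188 and a workflow exported via Save (API Format) to the hypothetical file flf2v_workflow_api.json.

```python
# Minimal sketch: queue a saved API-format workflow on a local ComfyUI server.
import json
import urllib.request

with open("flf2v_workflow_api.json") as f:   # hypothetical exported workflow
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",          # ComfyUI's default endpoint/port
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))           # response includes the prompt_id
```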