- Precise First-Last Frame Control: The first and last frames are matched at a rate of 98%; the model uses the starting and ending scenes to define the video's boundaries and intelligently fills in the intermediate motion to achieve scene transitions and object morphing effects.
- Stable and Smooth Video Generation: Using CLIP semantic features and a cross-attention mechanism, video jitter is reduced by 37% compared to similar models, ensuring natural and smooth transitions.
- Multi-functional Creative Capabilities: Supports dynamic embedding of Chinese and English subtitles and generation in anime, realistic, fantasy, and other styles, adapting to different creative needs.
- 720p HD Output: Directly generates 1280×720 resolution videos without post-processing, suitable for social media and commercial applications.
- Open-source Ecosystem Support: Model weights, code, and training framework are fully open-sourced, supporting deployment on mainstream AI platforms.
- DiT Architecture: Based on a diffusion model with a Diffusion Transformer (DiT) architecture, combined with a Full Attention mechanism to optimize spatiotemporal dependency modeling and ensure video coherence.
- 3D Causal Variational Autoencoder: Wan-VAE compresses HD frames to 1/128 of their size while retaining subtle dynamic details, significantly reducing memory requirements (see the shape sketch after this list).
- Three-stage Training Strategy: Pre-training starts at 480P resolution and is gradually upgraded to 720P, balancing generation quality and computational efficiency through phased optimization.
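To make the compression concrete, here is a rough shape calculation for a 720p clip. The 8× spatial / 4× temporal factors, the 16 latent channels, and the causal first-frame handling are illustrative assumptions, not confirmed Wan-VAE specifics:

```python
# Illustrative shape arithmetic for a 3D causal VAE compressing a 720p clip.
# The 8x spatial / 4x temporal factors, the 16 latent channels, and the causal
# "(T - 1) // 4 + 1" frame handling are assumptions for illustration only.
frames, height, width = 81, 720, 1280        # pixel-space clip: (3, 81, 720, 1280)
spatial, temporal, latent_channels = 8, 4, 16

latent_shape = (
    latent_channels,
    (frames - 1) // temporal + 1,             # 81 frames -> 21 latent frames
    height // spatial,                        # 720  -> 90
    width // spatial,                         # 1280 -> 160
)
print(latent_shape)                           # (16, 21, 90, 160)
```

The diffusion transformer then denoises tensors of roughly this latent size instead of full-resolution pixels, which is where the memory savings come from.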
- GitHub Repository: GitHub
- Hugging Face Model Page: Hugging Face
- ModelScope Community: ModelScope
Make sure your ComfyUI is updated (for manual git installs, see the update sketch after the list below). The workflows in this guide can be found in the Workflow Templates.
If you can't find them in the templates, your ComfyUI may be outdated (Desktop version updates may lag behind). If nodes are missing when loading a workflow, possible reasons include:
- You are not using the latest ComfyUI version (Nightly version)
- You are using the Stable or Desktop version (the latest changes may not be included yet)
- Some nodes failed to import at startup
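For a manual (git-cloned) install, updating to the latest code is a `git pull` plus reinstalling the requirements. A minimal sketch driven from Python, assuming ComfyUI lives at `~/ComfyUI` (the path is an assumption; Desktop and portable builds have their own update mechanisms):

```python
# Minimal sketch: update a manual (git-cloned) ComfyUI install to the latest code.
# The ~/ComfyUI path is an assumption; Desktop/portable builds update differently.
import subprocess
import sys
from pathlib import Path

comfy_dir = Path.home() / "ComfyUI"
subprocess.run(["git", "pull"], cwd=comfy_dir, check=True)
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
               cwd=comfy_dir, check=True)
```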
Wan2.1 FLF2V 720P ComfyUI Native Workflow Example
1. Download Workflow Files and Related Input Files
Since this model is trained on high-resolution images, using smaller sizes may not yield good results. The example uses a size of 720 × 1280, which can be hard to run smoothly on lower-VRAM GPUs and will take a long time to generate.
If needed, adjust the video generation size for testing, but note that a small generation size may not produce good output with this model; see the helper sketch below.
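If you do lower the resolution for a test run, keeping the default aspect ratio and snapping each side to a multiple of 16 avoids shape mismatches; the multiple-of-16 constraint is an assumption based on typical 8× VAE downscaling plus 2× patching, not a documented requirement. A small helper sketch:

```python
# Hedged helper: scale the default 720x1280 generation size down for low-VRAM test runs,
# keeping the aspect ratio and snapping each side to a multiple of 16 (the multiple-of-16
# constraint is an assumption based on typical 8x VAE downscaling plus 2x patching).
def scaled_size(base_w=720, base_h=1280, scale=0.75, multiple=16):
    w = max(multiple, round(base_w * scale / multiple) * multiple)
    h = max(multiple, round(base_h * scale / multiple) * multiple)
    return w, h

print(scaled_size(scale=0.75))   # (544, 960)
```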



2. Manual Model Installation
All models involved in this guide can be found here. For the diffusion_models folder, choose one version based on your hardware. If you have previously tried Wan Video related workflows, you may already have the following files; a download sketch with the expected target folders is shown below.
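The model filenames come from the workflow steps in the next section. As a sketch only: the standard ComfyUI model subfolders and the Comfy-Org repackaged Hugging Face repository paths below are assumptions, so verify them on the model pages before downloading:

```python
# Hedged sketch: fetch the model files named in this workflow with huggingface_hub.
# The repo id, the in-repo paths, and the target ComfyUI folders are assumptions
# (recent ComfyUI layouts use models/text_encoders; older ones use models/clip).
from huggingface_hub import hf_hub_download

files = {
    # target ComfyUI folder -> (repo id, path inside the repo)
    "models/diffusion_models": ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
                                "split_files/diffusion_models/wan2.1_flf2v_720p_14B_fp8_e4m3fn.safetensors"),
    "models/text_encoders":    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
                                "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors"),
    "models/vae":              ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
                                "split_files/vae/wan_2.1_vae.safetensors"),
    "models/clip_vision":      ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
                                "split_files/clip_vision/clip_vision_h.safetensors"),
}

for target_dir, (repo_id, filename) in files.items():
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(f"downloaded {filename} -> {local_path}  (copy or symlink into ComfyUI/{target_dir}/)")
```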
3. Complete Workflow Execution Step by Step

- Ensure the `Load Diffusion Model` node has loaded `wan2.1_flf2v_720p_14B_fp16.safetensors` or `wan2.1_flf2v_720p_14B_fp8_e4m3fn.safetensors`
- Ensure the `Load CLIP` node has loaded `umt5_xxl_fp8_e4m3fn_scaled.safetensors`
- Ensure the `Load VAE` node has loaded `wan_2.1_vae.safetensors`
- Ensure the `Load CLIP Vision` node has loaded `clip_vision_h.safetensors`
- Upload the starting frame to the `Start_image` node
- Upload the ending frame to the `End_image` node
- (Optional) Modify the positive and negative prompts; both Chinese and English are supported
- (Important) The `WanFirstLastFrameToVideo` node uses 720×1280 as the default size. Because this is a 720P model, a small size will not yield good output; keep the size around 720×1280 for good results.
- Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter`, to execute the video generation
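If you prefer to script the last step instead of clicking `Run`, a workflow exported in ComfyUI's API format can be queued over the built-in HTTP API. A minimal sketch, assuming a local server at the default `127.0.0.1:8188` and an exported file named `flf2v_workflow_api.json` (both assumptions):

```python
# Minimal sketch: queue an API-format workflow JSON against a running ComfyUI server.
# The server address and the exported filename are assumptions; adjust as needed.
import json
import urllib.request

with open("flf2v_workflow_api.json", "r", encoding="utf-8") as f:
    prompt = json.load(f)

payload = json.dumps({"prompt": prompt}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))   # response includes a prompt_id for the queued job
```

The returned `prompt_id` can then be used with the server's `/history` endpoint to check when generation has finished.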