About VACE

VACE 14B is an open-source unified video editing model released by the Alibaba Tongyi Wanxiang team. By integrating multiple tasks into a single model, supporting high-resolution processing, and accepting flexible multi-modal inputs, it significantly improves the efficiency and quality of video creation.

The model is open-sourced under the Apache-2.0 license and can be used for both personal and commercial purposes.

Here is a comprehensive analysis of its core features and technical highlights:

  • Multi-modal input: Supports multiple input forms including text, images, video, masks, and control signals
  • Unified architecture: Single model supports multiple tasks with freely combinable functions
  • Motion transfer: Generates coherent actions based on reference videos
  • Local replacement: Replaces specific areas in videos through masks
  • Video extension: Completes actions or extends backgrounds
  • Background replacement: Preserves subjects while changing environmental backgrounds

VACE is currently available in two versions: 1.3B and 14B. Compared to the 1.3B version, the 14B version supports 720P output with better image detail and stability.

| Model | 480P | 720P |
| --- | --- | --- |
| VACE-1.3B | ✅ | ❌ |
| VACE-14B | ✅ | ✅ |

Related model weights and code repositories:

  • Code: https://github.com/ali-vilab/VACE
  • Model weights: https://huggingface.co/Wan-AI/Wan2.1-VACE-14B and https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B

Model Download and Loading in Workflows

All workflows in this document use the same workflow template, so we first cover model download and loading, and then switch between workflows by bypassing different nodes to enable or disable inputs. The model download information is also embedded in each example workflow, so you can complete the download when you download a specific example.

Model Download

diffusion_models (choose one):

  • wan2.1_vace_14B_fp16.safetensors
  • wan2.1_vace_1.3B_fp16.safetensors

If you have used Wan Video workflows before, you may already have the following model files.

VAE

  • wan_2.1_vae.safetensors

Text encoders (choose one version to download):

  • umt5_xxl_fp8_e4m3fn_scaled.safetensors
  • umt5_xxl_fp16.safetensors

File save location

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── wan2.1_vace_14B_fp16.safetensors
│   ├── 📂 text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors # or umt5_xxl_fp16.safetensors
│   └── 📂 vae/
│       └── wan_2.1_vae.safetensors
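If you prefer scripting the downloads, here is a minimal sketch using huggingface_hub. The repo id Comfy-Org/Wan_2.1_ComfyUI_repackaged and the split_files/ paths are assumptions based on how Comfy-Org typically hosts repackaged Wan 2.1 files; verify them against the download links embedded in the example workflows.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Assumed hosting location; confirm against the links in the workflows.
REPO = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"

FILES = [
    "split_files/diffusion_models/wan2.1_vace_14B_fp16.safetensors",
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "split_files/vae/wan_2.1_vae.safetensors",
]

for filename in FILES:
    # Note: local_dir preserves the repo's split_files/... subfolders,
    # so move or symlink each file into the flat ComfyUI/models/ layout
    # shown above after downloading.
    hf_hub_download(repo_id=REPO, filename=filename, local_dir="downloads")
```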

Model Loading

The workflows in this document all use the same models and the same underlying workflow; only the bypassed nodes differ to enable or disable inputs. Please refer to the following image to make sure the corresponding models are loaded correctly in each workflow.

  1. Make sure the Load Diffusion Model node has loaded wan2.1_vace_14B_fp16.safetensors
  2. Make sure the Load CLIP node has loaded umt5_xxl_fp8_e4m3fn_scaled.safetensors or umt5_xxl_fp16.safetensors
  3. Make sure the Load VAE node has loaded wan_2.1_vae.safetensors
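Before running anything, you can sanity-check that the files are in place with a short script (the ComfyUI/models root is an assumption; adjust it to your installation):

```python
from pathlib import Path

# Default ComfyUI model layout from the tree above; adjust ROOT as needed.
ROOT = Path("ComfyUI/models")

for rel in [
    "diffusion_models/wan2.1_vace_14B_fp16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",  # or the fp16 file
    "vae/wan_2.1_vae.safetensors",
]:
    path = ROOT / rel
    print("OK     " if path.exists() else "MISSING", path)
```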

How to toggle Node Bypass Status

When a node is set to Bypass, data passing through it is not processed; it is passed straight through to the output. We often bypass nodes that we temporarily don't need. Here are three ways to toggle a node's Bypass status:

  1. After selecting the node, click the arrow in the indicator area of the selection toolbox to quickly toggle its Bypass status
  2. After selecting the node, right-click it and choose Mode -> Bypass to bypass it, or Mode -> Always to re-enable it
  3. After selecting the node, right-click it and choose the Bypass option to toggle its Bypass status
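If you edit workflow files outside the UI, a node's Bypass status is stored in the mode field of the saved workflow JSON; in current ComfyUI versions 0 means Always, 2 means Never (muted), and 4 means Bypass. The sketch below toggles Bypass on a node by its visible title; the file name and title are placeholders.

```python
import json

ALWAYS, BYPASS = 0, 4  # litegraph mode values used by ComfyUI

def toggle_bypass(workflow_path: str, node_title: str) -> None:
    """Flip a node between Always and Bypass in a saved workflow file."""
    with open(workflow_path, "r", encoding="utf-8") as f:
        wf = json.load(f)
    for node in wf["nodes"]:
        # Fall back to the node type when no custom title is set.
        title = node.get("title", node.get("type", ""))
        if title == node_title:
            node["mode"] = ALWAYS if node.get("mode") == BYPASS else BYPASS
    with open(workflow_path, "w", encoding="utf-8") as f:
        json.dump(wf, f, indent=2)

toggle_bypass("vace_workflow.json", "Load Image")  # placeholder names
```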

VACE Text-to-Video Workflow

1. Workflow Download

Download the video below and drag it into ComfyUI to load the corresponding workflow

2. Complete the Workflow Step by Step

Please follow the numbered steps in the image to ensure smooth workflow execution

  1. Enter positive prompts in the CLIP Text Encode (Positive Prompt) node
  2. Enter negative prompts in the CLIP Text Encode (Negative Prompt) node
  3. Set the image dimensions (640x640 resolution recommended for first run) and frame count (video duration) in WanVaceToVideo
  4. Click the Run button or use the shortcut Ctrl(cmd) + Enter to execute video generation
  5. Once generated, the video will be saved automatically to the ComfyUI/output/video directory (the exact subfolder depends on the Save Video node's settings)

During testing with a 4090 GPU:

  • 720x1280 resolution, generating 81 frames takes about 40 minutes
  • 640x640 resolution, generating 49 frames takes about 7 minutes

However, 720P video quality is better.
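If you want to queue the same generation headlessly, below is a hedged sketch using ComfyUI's HTTP API. It assumes the workflow was exported in API format (vace_t2v_api.json is a placeholder name), that ComfyUI is listening on 127.0.0.1:8188, and that the WanVaceToVideo node exposes width, height, and length inputs as described in step 3.

```python
import json
import urllib.request

# Load the workflow exported in API format.
with open("vace_t2v_api.json", "r", encoding="utf-8") as f:
    prompt = json.load(f)

# Set resolution and frame count on the WanVaceToVideo node (step 3).
for node in prompt.values():
    if node.get("class_type") == "WanVaceToVideo":
        node["inputs"].update({"width": 640, "height": 640, "length": 49})

# Queue the prompt on a locally running ComfyUI instance.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # prompt id on success
```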

VACE Image-to-Video Workflow

You can continue using the workflow above: simply un-bypass the Load Image node in the Load reference image group and load your image. Alternatively, use the workflow file provided below, in which the corresponding parameters are already set up.

1. Workflow Download

Download the video below and drag it into ComfyUI to load the corresponding workflow

Please download the image below as input

2. Complete the Workflow Step by Step

Please follow the numbered steps in the image to ensure smooth workflow execution

  1. Input the corresponding image in the Load image node
  2. You can modify and edit prompts like in the text-to-video workflow
  3. Set the image dimensions (640x640 resolution recommended for first run) and frame count (video duration) in WanVaceToVideo
  4. Click the Run button or use the shortcut Ctrl(cmd) + Enter to execute video generation
  5. Once generated, the video will be saved automatically to the ComfyUI/output/video directory (the exact subfolder depends on the Save Video node's settings)

You may want to use a node that reads the input image's dimensions to set the resolution automatically, but because the width and height inputs of the relevant nodes step in fixed increments, you may get an error if your image dimensions are not divisible by 16.
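If you compute the resolution from the input image yourself, one simple workaround is to snap both dimensions down to the nearest multiple of 16 before passing them on, as in this small sketch:

```python
def snap_to_16(value: int) -> int:
    """Round a dimension down to the nearest multiple of 16 (minimum 16)."""
    return max(16, (value // 16) * 16)

print(snap_to_16(1000))  # 992
print(snap_to_16(640))   # 640
```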

3. Additional Workflow Notes

VACE also supports providing multiple reference images in a single input image to generate the corresponding video. You can find related examples on the VACE project page.

VACE Video-to-Video Workflow

1. Workflow Download

Download the video below and drag it into ComfyUI to load the corresponding workflow

We will use the following materials as input:

  1. The reference image used as input

  2. The preprocessed video below, which will be used to control the video generation

  3. The original video below. You can download these materials and use preprocessing nodes like comfyui_controlnet_aux to preprocess them yourself

2. Complete the Workflow Step by Step

Please follow the numbered steps in the image to ensure smooth workflow execution

  1. Input the reference image in the Load Image node under Load reference image
  2. Input the control video in the Load Video node under Load control video. Since the provided video is preprocessed, no additional processing is needed
  3. If you need to preprocess the original video yourself, you can modify the Image preprocessing group or use comfyui_controlnet_aux nodes to complete the preprocessing
  4. Modify prompts
  5. Set the image dimensions (640x640 resolution recommended for first run) and frame count (video duration) in WanVaceToVideo
  6. Click the Run button or use the shortcut Ctrl(cmd) + Enter to execute video generation
  7. Once generated, the video will be saved automatically to the ComfyUI/output/video directory (the exact subfolder depends on the Save Video node's settings)

VACE Video Outpainting Workflow

[To be updated]

Please refer to the documentation below to learn about related nodes

  • WanVaceToVideo Node Documentation
  • TrimVideoLatent Node Documentation