Wan Vace To Video - ComfyUI Built-in Node Documentation
Create videos using Alibaba Tongyi Wanxiang’s high-resolution video generation model
The Wan Vace To Video node generates videos from text prompts and accepts several kinds of input: text, images, videos, masks, and control signals.
This node combines the input conditions (prompts), control videos, and masks to generate high-quality videos. It first preprocesses and encodes the inputs, then applies the conditioning information to produce the final video latent representation. When a reference image is provided, it serves as the initial reference for the video. Control videos and masks guide the generation process: the control video supplies the guiding signal, and the masks define which areas it applies to.
Parameter Description
Required Parameters
Parameter | Type | Default | Range | Description |
---|---|---|---|---|
positive | CONDITIONING | - | - | Positive prompt condition |
negative | CONDITIONING | - | - | Negative prompt condition |
vae | VAE | - | - | VAE model for encoding/decoding |
width | INT | 832 | 16-MAX_RESOLUTION | Video width, step size 16 |
height | INT | 480 | 16-MAX_RESOLUTION | Video height, step size 16 |
length | INT | 81 | 1-MAX_RESOLUTION | Number of video frames, step size 4 |
batch_size | INT | 1 | 1-4096 | Batch size |
strength | FLOAT | 1.0 | 0.0-1000.0 | Condition strength, step size 0.01 |
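The ranges and step sizes above can be checked before values are sent to the node. The following is a minimal sketch, not the node's own code: the helper names are hypothetical, `MAX_RESOLUTION = 16384` is an assumed stand-in for ComfyUI's constant, the step grid is assumed to start at each parameter's minimum, and the latent shape assumes 16 channels with 8x spatial and 4x temporal compression.

```python
MAX_RESOLUTION = 16384  # assumption: stand-in for ComfyUI's MAX_RESOLUTION constant


def snap(value, minimum, maximum, step):
    """Clamp value to [minimum, maximum], then round down onto the step grid."""
    value = max(minimum, min(maximum, value))
    return minimum + ((value - minimum) // step) * step


def normalize_size(width=832, height=480, length=81):
    """Return (width, height, length) values that sit on the documented step grids."""
    return (
        snap(width, 16, MAX_RESOLUTION, 16),
        snap(height, 16, MAX_RESOLUTION, 16),
        snap(length, 1, MAX_RESOLUTION, 4),
    )


def latent_shape(width, height, length, batch_size=1):
    """Estimate the latent shape without a reference image.

    Assumption: 16 latent channels, 8x spatial and 4x temporal compression.
    """
    latent_frames = (length - 1) // 4 + 1
    return (batch_size, 16, latent_frames, height // 8, width // 8)


size = normalize_size(850, 490, 83)   # off-grid request
shape = latent_shape(832, 480, 81)    # documented defaults
```

Under these assumptions, valid `length` values follow the pattern 1, 5, 9, and so on, which is why the default is 81 rather than 80.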
Optional Parameters
Parameter | Type | Description |
---|---|---|
control_video | IMAGE | Control video for guiding the generation process |
control_masks | MASK | Control masks defining which areas should be controlled |
reference_image | IMAGE | Reference image as starting point or reference (single image) |
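The inputs above could be wired up in an API-format workflow roughly as follows. This is a sketch under assumptions: the `class_type` string `WanVaceToVideo`, the `[node_id, output_index]` link format, and the node IDs are not stated in this document and should be checked against an exported workflow. The helper `wan_vace_node` is hypothetical.

```python
def wan_vace_node(positive, negative, vae, width=832, height=480, length=81,
                  batch_size=1, strength=1.0, control_video=None,
                  control_masks=None, reference_image=None):
    """Build one workflow entry; links are assumed to be [node_id, output_index]."""
    inputs = {
        "positive": positive,
        "negative": negative,
        "vae": vae,
        "width": width,
        "height": height,
        "length": length,
        "batch_size": batch_size,
        "strength": strength,
    }
    optional = {
        "control_video": control_video,
        "control_masks": control_masks,
        "reference_image": reference_image,
    }
    # Leave unconnected optional inputs out entirely instead of passing None.
    inputs.update({name: link for name, link in optional.items() if link is not None})
    return {"class_type": "WanVaceToVideo", "inputs": inputs}


# Hypothetical node IDs; only a reference image is connected here.
node = wan_vace_node(
    positive=["6", 0],
    negative=["7", 0],
    vae=["39", 0],
    reference_image=["52", 0],
)
```

Omitting an optional key mirrors leaving the socket unconnected in the graph editor.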
Output Parameters
Parameter | Type | Description |
---|---|---|
positive | CONDITIONING | Processed positive prompt condition |
negative | CONDITIONING | Processed negative prompt condition |
latent | LATENT | Generated video latent representation |
trim_latent | INT | Number of latent frames to trim. Defaults to 0. When a reference image is provided, it is set to the size of the encoded reference image along the frame dimension in latent space. Downstream nodes use this value to remove the reference image’s content from the generated latent representation, so that the reference guides generation without appearing as extra frames in the final video output. |
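How a downstream step might apply `trim_latent` can be illustrated with a toy example. This sketch assumes the reference image's frames sit at the start of the frame axis. It uses a plain list of labels in place of a real latent tensor, and the function name is hypothetical.

```python
def trim_reference_frames(latent_frames, trim_latent):
    """Drop the first `trim_latent` entries along the frame axis."""
    if trim_latent < 0 or trim_latent > len(latent_frames):
        raise ValueError("trim_latent must be between 0 and the number of latent frames")
    return latent_frames[trim_latent:]


# Toy stand-in for a latent: one label per latent frame instead of tensor data.
# Assumption: the reference frame is placed before the generated frames.
frames = ["ref"] + [f"frame_{i}" for i in range(21)]
trimmed = trim_reference_frames(frames, trim_latent=1)
untouched = trim_reference_frames(frames, trim_latent=0)
```

On a real latent shaped `[batch, channels, frames, height, width]`, the equivalent operation would be a slice along the third axis, such as `samples[:, :, trim_latent:]`. With no reference image, `trim_latent` is 0 and the latent passes through unchanged.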
Source Code
[Source code update time: 2025-05-15]