WanVaceToVideo - ComfyUI Built-in Node Documentation

The WanVaceToVideo node prepares control conditioning for video generation models. It takes positive and negative conditioning along with optional control video, masks, and a reference image, and returns updated conditioning plus an empty latent for video generation. Internally, the node handles resizing, masking, and VAE encoding of the control inputs to build the conditioning structure the video model expects.

Inputs

Parameter	Description	Data Type	Required	Range
`positive`	Positive conditioning input for guiding the generation	CONDITIONING	Yes	-
`negative`	Negative conditioning input for guiding the generation	CONDITIONING	Yes	-
`vae`	VAE model used for encoding images and video frames	VAE	Yes	-
`width`	Output video width in pixels (default: 832, step: 16)	INT	Yes	16 to MAX_RESOLUTION
`height`	Output video height in pixels (default: 480, step: 16)	INT	Yes	16 to MAX_RESOLUTION
`length`	Number of frames in the video (default: 81, step: 4)	INT	Yes	1 to MAX_RESOLUTION
`batch_size`	Number of videos to generate simultaneously (default: 1)	INT	Yes	1 to 4096
`strength`	Control strength for video conditioning (default: 1.0, step: 0.01)	FLOAT	Yes	0.0 to 1000.0
`control_video`	Optional input video for control conditioning. If not provided, a neutral gray video is created automatically.	IMAGE	No	-
`control_masks`	Optional masks for controlling which parts of the video to modify. If not provided, a full white mask is used.	MASK	No	-
`reference_image`	Optional reference image for additional conditioning. When provided, it is encoded and prepended to the latent sequence.	IMAGE	No	-

Note: When `control_video` is provided, it is resized to match the specified `width` and `height`. If `control_masks` are provided, they must match the dimensions of the control video. When `reference_image` is provided, it is encoded through the VAE and prepended to the latent sequence. The `length` parameter sets the number of video frames; the latent length is calculated as `((length - 1) // 4) + 1`.
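The latent-length formula in the note can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only, not the node's own code; the function name `latent_length` is hypothetical.

```python
def latent_length(length: int) -> int:
    """Latent frames for `length` video frames, per ((length - 1) // 4) + 1."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return ((length - 1) // 4) + 1


# The default length of 81 frames maps to 21 latent frames.
print(latent_length(81))  # 21

# Lengths 1 through 4 all map to one latent frame; 5 is the first to give 2.
print([latent_length(n) for n in (1, 4, 5, 9)])  # [1, 1, 2, 3]
```

Because of the floor division, lengths of the form 4k + 1 (1, 5, 9, ..., 81) are the points where the latent length increases, which is consistent with the step of 4 on `length`.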

Outputs

Output Name	Description	Data Type
`positive`	Positive conditioning with video control data (vace_frames, vace_mask, vace_strength) applied	CONDITIONING
`negative`	Negative conditioning with video control data (vace_frames, vace_mask, vace_strength) applied	CONDITIONING
`latent`	Empty latent tensor ready for video generation, with shape `[batch_size, 16, latent_length, height // 8, width // 8]`	LATENT
`trim_latent`	Number of latent frames to trim when reference image is used (0 if no reference image is provided)	INT
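
As a rough illustration of the `latent` shape listed above, the sketch below computes it from the node's inputs in plain Python. The function name `empty_latent_shape` and the `extra_latent_frames` parameter are hypothetical. The parameter is an assumption standing in for any latent frames prepended for a reference image (the count reported as `trim_latent`), and it defaults to 0.

```python
def empty_latent_shape(width, height, length, batch_size=1, extra_latent_frames=0):
    """Return [batch_size, 16, latent_length, height/8, width/8] as a list."""
    latent_length = ((length - 1) // 4) + 1
    # Assumption: frames prepended for a reference image extend the temporal axis.
    latent_length += extra_latent_frames
    return [batch_size, 16, latent_length, height // 8, width // 8]


# Default inputs: 832x480, 81 frames, batch size 1, no reference image.
print(empty_latent_shape(832, 480, 81))  # [1, 16, 21, 60, 104]
```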
