Wan22FunControlToVideo - ComfyUI Built-in Node Documentation

The Wan22FunControlToVideo node prepares the conditioning and the empty latent needed for video generation with the Wan video model architecture. It takes positive and negative conditioning, plus an optional reference image and an optional control video, and attaches the corresponding latent data to both conditioning outputs. The requested width, height, and frame count are scaled down to the matching spatial and temporal dimensions of the latent space.

Inputs

Parameter	Description	Data Type	Required	Range
`positive`	Positive conditioning input for guiding the video generation	CONDITIONING	Yes	-
`negative`	Negative conditioning input for guiding the video generation	CONDITIONING	Yes	-
`vae`	VAE model used for encoding images to latent space	VAE	Yes	-
`width`	Output video width in pixels (default: 832, step: 16)	INT	Yes	16 to MAX_RESOLUTION
`height`	Output video height in pixels (default: 480, step: 16)	INT	Yes	16 to MAX_RESOLUTION
`length`	Number of frames in the video sequence (default: 81, step: 4)	INT	Yes	1 to MAX_RESOLUTION
`batch_size`	Number of video sequences to generate (default: 1)	INT	Yes	1 to 4096
`ref_image`	Optional reference image for providing visual guidance	IMAGE	No	-
`control_video`	Optional control video for guiding the generation process	IMAGE	No	-

Note: `length` advances in steps of 4 frames, and the node automatically scales the frame count down to the temporal size of the latent. When `ref_image` is provided, it influences the conditioning through reference latents. When `control_video` is provided, it directly affects the concat latent used in conditioning. `start_image` is not exposed as an input in this node's schema, although the execution logic references it.
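As a rough illustration of the temporal scaling described in the note, the sketch below maps a frame count to a latent frame count. The formula `(length - 1) // 4 + 1` is an assumption based on the 4x temporal compression commonly associated with Wan VAEs; it is not stated on this page, and `latent_frames` is a hypothetical helper, not part of ComfyUI.

```python
def latent_frames(length: int, temporal_stride: int = 4) -> int:
    """Assumed mapping from video frames to latent frames.

    The first frame gets its own latent step; every further group of
    `temporal_stride` frames adds one more (assumption, see lead-in).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return (length - 1) // temporal_stride + 1


if __name__ == "__main__":
    # 81 is the node's default length; the other values show the rounding.
    for frames in (1, 81, 82, 85):
        print(frames, "->", latent_frames(frames))
```

Under this assumption, frame counts that are not of the form 4n + 1 round down to the same latent length as the nearest lower 4n + 1 value, which is one plausible reason the input uses a step of 4 from a default of 81.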

Outputs

Output Name	Description	Data Type
`positive`	Modified positive conditioning with video-specific latent data including concat latent, mask, and optional reference latents	CONDITIONING
`negative`	Modified negative conditioning with video-specific latent data including concat latent, mask, and optional reference latents	CONDITIONING
`latent`	Empty latent tensor with appropriate dimensions for video generation based on batch size, latent channels, and spatial/temporal scaling	LATENT
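To make the shape of the `latent` output more concrete, here is a minimal sketch that computes the dimensions such an empty tensor might have. The channel count (16), spatial factor (8), and temporal stride (4) are placeholder assumptions; the table only says these depend on latent channels and spatial/temporal scaling, and in practice they would come from the VAE. `empty_latent_shape` is a hypothetical helper, not a ComfyUI function.

```python
def empty_latent_shape(
    width: int = 832,
    height: int = 480,
    length: int = 81,
    batch_size: int = 1,
    latent_channels: int = 16,  # assumption: depends on the VAE
    spatial_scale: int = 8,     # assumption: depends on the VAE
    temporal_stride: int = 4,   # assumption: 4x temporal compression
) -> tuple[int, int, int, int, int]:
    """Return (batch, channels, frames, height, width) in latent space."""
    frames = (length - 1) // temporal_stride + 1
    return (
        batch_size,
        latent_channels,
        frames,
        height // spatial_scale,
        width // spatial_scale,
    )


if __name__ == "__main__":
    # Uses the default width, height, length, and batch_size from the Inputs table.
    print(empty_latent_shape())
```

With the node's default inputs and these placeholder factors, the result is a five-dimensional shape ordered as batch, channels, frames, height, width. A VAE with different compression factors would yield different numbers.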


Source fingerprint (SHA-256): 3cc49969226d304e73ec924ccce902c7ae1eee819b4274ad4ffa10e67a4ea211
