The Wan22FunControlToVideo node prepares conditioning and latent representations for video generation using the Wan video model architecture. It processes positive and negative conditioning inputs along with optional reference images and control videos to create the necessary latent space representations for video synthesis. The node handles spatial scaling and temporal dimensions to generate appropriate conditioning data for video models.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| positive | Positive conditioning input for guiding the video generation | CONDITIONING | Yes | - |
| negative | Negative conditioning input for guiding the video generation | CONDITIONING | Yes | - |
| vae | VAE model used for encoding images to latent space | VAE | Yes | - |
| width | Output video width in pixels (default: 832, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| height | Output video height in pixels (default: 480, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| length | Number of frames in the video sequence (default: 81, step: 4) | INT | Yes | 1 to MAX_RESOLUTION |
| batch_size | Number of video sequences to generate (default: 1) | INT | Yes | 1 to 4096 |
| ref_image | Optional reference image for providing visual guidance | IMAGE | No | - |
| control_video | Optional control video for guiding the generation process | IMAGE | No | - |

Note: The length parameter is processed in chunks of 4 frames, and the node automatically handles temporal scaling for the latent space. When ref_image is provided, it influences the conditioning through reference latents. When control_video is provided, it directly affects the concat latent representation used in conditioning. The start_image parameter is not exposed as an input in this node’s schema but is referenced in the execution logic.
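
To make the step constraints concrete, here is a minimal sketch that rounds arbitrary requested values down to ones that match the minimums and step sizes listed in the inputs table. The helper name `snap_to_step` is hypothetical and not part of the node; the sketch also assumes that valid values are counted in steps from the minimum (so `length` takes values such as 1, 5, 9, ..., 81).

```python
def snap_to_step(value: int, step: int, minimum: int) -> int:
    """Round value down to the nearest minimum + k * step (hypothetical helper)."""
    value = max(value, minimum)  # never go below the documented minimum
    return minimum + ((value - minimum) // step) * step


# width and height: minimum 16, step 16 (from the inputs table)
width = snap_to_step(850, step=16, minimum=16)
height = snap_to_step(480, step=16, minimum=16)

# length: minimum 1, step 4 (from the inputs table)
length = snap_to_step(100, step=4, minimum=1)

print(width, height, length)  # 848 480 97
```

Rounding down rather than to the nearest value is a design choice for this sketch: it keeps the result within whatever upper bound the caller already respected.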

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| positive | Modified positive conditioning with video-specific latent data including concat latent, mask, and optional reference latents | CONDITIONING |
| negative | Modified negative conditioning with video-specific latent data including concat latent, mask, and optional reference latents | CONDITIONING |
| latent | Empty latent tensor with appropriate dimensions for video generation based on batch size, latent channels, and spatial/temporal scaling | LATENT |

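
As a rough illustration of how the latent output's dimensions follow from the inputs, the sketch below computes the shape of the empty latent tensor. The function name `empty_latent_shape`, the five-dimensional layout, and the default factors (16 latent channels, 8x spatial compression, 4x temporal compression with the first frame kept on its own) are assumptions for illustration only; the real values depend on the VAE connected to the node.

```python
def empty_latent_shape(
    width: int,
    height: int,
    length: int,
    batch_size: int = 1,
    latent_channels: int = 16,  # assumed; reported by the VAE in practice
    spatial_scale: int = 8,     # assumed spatial compression factor
    temporal_scale: int = 4,    # assumed temporal compression factor
) -> tuple:
    """Return (batch, channels, frames, height, width) for the empty latent."""
    # Assumption: the first frame maps to one latent frame, and every
    # further group of `temporal_scale` frames adds one more.
    latent_frames = (length - 1) // temporal_scale + 1
    return (
        batch_size,
        latent_channels,
        latent_frames,
        height // spatial_scale,
        width // spatial_scale,
    )


# Node defaults from the inputs table: 832x480, 81 frames, batch size 1
print(empty_latent_shape(832, 480, 81))  # (1, 16, 21, 60, 104)
```

Under these assumptions, the default 81-frame request corresponds to 21 latent frames, which is why lengths of the form 4k + 1 fit the temporal scaling without a remainder.
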

Source fingerprint (SHA-256): 3cc49969226d304e73ec924ccce902c7ae1eee819b4274ad4ffa10e67a4ea211