The Wan22ImageToVideoLatent node creates a video latent, optionally seeded from a starting image. It allocates a blank video latent with the specified dimensions and, when a start image (or image sequence) is provided, encodes it into the first frames of that latent and creates a matching noise mask that separates the encoded frames from the frames still to be generated.
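The relationship between the encoded start frames and the noise mask can be illustrated with a small, framework-free sketch. It is not the node's implementation: the helper name `build_frame_mask`, the per-frame list representation, and the convention that 0.0 marks a frame to keep and 1.0 marks a frame to denoise are all assumptions made for illustration.

```python
def build_frame_mask(latent_frames: int, encoded_frames: int) -> list[float]:
    """Return one mask value per latent frame.

    Assumed convention (hypothetical, for illustration only):
    0.0 = frame comes from the encoded start image and is kept,
    1.0 = frame is blank and should be denoised.
    """
    if latent_frames < 1:
        raise ValueError("latent_frames must be at least 1")
    # Clamp so a long start sequence cannot exceed the latent length.
    kept = max(0, min(encoded_frames, latent_frames))
    return [0.0] * kept + [1.0] * (latent_frames - kept)


# A single start image occupying the first of 13 latent frames.
mask = build_frame_mask(latent_frames=13, encoded_frames=1)
print(mask[:3])
```

Without a start image (`encoded_frames=0`), every entry is 1.0, which corresponds to generating the whole clip from the blank latent.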

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| vae | The VAE model used for encoding images into latent space | VAE | Yes | - |
| width | The width of the output video in pixels (default: 1280, step: 32) | INT | Yes | 32 to MAX_RESOLUTION |
| height | The height of the output video in pixels (default: 704, step: 32) | INT | Yes | 32 to MAX_RESOLUTION |
| length | The number of frames in the video sequence (default: 49, step: 4) | INT | Yes | 1 to MAX_RESOLUTION |
| batch_size | The number of batches to generate (default: 1) | INT | Yes | 1 to 4096 |
| start_image | Optional starting image sequence to encode into the video latent | IMAGE | No | - |

Note: When start_image is provided, the node encodes the image sequence into the beginning frames of the latent and generates a corresponding noise mask. The width and height must be divisible by 16 so that they map cleanly onto the latent grid; the input step of 32 already guarantees this. The length parameter sets the number of video frames, and the latent's temporal dimension is ((length - 1) // 4) + 1, so the default length of 49 yields 13 latent frames.
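As a worked example of the arithmetic in the note, the following sketch computes the latent grid size from the node's inputs. The temporal formula is the one stated above; the spatial downscale factor of 16 is an assumption inferred from the divisibility requirement, and `latent_dims` is a hypothetical helper, not part of the node's API.

```python
def latent_dims(width: int, height: int, length: int,
                spatial_factor: int = 16) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width).

    spatial_factor=16 is an assumption inferred from the requirement
    that width and height be divisible by 16.
    """
    if width % spatial_factor or height % spatial_factor:
        raise ValueError(f"width and height must be divisible by {spatial_factor}")
    if length < 1:
        raise ValueError("length must be at least 1")
    # Temporal formula from the note: ((length - 1) // 4) + 1
    latent_frames = ((length - 1) // 4) + 1
    return latent_frames, height // spatial_factor, width // spatial_factor


# Node defaults: 1280 x 704, 49 frames.
print(latent_dims(1280, 704, 49))  # (13, 44, 80)
```

Because of the integer division, lengths of 1 through 4 all map to a single latent frame, and each additional group of four frames adds one more.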

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| samples | The generated video latent representation | LATENT |
| noise_mask | The noise mask indicating which regions should be denoised during generation | LATENT |

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): d12982594b1e38e7db26630fe3d5bde84bcd540e95abb6ce50cac196ea953901