Wan22ImageToVideoLatent - ComfyUI Built-in Node Documentation

The Wan22ImageToVideoLatent node creates video latent representations from images. It generates an empty video latent with the specified dimensions and can optionally encode a starting image sequence into its first frames. When a start image is provided, the node encodes it into the latent space and creates a corresponding noise mask that separates the encoded starting frames from the frames still to be generated.
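The relationship between the start image and the noise mask can be sketched one latent frame at a time. This is a minimal pure-Python illustration, not the node's implementation. It assumes that a mask value of 1.0 means "denoise this frame" and 0.0 means "keep the encoded frame", and that the start images go through the same 4x temporal compression as the video, so N start images occupy ((N - 1) // 4) + 1 latent frames. The helper name `latent_frame_mask` is hypothetical.

```python
def latent_frame_mask(length, start_frames=0):
    """Return one noise-mask value per latent frame (hypothetical helper).

    1.0 marks a latent frame to be generated (denoised); 0.0 marks a frame
    filled from the encoded start images.
    """
    # Temporal size of the video latent, as given in the node notes.
    latent_frames = ((length - 1) // 4) + 1
    if start_frames <= 0:
        # Sketch convention only: with no start image nothing is preserved,
        # so every frame is marked for generation.
        return [1.0] * latent_frames
    # Assumption: at most `length` start images are used, and they are
    # compressed with the same 4x temporal factor as the video.
    used = min(start_frames, length)
    encoded = ((used - 1) // 4) + 1
    return [0.0] * encoded + [1.0] * (latent_frames - encoded)


print(latent_frame_mask(49, start_frames=1))
```

With a single start image and the default length of 49, the result is one 0.0 followed by twelve 1.0 values: only the first of the 13 latent frames is fixed by the image.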

Inputs

Parameter	Description	Data Type	Required	Range
`vae`	The VAE model used for encoding images into latent space	VAE	Yes	-
`width`	The width of the output video in pixels (default: 1280, step: 32)	INT	Yes	32 to MAX_RESOLUTION
`height`	The height of the output video in pixels (default: 704, step: 32)	INT	Yes	32 to MAX_RESOLUTION
`length`	The number of frames in the video sequence (default: 49, step: 4)	INT	Yes	1 to MAX_RESOLUTION
`batch_size`	The number of video latents to generate in one batch (default: 1)	INT	Yes	1 to 4096
`start_image`	Optional starting image sequence to encode into the video latent	IMAGE	No	-
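The ranges above, together with the divisibility rule in the note below, can be checked before queuing a workflow. The sketch is a hypothetical pre-flight helper, not part of ComfyUI. `max_resolution=16384` is an assumed value for ComfyUI's `MAX_RESOLUTION` constant, so check it against your installation. The length message follows directly from the temporal formula: because it floors, any length that is not of the form 4k + 1 produces the same latent size as the nearest such value below it.

```python
def check_inputs(width, height, length, batch_size, max_resolution=16384):
    """Return human-readable warnings for the given settings (hypothetical helper).

    max_resolution=16384 is an assumed value for ComfyUI's MAX_RESOLUTION;
    check the constant in your own installation.
    """
    warnings = []
    for name, value in (("width", width), ("height", height)):
        if not 32 <= value <= max_resolution:
            warnings.append(f"{name}={value} is outside 32..{max_resolution}")
        if value % 16:
            warnings.append(f"{name}={value} is not divisible by 16")
    if not 1 <= length <= max_resolution:
        warnings.append(f"length={length} is outside 1..{max_resolution}")
    elif (length - 1) % 4:
        # The temporal formula floors, so this length shares its latent size
        # with the largest 4k + 1 value below it.
        latent_frames = ((length - 1) // 4) + 1
        same_as = 4 * (latent_frames - 1) + 1
        warnings.append(
            f"length={length} maps to {latent_frames} latent frames, "
            f"the same as length={same_as}"
        )
    if not 1 <= batch_size <= 4096:
        warnings.append(f"batch_size={batch_size} is outside 1..4096")
    return warnings


print(check_inputs(1280, 704, 50, 1))
```

The default settings (1280, 704, 49, 1) produce no warnings.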

Note: When start_image is provided, the node encodes the image sequence into the first frames of the latent and generates a corresponding noise mask. The width and height must be divisible by 16 to produce valid latent dimensions. The length parameter sets the number of video frames; the latent's temporal dimension is ((length - 1) // 4) + 1, so the default length of 49 gives 13 latent frames.
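The dimension arithmetic in the note can be written out as a small function. This is a sketch under one stated assumption: the spatial downsampling factor is 16, inferred from the requirement that width and height be divisible by 16. The channel dimension is left out because this page does not state it. The name `latent_dims` is hypothetical.

```python
def latent_dims(width, height, length):
    """Return (latent_frames, latent_height, latent_width) for a video size.

    Assumption: spatial downsampling by 16, inferred from the requirement
    that width and height be divisible by 16. Channels are not modelled.
    """
    if width % 16 or height % 16:
        raise ValueError("width and height must be divisible by 16")
    # Temporal compression: ((length - 1) // 4) + 1 latent frames.
    latent_frames = ((length - 1) // 4) + 1
    return latent_frames, height // 16, width // 16


print(latent_dims(1280, 704, 49))  # → (13, 44, 80)
```

For the defaults, 1280 x 704 pixels map to an 80 x 44 latent grid, and 49 frames map to 13 latent frames.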

Outputs

Output Name	Description	Data Type
`samples`	The generated video latent representation	LATENT
`noise_mask`	The noise mask indicating which regions should be denoised during generation; produced when start_image is provided	LATENT
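How a sampler uses `noise_mask` is up to the sampler. A common inpainting convention blends the sampler's result with the original latent so that positions with mask value 0.0 keep their encoded values and positions with 1.0 take the newly generated values. The sketch below shows that convention on flat lists of floats. `blend_with_mask` is a hypothetical name, and real latents are multi-dimensional tensors rather than Python lists.

```python
def blend_with_mask(denoised, original, mask):
    """Keep original values where mask is 0.0 and denoised values where it is 1.0.

    Intermediate mask values mix the two linearly.
    """
    if not (len(denoised) == len(original) == len(mask)):
        raise ValueError("inputs must have the same length")
    return [d * m + o * (1.0 - m) for d, o, m in zip(denoised, original, mask)]


# First latent frame comes from the start image (mask 0.0); the rest are generated.
print(blend_with_mask([9.0, 9.0, 9.0], [1.0, 2.0, 3.0], [0.0, 1.0, 1.0]))  # → [1.0, 9.0, 9.0]
```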

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): d12982594b1e38e7db26630fe3d5bde84bcd540e95abb6ce50cac196ea953901
