Kandinsky5ImageToVideo - ComfyUI Built-in Node Documentation

The Kandinsky5ImageToVideo node prepares conditioning and latent space data for video generation using the Kandinsky model. It creates an empty video latent tensor and can optionally encode a starting image to guide the initial frames of the generated video, modifying the positive and negative conditioning accordingly.
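Here is a minimal sketch of how the shape of the empty latent could follow from the node's inputs. Several values are assumptions typical of ComfyUI video VAEs and are not stated on this page: 16 latent channels, 8× spatial compression, and 4× temporal compression with the first frame encoded on its own. `empty_video_latent_shape` is a hypothetical helper, not a ComfyUI function.

```python
def empty_video_latent_shape(width: int = 768, height: int = 512,
                             length: int = 121, batch_size: int = 1,
                             channels: int = 16, spatial: int = 8,
                             temporal: int = 4) -> tuple:
    """Shape of the all-zeros video latent under assumed VAE compression factors."""
    # Assumption: the first frame maps to one latent frame, and every further
    # `temporal` frames collapse into one more latent frame.
    latent_frames = (length - 1) // temporal + 1
    # Assumption: height and width are each divided by the spatial factor.
    return (batch_size, channels, latent_frames, height // spatial, width // spatial)


print(empty_video_latent_shape())  # (1, 16, 31, 64, 96)
```

Under these assumptions, the default `length` of 121 maps to 31 latent frames. The `length` step of 4 fits this pattern.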

Inputs

Parameter	Description	Data Type	Required	Range
`positive`	The positive conditioning prompts to guide the video generation.	CONDITIONING	Yes	N/A
`negative`	The negative conditioning prompts to steer the video generation away from certain concepts.	CONDITIONING	Yes	N/A
`vae`	The VAE model used to encode the optional starting image into the latent space.	VAE	Yes	N/A
`width`	The width of the output video in pixels (default: 768).	INT	No	16 to 8192 (step 16)
`height`	The height of the output video in pixels (default: 512).	INT	No	16 to 8192 (step 16)
`length`	The number of frames in the video (default: 121).	INT	No	1 to 8192 (step 4)
`batch_size`	The number of video sequences to generate simultaneously (default: 1).	INT	No	1 to 4096
`start_image`	An optional starting image. If provided, it is encoded and used to replace the noisy start of the model’s output latents.	IMAGE	No	N/A
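The numeric ranges in the table can be checked before a workflow is queued. The sketch below copies the defaults and limits from the table. It assumes that a valid value is the minimum plus a whole number of steps, so valid `length` values would be 1, 5, 9 and so on, including the default 121. `Kandinsky5I2VParams` and `RANGES` are hypothetical names, not part of ComfyUI.

```python
from dataclasses import dataclass

# (minimum, maximum, step) copied from the Inputs table.
RANGES = {
    "width": (16, 8192, 16),
    "height": (16, 8192, 16),
    "length": (1, 8192, 4),
    "batch_size": (1, 4096, 1),
}


@dataclass
class Kandinsky5I2VParams:
    """Hypothetical container for the node's numeric inputs (not a ComfyUI class)."""
    width: int = 768
    height: int = 512
    length: int = 121
    batch_size: int = 1

    def validate(self) -> None:
        """Raise ValueError if any field falls outside its documented range."""
        for name, (low, high, step) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
            # Assumption: valid values are low + k * step for some integer k >= 0.
            if (value - low) % step != 0:
                raise ValueError(f"{name}={value} is not {low} plus a multiple of {step}")


Kandinsky5I2VParams().validate()                       # defaults pass
Kandinsky5I2VParams(width=1024, length=81).validate()  # also within range
```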

Note: When `start_image` is provided, the node uses at most its first `length` frames. It resizes them to the specified `width` and `height` with bilinear interpolation, then encodes them with the VAE. The encoded latent is injected into both the positive and negative conditioning to guide the video’s initial appearance.
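The preprocessing described in the note can be sketched with NumPy. This sketch makes several assumptions:

- ComfyUI's IMAGE layout is `[frames, height, width, channels]` with float values.
- The resize is a plain stretch with half-pixel-centred bilinear sampling.
- The real node works on PyTorch tensors and may differ in cropping and edge handling.

`bilinear_resize` and `prepare_start_image` are hypothetical names.

```python
import numpy as np


def bilinear_resize(frames: np.ndarray, width: int, height: int) -> np.ndarray:
    """Resize [frames, H, W, C] to [frames, height, width, C] with bilinear sampling."""
    _, h, w, _ = frames.shape
    # Map each output pixel centre back to a fractional source coordinate.
    ys = np.clip((np.arange(height) + 0.5) * h / height - 0.5, 0, h - 1)
    xs = np.clip((np.arange(width) + 0.5) * w / width - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Interpolation weights, shaped to broadcast over [frames, height, width, C].
    wy = (ys - y0)[None, :, None, None]
    wx = (xs - x0)[None, None, :, None]
    top = frames[:, y0][:, :, x0] * (1 - wx) + frames[:, y0][:, :, x1] * wx
    bottom = frames[:, y1][:, :, x0] * (1 - wx) + frames[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def prepare_start_image(image: np.ndarray, width: int, height: int, length: int) -> np.ndarray:
    """Keep at most the first `length` frames, then resize them to width x height."""
    return bilinear_resize(image[:length], width, height)


batch = np.random.rand(10, 64, 96, 3).astype(np.float32)  # 10 frames of 96x64 RGB
prepared = prepare_start_image(batch, width=48, height=32, length=5)
print(prepared.shape)  # (5, 32, 48, 3)
```

If the batch has fewer frames than `length`, slicing simply keeps all of them.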

Outputs

Output Name	Description	Data Type
`positive`	The modified positive conditioning, potentially updated with encoded start image data.	CONDITIONING
`negative`	The modified negative conditioning, potentially updated with encoded start image data.	CONDITIONING
`latent`	An empty video latent tensor filled with zeros, shaped according to `width`, `height`, `length` and `batch_size`.	LATENT
`cond_latent`	The clean encoded latent of the provided start image frames, used to replace the noisy beginning of the generated video latents.	LATENT
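The following sketch shows how the outputs might be used downstream. ComfyUI conditioning is a list of `[embedding, options]` pairs. Two details below are hypothetical: the option key `start_latent` and the `[batch, channels, frames, height, width]` latent layout. `replace_noisy_start` illustrates the idea behind `cond_latent`: it overwrites the leading frames of a noisy latent with the clean encoded start frames. This is not the node's actual code.

```python
import numpy as np


def attach_to_conditioning(conditioning, encoded, key="start_latent"):
    """Copy a ComfyUI-style conditioning list, adding `encoded` to each options dict.

    `key` is a hypothetical option name used only for illustration.
    """
    return [[embedding, {**options, key: encoded}] for embedding, options in conditioning]


def replace_noisy_start(noisy: np.ndarray, cond_latent: np.ndarray) -> np.ndarray:
    """Overwrite the leading latent frames of `noisy` with the clean `cond_latent`.

    Both arrays are assumed to be [batch, channels, frames, height, width];
    `cond_latent` usually has fewer frames and may have batch size 1.
    """
    start_frames = cond_latent.shape[2]
    out = noisy.copy()                      # leave the caller's array untouched
    out[:, :, :start_frames] = cond_latent  # broadcasts over the batch axis
    return out


noisy = np.random.randn(2, 16, 8, 4, 6)
clean = np.zeros((1, 16, 1, 4, 6))  # one encoded start frame
mixed = replace_noisy_start(noisy, clean)
positive = attach_to_conditioning([[np.ones((1, 4)), {}]], clean)
```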

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): c171187b89f102d608ddee9dc981e56674d62b02d936b3cf4dee3ce86760fd0e