# Kandinsky5ImageToVideo - ComfyUI Built-in Node Documentation

> Complete documentation for the Kandinsky5ImageToVideo node in ComfyUI. Learn its inputs, outputs, parameters and usage.

The Kandinsky5ImageToVideo node prepares the conditioning and latent inputs for video generation with a Kandinsky 5 model. It creates a zero-filled video latent tensor and, if a start image is provided, encodes that image with the VAE and attaches the result to the positive and negative conditioning so that it guides the first frames of the generated video.

## Inputs

| Parameter     | Data Type    | Required | Range                | Description                                                                                                               |
| ------------- | ------------ | -------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `positive`    | CONDITIONING | Yes      | N/A                  | The positive conditioning prompts to guide the video generation.                                                          |
| `negative`    | CONDITIONING | Yes      | N/A                  | The negative conditioning prompts to steer the video generation away from certain concepts.                               |
| `vae`         | VAE          | Yes      | N/A                  | The VAE model used to encode the optional starting image into the latent space.                                           |
| `width`       | INT          | No       | 16 to 8192 (step 16) | The width of the output video in pixels (default: 768).                                                                   |
| `height`      | INT          | No       | 16 to 8192 (step 16) | The height of the output video in pixels (default: 512).                                                                  |
| `length`      | INT          | No       | 1 to 8192 (step 4)   | The number of frames in the video (default: 121).                                                                         |
| `batch_size`  | INT          | No       | 1 to 4096            | The number of video sequences to generate simultaneously (default: 1).                                                    |
| `start_image` | IMAGE        | No       | N/A                  | An optional starting image. If provided, it is encoded and used to replace the noisy start of the model's output latents. |

**Note:** When a `start_image` is provided, it is resized to the specified `width` and `height` using bilinear interpolation. If the image batch contains more than `length` frames, only the first `length` frames are encoded. The encoded latent is then attached to both the `positive` and `negative` conditioning to guide the video's initial appearance.
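
As a rough illustration of the inputs table above, the following sketch builds an API-format workflow entry for this node and checks the numeric inputs against the documented ranges. The node ids (`"6"`, `"7"`, `"10"`, `"12"`, `"20"`) and the `check_inputs` helper are hypothetical and exist only for this example. The sketch also assumes that steps are counted from the minimum value, which this page does not state explicitly.

```python
# Value ranges copied from the inputs table: (minimum, maximum, step).
RANGES = {
    "width": (16, 8192, 16),
    "height": (16, 8192, 16),
    "length": (1, 8192, 4),
    "batch_size": (1, 4096, 1),
}


def check_inputs(inputs):
    """Raise ValueError if a numeric input is outside its documented range or step.

    Hypothetical helper for illustration; it is not part of ComfyUI.
    """
    for name, (lo, hi, step) in RANGES.items():
        value = inputs[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside {lo}..{hi}")
        # Assumption: valid values are minimum + k * step.
        if (value - lo) % step != 0:
            raise ValueError(f"{name}={value} is not on a step of {step} from {lo}")


# Links use the [source_node_id, output_index] form.
# Node ids "6", "7", "10" and "12" stand in for hypothetical upstream nodes
# (two prompt encoders, a VAE loader and an image loader).
workflow = {
    "20": {
        "class_type": "Kandinsky5ImageToVideo",
        "inputs": {
            "positive": ["6", 0],
            "negative": ["7", 0],
            "vae": ["10", 0],
            "width": 768,
            "height": 512,
            "length": 121,
            "batch_size": 1,
            "start_image": ["12", 0],  # optional; omit to generate without a start image
        },
    }
}

check_inputs(workflow["20"]["inputs"])  # passes for the documented defaults
```

Under the step assumption, `length` takes values such as 1, 5, 9, ..., 121, so a value like 120 would be rejected by this check.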

## Outputs

| Output Name   | Data Type    | Description                                                                                                                                                   |
| ------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `positive`    | CONDITIONING | The positive conditioning, with the encoded start image attached when a `start_image` is provided.                                                            |
| `negative`    | CONDITIONING | The negative conditioning, with the encoded start image attached when a `start_image` is provided.                                                            |
| `latent`      | LATENT       | A zero-filled video latent tensor sized from `width`, `height`, `length`, and `batch_size`.                                                                   |
| `cond_latent` | LATENT       | The clean, encoded latent representation of the provided start images. This is used internally to replace the noisy beginning of the generated video latents. |
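
To give a feel for how the `latent` output relates to the pixel-space inputs, here is a minimal sketch of the shape arithmetic. This page does not state the latent layout, so the channel count (16), the spatial downscale factor (8), and the temporal compression factor (4) are assumptions chosen for illustration, and `video_latent_shape` is a hypothetical helper rather than a ComfyUI function.

```python
def video_latent_shape(width, height, length, batch_size=1,
                       channels=16, spatial_factor=8, temporal_factor=4):
    """Return an assumed (batch, channels, frames, height, width) latent shape.

    The default channel count and compression factors are assumptions,
    not values confirmed by this page.
    """
    # The first frame maps to its own latent frame; every further group of
    # `temporal_factor` frames adds one more latent frame.
    latent_frames = (length - 1) // temporal_factor + 1
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )


# Documented defaults: width=768, height=512, length=121, batch_size=1.
print(video_latent_shape(768, 512, 121))  # (1, 16, 31, 64, 96)
```

Under these assumptions, a single start image (`length` of 1) would occupy exactly one latent frame, which fits the description of `cond_latent` replacing only the beginning of the generated latents.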
