
# HunyuanVideo15ImageToVideo - ComfyUI Built-in Node Documentation

> Complete documentation for the HunyuanVideo15ImageToVideo node in ComfyUI. Learn its inputs, outputs, parameters and usage.

> This documentation was AI-generated.

The HunyuanVideo15ImageToVideo node prepares the conditioning and latent inputs for image-to-video generation with the HunyuanVideo 1.5 model. It creates an empty (zero-filled) latent for the video sequence and can optionally attach an encoded starting image, a CLIP vision output, or both to the positive and negative conditioning to guide generation.

## Inputs

| Parameter            | Data Type            | Required | Range                 | Description                                                                                                                       |
| -------------------- | -------------------- | -------- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `positive`           | CONDITIONING         | Yes      | -                     | The positive conditioning prompts that describe what the video should contain.                                                    |
| `negative`           | CONDITIONING         | Yes      | -                     | The negative conditioning prompts that describe what the video should avoid.                                                      |
| `vae`                | VAE                  | Yes      | -                     | The VAE (Variational Autoencoder) model used to encode the starting image into the latent space.                                  |
| `width`              | INT                  | No       | 16 to MAX\_RESOLUTION | The width of the output video frames in pixels. Must be divisible by 16. (default: 848)                                           |
| `height`             | INT                  | No       | 16 to MAX\_RESOLUTION | The height of the output video frames in pixels. Must be divisible by 16. (default: 480)                                          |
| `length`             | INT                  | No       | 1 to MAX\_RESOLUTION  | The total number of frames in the video sequence. (default: 33)                                                                   |
| `batch_size`         | INT                  | No       | 1 to 4096             | The number of video sequences to generate in a single batch. (default: 1)                                                         |
| `start_image`        | IMAGE                | No       | -                     | An optional starting image to initialize the video generation. If provided, it is encoded and used to condition the first frames. |
| `clip_vision_output` | CLIP\_VISION\_OUTPUT | No       | -                     | Optional CLIP vision embeddings to provide additional visual conditioning for the generation.                                     |
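
The constraints in the table can be checked before queuing a workflow. The sketch below is a hypothetical helper, not part of ComfyUI. It assumes `MAX_RESOLUTION` is 16384 (check `nodes.MAX_RESOLUTION` in your ComfyUI version). It reports values that fall outside the listed ranges and rounds a pixel dimension down to a multiple of 16.

```python
# Hypothetical helper (not part of ComfyUI) that checks the ranges listed in the Inputs table.
MAX_RESOLUTION = 16384  # assumption: value of ComfyUI's nodes.MAX_RESOLUTION


def validate_inputs(width: int, height: int, length: int, batch_size: int) -> list[str]:
    """Return human-readable problems; an empty list means the values look valid."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not 16 <= value <= MAX_RESOLUTION:
            problems.append(f"{name}={value} is outside 16..{MAX_RESOLUTION}")
        elif value % 16 != 0:
            problems.append(f"{name}={value} is not divisible by 16")
    if not 1 <= length <= MAX_RESOLUTION:
        problems.append(f"length={length} is outside 1..{MAX_RESOLUTION}")
    if not 1 <= batch_size <= 4096:
        problems.append(f"batch_size={batch_size} is outside 1..4096")
    return problems


def snap_to_16(value: int) -> int:
    """Round a pixel dimension down to the nearest multiple of 16 (minimum 16)."""
    return max(16, (value // 16) * 16)


print(validate_inputs(848, 480, 33, 1))  # defaults: []
print(validate_inputs(850, 480, 33, 1))  # ['width=850 is not divisible by 16']
print(snap_to_16(850))                   # 848
```

Rounding down rather than to the nearest multiple keeps the frame from growing past the size you asked for.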

**Note:** When `start_image` is provided, only the first `length` frames of the image batch are used. They are resized to the specified `width` and `height` with bilinear interpolation, encoded with the `vae`, and added to both the `positive` and `negative` conditioning as a `concat_latent_image` together with a matching `concat_mask`. When `clip_vision_output` is provided, it is likewise attached to both the `positive` and `negative` conditioning.
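
The bookkeeping for a start image can be sketched in plain Python. This is a minimal illustration, not the node's source. It assumes that the VAE compresses time by 4x, that a clip of `n` frames maps to `(n - 1) // 4 + 1` latent frames, and that `concat_mask` holds 0.0 for latent frames carrying the encoded image and 1.0 for frames still to be generated. It treats CONDITIONING as a list of `[tensor, options]` pairs. `set_values` is a hypothetical stand-in for the merge step, and strings stand in for tensors.

```python
# Sketch of the start_image handling described in the note.
# Assumptions (not confirmed by this page): 4x temporal compression,
# latent frames = (frames - 1) // 4 + 1, and concat_mask uses 0.0 where the
# encoded image is present and 1.0 where frames remain to be generated.

TEMPORAL_STRIDE = 4  # assumed temporal compression factor


def latent_frames(num_frames: int) -> int:
    """Latent frames needed for num_frames pixel frames (assumed formula)."""
    return (num_frames - 1) // TEMPORAL_STRIDE + 1


def build_concat_mask(length: int, num_image_frames: int) -> list[float]:
    """Per-latent-frame mask for a start image batch of num_image_frames frames."""
    used = min(num_image_frames, length)  # only the first `length` images are used
    covered = latent_frames(used)         # latent frames filled by the encoded image
    total = latent_frames(length)         # latent frames in the whole video
    return [0.0 if i < covered else 1.0 for i in range(total)]


def set_values(conditioning: list, values: dict) -> list:
    """Copy a CONDITIONING list of [tensor, options] pairs, adding `values` to each options dict.

    Hypothetical stand-in for the merge step; the input list is left unchanged.
    """
    result = []
    for cond, options in conditioning:
        merged = dict(options)
        merged.update(values)
        result.append([cond, merged])
    return result


# Usage: one start image for a 33-frame video, applied to both conditionings.
mask = build_concat_mask(length=33, num_image_frames=1)
extra = {"concat_latent_image": "<encoded image>", "concat_mask": mask}
positive = set_values([["<positive embedding>", {}]], extra)
negative = set_values([["<negative embedding>", {}]], extra)
print(mask)  # [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Copying each options dict instead of mutating it keeps upstream conditioning reusable by other nodes in the same graph.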

## Outputs

| Output Name | Data Type    | Description                                                                                                      |
| ----------- | ------------ | ---------------------------------------------------------------------------------------------------------------- |
| `positive`  | CONDITIONING | The modified positive conditioning, which may now include the encoded starting image, the CLIP vision output, or both. |
| `negative`  | CONDITIONING | The modified negative conditioning, which may now include the encoded starting image, the CLIP vision output, or both. |
| `latent`    | LATENT       | An empty (zero-filled) latent tensor sized for the specified batch size, video length, width, and height.          |
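
In ComfyUI a LATENT value is a dict whose `"samples"` entry holds the tensor. The sketch below estimates that tensor's shape from the node's inputs. The numbers are assumptions, not taken from this page: 32 latent channels, 16x spatial compression and 4x temporal compression for the HunyuanVideo 1.5 VAE. Verify them against the model you load.

```python
import math

# Assumptions (not stated on this page): the HunyuanVideo 1.5 latent has 32
# channels, and the VAE compresses 16x spatially and 4x temporally.
LATENT_CHANNELS = 32
SPATIAL_STRIDE = 16
TEMPORAL_STRIDE = 4


def empty_latent_shape(batch_size: int, length: int, height: int, width: int) -> tuple[int, ...]:
    """Shape (batch, channels, latent_frames, latent_height, latent_width) under the assumptions above."""
    return (
        batch_size,
        LATENT_CHANNELS,
        (length - 1) // TEMPORAL_STRIDE + 1,
        height // SPATIAL_STRIDE,
        width // SPATIAL_STRIDE,
    )


shape = empty_latent_shape(batch_size=1, length=33, height=480, width=848)
print(shape)             # (1, 32, 9, 30, 53)
print(math.prod(shape))  # 457920 zero-valued elements
```

Under these assumptions, lengths of the form 4k + 1 (such as the default 33) map onto whole latent frames without leftover pixel frames.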
