This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Overview

This node sets up image-to-video generation for autoregressive (AR) video models. It encodes a starting image into latent space with a VAE and stores the encoded image in the configuration of a cloned copy of the model. The video sampling process can then use that image as the first frame, seeding the generation without requiring a separate image-to-video model architecture.
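A minimal sketch of the clone-then-store pattern described above, in plain Python. `ModelStub`, `attach_start_latent`, and the `"ar_start_latent"` key are hypothetical names used only for illustration; this page does not document where or under which key the node actually stores the encoded image. The sketch shows why cloning comes first: the original model stays unmodified, so other parts of the workflow that share it are unaffected.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModelStub:
    """Hypothetical stand-in for a model wrapper that carries an options dict."""
    name: str
    model_options: Dict[str, Any] = field(default_factory=dict)

    def clone(self) -> "ModelStub":
        # Deep-copy the options so edits on the clone never leak back
        # into the original model that other nodes may still be using.
        return ModelStub(self.name, copy.deepcopy(self.model_options))


def attach_start_latent(model: ModelStub, start_latent: Any) -> ModelStub:
    """Return a clone of `model` that carries `start_latent` for the sampler."""
    patched = model.clone()
    # "ar_start_latent" is a hypothetical key, not the node's real one.
    patched.model_options["ar_start_latent"] = start_latent
    return patched
```

After `patched = attach_start_latent(base, latent)`, only `patched` holds the encoded image; `base.model_options` is left exactly as it was.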

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| model | MODEL | Yes | - | The AR video model to be used for generation. |
| vae | VAE | Yes | - | The VAE model used to encode the starting image into latent space. |
| start_image | IMAGE | Yes | - | The initial image that will serve as the first frame of the generated video. |
| width | INT | Yes | 16 to 8192 (step: 16) | The width of the generated video frames (default: 832). |
| height | INT | Yes | 16 to 8192 (step: 16) | The height of the generated video frames (default: 480). |
| length | INT | Yes | 1 to 1024 (step: 4) | The total number of frames in the generated video (default: 81). |
| batch_size | INT | Yes | 1 to 64 | The number of video sequences to generate in a single batch (default: 1). |
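The ranges in the Inputs table can be checked with a small helper. This sketch assumes that each INT value must lie on the grid `minimum + k * step` (so valid lengths are 1, 5, 9, and so on, which the default of 81 = 1 + 4 × 20 satisfies) and that `batch_size` uses a step of 1, since the table lists none. The helper names are hypothetical and not part of ComfyUI.

```python
def is_valid_setting(value: int, minimum: int, maximum: int, step: int) -> bool:
    """True if value lies in [minimum, maximum] on the grid minimum + k * step."""
    return minimum <= value <= maximum and (value - minimum) % step == 0


# (minimum, maximum, step) per parameter, taken from the Inputs table.
# The batch_size step of 1 is an assumption; the table does not list one.
LIMITS = {
    "width": (16, 8192, 16),
    "height": (16, 8192, 16),
    "length": (1, 1024, 4),
    "batch_size": (1, 64, 1),
}


def check_inputs(**values: int) -> dict:
    """Map each given parameter name to whether its value passes the limits."""
    return {name: is_valid_setting(v, *LIMITS[name]) for name, v in values.items()}
```

For the defaults, `check_inputs(width=832, height=480, length=81, batch_size=1)` reports `True` for every parameter, while `length=80` or `width=840` falls off the step grid and reports `False`.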

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| MODEL | MODEL | The cloned model with the encoded start image stored in its configuration for video generation. |
| LATENT | LATENT | An empty latent tensor with the correct dimensions for the video generation process. |
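To illustrate what "correct dimensions" could mean for the LATENT output, the sketch below computes the shape of an empty video latent. It assumes a VAE with 16 latent channels, 8× downsampling in height and width, and 4× temporal compression that keeps the first frame on its own. These factors match Wan-family video VAEs in ComfyUI but are not stated on this page, so they are exposed as parameters to adjust. The function name is hypothetical.

```python
from typing import Tuple


def empty_video_latent_shape(
    width: int,
    height: int,
    length: int,
    batch_size: int = 1,
    channels: int = 16,        # assumed latent channel count
    spatial_factor: int = 8,   # assumed VAE downsampling per spatial axis
    temporal_factor: int = 4,  # assumed VAE compression along time
) -> Tuple[int, int, int, int, int]:
    """Return (batch, channels, latent_frames, latent_height, latent_width)."""
    # The first frame maps to its own latent frame; each further group of
    # `temporal_factor` frames collapses into one additional latent frame.
    latent_frames = (length - 1) // temporal_factor + 1
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )
```

With the default inputs (832 × 480, 81 frames, batch size 1), this returns `(1, 16, 21, 60, 104)` under the stated assumptions.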

Source fingerprint (SHA-256): 0445b279ba49fa946050cfa70d1e6b13240eaa600b99dfe63f27c3203dc4b61b