
Overview

This node sets up image-to-video generation for autoregressive (AR) video models. It encodes a starting image into latent space with a VAE and stores the result in the configuration of a cloned model. During sampling, the model uses that encoded image as the first frame, so the generation is seeded from the image without a separate image-to-video model architecture.
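The clone-and-store pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the node's actual implementation: the key name `ar_start_latent`, the helper `prepare_i2v`, and the stand-in encoder `fake_encode` are all hypothetical, and a dictionary stands in for the model's configuration.

```python
import copy

# Hypothetical key name; the source does not say where the latent is stored.
START_LATENT_KEY = "ar_start_latent"


def prepare_i2v(model_config, encode, start_image):
    """Return a copy of model_config that carries the encoded start image."""
    patched = copy.copy(model_config)  # shallow clone; the original dict is not modified
    patched[START_LATENT_KEY] = encode(start_image)
    return patched


def fake_encode(image):
    # Stand-in for a VAE encoder (assumption): averages each row to one value.
    return [[sum(row) / len(row)] for row in image]


base_config = {"name": "ar_video_model"}
image = [[0.0, 1.0], [0.5, 0.5]]
patched_config = prepare_i2v(base_config, fake_encode, image)
```

Working on a copy means the same base model can be reused elsewhere without the start image attached.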

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| model | The AR video model to be used for generation. | MODEL | Yes | - |
| vae | The VAE model used to encode the starting image into latent space. | VAE | Yes | - |
| start_image | The initial image that will serve as the first frame of the generated video. | IMAGE | Yes | - |
| width | The width of the generated video frames (default: 832). | INT | Yes | 16 to 8192 (step: 16) |
| height | The height of the generated video frames (default: 480). | INT | Yes | 16 to 8192 (step: 16) |
| length | The total number of frames in the generated video (default: 81). | INT | Yes | 1 to 1024 (step: 4) |
| batch_size | The number of video sequences to generate in a single batch (default: 1). | INT | Yes | 1 to 64 |
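The ranges and steps above imply that only certain values are valid, for example frame counts of 1, 5, 9, and so on up to the default of 81 and beyond. The sketch below snaps arbitrary values onto those grids. It assumes that steps are counted from each parameter's minimum (consistent with the default length of 81) and that `batch_size` uses a step of 1; the helper names `snap_to_step` and `normalize_inputs` are hypothetical and not part of the node.

```python
def snap_to_step(value, minimum, maximum, step):
    """Clamp value to [minimum, maximum] and snap it to minimum + k * step."""
    clamped = max(minimum, min(maximum, value))
    steps = round((clamped - minimum) / step)
    snapped = minimum + steps * step
    if snapped > maximum:  # rounding up may overshoot the top of the range
        snapped -= step
    return snapped


def normalize_inputs(width, height, length, batch_size):
    """Snap each numeric input onto the grid documented in the table above."""
    return {
        "width": snap_to_step(width, 16, 8192, 16),
        "height": snap_to_step(height, 16, 8192, 16),
        "length": snap_to_step(length, 1, 1024, 4),
        "batch_size": snap_to_step(batch_size, 1, 64, 1),  # step of 1 is an assumption
    }


settings = normalize_inputs(width=830, height=480, length=80, batch_size=1)
```

Here a requested width of 830 and length of 80 would be moved to the nearest valid values on their grids.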

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| MODEL | The cloned model with the encoded start image stored in its configuration for video generation. | MODEL |
| LATENT | An empty latent tensor with the correct dimensions for the video generation process. | LATENT |
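To give a sense of what "correct dimensions" might mean for the empty latent, the sketch below computes a latent shape from the node's inputs. The source does not state the compression factors or channel count, so the defaults here (16 latent channels, 8x spatial compression, 4x temporal compression with the first frame kept separately) are assumptions typical of some video VAEs, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(width, height, length, batch_size,
                 channels=16, spatial_factor=8, temporal_factor=4):
    """Return (batch, channels, frames, height, width) of an empty video latent.

    All three keyword defaults are assumptions, not values taken from the node.
    """
    latent_frames = (length - 1) // temporal_factor + 1  # first frame plus compressed groups
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )


default_shape = latent_shape(width=832, height=480, length=81, batch_size=1)
```

Under these assumptions, the step of 4 on `length` lines up with the temporal factor: each valid frame count maps to a whole number of latent frames.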

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
