ARVideoI2V - ComfyUI Built-in Node Documentation

Overview

This node sets up image-to-video generation for autoregressive (AR) video models. It encodes a starting image into latent space with a VAE and stores the encoded image in the configuration of a cloned model. The video sampling process can then use the image as the first frame, which seeds the generation without requiring a separate image-to-video model architecture.
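The clone-then-store pattern described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `ToyModel`, `toy_encode`, `prepare_i2v`, and the `start_latent` key are stand-ins invented for illustration and are not ComfyUI classes, functions, or option names.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for a MODEL object (not a ComfyUI class)."""
    name: str
    options: dict = field(default_factory=dict)

    def clone(self):
        # Copy the options so the original model is left untouched.
        return ToyModel(self.name, deepcopy(self.options))


def toy_encode(image):
    """Hypothetical stand-in for VAE encoding: average each 2x2 pixel block."""
    h, w = len(image), len(image[0])
    return [
        [(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def prepare_i2v(model, encode, start_image):
    """Clone the model and attach the encoded start image to the clone."""
    patched = model.clone()
    patched.options["start_latent"] = encode(start_image)  # hypothetical key
    return patched


base = ToyModel("ar_video")
patched = prepare_i2v(base, toy_encode, [[0, 2], [4, 6]])
print(patched.options)
print(base.options)
```

Because the encoded image lives on the clone, the original model can still be used elsewhere in a workflow without the start frame attached.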

Inputs

Parameter	Description	Data Type	Required	Range
`model`	The AR video model to be used for generation.	MODEL	Yes	-
`vae`	The VAE model used to encode the starting image into latent space.	VAE	Yes	-
`start_image`	The initial image that will serve as the first frame of the generated video.	IMAGE	Yes	-
`width`	The width of the generated video frames (default: 832).	INT	Yes	16 to 8192 (step: 16)
`height`	The height of the generated video frames (default: 480).	INT	Yes	16 to 8192 (step: 16)
`length`	The total number of frames in the generated video (default: 81).	INT	Yes	1 to 1024 (step: 4)
`batch_size`	The number of video sequences to generate in a single batch (default: 1).	INT	Yes	1 to 64
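As a quick way to sanity-check values against the table above, the ranges can be written down and validated in a few lines. This is a minimal sketch, not part of the node: `INPUT_RANGES` and `check_inputs` are hypothetical names, the step of 1 for `batch_size` is an assumption (the table lists no step), and the sketch assumes steps are counted from the minimum value, which is consistent with the defaults (832, 480, 81).

```python
# (min, max, step) for each numeric input, taken from the Inputs table.
# The step of 1 for batch_size is an assumption; the table does not list one.
INPUT_RANGES = {
    "width": (16, 8192, 16),
    "height": (16, 8192, 16),
    "length": (1, 1024, 4),
    "batch_size": (1, 64, 1),
}


def check_inputs(**values):
    """Return a list of human-readable problems; an empty list means all values fit."""
    problems = []
    for name, value in values.items():
        lo, hi, step = INPUT_RANGES[name]
        if not lo <= value <= hi:
            problems.append(f"{name}={value} is outside {lo} to {hi}")
        elif (value - lo) % step != 0:
            # Assumes steps are counted from the minimum (e.g. length: 1, 5, 9, ...).
            problems.append(f"{name}={value} is not on a step of {step} from {lo}")
    return problems


print(check_inputs(width=832, height=480, length=81, batch_size=1))
print(check_inputs(width=830, length=80))
```

The first call uses the documented defaults and reports no problems; the second reports one problem for each off-step value.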

Outputs

Output Name	Description	Data Type
`MODEL`	The cloned model with the encoded start image stored in its configuration for video generation.	MODEL
`LATENT`	An empty latent tensor with the correct dimensions for the video generation process.	LATENT
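The documentation does not state how the `LATENT` dimensions are derived from `width`, `height`, and `length`. The sketch below shows one common convention for video VAEs, purely as an illustration. The channel count (16), the spatial factor (8), and the temporal factor (4) are assumptions, not values confirmed for this node, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(width, height, length, batch_size=1,
                 channels=16, spatial_factor=8, temporal_factor=4):
    """Estimate an empty latent's shape under ASSUMED compression factors.

    The factors and channel count are illustrative defaults, not documented
    properties of this node or of any particular VAE.
    """
    # The first frame maps to one latent frame; each further group of
    # `temporal_factor` frames adds one more.
    frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, frames,
            height // spatial_factor, width // spatial_factor)


# With the documented defaults (832 x 480, 81 frames, batch size 1):
print(latent_shape(832, 480, 81))  # (1, 16, 21, 60, 104) under these assumptions
```

Under this convention a `length` of the form 4k + 1 maps cleanly to k + 1 latent frames, which would be consistent with the step of 4 on `length`, though that link is an inference rather than something the source states.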

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 0445b279ba49fa946050cfa70d1e6b13240eaa600b99dfe63f27c3203dc4b61b
