LTXVImgToVideo - ComfyUI Built-in Node Documentation

The LTXVImgToVideo node turns a single input image into a video latent for LTXV video generation models. It encodes the image with the VAE and uses the result as the start of a latent sequence sized by `width`, `height`, and `length`. The `strength` parameter then controls how much of the original image is preserved versus modified in the first frame during video generation.
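
As a rough illustration of how this node might appear in a workflow exported in ComfyUI's API format, the sketch below builds the node entry as a Python dictionary. The input names are those listed in the Inputs table; the node IDs (`"6"`, `"7"`, `"38"`, `"78"`) and the helper name `ltxv_img_to_video_node` are hypothetical placeholders, not values taken from any particular workflow.

```python
import json


def ltxv_img_to_video_node(positive, negative, vae, image,
                           width=768, height=512, length=97,
                           batch_size=1, strength=1.0):
    """Build an API-format entry for one LTXVImgToVideo node.

    Link arguments are [source_node_id, output_index] pairs. The IDs used
    in the example call below are made up for illustration.
    """
    return {
        "class_type": "LTXVImgToVideo",
        "inputs": {
            "positive": positive,
            "negative": negative,
            "vae": vae,
            "image": image,
            "width": width,
            "height": height,
            "length": length,
            "batch_size": batch_size,
            "strength": strength,
        },
    }


node = ltxv_img_to_video_node(
    positive=["6", 0],   # hypothetical text-encode node (positive prompt)
    negative=["7", 0],   # hypothetical text-encode node (negative prompt)
    vae=["38", 0],       # hypothetical VAE source node
    image=["78", 0],     # hypothetical image-loading node
)
print(json.dumps(node, indent=2))
```

The defaults mirror the documented ones, so only the four linked inputs need to be supplied in the simplest case.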

Inputs

Parameter	Description	Data Type	Required	Range
`positive`	Positive conditioning prompts for guiding the video generation	CONDITIONING	Yes	-
`negative`	Negative conditioning prompts for avoiding certain elements in the video	CONDITIONING	Yes	-
`vae`	VAE model used for encoding the input image into latent space	VAE	Yes	-
`image`	Input image to be converted into video frames	IMAGE	Yes	-
`width`	Output video width in pixels (default: 768, step: 32)	INT	No	64 to MAX_RESOLUTION
`height`	Output video height in pixels (default: 512, step: 32)	INT	No	64 to MAX_RESOLUTION
`length`	Number of frames in the generated video (default: 97, step: 8)	INT	No	9 to MAX_RESOLUTION
`batch_size`	Number of videos to generate simultaneously (default: 1)	INT	No	1 to 4096
`strength`	Control over how much of the original image content is preserved in the first frame of the generated video. A value of 1.0 preserves the original image completely, while 0.0 allows maximum modification (default: 1.0)	FLOAT	No	0.0 to 1.0
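
To make the size parameters more concrete, here is a minimal sketch of how the latent shape could be derived from `width`, `height`, and `length`. It assumes the LTXV VAE compresses 32x spatially and 8x temporally and uses 128 latent channels. That is consistent with the step sizes in the table (32 for width and height, 8 for length) but is not stated on this page, and the function name `ltxv_latent_shape` is hypothetical.

```python
def ltxv_latent_shape(width=768, height=512, length=97, batch_size=1,
                      channels=128, spatial_factor=32, temporal_factor=8):
    """Return (batch, channels, latent_frames, latent_height, latent_width).

    The compression factors and channel count are assumptions made for
    illustration, not values confirmed by this page.
    """
    latent_frames = (length - 1) // temporal_factor + 1
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )


print(ltxv_latent_shape())           # defaults from the table
print(ltxv_latent_shape(length=9))   # shortest allowed length
```

Under these assumptions the defaults give 13 latent frames at 16 x 24, and lengths of the form 8k + 1 (9, 17, ..., 97) map cleanly onto whole latent frames, which matches the default of 97 and the step of 8.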

Outputs

Output Name	Description	Data Type
`positive`	Processed positive conditioning with video frame masking applied	CONDITIONING
`negative`	Processed negative conditioning with video frame masking applied	CONDITIONING
`latent`	Video latent representation containing the encoded frames and noise mask for video generation	LATENT
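
The `latent` output pairs the encoded frames with a noise mask. The sketch below shows one plausible way `strength` could translate into per-frame mask values, assuming the mask is 1.0 for latent frames the sampler may freely generate and `1.0 - strength` for the latent frames that hold the encoded image. Both that convention and the helper name `frame_noise_mask` are assumptions for illustration rather than the node's confirmed implementation.

```python
def frame_noise_mask(latent_frames, image_frames=1, strength=1.0):
    """Return one mask value per latent frame.

    Assumed convention: 1.0 means "fully regenerate", 0.0 means "keep as is".
    Frames holding the encoded image get 1.0 - strength.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    if not 0 <= image_frames <= latent_frames:
        raise ValueError("image_frames must be between 0 and latent_frames")

    mask = [1.0] * latent_frames
    for i in range(image_frames):
        mask[i] = 1.0 - strength
    return mask


print(frame_noise_mask(13, strength=1.0)[:3])   # image frame fully preserved
print(frame_noise_mask(13, strength=0.75)[:3])  # image frame partly re-noised
```

With `strength` at 1.0 the first entry is 0.0, so the encoded image is left intact; lowering `strength` raises that entry and gives the sampler more freedom to alter the opening frame.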

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 1f9d897d1f461270106bf44106acc90db422a04e6bce10ad3bca22127e96ffab
