CosmosImageToVideoLatent - ComfyUI Built-in Node Documentation

The CosmosImageToVideoLatent node creates a video latent representation from input images. It generates a blank video latent and, when a start image, an end image, or both are supplied, encodes them into the first and last frames of the video sequence. For each image provided, it also creates a corresponding noise mask that indicates which parts of the latent should be preserved during generation.

Inputs

Parameter	Description	Data Type	Required	Range
`vae`	The VAE model used for encoding images into latent space	VAE	Yes	-
`width`	The width of the output video in pixels (default: 1280)	INT	Yes	16 to MAX_RESOLUTION
`height`	The height of the output video in pixels (default: 704)	INT	Yes	16 to MAX_RESOLUTION
`length`	The number of frames in the video sequence (default: 121)	INT	Yes	1 to MAX_RESOLUTION
`batch_size`	The number of latent batches to generate (default: 1)	INT	Yes	1 to 4096
`start_image`	Optional image to encode at the beginning of the video sequence	IMAGE	No	-
`end_image`	Optional image to encode at the end of the video sequence	IMAGE	No	-

Note: When neither `start_image` nor `end_image` is provided, the node returns a blank latent without a noise mask. When either image is provided, the corresponding frames of the latent are encoded and masked accordingly.
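
The behavior described in the note can be sketched in plain Python. This is a minimal illustration, not the node's actual implementation: the function name `sketch_video_latent`, the 16 latent channels, the 8x spatial and temporal compression factors, and the mask convention (0.0 = preserve, 1.0 = generate) are all assumptions made for this example. The `start_frames` and `end_frames` arguments are hypothetical stand-ins for the number of latent frames an encoded image would occupy.

```python
def sketch_video_latent(width, height, length, batch_size=1,
                        start_frames=0, end_frames=0,
                        channels=16, spatial_factor=8, temporal_factor=8):
    """Describe the latent shape and per-frame mask for a video latent.

    All compression factors and the channel count are assumed values,
    not taken from the node documentation.
    """
    # Assumed temporal compression: the first frame maps to one latent
    # frame, then every `temporal_factor` frames add one more.
    latent_frames = (length - 1) // temporal_factor + 1
    shape = (batch_size, channels, latent_frames,
             height // spatial_factor, width // spatial_factor)
    result = {"samples_shape": shape}

    # No images supplied: blank latent, no noise mask.
    if start_frames == 0 and end_frames == 0:
        return result

    # Assumed convention: 1.0 = generate this frame, 0.0 = preserve it.
    mask = [1.0] * latent_frames
    for i in range(min(start_frames, latent_frames)):
        mask[i] = 0.0                      # frames covered by start_image
    for i in range(min(end_frames, latent_frames)):
        mask[latent_frames - 1 - i] = 0.0  # frames covered by end_image
    result["noise_mask_per_frame"] = mask
    return result


blank = sketch_video_latent(1280, 704, 121)
conditioned = sketch_video_latent(1280, 704, 121, start_frames=1)
print(blank["samples_shape"])
print(conditioned["noise_mask_per_frame"])
```

Under these assumed factors, the default 1280 x 704 x 121 input yields 16 latent frames of 88 x 160, and supplying a start image zeroes only the leading entries of the mask.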

Outputs

Output Name	Description	Data Type
`latent`	The generated video latent representation with optional encoded images and corresponding noise masks	LATENT
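
Downstream code can distinguish a blank latent from an image-conditioned one by checking whether a mask is attached. The sketch below assumes the LATENT value is a dictionary that stores the tensor under `samples` and, when images were supplied, a mask under `noise_mask`. These key names and the helper `describe_latent` are assumptions for illustration and are not stated on this page.

```python
def describe_latent(latent):
    """Classify a LATENT-like dict as blank or image-conditioned.

    The 'samples' and 'noise_mask' key names are assumed for this sketch.
    """
    if "samples" not in latent:
        raise KeyError("expected a 'samples' entry in the latent dict")
    if latent.get("noise_mask") is None:
        return "blank"
    return "image-conditioned"


# Placeholder strings stand in for real tensors.
print(describe_latent({"samples": "<tensor>"}))
print(describe_latent({"samples": "<tensor>", "noise_mask": "<tensor>"}))
```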

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 4fefd1b6c38c93c260ef8376e8d69ba610a556b3c8555863016a1afd45885eaf
