SVD_img2vid_Conditioning - ComfyUI Built-in Node Documentation

The SVD_img2vid_Conditioning node prepares conditioning data for video generation with Stable Video Diffusion. It encodes an initial image with a CLIP vision model and a VAE, then returns a positive and negative conditioning pair together with an empty latent sized for the requested video. The conditioning also carries the motion bucket, frame rate, and augmentation level that guide the generated video.
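As a rough illustration of how the node might be wired, the sketch below builds one entry of an API-format workflow as a Python dictionary. The node IDs (`"1"`, `"2"`, `"3"`) and the output slots assumed for the upstream loaders are hypothetical and need to be adjusted to match your own graph.

```python
import json

# Hypothetical node IDs: "1" = a checkpoint loader, "2" = an image loader.
# Links use the [source_node_id, output_index] form of the API workflow format.
svd_conditioning_node = {
    "class_type": "SVD_img2vid_Conditioning",
    "inputs": {
        "clip_vision": ["1", 1],  # assumed output slot for CLIP_VISION
        "init_image": ["2", 0],   # assumed output slot for IMAGE
        "vae": ["1", 2],          # assumed output slot for VAE
        "width": 1024,
        "height": 576,
        "video_frames": 14,
        "motion_bucket_id": 127,
        "fps": 6,
        "augmentation_level": 0.0,
    },
}

# "3" is the hypothetical ID of this node within the workflow.
workflow_fragment = {"3": svd_conditioning_node}
print(json.dumps(workflow_fragment, indent=2))
```

The literal values are the documented defaults; only the three link inputs depend on the rest of the graph.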

Inputs

Parameter	Description	Data Type	Required	Range
`clip_vision`	CLIP vision model for encoding the input image	CLIP_VISION	Yes	-
`init_image`	Initial image to use as the starting point for video generation	IMAGE	Yes	-
`vae`	VAE model for encoding the image into latent space	VAE	Yes	-
`width`	Output video width (default: 1024, step: 8)	INT	Yes	16 to MAX_RESOLUTION
`height`	Output video height (default: 576, step: 8)	INT	Yes	16 to MAX_RESOLUTION
`video_frames`	Number of frames to generate in the video (default: 14)	INT	Yes	1 to 4096
`motion_bucket_id`	Controls the amount of motion in the generated video; higher values produce more motion (default: 127)	INT	Yes	1 to 1023
`fps`	Frames per second for the generated video (default: 6)	INT	Yes	1 to 1024
`augmentation_level`	Level of noise augmentation to apply to the input image (default: 0.0, step: 0.01)	FLOAT	Yes	0.0 to 10.0
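The ranges in the table above can be checked before a workflow is submitted. The following is a minimal sketch, not ComfyUI's own validation: the `check_inputs` helper is hypothetical, the value used for `MAX_RESOLUTION` is an assumption, and the step check simply mirrors the step listed for `width` and `height`.

```python
MAX_RESOLUTION = 16384  # assumption: check your ComfyUI version for the real value

# (minimum, maximum, step) per input, taken from the table above.
RANGES = {
    "width": (16, MAX_RESOLUTION, 8),
    "height": (16, MAX_RESOLUTION, 8),
    "video_frames": (1, 4096, 1),
    "motion_bucket_id": (1, 1023, 1),
    "fps": (1, 1024, 1),
    "augmentation_level": (0.0, 10.0, 0.01),
}

def check_inputs(values):
    """Return a list of human-readable problems; empty means all values fit."""
    problems = []
    for name, (low, high, step) in RANGES.items():
        value = values[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
        elif isinstance(step, int) and (value - low) % step != 0:
            # Only integer steps are checked; the float step is a UI hint here.
            problems.append(f"{name}={value} is not on a step of {step}")
    return problems

defaults = {
    "width": 1024,
    "height": 576,
    "video_frames": 14,
    "motion_bucket_id": 127,
    "fps": 6,
    "augmentation_level": 0.0,
}
print(check_inputs(defaults))  # []
```

Passing the documented defaults yields no problems, while a width such as 1020 would be reported because it does not fall on a step of 8.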

Outputs

Output Name	Description	Data Type
`positive`	Positive conditioning data containing image embeddings and video parameters	CONDITIONING
`negative`	Negative conditioning data with zeroed embeddings and video parameters	CONDITIONING
`latent`	Empty latent tensor sized for the requested frame count and resolution	LATENT
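To make the three outputs more concrete, here is a hedged sketch in plain Python. It assumes the 4 latent channels and 8x spatial downscale typical of Stable Diffusion VAEs, and it stands in a plain list for the image embedding; both helper functions are hypothetical, and the real node may attach additional entries to the conditioning.

```python
def empty_latent_shape(width, height, video_frames, channels=4, downscale=8):
    """Shape of the empty latent batch: one latent frame per video frame."""
    return (video_frames, channels, height // downscale, width // downscale)

def build_conditioning(embedding, motion_bucket_id, fps, augmentation_level):
    """Pair an embedding with the video parameters, for positive and negative."""
    params = {
        "motion_bucket_id": motion_bucket_id,
        "fps": fps,
        "augmentation_level": augmentation_level,
    }
    positive = [[embedding, dict(params)]]
    negative = [[[0.0] * len(embedding), dict(params)]]  # zeroed embedding
    return positive, negative

print(empty_latent_shape(1024, 576, 14))  # (14, 4, 72, 128)
```

With the default inputs, the latent holds 14 frames of 72 by 128 latent pixels, and the negative conditioning differs from the positive only in its zeroed embedding.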

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
