The SVD_img2vid_Conditioning node prepares conditioning data for image-to-video generation with Stable Video Diffusion. It encodes an initial image with a CLIP vision model and a VAE, then returns a positive and a negative conditioning entry together with an empty latent tensor sized for the requested video. The motion amount, frame rate, and augmentation level you set are stored in the conditioning so that they can guide the generated video.
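
The pairing of positive and negative conditioning described above can be sketched in plain Python. This is an illustration under stated assumptions, not the node's actual code: plain lists stand in for tensors, the helper name `build_conditioning` is hypothetical, and the `concat_latent_image` key and the zeroed latent on the negative side are assumptions about how the encoded image is attached.

```python
# Illustrative sketch only: lists stand in for tensors, and
# build_conditioning is a hypothetical helper, not a ComfyUI function.

def build_conditioning(image_embedding, image_latent,
                       motion_bucket_id=127, fps=6, augmentation_level=0.0):
    """Return (positive, negative) as lists of [embedding, options] pairs."""
    # Video parameters shared by both conditioning entries.
    params = {
        "motion_bucket_id": motion_bucket_id,
        "fps": fps,
        "augmentation_level": augmentation_level,
    }
    # The negative entry uses a zeroed embedding of the same length.
    zero_embedding = [0.0] * len(image_embedding)
    # Assumption: the image latent travels with the conditioning under this
    # key, and the negative entry carries a zeroed copy of it.
    zero_latent = [0.0] * len(image_latent)
    positive = [[image_embedding, {**params, "concat_latent_image": image_latent}]]
    negative = [[zero_embedding, {**params, "concat_latent_image": zero_latent}]]
    return positive, negative


positive, negative = build_conditioning([0.5, -0.2], [1.0, 2.0, 3.0])
print(positive[0][1]["motion_bucket_id"], negative[0][0])
```

Both entries carry the same video parameters; only the embedding (and, in this sketch, the attached latent) differs between them.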

Inputs

| Parameter | Description | Data Type | Required | Range |
|-----------|-------------|-----------|----------|-------|
| `clip_vision` | CLIP vision model for encoding the input image | CLIP_VISION | Yes | - |
| `init_image` | Initial image to use as the starting point for video generation | IMAGE | Yes | - |
| `vae` | VAE model for encoding the image into latent space | VAE | Yes | - |
| `width` | Output video width (default: 1024, step: 8) | INT | Yes | 16 to MAX_RESOLUTION |
| `height` | Output video height (default: 576, step: 8) | INT | Yes | 16 to MAX_RESOLUTION |
| `video_frames` | Number of frames to generate in the video (default: 14) | INT | Yes | 1 to 4096 |
| `motion_bucket_id` | Controls the amount of motion in the generated video (default: 127) | INT | Yes | 1 to 1023 |
| `fps` | Frames per second for the generated video (default: 6) | INT | Yes | 1 to 1024 |
| `augmentation_level` | Level of noise augmentation to apply to the input image (default: 0.0, step: 0.01) | FLOAT | Yes | 0.0 to 10.0 |
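
To make the size parameters concrete, here is a minimal sketch of how `width`, `height`, and `video_frames` could map to the shape of the empty latent. It rests on assumptions not stated in the table: a 4-channel latent, an 8x spatial downscale (typical for Stable Diffusion style VAEs), a `MAX_RESOLUTION` of 16384, and treating `step: 8` as a strict divisibility rule. The function names are hypothetical.

```python
MAX_RESOLUTION = 16384  # assumption: value of ComfyUI's MAX_RESOLUTION constant


def check_size(name, value, step=8, lo=16, hi=MAX_RESOLUTION):
    """Validate a width/height value against the documented range and step."""
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} is outside {lo}..{hi}")
    # Simplification: the UI step is treated here as a hard requirement.
    if value % step != 0:
        raise ValueError(f"{name}={value} is not a multiple of {step}")
    return value


def latent_shape(width=1024, height=576, video_frames=14,
                 channels=4, downscale=8):
    """Return the assumed (frames, channels, height/8, width/8) latent shape."""
    check_size("width", width)
    check_size("height", height)
    if not 1 <= video_frames <= 4096:
        raise ValueError(f"video_frames={video_frames} is outside 1..4096")
    return (video_frames, channels, height // downscale, width // downscale)


print(latent_shape())  # (14, 4, 72, 128) with the default inputs
```

Under these assumptions, one latent frame is allocated per requested video frame, so memory use grows linearly with `video_frames`.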

Outputs

| Output Name | Description | Data Type |
|-------------|-------------|-----------|
| `positive` | Positive conditioning data containing image embeddings and video parameters | CONDITIONING |
| `negative` | Negative conditioning data with zeroed embeddings and video parameters | CONDITIONING |
| `latent` | Empty latent space tensor ready for video generation | LATENT |
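
As a rough illustration of how the three outputs might be wired into a sampler, the sketch below builds a small workflow in ComfyUI's API (JSON) prompt style, where a link is written as `[node_id, output_index]`. The node IDs, file names, and sampler settings are placeholders chosen for the example, the surrounding nodes (`ImageOnlyCheckpointLoader`, `LoadImage`, `KSampler`) and their output order are assumptions about a typical setup, and `link` is a hypothetical helper.

```python
import json


def link(node_id, output_index):
    """Reference another node's output as [node_id, output_index]."""
    return [node_id, output_index]


workflow = {
    # Assumed loader that provides MODEL (0), CLIP_VISION (1), and VAE (2).
    "1": {"class_type": "ImageOnlyCheckpointLoader",
          "inputs": {"ckpt_name": "svd.safetensors"}},  # placeholder file name
    "2": {"class_type": "LoadImage",
          "inputs": {"image": "input.png"}},  # placeholder file name
    "3": {"class_type": "SVD_img2vid_Conditioning",
          "inputs": {
              "clip_vision": link("1", 1),
              "init_image": link("2", 0),
              "vae": link("1", 2),
              "width": 1024, "height": 576, "video_frames": 14,
              "motion_bucket_id": 127, "fps": 6, "augmentation_level": 0.0,
          }},
    # Outputs 0, 1, 2 of node "3" are positive, negative, and latent.
    "4": {"class_type": "KSampler",
          "inputs": {
              "model": link("1", 0),
              "positive": link("3", 0),
              "negative": link("3", 1),
              "latent_image": link("3", 2),
              # Placeholder sampler settings, not recommendations.
              "seed": 0, "steps": 20, "cfg": 2.5,
              "sampler_name": "euler", "scheduler": "karras", "denoise": 1.0,
          }},
}

print(json.dumps(workflow["3"], indent=2))
```

The output indices follow the order of the table above: `positive`, then `negative`, then `latent`.
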

Source fingerprint (SHA-256): 33b295b6f2e459852aaa95d9dca26c724aa2e9ad0f884a1c7760766530a00a09