The SV3D_Conditioning node prepares conditioning data for 3D video generation with the SV3D model. It encodes an initial image with a CLIP vision model and a VAE to produce positive and negative conditioning, and it returns an empty latent sized for the requested frame count and resolution. It also generates camera elevation and azimuth sequences with one entry per video frame, which are stored alongside the image embeddings in the conditioning.
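The per-frame camera sequences described above can be sketched in plain Python. This is a minimal illustration, not the node's confirmed implementation: it assumes the elevation is held constant for every frame and that the azimuth starts at 0 and rises in equal steps until the last frame reaches 360 degrees. The function name `camera_orbit` is hypothetical.

```python
def camera_orbit(video_frames: int, elevation: float = 0.0):
    """Return per-frame (elevations, azimuths) in degrees for one orbit.

    Assumption: azimuth starts at 0 and increases in equal steps so the last
    frame reaches 360 degrees; elevation is identical for every frame.
    """
    if video_frames < 1:
        raise ValueError("video_frames must be at least 1")
    # max(..., 2) avoids dividing by zero when a single frame is requested.
    step = 360.0 / (max(video_frames, 2) - 1)
    elevations = [float(elevation)] * video_frames
    azimuths = [i * step for i in range(video_frames)]
    return elevations, azimuths


elevations, azimuths = camera_orbit(video_frames=21, elevation=10.0)
print(len(azimuths), azimuths[0], azimuths[1], azimuths[-1])
# prints: 21 0.0 18.0 360.0
```

With the default of 21 frames, this assumed spacing gives an 18-degree step between consecutive views.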

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| clip_vision | The CLIP vision model used for encoding the input image | CLIP_VISION | Yes | - |
| init_image | The initial image that serves as the starting point for 3D video generation | IMAGE | Yes | - |
| vae | The VAE model used for encoding the image into latent space | VAE | Yes | - |
| width | The output width for the generated video frames (default: 576, must be divisible by 8) | INT | Yes | 16 to MAX_RESOLUTION |
| height | The output height for the generated video frames (default: 576, must be divisible by 8) | INT | Yes | 16 to MAX_RESOLUTION |
| video_frames | The number of frames to generate for the video sequence (default: 21) | INT | Yes | 1 to 4096 |
| elevation | The camera elevation angle in degrees for the 3D view (default: 0.0) | FLOAT | Yes | -90.0 to 90.0 |
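
The ranges and the divisible-by-8 rule listed above can be checked before values are passed to the node. The sketch below is a hypothetical helper, not part of the node itself; `check_sv3d_inputs` is an illustrative name, and the `MAX_RESOLUTION` value of 16384 is an assumption that should be verified against your installation.

```python
MAX_RESOLUTION = 16384  # assumed upper bound; verify against your install


def check_sv3d_inputs(width: int, height: int, video_frames: int, elevation: float):
    """Return a list of human-readable problems; an empty list means all OK."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not 16 <= value <= MAX_RESOLUTION:
            problems.append(f"{name}={value} is outside 16 to {MAX_RESOLUTION}")
        elif value % 8 != 0:
            problems.append(f"{name}={value} is not divisible by 8")
    if not 1 <= video_frames <= 4096:
        problems.append(f"video_frames={video_frames} is outside 1 to 4096")
    if not -90.0 <= elevation <= 90.0:
        problems.append(f"elevation={elevation} is outside -90.0 to 90.0")
    return problems


print(check_sv3d_inputs(576, 576, 21, 0.0))   # defaults pass: []
print(check_sv3d_inputs(580, 576, 21, 0.0))   # width is not a multiple of 8
```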

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| positive | The positive conditioning data containing image embeddings and camera parameters for generation | CONDITIONING |
| negative | The negative conditioning data with zeroed embeddings for contrastive generation | CONDITIONING |
| latent | An empty latent tensor with dimensions matching the specified video frames and resolution | LATENT |

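
To make the size of the empty latent concrete, the following sketch computes a plausible shape from the inputs. It assumes a 4-channel latent and an 8x spatial downscale, which is typical for Stable Diffusion style VAEs and consistent with the divisible-by-8 requirement on width and height; neither number is stated on this page, and `sv3d_latent_shape` is a hypothetical helper.

```python
def sv3d_latent_shape(video_frames: int, width: int, height: int,
                      channels: int = 4, downscale: int = 8):
    """Return (frames, channels, latent_height, latent_width).

    Assumption: 4 latent channels and an 8x spatial downscale.
    """
    return (video_frames, channels, height // downscale, width // downscale)


print(sv3d_latent_shape(21, 576, 576))
# prints: (21, 4, 72, 72)
```

Under these assumptions, an all-zero tensor of that shape (for example `torch.zeros(shape)` in PyTorch) would correspond to the empty latent described in the table.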

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): a1d4b7f0106bcdc7c9640f6e12986d9b452f785882caaa2072ba1a5da0913f69