SV3D_Conditioning - ComfyUI Built-in Node Documentation

The SV3D_Conditioning node prepares conditioning data for 3D video generation with the SV3D model. It encodes an initial image with a CLIP vision model and a VAE to build positive and negative conditioning, and it returns an empty latent sized for the requested number of frames and resolution. The node also generates camera elevation and azimuth sequences, with one value per frame, based on the specified number of video frames.
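The per-frame camera sequences can be pictured with a small sketch. It assumes the elevation is held constant and the azimuth sweeps a single 360-degree orbit in evenly spaced steps; the function name `camera_sequences` and the exact spacing are illustrative assumptions, not a statement of the node's implementation.

```python
def camera_sequences(video_frames: int, elevation: float):
    """Return per-frame (elevations, azimuths) lists for one orbit.

    Assumption: the first frame sits at azimuth 0 and the last at 360 degrees.
    """
    # max(..., 2) avoids a division by zero when only one frame is requested.
    azimuth_increment = 360.0 / (max(video_frames, 2) - 1)
    elevations = [elevation] * video_frames
    azimuths = [i * azimuth_increment for i in range(video_frames)]
    return elevations, azimuths


elevations, azimuths = camera_sequences(video_frames=21, elevation=0.0)
print(len(azimuths), azimuths[0], azimuths[1], azimuths[-1])  # 21 0.0 18.0 360.0
```

With the default of 21 frames, this spacing gives an 18-degree step between consecutive views.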

Inputs

Parameter	Description	Data Type	Required	Range
`clip_vision`	The CLIP vision model used for encoding the input image	CLIP_VISION	Yes	-
`init_image`	The initial image that serves as the starting point for 3D video generation	IMAGE	Yes	-
`vae`	The VAE model used for encoding the image into latent space	VAE	Yes	-
`width`	The output width for the generated video frames (default: 576, must be divisible by 8)	INT	Yes	16 to MAX_RESOLUTION
`height`	The output height for the generated video frames (default: 576, must be divisible by 8)	INT	Yes	16 to MAX_RESOLUTION
`video_frames`	The number of frames to generate for the video sequence (default: 21)	INT	Yes	1 to 4096
`elevation`	The camera elevation angle in degrees for the 3D view (default: 0.0)	FLOAT	Yes	-90.0 to 90.0
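As a rough illustration of how these inputs fit together, the sketch below builds an API-format workflow entry for the node and checks the documented ranges first. The helper name `sv3d_conditioning_node`, the upstream node ids and output indices, and the numeric value used for `MAX_RESOLUTION` are assumptions made for the example.

```python
import json

MAX_RESOLUTION = 16384  # assumption: stand-in for ComfyUI's MAX_RESOLUTION constant


def sv3d_conditioning_node(clip_vision_link, init_image_link, vae_link,
                           width=576, height=576, video_frames=21, elevation=0.0):
    """Build an API-format entry for SV3D_Conditioning (hypothetical helper)."""
    for name, value in (("width", width), ("height", height)):
        if not 16 <= value <= MAX_RESOLUTION or value % 8 != 0:
            raise ValueError(f"{name} must be in [16, {MAX_RESOLUTION}] and divisible by 8")
    if not 1 <= video_frames <= 4096:
        raise ValueError("video_frames must be in [1, 4096]")
    if not -90.0 <= elevation <= 90.0:
        raise ValueError("elevation must be in [-90.0, 90.0]")
    return {
        "class_type": "SV3D_Conditioning",
        "inputs": {
            # Links are [source_node_id, output_index] pairs.
            "clip_vision": clip_vision_link,
            "init_image": init_image_link,
            "vae": vae_link,
            "width": width,
            "height": height,
            "video_frames": video_frames,
            "elevation": elevation,
        },
    }


# Node ids "1" and "2" and their output indices are hypothetical.
node = sv3d_conditioning_node(["1", 1], ["2", 0], ["1", 2], elevation=10.0)
print(json.dumps(node, indent=2))
```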

Outputs

Output Name	Description	Data Type
`positive`	The positive conditioning data containing image embeddings and camera parameters for generation	CONDITIONING
`negative`	The negative conditioning data, which uses zeroed embeddings and serves as the counterpart to `positive` during sampling	CONDITIONING
`latent`	An empty latent tensor with dimensions matching the specified video frames and resolution	LATENT
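To make the size of the `latent` output concrete, here is a minimal sketch of the shape arithmetic. It assumes a 4-channel latent and an 8x spatial downscale, which is typical for Stable Diffusion family VAEs and consistent with the divisible-by-8 requirement on `width` and `height`. The function name and its defaults are illustrative.

```python
def latent_shape(video_frames, width, height, channels=4, downscale=8):
    """Return (frames, channels, latent_height, latent_width).

    Assumptions: 4 latent channels and an 8x spatial downscale.
    """
    if width % downscale or height % downscale:
        raise ValueError("width and height must be divisible by the downscale factor")
    return (video_frames, channels, height // downscale, width // downscale)


print(latent_shape(video_frames=21, width=576, height=576))  # (21, 4, 72, 72)
```

Under these assumptions the default settings yield one 72 by 72 latent per frame.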

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): a1d4b7f0106bcdc7c9640f6e12986d9b452f785882caaa2072ba1a5da0913f69
