StableZero123_Conditioning - ComfyUI Built-in Node Documentation

The StableZero123_Conditioning node processes an input image and camera angles to generate conditioning data and latent representations for 3D model generation. It uses a CLIP vision model to encode the image features, combines them with camera embedding information based on elevation and azimuth angles, and produces positive and negative conditioning along with a latent representation for downstream 3D generation tasks.
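The way camera angles might be turned into an embedding and joined to the image features can be sketched as follows. This is a minimal illustration, not the node's confirmed implementation: the four-value layout (negated elevation in radians, sine and cosine of the azimuth, and a constant 90-degree term) is an assumption based on the Zero123-style convention, and the helper names `camera_embedding` and `combine` are hypothetical.

```python
import math


def camera_embedding(elevation_deg: float, azimuth_deg: float) -> list[float]:
    """Illustrative camera embedding; the four-value layout is an assumption."""
    azimuth = math.radians(azimuth_deg)
    return [
        math.radians(-elevation_deg),  # elevation offset from the reference view, in radians
        math.sin(azimuth),             # azimuth encoded as sine ...
        math.cos(azimuth),             # ... and cosine, so the value wraps cleanly at +/-180
        math.radians(90.0),            # constant term (assumed)
    ]


def combine(image_features: list[float], elevation_deg: float, azimuth_deg: float) -> list[float]:
    """Append the camera embedding to a pooled image feature vector."""
    return list(image_features) + camera_embedding(elevation_deg, azimuth_deg)


# Hypothetical 8-value feature vector standing in for the CLIP vision output.
features = [0.0] * 8
cond = combine(features, elevation_deg=30.0, azimuth_deg=45.0)
```

Encoding the azimuth as a sine and cosine pair keeps the embedding continuous across the -180/180 boundary, which a raw angle would not.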

Inputs

Parameter	Description	Data Type	Required	Range
`clip_vision`	The CLIP vision model used to encode image features	CLIP_VISION	Yes	-
`init_image`	The input image to be processed and encoded	IMAGE	Yes	-
`vae`	The VAE model used for encoding pixels to latent space	VAE	Yes	-
`width`	Target width in pixels; the latent is `width // 8` wide (default: 256, must be divisible by 8)	INT	Yes	16 to MAX_RESOLUTION
`height`	Target height in pixels; the latent is `height // 8` tall (default: 256, must be divisible by 8)	INT	Yes	16 to MAX_RESOLUTION
`batch_size`	Number of samples to generate in the batch (default: 1)	INT	Yes	1 to 4096
`elevation`	Camera elevation angle in degrees (default: 0.0)	FLOAT	Yes	-180.0 to 180.0
`azimuth`	Camera azimuth angle in degrees (default: 0.0)	FLOAT	Yes	-180.0 to 180.0

Note: The `width` and `height` parameters must be divisible by 8 because the node divides each by 8 to obtain the dimensions of the latent representation.
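The relationship between the pixel dimensions and the latent dimensions can be sketched as below. The helper name `latent_shape` is hypothetical, and the explicit divisibility check is added for illustration; it only mirrors the shape `[batch_size, 4, height//8, width//8]` documented for the `latent` output.

```python
def latent_shape(batch_size: int, width: int, height: int) -> tuple[int, int, int, int]:
    """Return the latent shape for the given pixel dimensions."""
    if width % 8 or height % 8:
        raise ValueError("width and height must be divisible by 8")
    # 4 latent channels; each spatial dimension is one eighth of the pixel size.
    return (batch_size, 4, height // 8, width // 8)


default_shape = latent_shape(batch_size=1, width=256, height=256)
```

With the default 256 x 256 settings and a batch size of 1, this gives a latent of shape (1, 4, 32, 32).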

Outputs

Output Name	Description	Data Type
`positive`	Positive conditioning data combining image features and camera embeddings	CONDITIONING
`negative`	Negative conditioning data with zero-initialized features	CONDITIONING
`latent`	Latent representation with dimensions [batch_size, 4, height//8, width//8]	LATENT



