The StableZero123_Conditioning node processes an input image and camera angles to generate conditioning data and latent representations for 3D model generation. It uses a CLIP vision model to encode the image features, combines them with camera embedding information based on elevation and azimuth angles, and produces positive and negative conditioning along with a latent representation for downstream 3D generation tasks.
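To make the camera embedding step concrete, the sketch below shows one plausible way to turn the two angles into a small vector. It assumes the Zero123-style four-value layout (relative polar angle, sine and cosine of the azimuth, and a constant term); that layout and the helper name `camera_embedding` are assumptions for illustration and are not confirmed by this page.

```python
import math


def camera_embedding(elevation_deg: float, azimuth_deg: float) -> list[float]:
    """Return a 4-value camera embedding (assumed Zero123-style layout)."""
    azimuth_rad = math.radians(azimuth_deg)
    return [
        math.radians(-elevation_deg),  # polar offset relative to the input view
        math.sin(azimuth_rad),         # azimuth as sine ...
        math.cos(azimuth_rad),         # ... and cosine, so it wraps cleanly at +/-180
        math.radians(90.0),            # constant term (assumed fixed)
    ]


front = camera_embedding(0.0, 0.0)   # same view as the input image
side = camera_embedding(30.0, 90.0)  # raised 30 degrees, rotated 90 degrees
```

In the node itself, a vector of this kind would be combined with the CLIP vision image features to form the positive conditioning.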

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| clip_vision | The CLIP vision model used to encode image features | CLIP_VISION | Yes | - |
| init_image | The input image to be processed and encoded | IMAGE | Yes | - |
| vae | The VAE model used for encoding pixels to latent space | VAE | Yes | - |
| width | Output width for the latent representation (default: 256, must be divisible by 8) | INT | Yes | 16 to MAX_RESOLUTION |
| height | Output height for the latent representation (default: 256, must be divisible by 8) | INT | Yes | 16 to MAX_RESOLUTION |
| batch_size | Number of samples to generate in the batch (default: 1) | INT | Yes | 1 to 4096 |
| elevation | Camera elevation angle in degrees (default: 0.0) | FLOAT | Yes | -180.0 to 180.0 |
| azimuth | Camera azimuth angle in degrees (default: 0.0) | FLOAT | Yes | -180.0 to 180.0 |

Note: The width and height parameters must be divisible by 8 as the node automatically divides them by 8 to create the latent representation dimensions.
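The documented ranges and the divisibility rule can be checked before a workflow is queued. The following is a minimal sketch, not part of the node: `check_inputs` is a hypothetical helper, and the value used for `MAX_RESOLUTION` is an assumption that should be compared against the constant in your ComfyUI install.

```python
MAX_RESOLUTION = 16384  # assumed value; verify against your ComfyUI install


def check_inputs(width=256, height=256, batch_size=1, elevation=0.0, azimuth=0.0):
    """Validate values against the documented ranges.

    Returns (latent_width, latent_height), i.e. each dimension divided by 8.
    Raises ValueError on the first value that is out of range.
    """
    for name, value in (("width", width), ("height", height)):
        if not 16 <= value <= MAX_RESOLUTION:
            raise ValueError(f"{name}={value} is outside 16..{MAX_RESOLUTION}")
        if value % 8 != 0:
            raise ValueError(f"{name}={value} is not divisible by 8")
    if not 1 <= batch_size <= 4096:
        raise ValueError(f"batch_size={batch_size} is outside 1..4096")
    for name, value in (("elevation", elevation), ("azimuth", azimuth)):
        if not -180.0 <= value <= 180.0:
            raise ValueError(f"{name}={value} is outside -180.0..180.0")
    return width // 8, height // 8


print(check_inputs(width=512, height=256))  # prints (64, 32)
```

With the defaults of 256 by 256, the latent grid is 32 by 32.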

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| positive | Positive conditioning data combining image features and camera embeddings | CONDITIONING |
| negative | Negative conditioning data with zero-initialized features | CONDITIONING |
| latent | Latent representation with dimensions [batch_size, 4, height//8, width//8] | LATENT |

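As a rough illustration of how the three outputs might be consumed, the fragment below writes the node and a downstream sampler as a Python dict in ComfyUI's API workflow style, where a link is `[source_node_id, output_index]`. The node IDs, the upstream loader nodes `"1"` and `"2"`, their output indices, and all sampler settings are hypothetical; only the input names of this node and the order of its outputs come from the tables above.

```python
# Fragment only: nodes "1" (assumed checkpoint loader exposing model,
# clip_vision, vae as outputs 0, 1, 2) and "2" (assumed image loader)
# are not defined here.
workflow = {
    "10": {
        "class_type": "StableZero123_Conditioning",
        "inputs": {
            "clip_vision": ["1", 1],
            "init_image": ["2", 0],
            "vae": ["1", 2],
            "width": 256,
            "height": 256,
            "batch_size": 1,
            "elevation": 10.0,
            "azimuth": 45.0,
        },
    },
    "11": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["1", 0],
            "positive": ["10", 0],      # output 0: positive
            "negative": ["10", 1],      # output 1: negative
            "latent_image": ["10", 2],  # output 2: latent
            "seed": 0,                  # illustrative sampler settings
            "steps": 20,
            "cfg": 4.0,
            "sampler_name": "euler",
            "scheduler": "sgm_uniform",
            "denoise": 1.0,
        },
    },
}
```

The output indices follow the order in which the outputs are listed: positive, negative, then latent.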

Source fingerprint (SHA-256): 197b4efaf13837500f2c3aaf589facc384b3f0bbd026aaa75a7fee509bd0bc51