Inputs
| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| clip_vision | The CLIP vision model used to encode image features | CLIP_VISION | Yes | - |
| init_image | The input image to be processed and encoded | IMAGE | Yes | - |
| vae | The VAE model used for encoding pixels to latent space | VAE | Yes | - |
| width | Output width for the latent representation (default: 256, must be divisible by 8) | INT | Yes | 16 to MAX_RESOLUTION |
| height | Output height for the latent representation (default: 256, must be divisible by 8) | INT | Yes | 16 to MAX_RESOLUTION |
| batch_size | Number of samples to generate in the batch (default: 1) | INT | Yes | 1 to 4096 |
| elevation | Camera elevation angle in degrees (default: 0.0) | FLOAT | Yes | -180.0 to 180.0 |
| azimuth | Camera azimuth angle in degrees (default: 0.0) | FLOAT | Yes | -180.0 to 180.0 |

Note: The width and height values must be divisible by 8, because the node divides each of them by 8 to compute the dimensions of the latent representation.
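
The divisibility rule can be illustrated with a small standalone sketch. It is not the node's actual implementation: `latent_shape` is a hypothetical helper written for this page, and it only mirrors the documented behaviour (minimum of 16, divisibility by 8, and a latent of shape `[batch_size, 4, height//8, width//8]`). The upper bound `MAX_RESOLUTION` is not checked because its value is not stated here.

```python
def latent_shape(width, height, batch_size=1):
    """Hypothetical helper: compute the documented latent shape.

    Assumes the rules listed in the Inputs table: width and height are
    at least 16 and divisible by 8, and batch_size is at least 1.
    """
    for name, value in (("width", width), ("height", height)):
        if value < 16:
            raise ValueError(f"{name} must be at least 16, got {value}")
        if value % 8 != 0:
            raise ValueError(f"{name} must be divisible by 8, got {value}")
    if batch_size < 1:
        raise ValueError(f"batch_size must be at least 1, got {batch_size}")
    # Each spatial dimension shrinks by a factor of 8; 4 is the channel count.
    return [batch_size, 4, height // 8, width // 8]


if __name__ == "__main__":
    print(latent_shape(256, 256))     # default size -> [1, 4, 32, 32]
    print(latent_shape(512, 256, 2))  # width 512, height 256 -> [2, 4, 32, 64]
```

With the default 256 x 256 input, each spatial dimension becomes 32. A value such as 250 is rejected by this sketch rather than silently truncated.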
Outputs
| Output Name | Description | Data Type |
|---|---|---|
| positive | Positive conditioning data combining image features and camera embeddings | CONDITIONING |
| negative | Negative conditioning data with zero-initialized features | CONDITIONING |
| latent | Latent representation with dimensions [batch_size, 4, height//8, width//8] | LATENT |