Inputs
| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| clip | The CLIP model used for tokenizing and encoding the text prompt. | CLIP | Yes | |
| image_encoder | An optional vision encoder model. If provided, it is used to encode the input images, and the resulting embeddings are added to the conditioning. | CLIPVision | No | |
| prompt | The text prompt to be encoded. This field supports multiline input and dynamic prompts. | STRING | Yes | |
| auto_resize_images | When enabled (default: True), input images are automatically resized based on their pixel area before being passed to the VAE for encoding. | BOOLEAN | No | |
| vae | An optional VAE model. If provided, it is used to encode the input images into latent representations, which are added to the conditioning as reference latents. | VAE | No | |
| image1 | The first optional reference image. | IMAGE | No | |
| image2 | The second optional reference image. | IMAGE | No | |
| image3 | The third optional reference image. | IMAGE | No | |

Up to three reference images can be connected (image1, image2, image3). The image_encoder and vae inputs are used only if at least one image is provided. When auto_resize_images is True and a vae is connected, images are resized to a total pixel area close to 1024x1024 before encoding.

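
The area-based resize described above can be illustrated with a small sketch. This is not the node's actual implementation: the function name `resize_to_area`, the aspect-ratio-preserving scale, and the rounding to a multiple of 8 are all assumptions made for illustration. It only shows one plausible way to pick dimensions whose product is close to 1024x1024.

```python
import math


def resize_to_area(width: int, height: int,
                   target_area: int = 1024 * 1024,
                   multiple: int = 8) -> tuple[int, int]:
    """Return (new_width, new_height) with roughly target_area pixels.

    Hypothetical helper: keeps the aspect ratio and snaps each side to a
    multiple of `multiple` (an assumed constraint, not taken from the node).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Uniform scale factor that maps the current area onto the target area.
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


if __name__ == "__main__":
    print(resize_to_area(2048, 2048))
    print(resize_to_area(1920, 1080))
```

Because the scale factor is the square root of the area ratio, both sides shrink or grow by the same proportion, so the image is not distorted beyond the small rounding applied to each side.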
Outputs
| Output Name | Description | Data Type |
|---|---|---|
| CONDITIONING | The final conditioning output, which contains the encoded text prompt and may include encoded image embeddings and/or reference latents if images were provided. | CONDITIONING |
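
As a rough illustration of what the output may contain, the sketch below models only the decision logic stated in this page: the text prompt is always encoded, while image embeddings and reference latents are added only when at least one image is connected together with the matching optional model. The function name `conditioning_parts` and the string labels are hypothetical and do not correspond to any real ComfyUI API.

```python
def conditioning_parts(num_images: int,
                       has_image_encoder: bool,
                       has_vae: bool) -> list[str]:
    """List which components would end up in the conditioning (illustrative only)."""
    parts = ["text"]  # the prompt is always encoded via clip
    if num_images > 0:
        if has_image_encoder:
            # image_encoder is only used when at least one image is present
            parts.append("image_embeddings")
        if has_vae:
            # vae is only used when at least one image is present
            parts.append("reference_latents")
    return parts


if __name__ == "__main__":
    print(conditioning_parts(0, True, True))
    print(conditioning_parts(2, True, True))
```

With no images connected, the optional models have no effect and the result carries only the encoded text.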