# TextEncodeZImageOmni - ComfyUI Built-in Node Documentation

> Complete documentation for the TextEncodeZImageOmni node in ComfyUI. Learn its inputs, outputs, parameters and usage.

> This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The TextEncodeZImageOmni node is an advanced conditioning node that encodes a text prompt, together with up to three optional reference images, into conditioning for image generation models. Each provided image can be encoded with a vision encoder, which produces image embeddings, and/or with a VAE, which produces reference latents. The node then combines these visual references with the text prompt using its internal prompt template.

## Inputs

| Parameter            | Data Type  | Required | Range | Description                                                                                                                                                           |
| -------------------- | ---------- | -------- | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `clip`               | CLIP       | Yes      |       | The CLIP model used for tokenizing and encoding the text prompt.                                                                                                      |
| `image_encoder`      | CLIPVision | No       |       | An optional vision encoder model. If provided, it encodes the input images, and the resulting image embeddings are added to the conditioning.                        |
| `prompt`             | STRING     | Yes      |       | The text prompt to be encoded. This field supports multiline input and dynamic prompts.                                                                               |
| `auto_resize_images` | BOOLEAN    | No       |       | When enabled (default: True), input images are automatically resized to a total pixel area of about 1024x1024 before being passed to the VAE for encoding.          |
| `vae`                | VAE        | No       |       | An optional VAE model. If provided, it will be used to encode the input images into latent representations, which are added to the conditioning as reference latents. |
| `image1`             | IMAGE      | No       |       | The first optional reference image.                                                                                                                                   |
| `image2`             | IMAGE      | No       |       | The second optional reference image.                                                                                                                                  |
| `image3`             | IMAGE      | No       |       | The third optional reference image.                                                                                                                                   |

**Note:** The node accepts at most three images (`image1`, `image2`, `image3`). The `image_encoder` and `vae` inputs are used only when at least one image is provided. When `auto_resize_images` is True and a `vae` is connected, each image is resized so that its total pixel area is close to 1024x1024 before VAE encoding.
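The auto-resize step can be understood as an aspect-preserving rescale toward a fixed pixel budget. The sketch below shows that arithmetic under stated assumptions: the helper name `auto_resize_dims` is hypothetical, and rounding each side to a multiple of 8 is an assumption chosen for illustration. The node's actual rounding and resampling may differ.

```python
import math

TARGET_AREA = 1024 * 1024  # total pixel area the note above describes
MULTIPLE = 8  # assumption: rounding granularity; the real node may use another value


def auto_resize_dims(width: int, height: int,
                     target_area: int = TARGET_AREA,
                     multiple: int = MULTIPLE) -> tuple[int, int]:
    """Hypothetical helper: scale (width, height) so that width * height is
    roughly target_area while keeping the aspect ratio, then round each side
    to the nearest multiple of `multiple` (never below one multiple)."""
    # A uniform scale factor s changes the area by s**2, so s = sqrt(target / area).
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


print(auto_resize_dims(512, 512))    # small square image is upscaled
print(auto_resize_dims(1920, 1080))  # large widescreen image is downscaled
```

Because the scale factor depends only on the area, small images are upscaled and large images are downscaled toward roughly one megapixel, while the aspect ratio is approximately preserved after rounding.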

## Outputs

| Output Name    | Data Type    | Description                                                                                                                                                      |
| -------------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CONDITIONING` | CONDITIONING | The final conditioning output, which contains the encoded text prompt and may include encoded image embeddings and/or reference latents if images were provided. |
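The rules for what ends up in `CONDITIONING` can be summarized as a small data-flow sketch. Everything in it is hypothetical: `ConditioningSketch` and `encode_omni_sketch` are illustrative names, and the encoders are passed in as plain callables. It does not use ComfyUI's real conditioning format, and it ignores the node's prompt template. It only models which pieces are included depending on which optional inputs are connected.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ConditioningSketch:
    """Hypothetical stand-in for the node's CONDITIONING output."""
    text_embedding: Any
    image_embeddings: List[Any] = field(default_factory=list)   # from image_encoder
    reference_latents: List[Any] = field(default_factory=list)  # from vae


def encode_omni_sketch(
    encode_text: Callable[[str], Any],
    prompt: str,
    images: List[Any],  # image1..image3 slots; None means "not connected"
    encode_vision: Optional[Callable[[Any], Any]] = None,  # stands in for image_encoder
    encode_vae: Optional[Callable[[Any], Any]] = None,     # stands in for vae
) -> ConditioningSketch:
    if len(images) > 3:
        raise ValueError("the node has only three image slots (image1..image3)")
    provided = [img for img in images if img is not None]

    # The text prompt is always encoded.
    cond = ConditioningSketch(text_embedding=encode_text(prompt))

    # The optional encoders only contribute when at least one image is provided.
    for img in provided:
        if encode_vision is not None:
            cond.image_embeddings.append(encode_vision(img))
        if encode_vae is not None:
            cond.reference_latents.append(encode_vae(img))
    return cond


# Usage: one image in the second slot, with only a VAE connected.
cond = encode_omni_sketch(lambda p: f"T({p})", "a cat", [None, "img2", None],
                          encode_vae=lambda i: f"L({i})")
print(cond)
```

The split into two lists mirrors the input table: the vision encoder contributes embeddings, and the VAE contributes reference latents. With no images connected, both lists stay empty and only the text encoding remains.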
