TextEncodeZImageOmni - ComfyUI Built-in Node Documentation

The TextEncodeZImageOmni node is an advanced conditioning node that encodes a text prompt, together with optional reference images, into conditioning for image generation models. It accepts up to three images. When a vision encoder is connected, the images are encoded into embeddings; when a VAE is connected, they are encoded into reference latents. These visual references are combined with the text prompt using a specific template structure.

Inputs

Parameter	Description	Data Type	Required
`clip`	The CLIP model used for tokenizing and encoding the text prompt.	CLIP	Yes
`image_encoder`	An optional vision encoder model. If provided, it will be used to encode the input images, and the resulting embeddings will be added to the conditioning.	CLIPVision	No
`prompt`	The text prompt to be encoded. This field supports multiline input and dynamic prompts.	STRING	Yes
`auto_resize_images`	When enabled (default: True), input images will be automatically resized based on their pixel area before being passed to the VAE for encoding.	BOOLEAN	No
`vae`	An optional VAE model. If provided, it will be used to encode the input images into latent representations, which are added to the conditioning as reference latents.	VAE	No
`image1`	The first optional reference image.	IMAGE	No
`image2`	The second optional reference image.	IMAGE	No
`image3`	The third optional reference image.	IMAGE	No

Note: The node accepts at most three images (`image1`, `image2`, `image3`). The `image_encoder` and `vae` inputs are used only if at least one image is provided. When `auto_resize_images` is True and a `vae` is connected, images are resized so that their total pixel area is close to 1024×1024 (1,048,576 pixels) before VAE encoding.
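The area-based resize described in the note above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name `target_size`, the preservation of aspect ratio, and the rounding of each side to a multiple of 8 are assumptions made for the example.

```python
import math


def target_size(width: int, height: int,
                target_area: int = 1024 * 1024,
                multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) scaled so the pixel area is close to target_area.

    Hypothetical helper: the aspect ratio is kept, and each side is rounded
    to the nearest multiple of `multiple` (an assumption, not taken from the
    node's source).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Uniform scale factor that maps the current area onto the target area.
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


print(target_size(2048, 2048))  # (1024, 1024)
print(target_size(1920, 1080))  # (1368, 768)
```

Under these assumptions, a 1920×1080 image becomes 1368×768, which is 1,050,624 pixels and therefore close to the 1,048,576-pixel target while keeping roughly the same aspect ratio.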

Outputs

Output Name	Description	Data Type
`CONDITIONING`	The final conditioning output, which contains the encoded text prompt and may include encoded image embeddings and/or reference latents if images were provided.	CONDITIONING
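To show how the inputs and output listed above fit together, here is a sketch that builds an entry for this node in ComfyUI's API-format workflow JSON. It assumes the node's `class_type` matches its name, `TextEncodeZImageOmni`. The helper `build_node` and the upstream node IDs ("1" through "4") are hypothetical and exist only for illustration. Optional inputs that are not connected are omitted from the entry.

```python
import json


def build_node(prompt, clip_ref, image_encoder_ref=None, vae_ref=None,
               image_refs=(), auto_resize_images=True):
    """Build one API-format node entry (hypothetical helper).

    Each *_ref is a [node_id, output_index] link to an upstream node.
    """
    if len(image_refs) > 3:
        raise ValueError("the node accepts at most three images")
    inputs = {
        "clip": clip_ref,
        "prompt": prompt,
        "auto_resize_images": auto_resize_images,
    }
    # Optional inputs are added only when they are connected.
    if image_encoder_ref is not None:
        inputs["image_encoder"] = image_encoder_ref
    if vae_ref is not None:
        inputs["vae"] = vae_ref
    for index, ref in enumerate(image_refs, start=1):
        inputs[f"image{index}"] = ref  # image1, image2, image3
    # class_type is assumed to equal the node name.
    return {"class_type": "TextEncodeZImageOmni", "inputs": inputs}


# Node IDs below are placeholders for upstream loader nodes.
node = build_node(
    prompt="a watercolor portrait in the style of the reference image",
    clip_ref=["1", 0],
    vae_ref=["2", 0],
    image_refs=[["3", 0], ["4", 0]],
)
print(json.dumps(node, indent=2))
```

The node's `CONDITIONING` output would then be referenced by downstream nodes with the same `[node_id, output_index]` link convention.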

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
