TextEncodeQwenImageEditPlus - ComfyUI Built-in Node Documentation

The TextEncodeQwenImageEditPlus node processes text prompts and optional images to generate conditioning data for image generation or editing tasks. It uses a specialized template to analyze input images and understand how text instructions should modify them, then encodes this information for use in subsequent generation steps. The node can handle up to three input images and optionally generate reference latents when a VAE is provided.

Inputs

Parameter	Description	Data Type	Required	Range
`clip`	The CLIP model used for tokenization and encoding	CLIP	Yes	-
`prompt`	Text instruction describing the desired image modification (supports multiline input and dynamic prompts)	STRING	Yes	-
`vae`	Optional VAE model for generating reference latents from input images	VAE	No	-
`image1`	First optional input image for analysis and modification	IMAGE	No	-
`image2`	Second optional input image for analysis and modification	IMAGE	No	-
`image3`	Third optional input image for analysis and modification	IMAGE	No	-

Note: The node processes up to three images in a single call. Each connected image is resized to 384x384 pixels for vision-language processing. When a VAE is also provided, the node generates a reference latent from every connected image; for VAE encoding, each image is resized to dimensions divisible by 8, with a target area of 1024x1024 pixels. Without a VAE, no reference latents are produced.
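The VAE resize rule in the note above can be illustrated with a small sketch. It assumes the aspect ratio is preserved and that each side is rounded to the nearest multiple of 8; neither detail is stated in this page, and the function name `vae_target_size` is hypothetical, not part of ComfyUI.

```python
import math


def vae_target_size(width: int, height: int,
                    target_area: int = 1024 * 1024,
                    multiple: int = 8) -> tuple[int, int]:
    """Return a (width, height) pair close to target_area, each divisible by `multiple`.

    Assumption: the aspect ratio is kept and each side is rounded to the
    nearest multiple; the node's actual rounding may differ.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Uniform scale factor that brings the pixel count to the target area.
    scale = math.sqrt(target_area / (width * height))
    new_width = round(width * scale / multiple) * multiple
    new_height = round(height * scale / multiple) * multiple
    return new_width, new_height


if __name__ == "__main__":
    for size in [(1024, 1024), (1920, 1080), (512, 512)]:
        print(size, "->", vae_target_size(*size))
```

Under these assumptions a square input always maps to 1024x1024, while a 16:9 input keeps roughly the same proportions at about one megapixel.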

Outputs

Output Name	Description	Data Type
`CONDITIONING`	Encoded conditioning data containing text tokens and optional reference latents for image generation	CONDITIONING
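As an illustration of how the inputs and the output fit together, the sketch below builds an entry for this node in ComfyUI's API-format workflow JSON, where each node has a `class_type` and an `inputs` mapping and links are written as `[node_id, output_index]`. The helper `qwen_edit_plus_node` and all node ids ("1", "2", "3", "10") are hypothetical placeholders; optional inputs are simply left out when not connected.

```python
import json


def qwen_edit_plus_node(prompt, clip_link, vae_link=None, image_links=()):
    """Build an API-format entry for the node (hypothetical helper).

    Links are [node_id, output_index] pairs. Optional inputs (vae,
    image1..image3) are omitted when they are not supplied.
    """
    if len(image_links) > 3:
        raise ValueError("the node accepts at most three images")
    inputs = {"clip": clip_link, "prompt": prompt}
    if vae_link is not None:
        inputs["vae"] = vae_link
    for index, link in enumerate(image_links, start=1):
        inputs[f"image{index}"] = link
    return {"class_type": "TextEncodeQwenImageEditPlus", "inputs": inputs}


# Assumed upstream nodes: "1" provides CLIP, "2" a VAE, "3" an image.
workflow_fragment = {
    "10": qwen_edit_plus_node(
        "Replace the sky with a sunset",
        clip_link=["1", 0],
        vae_link=["2", 0],
        image_links=[["3", 0]],
    )
}

# A downstream node would reference the CONDITIONING output as ["10", 0].
print(json.dumps(workflow_fragment, indent=2))
```

Leaving out `vae_link` yields a text-and-image conditioning without reference latents, matching the optional behavior described in the inputs table.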

