This node encodes text prompts with a CLIP model tailored to the SDXL architecture. It uses SDXL's dual text encoders (CLIP-G and CLIP-L) to process a global and a local prompt separately, producing the conditioning that guides image generation.
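
Below is a minimal sketch of the dual-encoder flow, assuming ComfyUI's CLIP wrapper (its `tokenize` and `encode_from_tokens` methods); the parameter names mirror the node inputs listed further down and are not part of any official API contract.

```python
def encode_sdxl_prompts(clip, text_g, text_l):
    # Tokenize the global prompt; the SDXL tokenizer returns token batches keyed
    # by encoder branch ("g" for CLIP-G, "l" for CLIP-L).
    tokens = clip.tokenize(text_g)
    tokens["l"] = clip.tokenize(text_l)["l"]  # swap in the local prompt on the CLIP-L branch

    # Pad the shorter branch with empty-prompt tokens so both branches have the
    # same number of token batches before joint encoding.
    if len(tokens["l"]) != len(tokens["g"]):
        empty = clip.tokenize("")
        while len(tokens["l"]) < len(tokens["g"]):
            tokens["l"] += empty["l"]
        while len(tokens["l"]) > len(tokens["g"]):
            tokens["g"] += empty["g"]

    # Joint encoding returns the per-token embeddings plus the pooled CLIP-G
    # embedding that SDXL also consumes.
    cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
    return cond, pooled
```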

Inputs

| Parameter | Data Type | Description |
| --- | --- | --- |
| clip | CLIP | CLIP model instance used for text encoding. |
| width | INT | Width of the original image in pixels, used for SDXL size conditioning; default 1024. |
| height | INT | Height of the original image in pixels, used for SDXL size conditioning; default 1024. |
| crop_w | INT | Horizontal crop offset in pixels, used for SDXL crop conditioning; default 0. |
| crop_h | INT | Vertical crop offset in pixels, used for SDXL crop conditioning; default 0. |
| target_width | INT | Target width of the generated image in pixels; default 1024. |
| target_height | INT | Target height of the generated image in pixels; default 1024. |
| text_g | STRING | Global text prompt describing the overall scene, encoded by CLIP-G. |
| text_l | STRING | Local text prompt describing fine details, encoded by CLIP-L. |

Outputs

| Parameter | Data Type | Description |
| --- | --- | --- |
| CONDITIONING | CONDITIONING | The encoded text embeddings and conditioning metadata needed for image generation. |
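
For reference, the sketch below shows how the encoded embeddings and the size/crop inputs are typically packed into the CONDITIONING output; the list-of-pairs layout and dictionary keys follow common ComfyUI conventions and are an assumption that may vary between versions.

```python
def build_conditioning(cond, pooled, width, height, crop_w, crop_h,
                       target_width, target_height):
    # CONDITIONING is a list of [embedding, metadata] pairs; the metadata carries
    # the pooled CLIP-G embedding and the SDXL size/crop conditioning values.
    return [[cond, {
        "pooled_output": pooled,       # pooled CLIP-G embedding
        "width": width,                # original-size conditioning
        "height": height,
        "crop_w": crop_w,              # crop-coordinate conditioning
        "crop_h": crop_h,
        "target_width": target_width,  # target-size conditioning
        "target_height": target_height,
    }]]
```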