The CLIPTextEncodeFlux node encodes text prompts into Flux-compatible conditioning embeddings.
CLIPTextEncodeFlux is an advanced text encoding node in ComfyUI, designed specifically for the Flux architecture. It uses two encoders (CLIP-L and T5XXL) to process both structured keywords and detailed natural language descriptions, giving the Flux model a more accurate and complete reading of the prompt for text-to-image generation.
This node is based on a dual-encoder mechanism:

- The clip_l input is processed by the CLIP-L encoder, which extracts style, theme, and other keyword features; it is best suited to concise descriptions.
- The t5xxl input is processed by the T5XXL encoder, which excels at understanding complex, detailed natural language scene descriptions.
- The outputs of both encoders are combined with the guidance parameter to generate unified conditioning embeddings (CONDITIONING) for downstream Flux sampler nodes, controlling how closely the generated content matches the text description.

Parameter | Data Type | Input Method | Default | Range | Description |
---|---|---|---|---|---|
clip | CLIP | Node input | None | - | Must be a CLIP model supporting the Flux architecture, including both CLIP-L and T5XXL encoders |
clip_l | STRING | Text box | None | Up to 77 tokens | Suitable for concise keyword descriptions, such as style or theme |
t5xxl | STRING | Text box | None | Nearly unlimited | Suitable for detailed natural language descriptions, expressing complex scenes and details |
guidance | FLOAT | Slider | 3.5 | 0.0 - 100.0 | Controls the influence of text conditions on the generation process; higher values mean stricter adherence to the text |
Output Name | Data Type | Description |
---|---|---|
CONDITIONING | CONDITIONING | Contains the fused embeddings from both encoders and the guidance parameter, used for conditional image generation |
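
The data flow behind the CONDITIONING output can be sketched with stand-in encoders. This is a toy illustration, not ComfyUI's implementation: `fake_encode`, `encode_flux`, and the embedding sizes are hypothetical, and the list-of-`[embedding, options]` layout with a `pooled_output` key is an assumption about how the conditioning is packaged. The only details taken from this page are that the output carries information from both prompts together with the guidance value, and that guidance ranges from 0.0 to 100.0.

```python
from typing import Any


def fake_encode(text: str, dim: int) -> list[float]:
    """Stand-in encoder: a deterministic pseudo-embedding, NOT a real model."""
    values = [0.0] * dim
    for i, ch in enumerate(text):
        values[i % dim] += ord(ch) / 1000.0
    return values


def encode_flux(clip_l: str, t5xxl: str, guidance: float = 3.5) -> list[list[Any]]:
    """Bundle both (fake) encodings and the guidance value into one structure."""
    if not 0.0 <= guidance <= 100.0:
        raise ValueError("guidance must be within 0.0 - 100.0")
    t5_embedding = fake_encode(t5xxl, dim=8)   # placeholder for the T5XXL output
    clip_pooled = fake_encode(clip_l, dim=4)   # placeholder for the CLIP-L output
    # Assumed layout: one [embedding, options] pair per prompt.
    return [[t5_embedding, {"pooled_output": clip_pooled, "guidance": guidance}]]


conditioning = encode_flux(
    "portrait, oil painting",
    "A detailed portrait in oil painting style.",
)
embedding, options = conditioning[0]
print(len(embedding), len(options["pooled_output"]), options["guidance"])
# prints: 8 4 3.5
```

In a real workflow the node performs this step itself; the sketch only shows why a single output can carry both prompts and the guidance setting to the sampler.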
clip_l input (keyword style):
masterpiece, best quality, portrait, oil painting, dramatic lighting
t5xxl input (natural language description):
A highly detailed portrait in oil painting style, featuring dramatic chiaroscuro lighting that creates deep shadows and bright highlights, emphasizing the subject's features with renaissance-inspired composition.
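
The two example prompts above could be wired into the node when driving ComfyUI from a script. The sketch below assumes the API-format workflow layout, in which each node is keyed by an id and has a `class_type` plus an `inputs` mapping, with links written as `[source_node_id, output_index]`. The helper `flux_encode_node` and the node ids `"6"` and `"11"` are hypothetical placeholders; substitute the ids from your own exported workflow.

```python
import json


def flux_encode_node(clip_source, clip_l, t5xxl, guidance=3.5):
    """Return one API-format node entry for CLIPTextEncodeFlux (hypothetical helper)."""
    if not 0.0 <= guidance <= 100.0:
        raise ValueError("guidance must be within 0.0 - 100.0")
    return {
        "class_type": "CLIPTextEncodeFlux",
        "inputs": {
            "clip": list(clip_source),  # link: [source_node_id, output_index]
            "clip_l": clip_l,           # concise keywords for CLIP-L
            "t5xxl": t5xxl,             # natural language description for T5XXL
            "guidance": guidance,
        },
    }


workflow_fragment = {
    # "11" stands in for whichever node loads the Flux-compatible CLIP model.
    "6": flux_encode_node(
        ("11", 0),
        "masterpiece, best quality, portrait, oil painting, dramatic lighting",
        "A highly detailed portrait in oil painting style, featuring dramatic "
        "chiaroscuro lighting that creates deep shadows and bright highlights, "
        "emphasizing the subject's features with renaissance-inspired composition.",
    ),
}

print(json.dumps(workflow_fragment, indent=2))
```

The fragment is only one node; a complete workflow also needs the model loader, sampler, and decode nodes that consume the CONDITIONING output.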
- Fill in both clip_l and t5xxl to leverage the dual-encoder advantage.
- Keep the clip_l prompt within its 77-token limit.
- Adjust the guidance parameter based on the generated results.
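
One way to tune guidance is to hold both prompts fixed and compare results across a handful of values. The sketch below only prepares the input sets for such a comparison; `guidance_sweep` is a hypothetical helper, and the chosen values are arbitrary starting points around the 3.5 default rather than recommendations.

```python
def guidance_sweep(clip_l, t5xxl, values):
    """Build one input mapping per guidance value, keeping the prompts fixed."""
    variants = []
    for guidance in values:
        if not 0.0 <= guidance <= 100.0:
            raise ValueError(f"guidance {guidance} is outside 0.0 - 100.0")
        variants.append({"clip_l": clip_l, "t5xxl": t5xxl, "guidance": guidance})
    return variants


variants = guidance_sweep(
    "portrait, oil painting, dramatic lighting",
    "A detailed portrait in oil painting style with dramatic lighting.",
    [2.0, 3.5, 5.0],
)
for variant in variants:
    print(variant["guidance"])
```

Keeping the sampler seed constant across the runs makes it easier to attribute differences between images to the guidance value alone.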