ClipTextEncodeSdxlRefiner - ComfyUI Built-in Node Documentation

This node is designed for the SDXL Refiner model. It converts a text prompt into conditioning information and attaches an aesthetic score and image dimensions to it, so the refiner receives quality and size targets alongside the prompt. It acts like an art director: it conveys your creative intent and also injects explicit aesthetic standards and size specifications into the work.

About SDXL Refiner

SDXL Refiner is a specialized refinement model that enhances the detail and quality of images produced by the SDXL base model. The process resembles the work of a photo retoucher:

First, it receives the preliminary image generated by the base model, together with a text prompt
Then, it guides the refinement process through precise aesthetic scoring and dimensional parameters
Finally, it focuses on processing high-frequency image details to improve overall quality

Refiner can be used in two ways:

As a standalone refinement step for post-processing images generated by the base model
As part of an ensemble-of-experts pipeline, taking over from the base model during the low-noise phase of generation
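
The second usage mode amounts to dividing the sampling steps between the two models. The sketch below shows one way to compute that division. The function name `split_steps` and its `refiner_fraction` parameter are hypothetical and not part of ComfyUI. The default of 0.2 follows the "approximately the last 20% of steps" guidance in the Notes section.

```python
from typing import Tuple


def split_steps(total_steps: int, refiner_fraction: float = 0.2) -> Tuple[int, int]:
    """Return (base_steps, refiner_steps) for a two-stage base + refiner run.

    Hypothetical helper: the base model would handle steps [0, base_steps)
    and the refiner would handle steps [base_steps, total_steps).
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if not 0.0 <= refiner_fraction <= 1.0:
        raise ValueError("refiner_fraction must be between 0 and 1")
    # Give the refiner its share of the low-noise tail; the base gets the rest.
    refiner_steps = round(total_steps * refiner_fraction)
    return total_steps - refiner_steps, refiner_steps


base_steps, refiner_steps = split_steps(25)
print(f"base: steps 0-{base_steps}, refiner: steps {base_steps}-{base_steps + refiner_steps}")
```

In a graph, the resulting boundary would usually be entered as the step at which the base sampler stops and the refiner sampler starts. How that maps onto specific sampler inputs depends on the sampler node used.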

Inputs

Parameter Name	Data Type	Input Type	Default Value	Value Range	Description
`clip`	CLIP	Required	-	-	CLIP model instance used for text tokenization and encoding, the core component for converting text into model-understandable format
`ascore`	FLOAT	Optional	6.0	0.0-1000.0	Controls the visual quality and aesthetics of generated images, similar to setting quality standards for artwork. High scores (7.5-8.5) pursue more refined, detail-rich results; medium scores (6.0-7.0) give balanced quality control; low scores (2.0-3.0) are suitable for negative prompts
`width`	INT	Required	1024	64-16384	Image width in pixels; must be a multiple of 8. SDXL performs best when the total pixel count is close to 1024×1024 (about 1M pixels)
`height`	INT	Required	1024	64-16384	Image height in pixels; must be a multiple of 8. SDXL performs best when the total pixel count is close to 1024×1024 (about 1M pixels)
`text`	STRING	Required	-	-	Text prompt; supports multi-line input and dynamic prompt syntax. For the Refiner, prompts should focus on the desired visual quality and detail characteristics
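
The `ascore` guidance in the table suggests pairing a higher score on the positive prompt with a low score on the negative prompt. The sketch below collects such a pair of settings and checks them against the documented 0.0-1000.0 range. `RefinerPromptSettings` and `make_prompt_pair` are hypothetical names used for illustration only. The negative default of 2.5 is simply a value picked from the 2.0-3.0 range listed above.

```python
from dataclasses import dataclass
from typing import Tuple

# Value range quoted in the Inputs table for `ascore`.
ASCORE_MIN, ASCORE_MAX = 0.0, 1000.0


@dataclass(frozen=True)
class RefinerPromptSettings:
    """Hypothetical container mirroring the node's inputs (minus `clip`)."""
    text: str
    ascore: float
    width: int = 1024
    height: int = 1024


def make_prompt_pair(
    positive_text: str,
    negative_text: str,
    positive_ascore: float = 6.0,   # node default from the table
    negative_ascore: float = 2.5,   # assumed value inside the 2.0-3.0 "low" band
) -> Tuple[RefinerPromptSettings, RefinerPromptSettings]:
    """Build settings for one positive and one negative refiner prompt."""
    for score in (positive_ascore, negative_ascore):
        if not ASCORE_MIN <= score <= ASCORE_MAX:
            raise ValueError(f"ascore {score} is outside {ASCORE_MIN}-{ASCORE_MAX}")
    return (
        RefinerPromptSettings(positive_text, positive_ascore),
        RefinerPromptSettings(negative_text, negative_ascore),
    )


positive, negative = make_prompt_pair("sharp focus, fine skin texture", "blurry, noisy")
print(positive.ascore, negative.ascore)
```

Each settings object would correspond to one instance of this node in a graph: one feeding the sampler's positive conditioning and one feeding its negative conditioning.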

Outputs

Output Name	Data Type	Description
`CONDITIONING`	CONDITIONING	Conditioning output that combines the encoded text with the aesthetic score and dimensional information, used to guide the SDXL Refiner model during image refinement
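
To make the shape of this output more concrete, the sketch below builds a conditioning-like structure: an encoded prompt paired with a dictionary carrying the aesthetic score and dimensions. It is only an illustration under stated assumptions. `StubClip` is a hypothetical stand-in rather than ComfyUI's CLIP object, and the dictionary keys and list layout are assumed for the example, not taken from this page.

```python
from typing import Any, Dict, List, Tuple


class StubClip:
    """Hypothetical stand-in for a CLIP object; not ComfyUI's real class."""

    def tokenize(self, text: str) -> List[str]:
        # Real tokenizers produce token ids; whitespace splitting is a placeholder.
        return text.split()

    def encode(self, tokens: List[str]) -> List[float]:
        # Placeholder "embedding": one number per token (its length).
        return [float(len(token)) for token in tokens]


def encode_refiner_prompt(
    clip: StubClip,
    text: str,
    ascore: float = 6.0,
    width: int = 1024,
    height: int = 1024,
) -> List[Tuple[List[float], Dict[str, Any]]]:
    """Pair the encoded text with the refiner-specific extras."""
    tokens = clip.tokenize(text)
    embedding = clip.encode(tokens)
    # Key names below are assumptions made for this sketch.
    extras = {"aesthetic_score": ascore, "width": width, "height": height}
    return [(embedding, extras)]


conditioning = encode_refiner_prompt(StubClip(), "sharp detailed skin", ascore=7.5)
print(conditioning[0][1])
```

The point of the sketch is that the text encoding and the numeric settings travel together in a single output, which is why this node takes `ascore`, `width`, and `height` in addition to `text`.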

Notes

This node is specifically optimized for the SDXL Refiner model and differs from regular CLIPTextEncode nodes
An aesthetic score of 7.5 is recommended as the baseline for positive prompts, described as the standard setting used in SDXL training; note that the node's default value is 6.0
Width and height must both be multiples of 8, and a total pixel count close to 1024×1024 (about 1M pixels) is recommended
The Refiner model focuses on enhancing image details and quality, so text prompts should emphasize desired visual effects rather than scene content
In practical use, Refiner is typically used in the later stages of generation (approximately the last 20% of steps), focusing on detail optimization
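
The dimension rules in these notes can be checked before values are entered into the node. The sketch below rounds a value down to a multiple of 8, clamps it to the 64-16384 range from the Inputs table, and reports how close a size is to the recommended pixel count. Both helper names are hypothetical, and rounding down rather than to the nearest multiple is a choice made for this example.

```python
MIN_SIZE, MAX_SIZE = 64, 16384      # value range from the Inputs table
TARGET_PIXELS = 1024 * 1024         # recommended total pixel count (about 1M)


def snap_to_multiple_of_8(value: int) -> int:
    """Round down to a multiple of 8, then clamp to the documented range."""
    snapped = (value // 8) * 8
    return max(MIN_SIZE, min(MAX_SIZE, snapped))


def pixel_budget_ratio(width: int, height: int) -> float:
    """Return the pixel count relative to 1024x1024 (1.0 means an exact match)."""
    return (width * height) / TARGET_PIXELS


width, height = snap_to_multiple_of_8(1155), snap_to_multiple_of_8(900)
print(width, height, pixel_budget_ratio(width, height))
```

A ratio near 1.0 indicates the size sits close to the recommended budget; non-square sizes such as 1152×896 stay near that budget while changing the aspect ratio.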
