The CLIPTextEncodeHunyuanDiT node converts text prompts into conditioning that the HunyuanDiT model can use. It is an advanced conditioning node designed specifically for HunyuanDiT's dual text encoder architecture, acting like a translator that turns text descriptions into the “machine language” the model understands. It takes two prompts, bert and mt5xl, and each one prefers a different prompt style: bert works best with phrases and keywords, while mt5xl can take complete sentences and complex descriptions.

Inputs

| Parameter | Data Type | Description |
| --- | --- | --- |
| clip | CLIP | A CLIP model instance used for text tokenization and encoding, which is core to generating conditions. |
| bert | STRING | Text input for encoding; prefers phrases and keywords; supports multiline and dynamic prompts. |
| mt5xl | STRING | Another text input for encoding (multilingual); supports multiline and dynamic prompts; can use complete sentences and complex descriptions. |
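
As an illustration of how these three inputs might be supplied, here is a minimal sketch that builds one node entry in ComfyUI's API-format workflow JSON. The helper name `hunyuan_text_encode_node`, the node ids `"1"` and `"2"`, the output index `1`, and the example prompts are all assumptions made for this example; check which node and output slot actually provide the CLIP model in your own workflow.

```python
import json


def hunyuan_text_encode_node(clip_link, bert_prompt, mt5xl_prompt):
    """Build one API-format workflow entry for CLIPTextEncodeHunyuanDiT.

    Hypothetical helper for illustration; it is not part of ComfyUI.
    """
    return {
        "class_type": "CLIPTextEncodeHunyuanDiT",
        "inputs": {
            # A link to another node's output: [source node id, output index].
            "clip": list(clip_link),
            # Short phrases and keywords.
            "bert": bert_prompt,
            # Complete sentences and richer descriptions.
            "mt5xl": mt5xl_prompt,
        },
    }


# Assumption: node "1" is a loader whose output index 1 is the CLIP model.
node = hunyuan_text_encode_node(
    ("1", 1),
    "red fox, snow, forest, golden hour",
    "A red fox walks through a snowy forest while the low evening sun lights its fur.",
)

# Print the fragment as it would appear inside a larger workflow document.
print(json.dumps({"2": node}, indent=2))
```

The same subject is described twice on purpose: once as keywords for bert and once as a full sentence for mt5xl, matching the preferences listed in the table.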

Outputs

| Parameter | Data Type | Description |
| --- | --- | --- |
| CONDITIONING | CONDITIONING | The encoded conditional output used for further processing in generation tasks. |
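
To make the dual-encoder idea concrete, the sketch below shows one plausible way two prompts could be merged into a single conditioning result. It is not ComfyUI's actual code: `StubClip` is a hypothetical stand-in that splits text on whitespace and reports token counts instead of producing tensors, and `encode_dual_prompt` is an illustrative name.

```python
class StubClip:
    """Hypothetical stand-in for a dual-encoder CLIP object (not a ComfyUI class)."""

    def tokenize(self, text):
        # Toy tokenizer: whitespace split, one token list per encoder.
        words = text.split()
        return {"bert": list(words), "mt5xl": list(words)}

    def encode(self, tokens):
        # A real encoder would return embeddings; this stub returns token counts.
        return {name: len(toks) for name, toks in tokens.items()}


def encode_dual_prompt(clip, bert, mt5xl):
    """Tokenize each prompt, keep the matching encoder's tokens, encode once."""
    tokens = clip.tokenize(bert)
    # Replace the mt5xl slot with tokens made from the mt5xl prompt.
    tokens["mt5xl"] = clip.tokenize(mt5xl)["mt5xl"]
    return clip.encode(tokens)


conditioning = encode_dual_prompt(
    StubClip(),
    "red fox, snow",
    "A red fox walks through snow.",
)
print(conditioning)
```

The point of the sketch is the routing: each prompt only feeds the encoder it is named after, and the node still emits a single CONDITIONING output.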