TextEncodeAceStepAudio - ComfyUI Built-in Node Documentation

The TextEncodeAceStepAudio node builds audio conditioning from text. It takes a CLIP model, a tags string, and a lyrics string, tokenizes the tags and lyrics together, and encodes the resulting tokens into conditioning data for audio generation. The `lyrics_strength` parameter controls how strongly the lyrics influence the final output.
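The flow described above (tokenize tags and lyrics together, encode, then attach the lyrics strength) can be sketched roughly as follows. This is a minimal illustration, not the node's actual source: `StubClip`, `set_values`, and `encode` are hypothetical names, the method names on the stub are assumptions, and the "embedding" is a placeholder list rather than a real tensor. The sketch also assumes conditioning is a list of `[embedding, options]` pairs.

```python
from typing import Any


class StubClip:
    """Hypothetical stand-in for a loaded CLIP object; not ComfyUI's real class."""

    def tokenize(self, text: str, **kwargs: Any) -> dict[str, list[str]]:
        # Tokenize tags and lyrics in a single call so they stay paired.
        lyrics = kwargs.get("lyrics", "")
        return {"tags": text.split(), "lyrics": lyrics.split()}

    def encode_from_tokens_scheduled(self, tokens: dict[str, list[str]]) -> list[list[Any]]:
        # Placeholder "embedding": token counts instead of a real tensor.
        embedding = [len(tokens["tags"]), len(tokens["lyrics"])]
        return [[embedding, {}]]


def set_values(conditioning: list[list[Any]], values: dict[str, Any]) -> list[list[Any]]:
    # Copy each [embedding, options] pair and merge the extra values into options.
    out = []
    for embedding, options in conditioning:
        merged = dict(options)
        merged.update(values)
        out.append([embedding, merged])
    return out


def encode(clip: StubClip, tags: str, lyrics: str, lyrics_strength: float = 1.0) -> list[list[Any]]:
    tokens = clip.tokenize(tags, lyrics=lyrics)
    conditioning = clip.encode_from_tokens_scheduled(tokens)
    # The strength is stored alongside the encoding, not baked into it (assumption).
    return set_values(conditioning, {"lyrics_strength": lyrics_strength})


conditioning = encode(StubClip(), "funk, pop, 105 BPM", "[verse]\nHello world", 0.8)
print(conditioning[0][1]["lyrics_strength"])
```

In this sketch the strength travels as metadata next to the encoded text, so a downstream consumer decides how to apply it. Whether the real node works this way should be checked against the ComfyUI source.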

Inputs

Parameter	Description	Data Type	Required	Range
`clip`	The CLIP model used for tokenization and encoding	CLIP	Yes	-
`tags`	Text tags or descriptions for audio conditioning (supports multiline input and dynamic prompts)	STRING	Yes	-
`lyrics`	Lyrics text for audio conditioning (supports multiline input and dynamic prompts)	STRING	Yes	-
`lyrics_strength`	Controls the strength of lyrics influence on the conditioning output (default: 1.0, step: 0.01)	FLOAT	No	0.0 - 10.0
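As a usage illustration, the inputs above could be assembled into a workflow entry along these lines. This is a sketch under assumptions: the `class_type` / `inputs` layout and the `[source_node_id, output_index]` link form follow the shape of ComfyUI's exported API-format workflows as I understand them, `make_text_encode_node` is a hypothetical helper, and the node id `"1"` with output slot `1` is placeholder wiring for whatever node supplies the CLIP model.

```python
import json


def make_text_encode_node(clip_link, tags, lyrics, lyrics_strength=1.0):
    """Build one API-format workflow entry for TextEncodeAceStepAudio (hypothetical helper)."""
    # Range taken from the inputs table: 0.0 to 10.0.
    if not 0.0 <= lyrics_strength <= 10.0:
        raise ValueError("lyrics_strength must be between 0.0 and 10.0")
    return {
        "class_type": "TextEncodeAceStepAudio",
        "inputs": {
            "clip": list(clip_link),  # [source_node_id, output_index]
            "tags": tags,
            "lyrics": lyrics,
            "lyrics_strength": lyrics_strength,
        },
    }


# Placeholder wiring: node "1" is assumed to expose a CLIP model on output slot 1.
workflow = {
    "2": make_text_encode_node(
        ["1", 1],
        "acoustic, folk, warm vocals",
        "[verse]\nWalking home tonight",
    ),
}
print(json.dumps(workflow, indent=2))
```

Omitting `lyrics_strength` falls back to the documented default of 1.0, and values outside the documented range are rejected before the entry is built.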

Outputs

Output Name	Description	Data Type
`conditioning`	The encoded conditioning for the tags and lyrics, with the lyrics strength applied	CONDITIONING

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 79cdc3b7d0728a7fdb771243bc1b30f252cc322892df634584698a8f2c4d1633
