LTXVReferenceAudio - ComfyUI Built-in Node Documentation

The LTXV Reference Audio node transfers a speaker's identity to generated audio. It encodes a reference audio clip with the LTXV Audio VAE and adds the result to the positive and negative conditioning, so the generated audio adopts the speaker's voice characteristics. It can also patch the model with identity guidance, which runs an extra forward pass without the reference at each step to amplify the speaker identity effect.

Inputs

Parameter	Description	Data Type	Required	Range
`model`	The model to be patched with identity guidance.	MODEL	Yes	-
`positive`	The positive conditioning input.	CONDITIONING	Yes	-
`negative`	The negative conditioning input.	CONDITIONING	Yes	-
`reference_audio`	Reference audio clip whose speaker identity is transferred. A clip of about 5 seconds is recommended, since that matches the clip duration used in training; shorter or longer clips may degrade voice identity transfer.	AUDIO	Yes	-
`audio_vae`	LTXV Audio VAE for encoding the reference audio.	VAE	Yes	-
`identity_guidance_scale`	Strength of identity guidance. When greater than 0, each step runs an extra forward pass without the reference audio to amplify speaker identity. Set to 0 to disable guidance and skip the extra pass. (default: 3.0)	FLOAT	No	0.0 - 100.0
`start_percent`	Start of the sigma range where identity guidance is active. (default: 0.0)	FLOAT	No	0.0 - 1.0
`end_percent`	End of the sigma range where identity guidance is active. (default: 1.0)	FLOAT	No	0.0 - 1.0
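
The interaction between `identity_guidance_scale`, `start_percent`, and `end_percent` can be illustrated with a small sketch. This is not the node's actual implementation. It assumes a classifier-free-guidance-style extrapolation between the prediction made with the reference and the one made without it, and it stands in a plain 0-to-1 progress fraction for the sigma range that the node really uses. The function names `guidance_active` and `apply_identity_guidance` are hypothetical.

```python
from typing import List, Sequence


def guidance_active(progress: float,
                    start_percent: float = 0.0,
                    end_percent: float = 1.0) -> bool:
    """Return True when the current step falls inside the active window.

    Assumption: `progress` is a simple 0..1 fraction of the sampling run.
    The real node expresses this window as a sigma range.
    """
    return start_percent <= progress <= end_percent


def apply_identity_guidance(with_ref: Sequence[float],
                            without_ref: Sequence[float],
                            scale: float,
                            progress: float,
                            start_percent: float = 0.0,
                            end_percent: float = 1.0) -> List[float]:
    """Push the prediction further along the direction the reference adds.

    Assumed formula (CFG-style): with_ref + scale * (with_ref - without_ref).
    With scale == 0, or outside the window, the prediction is returned
    unchanged, mirroring "set to 0 to disable (no extra pass)".
    """
    if scale == 0.0 or not guidance_active(progress, start_percent, end_percent):
        return list(with_ref)
    return [w + scale * (w - n) for w, n in zip(with_ref, without_ref)]


if __name__ == "__main__":
    # Toy values only; real predictions are model output tensors.
    print(apply_identity_guidance([1.0, 2.0], [0.5, 2.0], scale=3.0, progress=0.5))
```

Under these assumptions, a larger scale exaggerates whatever the reference contributes to the prediction, while narrowing the window limits that effect to part of the sampling run.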

Outputs

Output Name	Description	Data Type
`model`	The model patched with the identity guidance function.	MODEL
`positive`	The positive conditioning, now containing the encoded reference audio data.	CONDITIONING
`negative`	The negative conditioning, now containing the encoded reference audio data.	CONDITIONING
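
To show how the inputs and outputs above fit together, here is a minimal sketch that builds this node's entry for a ComfyUI API-format workflow in Python. The input names and ranges come from the tables on this page, and `class_type` is assumed to match the page title. The node ids (`"1"` to `"5"`, `"10"`) and the upstream nodes they point to are hypothetical placeholders, and the helper `reference_audio_node` is not part of ComfyUI.

```python
import json
from typing import Any, Dict, List

Link = List[Any]  # API-format link: [source_node_id, output_index]


def reference_audio_node(model: Link, positive: Link, negative: Link,
                         reference_audio: Link, audio_vae: Link,
                         identity_guidance_scale: float = 3.0,
                         start_percent: float = 0.0,
                         end_percent: float = 1.0) -> Dict[str, Any]:
    """Build one workflow entry, checking the ranges documented above."""
    if not 0.0 <= identity_guidance_scale <= 100.0:
        raise ValueError("identity_guidance_scale must be in 0.0 - 100.0")
    for name, value in (("start_percent", start_percent),
                        ("end_percent", end_percent)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in 0.0 - 1.0")
    return {
        "class_type": "LTXVReferenceAudio",  # assumed from the page title
        "inputs": {
            "model": model,
            "positive": positive,
            "negative": negative,
            "reference_audio": reference_audio,
            "audio_vae": audio_vae,
            "identity_guidance_scale": identity_guidance_scale,
            "start_percent": start_percent,
            "end_percent": end_percent,
        },
    }


if __name__ == "__main__":
    # Hypothetical upstream node ids; replace with those of a real workflow.
    fragment = {
        "10": reference_audio_node(["1", 0], ["2", 0], ["3", 0],
                                   ["4", 0], ["5", 0]),
    }
    print(json.dumps(fragment, indent=2))
```

Downstream nodes would then reference the outputs by index in the order listed in the table: `["10", 0]` for `model`, `["10", 1]` for `positive`, and `["10", 2]` for `negative`.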

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): a25e24a08df73b8a34fd476544634e396a0eec5b6dc630e911c371f1b16931b8
