This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The LTXV Reference Audio node transfers speaker identity in audio generation. It encodes a reference audio clip into the model's conditioning so that the generated audio adopts the speaker’s voice characteristics. It can also apply identity guidance, which runs an extra forward pass without the reference at each sampling step to amplify the speaker identity effect.
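
One way to picture identity guidance is as an extrapolation, similar in form to classifier-free guidance: the prediction made with the reference audio is pushed further away from a prediction made without it. The sketch below is an assumption about the general shape of such a step, not the node's actual implementation. The function name `apply_identity_guidance` and the formula are illustrative only.

```python
from typing import Sequence


def apply_identity_guidance(
    with_reference: Sequence[float],
    without_reference: Sequence[float],
    scale: float,
) -> list[float]:
    """Extrapolate away from the no-reference prediction.

    Hypothetical formula (an assumption, not taken from the node's source):
        guided = with_ref + scale * (with_ref - without_ref)
    """
    if len(with_reference) != len(without_reference):
        raise ValueError("predictions must have the same length")
    if scale == 0.0:
        # Guidance disabled: the prediction is returned unchanged.
        return list(with_reference)
    return [
        w + scale * (w - wo)
        for w, wo in zip(with_reference, without_reference)
    ]


# Toy predictions: only the first element differs between the two passes,
# so only the first element is amplified.
guided = apply_identity_guidance([1.0, 2.0], [0.5, 2.0], scale=3.0)
print(guided)  # [2.5, 2.0]
```

With a scale of 0 the function returns its first argument untouched, which mirrors the documented behaviour that a scale of 0 disables the extra pass.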

Inputs

| Parameter | Data Type | Required | Range | Description |
| --- | --- | --- | --- | --- |
| model | MODEL | Yes | - | The model to be patched with identity guidance. |
| positive | CONDITIONING | Yes | - | The positive conditioning input. |
| negative | CONDITIONING | Yes | - | The negative conditioning input. |
| reference_audio | AUDIO | Yes | - | Reference audio clip whose speaker identity to transfer. ~5 seconds recommended (training duration). Shorter or longer clips may degrade voice identity transfer. |
| audio_vae | VAE | Yes | - | LTXV Audio VAE for encoding the reference audio. |
| identity_guidance_scale | FLOAT | No | 0.0 - 100.0 | Strength of identity guidance. Runs an extra forward pass without reference each step to amplify speaker identity. Set to 0 to disable (no extra pass). (default: 3.0) |
| start_percent | FLOAT | No | 0.0 - 1.0 | Start of the sigma range where identity guidance is active. (default: 0.0) |
| end_percent | FLOAT | No | 0.0 - 1.0 | End of the sigma range where identity guidance is active. (default: 1.0) |
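
The optional inputs above can be summarized in a small settings object that enforces the documented ranges and defaults. This is a minimal sketch for illustration: the class `IdentityGuidanceSettings`, its `is_active` method, and the helper `reference_duration_seconds` are hypothetical names, not part of ComfyUI. The sketch also treats the active range as a fraction of sampling progress, which is a simplification. The node describes the range in terms of sigma, and that mapping is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityGuidanceSettings:
    """Optional node inputs with their documented defaults and ranges."""

    identity_guidance_scale: float = 3.0
    start_percent: float = 0.0
    end_percent: float = 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.identity_guidance_scale <= 100.0:
            raise ValueError("identity_guidance_scale must be in [0.0, 100.0]")
        for name in ("start_percent", "end_percent"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0]")

    def is_active(self, progress: float) -> bool:
        """Return True if guidance would run at this point in sampling.

        `progress` is assumed to be the fraction of sampling completed
        (0.0 at the first step, 1.0 at the last). This stands in for the
        sigma-based range used by the node.
        """
        if self.identity_guidance_scale == 0.0:
            # A scale of 0 disables guidance, so no extra pass is needed.
            return False
        return self.start_percent <= progress <= self.end_percent


def reference_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of a clip in seconds, to compare against the ~5 s recommendation."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


settings = IdentityGuidanceSettings(start_percent=0.2, end_percent=0.8)
print(settings.is_active(0.5))  # inside the configured range
print(settings.is_active(0.9))  # outside the configured range
print(reference_duration_seconds(240_000, 48_000))
```

Restricting the range limits how many steps need the extra forward pass, since guidance is only applied while it is active.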

Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| model | MODEL | The model patched with the identity guidance function. |
| positive | CONDITIONING | The positive conditioning, now containing the encoded reference audio data. |
| negative | CONDITIONING | The negative conditioning, now containing the encoded reference audio data. |
