The LTXV Reference Audio node is used for speaker identity transfer in audio generation. It encodes a reference audio clip and adds it to the model's positive and negative conditioning, so that the generated audio adopts the speaker's voice characteristics. It can also apply identity guidance, which runs an extra forward pass of the model at each sampling step to amplify the speaker identity effect.
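This page does not say how the extra forward pass is combined with the normal prediction. A common pattern, borrowed from classifier-free guidance, is to extrapolate away from the prediction made without the reference. The sketch below assumes that pattern. The function name, the `predict` callable, and the combination formula are illustrative assumptions rather than the node's actual implementation, and plain Python lists stand in for model tensors.

```python
from typing import Callable

Prediction = list[float]


def identity_guided_prediction(
    predict: Callable[[bool], Prediction],
    scale: float,
) -> Prediction:
    """Hypothetical CFG-style identity guidance for one sampling step.

    `predict(use_reference)` stands in for a model forward pass with
    (True) or without (False) the encoded reference audio in the
    conditioning. The formula is an assumption, not the node's code.
    """
    with_ref = predict(True)
    if scale == 0.0:
        # Guidance disabled: return the normal prediction, no extra pass.
        return with_ref
    # The extra forward pass, made without the reference audio.
    without_ref = predict(False)
    # Push the prediction further along the direction the reference added.
    return [w + scale * (w - wo) for w, wo in zip(with_ref, without_ref)]
```

With a scale of 0 the function returns the reference-conditioned prediction unchanged and calls `predict` only once, mirroring the documented behavior of `identity_guidance_scale = 0`; larger scales exaggerate whatever difference the reference makes.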

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| model | The model to be patched with identity guidance. | MODEL | Yes | - |
| positive | The positive conditioning input. | CONDITIONING | Yes | - |
| negative | The negative conditioning input. | CONDITIONING | Yes | - |
| reference_audio | Reference audio clip whose speaker identity is transferred. About 5 seconds is recommended, matching the training duration; shorter or longer clips may degrade voice identity transfer. | AUDIO | Yes | - |
| audio_vae | LTXV Audio VAE used to encode the reference audio. | VAE | Yes | - |
| identity_guidance_scale | Strength of identity guidance. When nonzero, an extra forward pass without the reference is run at each step to amplify speaker identity. Set to 0 to disable guidance and skip the extra pass. (default: 3.0) | FLOAT | No | 0.0 - 100.0 |
| start_percent | Start of the sigma range in which identity guidance is active. (default: 0.0) | FLOAT | No | 0.0 - 1.0 |
| end_percent | End of the sigma range in which identity guidance is active. (default: 1.0) | FLOAT | No | 0.0 - 1.0 |
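The start_percent and end_percent inputs restrict identity guidance to part of the sampling schedule. The sketch below shows one way such a window could be checked at each step, assuming (as is usual for diffusion samplers) that sigma decreases as sampling progresses, so the start of the window maps to the larger sigma. `make_sigma_window` and `linear_percent_to_sigma` are hypothetical helpers; the toy linear schedule stands in for the model's real percent-to-sigma mapping, which this page does not describe.

```python
from typing import Callable


def make_sigma_window(
    percent_to_sigma: Callable[[float], float],
    start_percent: float = 0.0,
    end_percent: float = 1.0,
) -> Callable[[float], bool]:
    """Return a predicate telling whether guidance applies at a given sigma."""
    # Sampling runs from high sigma to low sigma, so start_percent maps to
    # the upper bound of the window and end_percent to the lower bound.
    sigma_high = percent_to_sigma(start_percent)
    sigma_low = percent_to_sigma(end_percent)

    def is_active(sigma: float) -> bool:
        return sigma_low <= sigma <= sigma_high

    return is_active


def linear_percent_to_sigma(percent: float, sigma_max: float = 1.0) -> float:
    """Toy schedule (assumption): sigma falls linearly from sigma_max to 0."""
    return sigma_max * (1.0 - percent)


# Guidance only during the first half of sampling.
guidance_active = make_sigma_window(linear_percent_to_sigma, 0.0, 0.5)
print(guidance_active(0.8), guidance_active(0.3))
```

With the defaults (0.0 and 1.0) the window covers the whole schedule, so guidance runs at every step whenever identity_guidance_scale is nonzero.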

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| model | The model patched with the identity guidance function. | MODEL |
| positive | The positive conditioning, now containing the encoded reference audio data. | CONDITIONING |
| negative | The negative conditioning, now containing the encoded reference audio data. | CONDITIONING |
This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): a25e24a08df73b8a34fd476544634e396a0eec5b6dc630e911c371f1b16931b8