ElevenLabsSpeechToSpeech - ComfyUI Built-in Node Documentation

The ElevenLabs Speech to Speech node transforms an input audio file from one voice to another. It uses the ElevenLabs API to convert speech while preserving the original content and emotional tone of the audio.

Inputs

Parameter	Description	Data Type	Required	Range
`voice`	Target voice for the transformation. Connect from Voice Selector or Instant Voice Clone.	CUSTOM	Yes	-
`audio`	Source audio to transform.	AUDIO	Yes	-
`stability`	Voice stability. Lower values give a broader emotional range; higher values produce more consistent but potentially monotonous speech (default: 0.5).	FLOAT	No	0.0 - 1.0
`model`	Model to use for speech-to-speech transformation. Each option provides a specific set of voice settings (similarity_boost, style, use_speaker_boost, speed).	DYNAMICCOMBO	No	`eleven_multilingual_sts_v2` `eleven_english_sts_v2`
`output_format`	Audio output format (default: `mp3_44100_192`).	COMBO	No	`mp3_44100_192` `opus_48000_192`
`seed`	Seed for reproducibility (default: 0).	INT	No	0 - 4294967295
`remove_background_noise`	Remove background noise from input audio using audio isolation (default: False).	BOOLEAN	No	-
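
The ranges and defaults in the table above can be checked before a workflow is queued. The following is a minimal sketch, not the node's actual implementation: `SpeechToSpeechSettings` is a hypothetical helper class, and the default model shown is an assumption, since the table does not state one.

```python
from dataclasses import dataclass

# Option lists and limits copied from the Inputs table.
MODELS = ("eleven_multilingual_sts_v2", "eleven_english_sts_v2")
OUTPUT_FORMATS = ("mp3_44100_192", "opus_48000_192")
MAX_SEED = 4294967295  # 2**32 - 1


@dataclass(frozen=True)
class SpeechToSpeechSettings:
    """Hypothetical container for the node's optional inputs (not a ComfyUI class)."""

    stability: float = 0.5
    model: str = MODELS[0]  # assumption: the table does not list a default model
    output_format: str = "mp3_44100_192"
    seed: int = 0
    remove_background_noise: bool = False

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not 0.0 <= self.stability <= 1.0:
            raise ValueError(f"stability must be in [0.0, 1.0], got {self.stability}")
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
        if not 0 <= self.seed <= MAX_SEED:
            raise ValueError(f"seed must be in [0, {MAX_SEED}], got {self.seed}")


# Usage: defaults match the table; out-of-range values raise ValueError.
settings = SpeechToSpeechSettings(stability=0.3, seed=42)
```

The required `voice` and `audio` inputs are left out because they are supplied by connections from other nodes rather than typed in as values.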

Outputs

Output Name	Description	Data Type
`audio`	The transformed audio file in the specified output format.	AUDIO
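
The `output_format` names appear to encode the codec, sample rate, and bitrate. The sketch below splits a name into those parts, assuming the pattern `<codec>_<sample rate in Hz>_<bitrate in kbps>`; `AudioFormat` and `parse_output_format` are hypothetical helpers and are not part of ComfyUI.

```python
from typing import NamedTuple


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: int


def parse_output_format(value: str) -> AudioFormat:
    """Split a name such as 'mp3_44100_192' into its three parts.

    Assumes the '<codec>_<sample rate>_<bitrate>' naming pattern.
    """
    parts = value.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected output_format: {value!r}")
    codec, sample_rate, bitrate = parts
    return AudioFormat(codec, int(sample_rate), int(bitrate))


fmt = parse_output_format("opus_48000_192")
print(fmt.codec, fmt.sample_rate_hz, fmt.bitrate_kbps)  # prints: opus 48000 192
```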

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): a3cd602181d134b9ab517bfac092ea30b62ef5a9942a905c0c3e6959b34370ca
