ElevenLabsTextToSpeech - ComfyUI Built-in Node Documentation

The ElevenLabs Text to Speech node converts written text into spoken audio using the ElevenLabs API. It allows you to select a specific voice and fine-tune various speech characteristics like stability, speed, and style to generate a customized audio output.

Inputs

Parameter	Description	Data Type	Required	Range
`voice`	Voice to use for speech synthesis. Connect from Voice Selector or Instant Voice Clone.	CUSTOM	Yes	N/A
`text`	The text to convert to speech.	STRING	Yes	N/A
`stability`	Voice stability. Lower values give a broader emotional range; higher values produce more consistent but potentially monotonous speech (default: 0.5).	FLOAT	No	0.0 - 1.0
`apply_text_normalization`	Text normalization mode. "auto" lets the system decide, "on" always applies normalization, "off" skips it.	COMBO	No	`"auto"` `"on"` `"off"`
`model`	Model to use for text-to-speech. Selecting a model reveals its specific parameters.	DYNAMICCOMBO	No	`"eleven_multilingual_v2"` `"eleven_v3"`
`language_code`	ISO 639-1 or ISO 639-3 language code (e.g., "en", "es", "fra"). Leave empty for automatic detection (default: "").	STRING	No	N/A
`seed`	Seed for reproducibility; deterministic output is not guaranteed (default: 1).	INT	No	0 - 2147483647
`output_format`	Audio output format.	COMBO	No	`"mp3_44100_192"` `"opus_48000_192"`
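
The defaults and ranges in the table above can be checked before a workflow is queued. The following is a minimal sketch only: `check_common_inputs` and the constants around it are hypothetical names introduced for illustration, not part of ComfyUI or the ElevenLabs API, and the limits are simply copied from the table.

```python
# Hypothetical helper for illustration; not ComfyUI or ElevenLabs code.
# Defaults, ranges, and options are copied from the Inputs table.
COMMON_DEFAULTS = {
    "stability": 0.5,
    "apply_text_normalization": "auto",
    "language_code": "",
    "seed": 1,
}
NORMALIZATION_MODES = ("auto", "on", "off")
SEED_MAX = 2147483647


def check_common_inputs(**overrides):
    """Merge overrides into the documented defaults and check documented limits."""
    unknown = set(overrides) - set(COMMON_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown inputs: {sorted(unknown)}")
    values = {**COMMON_DEFAULTS, **overrides}

    if not 0.0 <= values["stability"] <= 1.0:
        raise ValueError("stability must be within 0.0 - 1.0")
    if values["apply_text_normalization"] not in NORMALIZATION_MODES:
        raise ValueError("apply_text_normalization must be auto, on, or off")
    if not 0 <= values["seed"] <= SEED_MAX:
        raise ValueError(f"seed must be within 0 - {SEED_MAX}")
    return values


if __name__ == "__main__":
    # Override two optional inputs; the rest keep their documented defaults.
    print(check_common_inputs(stability=0.3, seed=42))
```

Keeping the limits in one place like this makes it easier to catch an out-of-range value locally instead of after a request has been sent.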

Model-Specific Parameters: When the `model` parameter is set to `"eleven_multilingual_v2"`, the following additional parameters become available:

`speed`: Speech speed. 1.0 is normal, <1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
`similarity_boost`: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
`use_speaker_boost`: Boost similarity to the original speaker voice (default: False).
`style`: Style exaggeration. Higher values increase stylistic expression but may reduce stability (default: 0.0, range: 0.0 - 0.2).

When the `model` parameter is set to `"eleven_v3"`, the following additional parameters become available:

`speed`: Speech speed. 1.0 is normal, <1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
`similarity_boost`: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
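
The per-model parameter sets listed above can be summarized as a lookup table. This is a sketch under stated assumptions: `MODEL_PARAMS` and `model_settings` are hypothetical names used only for illustration, the values are copied from the lists above, and the structure is not how ComfyUI implements its dynamic combo input.

```python
# Hypothetical lookup table for illustration; not ComfyUI's implementation.
# Each entry maps a parameter name to (default, minimum, maximum).
# Boolean parameters have no range, so their limits are None.
MODEL_PARAMS = {
    "eleven_multilingual_v2": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
        "use_speaker_boost": (False, None, None),
        "style": (0.0, 0.0, 0.2),
    },
    "eleven_v3": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
    },
}


def model_settings(model, **overrides):
    """Return the documented defaults for a model, with checked overrides."""
    spec = MODEL_PARAMS[model]
    settings = {name: default for name, (default, _, _) in spec.items()}
    for name, value in overrides.items():
        if name not in spec:
            raise KeyError(f"{name!r} is not available for {model}")
        _, low, high = spec[name]
        if low is not None and not low <= value <= high:
            raise ValueError(f"{name} must be within {low} - {high}")
        settings[name] = value
    return settings


if __name__ == "__main__":
    print(model_settings("eleven_multilingual_v2", speed=1.2, style=0.1))
    print(model_settings("eleven_v3"))
```

The table makes the difference between the two models explicit: `use_speaker_boost` and `style` exist only for `"eleven_multilingual_v2"`, so passing either with `"eleven_v3"` raises a `KeyError` in this sketch.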

Outputs

Output Name	Description	Data Type
`audio`	The generated audio from the text-to-speech conversion.	AUDIO
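
The `output_format` options listed under Inputs, such as `"mp3_44100_192"`, appear to follow a codec, sample rate, bitrate naming pattern. Assuming that reading (sample rate in Hz, bitrate in kbps), an option name can be split into its parts as sketched below. `AudioFormat` and `parse_output_format` are hypothetical names, and the parser only handles three-part names like the two documented here.

```python
from typing import NamedTuple


class AudioFormat(NamedTuple):
    """Parts of an output_format option name (interpretation is an assumption)."""
    codec: str
    sample_rate_hz: int
    bitrate_kbps: int


def parse_output_format(value: str) -> AudioFormat:
    """Split a three-part name such as 'mp3_44100_192' into its components."""
    parts = value.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected codec_samplerate_bitrate, got {value!r}")
    codec, sample_rate, bitrate = parts
    return AudioFormat(codec, int(sample_rate), int(bitrate))


if __name__ == "__main__":
    print(parse_output_format("mp3_44100_192"))
    print(parse_output_format("opus_48000_192"))
```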

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 78ed1c6af2d0b1cc0293d725492a8b104b6d0c6bc18d9971b75047db946cdd33
