# ElevenLabsTextToSpeech - ComfyUI Built-in Node Documentation

> Complete documentation for the ElevenLabsTextToSpeech node in ComfyUI. Learn its inputs, outputs, parameters and usage.

The ElevenLabs Text to Speech node converts written text into spoken audio using the ElevenLabs API. It takes a voice from an upstream node and exposes controls such as stability, speed, and style exaggeration for shaping the generated audio.

## Inputs

| Parameter                  | Data Type    | Required | Range                                         | Description                                                                                                                                         |
| -------------------------- | ------------ | -------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `voice`                    | CUSTOM       | Yes      | N/A                                           | Voice to use for speech synthesis. Connect from Voice Selector or Instant Voice Clone.                                                              |
| `text`                     | STRING       | Yes      | N/A                                           | The text to convert to speech.                                                                                                                      |
| `stability`                | FLOAT        | No       | 0.0 - 1.0                                     | Voice stability. Lower values give broader emotional range; higher values produce more consistent but potentially monotonous speech (default: 0.5). |
| `apply_text_normalization` | COMBO        | No       | `"auto"`<br />`"on"`<br />`"off"`             | Text normalization mode. 'auto' lets the system decide, 'on' always applies normalization, 'off' skips it.                                          |
| `model`                    | DYNAMICCOMBO | No       | `"eleven_multilingual_v2"`<br />`"eleven_v3"` | Model to use for text-to-speech. Selecting a model reveals its specific parameters.                                                                 |
| `language_code`            | STRING       | No       | N/A                                           | ISO-639-1 or ISO-639-3 language code (e.g., 'en', 'es', 'fra'). Leave empty for automatic detection (default: "").                                  |
| `seed`                     | INT          | No       | 0 - 2147483647                                | Seed for reproducibility; determinism is not guaranteed (default: 1).                                                                               |
| `output_format`            | COMBO        | No       | `"mp3_44100_192"`<br />`"opus_48000_192"`     | Audio output format.                                                                                                                                |
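
The ranges and options in the table above can be checked before a workflow is queued. The following is a minimal sketch, not the node's implementation: `validate_common_inputs` is a hypothetical helper, the language-code check only tests the shape of the string (two or three ASCII letters), and treating empty text as a problem is an assumption. Only the ranges, options, and defaults are taken from the table.

```python
# Hypothetical helper: checks values against the ranges documented in the table above.
# It does not call ComfyUI or the ElevenLabs API.

NORMALIZATION_MODES = ("auto", "on", "off")
OUTPUT_FORMATS = ("mp3_44100_192", "opus_48000_192")
MODELS = ("eleven_multilingual_v2", "eleven_v3")
MAX_SEED = 2147483647


def validate_common_inputs(
    text,
    model,
    apply_text_normalization,
    output_format,
    stability=0.5,
    language_code="",
    seed=1,
):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if not text.strip():
        # Assumption: an empty string is not useful input for a required field.
        problems.append("text must not be empty")
    if model not in MODELS:
        problems.append(f"model must be one of {MODELS}")
    if apply_text_normalization not in NORMALIZATION_MODES:
        problems.append(f"apply_text_normalization must be one of {NORMALIZATION_MODES}")
    if output_format not in OUTPUT_FORMATS:
        problems.append(f"output_format must be one of {OUTPUT_FORMATS}")
    if not 0.0 <= stability <= 1.0:
        problems.append("stability must be within 0.0 - 1.0")
    if language_code:
        # Shape check only: ISO-639-1 codes have two letters, ISO-639-3 codes have three.
        looks_like_code = language_code.isascii() and language_code.isalpha()
        if not (looks_like_code and len(language_code) in (2, 3)):
            problems.append("language_code must be empty or a 2- or 3-letter code")
    if not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
        problems.append(f"seed must be an integer within 0 - {MAX_SEED}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it possible to report every out-of-range value in a single pass.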

**Model-Specific Parameters:**
When the `model` parameter is set to `"eleven_multilingual_v2"`, the following additional parameters become available:

* `speed`: Speech speed. 1.0 is normal, \<1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
* `similarity_boost`: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
* `use_speaker_boost`: Boost similarity to the original speaker voice (default: False).
* `style`: Style exaggeration. Higher values increase stylistic expression but may reduce stability (default: 0.0, range: 0.0 - 0.2).

When the `model` parameter is set to `"eleven_v3"`, the following additional parameters become available:

* `speed`: Speech speed. 1.0 is normal, \<1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
* `similarity_boost`: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
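
The two parameter sets above differ only in that `eleven_v3` does not list `use_speaker_boost` or `style`. As an illustration, the lists can be written as a lookup table. This is a sketch under stated assumptions: `MODEL_PARAMS` and `resolve_model_params` are hypothetical names, not part of ComfyUI, and the defaults and ranges are copied from the lists above.

```python
# Model-specific parameters as listed on this page: name -> (default, min, max).
# A min/max of None marks a boolean toggle with no numeric range.
MODEL_PARAMS = {
    "eleven_multilingual_v2": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
        "use_speaker_boost": (False, None, None),
        "style": (0.0, 0.0, 0.2),
    },
    "eleven_v3": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
    },
}


def resolve_model_params(model, **overrides):
    """Hypothetical helper: merge overrides with the documented defaults.

    Raises ValueError for parameters the selected model does not list
    and for numeric values outside the documented range.
    """
    spec = MODEL_PARAMS[model]
    unknown = set(overrides) - set(spec)
    if unknown:
        raise ValueError(f"{model} does not expose: {sorted(unknown)}")
    resolved = {}
    for name, (default, low, high) in spec.items():
        value = overrides.get(name, default)
        if low is not None and not (low <= value <= high):
            raise ValueError(f"{name}={value} is outside {low} - {high}")
        resolved[name] = value
    return resolved
```

For example, `resolve_model_params("eleven_v3")` returns only `speed` and `similarity_boost` at their defaults, while passing `style=0.1` with `eleven_v3` raises a `ValueError` because that model does not list `style`.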

## Outputs

| Output Name | Data Type | Description                                             |
| ----------- | --------- | ------------------------------------------------------- |
| `audio`     | AUDIO     | The generated audio from the text-to-speech conversion. |
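
A downstream custom node might want to inspect the `audio` output, for example to compute its length. The sketch below assumes the AUDIO value follows the convention used elsewhere in ComfyUI: a dict with a `waveform` whose last dimension is the sample count and an integer `sample_rate`. That layout is not stated on this page, and `audio_duration_seconds` is a hypothetical helper.

```python
def audio_duration_seconds(audio):
    """Return the clip length in seconds.

    Assumption: `audio` is a dict with a `waveform` object exposing `.shape`
    (last dimension = number of samples) and an integer `sample_rate`.
    """
    num_samples = audio["waveform"].shape[-1]
    return num_samples / audio["sample_rate"]
```

The helper reads only `.shape`, so it works with any array-like waveform that exposes that attribute.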
