Inputs
| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| voice | Voice to use for speech synthesis. Connect it from the Voice Selector or Instant Voice Clone node. | CUSTOM | Yes | N/A |
| text | The text to convert to speech. | STRING | Yes | N/A |
| stability | Voice stability. Lower values give a broader emotional range; higher values produce more consistent but potentially monotonous speech (default: 0.5). | FLOAT | No | 0.0 - 1.0 |
| apply_text_normalization | Text normalization mode. ‘auto’ lets the system decide, ‘on’ always applies normalization, and ‘off’ skips it. | COMBO | No | "auto", "on", "off" |
| model | Model to use for text-to-speech. Selecting a model reveals its model-specific parameters. | DYNAMICCOMBO | No | "eleven_multilingual_v2", "eleven_v3" |
| language_code | ISO-639-1 or ISO-639-3 language code (e.g., ‘en’, ‘es’, ‘fra’). Leave empty for automatic detection (default: ""). | STRING | No | N/A |
| seed | Seed for reproducibility; determinism is not guaranteed (default: 1). | INT | No | 0 - 2147483647 |
| output_format | Audio output format. | COMBO | No | "mp3_44100_192", "opus_48000_192" |
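The model-independent inputs above have fixed ranges and option lists, so they can be checked before any request is made. The following is a minimal sketch, not the node's actual implementation: `TTSInputs` and `parse_output_format` are hypothetical names, `voice` and `model` are left out because their types (CUSTOM, DYNAMICCOMBO) are node-specific, and reading `output_format` as `codec_samplerate_bitrate` is an assumption based on the naming pattern.

```python
from dataclasses import dataclass

# Allowed values taken from the Inputs table.
NORMALIZATION_MODES = {"auto", "on", "off"}
OUTPUT_FORMATS = {"mp3_44100_192", "opus_48000_192"}
SEED_MAX = 2147483647


@dataclass
class TTSInputs:
    """Hypothetical container for the node's model-independent inputs."""
    text: str
    stability: float = 0.5
    apply_text_normalization: str = "auto"
    language_code: str = ""
    seed: int = 1
    output_format: str = "mp3_44100_192"

    def validate(self) -> None:
        """Raise ValueError if any input falls outside its documented range."""
        if not self.text:
            raise ValueError("text is required")
        if not 0.0 <= self.stability <= 1.0:
            raise ValueError("stability must be in [0.0, 1.0]")
        if self.apply_text_normalization not in NORMALIZATION_MODES:
            raise ValueError(f"unknown normalization mode: {self.apply_text_normalization!r}")
        # An empty string means automatic detection. Otherwise this is only a
        # loose shape check (2 letters for ISO-639-1, 3 for ISO-639-3), not a
        # lookup against the real ISO code lists.
        code = self.language_code
        if code and not (len(code) in (2, 3) and code.isalpha()):
            raise ValueError(f"invalid language code: {code!r}")
        if not 0 <= self.seed <= SEED_MAX:
            raise ValueError("seed out of range")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")


def parse_output_format(fmt: str) -> tuple[str, int, int]:
    """Split e.g. 'mp3_44100_192' into (codec, sample rate in Hz, bitrate in kbps).

    The codec_samplerate_bitrate reading is an assumption from the naming pattern.
    """
    codec, sample_rate, bitrate = fmt.split("_")
    return codec, int(sample_rate), int(bitrate)
```

For example, `TTSInputs(text="Hello", language_code="fra").validate()` passes silently, `TTSInputs(text="Hi", stability=1.5).validate()` raises `ValueError`, and `parse_output_format("opus_48000_192")` returns `("opus", 48000, 192)`.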
When the model parameter is set to "eleven_multilingual_v2", the following additional parameters become available:
- speed: Speech speed. 1.0 is normal speed; values below 1.0 are slower and values above 1.0 are faster (default: 1.0, range: 0.7 - 1.3).
- similarity_boost: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
- use_speaker_boost: Boost similarity to the original speaker's voice (default: False).
- style: Style exaggeration. Higher values increase stylistic expression but may reduce stability (default: 0.0, range: 0.0 - 0.2).
When the model parameter is set to "eleven_v3", the following additional parameters become available:
- speed: Speech speed. 1.0 is normal speed; values below 1.0 are slower and values above 1.0 are faster (default: 1.0, range: 0.7 - 1.3).
- similarity_boost: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
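Because model is a dynamic combo, the set of extra parameters depends on which model is selected: "eleven_v3" exposes only speed and similarity_boost, while "eleven_multilingual_v2" also exposes use_speaker_boost and style. The sketch below encodes the defaults and ranges listed for each model and fills in defaults for anything not overridden. `MODEL_EXTRAS` and `resolve_model_params` are hypothetical names for illustration, not part of ComfyUI or the ElevenLabs SDK.

```python
# Model-specific extras as documented: name -> (default, min, max).
# Boolean parameters use None for the range bounds.
MODEL_EXTRAS = {
    "eleven_multilingual_v2": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
        "use_speaker_boost": (False, None, None),
        "style": (0.0, 0.0, 0.2),
    },
    "eleven_v3": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
    },
}


def resolve_model_params(model: str, **overrides) -> dict:
    """Return the model's extra parameters with defaults filled in.

    Raises ValueError for an unknown model, for a parameter the model
    does not expose, or for a numeric value outside its documented range.
    """
    if model not in MODEL_EXTRAS:
        raise ValueError(f"unknown model: {model!r}")
    spec = MODEL_EXTRAS[model]
    unknown = set(overrides) - set(spec)
    if unknown:
        raise ValueError(f"{model} does not expose: {sorted(unknown)}")
    params = {}
    for name, (default, lo, hi) in spec.items():
        value = overrides.get(name, default)
        # Range-check numeric parameters only; booleans have no bounds.
        if lo is not None and not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
        params[name] = value
    return params
```

With this sketch, `resolve_model_params("eleven_v3", speed=1.2)` returns `{"speed": 1.2, "similarity_boost": 0.75}`, whereas passing `style=0.1` for "eleven_v3" raises `ValueError` because that model does not expose style. Keeping the per-model schema in one table mirrors how selecting a model reveals a different parameter set.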
Outputs
| Output Name | Description | Data Type |
|---|---|---|
| audio | The generated audio from the text-to-speech conversion. | AUDIO |
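To produce the audio output, the inputs have to be combined into a single text-to-speech request. The sketch below shows one plausible arrangement, assuming the field names of ElevenLabs' public text-to-speech REST API (a `voice_settings` object, `output_format` as a query parameter). This page does not confirm that the node sends exactly this request. `build_tts_request` is a hypothetical helper, and `voice_id` assumes the CUSTOM voice input resolves to an ElevenLabs voice ID string.

```python
def build_tts_request(voice_id: str, text: str, model: str, *,
                      stability: float = 0.5,
                      apply_text_normalization: str = "auto",
                      language_code: str = "",
                      seed: int = 1,
                      output_format: str = "mp3_44100_192",
                      **model_params) -> tuple[str, dict, dict]:
    """Return (path, query, body) for a hypothetical TTS request.

    Field names follow ElevenLabs' public REST API, which is an assumption
    about what this node sends. model_params carries the model-specific
    extras (e.g. speed, similarity_boost, style, use_speaker_boost).
    """
    path = f"/v1/text-to-speech/{voice_id}"
    query = {"output_format": output_format}
    body = {
        "text": text,
        "model_id": model,
        "apply_text_normalization": apply_text_normalization,
        "seed": seed,
        # Stability and the model-specific extras are grouped as voice settings.
        "voice_settings": {"stability": stability, **model_params},
    }
    # An empty language_code means automatic detection, so it is omitted.
    if language_code:
        body["language_code"] = language_code
    return path, query, body
```

For instance, `build_tts_request("voice123", "Hello", "eleven_v3", speed=1.1)` (with "voice123" as a placeholder ID) yields a body whose `voice_settings` is `{"stability": 0.5, "speed": 1.1}` and which has no `language_code` key.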
This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
Source fingerprint (SHA-256):
0cd570fbb152e07ba028e96df56abc08dde8941d043386fd076f42a1e1dc6016