Stable Audio 3 (GitHub) is Stability AI’s latest open-source audio generation model, trained on fully licensed music data and licensed for commercial use. It uses a dedicated subgraph node to produce high-quality stereo audio (including music, sound effects, and instruments) from text descriptions, with optional Qwen-powered category-aware reprompting. Stable Audio 3 comes in three variants:
- Small-SFX — Sound effects and short ambiance, up to 2:00. Small enough to run on CPU.
- Small-Music — Short music loops, on-device-friendly, up to 2:00.
- Medium — Longer tracks with stronger structure and musicality, up to ~6:20. Requires a GPU.
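The duration limits in the list above can be turned into a small selection helper. This is a minimal sketch and not part of ComfyUI or Stable Audio 3: the names `VARIANT_LIMITS`, `to_seconds`, and `pick_variant` are hypothetical, the approximate 6:20 limit is rounded to 380 seconds, and the policy of preferring the smallest variant that fits is an assumption made for illustration.

```python
# Hypothetical helper, not an official API. Limits come from the variant
# list above, converted to seconds (2:00 -> 120 s, ~6:20 -> about 380 s).
VARIANT_LIMITS = {
    "Small-SFX": 120,
    "Small-Music": 120,
    "Medium": 380,  # approximate: the docs say "up to ~6:20"
}


def to_seconds(clock: str) -> int:
    """Convert an 'M:SS' string such as '6:20' into whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def pick_variant(duration_s: float, kind: str, has_gpu: bool) -> str:
    """Return the smallest variant that can cover `duration_s` seconds.

    `kind` is "sfx" or "music" (labels invented for this sketch).
    """
    if kind not in ("sfx", "music"):
        raise ValueError(f"unknown kind: {kind!r}")
    if duration_s <= 0:
        raise ValueError("duration must be positive")

    small = "Small-SFX" if kind == "sfx" else "Small-Music"
    if duration_s <= VARIANT_LIMITS[small]:
        return small  # small variants can run without a GPU
    if duration_s <= VARIANT_LIMITS["Medium"]:
        if not has_gpu:
            raise RuntimeError("Medium requires a GPU")
        return "Medium"
    raise ValueError("duration exceeds the longest supported clip")
```

For example, `pick_variant(30, "sfx", has_gpu=False)` selects Small-SFX, while a 200-second music request falls through to Medium and needs a GPU.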
Available workflows
Stable Audio 3 Medium
Download Workflow
Download JSON or search “Stable Audio 3 Medium” in Template Library
Run on Comfy Cloud
Open in Comfy Cloud
The Stable Audio 3 Medium workflow is a full-featured text-to-audio generation pipeline. You provide a short text idea, optional duration, seed, and category — the workflow expands your prompt using Qwen with a category-aware reprompt template, then generates stereo audio via the Stable Audio 3 checkpoint.
How to use:
- Text idea — Enter a short description of the sound, music, or effect you want (e.g. “upbeat electronic dance track with heavy bass”)
- Duration — Set the desired clip length in seconds (default varies)
- Seed — Control reproducibility by adjusting the seed value
- Category — Choose a reprompt preset: Music, Instrument, SFX, or One-shot
- Enable reprompt — Toggle use_reprompt on to let Qwen expand your short idea into a detailed prompt before generation
- Click Run (Ctrl/Cmd + Enter) to generate. The audio will be saved to ComfyUI/output/audio/
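The inputs listed above can be modeled as a small validated record, which may help when preparing settings outside the ComfyUI interface. This is only a sketch: the class `MediumInputs` and every field name except `use_reprompt` are hypothetical, the 380-second ceiling is an assumed rounding of the approximate 6:20 limit, and nothing here talks to ComfyUI itself.

```python
from dataclasses import dataclass

# Reprompt presets named in the steps above.
CATEGORIES = ("Music", "Instrument", "SFX", "One-shot")
# Assumed ceiling: the Medium variant is described as "up to ~6:20".
MAX_DURATION_S = 380


@dataclass(frozen=True)
class MediumInputs:
    """Hypothetical container for the Medium workflow's user inputs."""

    text_idea: str
    duration_s: float
    seed: int
    category: str
    use_reprompt: bool

    def __post_init__(self) -> None:
        # Reject values the workflow description does not allow.
        if not self.text_idea.strip():
            raise ValueError("text_idea must not be empty")
        if not 0 < self.duration_s <= MAX_DURATION_S:
            raise ValueError(f"duration_s must be in (0, {MAX_DURATION_S}]")
        if self.seed < 0:
            raise ValueError("seed must be non-negative")
        if self.category not in CATEGORIES:
            raise ValueError(f"category must be one of {CATEGORIES}")


inputs = MediumInputs(
    text_idea="upbeat electronic dance track with heavy bass",
    duration_s=30,
    seed=42,
    category="Music",
    use_reprompt=True,
)
```

Keeping the seed in the record alongside the prompt makes it easier to reproduce a result later, since the same seed and inputs are what the workflow uses to control reproducibility.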
Stable Audio 3 Medium Base
Download Workflow
Download JSON or search “Stable Audio 3 Medium Base” in Template Library
Run on Comfy Cloud
Open in Comfy Cloud
A simplified version of Stable Audio 3 without Qwen reprompt expansion. It expects a complete text prompt and passes it directly to the model. Use this when you already have a detailed prompt and want faster generation.
How to use:
- Text prompt — Enter a detailed description of the audio you want
- Duration — Set the clip length in seconds
- Seed — Control reproducibility
- Click Run (Ctrl/Cmd + Enter) to generate
Model download
When loading the workflow, ComfyUI will prompt you with download links for any missing models. To set up manually, download the files below and place them in the correct folders.
Checkpoints
stable_audio_3_medium.safetensors
For the Medium workflow. Place in models/checkpoints/
stable_audio_3_medium_base.safetensors
For the Medium Base workflow. Place in models/checkpoints/
Text encoders
t5gemma_b_b_ul2.safetensors
Required for all Stable Audio 3 workflows. Place in models/text_encoders/
qwen3.5_2b_bf16.safetensors
Required for the Medium workflow (Qwen reprompt). Place in models/text_encoders/
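After a manual setup, a short script can confirm that each file landed in the folder listed above. The following is a minimal sketch using only the Python standard library: the function `missing_models`, the `REQUIRED_FILES` table, and the workflow keys `"medium"` and `"medium_base"` are hypothetical names, and the script assumes the paths are relative to your ComfyUI installation folder.

```python
from pathlib import Path

# File names and folders as listed above, relative to the ComfyUI root.
# The dictionary keys are labels invented for this sketch.
REQUIRED_FILES = {
    "medium": [
        "models/checkpoints/stable_audio_3_medium.safetensors",
        "models/text_encoders/t5gemma_b_b_ul2.safetensors",
        "models/text_encoders/qwen3.5_2b_bf16.safetensors",
    ],
    "medium_base": [
        "models/checkpoints/stable_audio_3_medium_base.safetensors",
        "models/text_encoders/t5gemma_b_b_ul2.safetensors",
    ],
}


def missing_models(comfy_root: str, workflow: str) -> list[str]:
    """Return the required files for `workflow` that are not on disk."""
    root = Path(comfy_root)
    return [
        rel for rel in REQUIRED_FILES[workflow]
        if not (root / rel).is_file()
    ]
```

Calling `missing_models("ComfyUI", "medium")` returns an empty list once all three files are in place; any paths it does return are the ones still to download.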