Stable Audio 1.0 is Stability AI’s first open-source audio generation model. It takes a text prompt and generates an audio clip. In ComfyUI, it works like a standard text-to-audio pipeline: a text encoder (T5-base, loaded through ComfyUI's CLIP nodes) encodes the prompt, a KSampler denoises the audio latent, and the VAE decodes the latent into audio.
Workflow
Download the workflow JSON, or search “Stable Audio 1.0” in the Template Library. You can also open and run it on Comfy Cloud.
The workflow uses only standard ComfyUI nodes; no custom nodes are required. It loads the Stable Audio 1.0 checkpoint, encodes your prompt with the t5-base text encoder (loaded as a CLIP model), denoises the latent audio with a KSampler, and decodes the result to audio through the model’s VAE.
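The node chain described above can be sketched in ComfyUI's API-format prompt JSON and queued against a running ComfyUI server. This is a minimal, hypothetical reconstruction, not the template file itself: the node IDs, the empty negative prompt, the CLIPLoader type "stable_audio", and the sampler settings (steps, cfg, sampler_name, scheduler) are assumptions chosen for illustration. queue_prompt posts the graph to the server's /prompt endpoint, assumed to be at the default local address http://127.0.0.1:8188.

```python
import json
import urllib.request


def build_workflow(prompt: str, seconds: float = 47.6, seed: int = 0) -> dict:
    """Build an API-format graph for text-to-audio.

    Each input that links to another node is [source_node_id, output_index].
    Node IDs and sampler settings are illustrative assumptions.
    """
    return {
        # Outputs of CheckpointLoaderSimple: 0 = MODEL, 1 = CLIP, 2 = VAE.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "stable-audio-open-1.0.safetensors"}},
        # The checkpoint ships no text encoder, so t5-base is loaded separately.
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "t5-base.safetensors", "type": "stable_audio"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        # Assumed: an empty negative prompt (the template may use something else).
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["2", 0]}},
        "5": {"class_type": "EmptyLatentAudio",
              "inputs": {"seconds": seconds, "batch_size": 1}},
        # Assumed sampler settings; check the template for its actual values.
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
                         "latent_image": ["5", 0], "seed": seed, "steps": 50,
                         "cfg": 5.0, "sampler_name": "dpmpp_3m_sde_gpu",
                         "scheduler": "exponential", "denoise": 1.0}},
        "7": {"class_type": "VAEDecodeAudio",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        # Writes under ComfyUI/output/audio/.
        "8": {"class_type": "SaveAudio",
              "inputs": {"audio": ["7", 0], "filename_prefix": "audio/ComfyUI"}},
    }


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, queue_prompt(build_workflow("heaven church electronic dance music")) is the scripted equivalent of pressing Run. Building the graph as plain dictionaries keeps each link explicit, which makes it easy to see that the KSampler takes the model from the checkpoint but its conditioning from the separately loaded text encoder.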
How to use:
- Load the checkpoint: the CheckpointLoaderSimple node uses stable-audio-open-1.0.safetensors.
- Write a prompt: enter your description in the CLIPTextEncode node (e.g. “heaven church electronic dance music”).
- Set duration: adjust the length (in seconds) on the EmptyLatentAudio node (default 47.6 seconds).
- Click Run (Ctrl/Cmd + Enter) to generate. The audio is saved to ComfyUI/output/audio/.
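The duration setting determines the length of the latent tensor the KSampler denoises. The sketch below mirrors the calculation ComfyUI's EmptyLatentAudio node appears to use in current versions; treat the constants as assumptions to verify against your install: 44.1 kHz audio, 2048 audio samples per latent frame, 64 latent channels, and a frame count rounded to an even number. Under those assumptions the default 47.6 seconds becomes 1024 latent frames, which decode to roughly 47.55 seconds of audio.

```python
# Assumed constants, matching the Stable Audio VAE as used by ComfyUI.
SAMPLE_RATE = 44_100   # output sample rate in Hz
DOWNSAMPLE = 2_048     # audio samples represented by one latent frame
LATENT_CHANNELS = 64   # channels in the audio latent


def latent_frames(seconds: float) -> int:
    """Latent length for a requested duration, rounded to an even frame count."""
    return round((seconds * SAMPLE_RATE / DOWNSAMPLE) / 2) * 2


def decoded_seconds(seconds: float) -> float:
    """Duration the VAE actually produces after rounding to whole frames."""
    return latent_frames(seconds) * DOWNSAMPLE / SAMPLE_RATE


for s in (10.0, 47.6):
    print(f"{s:>5} s -> latent [1, {LATENT_CHANNELS}, {latent_frames(s)}], "
          f"~{decoded_seconds(s):.2f} s of audio")
```

Because the length is quantized to even multiples of 2048 samples, the generated clip can be slightly shorter or longer than the value you type.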
Model download
When you load the workflow, ComfyUI prompts you with download links for any missing models. To set up manually, download the files below and place them in the correct folders.
Checkpoint
stable-audio-open-1.0.safetensors
2.3 GB. Place in models/checkpoints/
Text encoder
t5-base.safetensors
Text encoder for prompt conditioning. Place in models/text_encoders/
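To confirm a manual setup before opening the workflow, a small script can check that both files sit where the list above says. This is a hypothetical helper, not part of ComfyUI; the default root folder name "ComfyUI" is an assumption, so pass your own install path as the first argument.

```python
import sys
from pathlib import Path

# Paths relative to the ComfyUI root, taken from the model list above.
REQUIRED_MODELS = [
    "models/checkpoints/stable-audio-open-1.0.safetensors",
    "models/text_encoders/t5-base.safetensors",
]


def missing_models(comfy_root) -> list[str]:
    """Return the required model paths that are not present as files under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED_MODELS if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical default root; pass your install path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    missing = missing_models(root)
    if missing:
        print("Missing model files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All Stable Audio model files are in place.")
```

The check only covers the default model folders. If you point ComfyUI at other locations through extra_model_paths.yaml, adjust the paths accordingly.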