ComfyUI ACE-Step Native Example
This guide shows how to generate music with the ACE-Step model in ComfyUI.
ACE-Step is an open-source foundational music generation model jointly developed by Chinese team StepFun and ACE Studio, aimed at providing music creators with efficient, flexible and high-quality music generation and editing tools.
The model is released under the Apache-2.0 license and is free for commercial use.
As a powerful music generation foundation, ACE-Step provides rich extensibility. Through fine-tuning techniques like LoRA and ControlNet, developers can customize the model according to their actual needs. Whether it’s audio editing, vocal synthesis, accompaniment production, voice cloning or style transfer applications, ACE-Step provides stable and reliable technical support. This flexible architecture greatly simplifies the development process of music AI applications, allowing more creators to quickly apply AI technology to music creation.
ACE-Step has already released its training code, including LoRA model training; the corresponding ControlNet training code is planned for a future release. You can visit the project's GitHub repository to learn more.
ACE-Step ComfyUI Text-to-Audio Generation Workflow Example
1. Download Workflow and Related Models
Click the button below to download the corresponding workflow file. Drag it into ComfyUI to load the workflow information. The workflow includes model download information.
Download Json Format Workflow File
You can also manually download ace_step_v1_3.5b.safetensors and save it to the ComfyUI/models/checkpoints folder.
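If you download the file manually, a quick check that it landed in the right place can save a failed run. The sketch below relies only on the file name and folder given above; the function name `find_checkpoint` and the `comfy_root` argument are hypothetical, and the root should point at your own ComfyUI install directory.

```python
from pathlib import Path

# File name and folder layout come from this guide; the rest is illustrative.
CHECKPOINT_NAME = "ace_step_v1_3.5b.safetensors"


def find_checkpoint(comfy_root):
    """Return the expected checkpoint path and whether a file exists there."""
    path = Path(comfy_root) / "models" / "checkpoints" / CHECKPOINT_NAME
    return path, path.is_file()


# "ComfyUI" is an assumed install directory relative to the current folder.
path, found = find_checkpoint("ComfyUI")
print(f"{path}: {'found' if found else 'missing'}")
```

If the file is reported as missing, the Load Checkpoints node will most likely not list the model either.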
2. Complete the Workflow Step by Step
- Ensure the Load Checkpoints node has loaded the ace_step_v1_3.5b.safetensors model
- Input corresponding music styles etc. in the tags field of TextEncodeAceStepAudio
- Input corresponding lyrics in the lyrics field of TextEncodeAceStepAudio
- Click the Run button, or use the shortcut Ctrl(cmd) + Enter to execute the generation
- After the generation is complete, you can view the generated audio in the Save Audio node. You can click to play and preview. The audio will also be saved to ComfyUI/output/audio (subdirectory determined by the Save Audio node).
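The same fields can also be filled in from a script rather than the UI. The sketch below assumes the workflow has been exported from ComfyUI in API format (a JSON object mapping node ids to a `class_type` and its `inputs`) and that a local ComfyUI server is listening on its default address `127.0.0.1:8188`. The helper names and the file name `ace_step_api.json` are hypothetical.

```python
import json
import urllib.request


def set_ace_step_prompt(workflow, tags, lyrics):
    """Write tags and lyrics into every TextEncodeAceStepAudio node.

    `workflow` is assumed to be an API-format export:
    {node_id: {"class_type": ..., "inputs": {...}}}.
    Returns the number of nodes that were updated.
    """
    updated = 0
    for node in workflow.values():
        if node.get("class_type") == "TextEncodeAceStepAudio":
            node["inputs"]["tags"] = tags
            node["inputs"]["lyrics"] = lyrics
            updated += 1
    return updated


def queue_workflow(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to the server's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage (requires a running server and an API-format export, both assumed):
#   with open("ace_step_api.json", encoding="utf-8") as f:
#       workflow = json.load(f)
#   set_ace_step_prompt(workflow, "funk, pop, 110 bpm", "[verse]\n...")
#   print(queue_workflow(workflow))
```

Matching nodes by `class_type` rather than by node id keeps the helper independent of how the nodes happen to be numbered in a particular export.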
ACE-Step ComfyUI Audio-to-Audio Workflow
Similar to image-to-image workflows, you can input a piece of music and use the workflow below to resample it and generate new music. You can also adjust how far the result departs from the original audio by controlling the denoise parameter in the KSampler node.
1. Download Workflow File
Click the button below to download the corresponding workflow file. Drag it into ComfyUI to load the workflow information.
Download Json Format Workflow File
2. Complete the Workflow Step by Step
- Ensure the Load Checkpoints node has loaded the ace_step_v1_3.5b.safetensors model
- Upload the music you want to edit in the LoadAudio node (you can use the results generated from the text-to-audio workflow in this article)
- Input corresponding music styles etc. in the tags field of TextEncodeAceStepAudio
- Input corresponding lyrics in the lyrics field of TextEncodeAceStepAudio
- Modify the denoise parameter in the KSampler node to adjust the amount of noise added during sampling, which controls the similarity to the original audio (smaller values result in greater similarity to the original audio; if set to 1.00, the result is effectively the same as having no audio input)
- Click the Run button, or use the shortcut Ctrl(cmd) + Enter to execute the audio generation
- After the generation is complete, you can view the generated audio in the Save Audio node. You can click to play and preview. The audio will also be saved to ComfyUI/output/audio (subdirectory determined by the Save Audio node).
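To compare several similarity levels, it can help to prepare one workflow copy per denoise value. This is a minimal sketch, assuming an API-format workflow dict (node ids mapped to `class_type` and `inputs`) and assuming denoise is meant to stay within 0.0 to 1.0; the helper names are hypothetical.

```python
import copy


def set_denoise(workflow, denoise):
    """Set `denoise` on every KSampler node; return how many nodes changed."""
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise is expected to lie between 0.0 and 1.0")
    updated = 0
    for node in workflow.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["denoise"] = denoise
            updated += 1
    return updated


def denoise_variants(workflow, values):
    """Return one independent copy of the workflow per denoise value."""
    variants = []
    for value in values:
        variant = copy.deepcopy(workflow)  # leave the original untouched
        set_denoise(variant, value)
        variants.append(variant)
    return variants
```

For example, `denoise_variants(workflow, [0.3, 0.6, 0.9])` yields three workflows that range from close to the input audio to mostly regenerated.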
You can also reproduce the lyric editing functionality shown on the ACE-Step project page: modify the original lyrics to change the resulting audio.
ACE-Step Prompt Guide
ACE-Step currently uses two types of prompts: tags and lyrics.
- tags: Mainly used to describe music styles, scenes, etc. Like the prompts used for other kinds of generation, they describe the overall style and requirements of the audio, separated by English (half-width) commas
- lyrics: Mainly used to provide the lyrics, supporting lyric structure tags such as [verse], [chorus], and [bridge] to distinguish different parts of the lyrics. You can also input instrument names for purely instrumental music
You can find rich examples of tags and lyrics on the ACE-Step model homepage. You can refer to these examples to try corresponding prompts. This document’s prompt guide is organized based on the project to help you quickly try combinations to achieve your desired effect.
Tags (prompt)
Mainstream Music Styles
Use short tag combinations to generate specific music styles
- electronic
- rock
- pop
- funk
- soul
- cyberpunk
- Acid jazz
- electro
- em (electronic music)
- soft electric drums
- melodic
Scene Types
Combine specific usage scenarios and atmospheres to generate music that matches the corresponding mood
- background music for parties
- radio broadcasts
- workout playlists
Instrumental Elements
- saxophone
- jazz
- piano, violin
Vocal Types
- female voice
- male voice
- clean vocals
Professional Terms
Use some professional terms commonly used in music to precisely control music effects
- 110 bpm (beats per minute is 110)
- fast tempo
- slow tempo
- loops
- fills
- acoustic guitar
- electric bass
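The categories above are meant to be combined into a single comma-separated tags string. A small sketch of that step follows; the helper name `build_tags` is hypothetical, and joining with a comma plus a space is an assumption, since the guide only requires English commas.

```python
def build_tags(*groups):
    """Join tag groups into one comma-separated string.

    Blank entries and case-insensitive duplicates are dropped; order is kept.
    """
    tags = []
    seen = set()
    for group in groups:
        for tag in group:
            tag = tag.strip()
            key = tag.lower()
            if tag and key not in seen:
                seen.add(key)
                tags.append(tag)
    return ", ".join(tags)


tags = build_tags(
    ["funk", "pop"],                   # mainstream music styles
    ["background music for parties"],  # scene type
    ["saxophone"],                     # instrumental element
    ["female voice"],                  # vocal type
    ["110 bpm"],                       # professional term
)
print(tags)
```

This prints `funk, pop, background music for parties, saxophone, female voice, 110 bpm`, which can be pasted into the tags field.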
Lyrics
Lyric Structure Tags
- [outro]
- [verse]
- [chorus]
- [bridge]
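As a sketch of how these structure tags fit into the lyrics field, the helper below puts each tag on its own line above the lines of its section. The name `build_lyrics`, the blank line between sections, and the placeholder lyric lines are all assumptions made for illustration.

```python
def build_lyrics(sections):
    """Format (tag, lines) pairs as lyrics text with structure tags.

    Each section becomes "[tag]" followed by its lines; sections are
    separated by a blank line (an assumed layout, not a requirement).
    """
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


# Placeholder lyric lines, for illustration only.
lyrics = build_lyrics([
    ("verse", ["first line of the verse", "second line of the verse"]),
    ("chorus", ["first line of the chorus"]),
    ("outro", ["closing line"]),
])
print(lyrics)
```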
Multilingual Support
- ACE-Step V1 supports multiple languages. When used, ACE-Step converts different languages into English letters and then generates music.
- In ComfyUI, we haven’t fully implemented the conversion of all languages to English letters. Currently, only Japanese hiragana and katakana characters are implemented.
So if you need to use multiple languages for music generation, you need to first convert the corresponding language to English letters, and then input the language code abbreviation at the beginning of the lyrics, such as Chinese [zh], Korean [ko], etc.
For example:
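A minimal sketch of that tagging step, assuming the lyrics have already been romanized by hand and assuming the language code goes at the start of every lyric line (structure tags and blank lines are left untouched). The helper name `tag_language` and the romanized sample line are hypothetical, and the exact romanization scheme ACE-Step expects is not specified here.

```python
def tag_language(lyrics, code):
    """Prefix each lyric line with "[code]".

    Lines that are empty or consist of a single bracketed tag such as
    "[verse]" are passed through unchanged.
    """
    tagged = []
    for line in lyrics.splitlines():
        stripped = line.strip()
        is_structure_tag = stripped.startswith("[") and stripped.endswith("]")
        if not stripped or is_structure_tag:
            tagged.append(line)
        else:
            tagged.append(f"[{code}]{stripped}")
    return "\n".join(tagged)


# Placeholder romanized text, for illustration only.
print(tag_language("[verse]\nni hao shi jie", "zh"))
```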
Currently, ACE-Step supports 19 languages, but the following ten languages have better support:
- English
- Chinese: [zh]
- Russian: [ru]
- Spanish: [es]
- Japanese: [ja]
- German: [de]
- French: [fr]
- Portuguese: [pt]
- Italian: [it]
- Korean: [ko]
The language tags above have not been fully tested at the time of writing this documentation. If any language tag is incorrect, please submit an issue to our documentation repository and we will make timely corrections.