Overview
This node processes an audio input to extract features that can be used to guide a video generation model. It analyzes the audio to detect tempo, beats, and other musical characteristics, then packages this information into a format suitable for conditioning a video model, allowing the generated video to be synchronized with the audio.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| audio | AUDIO | Yes | - | The audio input to be analyzed and encoded. |
| video_frames | INT | Yes | Min: 1, Max: 268435456 (MAX_RESOLUTION), Step: 4 | The number of frames in the target video. Used to calculate the frame rate for synchronization (default: 149). |
| audio_inject_scale | FLOAT | Yes | Min: 0.0, Max: 10.0, Step: 0.01 | The scale for the audio features when injected into the video model (default: 1.0). |
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| audio_encoder_output | AUDIO_ENCODER_OUTPUT | A dictionary containing the processed audio features, the calculated frame rate (fps), and the audio injection scale. This output is used to condition the video generation model. |
| fps_string | STRING | A text string describing the calculated frame rate (fps) based on the audio length and the number of video frames. This string is intended to be used in the prompt for the video model. |
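
The relationship between audio length, `video_frames`, and the reported frame rate can be illustrated with a minimal sketch. It assumes the frame rate is simply the frame count divided by the audio duration in seconds, which is one plausible reading of the description above rather than the node's confirmed implementation. The helper names `estimate_fps` and `format_fps`, and the wording of the resulting string, are hypothetical.

```python
def estimate_fps(num_samples: int, sample_rate: int, video_frames: int) -> float:
    """Frame rate at which `video_frames` frames span the whole audio clip.

    Assumption: fps = video_frames / audio_duration_seconds.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("audio must contain samples and have a positive sample rate")
    if video_frames < 1:
        raise ValueError("video_frames must be at least 1")
    duration_seconds = num_samples / sample_rate
    return video_frames / duration_seconds


def format_fps(fps: float) -> str:
    # Hypothetical wording: the node's actual fps_string format is not documented here.
    return f"{fps:.2f} fps"


# A 5-second clip sampled at 16 kHz, with the default of 149 video frames.
fps = estimate_fps(num_samples=16000 * 5, sample_rate=16000, video_frames=149)
print(format_fps(fps))  # prints "29.80 fps"
```

Under this assumption, a longer audio clip with the same `video_frames` value yields a lower frame rate, so adjusting `video_frames` is the way to keep the frame rate within the range the video model expects.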