This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Overview

This node processes an audio input to extract features that can be used to guide a video generation model. It analyzes the audio to detect tempo, beats, and other musical characteristics, then packages this information into a format suitable for conditioning a video model, allowing the generated video to be synchronized with the audio.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| audio | AUDIO | Yes | - | The audio input to be analyzed and encoded. |
| video_frames | INT | Yes | Min: 1, Max: 268435456 (MAX_RESOLUTION), Step: 4 | The number of frames in the target video. Used to calculate the frame rate for synchronization (default: 149). |
| audio_inject_scale | FLOAT | Yes | Min: 0.0, Max: 10.0, Step: 0.01 | The scale for the audio features when injected into the video model (default: 1.0). |

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| audio_encoder_output | AUDIO_ENCODER_OUTPUT | A dictionary containing the processed audio features, the calculated frame rate (fps), and the audio injection scale. This output is used to condition the video generation model. |
| fps_string | STRING | A text string describing the calculated frame rate (fps) based on the audio length and the number of video frames. This string is intended to be used in the prompt for the video model. |
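
The relationship between audio length, `video_frames`, and the reported frame rate can be illustrated with a minimal sketch. It assumes the frame rate is simply the frame count divided by the audio duration in seconds, which is a plausible reading of the descriptions above but is not confirmed by this page. The function names (`estimate_fps`, `build_outputs`), the dictionary keys, and the wording of the fps string are hypothetical; the actual node also includes the encoded audio features, which are omitted here.

```python
def estimate_fps(num_samples: int, sample_rate: int, video_frames: int) -> float:
    """Frames per second needed for `video_frames` to span the whole audio clip.

    Assumption: fps = video_frames / audio_duration_seconds.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("audio must contain samples and have a positive sample rate")
    if video_frames < 1:
        raise ValueError("video_frames must be at least 1")
    duration_seconds = num_samples / sample_rate
    return video_frames / duration_seconds


def build_outputs(num_samples: int, sample_rate: int,
                  video_frames: int = 149, audio_inject_scale: float = 1.0):
    """Return a (hypothetical) conditioning dict and an fps description string."""
    fps = estimate_fps(num_samples, sample_rate, video_frames)
    # Hypothetical key names; the real output also carries the audio features.
    encoder_output = {"fps": fps, "audio_inject_scale": audio_inject_scale}
    # Hypothetical wording; the node's actual string format may differ.
    fps_string = f"{fps:.2f} fps"
    return encoder_output, fps_string


# Example: a 5-second clip sampled at 48 kHz, rendered as 150 frames.
output, text = build_outputs(num_samples=48000 * 5, sample_rate=48000, video_frames=150)
print(output, text)
```

Under this assumption, a longer audio clip with the same `video_frames` value yields a lower frame rate, so `video_frames` should be chosen with the clip duration in mind.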
