Overview

This node processes an audio input to extract features that can be used to guide a video generation model. It analyzes the audio to detect tempo, beats, and other musical characteristics, then packages this information into a format suitable for conditioning a video model, allowing the generated video to be synchronized with the audio.

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| audio | The audio input to be analyzed and encoded. | AUDIO | Yes | - |
| video_frames | The number of frames in the target video. Used to calculate the frame rate for synchronization (default: 149). | INT | Yes | Min: 1, Max: 268435456 (MAX_RESOLUTION), Step: 4 |
| audio_inject_scale | The scale for the audio features when injected into the video model (default: 1.0). | FLOAT | Yes | Min: 0.0, Max: 10.0, Step: 0.01 |

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| audio_encoder_output | A dictionary containing the processed audio features, the calculated frame rate (fps), and the audio injection scale. This output is used to condition the video generation model. | AUDIO_ENCODER_OUTPUT |
| fps_string | A text string describing the calculated frame rate (fps) based on the audio length and the number of video frames. This string is intended to be used in the prompt for the video model. | STRING |
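
The frame-rate calculation described above can be illustrated with a minimal sketch. It assumes the frame rate is simply the number of video frames divided by the audio duration in seconds, which is a reading of the description rather than the node's confirmed implementation. The function names `estimate_fps` and `format_fps_string` and the exact text format of the string are hypothetical.

```python
def estimate_fps(num_samples: int, sample_rate: int, video_frames: int) -> float:
    """Estimate the frame rate needed for `video_frames` to span the whole audio clip.

    Assumption: fps = video_frames / audio_duration_seconds. The node may round
    or clamp this value differently.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("audio must contain samples and have a positive sample rate")
    if video_frames < 1:
        raise ValueError("video_frames must be at least 1")
    duration_seconds = num_samples / sample_rate
    return video_frames / duration_seconds


def format_fps_string(fps: float) -> str:
    """Build a prompt-friendly description of the frame rate (hypothetical format)."""
    return f"{fps:.2f} fps"


# Example: 5 seconds of 16 kHz audio with the default of 149 frames.
fps = estimate_fps(num_samples=80_000, sample_rate=16_000, video_frames=149)
print(format_fps_string(fps))  # prints "29.80 fps"
```

Under this assumption, longer audio with the same video_frames value gives a lower frame rate, so video_frames should be chosen with the audio length in mind.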
Source fingerprint (SHA-256): 1318323b202ca26c920a860534062dc7f20e3b10d13eb9825a890e26b5fde731