WanDancerEncodeAudio - ComfyUI Built-in Node Documentation

Overview

This node extracts features from an audio input for use in guiding a video generation model. It analyzes the audio to detect tempo, beats, and other musical characteristics, then packages the results in a format suitable for conditioning the video model, so that the generated video can be synchronized with the audio.

Inputs

Parameter	Description	Data Type	Required	Range
`audio`	The audio input to be analyzed and encoded.	AUDIO	Yes	-
`video_frames`	The number of frames in the target video. Together with the audio length, this is used to calculate the frame rate for synchronization (default: 149).	INT	Yes	Min: 1, Max: 268435456 (MAX_RESOLUTION), Step: 4
`audio_inject_scale`	The scale for the audio features when injected into the video model (default: 1.0).	FLOAT	Yes	Min: 0.0, Max: 10.0, Step: 0.01
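
The frame-rate calculation mentioned for `video_frames` can be sketched roughly as follows. This is a minimal illustration, assuming the frame rate is simply the frame count divided by the audio duration in seconds; the node's exact formula is not stated on this page, and `estimate_fps`, `num_samples`, and `sample_rate` are hypothetical names used only for this example.

```python
def estimate_fps(num_samples: int, sample_rate: int, video_frames: int = 149) -> float:
    """Assumed relation (not confirmed by the node's source):
    fps = video_frames / audio duration in seconds."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("audio must contain samples and have a positive sample rate")
    if video_frames < 1:
        raise ValueError("video_frames must be at least 1")
    duration_seconds = num_samples / sample_rate
    return video_frames / duration_seconds


# Hypothetical input: 5 seconds of audio at 44.1 kHz, default frame count.
fps = estimate_fps(num_samples=220_500, sample_rate=44_100)
print(f"{fps:.2f} fps")
```

Under this assumption, a 5-second clip spread over the default 149 frames works out to roughly 29.8 frames per second.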

Outputs

Output Name	Description	Data Type
`audio_encoder_output`	A dictionary containing the processed audio features, the calculated frame rate (fps), and the audio injection scale. This output is used to condition the video generation model.	AUDIO_ENCODER_OUTPUT
`fps_string`	A text string describing the calculated frame rate (fps) based on the audio length and the number of video frames. This string is intended to be used in the prompt for the video model.	STRING
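
To make the shape of the two outputs more concrete, here is a hedged sketch of how they might be assembled. The dictionary keys (`audio_features`, `fps`, `audio_inject_scale`), the function name `build_outputs`, and the wording of the fps string are all assumptions for illustration; the real key names and string format are not given on this page.

```python
from typing import Any


def build_outputs(features: Any, fps: float, audio_inject_scale: float = 1.0):
    """Bundle features, fps, and scale into a dict plus a prompt-ready string.

    Key names and string format are hypothetical, not the node's actual ones.
    """
    # Mirror the documented range of the audio_inject_scale input.
    if not 0.0 <= audio_inject_scale <= 10.0:
        raise ValueError("audio_inject_scale must be between 0.0 and 10.0")
    audio_encoder_output = {
        "audio_features": features,                # processed audio features
        "fps": fps,                                # calculated frame rate
        "audio_inject_scale": audio_inject_scale,  # injection strength
    }
    fps_string = f"{fps:.2f} fps"                  # text intended for the prompt
    return audio_encoder_output, fps_string


# Placeholder features; a real encoder would produce a tensor here.
encoder_output, fps_text = build_outputs([0.1, 0.2, 0.3], fps=29.8)
print(fps_text)
```

In a workflow, the dictionary would be passed to the video model as conditioning, while the string would be inserted into the text prompt.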

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 1318323b202ca26c920a860534062dc7f20e3b10d13eb9825a890e26b5fde731
