This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The Kling Avatar 2.0 node generates broadcast-style digital human videos from a single reference photo and an audio file. The result is a talking avatar video; an optional text prompt can define the avatar’s actions, emotions, and camera movements.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| image | IMAGE | Yes | - | Avatar reference image. Width and height must be at least 300px. Aspect ratio must be between 1:2.5 and 2.5:1. |
| sound_file | AUDIO | Yes | - | Audio input. Must be between 2 and 300 seconds in duration. |
| mode | COMBO | Yes | "std", "pro" | The generation mode to use. |
| prompt | STRING | No | - | Optional prompt to define avatar actions, emotions, and camera movements. (default: empty string) |
| seed | INT | Yes | 0 to 2147483647 | Seed controls whether the node should re-run; results are non-deterministic regardless of seed. (default: 0) |
Note: The image and sound_file inputs have specific validation requirements. The image must be at least 300x300 pixels with an aspect ratio between 1:2.5 and 2.5:1. The audio file must be between 2 and 300 seconds long.
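
The limits listed above can be checked before queuing a job. The following is a minimal sketch, not the node's actual validation code: the function name `validate_avatar_inputs` and its arguments are hypothetical, the aspect ratio is assumed to mean width divided by height, and all bounds are assumed to be inclusive.

```python
# Hypothetical pre-flight check mirroring the documented limits.
# Nothing here is part of the ComfyUI or Kling API.

MIN_SIDE_PX = 300                 # minimum width and height
MIN_ASPECT = 1 / 2.5              # 1:2.5, assumed to be width / height
MAX_ASPECT = 2.5                  # 2.5:1
MIN_AUDIO_S = 2.0                 # shortest allowed audio, in seconds
MAX_AUDIO_S = 300.0               # longest allowed audio, in seconds
MODES = ("std", "pro")            # documented options for `mode`
MAX_SEED = 2147483647             # documented upper bound for `seed`


def validate_avatar_inputs(width, height, audio_seconds, mode="std", seed=0):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(
            f"image is {width}x{height}px; both sides must be at least {MIN_SIDE_PX}px"
        )

    # Only compute the ratio for positive dimensions to avoid dividing by zero.
    if width > 0 and height > 0:
        aspect = width / height
        if not (MIN_ASPECT <= aspect <= MAX_ASPECT):
            problems.append(
                f"aspect ratio {aspect:.2f} is outside the 1:2.5 to 2.5:1 range"
            )

    if not (MIN_AUDIO_S <= audio_seconds <= MAX_AUDIO_S):
        problems.append(
            f"audio is {audio_seconds}s long; it must be between 2 and 300 seconds"
        )

    if mode not in MODES:
        problems.append(f"mode {mode!r} is not one of {MODES}")

    if not (0 <= seed <= MAX_SEED):
        problems.append(f"seed {seed} is outside 0 to {MAX_SEED}")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once, for example an image that is both too small and too narrow.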

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| output | VIDEO | The generated digital human video. |

Source fingerprint (SHA-256): d9264e250c578dcb38612c192f8567a8f48c6624e030d8765b13bb71aae2d0b8