The Kling Avatar 2.0 node generates broadcast-style digital human videos from a single reference photo and an audio file. It creates a talking avatar video with an optional text prompt to define the avatar’s actions, emotions, and camera movements.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| image | Avatar reference image. Width and height must be at least 300px. Aspect ratio must be between 1:2.5 and 2.5:1. | IMAGE | Yes | - |
| sound_file | Audio input. Must be between 2 and 300 seconds in duration. | AUDIO | Yes | - |
| mode | The generation mode to use. | COMBO | Yes | "std", "pro" |
| prompt | Optional prompt to define avatar actions, emotions, and camera movements. (default: empty string) | STRING | No | - |
| seed | Seed controls whether the node should re-run; results are non-deterministic regardless of seed. (default: 0) | INT | Yes | 0 to 2147483647 |
Note: The image and sound_file inputs have specific validation requirements. The image must be at least 300x300 pixels with an aspect ratio between 1:2.5 and 2.5:1. The audio file must be between 2 and 300 seconds long.
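
The validation rules above can be checked before queuing the node. The following is a minimal sketch, not the node's actual implementation: the function name `validate_avatar_inputs` and its parameters are hypothetical, and it assumes the stated limits are inclusive and that the image size and audio duration are already known.

```python
def validate_avatar_inputs(width: int, height: int, audio_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the inputs pass.

    Hypothetical helper: the limits come from the documentation above,
    and treating them as inclusive is an assumption.
    """
    problems = []

    # Both sides of the reference image must be at least 300 px.
    if width < 300 or height < 300:
        problems.append(f"image is {width}x{height}; both sides must be >= 300 px")

    # Aspect ratio (width:height) must lie between 1:2.5 and 2.5:1.
    # Integer cross-multiplication avoids floating-point edge cases:
    # width/height <= 2.5  <=>  2*width <= 5*height, and symmetrically.
    if 2 * width > 5 * height or 2 * height > 5 * width:
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")

    # Audio must last between 2 and 300 seconds.
    if not (2 <= audio_seconds <= 300):
        problems.append(f"audio is {audio_seconds} s; must be between 2 and 300 s")

    return problems


if __name__ == "__main__":
    for issue in validate_avatar_inputs(width=200, height=1024, audio_seconds=10.0):
        print(issue)
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem with the inputs at once.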

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| output | The generated digital human video. | VIDEO |
