KlingAvatarNode - ComfyUI Built-in Node Documentation

The Kling Avatar 2.0 node generates a broadcast-style digital human video from a single reference photo and an audio file. The result is a talking avatar video; an optional text prompt can define the avatar's actions, emotions, and camera movements.

Inputs

| Parameter | Description | Data Type | Required | Range |
|-----------|-------------|-----------|----------|-------|
| `image` | Avatar reference image. Width and height must be at least 300px. Aspect ratio must be between 1:2.5 and 2.5:1. | IMAGE | Yes | - |
| `sound_file` | Audio input. Must be between 2 and 300 seconds in duration. | AUDIO | Yes | - |
| `mode` | The generation mode to use. | COMBO | Yes | `"std"`, `"pro"` |
| `prompt` | Optional prompt to define avatar actions, emotions, and camera movements. (default: empty string) | STRING | No | - |
| `seed` | Seed controls whether the node should re-run; results are non-deterministic regardless of seed. (default: 0) | INT | Yes | 0 to 2147483647 |

Note: The `image` and `sound_file` inputs have specific validation requirements. The image must be at least 300x300 pixels with an aspect ratio between 1:2.5 and 2.5:1. The audio file must be between 2 and 300 seconds long.
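The validation requirements above can be checked before a workflow is queued. The following is a minimal sketch, not the node's actual implementation: the function name `check_avatar_inputs` and its parameters are hypothetical, and it assumes that the stated limits are inclusive and that aspect ratio means width divided by height.

```python
# Hypothetical pre-flight check mirroring the documented limits.
# Assumption: all bounds are inclusive; the node's real validation may differ.

MIN_SIDE_PX = 300          # minimum width and height in pixels
MIN_AUDIO_SECONDS = 2.0    # shortest accepted audio clip
MAX_AUDIO_SECONDS = 300.0  # longest accepted audio clip


def check_avatar_inputs(width: int, height: int, audio_seconds: float) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs pass."""
    problems = []

    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(
            f"image is {width}x{height}px; both sides must be at least {MIN_SIDE_PX}px"
        )

    # Aspect ratio (width / height) must lie between 1:2.5 and 2.5:1, i.e. 2/5 and 5/2.
    # Integer cross-multiplication avoids floating-point rounding and division by zero.
    if 5 * width < 2 * height or 2 * width > 5 * height:
        problems.append("image aspect ratio must be between 1:2.5 and 2.5:1")

    if not (MIN_AUDIO_SECONDS <= audio_seconds <= MAX_AUDIO_SECONDS):
        problems.append(
            f"audio is {audio_seconds}s long; it must be between "
            f"{MIN_AUDIO_SECONDS:g} and {MAX_AUDIO_SECONDS:g} seconds"
        )

    return problems
```

Calling `check_avatar_inputs(1024, 1024, 10.0)` returns an empty list, while a 200x1024 image reports both a size problem and an aspect-ratio problem.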

Outputs

Output Name	Description	Data Type
`output`	The generated digital human video.	VIDEO
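To show how the inputs listed above fit together, here is a rough sketch of the node inside a ComfyUI API-format workflow, where each entry maps a node id to a `class_type` and its `inputs`, and a link is written as `[source_node_id, output_index]`. Several details are assumptions: the helper name `build_avatar_workflow` and the node ids are hypothetical, the `class_type` value is taken from this page's title, and the upstream `LoadImage` and `LoadAudio` nodes are only one possible way to supply the IMAGE and AUDIO inputs.

```python
# Sketch of an API-format workflow fragment; node ids "1", "2", "3" are arbitrary.
# Assumption: the node is registered under the class_type "KlingAvatarNode".

VALID_MODES = ("std", "pro")
MAX_SEED = 2147483647


def build_avatar_workflow(image_name: str, audio_name: str,
                          mode: str = "std", prompt: str = "", seed: int = 0) -> dict:
    """Build a workflow dict that wires an image and an audio file into the avatar node."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")

    return {
        # Upstream loaders (assumed); any node producing IMAGE / AUDIO would do.
        "1": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "2": {"class_type": "LoadAudio", "inputs": {"audio": audio_name}},
        "3": {
            "class_type": "KlingAvatarNode",
            "inputs": {
                "image": ["1", 0],       # link to output 0 of node "1"
                "sound_file": ["2", 0],  # link to output 0 of node "2"
                "mode": mode,
                "prompt": prompt,        # optional; empty string is the default
                "seed": seed,            # changing it forces a re-run
            },
        },
    }
```

The `output` of node `"3"` is a VIDEO, so a downstream node that accepts VIDEO would reference it as `["3", 0]`.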


Source fingerprint (SHA-256): 34f64d168d6c407d9ab474e2637513c172090a0f0c219675d32f8e70ff84f2ae
