KlingLipSyncAudioToVideoNode - ComfyUI Built-in Node Documentation

The Kling Lip Sync Audio to Video node synchronizes the mouth movements of a face in a video with the speech in an audio file. It analyzes the vocal patterns in the audio and adjusts the facial movements in the video to produce realistic lip-syncing. The process requires both a video containing a distinct face and an audio file with clearly distinguishable vocals.

Inputs

Parameter	Description	Data Type	Required	Range
`video`	The video file containing a face to be lip-synced	VIDEO	Yes	-
`audio`	The audio file containing vocals to sync with the video	AUDIO	Yes	-
`voice_language`	The language of the voice in the audio file (default: `"en"`)	COMBO	Yes	`"en"` `"zh"` `"es"` `"fr"` `"de"` `"it"` `"pt"` `"pl"` `"tr"` `"ru"` `"nl"` `"cs"` `"ar"` `"ja"` `"hu"` `"ko"`

Important Constraints:

The audio file should not be larger than 5MB
The video file should not be larger than 100MB
Video width and height should each be between 720px and 1920px
Video duration should be between 2 seconds and 10 seconds
The audio must contain clearly distinguishable vocals
The video must contain a distinct face
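
The measurable constraints above can be checked before a workflow is queued. The following is a minimal sketch, not part of ComfyUI or the Kling API: `MediaInfo` and `check_lip_sync_inputs` are hypothetical names, the media properties are assumed to have been read elsewhere (for example with a probing tool), and 1MB is assumed to mean 1024 * 1024 bytes. The face and vocal-clarity requirements cannot be verified this way and are left out.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the constraints listed above.
# Assumption: 1MB = 1024 * 1024 bytes (the source does not say which unit is meant).
MAX_AUDIO_BYTES = 5 * 1024 * 1024
MAX_VIDEO_BYTES = 100 * 1024 * 1024
MIN_SIDE_PX, MAX_SIDE_PX = 720, 1920
MIN_DURATION_S, MAX_DURATION_S = 2.0, 10.0

# Language codes from the `voice_language` range in the Inputs table.
VOICE_LANGUAGES = {
    "en", "zh", "es", "fr", "de", "it", "pt", "pl",
    "tr", "ru", "nl", "cs", "ar", "ja", "hu", "ko",
}


@dataclass
class MediaInfo:
    """Hypothetical container for properties measured outside this sketch."""
    audio_bytes: int
    video_bytes: int
    width: int
    height: int
    duration_s: float


def check_lip_sync_inputs(info: MediaInfo, voice_language: str = "en") -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if info.audio_bytes > MAX_AUDIO_BYTES:
        problems.append("audio file is larger than 5MB")
    if info.video_bytes > MAX_VIDEO_BYTES:
        problems.append("video file is larger than 100MB")
    # Width and height are checked separately so each failure is reported.
    for name, side in (("width", info.width), ("height", info.height)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(f"video {name} {side}px is outside 720px-1920px")
    if not MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S:
        problems.append("video duration is outside 2-10 seconds")
    if voice_language not in VOICE_LANGUAGES:
        problems.append(f"unsupported voice_language: {voice_language!r}")
    return problems
```

Returning a list of problems rather than raising on the first failure lets a caller report every violated limit at once, which is convenient when a clip needs to be re-encoded anyway.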

Outputs

Output Name	Description	Data Type
`output`	The processed video with lip-synced mouth movements	VIDEO
`video_id`	The unique identifier for the processed video	STRING
`duration`	The duration of the processed video	STRING


Source fingerprint (SHA-256): 2f88af3191ac4f9c5c9fa1aaa5b744a12620b66e55a1fe0ab16b8b3b61110128
