LTXVConcatAVLatent - ComfyUI Built-in Node Documentation

The LTXVConcatAVLatent node combines a video latent and an audio latent into a single, concatenated latent output. It merges the `samples` tensors from both inputs and, when a `noise_mask` is present on either input, the `noise_mask` tensors as well, so that both modalities can be processed together in later stages of a video generation pipeline.

Inputs

Parameter	Description	Data Type	Required	Range
`video_latent`	The latent representation of the video data.	LATENT	Yes	-
`audio_latent`	The latent representation of the audio data.	LATENT	Yes	-

Note: The `samples` tensors from the `video_latent` and `audio_latent` inputs are concatenated. If either input contains a `noise_mask`, that mask is used as-is; if the other input has no mask, a mask of ones with the same shape as that input's `samples` is created for it. The two resulting masks are then concatenated as well.
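The mask handling in the note above can be sketched in plain Python. This is a minimal illustration, not the node's actual implementation: `concat_av_latent` and `ones_like` are hypothetical names, nested lists stand in for tensors, and a `(video, audio)` tuple stands in for the concatenated result, since this page does not say how the two tensors are physically combined. The sketch also assumes that no `noise_mask` is emitted when neither input has one, which is one reading of "if applicable" in the Outputs table.

```python
def ones_like(nested):
    """Return a structure of the same shape filled with 1.0 (stand-in for a tensor op)."""
    if isinstance(nested, list):
        return [ones_like(item) for item in nested]
    return 1.0


def concat_av_latent(video_latent, audio_latent):
    """Hypothetical sketch of the behavior described in the note."""
    output = {
        # Stand-in for concatenation: keep the pair together, video first.
        "samples": (video_latent["samples"], audio_latent["samples"]),
    }
    video_mask = video_latent.get("noise_mask")
    audio_mask = audio_latent.get("noise_mask")
    # Assumption: masks are only produced when at least one input supplies one.
    if video_mask is not None or audio_mask is not None:
        if video_mask is None:
            video_mask = ones_like(video_latent["samples"])
        if audio_mask is None:
            audio_mask = ones_like(audio_latent["samples"])
        output["noise_mask"] = (video_mask, audio_mask)
    return output


# Video carries a mask; audio does not, so it receives a mask of ones.
video = {"samples": [[0.1, 0.2], [0.3, 0.4]], "noise_mask": [[0.0, 1.0], [1.0, 0.0]]}
audio = {"samples": [[0.5, 0.6, 0.7]]}
result = concat_av_latent(video, audio)
```

Here `result["noise_mask"]` holds the original video mask alongside a generated all-ones audio mask whose shape matches the audio `samples`.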

Outputs

Output Name	Description	Data Type
`latent`	A single latent dictionary containing the concatenated `samples` and, if applicable, the concatenated `noise_mask` from the video and audio inputs.	LATENT

This documentation was AI-generated.

Source fingerprint (SHA-256): ae619048b36d205d63e6054760c443f438b77d561e05458ce2d9d3f4a2024e74
