Concatenates the samples tensors from both inputs and, if present, their noise_mask tensors, preparing them for further processing in a video generation pipeline.
Inputs
| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| video_latent | The latent representation of the video data. | LATENT | Yes | |
| audio_latent | The latent representation of the audio data. | LATENT | Yes | |
The samples tensors from the video_latent and audio_latent inputs are concatenated. If either input contains a noise_mask, the masks are concatenated as well; an input that lacks a mask is given a mask of ones with the same shape as its samples. If neither input has a noise_mask, no mask is produced.
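The mask handling described above can be sketched roughly as follows. This is a minimal illustration, not the node's actual implementation: the function name `concat_av_latent` and the helper `ones_like` are hypothetical, tensors are modeled as flat Python lists, and joining them end to end is an assumption, since the concatenation axis or container is not stated here.

```python
def ones_like(values: list) -> list:
    """Hypothetical helper: a mask of ones with the same length as values."""
    return [1.0] * len(values)


def concat_av_latent(video_latent: dict, audio_latent: dict) -> dict:
    """Sketch of combining two LATENT dictionaries (assumed behavior)."""
    video_samples = video_latent["samples"]
    audio_samples = audio_latent["samples"]

    # Assumption: "concatenated" is modeled as joining flat lists end to end.
    output = {"samples": video_samples + audio_samples}

    video_mask = video_latent.get("noise_mask")
    audio_mask = audio_latent.get("noise_mask")

    # A mask is produced only when at least one input carries one.
    if video_mask is not None or audio_mask is not None:
        # Fill in whichever mask is missing with ones shaped like its samples.
        if video_mask is None:
            video_mask = ones_like(video_samples)
        if audio_mask is None:
            audio_mask = ones_like(audio_samples)
        output["noise_mask"] = video_mask + audio_mask

    return output


# Video has a mask, audio does not: audio receives a mask of ones.
video = {"samples": [0.1, 0.2, 0.3], "noise_mask": [0.0, 1.0, 1.0]}
audio = {"samples": [0.4, 0.5]}
result = concat_av_latent(video, audio)
```

In this sketch, `result["noise_mask"]` holds the video mask followed by two ones for the audio samples, and the key is absent entirely when neither input supplies a mask.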
Outputs
| Output Name | Description | Data Type |
|---|---|---|
| latent | A single latent dictionary containing the concatenated samples and, if applicable, the concatenated noise_mask from the video and audio inputs. | LATENT |