The VOIDInpaintConditioning node prepares the conditioning data needed for inpainting with CogVideoX models. It takes a source video and a preprocessed quadmask, encodes them through the VAE, and combines them into a 32-channel conditioning signal that the model uses to fill in the masked areas.

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| positive | The positive conditioning to be augmented with the inpainting latent information | CONDITIONING | Yes | - |
| negative | The negative conditioning to be augmented with the inpainting latent information | CONDITIONING | Yes | - |
| vae | The VAE model used to encode the mask and masked video into latent space | VAE | Yes | - |
| video | Source video frames [T, H, W, 3] | IMAGE | Yes | - |
| quadmask | Preprocessed quadmask from VOIDQuadmaskPreprocess [T, H, W] | MASK | Yes | - |
| width | The width to resize the video and mask to (default: 672) | INT | Yes | 16 to MAX_RESOLUTION (step: 8) |
| height | The height to resize the video and mask to (default: 384) | INT | Yes | 16 to MAX_RESOLUTION (step: 8) |
| length | Number of pixel frames to process. For CogVideoX-Fun-V1.5 (patch_size_t=2), latent_t must be even; lengths that produce an odd latent_t are rounded down (e.g. 49 → 45) (default: 45) | INT | Yes | 1 to MAX_RESOLUTION (step: 1) |
| batch_size | The batch size for the output noise latent (default: 1) | INT | Yes | 1 to 64 |
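
The rounding rule for length can be sketched in a few lines. This is not the node's actual code: it assumes the VAE compresses time by a factor of 4, so that latent_t = (length - 1) // 4 + 1, which matches the 49 → 45 example above. The function name `snap_length` and the `temporal_stride` parameter are hypothetical.

```python
def snap_length(length, temporal_stride=4, patch_size_t=2):
    """Return (adjusted_length, latent_t) for a requested pixel-frame count.

    Assumption: latent_t = (length - 1) // temporal_stride + 1, with a
    temporal stride of 4. The stride is not stated in this page; it is
    inferred from the documented 49 -> 45 example.
    """
    latent_t = (length - 1) // temporal_stride + 1
    remainder = latent_t % patch_size_t
    # Round latent_t down to a multiple of patch_size_t. Very short clips
    # (latent_t smaller than patch_size_t) are left unchanged here; how the
    # real node treats that edge case is not documented.
    if remainder and latent_t > remainder:
        latent_t -= remainder
    adjusted_length = (latent_t - 1) * temporal_stride + 1
    return adjusted_length, latent_t


print(snap_length(49))  # (45, 12): latent_t 13 is odd, so it drops to 12
print(snap_length(45))  # (45, 12): already even, unchanged
```

Under this assumption, picking a length of the form 8k + 5 (45, 53, 61, ...) avoids any rounding.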

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| positive | The positive conditioning with the inpainting latent information added | CONDITIONING |
| negative | The negative conditioning with the inpainting latent information added | CONDITIONING |
| latent | A zero-filled noise latent tensor with shape [batch_size, 16, latent_t, latent_h, latent_w] | LATENT |
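
To get a feel for the size of the latent output, the shape can be estimated from the input parameters. The sketch below assumes a spatial compression factor of 8 and a temporal compression factor of 4, neither of which is stated on this page, and it assumes length has already been adjusted so that latent_t is even. The function name `latent_shape` is hypothetical.

```python
def latent_shape(batch_size=1, width=672, height=384, length=45,
                 spatial_stride=8, temporal_stride=4, latent_channels=16):
    """Estimate [batch_size, 16, latent_t, latent_h, latent_w].

    Assumptions: the VAE downsamples height and width by 8 and time by 4.
    Only the channel count (16) comes from the outputs table.
    """
    latent_t = (length - 1) // temporal_stride + 1
    latent_h = height // spatial_stride
    latent_w = width // spatial_stride
    return (batch_size, latent_channels, latent_t, latent_h, latent_w)


print(latent_shape())  # (1, 16, 12, 48, 84) for the default inputs
```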

Source fingerprint (SHA-256): 47c40f8bb7527bc28178f3ad78e7bfb579e07d42efd0398d5760335c6b5b3c44