This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The WanDancerVideo node prepares conditioning data and an empty latent tensor for video generation with the WanDancer model. It combines positive and negative conditioning with optional inputs like a starting image, mask, CLIP vision embeddings, and audio features to control the generated video.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | The positive conditioning to guide video generation. |
| negative | CONDITIONING | Yes | - | The negative conditioning to guide video generation. |
| vae | VAE | Yes | - | The VAE used to encode the start image into the latent space. |
| width | INT | Yes | 16 to MAX_RESOLUTION (step: 16) | The width of the generated video in pixels (default: 480). |
| height | INT | Yes | 16 to MAX_RESOLUTION (step: 16) | The height of the generated video in pixels (default: 832). |
| length | INT | Yes | 1 to MAX_RESOLUTION (step: 4) | The number of frames in the generated video. Keep this at 149 for WanDancer (default: 149). |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | The CLIP vision embeddings for the first frame. |
| clip_vision_output_ref | CLIP_VISION_OUTPUT | No | - | The CLIP vision embeddings for the reference image. |
| start_image | IMAGE | No | - | The initial image(s) to be encoded. Can be any number of frames, up to the specified length. |
| mask | MASK | No | - | Image conditioning mask for the start image(s). White areas are kept, black areas are generated. Use it to regenerate only part of the frame. |
| audio_encoder_output | AUDIO_ENCODER_OUTPUT | No | - | The output from an audio encoder, providing audio features, fps, and inject scale for audio-conditional generation. |
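
The numeric ranges above can be checked before a workflow is queued. The following is a minimal sketch, not part of ComfyUI: the helper name `check_wan_dancer_size` is hypothetical, the value used for `MAX_RESOLUTION` is an assumption, and it assumes steps are counted from each parameter's minimum (so `length` takes the values 1, 5, 9, ..., 149, ...).

```python
# Assumed value for illustration; ComfyUI defines its own MAX_RESOLUTION constant.
MAX_RESOLUTION = 16384


def check_wan_dancer_size(width=480, height=832, length=149):
    """Return a list of problems; an empty list means the values fit the documented ranges.

    Hypothetical helper, not a ComfyUI function.
    """
    problems = []

    # width and height: 16 to MAX_RESOLUTION in steps of 16.
    for name, value in (("width", width), ("height", height)):
        if not 16 <= value <= MAX_RESOLUTION:
            problems.append(f"{name}={value} is outside 16..{MAX_RESOLUTION}")
        elif value % 16 != 0:
            problems.append(f"{name}={value} is not a multiple of 16")

    # length: 1 to MAX_RESOLUTION in steps of 4, counted from the minimum of 1.
    if not 1 <= length <= MAX_RESOLUTION:
        problems.append(f"length={length} is outside 1..{MAX_RESOLUTION}")
    elif (length - 1) % 4 != 0:
        problems.append(f"length={length} is not of the form 4*k + 1")

    return problems


print(check_wan_dancer_size())                        # defaults from the table
print(check_wan_dancer_size(width=481, length=150))   # two out-of-step values
```

The defaults from the table (480 x 832, 149 frames) pass this check, since 149 = 4 * 37 + 1.
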
Note on Parameter Constraints:
  • The start_image and mask inputs are optional and can be used together. When start_image is provided, it is encoded with the VAE and attached to the conditioning as a concat latent. If mask is also provided, it controls which parts of the start image are kept (white) and which are regenerated (black). If mask is not provided, the entire start image area is used as a conditioning guide.
  • The clip_vision_output and clip_vision_output_ref inputs are optional and can be used together to provide visual context for the first frame and a reference image.
  • The audio_encoder_output input is optional and provides audio features for audio-conditional generation.
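
To illustrate how the optional inputs can simply be left out, here is a sketch that builds a WanDancerVideo entry in ComfyUI's API-format workflow, where each node is a dictionary with `class_type` and `inputs`, and a connection is written as `[source_node_id, output_index]`. The helper `wan_dancer_node` and all node IDs ("6", "7", "39", "52", "60") are hypothetical placeholders; only the input names and defaults come from the table above.

```python
import json

# Optional input names, taken from the Inputs table.
OPTIONAL_INPUTS = {
    "clip_vision_output",
    "clip_vision_output_ref",
    "start_image",
    "mask",
    "audio_encoder_output",
}


def wan_dancer_node(positive, negative, vae, width=480, height=832, length=149, **optional):
    """Build an API-format node entry, omitting optional inputs that are not connected.

    Hypothetical helper for illustration; not a ComfyUI function.
    """
    unknown = set(optional) - OPTIONAL_INPUTS
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")

    inputs = {
        "positive": positive,
        "negative": negative,
        "vae": vae,
        "width": width,
        "height": height,
        "length": length,
    }
    # Only connected optional inputs are written into the node entry.
    inputs.update({name: link for name, link in optional.items() if link is not None})
    return {"class_type": "WanDancerVideo", "inputs": inputs}


# Placeholder node IDs; a real workflow would use the IDs of its own nodes.
node = wan_dancer_node(
    positive=["6", 0],
    negative=["7", 0],
    vae=["39", 0],
    start_image=["52", 0],
    audio_encoder_output=["60", 0],
)
print(json.dumps(node, indent=2))
```

In this example no mask is connected, so the `mask` key is absent and the whole start image acts as the conditioning guide, as described in the first note.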

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | The positive conditioning with any additional data (concat latent, CLIP vision, audio) attached. |
| negative | CONDITIONING | The negative conditioning with any additional data (concat latent, CLIP vision, audio) attached. |
| latent | LATENT | An empty latent tensor with dimensions matching the specified video length, height, and width. |
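
As a rough guide to the size of the empty latent, the sketch below estimates its shape from width, height, and length. This page does not state the compression factors, so the values used here (16 channels, 8x spatial and 4x temporal compression, as commonly used with Wan-family VAEs) are assumptions, and `wan_latent_shape` is a hypothetical helper rather than the node's actual code.

```python
def wan_latent_shape(width=480, height=832, length=149, batch_size=1,
                     channels=16, spatial_factor=8, temporal_factor=4):
    """Estimate (batch, channels, frames, height, width) of the empty latent.

    All compression factors are assumptions, not values confirmed by this page.
    """
    # The first frame maps to one latent frame; each further group of
    # `temporal_factor` frames adds one more.
    latent_frames = (length - 1) // temporal_factor + 1
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )


print(wan_latent_shape())  # (1, 16, 38, 104, 60) under the assumed factors
```

Under these assumptions, the default 149 frames map to 38 latent frames, which is consistent with the step of 4 on the length input.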

Source fingerprint (SHA-256): 0a75b24c8e5c164d81b08eb438862d94d4409ece8dc22c126979347e2350c828