The WanDancerVideo node prepares conditioning data and an empty latent tensor for video generation with the WanDancer model. It combines positive and negative conditioning with optional inputs like a starting image, mask, CLIP vision embeddings, and audio features to control the generated video.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| positive | The positive conditioning that guides video generation. | CONDITIONING | Yes | |
| negative | The negative conditioning, describing content to steer the video away from. | CONDITIONING | Yes | |
| vae | The VAE used to encode the start image into the latent space. | VAE | Yes | |
| width | The width of the generated video in pixels (default: 480). | INT | Yes | 16 to MAX_RESOLUTION (step: 16) |
| height | The height of the generated video in pixels (default: 832). | INT | Yes | 16 to MAX_RESOLUTION (step: 16) |
| length | The number of frames in the generated video. Keep this at 149 for WanDancer (default: 149). | INT | Yes | 1 to MAX_RESOLUTION (step: 4) |
| clip_vision_output | The CLIP vision embeddings for the first frame. | CLIP_VISION_OUTPUT | No | |
| clip_vision_output_ref | The CLIP vision embeddings for the reference image. | CLIP_VISION_OUTPUT | No | |
| start_image | The initial image(s) to encode. Can contain any number of frames, up to the specified length. | IMAGE | No | |
| mask | Conditioning mask for the start image(s). White areas are kept and black areas are generated. Use it for local (region-limited) generation. | MASK | No | |
| audio_encoder_output | The output of an audio encoder, providing audio features, fps, and inject scale for audio-conditioned generation. | AUDIO_ENCODER_OUTPUT | No | |
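As a rough sanity check on the width, height, and length values, the sketch below predicts the shape of the empty latent output. It assumes a Wan 2.1-style VAE (16 latent channels, 8x spatial compression, 4x temporal compression with the first frame kept on its own); these factors are not stated on this page and may differ for WanDancer. Under that assumption the default 149 frames (4 × 37 + 1) map to 38 latent frames. The function name `latent_shape` is hypothetical.

```python
# Sketch: validate WanDancerVideo size inputs and predict the empty latent's shape.
# ASSUMED compression factors (Wan 2.1-style VAE); not confirmed for WanDancer.
LATENT_CHANNELS = 16  # assumed number of latent channels
SPATIAL_FACTOR = 8    # assumed pixels per latent cell along height and width
TEMPORAL_FACTOR = 4   # assumed video frames per latent frame (after the first)


def latent_shape(width: int, height: int, length: int) -> tuple:
    """Return (batch, channels, latent_frames, latent_height, latent_width)."""
    # The node's width/height inputs step by 16, so enforce that here too.
    if width % 16 or height % 16:
        raise ValueError("width and height must be multiples of 16")
    if length < 1:
        raise ValueError("length must be at least 1")
    # First frame maps to its own latent frame; each further 4 frames map to one.
    latent_frames = (length - 1) // TEMPORAL_FACTOR + 1
    # Batch of 1: the node lists no batch-size input.
    return (1, LATENT_CHANNELS, latent_frames,
            height // SPATIAL_FACTOR, width // SPATIAL_FACTOR)


print(latent_shape(480, 832, 149))  # (1, 16, 38, 104, 60)
```

If the real WanDancer VAE uses different factors, only the three constants need to change.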
Note on Parameter Constraints:
  • The start_image and mask inputs are both optional and can be used together. When start_image is provided, it is encoded with the VAE and attached to the conditioning as a concat latent. If mask is also provided, it controls which parts of the start image are kept (white) and which are regenerated (black). If mask is not provided, the entire start image is used as conditioning.
  • The clip_vision_output and clip_vision_output_ref inputs are optional and can be used together to provide visual context for the first frame and a reference image.
  • The audio_encoder_output input is optional and provides audio features for audio-conditional generation.
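To make the start_image and mask semantics concrete, the following sketch pads a short start clip to the requested length and builds a per-pixel keep mask (1.0 = white = keep, 0.0 = black = generate). It is an illustration under stated assumptions, not the node's actual code: the helper `build_start_frames` is hypothetical, and filling frames that have no start image with mid-gray (0.5) is an assumption.

```python
from typing import Optional, Tuple

import numpy as np


def build_start_frames(start_image: np.ndarray, length: int,
                       mask: Optional[np.ndarray] = None
                       ) -> Tuple[np.ndarray, np.ndarray]:
    """Pad start frames to `length` and build a per-pixel keep mask.

    start_image: (frames, H, W, C) floats in [0, 1] (ComfyUI IMAGE layout).
    mask: optional (1 or frames, H, W) floats; 1.0 = keep (white),
          0.0 = generate (black). A single-frame mask is broadcast.
    """
    n, h, w, c = start_image.shape
    n = min(n, length)  # frames beyond `length` are ignored
    # ASSUMPTION: frames without a start image are filled with mid-gray.
    frames = np.full((length, h, w, c), 0.5, dtype=np.float32)
    frames[:n] = start_image[:n]
    # Padded frames are fully generated (keep = 0).
    keep = np.zeros((length, h, w), dtype=np.float32)
    # No mask: keep the whole start image; otherwise use the mask values.
    keep[:n] = 1.0 if mask is None else mask[:n]
    return frames, keep


# Local generation: keep one start frame except for a black square.
start = np.ones((1, 64, 64, 3), dtype=np.float32)
region = np.ones((1, 64, 64), dtype=np.float32)
region[:, 16:48, 16:48] = 0.0  # black = regenerate this area
frames, keep = build_start_frames(start, length=149, mask=region)
print(frames.shape, keep.shape)  # (149, 64, 64, 3) (149, 64, 64)
```

With no mask, every pixel of the provided frames is marked as kept, which corresponds to using the entire start image as conditioning.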

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| positive | The positive conditioning with any additional data (concat latent, CLIP vision, audio) attached. | CONDITIONING |
| negative | The negative conditioning with any additional data (concat latent, CLIP vision, audio) attached. | CONDITIONING |
| latent | An empty latent tensor with dimensions matching the specified video length, height, and width. | LATENT |
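In ComfyUI, a CONDITIONING value is a list of `[embedding, options_dict]` pairs, and nodes like this one "attach" data by adding keys to each options dict. The pure-Python sketch below shows that pattern without mutating the input. The function `attach_to_conditioning` and the key names in the example are hypothetical; the actual keys WanDancerVideo writes are not documented here. ComfyUI ships a similar helper (`node_helpers.conditioning_set_values`), which this sketch does not call.

```python
def attach_to_conditioning(conditioning, values):
    """Return a copy of `conditioning` with `values` merged into each options dict."""
    out = []
    for embedding, options in conditioning:
        new_options = dict(options)  # shallow copy so the input is not mutated
        new_options.update(values)   # add or overwrite the attached entries
        out.append([embedding, new_options])
    return out


# Placeholder strings stand in for real tensors.
positive = [["<text embedding>", {}]]
negative = [["<negative embedding>", {}]]
extra = {"concat_latent_image": "<encoded start frames>",  # hypothetical key
         "concat_mask": "<keep mask>"}                      # hypothetical key
positive = attach_to_conditioning(positive, extra)
negative = attach_to_conditioning(negative, extra)
print(sorted(positive[0][1]))  # ['concat_latent_image', 'concat_mask']
```

Attaching the same extras to both positive and negative keeps the two branches structurally aligned, which is why both outputs list the same attached data.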
This documentation was AI-generated.

Source fingerprint (SHA-256): 0a75b24c8e5c164d81b08eb438862d94d4409ece8dc22c126979347e2350c828