This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The WanDancerVideo node prepares conditioning data and an empty latent tensor for video generation with the WanDancer model. It combines positive and negative conditioning with optional inputs like a starting image, mask, CLIP vision embeddings, and audio features to control the generated video.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| `positive` | CONDITIONING | Yes | - | The positive conditioning to guide video generation. |
| `negative` | CONDITIONING | Yes | - | The negative conditioning to guide video generation. |
| `vae` | VAE | Yes | - | The VAE used to encode the start image into the latent space. |
| `width` | INT | Yes | 16 to MAX_RESOLUTION (step: 16) | The width of the generated video in pixels (default: 480). |
| `height` | INT | Yes | 16 to MAX_RESOLUTION (step: 16) | The height of the generated video in pixels (default: 832). |
| `length` | INT | Yes | 1 to MAX_RESOLUTION (step: 4) | The number of frames in the generated video. Should stay 149 for WanDancer (default: 149). |
| `clip_vision_output` | CLIP_VISION_OUTPUT | No | - | The CLIP vision embeddings for the first frame. |
| `clip_vision_output_ref` | CLIP_VISION_OUTPUT | No | - | The CLIP vision embeddings for the reference image. |
| `start_image` | IMAGE | No | - | The initial image(s) to be encoded. Can be any number of frames, up to the specified length. |
| `mask` | MASK | No | - | Image conditioning mask for the start image(s). White areas are kept, black areas are generated. Used for local generations. |
| `audio_encoder_output` | AUDIO_ENCODER_OUTPUT | No | - | The output from an audio encoder, providing audio features, fps, and inject scale for audio-conditional generation. |

- The `start_image` and `mask` inputs are optional but can be used together. When `start_image` is provided, it is encoded and concatenated with the latent. If `mask` is also provided, it controls which parts of the start image are kept (white) and which are regenerated (black). If `mask` is not provided, the entire start image area is used as a conditioning guide.
- The `clip_vision_output` and `clip_vision_output_ref` inputs are optional and can be used together to provide visual context for the first frame and a reference image.
- The `audio_encoder_output` input is optional and provides audio features for audio-conditional generation.

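
The ranges in the inputs table can be checked before values are passed to the node, for example when building a workflow from a script. The following is a minimal sketch, not part of the node itself. The helper names `snap` and `snap_video_params` are hypothetical, the value of `MAX_RESOLUTION` is an assumption that should be checked against your ComfyUI install, and the sketch assumes that valid values sit on a grid starting at each parameter's minimum.

```python
# Assumption: stands in for ComfyUI's MAX_RESOLUTION constant; verify locally.
MAX_RESOLUTION = 16384


def snap(value, minimum, step, maximum=MAX_RESOLUTION):
    """Clamp value to [minimum, maximum], then round down to minimum + k * step."""
    value = max(minimum, min(value, maximum))
    return minimum + ((value - minimum) // step) * step


def snap_video_params(width, height, length):
    """Hypothetical helper: snap the three INT inputs to the ranges in the table."""
    return {
        "width": snap(width, 16, 16),    # 16 to MAX_RESOLUTION, step 16
        "height": snap(height, 16, 16),  # 16 to MAX_RESOLUTION, step 16
        "length": snap(length, 1, 4),    # 1 to MAX_RESOLUTION, step 4
    }


params = snap_video_params(width=500, height=832, length=150)
print(params)
```

With these inputs the width is rounded down to 496 and the length to 149, while the height of 832 already sits on the grid and is left unchanged.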
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| `positive` | CONDITIONING | The positive conditioning with any additional data (concat latent, CLIP vision, audio) attached. |
| `negative` | CONDITIONING | The negative conditioning with any additional data (concat latent, CLIP vision, audio) attached. |
| `latent` | LATENT | An empty latent tensor with dimensions matching the specified video length, height, and width. |
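
To get a feel for how the `latent` output relates to `width`, `height`, and `length`, the shape can be estimated by hand. This page does not state the compression factors, so the sketch below assumes the ones commonly used for Wan video latents: 16 channels, 8x spatial downscaling, and 4x temporal downscaling with the first frame kept separately. The function name `wan_latent_shape` and its `batch_size` argument are hypothetical, so treat the result as an estimate rather than the node's guaranteed output.

```python
def wan_latent_shape(width, height, length, batch_size=1,
                     channels=16, spatial_factor=8, temporal_factor=4):
    """Estimate the empty latent's shape as (batch, channels, frames, height, width).

    All factors are assumptions; they are not confirmed by this page.
    """
    # The first frame gets its own latent frame; every further group of
    # temporal_factor video frames adds one more.
    latent_frames = (length - 1) // temporal_factor + 1
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )


# Defaults from the inputs table: 480 x 832 pixels, 149 frames.
print(wan_latent_shape(480, 832, 149))  # (1, 16, 38, 104, 60)
```

Under these assumptions the step of 4 on `length` lines up with the temporal factor: each valid length of the form 1 + 4k maps to exactly k + 1 latent frames, which is 38 for the default of 149.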