- Purpose: Prepare the conditioning information needed for video generation, using the Wan 2.1 Fun Control model.
Inputs
| Parameter Name | Description | Required | Data Type | Default Value |
|---|---|---|---|---|
| positive | Standard ComfyUI positive conditioning data, typically from a “CLIP Text Encode” node. The positive prompt describes the content, subject matter, and artistic style that the user envisions for the generated video. | Yes | CONDITIONING | N/A |
| negative | Standard ComfyUI negative conditioning data, typically generated by a “CLIP Text Encode” node. The negative prompt specifies elements, styles, or artifacts that the user wants to avoid in the generated video. | Yes | CONDITIONING | N/A |
| vae | Requires a VAE (Variational Autoencoder) model compatible with the Wan 2.1 Fun model family, used for encoding and decoding image/video data. | Yes | VAE | N/A |
| width | Width of the output video frames in pixels. Minimum 16, maximum nodes.MAX_RESOLUTION, step 16. | Yes | INT | 832 |
| height | Height of the output video frames in pixels. Minimum 16, maximum nodes.MAX_RESOLUTION, step 16. | Yes | INT | 480 |
| length | Total number of frames in the generated video. Minimum 1, maximum nodes.MAX_RESOLUTION, step 4. | Yes | INT | 81 |
| batch_size | Number of videos generated in a single batch. Minimum 1, maximum 4096. | Yes | INT | 1 |
| clip_vision_output | Visual features extracted by a CLIP vision model, used to guide visual style and content. | No | CLIP_VISION_OUTPUT | None |
| start_image | An initial image that influences the beginning of the generated video. | No | IMAGE | None |
| control_video | A preprocessed ControlNet reference video that guides the motion and, potentially, the structure of the generated video. | No | IMAGE | None |
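
The numeric ranges in the table can be checked before values are passed to the node. The following is a minimal sketch, assuming valid values take the form minimum + k × step (which is how the ranges above read, not something confirmed from the node's source). The helper names `snap_to_step` and `normalize_inputs` are hypothetical, and `ASSUMED_MAX_RESOLUTION` is a stand-in for `nodes.MAX_RESOLUTION` whose value should be checked against your ComfyUI install.

```python
# Assumption: stand-in for ComfyUI's nodes.MAX_RESOLUTION; verify locally.
ASSUMED_MAX_RESOLUTION = 16384


def snap_to_step(value: int, minimum: int, maximum: int, step: int) -> int:
    """Clamp value to [minimum, maximum], then round down to minimum + k * step."""
    clamped = max(minimum, min(value, maximum))
    return minimum + ((clamped - minimum) // step) * step


def normalize_inputs(width: int, height: int, length: int, batch_size: int,
                     max_resolution: int = ASSUMED_MAX_RESOLUTION) -> dict:
    """Hypothetical helper: coerce user values into the ranges listed in the table."""
    return {
        # width and height: minimum 16, step 16
        "width": snap_to_step(width, 16, max_resolution, 16),
        "height": snap_to_step(height, 16, max_resolution, 16),
        # length: minimum 1, step 4 (1, 5, 9, ..., 81, ...)
        "length": snap_to_step(length, 1, max_resolution, 4),
        # batch_size: minimum 1, maximum 4096, no step constraint
        "batch_size": max(1, min(batch_size, 4096)),
    }


if __name__ == "__main__":
    print(normalize_inputs(850, 500, 80, 0))
```

With the defaults (832, 480, 81, 1) the values pass through unchanged; off-grid values such as a width of 850 or a length of 80 are rounded down to the nearest allowed value.
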
Outputs
| Parameter Name | Description | Data Type |
|---|---|---|
| positive | Positive conditioning augmented with the encoded start_image and control_video, stored as concat_latent_image. | CONDITIONING |
| negative | Negative conditioning augmented with the same concat_latent_image. | CONDITIONING |
| latent | A dictionary whose “samples” key holds an empty latent tensor. | LATENT |
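
To give a sense of the size of the empty latent, here is a rough sketch of how its shape could be derived from the inputs. It assumes a Wan 2.1-style VAE with 16 latent channels, 8× spatial compression, and 4× temporal compression. None of these factors are stated on this page, so treat them as assumptions, and `empty_latent_shape` as a hypothetical helper rather than the node's actual code.

```python
def empty_latent_shape(width: int, height: int, length: int, batch_size: int = 1,
                       channels: int = 16, spatial_factor: int = 8,
                       temporal_factor: int = 4) -> tuple:
    """Return (batch, channels, latent_frames, latent_height, latent_width).

    All compression factors are assumptions about the VAE, not documented values.
    """
    # The first frame maps to one latent frame; each further group of
    # `temporal_factor` frames adds one more.
    latent_frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


if __name__ == "__main__":
    # In ComfyUI the "samples" entry would be a zero-filled tensor of this shape.
    print(empty_latent_shape(832, 480, 81))
```

Under these assumptions, the default inputs (832 × 480, 81 frames, batch size 1) correspond to a latent of shape (1, 16, 21, 60, 104).
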