Inputs
| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| positive | The positive conditioning input. | CONDITIONING | Yes | - |
| negative | The negative conditioning input. | CONDITIONING | Yes | - |
| vae | The VAE model used for encoding images and video frames. | VAE | Yes | - |
| width | The width of the output video in pixels (default: 512). Must be divisible by 8. | INT | Yes | 32 to MAX_RESOLUTION |
| height | The height of the output video in pixels (default: 896). Must be divisible by 8. | INT | Yes | 32 to MAX_RESOLUTION |
| length | The number of frames in the video (default: 81). Expected to be of the form 4n + 1 (for example 81), matching the latent frame count ((length - 1) // 4) + 1. | INT | Yes | 1 to MAX_RESOLUTION |
| batch_size | The number of videos to generate in a batch (default: 1). | INT | Yes | 1 to 4096 |
| clip_vision_output | Optional CLIP vision output for conditioning. | CLIP_VISION_OUTPUT | No | - |
| reference_image | An optional reference image for conditioning. | IMAGE | No | - |
| pose_video | Optional video used for pose conditioning. It is downscaled to half the resolution of the main video before encoding. | IMAGE | No | - |
| pose_strength | Strength of the pose latent (default: 1.0). | FLOAT | Yes | 0.0 to 10.0 |
| pose_start | Start step at which pose conditioning is applied (default: 0.0). | FLOAT | Yes | 0.0 to 1.0 |
| pose_end | End step at which pose conditioning is applied (default: 1.0). | FLOAT | Yes | 0.0 to 1.0 |
Only the first length frames of pose_video are used, and the video is downscaled to half the resolution of the main video before encoding. Only the first image in the reference_image batch is used. When reference_image is provided, the negative conditioning receives a zero-filled latent of the same size as the reference latent. When clip_vision_output is provided, it is applied to both the positive and the negative conditioning.
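
The pose video handling described above can be sketched in plain Python. This is a minimal illustration, not the node's actual code: the function name `plan_pose_preprocessing` is hypothetical, integer halving of the resolution is an assumption (the exact rounding and resampling method are not specified here), and using fewer frames when the pose video is shorter than length is also an assumption.

```python
def plan_pose_preprocessing(num_pose_frames, width, height, length):
    """Return (frames_used, pose_width, pose_height) for a pose video.

    Hypothetical helper: it only mirrors the two documented rules
    (frame truncation and half-resolution downscaling).
    """
    # Only the first `length` frames of the pose video are used.
    # Assumption: a shorter pose video simply contributes fewer frames.
    frames_used = min(num_pose_frames, length)

    # The pose video is downscaled to half the main video resolution.
    # Assumption: integer halving of each dimension.
    pose_width = width // 2
    pose_height = height // 2
    return frames_used, pose_width, pose_height


if __name__ == "__main__":
    # Default node settings with a 120-frame pose video.
    print(plan_pose_preprocessing(120, width=512, height=896, length=81))
```

With the default width, height, and length, a 120-frame pose video would be cut to 81 frames and encoded at 256 x 448 under these assumptions.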
Outputs
| Output Name | Description | Data Type |
|---|---|---|
| positive | The modified positive conditioning, potentially containing embedded reference image latents, CLIP vision output, or pose video latents. | CONDITIONING |
| negative | The modified negative conditioning, potentially containing embedded reference image latents, CLIP vision output, or pose video latents. | CONDITIONING |
| latent | An empty latent tensor of shape [batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8]. | LATENT |
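
To make the latent shape formula concrete, the short sketch below evaluates it for given inputs. The function name `empty_latent_shape` is hypothetical and exists only for illustration; the arithmetic is taken directly from the shape listed in the table.

```python
def empty_latent_shape(batch_size, length, height, width, channels=16):
    """Compute the documented shape of the empty latent tensor.

    Hypothetical helper: it applies the formula from the Outputs table,
    with 16 latent channels, 4x temporal and 8x spatial compression.
    """
    latent_frames = ((length - 1) // 4) + 1
    return [batch_size, channels, latent_frames, height // 8, width // 8]


if __name__ == "__main__":
    # Default settings: batch_size=1, length=81, height=896, width=512.
    print(empty_latent_shape(1, 81, 896, 512))  # [1, 16, 21, 112, 64]
```

With the default settings, 81 frames map to 21 latent frames, and the 512 x 896 output maps to a 64 x 112 latent grid.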
Source fingerprint (SHA-256):
01c0912474602c33fa0c3e277db90e0eb83edbcea307a860921bab486d267cc8