The WanSCAILToVideo node prepares conditioning and an empty latent space for video generation. It processes optional inputs like reference images, pose videos, and CLIP vision outputs, embedding them into the positive and negative conditioning for a video model. The node outputs the modified conditioning and a blank latent tensor of the specified video dimensions.

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `positive` | The positive conditioning input. | CONDITIONING | Yes | - |
| `negative` | The negative conditioning input. | CONDITIONING | Yes | - |
| `vae` | The VAE model used for encoding images and video frames. | VAE | Yes | - |
| `width` | The width of the output video in pixels (default: 512). Must be divisible by 8. | INT | Yes | 32 to MAX_RESOLUTION |
| `height` | The height of the output video in pixels (default: 896). Must be divisible by 8. | INT | Yes | 32 to MAX_RESOLUTION |
| `length` | The number of frames in the video (default: 81). Should be of the form 4n + 1 (for example 77 or 81), matching the `((length - 1) // 4) + 1` latent frame count. | INT | Yes | 1 to MAX_RESOLUTION |
| `batch_size` | The number of videos to generate in a batch (default: 1). | INT | Yes | 1 to 4096 |
| `clip_vision_output` | Optional CLIP vision output for conditioning. | CLIP_VISION_OUTPUT | No | - |
| `reference_image` | An optional reference image for conditioning. | IMAGE | No | - |
| `pose_video` | Video used for pose conditioning. Downscaled to half the resolution of the main video. | IMAGE | No | - |
| `pose_strength` | Strength of the pose latent (default: 1.0). | FLOAT | Yes | 0.0 to 10.0 |
| `pose_start` | Point in the sampling schedule at which pose conditioning starts, as a fraction of the steps (default: 0.0). | FLOAT | Yes | 0.0 to 1.0 |
| `pose_end` | Point in the sampling schedule at which pose conditioning ends, as a fraction of the steps (default: 1.0). | FLOAT | Yes | 0.0 to 1.0 |

Note: Only the first `length` frames of `pose_video` are used, and only the first image in the `reference_image` batch is used. When `reference_image` is provided, the negative conditioning receives a zero-filled latent of the same size in place of the encoded reference. When `clip_vision_output` is provided, it is applied to both the positive and negative conditioning. The `pose_video` frames are downscaled to half the resolution of the main video before encoding.
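
The documented defaults and ranges can be checked before a workflow is submitted. The following is a minimal sketch, not part of ComfyUI: `build_scail_inputs` is a hypothetical helper, the value of `MAX_RESOLUTION` is an assumed stand-in for ComfyUI's constant, and the node ids used in the links are placeholders. It returns a node entry in the `class_type` / `inputs` layout used by API-format workflows.

```python
MAX_RESOLUTION = 16384  # assumption: stand-in for ComfyUI's MAX_RESOLUTION constant

OPTIONAL_INPUTS = {"clip_vision_output", "reference_image", "pose_video"}


def build_scail_inputs(positive, negative, vae, width=512, height=896, length=81,
                       batch_size=1, pose_strength=1.0, pose_start=0.0,
                       pose_end=1.0, **optional):
    """Hypothetical helper: check the documented ranges, then build a node entry."""
    for name, value in (("width", width), ("height", height)):
        if not 32 <= value <= MAX_RESOLUTION:
            raise ValueError(f"{name} must be between 32 and {MAX_RESOLUTION}")
        if value % 8 != 0:
            raise ValueError(f"{name} must be divisible by 8")
    # Only the documented range is checked for length and batch_size.
    if not 1 <= length <= MAX_RESOLUTION:
        raise ValueError(f"length must be between 1 and {MAX_RESOLUTION}")
    if not 1 <= batch_size <= 4096:
        raise ValueError("batch_size must be between 1 and 4096")
    if not 0.0 <= pose_strength <= 10.0:
        raise ValueError("pose_strength must be between 0.0 and 10.0")
    for name, value in (("pose_start", pose_start), ("pose_end", pose_end)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    unknown = set(optional) - OPTIONAL_INPUTS
    if unknown:
        raise TypeError(f"unknown optional inputs: {sorted(unknown)}")

    inputs = {
        "positive": positive, "negative": negative, "vae": vae,
        "width": width, "height": height, "length": length,
        "batch_size": batch_size, "pose_strength": pose_strength,
        "pose_start": pose_start, "pose_end": pose_end,
    }
    inputs.update(optional)  # optional inputs are included only when supplied
    return {"class_type": "WanSCAILToVideo", "inputs": inputs}


# Links are [source_node_id, output_index]; the ids "6", "7", "39", "52" are placeholders.
node = build_scail_inputs(["6", 0], ["7", 0], ["39", 0], pose_video=["52", 0])
```

Leaving `reference_image` and `clip_vision_output` out of the call keeps them out of the entry entirely, which mirrors their optional status in the table above.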

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `positive` | The modified positive conditioning, potentially containing embedded reference image latents, CLIP vision output, or pose video latents. | CONDITIONING |
| `negative` | The modified negative conditioning, potentially containing a zero-filled reference latent, CLIP vision output, or pose video latents. | CONDITIONING |
| `latent` | An empty latent tensor of shape `[batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8]`. | LATENT |

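
To make the latent shape formula concrete, here is a small sketch that evaluates it for the default settings. The function name `empty_latent_shape` is illustrative and not part of the node's code; only the formula itself comes from the output description.

```python
def empty_latent_shape(batch_size, length, height, width, channels=16):
    """Evaluate [batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8]."""
    latent_frames = (length - 1) // 4 + 1  # 4 video frames per latent frame, plus the first
    return (batch_size, channels, latent_frames, height // 8, width // 8)


# Defaults from the Inputs table: batch_size=1, length=81, height=896, width=512.
print(empty_latent_shape(1, 81, 896, 512))  # (1, 16, 21, 112, 64)
```

With the defaults, 81 frames collapse to 21 latent frames, and the 512 x 896 video maps to a 64 x 112 latent grid.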
