This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The WanSCAILToVideo node prepares conditioning and an empty latent for video generation. It embeds optional inputs (a reference image, a pose video, and a CLIP vision output) into the positive and negative conditioning for a video model, then outputs the modified conditioning together with an empty latent tensor sized for the requested video dimensions.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | The positive conditioning input. |
| negative | CONDITIONING | Yes | - | The negative conditioning input. |
| vae | VAE | Yes | - | The VAE model used for encoding images and video frames. |
| width | INT | Yes | 32 to MAX_RESOLUTION | The width of the output video in pixels (default: 512). Must be divisible by 8. |
| height | INT | Yes | 32 to MAX_RESOLUTION | The height of the output video in pixels (default: 896). Must be divisible by 8. |
| length | INT | Yes | 1 to MAX_RESOLUTION | The number of frames in the video (default: 81). Values of the form 4n + 1 (for example 81) map to exactly n + 1 latent frames. |
| batch_size | INT | Yes | 1 to 4096 | The number of videos to generate in a batch (default: 1). |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision output for conditioning. |
| reference_image | IMAGE | No | - | An optional reference image for conditioning. |
| pose_video | IMAGE | No | - | Video used for pose conditioning. It is downscaled to half the resolution of the main video. |
| pose_strength | FLOAT | Yes | 0.0 to 10.0 | Strength of the pose latent (default: 1.0). |
| pose_start | FLOAT | Yes | 0.0 to 1.0 | Point in the sampling schedule, as a fraction from 0.0 to 1.0, at which pose conditioning begins (default: 0.0). |
| pose_end | FLOAT | Yes | 0.0 to 1.0 | Point in the sampling schedule, as a fraction from 0.0 to 1.0, at which pose conditioning ends (default: 1.0). |

Note: Only the first length frames of pose_video are used, and they are downscaled to half the resolution of the main video before encoding. Only the first image of the reference_image batch is used. When reference_image is provided, the negative conditioning uses a zero-filled latent of the same size. When clip_vision_output is provided, it is applied to both the positive and negative conditioning.
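
The frame and resolution rules in the note above can be sketched in plain Python. This is a minimal illustration rather than the node's actual implementation: the function name plan_pose_preprocessing is hypothetical, and halving with integer division is an assumption, since the exact resize method is not stated here.

```python
def plan_pose_preprocessing(num_pose_frames, length, width, height):
    """Return (frames_used, target_width, target_height) for a pose video.

    Hypothetical helper for illustration only; it mirrors the rules in the
    note, not the node's source code.
    """
    # Only the first `length` frames of the pose video are used.
    frames_used = min(num_pose_frames, length)
    # The pose video is downscaled to half the main video's resolution.
    # Integer halving is an assumption made for this sketch.
    target_width = width // 2
    target_height = height // 2
    return frames_used, target_width, target_height


# A 120-frame pose clip with the node's default width, height, and length.
print(plan_pose_preprocessing(120, 81, 512, 896))
```

With the default settings, a pose clip longer than 81 frames is cut to 81 frames and resized to 256 x 448 under these assumptions.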

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | The modified positive conditioning, potentially containing embedded reference image latents, CLIP vision output, or pose video latents. |
| negative | CONDITIONING | The modified negative conditioning, potentially containing embedded reference image latents, CLIP vision output, or pose video latents. |
| latent | LATENT | An empty latent tensor of shape [batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8]. |
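
As a worked example of the latent shape formula given for the latent output, the following sketch computes the shape for a set of inputs. The helper name latent_shape is hypothetical; the channel count of 16 and the divisors 4 and 8 are taken directly from the formula.

```python
def latent_shape(batch_size, length, height, width):
    """Compute the empty latent's shape from the documented formula.

    Hypothetical helper for illustration; it only evaluates
    [batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8].
    """
    latent_frames = ((length - 1) // 4) + 1  # temporal compression by 4
    return (batch_size, 16, latent_frames, height // 8, width // 8)


# Node defaults: batch_size=1, length=81, height=896, width=512.
print(latent_shape(1, 81, 896, 512))  # (1, 16, 21, 112, 64)
```

The default length of 81 frames therefore corresponds to 21 latent frames, and the default 512 x 896 video to a 64 x 112 latent grid.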
