The WanFirstLastFrameToVideo node creates video conditioning by combining start and end frames with text prompts. It generates a latent representation for video generation by encoding the first and last frames, applying masks to guide the generation process, and incorporating CLIP vision features when available. This node prepares both positive and negative conditioning for video models to generate coherent sequences between specified start and end points.
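
The masking idea described above can be illustrated with a minimal sketch. It is not the node's implementation: the helper `frame_mask` is hypothetical, it works on a plain per-frame list rather than latent tensors, and the convention that 0.0 marks a frame fixed by an input image while 1.0 marks a frame left to the model is an assumption.

```python
def frame_mask(length, start_frames=0, end_frames=0):
    """Return one mask value per video frame.

    Assumed convention: 0.0 = frame supplied by an input image,
    1.0 = frame the model is free to generate.
    """
    mask = [1.0] * length
    # Frames covered by the start image(s) are pinned at the front.
    for i in range(min(start_frames, length)):
        mask[i] = 0.0
    # Frames covered by the end image(s) are pinned at the back.
    for i in range(min(end_frames, length)):
        mask[length - 1 - i] = 0.0
    return mask


# A 5-frame clip with one start frame and one end frame.
print(frame_mask(5, start_frames=1, end_frames=1))
```

Only the first and last entries are pinned here; everything in between is what the video model is asked to fill in.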

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| `positive` | Positive text conditioning for guiding the video generation | CONDITIONING | Yes | - |
| `negative` | Negative text conditioning for guiding the video generation | CONDITIONING | Yes | - |
| `vae` | VAE model used for encoding images to latent space | VAE | Yes | - |
| `width` | Output video width (default: 832, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| `height` | Output video height (default: 480, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| `length` | Number of frames in the video sequence (default: 81, step: 4) | INT | Yes | 1 to MAX_RESOLUTION |
| `batch_size` | Number of videos to generate simultaneously (default: 1) | INT | Yes | 1 to 4096 |
| `clip_vision_start_image` | CLIP vision features extracted from the start image | CLIP_VISION_OUTPUT | No | - |
| `clip_vision_end_image` | CLIP vision features extracted from the end image | CLIP_VISION_OUTPUT | No | - |
| `start_image` | Starting frame image for the video sequence | IMAGE | No | - |
| `end_image` | Ending frame image for the video sequence | IMAGE | No | - |
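
The minimum and step values listed above imply a grid of valid values for `width`, `height`, and `length` (for example, `length` values of 1, 5, 9, ..., 81). The following is a rough sketch of snapping a requested value onto that grid before wiring it into the node. The helper `snap` is hypothetical, the numeric value used for `MAX_RESOLUTION` is a placeholder because this page does not state it, and whether the node itself rounds or rejects off-grid values is not documented here.

```python
# Placeholder value: this page does not say what MAX_RESOLUTION is.
MAX_RESOLUTION = 16384

# (minimum, step) per parameter, taken from the inputs listed above.
GRID = {"width": (16, 16), "height": (16, 16), "length": (1, 4)}


def snap(name, value, maximum=MAX_RESOLUTION):
    """Clamp value to [minimum, maximum], then round down to minimum + k * step."""
    minimum, step = GRID[name]
    clamped = max(minimum, min(value, maximum))
    return minimum + ((clamped - minimum) // step) * step


print(snap("width", 850))   # rounds down to the nearest width on the grid
print(snap("length", 83))   # rounds down to the nearest valid frame count
```

Rounding down rather than to the nearest value is a design choice of this sketch; it guarantees the result never exceeds what was requested.
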
Note: When both `start_image` and `end_image` are provided, the node creates a video sequence that transitions between these two frames. The `clip_vision_start_image` and `clip_vision_end_image` parameters are optional; when both are provided, their CLIP vision features are concatenated and applied to both the positive and negative conditioning. Before processing, `start_image` is truncated to its first `length` frames and `end_image` to its last `length` frames.
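
The cropping and concatenation rules in the note can be sketched with plain Python lists standing in for image batches and feature sequences. Both helper names are hypothetical, the concatenation axis is not specified on this page, and the fallback to a single feature set when only one is supplied is an assumption.

```python
def crop_frames(start_frames, end_frames, length):
    """Keep the first `length` start frames and the last `length` end frames."""
    start = start_frames[:length] if start_frames is not None else None
    end = end_frames[-length:] if end_frames is not None else None
    return start, end


def combine_clip_features(start_feats, end_feats):
    """Concatenate start and end features when both exist.

    Assumption: if only one is given, it is used on its own.
    """
    if start_feats is not None and end_feats is not None:
        return start_feats + end_feats  # list concatenation stands in for a tensor concat
    return start_feats if start_feats is not None else end_feats


# Ten stand-in frames per input, cropped for a 3-frame video.
start, end = crop_frames(list(range(10)), list(range(10)), length=3)
print(start, end)
print(combine_clip_features(["start_tokens"], ["end_tokens"]))
```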

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| `positive` | Positive conditioning with applied video frame encoding and CLIP vision features | CONDITIONING |
| `negative` | Negative conditioning with applied video frame encoding and CLIP vision features | CONDITIONING |
| `latent` | Empty latent tensor with dimensions matching the specified video parameters | LATENT |
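
To give a feel for how the `latent` dimensions relate to the inputs, here is a minimal sketch of the shape arithmetic. The compression factors are assumptions not stated on this page (16 latent channels, 8x spatial and 4x temporal compression, as commonly associated with Wan video VAEs), and the helper `latent_shape` is hypothetical.

```python
def latent_shape(width, height, length, batch_size=1,
                 channels=16, spatial_factor=8, temporal_factor=4):
    """Return (batch, channels, latent_frames, latent_height, latent_width).

    All factor defaults are assumptions; check them against the VAE in use.
    """
    latent_frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


# Node defaults from the inputs above: 832x480, 81 frames, batch size 1.
print(latent_shape(832, 480, 81))  # (1, 16, 21, 60, 104)
```

Under these assumptions, the `length` step of 4 lines up with the temporal factor: each additional group of four frames adds one latent frame.
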
This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): c9017cbed8d90e0c22a27c035784396fe1fa1551d586e2ec148e0621228162c0