The WanAnimateToVideo node generates video content by combining multiple conditioning inputs including pose references, facial expressions, and background elements. It processes various video inputs to create coherent animated sequences while maintaining temporal consistency across frames. The node handles latent space operations and can extend existing videos by continuing motion patterns.

Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| positive | Positive conditioning for guiding the generation towards desired content | CONDITIONING | Yes | - |
| negative | Negative conditioning for steering the generation away from unwanted content | CONDITIONING | Yes | - |
| vae | VAE model used for encoding and decoding image data | VAE | Yes | - |
| width | Output video width in pixels (default: 832, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| height | Output video height in pixels (default: 480, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| length | Number of frames to generate (default: 77, step: 4) | INT | Yes | 1 to MAX_RESOLUTION |
| batch_size | Number of videos to generate simultaneously (default: 1) | INT | Yes | 1 to 4096 |
| clip_vision_output | Optional CLIP vision model output for additional conditioning | CLIP_VISION_OUTPUT | No | - |
| reference_image | Reference image used as starting point for generation | IMAGE | No | - |
| face_video | Video input providing facial expression guidance | IMAGE | No | - |
| pose_video | Video input providing pose and motion guidance | IMAGE | No | - |
| continue_motion_max_frames | Maximum number of frames to continue from previous motion (default: 5, step: 4) | INT | Yes | 1 to MAX_RESOLUTION |
| background_video | Background video to composite with generated content | IMAGE | No | - |
| character_mask | Mask defining character regions for selective processing | MASK | No | - |
| continue_motion | Previous motion sequence to continue from for temporal consistency | IMAGE | No | - |
| video_frame_offset | The number of frames to seek in all the input videos. Used for generating longer videos by chunk. Connect to the video_frame_offset output of the previous node for extending a video. (default: 0, step: 1) | INT | Yes | 0 to MAX_RESOLUTION |

Parameter Constraints:
  • When pose_video is provided, the output length is adjusted to match the pose video duration only if the internal trim_to_pose_video flag is enabled; this flag is currently set to False in the source code, so length is left unchanged
  • face_video is automatically resized to 512x512 resolution and normalized to a range of -1.0 to 1.0 when processed
  • continue_motion frames are limited by the continue_motion_max_frames parameter; only the last continue_motion_max_frames frames from the input are used
  • Input videos (face_video, pose_video, background_video, character_mask) are offset by video_frame_offset before processing; if the offset exceeds the video length, the input is ignored
  • If character_mask contains only one frame, it will be repeated across all frames
  • When clip_vision_output is provided, it’s applied to both positive and negative conditioning
  • If reference_image is not provided, a black image (all zeros) is used as the default reference
  • If continue_motion is not provided, the initial frames are filled with a uniform gray value (0.5 intensity)
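
The offset, windowing, mask, and normalization rules above can be illustrated with a minimal sketch in plain Python. Frames are represented by list items rather than tensors, and the helper names (apply_frame_offset, tail_frames, expand_mask, normalize_face_pixel) are hypothetical and not part of the node's API. The sketch assumes that an offset equal to or greater than the number of frames causes the input to be ignored, and that face pixel values start in the range 0.0 to 1.0.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_frame_offset(frames: Optional[Sequence[T]], offset: int) -> Optional[List[T]]:
    """Skip the first `offset` frames; return None when nothing is left."""
    if frames is None or offset >= len(frames):
        # Assumption: an offset at or past the end means the input is ignored.
        return None
    return list(frames[offset:])


def tail_frames(frames: Sequence[T], max_frames: int) -> List[T]:
    """Keep only the last `max_frames` frames (max_frames >= 1), as for continue_motion."""
    return list(frames[-max_frames:])


def expand_mask(mask_frames: Sequence[T], length: int) -> List[T]:
    """Repeat a single-frame mask so that it covers `length` frames."""
    if len(mask_frames) == 1:
        return list(mask_frames) * length
    return list(mask_frames)


def normalize_face_pixel(value: float) -> float:
    """Map a pixel value from [0.0, 1.0] (assumed input range) to [-1.0, 1.0]."""
    return value * 2.0 - 1.0


pose_video = list(range(100))  # stand-in for a 100-frame video
print(apply_frame_offset(pose_video, 77)[:3])  # start of the second chunk
print(apply_frame_offset(pose_video, 100))     # offset past the end: input ignored
print(tail_frames(pose_video, 5))              # window used for continue_motion
print(expand_mask(["mask0"], 3))               # single mask frame repeated
print(normalize_face_pixel(0.5))
```

The real node operates on image tensors and also resizes each input; this sketch only mirrors the frame-selection logic described in the list.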

Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| positive | Modified positive conditioning with additional video context including CLIP vision output, pose video latent, face video pixels, concatenated latent image, and concatenated mask | CONDITIONING |
| negative | Modified negative conditioning with additional video context including CLIP vision output, pose video latent, face video pixels (inverted), concatenated latent image, and concatenated mask | CONDITIONING |
| latent | Generated video content in latent space format with shape [batch_size, 16, latent_length + trim_latent, latent_height, latent_width] | LATENT |
| trim_latent | Latent space trimming information indicating the number of latent frames to trim from the beginning (corresponds to reference image latent frames) | INT |
| trim_image | Image space trimming information for reference motion frames, indicating the number of image frames to trim from the beginning | INT |
| video_frame_offset | Updated frame offset for continuing video generation in chunks, calculated as the previous offset plus the generated length | INT |

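
As a rough illustration of how the latent shape and the chained video_frame_offset relate to the inputs, the sketch below computes both in plain Python. It assumes 8x spatial and 4x temporal compression (latent_height = height // 8, latent_width = width // 8, latent_length = (length - 1) // 4 + 1), which is not stated on this page, and it assumes trim_latent is 1 for a single reference image. The function names latent_shape and chunk_offsets are hypothetical.

```python
from typing import List, Tuple


def latent_shape(width: int, height: int, length: int, batch_size: int = 1,
                 trim_latent: int = 1) -> Tuple[int, int, int, int, int]:
    """Return [batch_size, 16, latent_length + trim_latent, latent_height, latent_width]."""
    # Assumed compression factors: 8x in space, 4x in time (not from this page).
    latent_width = width // 8
    latent_height = height // 8
    latent_length = (length - 1) // 4 + 1
    return (batch_size, 16, latent_length + trim_latent, latent_height, latent_width)


def chunk_offsets(total_frames: int, length: int) -> List[int]:
    """List the video_frame_offset value fed to each chunk of a longer video."""
    offsets = []
    offset = 0
    while offset < total_frames:
        offsets.append(offset)
        # The output offset is the previous offset plus the generated length.
        offset += length
    return offsets


print(latent_shape(832, 480, 77))
print(chunk_offsets(200, 77))
```

Under these assumptions, the default 832x480, 77-frame settings give a latent of shape (1, 16, 21, 60, 104), and a 200-frame input processed in 77-frame chunks uses offsets 0, 77, and 154.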
This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 2ec2afbc57f58a5b7ce0ecc3730618633d435439ce2d650b18be531c1edddff0