WanAnimateToVideo - ComfyUI Built-in Node Documentation

The WanAnimateToVideo node builds the conditioning and latent inputs for animated video generation from a reference image plus optional pose, face, and background videos. It resizes and encodes these inputs, attaches them to the positive and negative conditioning, and keeps chunks temporally consistent by reusing the last frames of a previous result. Chaining its video_frame_offset output into the next node extends a video chunk by chunk.

Inputs

Parameter	Description	Data Type	Required	Range
`positive`	Positive conditioning for guiding the generation towards desired content	CONDITIONING	Yes	-
`negative`	Negative conditioning for steering the generation away from unwanted content	CONDITIONING	Yes	-
`vae`	VAE model used for encoding and decoding image data	VAE	Yes	-
`width`	Output video width in pixels (default: 832, step: 16)	INT	Yes	16 to MAX_RESOLUTION
`height`	Output video height in pixels (default: 480, step: 16)	INT	Yes	16 to MAX_RESOLUTION
`length`	Number of frames to generate (default: 77, step: 4)	INT	Yes	1 to MAX_RESOLUTION
`batch_size`	Number of videos to generate simultaneously (default: 1)	INT	Yes	1 to 4096
`clip_vision_output`	Optional CLIP vision model output for additional conditioning	CLIP_VISION_OUTPUT	No	-
`reference_image`	Reference image used as starting point for generation	IMAGE	No	-
`face_video`	Video input providing facial expression guidance	IMAGE	No	-
`pose_video`	Video input providing pose and motion guidance	IMAGE	No	-
`continue_motion_max_frames`	Maximum number of frames to continue from previous motion (default: 5, step: 4)	INT	Yes	1 to MAX_RESOLUTION
`background_video`	Background video to composite with generated content	IMAGE	No	-
`character_mask`	Mask defining character regions for selective processing	MASK	No	-
`continue_motion`	Previous motion sequence to continue from for temporal consistency	IMAGE	No	-
`video_frame_offset`	Number of frames to skip at the start of every input video. Used to generate longer videos in chunks: connect this input to the video_frame_offset output of the previous node to extend a video. (default: 0, step: 1)	INT	Yes	0 to MAX_RESOLUTION
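
As an illustration of how the inputs above fit together, the following is a minimal sketch that assembles a node entry in the style of ComfyUI's API-format workflow, where a connected input is written as `[source_node_id, output_index]`. The helper name `build_node_entry` and the upstream node ids `"6"`, `"7"`, `"8"`, and `"9"` are hypothetical and chosen only for this example; the defaults are the ones listed in the table.

```python
# Documented defaults for the integer inputs of WanAnimateToVideo.
DEFAULTS = {
    "width": 832,
    "height": 480,
    "length": 77,
    "batch_size": 1,
    "continue_motion_max_frames": 5,
    "video_frame_offset": 0,
}


def build_node_entry(positive, negative, vae, optional_links=None, **overrides):
    """Return a workflow entry for the node (hypothetical helper).

    positive, negative, vae: links of the form [source_node_id, output_index].
    optional_links: dict of optional inputs such as pose_video or face_video.
    overrides: replacements for any of the integer defaults above.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown integer inputs: {sorted(unknown)}")
    inputs = dict(DEFAULTS)
    inputs.update(overrides)
    inputs.update({"positive": positive, "negative": negative, "vae": vae})
    inputs.update(optional_links or {})
    return {"class_type": "WanAnimateToVideo", "inputs": inputs}


# Hypothetical upstream node ids; a real workflow will use its own.
entry = build_node_entry(
    ["6", 0], ["7", 0], ["8", 0],
    optional_links={"pose_video": ["9", 0]},
    length=81,
)
```

Optional inputs that are left unconnected are simply omitted from the entry, which matches the "Required: No" rows in the table.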

Parameter Constraints:

When pose_video is provided, the output length is adjusted to match the pose video duration only if the trim_to_pose_video logic is active; this flag is currently set to False in the source code, so length is left unchanged
face_video is automatically resized to 512x512 resolution and normalized to a range of -1.0 to 1.0 when processed
Only the last continue_motion_max_frames frames of continue_motion are used; earlier frames are discarded
Input videos (face_video, pose_video, background_video, character_mask) are offset by video_frame_offset before processing; if the offset exceeds the video length, the input is ignored
If character_mask contains only one frame, it will be repeated across all frames
When clip_vision_output is provided, it’s applied to both positive and negative conditioning
If reference_image is not provided, a black image (all zeros) is used as the default reference
If continue_motion is not provided, the initial frames are filled with uniform gray (0.5 intensity)
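
The frame-handling rules above can be sketched in plain Python, with lists standing in for frame tensors. This is not the node's implementation: the function names are hypothetical, treating an offset equal to the video length as "exhausted" is an assumption, and the normalization assumes input pixel values in the range 0.0 to 1.0.

```python
def apply_frame_offset(frames, video_frame_offset):
    """Skip the first video_frame_offset frames of an input video.

    Returns None (input ignored) when no frames remain after the offset.
    """
    if frames is None or len(frames) <= video_frame_offset:
        return None
    return frames[video_frame_offset:]


def take_last_motion_frames(continue_motion, continue_motion_max_frames):
    """Keep only the last continue_motion_max_frames frames."""
    return continue_motion[-continue_motion_max_frames:]


def expand_single_frame_mask(mask_frames, length):
    """Repeat a one-frame mask across all frames; leave longer masks as-is."""
    if len(mask_frames) == 1:
        return mask_frames * length
    return mask_frames


def normalize_face_pixel(value):
    """Map a pixel value from [0.0, 1.0] to [-1.0, 1.0]."""
    return value * 2.0 - 1.0


pose = list(range(10))                      # ten stand-in frames: 0..9
print(apply_frame_offset(pose, 4))          # [4, 5, 6, 7, 8, 9]
print(apply_frame_offset(pose, 12))         # None
print(take_last_motion_frames(pose, 5))     # [5, 6, 7, 8, 9]
print(expand_single_frame_mask(["m"], 3))   # ['m', 'm', 'm']
```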

Outputs

Output Name	Description	Data Type
`positive`	Modified positive conditioning with additional video context including CLIP vision output, pose video latent, face video pixels, concatenated latent image, and concatenated mask	CONDITIONING
`negative`	Modified negative conditioning with additional video context including CLIP vision output, pose video latent, face video pixels (replaced by a constant -1.0 frame rather than the actual face), concatenated latent image, and concatenated mask	CONDITIONING
`latent`	Empty (all-zero) latent for the sampler to fill, with shape [batch_size, 16, latent_length + trim_latent, latent_height, latent_width]	LATENT
`trim_latent`	Latent space trimming information indicating the number of latent frames to trim from the beginning (corresponds to reference image latent frames)	INT
`trim_image`	Image space trimming information for reference motion frames, indicating the number of image frames to trim from the beginning	INT
`video_frame_offset`	Updated frame offset for continuing video generation in chunks, calculated as the previous offset plus the generated length	INT
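
The arithmetic behind the `latent` shape and the `video_frame_offset` output can be sketched as follows. The table does not define latent_length, latent_height, or latent_width, so this sketch assumes the usual Wan VAE compression of 4x in time and 8x in each spatial dimension, and assumes trim_latent is 1 for a single reference image. The function names are hypothetical.

```python
def latent_shape(width, height, length, batch_size=1, trim_latent=1):
    """Shape of the latent output under the assumed 4x/8x/8x compression."""
    latent_length = (length - 1) // 4 + 1   # assumed temporal compression
    latent_height = height // 8             # assumed spatial compression
    latent_width = width // 8
    return (batch_size, 16, latent_length + trim_latent,
            latent_height, latent_width)


def next_video_frame_offset(video_frame_offset, length):
    """Offset output as described in the table: previous offset plus length."""
    return video_frame_offset + length


# Default settings: 832x480, 77 frames, one video.
print(latent_shape(832, 480, 77))  # (1, 16, 21, 60, 104)

# Chaining three chunks of 77 frames, feeding each offset into the next node.
offsets = []
offset = 0
for _ in range(3):
    offsets.append(offset)
    offset = next_video_frame_offset(offset, 77)
print(offsets)  # [0, 77, 154]
```

Under these assumptions, the trim_latent frames at the start of the latent hold the reference image, which is why they are trimmed before decoding.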

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 2ec2afbc57f58a5b7ce0ecc3730618633d435439ce2d650b18be531c1edddff0