# WanAnimateToVideo - ComfyUI Built-in Node Documentation

> Complete documentation for the WanAnimateToVideo node in ComfyUI. Learn its inputs, outputs, parameters and usage.

The WanAnimateToVideo node prepares the conditioning and the empty video latent for Wan-based character animation. It combines several conditioning inputs (a reference image, a pose video, a face video, a background video, and a character mask) and attaches them to the positive and negative conditioning. It can also extend an existing video chunk by chunk: `continue_motion` carries the last frames of the previous chunk forward to keep motion consistent across chunks, and `video_frame_offset` tracks where each chunk starts in the input videos.

## Inputs

| Parameter                    | Description                                                                                                                                                                                                    | Data Type            | Required | Range                 |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------- | -------- | --------------------- |
| `positive`                   | Positive conditioning for guiding the generation towards desired content                                                                                                                                       | CONDITIONING         | Yes      | -                     |
| `negative`                   | Negative conditioning for steering the generation away from unwanted content                                                                                                                                   | CONDITIONING         | Yes      | -                     |
| `vae`                        | VAE model used for encoding and decoding image data                                                                                                                                                            | VAE                  | Yes      | -                     |
| `width`                      | Output video width in pixels (default: 832, step: 16)                                                                                                                                                          | INT                  | Yes      | 16 to MAX\_RESOLUTION |
| `height`                     | Output video height in pixels (default: 480, step: 16)                                                                                                                                                         | INT                  | Yes      | 16 to MAX\_RESOLUTION |
| `length`                     | Number of frames to generate (default: 77, step: 4)                                                                                                                                                            | INT                  | Yes      | 1 to MAX\_RESOLUTION  |
| `batch_size`                 | Number of videos to generate simultaneously (default: 1)                                                                                                                                                       | INT                  | Yes      | 1 to 4096             |
| `clip_vision_output`         | Optional CLIP vision model output for additional conditioning                                                                                                                                                  | CLIP\_VISION\_OUTPUT | No       | -                     |
| `reference_image`            | Reference image of the subject to animate; it is encoded into the leading latent frames counted by the `trim_latent` output                                                                                  | IMAGE                | No       | -                     |
| `face_video`                 | Video input providing facial expression guidance                                                                                                                                                               | IMAGE                | No       | -                     |
| `pose_video`                 | Video input providing pose and motion guidance                                                                                                                                                                 | IMAGE                | No       | -                     |
| `continue_motion_max_frames` | Maximum number of frames to continue from previous motion (default: 5, step: 4)                                                                                                                                | INT                  | Yes      | 1 to MAX\_RESOLUTION  |
| `background_video`           | Background video to composite with generated content                                                                                                                                                           | IMAGE                | No       | -                     |
| `character_mask`             | Mask defining character regions for selective processing                                                                                                                                                       | MASK                 | No       | -                     |
| `continue_motion`            | Previous motion sequence to continue from for temporal consistency                                                                                                                                             | IMAGE                | No       | -                     |
| `video_frame_offset`         | Number of frames to skip at the start of every input video. Used to generate longer videos in chunks: when extending a video, connect this to the `video_frame_offset` output of the previous WanAnimateToVideo node. (default: 0, step: 1) | INT                  | Yes      | 0 to MAX\_RESOLUTION  |
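Chunked generation can be planned ahead of time by listing the offset each run will receive. The sketch below assumes that each run advances the offset by exactly `length` frames (as described for the `video_frame_offset` output) and ignores any adjustment the node may make when `continue_motion` is connected; `chunk_offsets` is a hypothetical helper, not part of ComfyUI.

```python
# Hypothetical chunk planner for WanAnimateToVideo.
# ASSUMPTION: next offset = current offset + length, with no adjustment
# for a connected `continue_motion` input.


def chunk_offsets(total_frames: int, length: int, start: int = 0) -> list[int]:
    """Return the video_frame_offset value to feed each successive run."""
    if length < 1:
        raise ValueError("length must be at least 1")
    offsets = []
    offset = start
    # Stop once the offset reaches the end of the driving video: inputs
    # seeked past their last frame are ignored by the node.
    while offset < total_frames:
        offsets.append(offset)
        offset += length  # value carried by the previous run's offset output
    return offsets


if __name__ == "__main__":
    # A 200-frame pose video split into 77-frame chunks.
    print(chunk_offsets(200, 77))  # [0, 77, 154]
```

In a real workflow the offsets are not typed in by hand; each node's `video_frame_offset` output is wired into the next node's input, and this list only shows which values that chain would produce.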

**Parameter Constraints:**

* When `pose_video` is provided, the output length is adjusted to match the pose video only if the internal `trim_to_pose_video` flag is enabled. That flag is hard-coded to `False` in the current source, so the `length` input is used as given
* `face_video` is resized to 512x512 and its pixel values are rescaled from ComfyUI's 0.0 to 1.0 image range to -1.0 to 1.0
* `continue_motion` frames are limited by the `continue_motion_max_frames` parameter; only the last `continue_motion_max_frames` frames from the input are used
* The first `video_frame_offset` frames of each input video (`face_video`, `pose_video`, `background_video`, `character_mask`) are skipped before processing; if the offset is greater than or equal to an input's frame count, that input is ignored
* If `character_mask` contains only one frame, it will be repeated across all frames
* When `clip_vision_output` is provided, it is attached to both the positive and the negative conditioning
* If `reference_image` is not provided, a black image (all zeros) is used as the default reference
* If `continue_motion` is not provided, the frames it would otherwise supply are filled with a uniform gray value of 0.5, not with random noise
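The frame-selection rules above can be sketched in plain Python, with each video represented as a list of frames rather than a ComfyUI tensor. The helper names (`seek`, `seek_mask`, `last_motion_frames`, `to_face_range`) are hypothetical, and the ordering of the single-frame mask check relative to the offset is an assumption, not something this page states.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def seek(video: Optional[Sequence[T]], offset: int) -> Optional[list[T]]:
    """Skip the first `offset` frames; return None (input ignored) if none remain."""
    if video is None or len(video) <= offset:
        return None
    return list(video[offset:])


def seek_mask(mask: Optional[Sequence[T]], offset: int, length: int) -> Optional[list[T]]:
    """Apply the offset to a character mask, repeating a one-frame mask."""
    # ASSUMPTION: a one-frame mask is repeated to `length` frames instead of
    # being seeked, so a static mask keeps applying in later chunks.
    if mask is not None and len(mask) == 1:
        return [mask[0]] * length
    return seek(mask, offset)


def last_motion_frames(motion: Optional[Sequence[T]], max_frames: int) -> Optional[list[T]]:
    """Keep only the last `max_frames` frames of `continue_motion`."""
    if motion is None:
        return None
    return list(motion[-max_frames:])


def to_face_range(pixel: float) -> float:
    """Map a pixel value from the 0..1 image range to the -1..1 face range."""
    return pixel * 2.0 - 1.0


if __name__ == "__main__":
    print(seek(list(range(10)), 8))                # [8, 9]
    print(last_motion_frames(list(range(10)), 5))  # [5, 6, 7, 8, 9]
    print(to_face_range(0.25))                     # -0.5
```

Python slicing makes `last_motion_frames` safe when fewer than `max_frames` frames exist: the whole input is kept.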

## Outputs

| Output Name          | Description                                                                                                                                                                                  | Data Type    |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------ |
| `positive`           | Positive conditioning with added video context: the concatenated latent image and concatenated mask, plus the CLIP vision output, pose video latent, and face video pixels when those inputs are provided | CONDITIONING |
| `negative`           | Negative conditioning with added video context: the concatenated latent image and concatenated mask, plus the CLIP vision output, pose video latent, and face video pixels (inverted) when those inputs are provided | CONDITIONING |
| `latent`             | Empty (all-zero) latent for the sampler to denoise, with shape \[batch\_size, 16, latent\_length + trim\_latent, latent\_height, latent\_width]                                              | LATENT       |
| `trim_latent`        | Number of latent frames to trim from the start of the sampled latent; these frames hold the encoded reference image                                                                        | INT          |
| `trim_image`         | Number of image frames to trim from the start of the decoded video; these correspond to the reference motion (`continue_motion`) frames                                                     | INT          |
| `video_frame_offset` | Updated frame offset for continuing video generation in chunks, calculated as the previous offset plus the generated length                                                                  | INT          |
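The shape of the `latent` output can be computed from the widget values. This sketch assumes the Wan VAE downsamples 8x in height and width and 4x in time, encoding the first frame on its own, so `latent_length = (length - 1) // 4 + 1`; those compression factors are not stated on this page, and `wan_animate_latent_shape` is a hypothetical helper.

```python
# Hypothetical helper reproducing the `latent` output shape:
# [batch_size, 16, latent_length + trim_latent, latent_height, latent_width].
# ASSUMPTION: 8x spatial and 4x temporal VAE compression, first frame separate.


def wan_animate_latent_shape(batch_size: int, width: int, height: int,
                             length: int, trim_latent: int) -> tuple[int, ...]:
    latent_length = (length - 1) // 4 + 1
    return (batch_size, 16, latent_length + trim_latent, height // 8, width // 8)


if __name__ == "__main__":
    # Default widgets; trim_latent = 1 assumes a single reference image
    # encodes to one latent frame.
    print(wan_animate_latent_shape(1, 832, 480, 77, trim_latent=1))
    # (1, 16, 21, 60, 104)
```

Under this assumption, lengths of the form 4k + 1 (such as the default 77) map exactly onto latent frames, while 78 through 80 produce the same 20 latent frames as 77.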

> This documentation was AI-generated.

***

**Source fingerprint (SHA-256):** `2ec2afbc57f58a5b7ce0ecc3730618633d435439ce2d650b18be531c1edddff0`