SCAIL-2 is an end-to-end character animation model built on Wan2.1. It drives a reference character image with a driving video, enabling both character animation (making a character perform the motion) and in-video character replacement (swapping a tracked person in a video with the reference character).
Key Features:
  • End-to-End Character Animation: Drive a still character image with motion from a driving video
  • Two Modes: Animation Mode (character performs the motion) and Replacement Mode (swap tracked person with reference character)
  • Long Video Support: Chunk-based extended generation with frame overlap between segments
  • Built-in ComfyUI Nodes: Uses native WanSCAILToVideo, SCAIL2ColoredMask, and SAM3 tracking — no custom nodes required beyond standard model downloads

SCAIL-2 Character Replacement Workflow

Download Workflow

Download the workflow JSON, or search for “SCAIL-2” in the Template Library.
Make sure your ComfyUI is updated. The workflows in this guide can be found in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated (updates to the Desktop version lag behind other releases).
If nodes are missing when loading a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup

How the Workflow Works

This workflow uses two subgraph nodes — a Base subgraph (first segment) and an Extend subgraph (subsequent segments) — to support character animation for both short and long videos.
  1. Load a driving video (pose_video) and a reference character image
  2. Base subgraph processes the first segment (81 frames by default)
  3. Extend subgraph processes additional segments 2+, chaining previous_frames from the prior segment
  4. Preview the result and save

Learn about Subgraph

This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.

Long Videos

For longer videos, calculate the number of segments: ceil(total_frames / 76). Each segment except the first uses the Extend subgraph. Duplicate the Extend node for more segments, chain the previous_frames output, and increment segment_index.
Note: WanSCAILToVideo cannot queue all segments automatically — run each segment manually.
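
The segment arithmetic above can be sketched in a few lines of Python. This is only a planning helper, not part of the workflow: the function name plan_segments and the dictionary keys are hypothetical, and the stride of 76 is taken from this guide (it appears to correspond to the default 81 frames per segment minus the 5 overlap frames, which is an inference rather than something the guide states).

```python
import math

# Stride used by this guide's formulas: ceil(total_frames / 76) and
# pose offset = 76 * (segment_index - 1).
STRIDE = 76


def plan_segments(total_frames: int, stride: int = STRIDE) -> list[dict]:
    """Return one entry per segment with its index, subgraph, and pose offset."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    count = math.ceil(total_frames / stride)
    return [
        {
            "segment_index": index,
            # Segment 1 runs in the Base subgraph, all later ones in Extend.
            "subgraph": "Base" if index == 1 else "Extend",
            "pose_offset": stride * (index - 1),
        }
        for index in range(1, count + 1)
    ]


# Example: list the segments needed for a 200-frame driving video.
for segment in plan_segments(200):
    print(segment)
```

Each printed entry maps to one subgraph node you would run manually, in order of segment_index.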

Two Modes

Mode | replace_mode | Driving Mask BG | Summary
Replacement | true (default) | White | Swap the tracked person in the driving video with the reference character
Animation | false | Black | Reference character performs the driving motion
Set the replace_mode parameter on both subgraph nodes.

Inputs and Parameters

Shared Parameters (Base & Extend)

Parameter | Description
pose_video | The driving video containing motion to transfer
reference_image | The character image to animate or insert
prompt | Output video description
replace_mode | true = Replacement, false = Animation
segment_index | 1 for first chunk, 2+ for continuation. Pose offset = 76 × (index − 1)
width / height | Output resolution, e.g. 896×512. Must be divisible by 16
frame_count | Frames per segment (default: 81)
previous_frame_count | Overlap frames between segments (default: 5)
pose_strength / pose_start / pose_end | Pose conditioning strength and timing
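
Because width and height must be divisible by 16, it can help to check a target resolution before entering it. The following is a minimal sketch; the helper names is_valid_resolution and snap_down are hypothetical, and rounding down to the nearest multiple is just one reasonable choice, not something the workflow does for you.

```python
def is_valid_resolution(width: int, height: int, multiple: int = 16) -> bool:
    """Check that both dimensions are positive and divisible by `multiple`."""
    return (
        width > 0
        and height > 0
        and width % multiple == 0
        and height % multiple == 0
    )


def snap_down(value: int, multiple: int = 16) -> int:
    """Round a dimension down to a multiple of `multiple` (at least one multiple)."""
    return max(multiple, (value // multiple) * multiple)


# Example: 896x512 from the table passes; 900x500 needs adjusting first.
print(is_valid_resolution(896, 512))
print(snap_down(900), snap_down(500))
```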

SAM3 Tracking (two inputs)

The sam3_video_object and sam3_image_object inputs control the SAM3 mask tracking — not the SCAIL-2 output prompt. These determine which objects are tracked for the colored masks:
Input | Target | Output
sam3_video_object | Driving video | pose_video_mask
sam3_image_object | Reference image | reference_image_mask
  • Use open-vocabulary text (default: human)
  • Use the same term when the subject is the same across video and reference
  • Use different terms if the video and reference need different focus (e.g., crowded scenes)

Model Installation

Update ComfyUI to the latest version first. The built-in WanSCAILToVideo and SCAIL2ColoredMask nodes are only available in recent versions.

Required Models

The required models belong to these folders: diffusion_models, text_encoders (choose one), clip_vision, vae, loras, checkpoints.

File Storage Locations

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── wan2.1_14B_SCAIL_2_fp16.safetensors
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   ├── clip_vision/
│   │   └── clip_vision_h.safetensors
│   ├── vae/
│   │   └── Wan2_1_VAE_bf16.safetensors
│   ├── loras/
│   │   ├── lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors
│   │   └── wan2.1_SCAIL_2_DPO_lora_bf16.safetensors
│   └── checkpoints/
│       └── sam3.1_multiplex_fp16.safetensors
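
To confirm that every file landed in the right folder, a small script can compare the layout above against your installation. This is a sketch under assumptions: the function name find_missing_models is hypothetical, the root folder is assumed to be named ComfyUI relative to where you run the script, and the text encoder entry only covers the file shown in the tree (adjust it if you chose a different one).

```python
from pathlib import Path

# Folder -> file names, copied from the storage layout in this guide.
EXPECTED_MODELS = {
    "diffusion_models": ["wan2.1_14B_SCAIL_2_fp16.safetensors"],
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "clip_vision": ["clip_vision_h.safetensors"],
    "vae": ["Wan2_1_VAE_bf16.safetensors"],
    "loras": [
        "lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors",
        "wan2.1_SCAIL_2_DPO_lora_bf16.safetensors",
    ],
    "checkpoints": ["sam3.1_multiplex_fp16.safetensors"],
}


def find_missing_models(comfy_root) -> list[Path]:
    """Return the expected model paths that do not exist under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    return [
        models_dir / folder / name
        for folder, names in EXPECTED_MODELS.items()
        for name in names
        if not (models_dir / folder / name).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; pass your own path instead.
    for path in find_missing_models("ComfyUI"):
        print(f"missing: {path}")
```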