- End-to-End Character Animation: Drive a still character image with motion from a driving video
- Two Modes: Animation Mode (character performs the motion) and Replacement Mode (swap tracked person with reference character)
- Long Video Support: Chunk-based extended generation with frame overlap between segments
- Built-in ComfyUI Nodes: Uses native WanSCAILToVideo, SCAIL2ColoredMask, and SAM3 tracking; no custom nodes required beyond standard model downloads
SCAIL-2 Character Replacement Workflow
Download Workflow
Download JSON or search “SCAIL-2” in Template Library
How the Workflow Works
This workflow uses two subgraph nodes, a Base subgraph (first segment) and an Extend subgraph (subsequent segments), to support character animation for both short and long videos.
- Load a driving video (pose_video) and a reference character image
- Base subgraph processes the first segment (81 frames by default)
- Extend subgraph processes additional segments (2+), chaining previous_frames from the prior segment
- Preview the result and save
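The chaining can be pictured as simple frame bookkeeping. The sketch below is illustrative only and does not call ComfyUI. It assumes each Extend segment spends its first previous_frame_count frames on the overlap with the prior segment, so only the remainder are new frames. The function name plan_chain and the tuple layout are hypothetical.

```python
def plan_chain(num_segments, frame_count=81, previous_frame_count=5):
    """Return (segment_index, subgraph, new_frames) for each segment.

    Assumption: an Extend segment reuses the last `previous_frame_count`
    frames of the prior segment, so it adds fewer new frames than Base.
    """
    plan = []
    for index in range(1, num_segments + 1):
        if index == 1:
            plan.append((index, "Base", frame_count))
        else:
            plan.append((index, "Extend", frame_count - previous_frame_count))
    return plan


for index, subgraph, new_frames in plan_chain(3):
    print(f"segment {index}: {subgraph} subgraph, {new_frames} new frames")
```

With the defaults, the Base segment contributes 81 frames and each Extend segment contributes 76.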
Learn about Subgraph
This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.
Long Videos
For longer videos, calculate the number of segments: ceil(total_frames / 76). The 76 corresponds to the default 81 frames per segment minus the 5 overlap frames. Each segment except the first uses the Extend subgraph. Duplicate the Extend node for more segments, chain the previous_frames output, and increment segment_index.
Note: WanSCAILToVideo cannot queue all segments automatically — run each segment manually.
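As a rough planning aid, the segment formula above can be evaluated before building the graph. This is a minimal sketch, not part of the workflow itself: segment_count and run_order are hypothetical helper names, and the stride of 76 is taken directly from the formula in this section.

```python
import math


def segment_count(total_frames, stride=76):
    """Number of segments per the formula ceil(total_frames / stride)."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return math.ceil(total_frames / stride)


def run_order(total_frames, stride=76):
    """List (segment_index, subgraph) pairs in the order they are run manually."""
    count = segment_count(total_frames, stride)
    return [(i, "Base" if i == 1 else "Extend") for i in range(1, count + 1)]


print(segment_count(300))  # 4
print(run_order(300))
```

For a 300-frame driving video this gives four segments: one Base run followed by three Extend runs, each queued by hand.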
Two Modes
| Mode | replace_mode | Driving Mask BG | Summary |
|---|---|---|---|
| Replacement | true (default) | White | Swap the tracked person in the driving video with the reference character |
| Animation | false | Black | Reference character performs the driving motion |
Set the mode with the replace_mode parameter on both subgraph nodes.
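The mode table can also be expressed as a small lookup, which may help when scripting or documenting runs. This is only a restatement of the table in Python; mode_settings and the dictionary keys are hypothetical names, not workflow inputs, apart from replace_mode itself.

```python
# Restates the mode table; "driving_mask_bg" is a descriptive label only.
MODES = {
    "replacement": {"replace_mode": True, "driving_mask_bg": "white"},
    "animation": {"replace_mode": False, "driving_mask_bg": "black"},
}


def mode_settings(mode):
    """Look up the settings for a mode name (case-insensitive)."""
    try:
        return MODES[mode.lower()]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


print(mode_settings("Animation"))
```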
Inputs and Parameters
Shared Parameters (Base & Extend)
| Parameter | Description |
|---|---|
| pose_video | The driving video containing motion to transfer |
| reference_image | The character image to animate or insert |
| prompt | Output video description |
| replace_mode | true = Replacement, false = Animation |
| segment_index | 1 for first chunk, 2+ for continuation. Pose offset = 76 × (index − 1) |
| width / height | Output resolution, e.g. 896×512. Must be divisible by 16 |
| frame_count | Frames per segment (default: 81) |
| previous_frame_count | Overlap frames between segments (default: 5) |
| pose_strength / pose_start / pose_end | Pose conditioning strength and timing |
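Two constraints in the table lend themselves to a quick sanity check before queuing a run: the resolution must be divisible by 16, and the pose offset follows 76 × (index − 1). The sketch below is a standalone helper under those stated rules. The function names are hypothetical, and rounding down to the nearest multiple of 16 is one possible choice rather than something the workflow prescribes.

```python
def snap_to_multiple(value, multiple=16):
    """Round down to the nearest multiple, never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def pose_offset(segment_index, stride=76):
    """Pose offset for a segment: stride * (segment_index - 1)."""
    return stride * (segment_index - 1)


def check_segment_params(width, height, segment_index,
                         frame_count=81, previous_frame_count=5):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if width % 16 != 0:
        problems.append(f"width {width} is not divisible by 16")
    if height % 16 != 0:
        problems.append(f"height {height} is not divisible by 16")
    if segment_index < 1:
        problems.append("segment_index must be 1 or greater")
    if not 0 <= previous_frame_count < frame_count:
        problems.append("previous_frame_count must be smaller than frame_count")
    return problems


print(check_segment_params(896, 512, segment_index=1))  # []
print(snap_to_multiple(900))                            # 896
print(pose_offset(3))                                   # 152
```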
SAM3 Tracking (two inputs)
The sam3_video_object and sam3_image_object inputs control SAM3 mask tracking, not the SCAIL-2 output prompt. They determine which objects are tracked for the colored masks:
| Input | Target | Output |
|---|---|---|
| sam3_video_object | Driving video | pose_video_mask |
| sam3_image_object | Reference image | reference_image_mask |
- Use open-vocabulary text (default: human)
- Use the same term when the subject is the same across video and reference
- Use different terms if the video and reference need different focus (e.g., crowded scenes)
Model Installation
Update ComfyUI to the latest version first; the built-in WanSCAILToVideo and SCAIL2ColoredMask nodes require it.
Required Models
Models are grouped by folder: diffusion_models, text_encoders (choose one), clip_vision, vae, loras
loras
- lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors
- wan2.1_SCAIL_2_DPO_lora_bf16.safetensors
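Before loading the workflow, it can be useful to confirm that the files listed above are in place. The following is a minimal sketch that assumes the conventional ComfyUI layout, where LoRA files live under models/loras inside the ComfyUI directory, and that both files listed here belong in that folder. The function name missing_loras and the "ComfyUI" root path are placeholders to adjust for a given install.

```python
from pathlib import Path

# File names copied from the list above.
REQUIRED_LORAS = [
    "lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors",
    "wan2.1_SCAIL_2_DPO_lora_bf16.safetensors",
]


def missing_loras(comfyui_root):
    """Return the required LoRA file names not found under models/loras."""
    lora_dir = Path(comfyui_root) / "models" / "loras"
    return [name for name in REQUIRED_LORAS if not (lora_dir / name).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at the actual install directory.
    for name in missing_loras("ComfyUI"):
        print("missing:", name)
```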