

Make sure your ComfyUI is up to date. The workflows in this guide can be found in the Workflow Templates; if you can’t find them there, your ComfyUI may be outdated (Desktop version updates may lag slightly behind). If nodes are missing when loading a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup
VOID (Video Object Inpainting and Deletion) is a powerful video inpainting model open-sourced by Netflix. It uses a two-pass diffusion pipeline built on CogVideoX to remove objects from videos and fill the resulting holes with temporally coherent content.

VOID removes objects along with all interactions they induce on the scene — not just secondary effects like shadows and reflections, but physical consequences as well. For example, if a person holding a guitar is removed, VOID also removes the person’s support of the guitar, causing it to fall naturally.

VOID is natively supported in ComfyUI (PR #13403), and its complete model weights are available under the Apache 2.0 License. VOID Model - GitHub | Paper (arXiv) | 🤗 Diffusers Pipeline
Before (left) — the original footage with the snowboarder. After (right) — the processed result after removing the snowboarder from the scene. VOID removes unwanted objects while maintaining natural motion, lighting, and scene coherence across frames.

Key strengths

  • Interaction-aware removal — removes not just the object, but all physical interactions it caused on the scene (shadows, reflections, falling objects)
  • Object removal, not single-frame patching — produces coherent motion and lighting across the entire clip
  • Two-pass refinement — Pass 2 provides superior temporal stability (fewer jitters and flashes) compared to Pass 1 alone, especially on longer cuts or textured backgrounds
Limitations: Unclear masks, chaotic motion, or targets that dominate the frame may still produce suboptimal results — prompting cannot fix fundamentally wrong segmentation.

VOID Video Inpainting Workflow

1. Download Workflow

Update your ComfyUI to the latest version, then go to Workflow -> Browse Templates and find “VOID: Video Inpainting” under the Utility category.

Download JSON Workflow File


Run on Comfy Cloud


2. Download Models

All models are hosted on the Comfy-Org VOID model repository.

  • Diffusion Models — the core two-pass inpainting model
  • VAE
  • Optical Flow
  • SAM3 Checkpoint — for segmentation
  • Text Encoder
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 checkpoints/
│   │   └── sam3.1_multiplex_fp16.safetensors
│   ├── 📂 text_encoders/
│   │   └── t5xxl_fp16.safetensors
│   ├── 📂 vae/
│   │   └── cogvideox_vae.safetensors
│   ├── 📂 optical_flow/
│   │   └── raft_large_C_T_SKHT_V2-ff5fadd5.safetensors
│   └── 📂 diffusion_models/
│       ├── void_pass2.safetensors
│       └── void_pass1.safetensors
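To confirm the downloaded files landed in the folders shown above, a small check script can help. This is an illustrative sketch, not part of ComfyUI; adjust the `ComfyUI` root path to match your install location.

```python
from pathlib import Path

# Expected model files, mirroring the directory layout above.
EXPECTED = {
    "checkpoints": ["sam3.1_multiplex_fp16.safetensors"],
    "text_encoders": ["t5xxl_fp16.safetensors"],
    "vae": ["cogvideox_vae.safetensors"],
    "optical_flow": ["raft_large_C_T_SKHT_V2-ff5fadd5.safetensors"],
    "diffusion_models": ["void_pass1.safetensors", "void_pass2.safetensors"],
}

def missing_models(comfy_root: str) -> list[str]:
    """Return relative paths of expected model files that are not present."""
    root = Path(comfy_root) / "models"
    return [
        f"models/{sub}/{name}"
        for sub, names in EXPECTED.items()
        for name in names
        if not (root / sub / name).is_file()
    ]

if __name__ == "__main__":
    # Point this at your ComfyUI installation directory.
    for path in missing_models("ComfyUI"):
        print("missing:", path)
```

If the script prints nothing, all six model files are in place and the workflow should load without missing-model errors.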

3. Using the Workflow

Inputs:
  • Source video — Load a video via the Load Video node (place it in the ComfyUI input/ folder)
  • Positive prompt (inpaint fill) — Describe the scene after removal. Focus on what remains and how it looks, not on what was removed
    • Example: empty kitchen counter, daylight, tiles visible
  • Negative prompt — Optional anti-artifact list; can be left empty
  • SAM3 object prompt — A short label for what to mask out. SAM3 uses semantic understanding to create a segmentation mask for the target object.
    • Example: person in blue jacket, red cup on table
    • SAM3 prompts are limited to 32 tokens. To prompt multiple subjects separately, separate labels with commas and use :N to cap the number of objects detected per label: eye:2, window panels:4
Modes:
  • SAM3 object prompt — what is removed (SAM3 creates the mask via semantic segmentation)
  • Positive prompt (inpaint) — how the hole is filled across time
Use Pass 2 (refinement pass) for longer clips or textured backgrounds where temporal stability matters. Pass 1 alone is faster but may show more jitter.

Learn about Subgraph

This workflow uses Subgraph nodes for modular video processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.

Additional Notes

  • Mask quality matters — a clean, tight mask around the target object produces the best results
  • Prompt writing tip — describe the scene as it should appear naturally after removal, not the removal itself
  • Use negative prompt only when you see repeating defects (watermarks, blur, extra limbs)
  • Two-pass workflow — the template runs Pass 1 then Pass 2 automatically; you can also run just Pass 1 for faster iterations during testing