SDPose: Pose Detection in ComfyUI


Make sure your ComfyUI is updated.

The workflows in this guide are available in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated. (Updates to the Desktop version may arrive with some delay.)

If nodes are missing when you load a workflow, possible reasons are:

You are not using the latest ComfyUI version (Nightly version)
Some nodes failed to import at startup

SDPose is a whole-body pose detection model that extracts human keypoints from images and videos. Combined with the RT-DETRv4 object detector, it supports multi-person detection and out-of-domain (OOD) pose estimation, making it a versatile tool for animation pipelines, pose-driven generation, and motion tracking workflows.

SDPose + RT-DETRv4 are natively supported in ComfyUI (PR #12748). The model weights are available on Hugging Face.

SDPose Model on Hugging Face | RT-DETRv4 Paper (arXiv) | SDPose Paper (arXiv)

Key strengths

Whole-body keypoints — detects body, hands, face, and feet keypoints in a unified model
Multi-person support — detects and labels multiple people in a single image or video
Configurable outputs — choose which body parts to visualize (body, hands, face, feet) and control stick/font size
Bounding box detection — includes object detection with tunable thresholds and class selection
Image and video support — dedicated workflows for single images, videos, and OOD pose estimation

Limitations: Detection accuracy depends on image resolution and subject visibility. Extremely occluded or very small subjects may produce fewer keypoints.

SDPose Workflows

Four workflows are available depending on your use case:

Workflow	Input	Output	Use Case
Multi-Person (Image)	Single image	Pose map + BBoxes	Photos with multiple people
Multi-Person (Video)	Video	Per-frame pose map + BBoxes	Video pose tracking
OOD Image to Pose	Single image	Pose map	Style transfer / image-to-pose
OOD Video to Pose Map	Video	Per-frame pose map	Video-to-pose animation
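
The table above can be read as a two-way lookup: the input kind (image or video) and whether bounding boxes are wanted in the output. The following Python sketch encodes that reading. The function name `pick_workflow` and the lookup keys are hypothetical and only illustrate the table; they are not part of ComfyUI.

```python
# Workflow names are copied from the table above. The lookup keys
# (input kind, whether bounding boxes are wanted) are hypothetical labels.
WORKFLOWS = {
    ("image", True): "Multi-Person (Image)",
    ("video", True): "Multi-Person (Video)",
    ("image", False): "OOD Image to Pose",
    ("video", False): "OOD Video to Pose Map",
}


def pick_workflow(input_kind, want_bboxes):
    """Return the template name for an input kind ("image" or "video")."""
    key = (input_kind.lower(), bool(want_bboxes))
    if key not in WORKFLOWS:
        raise ValueError(f"unsupported input kind: {input_kind!r}")
    return WORKFLOWS[key]


if __name__ == "__main__":
    print(pick_workflow("video", want_bboxes=True))
```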

1. Download Workflows

Update your ComfyUI to the latest version, then go to Workflow -> Browse Templates and find SDPose workflows under the Utility category.

Multi-Person (Image)

Run in Comfy Cloud

Download Image Workflow

Download JSON

Multi-Person (Video)

Run in Comfy Cloud

Download Video Workflow

Download JSON

OOD Image to Pose

Run in Comfy Cloud

Download OOD Image Workflow

Download JSON

OOD Video to Pose Map

Run in Comfy Cloud

Download OOD Video Workflow

Download JSON

2. Download Models

The SDPose and RT-DETRv4 model checkpoints are hosted on the Comfy-Org SDPose model repository.

checkpoints (SDPose model):

sdpose_wholebody_fp16.safetensors

diffusion_models (RT-DETRv4 detector):

rt_detr_v4-x-hgnet_fp16.safetensors (recommended)
rt_detr_v4-x-hgnet_fp32.safetensors (full precision, larger)

Place them in the following directory structure:

📂 ComfyUI/
└── 📂 models/
    ├── 📂 checkpoints/
    │   └── sdpose_wholebody_fp16.safetensors
    └── 📂 diffusion_models/
        ├── rt_detr_v4-x-hgnet_fp16.safetensors
        └── rt_detr_v4-x-hgnet_fp32.safetensors
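
Before loading a workflow, it can help to confirm that the files ended up in the folders shown above. The following is a minimal Python sketch that uses only the standard library. The file and folder names come from this guide; the `ComfyUI` root path, the helper names `check_models` and `models_ready`, and the rule that one detector precision is enough are assumptions for illustration.

```python
from pathlib import Path

# Relative locations taken from the directory layout above.
EXPECTED_MODELS = {
    "checkpoints": ["sdpose_wholebody_fp16.safetensors"],
    # Assumption: one detector precision is enough; fp16 is the recommended one.
    "diffusion_models": [
        "rt_detr_v4-x-hgnet_fp16.safetensors",
        "rt_detr_v4-x-hgnet_fp32.safetensors",
    ],
}


def check_models(comfy_root):
    """Map each models subfolder to the expected files actually found in it."""
    models_dir = Path(comfy_root) / "models"
    found = {}
    for subfolder, names in EXPECTED_MODELS.items():
        found[subfolder] = [n for n in names if (models_dir / subfolder / n).is_file()]
    return found


def models_ready(comfy_root):
    """True when every subfolder holds at least one of its expected files."""
    return all(check_models(comfy_root).values())


if __name__ == "__main__":
    root = Path("ComfyUI")  # assumption: run from the folder that contains ComfyUI/
    for subfolder, files in check_models(root).items():
        status = ", ".join(files) if files else "MISSING"
        print(f"models/{subfolder}: {status}")
```

If a subfolder is reported as MISSING, re-check the download location before troubleshooting the workflow itself.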

3. Using the Workflows

Multi-Person (Image)

Input — Load an image via the Load Image node. Use an image with one or more people (example: group_photo.png).
Detection — The Image to Pose Map (SDPose Multi-Person) subgraph processes the image and outputs:
- IMAGE — pose skeleton visualization overlaid on the image
- keypoints — raw whole-body keypoint data
- bboxes — bounding box coordinates
Drawing Options — Configure which body parts to draw:
- draw_body, draw_hands, draw_face, draw_feet — toggle visibility
- stick_width, face_point_size — adjust visual style
- score_threshold — minimum confidence for displaying keypoints
Detection Options:
- resize_type.longer_size — scale the longer dimension before detection
- max_detections — maximum number of people to detect
- detect_threshold — detection confidence threshold
- detect_class — object class to detect (default: person)
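
Two of the options above, resize_type.longer_size and score_threshold, can be illustrated with plain arithmetic. The Python sketch below is not the node implementation. It assumes that resizing preserves the aspect ratio, that each keypoint is an (x, y, score) tuple, and that a keypoint is kept when its score is at least the threshold. The function names are hypothetical.

```python
def scale_to_longer_side(width, height, longer_size):
    """Return (new_width, new_height) with the longer side equal to longer_size.

    Assumption: the aspect ratio is preserved and sizes are rounded to integers.
    """
    if width <= 0 or height <= 0 or longer_size <= 0:
        raise ValueError("width, height and longer_size must be positive")
    scale = longer_size / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def visible_keypoints(keypoints, score_threshold):
    """Keep keypoints whose confidence is at least score_threshold.

    Assumption: each keypoint is an (x, y, score) tuple; the real
    keypoints output may use a different layout.
    """
    return [kp for kp in keypoints if kp[2] >= score_threshold]


if __name__ == "__main__":
    print(scale_to_longer_side(1920, 1080, 1024))
    sample = [(10.0, 20.0, 0.9), (15.0, 25.0, 0.2)]
    print(visible_keypoints(sample, 0.3))
```

Under these assumptions, raising score_threshold draws fewer but more reliable keypoints, while a larger longer_size gives the detector more pixels to work with.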

Multi-Person (Video)

This workflow works like the image workflow but processes video frames sequentially. Use Load Video to load a video file and Save Video to export the result.

OOD Image to Pose

This workflow uses the SDPose model to generate a clean pose map from an image, without bounding box visualization. It is useful for style transfer, where you extract the skeleton pose from one image and apply it to another.

OOD Video to Pose Map

This workflow generates per-frame pose maps from a video. The output is a video file in which each frame contains the extracted pose skeleton, suitable for downstream animation or ControlNet workflows.

Learn about Subgraph

These workflows use Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflows.

Additional Notes

Model directory — the SDPose checkpoint goes in models/checkpoints/, and the RT-DETRv4 detector goes in models/diffusion_models/
Input image example — the group_photo.png file is available in the workflow template’s input/ directory for testing
Keypoint output — the POSE_KEYPOINT type can be connected to downstream nodes that accept pose data for conditional generation
Update required — SDPose + RT-DETRv4 support is available in recent ComfyUI versions. Make sure your ComfyUI is up to date.
