Key strengths
- Whole-body keypoints: detects body, hand, face, and foot keypoints in a single unified model
- Multi-person support: detects and labels multiple people in a single image or video
- Configurable outputs: choose which body parts to visualize (body, hands, face, feet) and adjust stick width and face point size
- Bounding box detection: includes object detection with tunable thresholds and class selection
- Image and video support: dedicated workflows for single images, videos, and OOD pose estimation
Limitations: Detection accuracy depends on image resolution and subject visibility. Extremely occluded or very small subjects may produce fewer keypoints.
SDPose Workflows
Four workflows are available, depending on your use case:

| Workflow | Input | Output | Use Case |
|---|---|---|---|
| Multi-Person (Image) | Single image | Pose map + BBoxes | Photos with multiple people |
| Multi-Person (Video) | Video | Per-frame pose map + BBoxes | Video pose tracking |
| OOD Image to Pose | Single image | Pose map | Style transfer / image-to-pose |
| OOD Video to Pose Map | Video | Per-frame pose map | Video-to-pose animation |
1. Download Workflows
Update ComfyUI to the latest version, then go to Workflow -> Browse Templates and find the SDPose workflows under the Utility category.
Each workflow can be run in Comfy Cloud or downloaded as a JSON file:
- Multi-Person (Image)
- Multi-Person (Video)
- OOD Image to Pose
- OOD Video to Pose Map
2. Download Models
The SDPose and RT-DETRv4 model checkpoints are hosted on the Comfy-Org SDPose model repository.

checkpoints (SDPose model):

diffusion_models (RT-DETRv4 detector):
- rt_detr_v4-x-hgnet_fp16.safetensors (recommended)
- rt_detr_v4-x-hgnet_fp32.safetensors (full precision, larger)
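To confirm the downloads landed where ComfyUI expects them (SDPose checkpoint in models/checkpoints/, RT-DETRv4 detector in models/diffusion_models/, as listed under Additional Notes), a small check like the one below can help. This is a hypothetical helper, not part of ComfyUI; because this guide does not name the SDPose checkpoint file, its file name is passed in as an argument.

```python
from pathlib import Path

# Detector file names listed in the Download Models section.
DETECTOR_FILES = (
    "rt_detr_v4-x-hgnet_fp16.safetensors",  # recommended
    "rt_detr_v4-x-hgnet_fp32.safetensors",  # full precision, larger
)


def check_sdpose_models(comfy_root: str, sdpose_checkpoint: str) -> list:
    """Hypothetical helper: return a list of problems; an empty list means the layout looks right.

    sdpose_checkpoint is the file name of the SDPose checkpoint you downloaded;
    it is a parameter because this sketch does not assume a specific name.
    """
    root = Path(comfy_root)
    problems = []

    ckpt = root / "models" / "checkpoints" / sdpose_checkpoint
    if not ckpt.is_file():
        problems.append(f"missing SDPose checkpoint: {ckpt}")

    det_dir = root / "models" / "diffusion_models"
    # Either precision of the RT-DETRv4 detector is enough.
    if not any((det_dir / name).is_file() for name in DETECTOR_FILES):
        problems.append(f"no RT-DETRv4 detector found in {det_dir}")

    return problems
```

For example, `check_sdpose_models("/path/to/ComfyUI", "<your SDPose checkpoint>.safetensors")` returns an empty list when both files are in place.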
3. Using the Workflows
Multi-Person (Image)
- Input: Load an image via the Load Image node. Use an image with one or more people (for example, group_photo.png).
- Detection: The Image to Pose Map (SDPose Multi-Person) subgraph processes the image and outputs:
  - IMAGE: pose skeleton visualization overlaid on the image
  - keypoints: raw whole-body keypoint data
  - bboxes: bounding box coordinates
- Drawing Options: Configure which body parts to draw:
  - draw_body, draw_hands, draw_face, draw_feet: toggle visibility
  - stick_width, face_point_size: adjust the visual style
  - score_threshold: minimum confidence for displaying keypoints
- Detection Options:
  - resize_type.longer_size: scale the longer dimension before detection
  - max_detections: maximum number of people to detect
  - detect_threshold: detection confidence threshold
  - detect_class: object class to detect (default: person)
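The detection and drawing options above can be read as a few simple filtering steps. The sketch below illustrates that reading under stated assumptions: the `Detection` record, the helper names, and the use of "at or above the threshold" comparisons are hypothetical and may differ from how the SDPose nodes implement them internally.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: one box per detected object."""
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence in [0, 1]
    label: str    # class name, e.g. "person"


def resize_longer_side(width, height, longer_size):
    """resize_type.longer_size: scale so the longer side equals longer_size, keeping the aspect ratio."""
    scale = longer_size / max(width, height)
    return round(width * scale), round(height * scale)


def select_detections(detections, detect_threshold, max_detections, detect_class="person"):
    """Keep boxes of detect_class at or above detect_threshold, highest score first, up to max_detections."""
    kept = [d for d in detections if d.label == detect_class and d.score >= detect_threshold]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_detections]


def visible_keypoints(keypoints, score_threshold):
    """Indices of (x, y, score) keypoints confident enough to be drawn."""
    return [i for i, (_, _, score) in enumerate(keypoints) if score >= score_threshold]
```

For a 1920x1080 frame, `resize_longer_side(1920, 1080, 1024)` gives `(1024, 576)`. Raising `detect_threshold` drops weak boxes before `max_detections` caps the count, so the two options interact.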
Multi-Person (Video)
This works like the image workflow but processes the video frame by frame. Use Load Video to input a video file and Save Video to export the result.
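Because the video workflow repeats the same drawing step for every frame, the sketch below shows a simplified version of that step: limbs are drawn as sticks of width stick_width, and limbs with an endpoint below score_threshold are skipped. It is a single-channel illustration only; the real nodes draw colored skeletons, and the names `draw_pose_map`, `video_to_pose_maps`, and `EXAMPLE_LIMBS` (along with their default values) are hypothetical.

```python
import math

# Hypothetical limb list: pairs of keypoint indices to connect with a stick.
# The actual SDPose skeleton definition is not given in this guide.
EXAMPLE_LIMBS = [(0, 1), (1, 2)]


def distance_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    # Project P onto AB and clamp to the segment; a zero-length AB is a single point.
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def draw_pose_map(width, height, keypoints, limbs, stick_width=4, score_threshold=0.3):
    """Rasterize limbs onto a black single-channel canvas (1 = stick pixel).

    keypoints is a list of (x, y, score) tuples in pixel coordinates.
    """
    canvas = [[0] * width for _ in range(height)]
    radius = stick_width / 2
    for a, b in limbs:
        (ax, ay, sa), (bx, by, sb) = keypoints[a], keypoints[b]
        if sa < score_threshold or sb < score_threshold:
            continue  # skip limbs with a low-confidence endpoint
        # Scan only the limb's bounding box, padded by the stick radius and clipped to the canvas.
        x0 = max(0, math.floor(min(ax, bx) - radius))
        x1 = min(width - 1, math.ceil(max(ax, bx) + radius))
        y0 = max(0, math.floor(min(ay, by) - radius))
        y1 = min(height - 1, math.ceil(max(ay, by) + radius))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if distance_to_segment(x, y, ax, ay, bx, by) <= radius:
                    canvas[y][x] = 1
    return canvas


def video_to_pose_maps(frames_keypoints, width, height, limbs, **draw_options):
    """Apply the same drawing step to every frame, in order."""
    return [draw_pose_map(width, height, kps, limbs, **draw_options) for kps in frames_keypoints]
```

Processing frames independently keeps the per-frame logic identical to the image case; any temporal smoothing the real workflow might apply is not modeled here.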
OOD Image to Pose
Uses the SDPose model to generate a clean pose map from an image, without bounding box visualization. This is useful for style transfer, where you want to extract the skeleton pose from one image and apply it to another.
OOD Video to Pose Map
Generates per-frame pose maps from a video. The output is a video file in which each frame contains the extracted pose skeleton, suitable for downstream animation or ControlNet workflows.
Learn about Subgraph
These workflows use Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflows.
Additional Notes
- Model directory: the SDPose checkpoint goes in models/checkpoints/, and the RT-DETRv4 detector goes in models/diffusion_models/
- Input image example: the group_photo.png file is available in the workflow template’s input/ directory for testing
- Keypoint output: the POSE_KEYPOINT type can be connected to downstream nodes that accept pose data for conditional generation
- Update required: SDPose + RT-DETRv4 support is available in recent ComfyUI versions, so make sure your ComfyUI is up to date.
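This guide does not document the layout of the POSE_KEYPOINT output. Assuming it follows the OpenPose-style structure used by other ComfyUI pose nodes (a list with one dict per image or frame, each holding a "people" list whose entries store flat [x, y, confidence, ...] arrays under keys such as pose_keypoints_2d), a downstream script could summarize it as sketched below. The key names and the helper functions are assumptions, so check them against the actual output before relying on them.

```python
# Assumed OpenPose-style part keys; verify against the real POSE_KEYPOINT output.
PART_KEYS = (
    "pose_keypoints_2d",
    "face_keypoints_2d",
    "hand_left_keypoints_2d",
    "hand_right_keypoints_2d",
)


def triples(flat):
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, c) tuples, ignoring a trailing partial group."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat) - len(flat) % 3, 3)]


def summarize_pose_keypoints(pose_keypoint, score_threshold=0.3):
    """Count confident keypoints per part, per person, per frame (hypothetical helper)."""
    summary = []
    for frame in pose_keypoint:  # one entry per image or video frame
        people = []
        for person in frame.get("people", []):
            counts = {}
            for key in PART_KEYS:
                points = triples(person.get(key) or [])  # missing or None parts count as empty
                counts[key] = sum(1 for _, _, c in points if c >= score_threshold)
            people.append(counts)
        summary.append(people)
    return summary
```

A summary like this is a quick way to see which people lost hand or face keypoints to occlusion or small size before feeding the data into conditional generation.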