Make sure your ComfyUI is up to date. The workflows in this guide are available in the Workflow Templates. If you can't find them there, your ComfyUI may be outdated. (Updates to the Desktop version are released with some delay.)
If nodes are missing when you load a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup
SDPose is a whole-body pose detection model that extracts human keypoints from images and videos. Combined with the RT-DETRv4 object detector, it supports multi-person detection and out-of-domain (OOD) pose estimation, which makes it useful for animation pipelines, pose-driven generation, and motion tracking workflows.

SDPose and RT-DETRv4 are natively supported in ComfyUI (PR #12748). The model weights are available on Hugging Face.

SDPose Model on Hugging Face | RT-DETRv4 Paper (arXiv) | SDPose Paper (arXiv)

Key strengths

  • Whole-body keypoints — detects body, hands, face, and feet keypoints in a unified model
  • Multi-person support — detects and labels multiple people in a single image or video
  • Configurable outputs — choose which body parts to visualize (body, hands, face, feet) and control stick/font size
  • Bounding box detection — includes object detection with tunable thresholds and class selection
  • Image and video support — dedicated workflows for single images, videos, and OOD pose estimation
Limitations: Detection accuracy depends on image resolution and subject visibility. Extremely occluded or very small subjects may produce fewer keypoints.

SDPose Workflows

Four workflows are available depending on your use case:
Workflow | Input | Output | Use Case
Multi-Person (Image) | Single image | Pose map + BBoxes | Photos with multiple people
Multi-Person (Video) | Video | Per-frame pose map + BBoxes | Video pose tracking
OOD Image to Pose | Single image | Pose map | Style transfer / image-to-pose
OOD Video to Pose Map | Video | Per-frame pose map | Video-to-pose animation

1. Download Workflows

Update your ComfyUI to the latest version, then go to Workflow -> Browse Templates and find SDPose workflows under the Utility category.

Multi-Person (Image)

Run in Comfy Cloud

Download Image Workflow

Download JSON

Multi-Person (Video)

Run in Comfy Cloud

Download Video Workflow

Download JSON

OOD Image to Pose

Run in Comfy Cloud

Download OOD Image Workflow

Download JSON

OOD Video to Pose Map

Run in Comfy Cloud

Download OOD Video Workflow

Download JSON

2. Download Models

The SDPose and RT-DETRv4 model checkpoints are hosted on the Comfy-Org SDPose model repository. The SDPose model belongs in checkpoints, and the RT-DETRv4 detector belongs in diffusion_models. Place the files in the following directory structure:
📂 ComfyUI/
└── 📂 models/
    ├── 📂 checkpoints/
    │   └── sdpose_wholebody_fp16.safetensors
    └── 📂 diffusion_models/
        ├── rt_detr_v4-x-hgnet_fp16.safetensors
        └── rt_detr_v4-x-hgnet_fp32.safetensors
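
Before loading a workflow, it can help to confirm that the files ended up where the tree above expects them. The following is a minimal sketch, not part of ComfyUI: the helper name missing_models and the default root "ComfyUI" are assumptions made for illustration, and the file names are copied from the tree above. The tree lists both the fp16 and fp32 detector files; whether you need both depends on which one your workflow selects, so treat the report as informational.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this guide.
EXPECTED_MODELS = [
    "models/checkpoints/sdpose_wholebody_fp16.safetensors",
    "models/diffusion_models/rt_detr_v4-x-hgnet_fp16.safetensors",
    "models/diffusion_models/rt_detr_v4-x-hgnet_fp32.safetensors",
]


def missing_models(comfy_root):
    """Return the expected model paths that are not files under comfy_root.

    missing_models is a hypothetical helper, not a ComfyUI function.
    """
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]


# Report anything missing relative to a ComfyUI folder in the current directory.
for rel in missing_models("ComfyUI"):
    print("missing:", rel)
```

If your install lives elsewhere, pass its path instead of "ComfyUI". An empty result means all three files were found.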

3. Using the Workflows

Multi-Person (Image)

  • Input — Load an image via the Load Image node. Use an image with one or more people (example: group_photo.png).
  • Detection — The Image to Pose Map (SDPose Multi-Person) subgraph processes the image and outputs:
    • IMAGE — pose skeleton visualization overlaid on the image
    • keypoints — raw whole-body keypoint data
    • bboxes — bounding box coordinates
  • Drawing Options — Configure which body parts to draw:
    • draw_body, draw_hands, draw_face, draw_feet — toggle visibility
    • stick_width, face_point_size — adjust visual style
    • score_threshold — minimum confidence for displaying keypoints
  • Detection Options:
    • resize_type.longer_size — scale the longer dimension before detection
    • max_detections — maximum number of people to detect
    • detect_threshold — detection confidence threshold
    • detect_class — object class to detect (default: person)
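
To make two of the options above more concrete, here is a rough sketch of what "scale the longer dimension" and "minimum confidence" mean arithmetically. This is not the node's actual implementation: the function names, the (x, y, score) tuple layout, the rounding, and the use of >= for the threshold comparison are all assumptions chosen for illustration.

```python
def resize_longer_side(width, height, longer_size):
    """Scale (width, height) so the longer side equals longer_size.

    The aspect ratio is preserved and each side is rounded to the
    nearest pixel. Illustrative only; the node's rounding may differ.
    """
    longer = max(width, height)
    new_w = max(1, round(width * longer_size / longer))
    new_h = max(1, round(height * longer_size / longer))
    return new_w, new_h


def filter_by_score(keypoints, score_threshold):
    """Keep (x, y, score) keypoints whose score is at least score_threshold.

    The tuple layout is a hypothetical stand-in for the real keypoint data.
    """
    return [kp for kp in keypoints if kp[2] >= score_threshold]


# A 1920x1080 frame with longer_size=1024 keeps its 16:9 ratio.
print(resize_longer_side(1920, 1080, 1024))  # (1024, 576)

# Only the first keypoint passes a threshold of 0.5.
sample = [(100.0, 200.0, 0.9), (110.0, 210.0, 0.2)]
print(filter_by_score(sample, 0.5))  # [(100.0, 200.0, 0.9)]
```

Raising score_threshold hides low-confidence keypoints in the drawn skeleton, while a smaller longer_size trades detection detail for speed.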

Multi-Person (Video)

Works the same way as the Multi-Person (Image) workflow, but processes video frames sequentially. Use Load Video to input a video file and Save Video to export the result.

OOD Image to Pose

Uses the SDPose model to generate a clean pose map from an image, without bounding box visualization. This is useful for style transfer where you want to extract the skeleton pose from one image and apply it to another.

OOD Video to Pose Map

Generates per-frame pose maps from a video. The output is a video file where each frame contains the extracted pose skeleton, suitable for downstream animation or ControlNet workflows.

Learn about Subgraph

These workflows use Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflows.

Additional Notes

  • Model directory — the SDPose checkpoint goes in models/checkpoints/, and the RT-DETRv4 detector goes in models/diffusion_models/
  • Input image example — the group_photo.png file is available in the workflow template’s input/ directory for testing
  • Keypoint output — the POSE_KEYPOINT type can be connected to downstream nodes that accept pose data for conditional generation
  • Update required — SDPose + RT-DETRv4 support is available in recent ComfyUI versions. Make sure your ComfyUI is up to date.
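
As a sketch of how keypoint data might be inspected outside the graph, the example below assumes an OpenPose-style layout: each frame is a dict with a "people" list, and each person carries a flat "pose_keypoints_2d" list of x, y, confidence values. This guide does not specify the structure of POSE_KEYPOINT, so treat the layout, the key names, and both helper functions as assumptions and check them against the actual node output.

```python
def to_triples(flat):
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, c) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of 3 values")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_confident(frame, min_conf=0.3):
    """Count body keypoints at or above min_conf for each person in a frame.

    Assumes the OpenPose-style keys "people" and "pose_keypoints_2d";
    these are not confirmed by this guide.
    """
    counts = []
    for person in frame.get("people", []):
        triples = to_triples(person.get("pose_keypoints_2d", []))
        counts.append(sum(1 for _, _, c in triples if c >= min_conf))
    return counts


# Hypothetical frame: two people with two body keypoints each.
frame = {
    "people": [
        {"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.1]},
        {"pose_keypoints_2d": [50.0, 60.0, 0.8, 70.0, 80.0, 0.7]},
    ]
}
print(count_confident(frame))  # [1, 2]
```

A per-person count like this is one way to spot heavily occluded or very small subjects, which the limitations note says may produce fewer keypoints.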