SDPoseKeypointExtractor - ComfyUI Built-in Node Documentation

The SDPoseKeypointExtractor node detects human pose keypoints from input images using the SDPose model. It can process full images or specific regions defined by bounding boxes and outputs the detected keypoints in the OpenPose format, which includes the coordinates for each person and a confidence score for each keypoint.

Inputs

Parameter	Description	Data Type	Required	Range
`model`	The SDPose model used for keypoint detection. Must be a model with a `heatmap_head` attribute, specifically from the SDPose repository.	MODEL	Yes	-
`vae`	The VAE model used to encode the input images into the latent space for processing.	VAE	Yes	-
`image`	The input image or batch of images from which to extract pose keypoints.	IMAGE	Yes	-
`batch_size`	The number of images processed at once in full-image mode (that is, when `bboxes` is not provided). Larger batches can speed up processing. (default: 16)	INT	No	1 to 10000
`bboxes`	Bounding boxes that restrict detection to specific regions, which can improve accuracy. The input is optional for single-person images but needed for multi-person detection. If provided, the node extracts keypoints from each specified region.	BOUNDINGBOX	No	-
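
To illustrate how `batch_size` groups images in full-image mode, here is a minimal sketch of chunked iteration. The helper name `iter_batches` and the string placeholders standing in for images are hypothetical; the node's actual batching code is not shown on this page and may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 16) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 40 placeholder frames with the documented default batch size of 16.
frames = [f"frame_{i}" for i in range(40)]
batch_lengths = [len(batch) for batch in iter_batches(frames, batch_size=16)]
print(batch_lengths)  # [16, 16, 8]
```

The last batch is simply shorter when the image count is not a multiple of `batch_size`.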

Parameter Constraints:

- The `model` input must be an SDPose model. If the provided model does not have a `heatmap_head` attribute, the node raises an error.
- The node operates in one of two modes, depending on the `bboxes` input:
  1. Bounding Box Mode: When `bboxes` is provided, the node processes each specified region individually. This mode is required to detect multiple people in a single image.
  2. Full-Image Mode: When `bboxes` is not provided, the node processes whole images in batches. The `batch_size` parameter applies only in this mode.
- When `bboxes` is provided, it must contain one list of bounding boxes per image. If there are fewer bounding box lists than images, the last list is reused for the remaining images.
- Bounding boxes with zero or negative width or height (that is, `x2 <= x1` or `y2 <= y1`) are skipped.
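
The bounding-box rules above can be sketched as follows. This is an illustration only: the function names `select_mode` and `boxes_for_image` are hypothetical, and the `(x1, y1, x2, y2)` tuple layout is an assumption based on the coordinate names used in the constraints, not a confirmed description of the BOUNDINGBOX type.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed layout: (x1, y1, x2, y2). The real BOUNDINGBOX type may differ.
BBox = Tuple[float, float, float, float]


def select_mode(bboxes: Optional[Sequence[Sequence[BBox]]]) -> str:
    """Return which of the two documented modes applies (hypothetical helper)."""
    return "full_image" if bboxes is None else "bounding_box"


def boxes_for_image(bboxes: Sequence[Sequence[BBox]], image_index: int) -> List[BBox]:
    """Pick the box list for one image and drop degenerate boxes.

    If there are fewer box lists than images, the last list is reused.
    Boxes with x2 <= x1 or y2 <= y1 are skipped.
    """
    if not bboxes:
        return []
    per_image = bboxes[min(image_index, len(bboxes) - 1)]
    return [b for b in per_image if b[2] > b[0] and b[3] > b[1]]


# Two box lists supplied for a batch of four images.
bboxes = [
    [(0, 0, 10, 10), (5, 5, 5, 20)],  # second box has zero width
    [(2, 2, 8, 8)],
]
first = boxes_for_image(bboxes, 0)
fourth = boxes_for_image(bboxes, 3)
print(select_mode(bboxes), first, fourth)
```

Here the zero-width box is dropped for the first image, and the fourth image falls back to the last supplied list.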

Outputs

Output Name	Description	Data Type
`keypoints`	Keypoints in OpenPose frame format (canvas_width, canvas_height, people). The output contains the detected persons, each with an array of keypoint coordinates (x, y) and their corresponding confidence scores.	POSE_KEYPOINT
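
As a rough guide to consuming this output, the sketch below reads one frame and keeps only confident keypoints. It assumes the conventional OpenPose JSON layout, in which each person carries a flat `pose_keypoints_2d` list of `x, y, confidence` triples; that key name, the sample values, and the helper names are assumptions for illustration and are not confirmed by this page.

```python
from typing import Any, Dict, List, Tuple


def split_keypoints(flat: List[float]) -> List[Tuple[float, float, float]]:
    """Group a flat [x, y, c, x, y, c, ...] list into (x, y, confidence) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of 3 values")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def confident_points(frame: Dict[str, Any], threshold: float = 0.5) -> List[List[Tuple[float, float]]]:
    """Return, per person, the (x, y) points whose confidence meets the threshold."""
    result = []
    for person in frame["people"]:
        # "pose_keypoints_2d" is the assumed OpenPose-style key.
        triples = split_keypoints(person["pose_keypoints_2d"])
        result.append([(x, y) for x, y, c in triples if c >= threshold])
    return result


# Made-up sample frame with one person and two keypoints.
frame = {
    "canvas_width": 512,
    "canvas_height": 768,
    "people": [{"pose_keypoints_2d": [100.0, 50.0, 0.9, 120.0, 60.0, 0.2]}],
}
points = confident_points(frame, threshold=0.5)
print(points)  # [[(100.0, 50.0)]]
```

Filtering on the confidence score is a common way to ignore keypoints the model is unsure about, such as occluded joints.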

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
