
SAM3 Detect Node

Overview

The SAM3 Detect node performs open-vocabulary detection and segmentation using text descriptions, bounding boxes, or point prompts. It can identify and segment objects in an image based on what you describe in text, where you draw boxes, or where you click points.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| model | The SAM3 model to use for detection and segmentation | MODEL | Yes | - |
| image | The input image to process | IMAGE | Yes | - |
| conditioning | Text conditioning from CLIPTextEncode. Required when using text prompts for detection | CONDITIONING | No | - |
| bboxes | Bounding boxes to segment within. Can be a single box (applied to all frames), a list of boxes (applied to all frames), or a list of lists (per-frame boxes). When provided without text conditioning, the node segments inside each box | BOUNDING_BOX | No | - |
| positive_coords | Positive point prompts as a JSON string of the form [{"x": int, "y": int}, ...] using pixel coordinates. These are points you want to include in the segmentation | STRING | No | - |
| negative_coords | Negative point prompts as a JSON string of the form [{"x": int, "y": int}, ...] using pixel coordinates. These are points you want to exclude from the segmentation | STRING | No | - |
| threshold | Confidence threshold for text-based detections. Only detections with scores above this value are kept (default: 0.5) | FLOAT | No | 0.0 to 1.0 |
| refine_iterations | Number of SAM decoder refinement passes. Higher values can improve mask quality. Set to 0 to use raw detector masks without refinement (default: 2) | INT | No | 0 to 5 |
| individual_masks | When enabled, outputs separate masks for each detected object instead of combining them into a single mask (default: False) | BOOLEAN | No | True/False |
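
The three accepted shapes for bboxes (single box, list of boxes, list of lists) can be pictured as all expanding to one list of boxes per frame. The sketch below is illustrative only and is not the node's implementation: the helper name `boxes_per_frame` is hypothetical, and the box layout (a dict with x, y, width, height keys) is an assumption based on the fields listed for the bboxes output.

```python
from typing import Dict, List

# Assumed box layout: {"x": ..., "y": ..., "width": ..., "height": ...}
Box = Dict[str, float]


def boxes_per_frame(bboxes, num_frames: int) -> List[List[Box]]:
    """Expand any of the three accepted shapes into one box list per frame.

    Hypothetical helper for illustration; the real node may normalize
    its input differently.
    """
    if isinstance(bboxes, dict):
        # Single box: applied to every frame.
        return [[bboxes] for _ in range(num_frames)]
    if all(isinstance(b, dict) for b in bboxes):
        # Flat list of boxes: the same list is applied to every frame.
        return [list(bboxes) for _ in range(num_frames)]
    # List of lists: one inner list per frame, so the lengths must match.
    if len(bboxes) != num_frames:
        raise ValueError(
            f"expected {num_frames} per-frame box lists, got {len(bboxes)}"
        )
    return [list(frame) for frame in bboxes]


box = {"x": 10, "y": 20, "width": 30, "height": 40}
print(boxes_per_frame(box, 2))          # same single box on both frames
print(boxes_per_frame([[box], []], 2))  # frame 0 has a box, frame 1 has none
```

Under these assumptions, a list of lists is the only shape that lets different frames carry different boxes; the other two shapes repeat the same boxes on every frame.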

Parameter Constraints and Notes

  • Text prompts: To use text-based detection, you must provide conditioning input. When text conditioning is provided, the node runs text-guided detection on the image.
  • Box prompts: When bboxes are provided without text conditioning, the node segments the area inside each bounding box.
  • Point prompts: When positive_coords or negative_coords are provided, the node uses point-based segmentation. Points are scaled to the model’s internal resolution automatically.
  • Multiple prompt types: You can combine different prompt types. For example, you can provide both text conditioning and bounding boxes to restrict text detection to specific areas.
  • Batch processing: The node supports batched images. When processing multiple frames, bounding boxes can be provided per-frame using a list of lists format.
  • JSON format for points: Point coordinates must be provided as valid JSON strings in the format [{"x": 100, "y": 200}, {"x": 150, "y": 250}].
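
Because positive_coords and negative_coords are plain strings, it can help to build them with a JSON serializer rather than by hand. The following is a minimal sketch using Python's standard json module; the helper names `points_to_json` and `parse_points` are hypothetical and are not part of the node.

```python
import json
from typing import Iterable, List, Tuple


def points_to_json(points: Iterable[Tuple[int, int]]) -> str:
    """Serialize (x, y) pixel coordinates into the documented JSON format."""
    return json.dumps([{"x": int(x), "y": int(y)} for x, y in points])


def parse_points(text: str) -> List[Tuple[int, int]]:
    """Check that a string is a JSON list of {"x", "y"} objects and decode it."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of points")
    points = []
    for item in data:
        if not isinstance(item, dict) or "x" not in item or "y" not in item:
            raise ValueError(f"bad point: {item!r}")
        points.append((int(item["x"]), int(item["y"])))
    return points


positive_coords = points_to_json([(100, 200), (150, 250)])
print(positive_coords)  # [{"x": 100, "y": 200}, {"x": 150, "y": 250}]
print(parse_points(positive_coords))
```

The resulting string can be pasted into the positive_coords or negative_coords field. Validating a hand-written string with something like `parse_points` before running the workflow catches missing quotes or keys early.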

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| masks | Segmentation masks. When individual_masks is False (default), returns a single combined mask per frame. When True, returns individual masks for each detected object | MASK |
| bboxes | Detected bounding boxes with coordinates and confidence scores. Each box includes x, y, width, height, and score values | BOUNDING_BOX |
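
As a rough illustration of working with the bboxes output downstream, the sketch below filters boxes by score and converts them to corner coordinates. It assumes each box is a dict with x, y, width, height, and score keys and that x, y mark the top-left corner; neither the container type nor the corner convention is confirmed by this page, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: {"x", "y", "width", "height", "score"}
Detection = Dict[str, float]


def keep_confident(boxes: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only boxes whose score is strictly above the threshold."""
    return [b for b in boxes if b["score"] > threshold]


def to_corners(box: Detection) -> Tuple[float, float, float, float]:
    """Convert x, y, width, height to (x1, y1, x2, y2).

    Assumes x, y is the top-left corner of the box.
    """
    return (box["x"], box["y"], box["x"] + box["width"], box["y"] + box["height"])


detections = [
    {"x": 10, "y": 20, "width": 30, "height": 40, "score": 0.9},
    {"x": 0, "y": 0, "width": 5, "height": 5, "score": 0.5},
]
confident = keep_confident(detections, threshold=0.5)
print([to_corners(b) for b in confident])  # [(10, 20, 40, 60)]
```

The strict comparison mirrors the threshold parameter's description, where only detections with scores above the value are kept, so a box scoring exactly 0.5 is dropped at the default threshold.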

Source fingerprint (SHA-256): 3f61343c284c249476f2010831863c6094260b11d0a348003b270a126c67d399