SAM 3 (Segment Anything Model 3) is Meta's unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Compared to its predecessor SAM 2, SAM 3 adds the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase.

SAM 3.1 Multiplex is the latest checkpoint release. It introduces a shared-memory approach to joint multi-object tracking that is significantly faster without sacrificing accuracy. SAM 3.1 is natively supported in ComfyUI (PR #13408), and the model weights are available under the SAM License.

SAM 3 GitHub | Paper (arXiv) | 🤗 Model Hub

*Figure: SAM 3.1 segments and tracks objects across video frames based on text prompts, producing segmentation masks applied to the target objects throughout the video.*
## Key strengths
- Text-driven segmentation — describe what to segment in natural language, no need for manual point/box annotations
- Image and video support — works on both single images and video sequences with tracking across frames
- Multi-object support — segment and track multiple objects simultaneously using comma-separated prompts
- Open-vocabulary — handles a vastly larger set of open-vocabulary concepts than prior works
Limitations: Text prompts are capped at 32 tokens. For best results, keep prompts short and specific to the target object.
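Since the 32-token cap is easy to exceed with long phrases, a rough pre-check can help. The sketch below approximates the token count with a whitespace split; SAM 3's actual tokenizer may count differently, so treat this as a heuristic only.

```python
def check_prompt_length(prompt: str, max_tokens: int = 32) -> bool:
    """Return True if a prompt is likely within the 32-token limit.

    Uses a whitespace split as a rough proxy; the model's real tokenizer
    may produce a different count, so this is only a heuristic.
    """
    approx_tokens = len(prompt.split())
    if approx_tokens > max_tokens:
        print(f"Warning: ~{approx_tokens} tokens (limit {max_tokens}); "
              "shorten the prompt or split it into several prompts.")
        return False
    return True

check_prompt_length("yellow school bus")       # True
check_prompt_length(" ".join(["word"] * 40))   # False, prints a warning
```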
## SAM 3.1 Segment Workflows

### 1. Download Workflow

Update your ComfyUI to the latest version, then go to Workflow -> Browse Templates and find the SAM 3.1 workflows under the Utility category.
Video Segmentation: Download JSON Workflow File (also available to run on Comfy Cloud)

Image Segmentation: Download JSON Workflow File (also available to run on Comfy Cloud)
### 2. Download Models

The SAM 3.1 model is hosted on the Comfy-Org SAM 3.1 model repository. Place it in the following directory structure:
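A typical layout follows ComfyUI's standard `models/` folder convention. The subfolder and checkpoint names below are assumptions for illustration; check the Comfy-Org repository page for the authoritative path:

```
ComfyUI/
└── models/
    └── sam3/                  # assumed subfolder name
        └── sam3.safetensors   # assumed checkpoint file name
```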
### 3. Using the Workflows

Image Segmentation:
- Image — Load an image via the `Load Image` node (place the file in the ComfyUI `input/` folder)
- Object Prompt — A short text description of the object(s) to segment, e.g. `person`, `car`, `cat`
- The output is a mask applied to the image, with an RGBA preview showing the segmentation result

Video Segmentation:
- Video — Load a video via the `Load Video` node
- Object Prompt — Same as for images: a short text prompt describing what to track and segment across frames
- The output provides masks and bounding boxes for each frame
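Beyond the graphical interface, a queued run can also be scripted. The sketch below is a minimal example assuming a local ComfyUI server on the default port 8188 and a workflow exported in API format; the JSON file name is a placeholder for whichever template you downloaded.

```python
import json
import urllib.request

# Queue a SAM 3.1 workflow on a local ComfyUI server.
# Assumes the workflow was exported in API format; the file name is a
# placeholder for the template you downloaded.
with open("sam3_video_workflow_api.json", encoding="utf-8") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default ComfyUI address and endpoint
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # response includes the queued prompt_id
```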
| Prompt | Role |
|---|---|
| SAM3 object prompt | Short description of what to segment. Max 32 tokens. |
Append `:N` to a prompt to specify the maximum number of objects detected for that prompt:

`eye:2, window panels:4`
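For intuition about this syntax, here is a minimal sketch of how a comma-separated prompt with optional `:N` caps could be parsed. It is an illustration only, not the node's actual parser.

```python
def parse_object_prompt(prompt: str) -> list[tuple[str, int | None]]:
    """Split a comma-separated object prompt into (phrase, max_count) pairs.

    A trailing ':N' caps detections for that phrase; None means no cap.
    Illustrative only -- not ComfyUI's actual implementation.
    """
    parsed = []
    for part in prompt.split(","):
        part = part.strip()
        if ":" in part:
            phrase, _, count = part.rpartition(":")
            parsed.append((phrase.strip(), int(count)))
        else:
            parsed.append((part, None))
    return parsed

print(parse_object_prompt("eye:2, window panels:4"))
# [('eye', 2), ('window panels', 4)]
```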
### Learn about Subgraph
This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.
## Additional Notes

- Keep prompts short and specific — the model has a 32-token limit per prompt
- Multi-object detection — use commas to separate different object types, and `:N` to cap detections per type
- Segmentation masks — the output mask can be used as input to other workflows, e.g. inpainting or background removal (see the sketch below)
- Update required — make sure ComfyUI is updated to the latest version to access SAM 3.1 support
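To illustrate the background-removal case mentioned above, the sketch below applies a single-channel mask as an alpha channel using NumPy and Pillow; the file names are placeholders, and the mask is assumed to be white where the object was segmented.

```python
import numpy as np
from PIL import Image

# Background removal from a segmentation mask, outside ComfyUI.
# File names are placeholders; the mask is assumed single-channel with
# white (255) marking the segmented object.
image = Image.open("input.png").convert("RGB")
mask = Image.open("mask.png").convert("L").resize(image.size)

rgba = np.dstack([np.asarray(image), np.asarray(mask)])  # H x W x 4 array
Image.fromarray(rgba, mode="RGBA").save("cutout.png")    # background transparent
```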