Overview
Track objects across video frames using SAM3’s memory-based tracker. This node processes a sequence of video frames and maintains object identities across frames, using either initial masks or text prompts to define what to track.
Inputs
| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| images | Video frames as batched images | IMAGE | Yes | Batched video frames |
| model | The SAM3 model to use for tracking | MODEL | Yes | SAM3 model |
| initial_mask | Mask(s) for the first frame to track (one per object). Required if conditioning is not provided. | MASK | No | One mask per object |
| conditioning | Text conditioning for detecting new objects during tracking. Required if initial_mask is not provided. | CONDITIONING | No | Text conditioning |
| detection_threshold | Score threshold for text-prompted detection | FLOAT | No | 0.0 to 1.0 (default: 0.5) |
| max_objects | Max tracked objects. Initial masks count toward this limit. 0 uses the internal cap of 64. | INT | No | 0 to 64 (default: 0) |
| detect_interval | Run detection every N frames (1=every frame). Higher values save compute. | INT | No | 1 to unlimited (default: 1) |
At least one of initial_mask or conditioning must be provided; supplying both is also allowed. If both are omitted, the node raises an error.
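The parameter rules above can be sketched in a few lines of Python. This is an illustration only, not the node's actual implementation: the helper names `check_prompts`, `effective_max_objects`, and `detection_frames` are hypothetical, and the assumption that detection starts at frame 0 and then runs on every N-th frame is one plausible reading of `detect_interval`, not something the table confirms.

```python
from typing import List, Optional

# Taken from the max_objects description: 0 falls back to an internal cap of 64.
INTERNAL_CAP = 64


def check_prompts(initial_mask: Optional[object], conditioning: Optional[object]) -> None:
    """Hypothetical check: raise if neither prompt source is supplied."""
    if initial_mask is None and conditioning is None:
        raise ValueError("Provide initial_mask, conditioning, or both.")


def effective_max_objects(max_objects: int) -> int:
    """Hypothetical helper: resolve max_objects=0 to the documented cap of 64."""
    if not 0 <= max_objects <= INTERNAL_CAP:
        raise ValueError("max_objects must be between 0 and 64")
    return INTERNAL_CAP if max_objects == 0 else max_objects


def detection_frames(num_frames: int, detect_interval: int) -> List[int]:
    """Frame indices where detection would run (assumes it starts at frame 0)."""
    if detect_interval < 1:
        raise ValueError("detect_interval must be >= 1")
    return list(range(0, num_frames, detect_interval))
```

Under these assumptions, a 10-frame clip with `detect_interval=4` would run text-prompted detection on frames 0, 4, and 8, and rely on the tracker alone for the frames in between. This is where the compute saving mentioned in the table would come from.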
Outputs
| Output Name | Description | Data Type |
|---|---|---|
| track_data | Tracking data containing object masks and metadata across all video frames | SAM3TrackData |
This documentation was AI-generated.