Overview

Track objects across video frames using SAM3’s memory-based tracker. This node processes a sequence of video frames and maintains object identities across frames, using either initial masks or text prompts to define what to track.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| images | Video frames as batched images | IMAGE | Yes | Batched video frames |
| model | The SAM3 model to use for tracking | MODEL | Yes | SAM3 model |
| initial_mask | Mask(s) for the first frame to track (one per object). Required if conditioning is not provided. | MASK | No | One mask per object |
| conditioning | Text conditioning for detecting new objects during tracking. Required if initial_mask is not provided. | CONDITIONING | No | Text conditioning |
| detection_threshold | Score threshold for text-prompted detection | FLOAT | No | 0.0 to 1.0 (default: 0.5) |
| max_objects | Max tracked objects. Initial masks count toward this limit. 0 uses the internal cap of 64. | INT | No | 0 to 64 (default: 0) |
| detect_interval | Run detection every N frames (1=every frame). Higher values save compute. | INT | No | 1 to unlimited (default: 1) |

Note: Either initial_mask or conditioning must be provided. If both are omitted, the node will raise an error.
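
The input rules above can be sketched in a few lines of Python. This is an illustration only, not the node's actual implementation: the function `resolve_tracking_inputs`, its return keys, and the constant name are hypothetical. It also assumes that detection runs on frames 0, N, 2N, and so on, and only when conditioning is supplied, which the parameter descriptions suggest but do not state.

```python
from typing import Optional, Sequence

# The documentation says max_objects=0 falls back to an internal cap of 64.
INTERNAL_OBJECT_CAP = 64


def resolve_tracking_inputs(
    num_frames: int,
    initial_masks: Optional[Sequence[object]] = None,
    conditioning: Optional[object] = None,
    max_objects: int = 0,
    detect_interval: int = 1,
) -> dict:
    """Hypothetical helper mirroring the documented input rules."""
    # Either initial_mask or conditioning must be provided.
    if initial_masks is None and conditioning is None:
        raise ValueError("Provide initial_mask, conditioning, or both.")
    if not 0 <= max_objects <= INTERNAL_OBJECT_CAP:
        raise ValueError("max_objects must be between 0 and 64.")
    if detect_interval < 1:
        raise ValueError("detect_interval must be at least 1.")

    # 0 means "use the internal cap".
    object_cap = max_objects if max_objects > 0 else INTERNAL_OBJECT_CAP

    # Initial masks count toward the limit, so they reduce the slots
    # left for objects found later by text-prompted detection.
    num_initial = len(initial_masks) if initial_masks is not None else 0
    free_slots = max(object_cap - num_initial, 0)

    # Assumption: detection runs on every detect_interval-th frame,
    # starting at frame 0, and only when conditioning is present.
    if conditioning is not None:
        detection_frames = list(range(0, num_frames, detect_interval))
    else:
        detection_frames = []

    return {
        "object_cap": object_cap,
        "free_slots": free_slots,
        "detection_frames": detection_frames,
    }
```

For example, with two initial masks, a text prompt, `max_objects=0`, and `detect_interval=4` on a 10-frame clip, this sketch leaves 62 slots for newly detected objects and schedules detection on frames 0, 4, and 8.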

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| track_data | Tracking data containing object masks and metadata across all video frames | SAM3TrackData |