This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Overview

Track objects across video frames using SAM3’s memory-based tracker. This node processes a sequence of video frames and maintains object identities across frames, using either initial masks or text prompts to define what to track.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| images | IMAGE | Yes | Batched video frames | Video frames as batched images |
| model | MODEL | Yes | SAM3 model | The SAM3 model to use for tracking |
| initial_mask | MASK | No | One mask per object | Mask(s) for the first frame to track (one per object). Required if conditioning is not provided. |
| conditioning | CONDITIONING | No | Text conditioning | Text conditioning for detecting new objects during tracking. Required if initial_mask is not provided. |
| detection_threshold | FLOAT | No | 0.0 to 1.0 (default: 0.5) | Score threshold for text-prompted detection |
| max_objects | INT | No | 0 to 64 (default: 0) | Max tracked objects. Initial masks count toward this limit. 0 uses the internal cap of 64. |
| detect_interval | INT | No | 1 to unlimited (default: 1) | Run detection every N frames (1=every frame). Higher values save compute. |

Note: Either initial_mask or conditioning must be provided. If both are omitted, the node will raise an error.
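
The input rules above can be illustrated with a minimal Python sketch. This is not the node's actual implementation: the function names (`check_tracking_inputs`, `effective_max_objects`, `detection_frames`) are hypothetical, and the assumption that detection starts on frame 0 and repeats every `detect_interval` frames is an illustrative reading of the table, not confirmed behavior.

```python
from typing import List, Optional

# Internal cap stated in the max_objects description.
INTERNAL_MAX_OBJECTS = 64


def check_tracking_inputs(
    initial_mask: Optional[object],
    conditioning: Optional[object],
    detection_threshold: float = 0.5,
    max_objects: int = 0,
    detect_interval: int = 1,
) -> None:
    """Hypothetical helper mirroring the documented constraints."""
    # At least one of initial_mask / conditioning must be supplied.
    if initial_mask is None and conditioning is None:
        raise ValueError("Provide initial_mask or conditioning.")
    if not 0.0 <= detection_threshold <= 1.0:
        raise ValueError("detection_threshold must be in [0.0, 1.0].")
    if not 0 <= max_objects <= INTERNAL_MAX_OBJECTS:
        raise ValueError("max_objects must be in [0, 64].")
    if detect_interval < 1:
        raise ValueError("detect_interval must be at least 1.")


def effective_max_objects(max_objects: int) -> int:
    """A value of 0 falls back to the internal cap of 64."""
    return INTERNAL_MAX_OBJECTS if max_objects == 0 else max_objects


def detection_frames(num_frames: int, detect_interval: int = 1) -> List[int]:
    """Frame indices on which detection would run.

    Assumption: detection starts at frame 0 and repeats every
    detect_interval frames.
    """
    return list(range(0, num_frames, detect_interval))
```

Under these assumptions, `detect_interval=4` on a 10-frame clip would run detection on 3 frames instead of 10, which is where the compute savings come from.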

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| track_data | SAM3TrackData | Tracking data containing object masks and metadata across all video frames |

Source fingerprint (SHA-256): 36ee256c46ea3816be4d06b64d945b79af530032f29e5e4c8741971c7ebf9fae