Object Tracking
Object tracking follows specific objects across consecutive frames in a video, maintaining a consistent identity for each target over time. While detection runs independently on each frame (answering "what objects are here now?"), tracking links detections across frames (answering "which detection in frame N corresponds to which detection in frame N+1?"). This persistent identity is essential for counting unique objects, analyzing trajectories, and measuring speeds.
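The linking question above can be illustrated with a minimal sketch that assigns identities by bounding-box overlap alone. It assumes boxes are (x1, y1, x2, y2) tuples; the function names `iou` and `link_frames` and the 0.3 overlap threshold are hypothetical choices for illustration, and the greedy matching is a simplification of what production trackers do.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_frames(prev_tracks, detections, next_id, min_iou=0.3):
    """Carry track IDs from frame N to frame N+1.

    prev_tracks: dict mapping track ID -> box in frame N.
    detections:  list of boxes detected in frame N+1.
    Returns (dict mapping track ID -> box in frame N+1, updated next_id).
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in prev_tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    assigned, used_dets = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in assigned or j in used_dets:
            continue
        assigned[tid] = detections[j]
        used_dets.add(j)
    # Any detection left over is treated as a newly appeared object.
    for j, det in enumerate(detections):
        if j not in used_dets:
            assigned[next_id] = det
            next_id += 1
    return assigned, next_id


frame_n = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
frame_n1 = [(52, 50, 62, 60), (1, 0, 11, 10), (100, 100, 110, 110)]
tracks, next_id = link_frames(frame_n, frame_n1, next_id=3)
```

In this example the two existing objects keep IDs 1 and 2 even though the detector lists them in a different order, and the third box receives a fresh ID. Tracks with no matching detection are simply dropped here; real trackers usually keep them alive for a few frames.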
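Most modern multi-object trackers follow the tracking-by-detection paradigm: a detector (such as YOLO) runs on each frame to produce bounding boxes, then an association algorithm links detections across frames using motion prediction (typically a Kalman filter), appearance features (Re-ID embeddings), or both. ByteTrack associates detections in two passes: high-confidence detections are matched to existing tracks first, then low-confidence detections are matched to the tracks that remain unmatched, which recovers partially occluded objects that a single confidence threshold would discard. In its basic form it relies on motion and box overlap rather than an appearance network, so it achieves strong results with minimal overhead. BoT-SORT builds on this approach, adding camera motion compensation and a more robust appearance model. Outside the tracking-by-detection family, SAM 2 extends the Segment Anything Model to video, enabling prompted object tracking with pixel-level masks.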
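The two-pass association idea can be sketched as follows. This is an illustrative simplification, not ByteTrack's actual implementation: it assumes the track boxes have already been moved to their predicted positions (for example by a Kalman filter), it uses greedy IoU matching where real trackers typically solve an optimal assignment, and the names `greedy_match` and `two_pass_associate` and all threshold values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, det_boxes, min_iou):
    """Match tracks to detections, best overlap first.

    Returns (dict track ID -> detection index, set of used detection indices).
    """
    pairs = sorted(
        ((iou(tb, db), tid, j)
         for tid, tb in track_boxes.items()
         for j, db in enumerate(det_boxes)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches, used


def two_pass_associate(tracks, detections,
                       high_thresh=0.6, low_thresh=0.1, min_iou=0.3):
    """tracks: dict ID -> predicted box; detections: list of (box, score)."""
    high = [box for box, score in detections if score >= high_thresh]
    low = [box for box, score in detections
           if low_thresh <= score < high_thresh]

    # Pass 1: high-confidence detections against all tracks.
    first, used_high = greedy_match(tracks, high, min_iou)
    updated = {tid: high[j] for tid, j in first.items()}

    # Pass 2: low-confidence detections against still-unmatched tracks only.
    leftover = {tid: box for tid, box in tracks.items() if tid not in first}
    second, _ = greedy_match(leftover, low, min_iou)
    updated.update({tid: low[j] for tid, j in second.items()})

    lost = [tid for tid in tracks if tid not in updated]
    # Only unmatched high-confidence boxes may start new tracks;
    # unmatched low-confidence boxes are discarded as likely background.
    new = [box for j, box in enumerate(high) if j not in used_high]
    return updated, lost, new


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60), 3: (200, 200, 210, 210)}
detections = [
    ((1, 0, 11, 10), 0.9),           # clear view of object 1
    ((51, 50, 61, 60), 0.3),         # object 2, partly occluded
    ((100, 100, 110, 110), 0.8),     # a new object
    ((300, 300, 310, 310), 0.2),     # weak, unmatched detection
]
updated, lost, new = two_pass_associate(tracks, detections)
```

Here track 2 survives only because of the second pass: its detection scores 0.3, below the high-confidence threshold, yet it overlaps an existing track and is therefore kept rather than discarded.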
Applications include traffic monitoring (counting vehicles, measuring speeds), retail analytics (tracking customer movement through stores), sports analysis (following players and the ball), surveillance (tracking people across camera networks), and manufacturing (tracking products through assembly stages). Multi-object tracking (MOT) benchmarks evaluate both detection accuracy and identity consistency using metrics such as MOTA, which is dominated by detection errors; IDF1, which focuses on how well identities are preserved; and HOTA, which balances detection and association quality.
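As a rough illustration of how two of these metrics are computed, the sketch below applies the standard MOTA and IDF1 formulas to error counts that are assumed to have been accumulated already over a whole sequence. The function names and the example counts are made up for illustration; matching predictions to ground truth, which produces these counts, is the harder part and is not shown. HOTA is omitted because it involves averaging over localization thresholds.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.

    num_gt is the total number of ground-truth boxes in the sequence.
    The value can be negative when errors outnumber ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN).

    The ID-level true/false positives and false negatives come from the
    best global one-to-one mapping between predicted and true identities.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom > 0 else 0.0


# Hypothetical counts for one sequence.
mota_score = mota(false_negatives=10, false_positives=5,
                  id_switches=5, num_gt=100)
idf1_score = idf1(idtp=80, idfp=20, idfn=20)
```

Note that identity switches are only one of three terms in MOTA, so a tracker with an excellent detector can score well despite frequent ID swaps, which is one reason IDF1 and HOTA are reported alongside it.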