
Object Tracking Basics

Learn Object Tracking Basics for free with explanations, exercises, and a quick test (for Computer Vision Engineers).

Published: January 5, 2026 | Updated: January 5, 2026

Why this matters

Object tracking lets you follow the same physical object across video frames. As a Computer Vision Engineer, you will:

  • Count and follow people or vehicles across cameras for analytics.
  • Stabilize IDs for action recognition, sports analytics, and retail flows.
  • Enable real-time systems (dashcams, robots) to predict where things will be next.
  • Reduce detector load by predicting positions between detection frames.

Who this is for

Beginners to intermediate engineers in video vision who know detection but are new to tracking.

Prerequisites

  • Basic linear algebra and probability (vectors, covariance, Gaussian noise).
  • Familiarity with object detection outputs (bounding boxes, confidence scores).
  • Comfort with arrays and simple algorithms (sorting, distance metrics).

Concept explained simply

Tracking is keeping the same ID for the same object across frames. Most modern systems use tracking-by-detection: run a detector each frame (or every few frames), then match detections to existing tracks.

Mental model

Imagine each object has a tiny "prediction bubble" that moves forward each frame. New detections land on the scene like pins. You connect each bubble to the closest pin that makes sense. If there is no pin, the bubble drifts forward a bit before you decide it disappeared. If a new pin appears without a bubble, you start a new bubble (a new track).

Core building blocks

  • State: What you store per track. Common: position and velocity (e.g., x, y, vx, vy) and sometimes box size (w, h).
  • Prediction: Where the object is expected now (often a Kalman Filter).
  • Association: Which detection matches which track. Common costs: IoU distance, Mahalanobis distance, appearance similarity.
  • Assignment: Solve the global matching using the Hungarian algorithm (minimize total cost).
  • Lifecycle: Create, confirm, update, and delete tracks using rules (min hits to confirm, max age before delete).
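These building blocks can be sketched as a minimal constant-velocity track. This is an illustrative sketch; the class and field names are assumptions, not a standard API:

```python
from dataclasses import dataclass

@dataclass
class Track:
    """Minimal per-track state: position, velocity, box size, and lifecycle counters."""
    track_id: int
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    w: float = 0.0
    h: float = 0.0
    hits: int = 0    # matched frames (used to confirm the track)
    misses: int = 0  # consecutive unmatched frames (used to delete the track)

    def predict(self, dt: float = 1.0):
        """Move the 'prediction bubble' forward one time step."""
        self.x += self.vx * dt
        self.y += self.vy * dt
        return (self.x, self.y, self.w, self.h)

t = Track(track_id=1, x=100.0, y=50.0, vx=2.0, vy=1.0, w=40.0, h=40.0)
print(t.predict())  # → (102.0, 51.0, 40.0, 40.0)
```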

Matching costs

  • IoU-based: Good when boxes overlap well and motion is smooth.
  • Motion-based: Mahalanobis distance using the predicted covariance (good for gating and occlusion handling).
  • Appearance-based: Embeddings from a re-identification model to reduce ID switches.
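The IoU cost is simple to write down. A minimal sketch, assuming [x, y, w, h] boxes with a top-left corner convention:

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes (top-left corner)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def iou_cost(track_box, det_box):
    """Association cost for assignment: lower is better."""
    return 1.0 - iou(track_box, det_box)

print(iou([100, 50, 40, 40], [104, 52, 40, 40]))  # high overlap, ≈ 0.747
```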

Handling reality

  • Occlusion: Keep tracks alive for a few frames without detections (max age). Re-link when the object reappears.
  • False positives: Use confidence thresholds and confirmation logic (e.g., need N hits before declaring a real track).
  • Real-time: Run the detector every N frames; predict in-between to save compute.
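The detect-every-N-frames pattern can be sketched with stub detector and tracker objects (both are illustrative stand-ins, not a real library interface):

```python
class DummyTracker:
    """Stand-in tracker: records predict steps and the last detections."""
    def __init__(self):
        self.last_dets, self.steps = [], 0
    def predict(self):
        self.steps += 1           # advance all tracks one time step
    def update(self, dets):
        self.last_dets = dets     # associate + lifecycle rules would go here
    def active_tracks(self):
        return list(self.last_dets)

def run(frames, detector, tracker, detect_every=3):
    """Run the detector sparsely; predict every frame in between."""
    for i, frame in enumerate(frames):
        tracker.predict()
        if i % detect_every == 0:
            tracker.update(detector(frame))  # expensive call, run sparsely
        yield tracker.active_tracks()

calls = []
def detector(frame):
    calls.append(frame)
    return [frame]

out = list(run(range(7), detector, DummyTracker(), detect_every=3))
print(calls)  # detector ran only on frames 0, 3, 6
```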

Worked examples

Example 1 — IoU matching with simple track lifecycle

Suppose at frame t we have 2 tracks with boxes:

  • T1: [100, 50, 40, 40]
  • T2: [200, 60, 40, 40]

At frame t+1, detections are:

  • D1: [104, 52, 40, 40]
  • D2: [241, 60, 40, 40]
  • D3: [199, 60, 40, 40]

Use 1 - IoU as the assignment cost. IoU(T1,D1) is high; IoU(T2,D3) is high; the rest are small. The Hungarian algorithm picks T1→D1 and T2→D3. D2 remains unmatched → start a new tentative track, say T3. If T3 is matched again in the next frame, confirm it; otherwise delete it.
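You can verify this matching numerically. The sketch below uses brute-force enumeration as a stand-in for the Hungarian algorithm, which is exact on a problem this small:

```python
from itertools import permutations

def iou(a, b):
    """IoU of two [x, y, w, h] boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

tracks = {"T1": [100, 50, 40, 40], "T2": [200, 60, 40, 40]}
dets = {"D1": [104, 52, 40, 40], "D2": [241, 60, 40, 40], "D3": [199, 60, 40, 40]}

# Minimize the total cost = sum of (1 - IoU) over assigned pairs.
best = min(permutations(dets, len(tracks)),
           key=lambda p: sum(1 - iou(tracks[t], dets[d]) for t, d in zip(tracks, p)))
match = dict(zip(tracks, best))
print(match)  # → {'T1': 'D1', 'T2': 'D3'}; D2 left over → new tentative track
```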

Example 2 — One Kalman update for 2D constant velocity

State x = [x, y, vx, vy]^T, dt = 1. Matrices:

  • F = [[1,0,1,0],[0,1,0,1],[0,0,1,0],[0,0,0,1]]
  • H = [[1,0,0,0],[0,1,0,0]]
  • Q = 0.01 I, R = 1.0 I

Prior: x' = [102, 51, 2, 1]. Assuming an initial covariance P = I, P' = F P F^T + Q = [[2.01,0,1,0],[0,2.01,0,1],[1,0,1.01,0],[0,1,0,1.01]]. Measurement z = [101, 52].

Innovation y = z - H x' = [-1, 1]. S = H P' H^T + R = diag(3.01, 3.01).

Kalman gain K ≈ [[0.6678,0],[0,0.6678],[0.3322,0],[0,0.3322]].

Update: x = x' + K y ≈ [101.3322, 51.6678, 1.6678, 1.3322]. Velocities adapt toward the measurement without jumping too much.
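These numbers are easy to verify. A sketch with NumPy, reproducing the predict and update steps above (the initial covariance P = I is the assumption already stated for the prior):

```python
import numpy as np

# Constant-velocity model, dt = 1
F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q = 0.01 * np.eye(4)
R = 1.0 * np.eye(2)

x = np.array([100.0, 50.0, 2.0, 1.0])  # state before prediction
P = np.eye(4)                          # assumed initial covariance

# Predict
x = F @ x                              # → [102, 51, 2, 1]
P = F @ P @ F.T + Q

# Update with measurement z
z = np.array([101.0, 52.0])
y = z - H @ x                          # innovation [-1, 1]
S = H @ P @ H.T + R                    # diag(3.01, 3.01)
K = P @ H.T @ np.linalg.inv(S)         # gain ≈ 0.6678 / 0.3322
x = x + K @ y
P = (np.eye(4) - K @ H) @ P
print(np.round(x, 4))                  # ≈ [101.3322, 51.6678, 1.6678, 1.3322]
```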

Example 3 — Recovering from a brief occlusion

Track T1 is visible at t and t+1, then occluded at t+2 (no detection), then returns at t+3 near where predicted.

  • Policy: max_age = 2 (keep track alive for up to 2 missed frames), min_hits = 3 (confirm after 3 matches).
  • At t+2: T1 not matched → increment age_missed = 1; keep predicting.
  • At t+3: A detection falls within the motion gate (Mahalanobis distance below threshold) → match to T1, reset missed counter, keep ID consistent. Result: no new ID created.
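The lifecycle policy in this example can be sketched as a single update function (the dict fields are illustrative):

```python
def lifecycle_step(track, matched, max_age=2, min_hits=3):
    """One per-frame lifecycle update; returns False if the track should be deleted."""
    if matched:
        track["hits"] += 1
        track["misses"] = 0              # reset the missed counter on re-match
        if track["hits"] >= min_hits:
            track["confirmed"] = True
    else:
        track["misses"] += 1             # keep predicting while occluded
    return track["misses"] <= max_age

t1 = {"id": 1, "hits": 0, "misses": 0, "confirmed": False}
# Visible at t and t+1, occluded at t+2, re-matched at t+3:
for matched in [True, True, False, True]:
    alive = lifecycle_step(t1, matched)
print(t1["id"], alive, t1["confirmed"])  # → 1 True True (same ID, now confirmed)
```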

Exercises

These mirror the graded exercises below. Complete them here, then check solutions.

Exercise 1 — Build a simple IoU tracker (no filter)

You have 5 frames of detections (x, y, w, h). Use IoU matching with threshold 0.3, delete tracks if missed for 2 consecutive frames, and create new tracks for unmatched detections. Use a greedy assignment with highest IoU per track (ties broken arbitrarily). Report track IDs per frame.

Detections per frame
  • F1: A:[10,10,20,20], B:[60,12,20,20]
  • F2: A:[12,11,20,20], B:[62,12,20,20]
  • F3: A:[14,12,20,20], B:— (occluded)
  • F4: A:[16,12,20,20], B:[66,12,20,20]
  • F5: A:[18,12,20,20], B:[68,13,20,20]
  • IoU threshold: 0.3
  • max_age (misses allowed): 2
  • Report mapping like: F1 T1→A, T2→B; F2 T1→A, T2→B; ...
  • Checklist:
    • New track IDs created on first appearance.
    • Track for B survives one missing frame (F3) and re-associates on F4.
    • No extra IDs appear for A.

Exercise 2 — Associate with the Hungarian algorithm (by hand)

Given the cost matrix C for 3 tracks (rows T1..T3) and 3 detections (columns D1..D3), find the minimum-cost assignment. Costs are 1 - IoU, lower is better.

Cost matrix
T1: [1.00, 1.00, 0.10]
T2: [0.15, 1.00, 1.00]
T3: [1.00, 0.20, 1.00]
  • Checklist:
    • Each track is assigned to at most one detection.
    • Total cost is minimized.
    • No unassigned pairs have lower alternative cost than chosen pairs.
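Once you have an answer by hand, a brute-force check is a handy way to confirm it; on a 3×3 matrix, enumerating all 6 permutations finds the same optimum the Hungarian algorithm would (in polynomial time on larger problems):

```python
from itertools import permutations

# Rows T1..T3, columns D1..D3; cost = 1 - IoU, lower is better.
C = [[1.00, 1.00, 0.10],
     [0.15, 1.00, 1.00],
     [1.00, 0.20, 1.00]]

# p[r] is the detection column assigned to track row r.
best = min(permutations(range(3)),
           key=lambda p: sum(C[r][p[r]] for r in range(3)))
total = sum(C[r][best[r]] for r in range(3))
print(best, round(total, 2))
```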

Common mistakes and self-check

  • Using IoU alone for fast or small objects: Add motion gating (Mahalanobis) or appearance features.
  • Too low IoU threshold: Causes ID switches and false links. Start with 0.3–0.5 and tune.
  • No lifecycle rules: Without min_hits and max_age you get flickering IDs.
  • Ignoring frame rate: dt must reflect actual time; otherwise predictions drift.
  • Wrong box format: Mixing [x,y,w,h] vs [x1,y1,x2,y2] produces bad IoU; standardize early.
  • Unbounded search: Not gating by distance lets far-away detections match; always gate.
Self-check
  • Visual: Do IDs remain consistent through brief occlusions?
  • Numerical: Log per-frame matches, unmatched counts, and ID switches.
  • Sanity: Does average IoU between tracks and matched detections stay high (e.g., >0.5) for stable objects?

Practical projects

  • Webcam multi-object tracker:
    • Run a lightweight detector every 2–3 frames.
    • Between detections, predict with a Kalman filter.
    • Use IoU+gating to associate; display stable IDs.
  • Traffic camera analytics:
    • Count vehicles by tracking them across a virtual line.
    • Handle stops and starts with max_age and min_hits.
    • Export per-track trajectories for speed estimation.
  • Sports player tracking:
    • Combine motion and color histograms as appearance features.
    • Evaluate IDF1 and tune thresholds to reduce ID switches.

Learning path

  1. Start: IoU-based matching with track lifecycle (this lesson).
  2. Add motion: Kalman filter with Mahalanobis gating.
  3. Scale to MOT: Hungarian assignment and birth/death logic.
  4. Improve IDs: Appearance embeddings and re-identification.
  5. Evaluate: MOTA, IDF1, HOTA; inspect errors where IDs switch.
  6. Optimize: Run detector sparsely; consider SORT/DeepSORT/ByteTrack ideas.

Mini challenge

Design a tracker for people in a hallway camera at 15 FPS on a CPU-only device. Detector can run at most every 3 frames. How will you keep IDs stable at doorways where occlusions happen?

One possible approach
  • Run detector every 3 frames; predict in-between with Kalman CV model (dt=1/15).
  • Gating: Mahalanobis threshold tuned via validation; IoU secondary.
  • Lifecycle: min_hits=3, max_age=6 (about 0.4s tolerance).
  • Appearance: 64D color+texture embedding to reduce switches at doors.
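The Mahalanobis gate in this design can be sketched as follows. The threshold 9.21 is the chi-square 99% quantile for 2 degrees of freedom; gating on (x, y) only is an assumption of this sketch:

```python
import numpy as np

CHI2_GATE_2D = 9.21  # chi-square 99% quantile, 2 degrees of freedom

def gate(z, z_pred, S):
    """Accept detection z for a track iff the squared Mahalanobis distance
    of the innovation is inside the gate. S is the innovation covariance."""
    y = np.asarray(z, float) - np.asarray(z_pred, float)
    d2 = float(y @ np.linalg.inv(S) @ y)
    return d2 <= CHI2_GATE_2D, d2

# Using the innovation covariance from the Kalman example above:
S = np.diag([3.01, 3.01])
ok, d2 = gate([101, 52], [102, 51], S)
print(ok, round(d2, 3))  # a nearby detection passes the gate
```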

Next steps

  • Implement a Kalman-based MOT with Hungarian assignment.
  • Add re-identification features and compare IDF1 before/after.
  • Practice on different scenes (crowds, traffic, indoor) to tune thresholds robustly.

Quick Test note

The Quick Test below is available to everyone.

Practice Exercises

2 exercises to complete

Instructions

Same setup and detections as Exercise 1 above: IoU threshold 0.3, max_age 2 (delete after 2 consecutive misses), greedy highest-IoU assignment, new tracks for unmatched detections.
Expected Output
F1 T1→A, T2→B; F2 T1→A, T2→B; F3 T1→A, T2 (no det); F4 T1→A, T2→B; F5 T1→A, T2→B

Object Tracking Basics — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

