Why this matters
Computer Vision Engineers rely on vector indexes to search, compare, and organize image embeddings at scale. Typical tasks:
- Image similarity search: find visually similar products, scenes, or faces from millions of images.
- Duplicate/near-duplicate detection to clean datasets and catalogs.
- Content moderation: fast lookups against known risky content embeddings.
- Recommendation: retrieve visually similar items to power related products.
Choosing the right index can cut latency from seconds to milliseconds and reduce memory costs, while maintaining recall and quality.
Concept explained simply
An embedding turns an image into a vector (a list of numbers). A vector index is a data structure that makes finding the nearest vectors fast. Different index types trade memory, speed, and accuracy.
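To make that concrete, here is brute-force search in a few lines of NumPy (all shapes and numbers are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 128)).astype("float32")  # 10k fake image embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)            # unit length: dot product == cosine

query = db[42]                    # pretend this is a fresh image's embedding
scores = db @ query               # compare against every vector: exact but O(N)
top5 = np.argsort(-scores)[:5]    # indices of the 5 most similar images
print(top5, scores[top5])
```

A vector index exists to replace this exhaustive scan with a structure that inspects only a small fraction of the vectors.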
Key terms
- Distance metric: how we measure similarity (e.g., cosine, L2, inner product).
- Recall@k: fraction of the true nearest neighbors found among the top-k results (a small measurement sketch follows this list).
- Latency: time to answer a query (watch P95/P99, not just average).
- ANN: Approximate Nearest Neighbor search; faster than brute force but may miss some neighbors.
- Filtering: restricting search by metadata (e.g., category="shoes").
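Recall@k is easy to measure offline if you keep a brute-force index around as ground truth; a minimal sketch:

```python
import numpy as np

def recall_at_k(ann_ids: np.ndarray, exact_ids: np.ndarray, k: int) -> float:
    """Fraction of the true top-k neighbors that the ANN index actually returned.

    ann_ids / exact_ids: (num_queries, k) id arrays, e.g. the second value
    returned by searching the ANN index and a brute-force index respectively.
    """
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(ann_ids, exact_ids))
    return hits / (len(ann_ids) * k)
```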
Mental model
Imagine a huge library of image descriptors. A flat scan reads every card to find the closest ones: accurate but slow. An inverted file (IVF) first jumps to a shelf (coarse cluster) where good matches likely live, then looks only there. HNSW builds a multi-level map to hop quickly toward close neighbors. PQ (Product Quantization) makes each card smaller by compressing it, speeding memory-bound searches at the cost of some precision.
The building blocks
- Flat (brute-force): exact results; scales poorly; good as a baseline and for small collections.
- IVF (coarse quantizer): partitions vectors into nlist clusters; a search visits only nprobe of them.
- HNSW: graph-based; excellent latency/recall; supports dynamic inserts; memory overhead per vector.
- PQ/OPQ: compress vectors to compact codes; pair with IVF to reduce RAM and speed up memory-bound search.
- Metrics: cosine (often normalize vectors first), L2, inner product. For normalized vectors, L2 and cosine rankings align.
- Dimensionality reduction: PCA to reduce dimension (e.g., 512 → 256) with slight recall loss but big memory wins. (A code sketch of these building blocks follows this list.)
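A minimal sketch of these building blocks, assuming FAISS (one common open-source library; the factory string, nlist=256, and the other values here are illustrative starting points, not recommendations):

```python
import faiss
import numpy as np

d = 512
xb = np.random.rand(100_000, d).astype("float32")   # stand-in embeddings
faiss.normalize_L2(xb)   # unit vectors: inner product == cosine, and L2 ranks the same

# Flat: exact brute force -- the accuracy baseline
flat = faiss.IndexFlatIP(d)
flat.add(xb)

# HNSW: graph-based ANN; M = connectivity, efSearch = recall/latency knob
hnsw = faiss.IndexHNSWFlat(d, 16)   # L2 metric by default; fine for normalized vectors
hnsw.hnsw.efConstruction = 200
hnsw.hnsw.efSearch = 64
hnsw.add(xb)

# OPQ + IVF + PQ: rotate, cluster into 256 lists, compress to 16-byte codes
ivfpq = faiss.index_factory(d, "OPQ16,IVF256,PQ16")
ivfpq.train(xb)                     # quantizers must be trained on representative data
ivfpq.add(xb)
faiss.extract_index_ivf(ivfpq).nprobe = 16   # clusters visited per query

D, I = hnsw.search(xb[:5], 10)      # top-10 distances and ids for 5 queries
```

The same concepts (flat scan, graph, inverted lists, product quantization) appear under similar names in other vector databases.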
Worked examples
Example 1: Near-duplicate product images
Goal: mark near-duplicates in a 2M image catalog with 384-d embeddings.
- Normalize embeddings to unit length.
- Use cosine similarity; threshold ~0.9 for near-dup (adjust after sampling).
- Index choice: HNSW for high recall and dynamic updates. Start with M=16, efConstruction=200, efSearch=64.
- Process: query each image; if a neighbor scores above the threshold, link the pair as duplicates.
Why not IVF-PQ?
IVF-PQ is great for huge scales and memory savings but may slightly lower recall around tight thresholds. If dedup precision matters, start with HNSW. Revisit once metrics are stable and memory pressure grows.
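A sketch of this dedup pass, again assuming FAISS; the embeddings file name is hypothetical, and M, efConstruction, and efSearch are the starting values from above:

```python
import faiss
import numpy as np

d, threshold = 384, 0.9
emb = np.load("catalog_embeddings.npy").astype("float32")  # hypothetical (N, 384) array
faiss.normalize_L2(emb)                                    # cosine via inner product

index = faiss.IndexHNSWFlat(d, 16, faiss.METRIC_INNER_PRODUCT)
index.hnsw.efConstruction = 200
index.hnsw.efSearch = 64
index.add(emb)

# Top-2 per image: the best hit is almost always the image itself,
# so the second hit is the nearest distinct neighbor.
D, I = index.search(emb, 2)
pairs = [(i, int(I[i, 1])) for i in range(len(emb)) if D[i, 1] >= threshold]
print(f"{len(pairs)} candidate near-duplicate pairs at similarity >= {threshold}")
```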
Example 2: 1M images, 512-d, fast search API
- Constraints: P95 latency < 50 ms; recall@10 ≥ 0.95.
- Candidate: HNSW (good latency and recall). Try M=16–32; efSearch=64–128.
- Memory estimate: embeddings 1,000,000 × 512 × 4 B ≈ 2.05 GB; plus HNSW overhead (~64–128 B/vec) ≈ 64–128 MB; total ~2.1–2.2 GB.
- Outcome: likely meets latency on a single machine; tune efSearch upward if recall is low (see the sketch below).
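A sketch for validating these targets empirically, assuming FAISS and synthetic stand-in data (real latency and recall depend on hardware and the actual embeddings):

```python
import time
import faiss
import numpy as np

d, n, k = 512, 1_000_000, 10
xb = np.random.rand(n, d).astype("float32")
faiss.normalize_L2(xb)

index = faiss.IndexHNSWFlat(d, 16)   # try M up to 32 if recall stalls
index.hnsw.efConstruction = 200
index.add(xb)                        # building is the slow part; do it offline
index.hnsw.efSearch = 64             # first knob to raise if recall@10 < 0.95

latencies_ms = []
for q in xb[:1000]:                  # single-query timing approximates serving
    t0 = time.perf_counter()
    index.search(q.reshape(1, -1), k)
    latencies_ms.append((time.perf_counter() - t0) * 1000)
print(f"P95: {np.percentile(latencies_ms, 95):.1f} ms")
```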
Example 3: 10M images, 256-d, memory constrained
- Constraints: 32 GB RAM; P95 < 120 ms; recall@10 ≥ 0.9.
- Candidate: IVF-PQ with OPQ.
- Setup: nlist ≈ sqrt(10M) ≈ 3162 (round to 4096). PQ: m=16 sub-quantizers at 8 bits each → 16 B per vector code. Coarse centroids and codebooks add overhead.
- Memory estimate: 10M × 16 B ≈ 160 MB of codes, plus IVF lists and centroids (hundreds of MB). Fits in RAM with headroom.
- Tune: nprobe 32–256; increase nprobe to raise recall; re-rank top candidates with the original vectors if needed (build sketch below).
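A build sketch for this configuration, assuming FAISS's index_factory; the factory string encodes the numbers above, while the training-sample size is an assumption to adjust:

```python
import faiss
import numpy as np

d = 256
# OPQ rotation, 4096 coarse clusters, 16 sub-quantizers x 8 bits = 16-byte codes
index = faiss.index_factory(d, "OPQ16,IVF4096,PQ16")

train = np.random.rand(500_000, d).astype("float32")  # use a real representative sample
faiss.normalize_L2(train)
index.train(train)
# index.add(batch) for each embedding batch; 10M x 16 B of codes = ~160 MB

faiss.extract_index_ivf(index).nprobe = 64  # sweep 32-256: recall up, latency up
# If recall@10 hovers near the 0.9 floor, re-rank the top ~100 hits
# with the original float vectors (see Step 5 below).
```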
How to choose an index (quick playbook)
Step 1 – Metric and normalization
- If using CLIP-like embeddings: L2-normalize and use cosine similarity (or inner product with normalized vectors).
- Keep metric consistent across training, evaluation, and serving.
Step 2 – Scale and update pattern
- < 1M vectors: HNSW or Flat (if latency budget allows).
- 1M–50M: HNSW (if RAM ok) or IVF-PQ for memory efficiency.
- Frequent inserts/deletes: HNSW handles dynamic updates better than IVF-PQ.
Step 3 – Memory and compression
- Estimate raw: N × dim × 4 bytes (float32).
- Need big savings? Use PQ/OPQ, possibly with on-disk storage + caching.
- Optional PCA down to 256 or 192 dims if the recall impact is acceptable (see the estimate helper below).
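The Step 3 arithmetic as a small helper (float32 assumed; real index overhead comes on top and varies by structure):

```python
def raw_gb(n: int, dim: int) -> float:
    """Uncompressed storage: N x dim x 4 bytes (float32)."""
    return n * dim * 4 / 1e9

def pq_mb(n: int, m: int = 16, bits: int = 8) -> float:
    """PQ code storage: m sub-quantizers x bits each, per vector."""
    return n * (m * bits / 8) / 1e6

print(raw_gb(10_000_000, 256))  # 10.24 GB raw ...
print(pq_mb(10_000_000))        # ... vs 160.0 MB of PQ codes
```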
Step 4 – Tuning knobs
- HNSW: increase efSearch for higher recall; M tunes graph connectivity.
- IVF: increase nprobe to visit more clusters; increase nlist for finer partitions.
- PQ: increase code size (e.g., m=16 vs 8) for better recall at higher memory cost (a sweep sketch follows this list).
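A minimal sweep of the main HNSW knob, assuming FAISS, with a Flat index as ground truth; the same loop shape works for IVF by sweeping nprobe instead:

```python
import faiss
import numpy as np

d, k = 512, 10
xb = np.random.rand(200_000, d).astype("float32")
xq = np.random.rand(1_000, d).astype("float32")
faiss.normalize_L2(xb); faiss.normalize_L2(xq)

exact = faiss.IndexFlatL2(d)   # brute-force ground truth for recall
exact.add(xb)
_, gt = exact.search(xq, k)

hnsw = faiss.IndexHNSWFlat(d, 16)
hnsw.add(xb)

for ef in (32, 64, 128, 256):  # higher efSearch: more recall, more latency
    hnsw.hnsw.efSearch = ef
    _, ids = hnsw.search(xq, k)
    recall = np.mean([len(set(a) & set(b)) / k for a, b in zip(ids, gt)])
    print(f"efSearch={ef:>3}  recall@{k}={recall:.3f}")
```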
Step 5 – Filtering and re-ranking
- If metadata filters are common: pre-filter candidate partitions, or run the vector search first and filter the results, broadening the probe if too few candidates survive.
- Re-rank top candidates using the original (non-quantized) vectors to recover accuracy (sketch below).
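A re-ranking sketch in NumPy: over-fetch candidates from the compressed index, then re-score them exactly against the original float vectors (the function and argument names are illustrative):

```python
import numpy as np

def rerank(query: np.ndarray, candidate_ids: np.ndarray,
           originals: np.ndarray, k: int = 10):
    """Exact re-score of ANN candidates with uncompressed embeddings.

    query: (d,) unit-normalized vector; candidate_ids: ids from the PQ
    index (fetch 5-10x k of them); originals: (N, d) float32 embeddings.
    """
    scores = originals[candidate_ids] @ query   # exact cosine on unit vectors
    order = np.argsort(-scores)[:k]
    return candidate_ids[order], scores[order]
```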
Common mistakes and self-check
- Mistake: Mixing metrics (e.g., cosine in training, L2 in serving). Self-check: verify normalization and metric match end-to-end.
- Mistake: Underestimating memory overhead. Self-check: include index structure overhead (graph links, centroids, codes).
- Mistake: Tuning only for average latency. Self-check: verify P95/P99 under realistic concurrency.
- Mistake: Ignoring cold-start. Self-check: warm caches or pre-load index segments; measure first-query latency.
- Mistake: Skipping re-ranking after PQ. Self-check: compare recall/precision with and without exact re-score on top hits.
- Mistake: Too few evaluation queries. Self-check: use diverse queries, categories, and edge cases; report recall@k and nDCG.
Exercises
These mirror the tasks in the Exercises panel below.
Exercise 1 – Choose the right index for a 5M-image search
Scenario: 5,000,000 images, 384-d float32 embeddings; target P95 < 100 ms; recall@10 ≥ 0.9; RAM 64 GB; frequent filters by region and category; daily inserts of 50k images.
- Estimate raw memory for embeddings.
- Pick metric and normalization.
- Propose an index type and parameters.
- Account for filtering and updates.
- Provide a brief justification and memory estimate including index overhead.
Deliverable: a short plan (bullets) with numbers.
Exercise 2 – Tune recall vs latency
Choose either HNSW or IVF-PQ and propose a tuning plan to reach recall@10 ≥ 0.92 with P95 < 80 ms on 2M vectors (512-d). Include 3 parameter trials and your expected trend for recall and latency. Specify when you would add re-ranking.
- List parameters for each trial (e.g., HNSW efSearch; IVF nprobe; PQ code size).
- Predict which trial hits the target first.
Practical projects
- Visual product search: build an HNSW index for a 1M catalog; measure recall@10 vs efSearch; add category filter and re-ranking.
- Dataset deduplication: use cosine similarity thresholds to cluster near-duplicates; compare HNSW vs Flat on a 200k sample.
- Landmark retrieval: compress with IVF-OPQ; test nlist and nprobe settings; evaluate memory savings and P95 latency.
Who this is for
- Computer Vision Engineers building retrieval, deduplication, or recommendation features.
- ML practitioners deploying embedding-based search services.
Prerequisites
- Basics of embeddings and similarity metrics (cosine, L2).
- Comfort with batch processing and evaluating recall/latency.
Learning path
- Start: Understand metrics and normalization.
- Then: Compare Flat, HNSW, IVF, and PQ trade-offs.
- Next: Practice tuning efSearch/nprobe and adding re-ranking.
- Finally: Evaluate with recall@k and P95/P99 latency under load.
Next steps
- Instrument your service for recall@k and tail latency.
- Design a small offline grid search for index parameters.
- Plan an A/B test to validate quality improvements.
Mini challenge
Design a migration plan from HNSW to IVF-PQ for a 20M-image system under 32 GB RAM. Specify how you will maintain recall during the switch, which parameters you will sweep first, and how you will roll back if quality drops.
Quick test
Take the quick test below to check your understanding.