What you'll learn
- What hard negative mining is and why it boosts metric learning and detection.
- The difference between random, semi-hard, and hard negatives.
- How to add mining to triplet/contrastive training and object detection (OHEM).
- Batch construction tips, sanity checks, and monitoring.
Who this is for
- Computer Vision Engineers building retrieval, face/product matching, or detection systems.
- ML practitioners improving embedding quality and training efficiency.
Prerequisites
- Basic deep learning (CNNs/transformers) and training loops.
- Understanding of embeddings and similarity (cosine/Euclidean).
- Familiarity with triplet or contrastive loss is helpful.
Why this matters at work
- Face or product matching: most pairs are easy; mining focuses learning on confusing lookalikes.
- Image retrieval: improves recall@K by enlarging margins around decision boundaries.
- Object detection: OHEM reduces false positives by training on tough background proposals.
Concept explained simply
Hard negative mining means deliberately training on negative examples that are deceptively similar to the anchor (or class of interest). These are the mistakes your model is most likely to make, so fixing them gives the biggest accuracy gains.
Mental model: border guards and lookalikes
Imagine guards checking IDs: thousands are obviously correct, but a few lookalikes are tricky. Training guards on the tricky cases makes them better at telling genuine IDs from impostors. In embeddings, tricky negatives are those close to the anchor but not of the same class.
Core mining strategies
Random negatives
Pick any negative example. Simple and stable, but often too easy: the learning signal fades once the model improves.
Semi-hard negatives
Negatives that are farther from the anchor than the positive but still inside the margin, i.e. d(a,p) < d(a,n) < d(a,p) + margin. They violate the triplet constraint without being extreme outliers, making them a common default: good learning signal with stable training.
Hard negatives
The closest negatives to the anchor in embedding space. Strong signal but may include noisy labels or outliers; use with care and curriculum.
Batch-hard mining
Within a mini-batch, select the hardest positive and hardest negative per anchor (e.g., "batch-hard triplet"). Requires multiple samples per class in each batch.
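A minimal PyTorch sketch of the batch-hard rule, assuming `embeddings` is a [B, D] float tensor and `labels` a [B] tensor of integer class ids (the function name is illustrative):

import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    # Pairwise Euclidean distances, shape [B, B]
    dist = torch.cdist(embeddings, embeddings, p=2)
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    # Hardest positive: the farthest same-class sample (excluding self)
    hardest_pos = (dist * (same_class & ~eye)).max(dim=1).values
    # Hardest negative: the closest different-class sample
    hardest_neg = dist.masked_fill(same_class, float('inf')).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

If a class has only one sample in the batch, its anchor has no positive and the term degenerates, which is why the batch construction tips below matter.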
Distance-weighted sampling
Sample negatives with probability inversely proportional to the density of their distance from the anchor. This focuses training on informative pairs while avoiding both trivially easy negatives and extreme outliers.
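One concrete formulation is from Wu et al., "Sampling Matters in Deep Embedding Learning" (2017). A rough sketch of that weighting, assuming unit-normalized embeddings (distances in [0, 2]) of dimension `dim`, with `neg_mask` equal to 1 for valid negatives and 0 elsewhere; the helper name and `cutoff` are illustrative:

import torch

def distance_weighted_negative(dist_to_anchor, neg_mask, dim, cutoff=0.5):
    # Clamp tiny distances so the inverse-density weight stays bounded
    d = dist_to_anchor.clamp(min=cutoff)
    # log q(d): density of pairwise distances for points uniform on a hypersphere
    log_q = (dim - 2) * d.log() + ((dim - 3) / 2) * (1 - d.pow(2) / 4).clamp(min=1e-8).log()
    weights = torch.exp(-log_q) * neg_mask  # inverse density; positives/self zeroed out
    return torch.multinomial(weights / weights.sum(), 1).item()  # sampled negative index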
OHEM (Online Hard Example Mining) for detection
From many region proposals, select those with highest loss for training. Cuts easy background, focuses on false positives and borderline cases.
Losses and when to mine
- Triplet loss: minimizes max(0, d(a,p) - d(a,n) + margin) over sampled anchor-positive-negative triplets; mining decides which negatives to use.
- Contrastive/NT-Xent: mining determines which negative pairs contribute most to the objective.
- Classification (detection): OHEM chooses proposals with top classification/regression loss.
Worked examples
Example 1: Face recognition with triplet loss
Anchor A: Person X image. Positive P: another Person X image. Candidate negatives N: images of other people.
- Random negative: any other person; often far from A.
- Semi-hard negative: same lighting/pose as A, different person, at a distance just inside the margin.
- Hard negative: near-duplicate lookalike of A from a different identity.
Training impact: Semi-hard stabilizes convergence; hard negatives rapidly improve the boundary but may need a label audit and a careful LR schedule. Concretely, with margin 0.2 and d(A,P) = 0.5, a semi-hard negative satisfies 0.5 < d(A,N) < 0.7, while a hard negative sits at d(A,N) ≤ 0.5.
Example 2: Product image retrieval
Goal: same product should be nearest neighbors; other products should be far.
- Negatives from same category (e.g., two black sneakers) are more informative than different categories (sneaker vs. toaster).
- Mining picks visually similar but different SKUs to expand the margin in confusing subspaces (color, silhouette).
Result: Higher recall@1 and recall@5 after focusing on category-level lookalikes.
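To verify such gains, recall@K can be measured directly on a validation set. A minimal sketch, assuming `embeddings` is an [N, D] tensor and `labels` holds the product ids (names are illustrative):

import torch

def recall_at_k(embeddings, labels, k=5):
    # Fraction of queries whose k nearest neighbors include a same-label item
    dist = torch.cdist(embeddings, embeddings)  # pairwise distances, [N, N]
    dist.fill_diagonal_(float('inf'))           # a query must not match itself
    knn = dist.topk(k, largest=False).indices   # k nearest neighbors per query
    hit = (labels[knn] == labels.unsqueeze(1)).any(dim=1)
    return hit.float().mean().item()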
Example 3: Object detection OHEM
Given thousands of proposals per image, you train on the subset with highest loss (e.g., background windows falsely classified as object). This reduces false positives by forcing the classifier to learn fine-grained distinctions.
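A minimal sketch of that selection for the classification head, assuming `logits` ([num_proposals, num_classes]) and `targets` ([num_proposals]) are already gathered for one image; `num_hard` is an illustrative cap:

import torch
import torch.nn.functional as F

def ohem_classification_loss(logits, targets, num_hard=128):
    # Per-proposal loss with no reduction, shape [num_proposals]
    losses = F.cross_entropy(logits, targets, reduction='none')
    num_hard = min(num_hard, losses.numel())
    hard_losses, _ = losses.topk(num_hard)  # keep only the highest-loss proposals
    return hard_losses.mean()

The per-image cap keeps a few anomalous images from dominating training (see Next steps).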
Step-by-step: add hard negative mining
- Prepare batches: Ensure each batch has multiple instances per class (e.g., 4–8 identities, 4–8 images each) for effective in-batch mining; a minimal sampler sketch follows the pseudocode below.
- Compute embeddings: Forward pass the whole batch.
- Build pair/triplet candidates: For each anchor, find positives and candidate negatives in the batch.
- Select negatives: Choose semi-hard or hard negatives per anchor using distance thresholds or top-k nearest.
- Compute loss: Triplet/contrastive loss using selected pairs/triplets.
- Train with safeguards: Start with semi-hard, add a small fraction of hardest negatives later (curriculum). Monitor collapse signals.
# Pseudocode sketch: in-batch semi-hard triplet mining (PyTorch-style)
for batch in loader:
    E = model(batch.images)            # embeddings, shape [B, D]
    D = torch.cdist(E, E)              # pairwise distances, shape [B, B]
    labels = batch.labels
    triplets = []
    for a in range(len(E)):            # every sample serves as an anchor
        P = [i for i in range(len(E)) if labels[i] == labels[a] and i != a]
        N = [i for i in range(len(E)) if labels[i] != labels[a]]
        for p in P:
            # Semi-hard: farther than the positive but inside the margin,
            # i.e. D[a, p] < D[a, n] < D[a, p] + margin
            cands = [n for n in N if D[a, p] < D[a, n] < D[a, p] + margin]
            if cands:
                # Closest semi-hard negative (or a distance-weighted sample)
                n = min(cands, key=lambda j: D[a, j])
                triplets.append((a, p, n))
    loss = triplet_loss(E, triplets)   # mean of max(0, d(a,p) - d(a,n) + margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
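Step 1 above depends on batch composition. A sketch of a P×K batch sampler, a hypothetical helper rather than a library API, which could be passed to a PyTorch DataLoader via the `batch_sampler` argument:

import random
from collections import defaultdict

class PKSampler:
    # Yields batches of p classes with k samples each, given a list of
    # integer class ids (one per dataset index).
    def __init__(self, labels, p=8, k=4):
        self.p, self.k = p, k
        self.by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)

    def __iter__(self):
        classes = [c for c, idxs in self.by_class.items() if len(idxs) >= self.k]
        random.shuffle(classes)
        for i in range(0, len(classes) - self.p + 1, self.p):
            batch = []
            for c in classes[i:i + self.p]:
                batch.extend(random.sample(self.by_class[c], self.k))
            yield batch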
Common mistakes and self-check
- Picking only the hardest negatives from the start, causing instability or collapse.
- Too few samples per class per batch; miner cannot find meaningful positives.
- Mining across noisy labels; hard negatives may actually be mislabeled positives.
- Ignoring distribution drift; training dominated by mined negatives can overfit the model to narrow visual cues.
Self-checks:
- At least 2–4 samples per class appear in each batch.
- A healthy fraction of pairs/triplets have non-zero loss (not all zero, not all exploding).
- Validation recall@K improves; false positives decrease in detection.
- No sudden embedding collapse (all vectors similar) during training.
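A small sketch covering the second and fourth checks, assuming `dist` is the in-batch pairwise distance matrix and `labels` the class ids, as in the pseudocode above:

import torch

def mining_health(dist, labels, margin=0.2):
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = dist[same & ~eye]  # all positive-pair distances
    neg = dist[~same]        # all negative-pair distances
    # Fraction of positive/negative combinations still producing non-zero loss
    active = (pos.unsqueeze(1) - neg.unsqueeze(0) + margin > 0).float().mean()
    # Mean off-diagonal distance: a slide toward 0 suggests embedding collapse
    return active.item(), dist[~eye].mean().item()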
Practical projects
- Build a small face verification model with batch-hard triplets; report ROC AUC and FAR at a fixed FRR.
- Product retrieval on a subset of a catalog; compare random vs. semi-hard vs. distance-weighted negatives by recall@1/5.
- Train a detector with and without OHEM; compare false positive rate at a fixed precision.
Exercises
Complete these in order.
- Exercise 1: Implement semi-hard negative selection for triplet loss inside a mini-batch. Define how you filter candidates and pick one negative per anchor-positive pair.
- Exercise 2: Outline OHEM for a detector’s classification head: how to collect proposals, compute losses, and choose the top-N hard examples per image.
Checklist:
- I can compute in-batch pairwise distances efficiently.
- I can select negatives using margin-based rules.
- I can describe OHEM selection and integrate it into a training step.
Mini challenge
You train a fashion retrieval model. Early epochs show almost all triplets have zero loss, but validation recall@1 is low. What change would you try first, and why?
Guidance:
Increase per-class samples per batch and switch from random to semi-hard mining (or distance-weighted sampling). You need informative negatives; an all-zero loss means the sampled triplets are too easy to teach the model anything.
Learning path
- Before: Embedding basics, distance metrics, and normalization.
- Now: Hard negative mining (this lesson) applied to metric learning and detection.
- Next: Curriculum strategies, proxy-based losses, and scalable miners (memory banks, ANN search).
Next steps
- Integrate semi-hard mining into your current project; log triplet counts and recall@K.
- Trial a small fraction of hardest negatives after stability is reached.
- For detection, try OHEM with a cap per image to avoid overfitting anomalies.