Why this matters
In real computer vision projects, some classes are rare: cracks vs intact surfaces, pedestrians at night vs daytime, defects vs non-defects, or wildlife species with few sightings. If you ignore imbalance, your model may look accurate overall yet fail on the classes you actually care about.
- Safety use case: catching rare defects or hazards.
- Product quality: detecting uncommon packaging errors.
- Fairness: ensuring minority classes are represented (e.g., medical findings).
Concept explained simply
Class imbalance happens when some labels have many more examples than others. Models trained on such data tend to favor the majority class, missing rare classes. Fixing this starts with measuring the imbalance, then adjusting how you collect, select, and annotate data so training sees a balanced, representative sample.
Mental model
Think of your dataset as a diet for your model. If it mostly eats one food, it will learn to crave and recognize only that. You must plan the diet: add enough of each food (class), decide portions (sampling), and sometimes create healthy substitutes (augmentation) so the model learns all classes well.
How to detect imbalance
- Count per class: number of images, instances (for detection), or pixels (for segmentation).
- Look for long-tail: a few classes with many samples, many classes with few samples.
- Stratify by conditions: day/night, weather, camera type, region, patient demographics. You may have hidden imbalance inside subclasses.
- Baseline metrics: report per-class precision, recall, and F1 (macro averages). For rare positives, also track PR AUC, not only accuracy.
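The first two checks above reduce to a few lines of code. A minimal sketch using only the standard library; the label list and class names below are hypothetical:

```python
from collections import Counter

def imbalance_report(labels):
    """Count samples per class and report the long-tail ratio
    (largest class count divided by smallest)."""
    counts = Counter(labels)
    most = max(counts.values())
    least = min(counts.values())
    return counts, most / least

# Hypothetical label list: 5,000 intact surfaces vs 150 cracks.
labels = ["intact"] * 5000 + ["crack"] * 150
counts, ratio = imbalance_report(labels)
print(counts)
print(ratio)  # ~33:1 long-tail ratio
```

Run this per split (train/val/test) and per condition (day/night, camera type) to surface hidden imbalance inside subclasses.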
Quick checklist: data health for imbalance
- Per-class counts computed for train/val/test separately
- Stratified split preserved across all sets
- At least 50–100 instances per class to start (if feasible)
- Rare but critical classes flagged for priority collection
Data-centric strategies that work
1) Stratified splits
Ensure train/val/test maintain similar class proportions to avoid misleading validation results.
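A stratified split can be done by grouping samples per class and splitting each group with the same fractions. This is a stdlib sketch (libraries like scikit-learn offer the same idea via `train_test_split(..., stratify=labels)`); the toy data is hypothetical:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, fracs=(0.8, 0.1, 0.1), seed=0):
    """Split parallel lists of samples and labels into train/val/test
    while preserving per-class proportions in every split."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    rng = random.Random(seed)
    splits = ([], [], [])
    for y, items in by_class.items():
        rng.shuffle(items)
        n_train = int(len(items) * fracs[0])
        n_val = int(len(items) * fracs[1])
        splits[0].extend((s, y) for s in items[:n_train])
        splits[1].extend((s, y) for s in items[n_train:n_train + n_val])
        splits[2].extend((s, y) for s in items[n_train + n_val:])
    return splits

# Hypothetical toy data: 90 majority vs 10 minority samples.
samples = list(range(100))
labels = ["maj"] * 90 + ["min"] * 10
train, val, test = stratified_split(samples, labels)
```

Each split keeps the original 9:1 ratio, so validation metrics reflect the real class distribution.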
2) Targeted data collection
Collect more samples of minority classes: schedule specific routes/times, query sources by conditions, or simulate scenarios (e.g., night scenes).
3) Annotation quotas
Plan batches so each labeling cycle includes a minimum target per rare class. Stop when per-class targets are met.
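Quota tracking is simple enough to automate. A sketch with hypothetical class names and targets:

```python
def quota_gaps(current_counts, targets):
    """Return remaining annotations needed per class; a class is done
    for this labeling cycle once its gap reaches zero."""
    return {c: max(targets[c] - current_counts.get(c, 0), 0) for c in targets}

# Hypothetical cycle: crack needs 80 more labels, intact is already done.
gaps = quota_gaps({"crack": 120, "intact": 900}, {"crack": 200, "intact": 500})
print(gaps)  # {'crack': 80, 'intact': 0}
```

Running this after each batch tells labelers where to focus next.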
4) Oversampling and undersampling
Oversample minority examples or undersample majority ones when building training batches. Use caps (e.g., max 5× oversample) to reduce overfitting risk.
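Planning capped resampling is arithmetic you can script. A sketch, assuming a single per-class target count (the counts below are hypothetical):

```python
def effective_counts(counts, target, max_oversample=5.0):
    """Compute per-class effective training counts: oversample small
    classes up to the cap, undersample large classes down to `target`."""
    out = {}
    for cls, n in counts.items():
        if n < target:
            out[cls] = min(int(n * max_oversample), target)
        else:
            out[cls] = target
    return out

# Hypothetical counts: 50k background vs 1.5k crack, aiming at 10k each.
print(effective_counts({"background": 50_000, "crack": 1_500}, target=10_000))
# {'background': 10000, 'crack': 7500}
```

Note that the capped class still falls short of the target; that gap is what targeted collection and augmentation should close.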
5) Targeted augmentation
Apply stronger, realistic augmentations to minority classes (lighting, blur, cutout). Keep them faithful to deployment conditions.
6) Hard negative mining
Include challenging non-target samples that commonly produce false positives. This improves precision on rare classes.
7) Label hierarchy and merging
When classes are too sparse, temporarily merge into a parent class (e.g., rare bird species → "bird"). Split later when data grows.
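Merging can be expressed as a relabeling map driven by counts and a parent-class lookup. A sketch with hypothetical class names and thresholds:

```python
def merge_sparse_labels(counts, hierarchy, min_count=100):
    """Map classes with fewer than `min_count` samples to their parent
    class; classes with enough data keep their own label."""
    return {c: (hierarchy.get(c, c) if n < min_count else c)
            for c, n in counts.items()}

# Hypothetical bird subtypes sharing the parent class "bird".
counts = {"sparrow": 40, "eagle": 15, "pigeon": 800}
hierarchy = {"sparrow": "bird", "eagle": "bird", "pigeon": "bird"}
print(merge_sparse_labels(counts, hierarchy))
# {'sparrow': 'bird', 'eagle': 'bird', 'pigeon': 'pigeon'}
```

Keep the original fine-grained labels in your annotations and apply the map at training time, so you can split the classes back out when data grows.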
8) Active learning
Prioritize samples with high uncertainty or predicted minority classes for annotation. This uses labeler time efficiently.
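A common uncertainty criterion is the entropy of the predicted class distribution: pick the samples the model is least sure about. A minimal sketch; the sample ids and probabilities are hypothetical model outputs:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, k):
    """Pick the k most uncertain samples (highest entropy) to label next.
    `predictions` maps sample ids to predicted class probabilities."""
    ranked = sorted(predictions, key=lambda s: entropy(predictions[s]),
                    reverse=True)
    return ranked[:k]

# Hypothetical model outputs for three unlabeled images.
preds = {"img1": [0.98, 0.02], "img2": [0.55, 0.45], "img3": [0.80, 0.20]}
print(select_for_annotation(preds, k=2))  # ['img2', 'img3']
```

Combining this with a filter for predicted minority classes steers labeler time toward the rare classes specifically.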
When to use which strategy
- Short-term fix: oversample minorities + stratified splits.
- Mid-term: targeted collection + quotas + hard negatives.
- Long-term: active learning pipeline + label hierarchy management.
Worked examples
Example 1: Imbalanced binary classification
Counts: Background 50k, Crack 1.5k. If you train naively, accuracy looks high but recall for Crack is poor. Plan:
- Stratify splits to preserve the 33:1 ratio.
- Oversample Crack up to 5× (1.5k → 7.5k effective), undersample Background to 25k for training balance.
- Augment Crack (contrast changes, blur) to mimic low-light inspections.
- Collect 2k new Crack images from flagged assets (targeted collection).
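The arithmetic behind this plan can be checked in a few lines; the numbers mirror the counts above:

```python
# Effective training counts for Example 1's resampling plan.
crack = 1_500
background = 50_000

crack_eff = crack * 5          # capped 5x oversample -> 7,500 effective
background_eff = 25_000        # majority undersampled for training

ratio_before = background / crack         # ~33:1
ratio_after = background_eff / crack_eff  # ~3.3:1
print(ratio_before, ratio_after)
```

The plan does not reach 1:1, and it does not need to; cutting the ratio by an order of magnitude while collecting real Crack images is usually a better trade than extreme oversampling.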
Example 2: Object detection with rare class
Counts (instances): Person 120k, Wheelchair 800. Plan:
- Set per-class sampling so images with Wheelchair are seen more often.
- Hard negative mining: include people pushing strollers to reduce false positives.
- Active learning: prioritize frames where model is uncertain around wheel-like shapes.
- Validation: monitor per-class AP, especially AP@0.5 for Wheelchair and PR AUC.
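The per-class sampling step above can be sketched as image-level weights: frames containing the rare class are drawn more often. The boost factor and frame contents below are hypothetical:

```python
def image_weights(image_classes, rare_classes, boost=10.0):
    """Weight each image for sampling: images containing any rare class
    are drawn `boost` times more often than the rest (before normalizing)."""
    raw = [boost if set(classes) & rare_classes else 1.0
           for classes in image_classes]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical frames: only the second one contains a wheelchair.
weights = image_weights(
    [["person"], ["person", "wheelchair"], ["person"]],
    rare_classes={"wheelchair"},
)
print(weights)  # second frame gets 10/12 of the sampling mass
```

In PyTorch, these weights would feed directly into a `WeightedRandomSampler`; the same idea works in any framework that accepts per-sample probabilities.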
Example 3: Segmentation with sparse pixels
Road pixels dominate; lane markings are thin. Plan:
- Tile large images so tiles containing lane pixels appear more often.
- Augment lane tiles with brightness/blur and small affine transforms.
- Consider merging lane subtypes temporarily if any subtype has too few instances.
- Evaluate macro IoU (equal weight per class) to avoid majority bias.
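Macro IoU averages per-class IoU with equal weight, so a thin minority class counts as much as the dominant one. A sketch with hypothetical pixel counts:

```python
def macro_iou(intersections, unions):
    """Mean IoU over classes, weighting each class equally regardless
    of how many pixels it covers."""
    ious = [intersections[c] / unions[c] for c in unions if unions[c] > 0]
    return sum(ious) / len(ious)

# Hypothetical pixel counts: road dominates, lane markings are thin.
inter = {"road": 9_000_000, "lane": 40_000}
union = {"road": 10_000_000, "lane": 100_000}
print(macro_iou(inter, union))  # (0.9 + 0.4) / 2 = 0.65
```

A pixel-weighted average here would report roughly 0.9 and hide the poor lane performance; the macro average exposes it.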
Process checklists
Before you train
- Compute per-class counts and long-tail ratio
- Create stratified train/val/test splits
- Define augmentation policy for minority classes
- Set oversample/undersample limits (e.g., 5× max oversample)
- Write per-epoch sampling quotas
During data collection and annotation
- Define annotation quotas per minority class
- Route collection to conditions where minorities occur
- Use active learning to prioritize high-uncertainty samples
- Track progress with per-class counts dashboard
Validation and monitoring
- Report per-class precision, recall, F1 (macro average)
- Prefer PR AUC when positives are rare
- Inspect confusion matrix focusing on minority classes
- Check failure cases and mine hard negatives
Exercises
Complete these exercises to practice planning for imbalance. Your answers can be brief but should be specific.
Exercise 1: Plan a resampling scheme (ID: ex1)
You have 10,000 labeled images for multi-class classification: Car=6,000, Pedestrian=2,500, Bicycle=1,200, Wheelchair=300. You will train for 1 epoch with about 8,000 images. Using oversampling and undersampling with a 5× oversampling cap for any class, propose a plan that improves balance, and specify the oversampling multipliers and final effective per-class counts for the epoch.
- Constraint: total effective images ≈ 8,000
- Constraint: no class oversampled by more than 5×
- Tip: you can undersample majority classes as needed
Exercise 2: Annotation and collection playbook (ID: ex2)
For a detection dataset with rare Wheelchair instances (800 out of 120k total instances), draft a 2-week plan to improve minority representation. Include:
- Data sourcing actions (where/when to collect)
- Annotation quotas per day
- Active learning selection rule
- Validation metric(s) to monitor
- Stop/adjust rules
Common mistakes and self-check
- Mistake: Only tracking accuracy. Self-check: Report macro F1 and per-class metrics.
- Mistake: Oversampling minorities without caps. Self-check: Limit oversample multipliers and add augmentations.
- Mistake: Non-stratified splits. Self-check: Ensure val/test reflect real deployment ratios.
- Mistake: Ignoring hard negatives. Self-check: Add challenging look-alikes to reduce false positives.
- Mistake: Over-fragmented labels. Self-check: Merge to a parent class when counts per subclass are too low.
Practical projects
- Rebalance a public long-tail image dataset: compute counts, design sampling/augmentation, and report macro F1 before/after.
- Build an active learning loop: select high-uncertainty images for annotation and track per-class count improvements.
- Create a hard-negative mining set for a chosen minority class and measure precision gains.
Who this is for
Computer Vision Engineers, Data Scientists, and Annotation Leads who curate image/video datasets and want robust performance on rare but critical classes.
Prerequisites
- Basic understanding of classification/detection/segmentation tasks
- Comfort with dataset splits and evaluation metrics
- Familiarity with data augmentation basics
Learning path
- Measure imbalance and set validation metrics
- Design stratified splits
- Plan sampling (over/under) with caps and augmentations
- Prioritize collection/annotation with quotas and active learning
- Monitor per-class metrics; iterate with hard negatives and label hierarchy adjustments
Next steps
- Apply a capped oversampling scheme to your current dataset and re-evaluate macro F1.
- Draft a 2-week annotation plan with quotas for your rarest class.
- Set up a simple active learning selection: uncertainty or minority-predicted samples.
Mini challenge
Pick one minority class in your dataset. In one page, describe:
- Current counts and target counts for the next two weeks
- Your sampling plan (oversample/undersample multipliers)
- Three realistic augmentations you will apply
- Two hard negatives you will mine
- Metrics you will track and your stop rule