Why this matters
In real computer vision projects, some classes are rare: cracks vs intact surfaces, pedestrians at night vs daytime, defects vs non-defects, or wildlife species with few sightings. If you ignore imbalance, your model may look accurate overall yet fail on the classes you actually care about.
- Safety use case: catching rare defects or hazards.
- Product quality: detecting uncommon packaging errors.
- Fairness: ensuring minority classes are represented (e.g., medical findings).
Concept explained simply
Class imbalance happens when some labels have many more examples than others. Models trained on such data tend to favor the majority class, missing rare classes. Fixing this starts with measuring the imbalance, then adjusting how you collect, select, and annotate data so training sees a balanced, representative sample.
Mental model
Think of your dataset as a diet for your model. If it mostly eats one food, it will learn to crave and recognize only that. You must plan the diet: add enough of each food (class), decide portions (sampling), and sometimes create healthy substitutes (augmentation) so the model learns all classes well.
How to detect imbalance
- Count per class: number of images, instances (for detection), or pixels (for segmentation).
- Look for long-tail: a few classes with many samples, many classes with few samples.
- Stratify by conditions: day/night, weather, camera type, region, patient demographics. You may have hidden imbalance inside subclasses.
- Baseline metrics: report per-class precision, recall, and F1 (macro averages). For rare positives, also track PR AUC, not only accuracy.
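The first two checks above reduce to a few lines of code. A minimal sketch using only the standard library; the label list and class names below are hypothetical:

```python
from collections import Counter

def imbalance_report(labels):
    """Count samples per class and report the long-tail ratio
    (largest class count divided by smallest)."""
    counts = Counter(labels)
    most = max(counts.values())
    least = min(counts.values())
    return counts, most / least

# Hypothetical label list: 5,000 intact surfaces vs 150 cracks.
labels = ["intact"] * 5000 + ["crack"] * 150
counts, ratio = imbalance_report(labels)
print(counts)
print(ratio)  # ~33:1 long-tail ratio
```

Run this per split (train/val/test) and per condition (day/night, camera type) to surface hidden imbalance inside subclasses.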
Quick checklist: data health for imbalance
- Per-class counts computed for train/val/test separately
- Stratified split preserved across all sets
- At least 50–100 instances per class to start (if feasible)
- Rare but critical classes flagged for priority collection
Data-centric strategies that work
1) Stratified splits
Ensure train/val/test maintain similar class proportions to avoid misleading validation results.
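A stratified split can be done by grouping samples per class and splitting each group with the same fractions. This is a stdlib sketch (libraries like scikit-learn offer the same idea via `train_test_split(..., stratify=labels)`); the toy data is hypothetical:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, fracs=(0.8, 0.1, 0.1), seed=0):
    """Split parallel lists of samples and labels into train/val/test
    while preserving per-class proportions in every split."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    rng = random.Random(seed)
    splits = ([], [], [])
    for y, items in by_class.items():
        rng.shuffle(items)
        n_train = int(len(items) * fracs[0])
        n_val = int(len(items) * fracs[1])
        splits[0].extend((s, y) for s in items[:n_train])
        splits[1].extend((s, y) for s in items[n_train:n_train + n_val])
        splits[2].extend((s, y) for s in items[n_train + n_val:])
    return splits

# Hypothetical toy data: 90 majority vs 10 minority samples.
samples = list(range(100))
labels = ["maj"] * 90 + ["min"] * 10
train, val, test = stratified_split(samples, labels)
```

Each split keeps the original 9:1 ratio, so validation metrics reflect the real class distribution.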
2) Targeted data collection
Collect more samples of minority classes: schedule specific routes/times, query sources by conditions, or simulate scenarios (e.g., night scenes).
3) Annotation quotas
Plan batches so each labeling cycle includes a minimum target per rare class. Stop when per-class targets are met.
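Quota tracking is simple enough to automate. A sketch with hypothetical class names and targets:

```python
def quota_gaps(current_counts, targets):
    """Return remaining annotations needed per class; a class is done
    for this labeling cycle once its gap reaches zero."""
    return {c: max(targets[c] - current_counts.get(c, 0), 0) for c in targets}

# Hypothetical cycle: crack needs 80 more labels, intact is already done.
gaps = quota_gaps({"crack": 120, "intact": 900}, {"crack": 200, "intact": 500})
print(gaps)  # {'crack': 80, 'intact': 0}
```

Running this after each batch tells labelers where to focus next.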
4) Oversampling and undersampling
Oversample minority examples or undersample majority ones when building training batches. Use caps (e.g., max 5× oversample) to reduce overfitting risk.
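Planning capped resampling is arithmetic you can script. A sketch, assuming a single per-class target count (the counts below are hypothetical):

```python
def effective_counts(counts, target, max_oversample=5.0):
    """Compute per-class effective training counts: oversample small
    classes up to the cap, undersample large classes down to `target`."""
    out = {}
    for cls, n in counts.items():
        if n < target:
            out[cls] = min(int(n * max_oversample), target)
        else:
            out[cls] = target
    return out

# Hypothetical counts: 50k background vs 1.5k crack, aiming at 10k each.
print(effective_counts({"background": 50_000, "crack": 1_500}, target=10_000))
# {'background': 10000, 'crack': 7500}
```

Note that the capped class still falls short of the target; that gap is what targeted collection and augmentation should close.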
5) Targeted augmentation
Apply stronger, realistic augmentations to minority classes (lighting, blur, cutout). Keep them faithful to deployment conditions.
6) Hard negative mining
Include challenging non-target samples that commonly produce false positives. This improves precision on rare classes.
7) Label hierarchy and merging
When classes are too sparse, temporarily merge into a parent class (e.g., rare bird species → "bird"). Split later when data grows.
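Merging can be expressed as a relabeling map driven by counts and a parent-class lookup. A sketch with hypothetical class names and thresholds:

```python
def merge_sparse_labels(counts, hierarchy, min_count=100):
    """Map classes with fewer than `min_count` samples to their parent
    class; classes with enough data keep their own label."""
    return {c: (hierarchy.get(c, c) if n < min_count else c)
            for c, n in counts.items()}

# Hypothetical bird subtypes sharing the parent class "bird".
counts = {"sparrow": 40, "eagle": 15, "pigeon": 800}
hierarchy = {"sparrow": "bird", "eagle": "bird", "pigeon": "bird"}
print(merge_sparse_labels(counts, hierarchy))
# {'sparrow': 'bird', 'eagle': 'bird', 'pigeon': 'pigeon'}
```

Keep the original fine-grained labels in your annotations and apply the map at training time, so you can split the classes back out when data grows.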
8) Active learning
Prioritize samples with high uncertainty or predicted minority classes for annotation. This uses labeler time efficiently.
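A common uncertainty criterion is the entropy of the predicted class distribution: pick the samples the model is least sure about. A minimal sketch; the sample ids and probabilities are hypothetical model outputs:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, k):
    """Pick the k most uncertain samples (highest entropy) to label next.
    `predictions` maps sample ids to predicted class probabilities."""
    ranked = sorted(predictions, key=lambda s: entropy(predictions[s]),
                    reverse=True)
    return ranked[:k]

# Hypothetical model outputs for three unlabeled images.
preds = {"img1": [0.98, 0.02], "img2": [0.55, 0.45], "img3": [0.80, 0.20]}
print(select_for_annotation(preds, k=2))  # ['img2', 'img3']
```

Combining this with a filter for predicted minority classes steers labeler time toward the rare classes specifically.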
When to use which strategy
- Short-term fix: oversample minorities + stratified splits.
- Mid-term: targeted collection + quotas + hard negatives.
- Long-term: active learning pipeline + label hierarchy management.
Worked examples
Example 1: Imbalanced binary classification
Counts: Background 50k, Crack 1.5k. If you train naively, accuracy looks high but recall for Crack is poor. Plan:
- Stratify splits to preserve the 33:1 ratio.
- Oversample Crack up to 5× (1.5k → 7.5k effective), undersample Background to 25k for training balance.
- Augment Crack (contrast changes, blur) to mimic low-light inspections.
- Collect 2k new Crack images from flagged assets (targeted collection).
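The arithmetic behind this plan can be checked in a few lines; the numbers mirror the counts above:

```python
# Effective training counts for Example 1's resampling plan.
crack = 1_500
background = 50_000

crack_eff = crack * 5          # capped 5x oversample -> 7,500 effective
background_eff = 25_000        # majority undersampled for training

ratio_before = background / crack         # ~33:1
ratio_after = background_eff / crack_eff  # ~3.3:1
print(ratio_before, ratio_after)
```

The plan does not reach 1:1, and it does not need to; cutting the ratio by an order of magnitude while collecting real Crack images is usually a better trade than extreme oversampling.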
Example 2: Object detection with rare class
Counts (instances): Person 120k, Wheelchair 800. Plan:
- Set per-class sampling so images with Wheelchair are seen more often.
- Hard negative mining: include people pushing strollers to reduce false positives.
- Active learning: prioritize frames where model is uncertain around wheel-like shapes.
- Validation: monitor per-class AP, especially AP@0.5 for Wheelchair and PR AUC.
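The per-class sampling step above can be sketched as image-level weights: frames containing the rare class are drawn more often. The boost factor and frame contents below are hypothetical:

```python
def image_weights(image_classes, rare_classes, boost=10.0):
    """Weight each image for sampling: images containing any rare class
    are drawn `boost` times more often than the rest (before normalizing)."""
    raw = [boost if set(classes) & rare_classes else 1.0
           for classes in image_classes]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical frames: only the second one contains a wheelchair.
weights = image_weights(
    [["person"], ["person", "wheelchair"], ["person"]],
    rare_classes={"wheelchair"},
)
print(weights)  # second frame gets 10/12 of the sampling mass
```

In PyTorch, these weights would feed directly into a `WeightedRandomSampler`; the same idea works in any framework that accepts per-sample probabilities.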
Example 3: Segmentation with sparse pixels
Road pixels dominate; lane markings are thin. Plan:
- Tile large images so tiles containing lane pixels appear more often.
- Augment lane tiles with brightness/blur and small affine transforms.
- Consider merging lane subtypes temporarily if any subtype has too few instances.
- Evaluate macro IoU (equal weight per class) to avoid majority bias.
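Macro IoU averages per-class IoU with equal weight, so a thin minority class counts as much as the dominant one. A sketch with hypothetical pixel counts:

```python
def macro_iou(intersections, unions):
    """Mean IoU over classes, weighting each class equally regardless
    of how many pixels it covers."""
    ious = [intersections[c] / unions[c] for c in unions if unions[c] > 0]
    return sum(ious) / len(ious)

# Hypothetical pixel counts: road dominates, lane markings are thin.
inter = {"road": 9_000_000, "lane": 40_000}
union = {"road": 10_000_000, "lane": 100_000}
print(macro_iou(inter, union))  # (0.9 + 0.4) / 2 = 0.65
```

A pixel-weighted average here would report roughly 0.9 and hide the poor lane performance; the macro average exposes it.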
Process checklists
Before you train
- Compute per-class counts and long-tail ratio
- Create stratified train/val/test splits
- Define augmentation policy for minority classes
- Set oversample/undersample limits (e.g., 5× max oversample)
- Write per-epoch sampling quotas
During data collection and annotation
- Define annotation quotas per minority class
- Route collection to conditions where minorities occur
- Use active learning to prioritize high-uncertainty samples
- Track progress with per-class counts dashboard
Validation and monitoring
- Report per-class precision, recall, F1 (macro average)
- Prefer PR AUC when positives are rare
- Inspect confusion matrix focusing on minority classes
- Check failure cases and mine hard negatives
Exercises
Complete these exercises to practice planning for imbalance. Your answers can be brief but should be specific.
Exercise 1: Plan a resampling scheme (ID: ex1)
You have 10,000 labeled images for multi-class classification: Car=6,000, Pedestrian=2,500, Bicycle=1,200, Wheelchair=300. You will train for 1 epoch with about 8,000 images. Using oversampling and undersampling with a 5× oversampling cap for any class, propose a plan that improves balance, and specify the oversampling multipliers and final effective per-class counts for the epoch.
- Constraint: total effective images ≈ 8,000
- Constraint: no class oversampled by more than 5×
- Tip: you can undersample majority classes as needed
Exercise 2: Annotation and collection playbook (ID: ex2)
For a detection dataset with rare Wheelchair instances (800 out of 120k total instances), draft a 2-week plan to improve minority representation. Include:
- Data sourcing actions (where/when to collect)
- Annotation quotas per day
- Active learning selection rule
- Validation metric(s) to monitor
- Stop/adjust rules
Common mistakes and self-check
- Mistake: Only tracking accuracy. Self-check: Report macro F1 and per-class metrics.
- Mistake: Oversampling minorities without caps. Self-check: Limit oversample multipliers and add augmentations.
- Mistake: Non-stratified splits. Self-check: Ensure val/test reflect real deployment ratios.
- Mistake: Ignoring hard negatives. Self-check: Add challenging look-alikes to reduce false positives.
- Mistake: Over-fragmented labels. Self-check: Merge to a parent class when counts per subclass are too low.
Practical projects
- Rebalance a public long-tail image dataset: compute counts, design sampling/augmentation, and report macro F1 before/after.
- Build an active learning loop: select high-uncertainty images for annotation and track per-class count improvements.
- Create a hard-negative mining set for a chosen minority class and measure precision gains.
Who this is for
Computer Vision Engineers, Data Scientists, and Annotation Leads who curate image/video datasets and want robust performance on rare but critical classes.
Prerequisites
- Basic understanding of classification/detection/segmentation tasks
- Comfort with dataset splits and evaluation metrics
- Familiarity with data augmentation basics
Learning path
- Measure imbalance and set validation metrics
- Design stratified splits
- Plan sampling (over/under) with caps and augmentations
- Prioritize collection/annotation with quotas and active learning
- Monitor per-class metrics; iterate with hard negatives and label hierarchy adjustments
Next steps
- Apply a capped oversampling scheme to your current dataset and re-evaluate macro F1.
- Draft a 2-week annotation plan with quotas for your rarest class.
- Set up a simple active learning selection: uncertainty or minority-predicted samples.
Mini challenge
Pick one minority class in your dataset. In one page, describe:
- Current counts and target counts for the next two weeks
- Your sampling plan (oversample/undersample multipliers)
- Three realistic augmentations you will apply
- Two hard negatives you will mine
- Metrics you will track and your stop rule