Who this is for
- Computer Vision Engineers building classification, detection, or segmentation models.
- ML/AI practitioners who can train a model but struggle to improve validation performance reliably.
- People who want a practical, step-by-step approach to tuning without guesswork.
Prerequisites
- Know how to train a basic vision model (e.g., CNN/ResNet/U-Net) and evaluate validation metrics.
- Comfortable with train/val splits, loss functions, and basic Python ML workflows.
- Can run multiple training jobs and read logs/learning curves.
Why this matters
Real CV tasks rarely work “out of the box.” Hyperparameters like learning rate, weight decay, batch size, and augmentation strength can swing performance by 5–20% or more. For production models, tuning is the difference between flaky and reliable.
- Product quality: Reduce false positives in defect detection by tuning confidence thresholds and regularization.
- Speed vs accuracy: Choose batch size and learning rate schedules to hit latency targets while maintaining accuracy.
- Robustness: Control overfitting with weight decay and augmentation strength for stable generalization.
Concept explained simply
Hyperparameters are the training “dials” you set before learning starts. Tuning is systematically testing dial settings to find what leads to the best validation metric under a limited compute budget.
Mental model
Imagine tuning a radio with many knobs. You cannot perfectly try every combination, so you:
- Start with the most sensitive knobs (learning rate, weight decay).
- Search broadly at low cost to find a good region.
- Then zoom in for fine-grained improvements.
What to tune in vision models
- Learning rate (LR): most critical for convergence and stability. Try on a log scale.
- Weight decay (L2): controls overfitting; interacts with LR.
- Batch size: affects stability, speed, and generalization; adjust LR when batch size changes.
- Optimizer + momentum/betas: e.g., SGD with momentum vs Adam/AdamW; momentum/betas control how strongly past gradients are smoothed.
- Augmentation strength: flips, color jitter, crop/resize, CutMix/MixUp strength.
- Dropout/Label smoothing: extra regularization levers.
- Learning rate schedule: cosine/step decay/warmup; schedule shape can matter as much as base LR.
- Training duration: epochs and early stopping patience.
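To make these knobs concrete, here is a minimal sketch of a training config that gathers them in one place. The field names and defaults are illustrative, not tied to any particular framework.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Illustrative defaults; good values depend on your model and dataset.
    lr: float = 1e-3               # base learning rate (most sensitive knob)
    weight_decay: float = 1e-4     # L2-style regularization; interacts with lr
    batch_size: int = 32           # affects stability, speed, and generalization
    optimizer: str = "adamw"       # "adamw" or "sgd"
    momentum: float = 0.9          # used when optimizer == "sgd"
    betas: tuple = (0.9, 0.999)    # used when optimizer == "adamw"
    aug_strength: str = "light"    # "light" | "medium" | "heavy"
    dropout: float = 0.0
    label_smoothing: float = 0.0
    schedule: str = "cosine"       # "cosine" | "step"
    warmup_epochs: int = 0
    epochs: int = 50
    early_stop_patience: int = 5
```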
Starter ranges that work surprisingly often
- Learning rate (Adam/AdamW): 1e-4, 3e-4, 1e-3, 3e-3
- Learning rate (SGD): 1e-3, 3e-3, 1e-2, 3e-2
- Weight decay (AdamW/SGD+WD): 1e-6, 1e-5, 1e-4, 5e-4, 1e-3
- Batch size: 16, 32, 64 (scale LR with batch size)
- Momentum (SGD): 0.8–0.95; Betas (Adam/AdamW): (0.9, 0.999) or (0.9, 0.98)
- Augmentation strength: light → medium first; only go heavy if overfitting
- Label smoothing: 0.0–0.1
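As a sketch, these starter ranges can be written down as a small search space of discrete choices. The dictionary below simply restates the lists above and assumes an AdamW-style optimizer.

```python
# Starter search space (AdamW-style); values restate the ranges above.
SEARCH_SPACE = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],                  # use [1e-3, 3e-3, 1e-2, 3e-2] for SGD
    "weight_decay": [1e-6, 1e-5, 1e-4, 5e-4, 1e-3],
    "batch_size": [16, 32, 64],                      # rescale lr when batch size changes
    "betas": [(0.9, 0.999), (0.9, 0.98)],
    "aug_strength": ["light", "medium"],             # add "heavy" only if overfitting
    "label_smoothing": [0.0, 0.05, 0.1],
}
```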
A simple tuning workflow (budget-aware)
- Define target metric and budget. Example: maximize mIoU within 20 runs or 6 GPU-hours.
- Get a baseline. Train once with sensible defaults. Save validation metric and time.
- Coarse search on sensitive knobs. Random-search LR and weight decay (log-uniform). Use short training (e.g., half epochs) and early stopping to prune bad configs.
- Fix best region; refine. Narrow LR/WD around top results; add schedule choice and augmentation strength.
- Stabilize. Confirm with 2–3 seeds if variance looks high. Then train full duration.
- Lock + document. Save config, seed, curves, and the decision rationale.
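A minimal sketch of the coarse-search step, assuming you already have a train_and_eval(config, epochs) function that returns the validation metric; the trial budget and ranges below are illustrative.

```python
import math
import random

def log_uniform(lo, hi):
    """Sample a value uniformly in log space between lo and hi."""
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

def coarse_search(train_and_eval, n_trials=12, short_epochs=25):
    """Random-search lr and weight decay on a log scale using short runs."""
    results = []
    for _ in range(n_trials):
        config = {
            "lr": log_uniform(1e-4, 3e-3),
            "weight_decay": log_uniform(1e-6, 1e-3),
        }
        metric = train_and_eval(config, epochs=short_epochs)
        results.append((metric, config))
    # Sort best-first and keep the top few configs for the refinement step.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:3]
```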
Early stopping tip
If a config underperforms the current best by a clear margin for several validation checks (e.g., 5), stop it and reuse the time for new trials.
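A sketch of that rule, assuming you track each trial's validation history and the best metric seen so far; the margin and the patience of 5 checks are illustrative.

```python
def should_prune(history, best_so_far, margin=0.02, patience=5):
    """Stop a trial if its last `patience` validation checks all trail the
    current best metric by more than `margin`."""
    if len(history) < patience:
        return False
    return all(m < best_so_far - margin for m in history[-patience:])
```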
Random vs Grid vs Bayesian
- Random search: great first pass in high dimensions; simple and effective.
- Grid search: fine for 1–2 parameters or tiny spaces.
- Bayesian/successive halving: efficient when each run is expensive and you can rank partial results.
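For intuition, the difference between grid and random search is just how candidate configs are generated; the sketch below contrasts the two for a tiny lr/weight-decay space.

```python
import itertools
import random

lrs = [1e-4, 3e-4, 1e-3, 3e-3]
wds = [1e-5, 1e-4, 5e-4]

# Grid search: every combination (12 runs here; explodes as you add parameters).
grid_trials = list(itertools.product(lrs, wds))

# Random search: a fixed budget of samples, no matter how many knobs you add.
random_trials = [(random.choice(lrs), random.choice(wds)) for _ in range(8)]
```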
Worked examples
1) Image classification (ResNet, small dataset)
Goal: improve top-1 accuracy without overfitting.
- Baseline: AdamW, LR=1e-3, WD=1e-4, batch=32, cosine schedule, light aug. Val acc: 82%.
- Coarse random search: LR in [1e-4, 3e-4, 1e-3, 3e-3], WD in [1e-6, 1e-5, 1e-4, 5e-4, 1e-3], 12 trials, half epochs.
- Best half-epoch config: LR=3e-3, WD=5e-4 → promising.
- Refine around LR ∈ {2e-3, 3e-3}, WD ∈ {3e-4, 5e-4, 7e-4} with full epochs.
- Result: LR=2e-3, WD=5e-4 → acc=86.4% (+4.4 points).
Why it worked
Slightly higher LR moved faster to a better basin; moderate WD controlled overfitting on the small dataset.
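The refinement stage of this example can be written as a tiny grid; train_and_eval is again a placeholder for your full-length training run, and the epoch count is illustrative.

```python
import itertools

def refine(train_and_eval, full_epochs=60):
    """Full-length runs on a small grid around the best coarse config."""
    best_acc, best_cfg = 0.0, None
    for lr, wd in itertools.product([2e-3, 3e-3], [3e-4, 5e-4, 7e-4]):
        cfg = {"lr": lr, "weight_decay": wd}
        acc = train_and_eval(cfg, epochs=full_epochs)
        if acc > best_acc:
            best_acc, best_cfg = acc, cfg
    return best_acc, best_cfg
```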
2) Object detection (medium dataset)
Goal: raise mAP without hurting training stability.
- Baseline: SGD+momentum 0.9, LR=0.01, WD=1e-4, batch=16, step decay. mAP@0.5: 59.
- Coarse search: LR ∈ {0.003, 0.01, 0.03}, momentum ∈ {0.85, 0.9, 0.95}, WD ∈ {1e-5, 1e-4, 5e-4}.
- Top candidates (short runs) prefer higher momentum with slightly lower LR.
- Refine + schedule test: try cosine vs step; keep LR≈0.007–0.012, momentum 0.92–0.95.
- Result: LR=0.008, momentum=0.94, WD=1e-4, cosine schedule → mAP=62.2 (+3.2).
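The winning configuration maps onto a standard PyTorch optimizer-plus-scheduler setup, as in the sketch below; the model and epoch count are stand-ins for your own detector and schedule length.

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # stand-in for your detector

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.008, momentum=0.94, weight_decay=1e-4
)

# Cosine decay over the full run (epoch count is illustrative).
epochs = 50
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... training loop over batches goes here ...
    scheduler.step()
```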
3) Semantic segmentation (U-Net)
Goal: improve mIoU and boundary quality.
- Baseline: AdamW, LR=1e-3, WD=1e-4, batch=8, light aug. mIoU=71.5.
- Symptoms: overfitting after epoch 20 (train loss ↓, val mIoU stagnates).
- Actions: increase WD, add mild color+cutout aug, try label smoothing 0.05.
- Refine LR near 7e-4–1.5e-3; try batch=8 vs 12 (scale LR accordingly).
- Result: LR=8e-4, WD=3e-4, batch=12, label smoothing=0.05 → mIoU=75.8 (+4.3).
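The regularization levers used here (weight decay and label smoothing) map directly onto PyTorch arguments; a minimal sketch, with a stand-in model in place of the U-Net.

```python
import torch

model = torch.nn.Conv2d(3, 21, 3)  # stand-in for a U-Net

# AdamW with the refined learning rate and tuned weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-4, weight_decay=3e-4)

# Per-pixel cross-entropy with label smoothing (available in recent PyTorch).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.05)
```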
Common mistakes and self-check
- Searching linearly instead of on a log scale for LR/WD. Self-check: Did you try 1e-4, 3e-4, 1e-3, 3e-3?
- Changing many knobs at once. Self-check: Limit early searches to 1–2 key parameters.
- No fixed seed for comparisons. Self-check: Use a seed; re-run top configs with 2–3 seeds only at the end.
- Ignoring time budget. Self-check: Track runtime per trial; stop low performers early.
- Heavy augmentation on tiny datasets too soon. Self-check: Start light; add strength if overfitting appears.
- Not adjusting LR when batch size changes. Self-check: Scale LR roughly linearly with batch size.
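The last point is commonly handled with the linear scaling rule; a tiny sketch, assuming the base LR was tuned at a reference batch size.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: grow/shrink lr proportionally to batch size."""
    return base_lr * new_batch_size / base_batch_size

# Example: lr tuned at batch 32, now training with batch 64.
print(scale_lr(1e-3, 32, 64))  # 0.002
```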
Exercises
Do these in order and record your results.
- Exercise 1: Tune learning rate and batch size for a small image classifier. Target: +2% validation accuracy within 12 trials.
- Exercise 2: Control overfitting by tuning weight decay and augmentation strength for a segmentation or classification task.
Completion checklist
- Defined target metric and trial budget.
- Recorded baseline metric, curves, and runtime.
- Ran a coarse random search and pruned bad configs early.
- Refined around top results and validated with at least 2 seeds if variance was high.
- Saved final config and rationale.
Mini challenge
You inherited a detection model with unstable training: loss oscillates and mAP is flat. Propose a 10-trial plan.
Suggested approach (peek)
- Trials 1–4: random LR around baseline on log scale; keep WD fixed.
- Trials 5–6: momentum sweep (0.9–0.95) with the best LR so far.
- Trials 7–8: try cosine schedule + warmup steps.
- Trials 9–10: small WD tweaks if overfitting; else confirm best with another seed.
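One way to commit to that plan before spending compute is to write it out as an explicit trial list; the values below are illustrative and assume the inherited baseline LR is 0.01.

```python
import math
import random

BASELINE_LR = 0.01  # assumption: the inherited model's current learning rate

def log_uniform(lo, hi):
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

trial_plan = (
    # Trials 1-4: LR around the baseline on a log scale, WD fixed.
    [{"lr": log_uniform(BASELINE_LR / 3, BASELINE_LR * 3)} for _ in range(4)]
    # Trials 5-6: momentum sweep at the best LR so far (fill in after trials 1-4).
    + [{"momentum": m} for m in (0.9, 0.95)]
    # Trials 7-8: cosine schedule without and with warmup.
    + [{"schedule": "cosine", "warmup_epochs": w} for w in (0, 3)]
    # Trials 9-10: small WD tweaks, or repeat the best config with a new seed.
    + [{"weight_decay": wd} for wd in (5e-5, 5e-4)]
)
```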
Practical projects
- Build a robust small-dataset classifier: start with a tiny dataset, implement baseline, then apply the workflow to hit a defined accuracy target.
- Segment simple shapes or defects: synthesize masks, train U-Net baseline, then tune WD, LR schedule, and augmentation for better boundary mIoU.
- Accuracy–throughput trade-off: choose a task, fix an inference time budget, and tune batch size + LR schedule to maximize validation metric without violating throughput.
Learning path
- Now: master basic tuning (LR, WD, batch size, augmentation).
- Next: learning rate schedules and warmup, advanced regularization (label smoothing, mixup/cutmix).
- Then: efficient search methods (successive halving, Bayesian ideas) and multi-objective tuning (accuracy vs latency).
Next steps
- Pick one of the practical projects and run a 10–20 trial campaign.
- Document each decision. Keep a leaderboard of configs.
- Take the quick test below to check understanding.
Quick Test
Short, practical questions to confirm the core ideas before moving on.