Why this matters
Transfer learning lets you take powerful pre-trained vision models (ResNet, EfficientNet, ViT, Swin) and adapt them to your dataset quickly. In day-to-day computer vision engineering work, this means:
- Building accurate classifiers or detectors with limited labeled data.
- Reducing training cost and time while improving baseline metrics.
- De-risking projects by starting from proven features instead of training from scratch.
Who this is for
- Engineers and students who can train basic CNNs and want better results fast.
- Practitioners moving from classic ML to deep vision projects.
- Teams with small datasets or limited compute.
Prerequisites
- Basics of CNNs/ViTs (convolutions or attention, pooling/patching, activations).
- Understanding of training loops, loss functions, and metrics.
- Ability to load datasets and perform augmentations.
Concept explained simply
Pre-trained models learned general visual features from huge datasets (like ImageNet). Transfer learning means keeping these useful features and training a small part of the model (often the final layers) on your task.
Mental model
- Bottom layers: edges, textures. Very general; change slowly.
- Middle layers: parts, shapes. Somewhat general.
- Top layers: task-specific patterns. Adapt these first.
Common strategies
- Feature extractor (linear probing): freeze backbone; train only a new head.
- Partial fine-tuning: unfreeze the top blocks; use a lower learning rate for the unfrozen backbone layers than for the new head.
- Full fine-tuning: unfreeze all layers; use careful scheduling and regularization.
- Parameter-efficient fine-tuning (PEFT): add small adapter layers or LoRA to keep most weights frozen.
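To make the PEFT idea concrete, here is a minimal LoRA-style wrapper for a linear layer. This is an illustrative sketch, not any specific library's API; the rank r, scaling alpha, and init values are assumptions you would tune.
# Minimal LoRA-style wrapper (illustrative sketch; r and alpha are assumptions).
# The wrapped weight stays frozen; only the low-rank A/B matrices train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pre-trained weights frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base output plus a trainable low-rank update
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
In practice you would wrap selected layers (e.g., attention projections in a ViT) and train only the lora_a/lora_b parameters plus the task head.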
How to choose a strategy
With small data, freeze more layers at first to limit overfitting. With a large domain shift from ImageNet, still start from a frozen linear probe as a baseline, but expect to unfreeze more of the backbone later, since high-level ImageNet features transfer less well to distant domains.
Worked examples
Example 1: Small dataset classification (freeze backbone)
Dataset: 1,000 images, 5 classes, similar to natural images. Use a pre-trained ResNet, freeze all conv layers, train a new classifier head.
Minimal PyTorch sketch
# Minimal sketch: freeze the backbone, train only a new head
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False          # freeze every backbone weight
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)  # new head (trainable by default)
# Train only the head; a relatively high LR is fine since it starts from scratch
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3, weight_decay=1e-4)
- Pros: fast, low overfitting risk.
- Cons: may plateau below best possible accuracy.
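Continuing the sketch above, a minimal head-only training loop might look like this (train_loader and the epoch count are assumed for illustration; keeping the backbone in eval mode preserves the frozen BatchNorm running statistics, as discussed in the tips below):
# Head-only training epoch; model and optimizer come from the sketch above
criterion = nn.CrossEntropyLoss()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
for epoch in range(10):                    # illustrative epoch count
    model.eval()                           # keeps frozen BN running stats intact
    model.fc.train()                       # only the new head is in train mode
    for images, labels in train_loader:    # train_loader is assumed to exist
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()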
Example 2: Moderate data + some domain shift (unfreeze top blocks)
Dataset: 15,000 images whose textures differ from ImageNet (e.g., medical-like but still RGB). Unfreeze the last ResNet block; use a lower LR for the backbone and a higher one for the head.
Discriminative LR idea
# Unfreeze the last ResNet block plus the head; keep everything else frozen
for name, p in model.named_parameters():
    if name.startswith("layer4") or name.startswith("fc"):
        p.requires_grad = True
    else:
        p.requires_grad = False
# Two parameter groups: slow LR for the backbone block, fast LR for the head
backbone_params = [p for n, p in model.named_parameters()
                   if p.requires_grad and n.startswith("layer4")]
head_params = model.fc.parameters()
optimizer = torch.optim.SGD([
    {"params": backbone_params, "lr": 1e-4},
    {"params": head_params, "lr": 5e-3},
], momentum=0.9, weight_decay=1e-4)
- Pros: better adaptation to new textures.
- Watch out: re-create the optimizer (and any scheduler) whenever you change which layers are trainable mid-training; stale optimizer state can behave unpredictably.
Example 3: Object detection transfer (Faster R-CNN)
Start from a model pre-trained on COCO. Replace the classifier head for your classes, keep backbone mostly frozen, then optionally unfreeze top layers.
High-level steps
- Load pre-trained detector (e.g., ResNet50-FPN backbone).
- Replace box predictor to match your class count.
- Freeze backbone at first; train head.
- Unfreeze top backbone layers if mAP plateaus.
- Use detection-appropriate augmentations (random crop, scale jittering) that keep boxes consistent with the transformed image.
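A sketch of the head swap using torchvision's detection API (the class count is illustrative; remember that num_classes includes the background class):
# Adapt a COCO-pre-trained Faster R-CNN to a custom class count
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.COCO_V1
)
num_classes = 4  # 3 object classes + background (illustrative)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# Freeze the backbone first; unfreeze top layers later if mAP plateaus
for p in model.backbone.parameters():
    p.requires_grad = False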
Practical setup tips
- Normalize inputs using the pre-trained model's mean/std and expected resolution.
- Use early stopping on validation loss or F1/mAP and keep the best checkpoint.
- For small data: strong augmentations (color jitter, flips, mixup/cutmix) and regularization (weight decay, label smoothing).
- BatchNorm: if backbone is frozen, keep BN layers in eval mode to avoid changing running stats.
- Schedulers: cosine decay with warmup is a solid default.
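A sketch combining several of these tips, assuming the optimizer from the earlier examples; the mean/std values are the standard ImageNet statistics expected by torchvision's pre-trained weights, and the warmup/epoch counts are illustrative:
# Input normalization plus cosine decay with linear warmup
from torchvision import transforms
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),       # match the model's expected input size
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=5)  # 5 warmup epochs
cosine = CosineAnnealingLR(optimizer, T_max=45)                # remaining epochs
scheduler = SequentialLR(optimizer, [warmup, cosine], milestones=[5])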
Exercises
Do these before the quick test. Aim for concise, actionable answers.
Exercise 1: Design a fine-tuning plan (paper exercise)
Scenario: 2,000 product photos, 8 classes, some domain shift (studio lighting), and a single 8 GB GPU. Propose a plan containing:
- Model choice + why
- Layers to freeze/unfreeze
- Learning rates per group
- Augmentations
- Epochs, batch size, early stopping metric
Write your plan now.
Exercise 2: Linear probe then gradual unfreeze (code sketch)
Fill the steps:
- Load a pre-trained backbone.
- Freeze all layers; attach a new classification head.
- Train head for N epochs.
- Unfreeze top block; set discriminative LRs; reset optimizer.
- Train with cosine decay + warmup; early stop on macro F1.
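If you want a scaffold to complete, one possible skeleton follows (all names and values are placeholders; the TODOs are yours):
# Possible scaffold for Exercise 2; fill in the TODOs
import torch
import torch.nn as nn
from torchvision import models

num_classes = 8  # TODO: set to your class count
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)
# Phase 1: linear probe, train only the head
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
# TODO: training loop for N epochs
# Phase 2: unfreeze layer4, then reset the optimizer with discriminative LRs
for n, p in model.named_parameters():
    if n.startswith("layer4"):
        p.requires_grad = True
optimizer = torch.optim.AdamW([
    {"params": [p for n, p in model.named_parameters()
                if n.startswith("layer4")], "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
# TODO: cosine decay + warmup scheduler; early stop on macro F1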
Checklist
- You normalized inputs using the model's mean/std.
- You used the correct input size for the chosen model.
- You reset the optimizer after unfreezing layers.
- You tracked a class-appropriate metric (e.g., macro F1 for imbalance).
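A quick sanity check for the freezing-related items, assuming the model from the examples above:
# Confirm exactly the layers you expect are trainable before training starts
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable tensors; first few: {trainable[:3]}")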
Compare your answers with the solutions in the exercise panel below.
Common mistakes and self-check
- Training all layers on tiny data leads to overfitting. Self-check: does validation loss diverge while training loss drops?
- Too high an LR for the backbone causes catastrophic forgetting. Self-check: did metrics collapse suddenly after unfreezing?
- Forgetting to keep BN in eval when backbone is frozen. Self-check: validation metrics fluctuate wildly batch-to-batch.
- Mismatched input size/normalization. Self-check: unusually low accuracy from the first epoch; verify transforms.
- Not resetting the optimizer after changing which layers are trainable. Self-check: is your optimizer still holding stale state (momentum, Adam moments) from before the change?
- Ignoring class imbalance. Self-check: accuracy looks fine but minority class F1 is poor; use weighted loss or focal loss.
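For the class-imbalance point, a weighted loss is a small change; the counts below are illustrative and should come from your training labels:
# Class-weighted cross-entropy plus label smoothing
import torch
import torch.nn as nn

class_counts = torch.tensor([900.0, 50.0, 50.0])  # illustrative; compute from your data
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)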
Practical projects
Project 1: Flowers classification (small data)
- Start with linear probe on ResNet/EfficientNet.
- Add strong augmentations; report accuracy and macro F1.
- Gradually unfreeze top block; compare gains.
Project 2: Industrial defect visual check
- Binary classification with high precision target.
- Use class-weighted loss, heavy augmentations, and early stopping on AUROC.
- Try ViT vs CNN; document trade-offs in compute vs accuracy.
Project 3: Detector adaptation
- Fine-tune a pre-trained Faster R-CNN on a small custom dataset.
- Freeze backbone at first; unfreeze top layers if mAP plateaus.
- Apply scale jitter and random crops; track mAP@0.5.
Learning path
- Revise CNN/ViT basics (1 day): layers, shapes, activations.
- Set up linear probing baseline (0.5–1 day): frozen backbone + new head.
- Partial fine-tuning (1–2 days): unfreeze top block, discriminative LR, scheduler.
- Advanced techniques (1–2 days): label smoothing, mixup/cutmix, focal loss for imbalance.
- Evaluation & reporting (0.5 day): choose metrics, run ablations, write a short model card.
Next steps
- Explore parameter-efficient fine-tuning (adapters/LoRA) to reduce trainable params.
- Try self-supervised pretraining checkpoints (e.g., MoCo/MAE-style) when labels are scarce.
- Experiment with knowledge distillation to compress your fine-tuned model.
- Optimize for deployment: pruning and quantization after you lock metrics.
Mini challenge
In 2 hours, create a linear-probe baseline and then unfreeze one top block. Report: metric, confusion matrix, and 3 bullet insights about errors. Keep compute minimal.