
Transfer Learning For Vision

Learn Transfer Learning For Vision for free with explanations, exercises, and a quick test (for Computer Vision Engineers).

Published: January 5, 2026 | Updated: January 5, 2026

Why this matters

Transfer learning lets you take powerful pre-trained vision models (ResNet, EfficientNet, ViT, Swin) and adapt them to your dataset quickly. In real-world computer vision engineering work, this means:

  • Building accurate classifiers or detectors with limited labeled data.
  • Reducing training cost and time while improving baseline metrics.
  • De-risking projects by starting from proven features instead of training from scratch.

Who this is for

  • Engineers and students who can train basic CNNs and want better results fast.
  • Practitioners moving from classic ML to deep vision projects.
  • Teams with small datasets or limited compute.

Prerequisites

  • Basics of CNNs/ViTs (convolutions or attention, pooling/patching, activations).
  • Understanding of training loops, loss functions, and metrics.
  • Ability to load datasets and perform augmentations.

Concept explained simply

Pre-trained models learned general visual features from huge datasets (like ImageNet). Transfer learning means keeping these useful features and training a small part of the model (often the final layers) on your task.

Mental model

  • Bottom layers: edges, textures. Very general; change slowly.
  • Middle layers: parts, shapes. Somewhat general.
  • Top layers: task-specific patterns. Adapt these first.

Common strategies

  • Feature extractor (linear probing): freeze backbone; train only a new head.
  • Partial fine-tuning: unfreeze the top blocks; use a lower learning rate for the backbone than for the head.
  • Full fine-tuning: unfreeze all layers; use careful scheduling and regularization.
  • Parameter-efficient fine-tuning (PEFT): add small adapter layers or LoRA so most weights stay frozen (see the sketch after this list).
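
PEFT keeps the backbone frozen and learns only a tiny low-rank update. Below is a minimal sketch of the LoRA idea applied to a single linear layer; LoRALinear, rank, and alpha are illustrative names, not a specific library's API.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        # A is small random, B starts at zero, so training begins at the pre-trained function
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)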

How to choose a strategy

Step 1: Estimate your data size and domain shift. With small data or a large shift from ImageNet, prefer freezing more layers at first.
Step 2: Start with a linear probe to set a baseline.
Step 3: Gradually unfreeze top blocks if validation metrics plateau.
Step 4: Use discriminative learning rates (lower for early layers, higher for the head).
Step 5: Monitor metrics (F1, AUROC, mAP) and early stop on validation.
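
Steps 3 and 5 come together in a simple early-stopping loop. A sketch, assuming your model and data loaders plus hypothetical train_one_epoch and evaluate_macro_f1 helpers:
import torch

max_epochs, patience = 50, 5
best_f1, bad_epochs = 0.0, 0
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)        # assumed helper: one pass over training data
    f1 = evaluate_macro_f1(model, val_loader)   # assumed helper: macro F1 on validation set
    if f1 > best_f1:
        best_f1, bad_epochs = f1, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # validation metric plateaued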

Worked examples

Example 1: Small dataset classification (freeze backbone)

Dataset: 1,000 images, 5 classes, similar to natural images. Use a pre-trained ResNet, freeze all conv layers, train a new classifier head.

Minimal PyTorch sketch
# Linear probing: freeze the backbone, train only a new classification head
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze every pre-trained parameter
for p in model.parameters():
    p.requires_grad = False

# Replace the final fully connected layer; the new parameters train by default
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)

# Optimize only the head, with a relatively high learning rate
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3, weight_decay=1e-4)
  • Pros: fast, low overfitting risk.
  • Cons: may plateau below best possible accuracy.
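
To complete the example, a minimal head-training loop under the same setup (train_loader is assumed to exist; keeping the frozen backbone in eval mode leaves BatchNorm running stats fixed):
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    model.eval()      # frozen backbone: BatchNorm stays in eval mode
    model.fc.train()  # only the new head is in training mode
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()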

Example 2: Moderate data + some domain shift (unfreeze top blocks)

Dataset: 15,000 images whose textures differ from ImageNet (e.g., medical-like but still RGB). Unfreeze the last ResNet block; use a lower LR for the backbone and a higher one for the head.

Discriminative LR idea
# Unfreeze only the last ResNet block and the classifier head
for name, p in model.named_parameters():
    if name.startswith("layer4") or name.startswith("fc"):
        p.requires_grad = True
    else:
        p.requires_grad = False

# Two parameter groups: a cautious LR for the backbone, a larger LR for the head
backbone_params = [p for n, p in model.named_parameters()
                   if p.requires_grad and n.startswith("layer4")]
head_params = model.fc.parameters()

optimizer = torch.optim.SGD([
    {"params": backbone_params, "lr": 1e-4},
    {"params": head_params, "lr": 5e-3},
], momentum=0.9, weight_decay=1e-4)
  • Pros: better adaptation to new textures.
  • Watch out: re-create the optimizer whenever you change which layers are trainable mid-training.

Example 3: Object detection transfer (Faster R-CNN)

Start from a model pre-trained on COCO. Replace the classifier head for your classes, keep backbone mostly frozen, then optionally unfreeze top layers.

High-level steps
  • Load pre-trained detector (e.g., ResNet50-FPN backbone).
  • Replace the box predictor to match your class count (see the sketch after this list).
  • Freeze backbone at first; train head.
  • Unfreeze top backbone layers if mAP plateaus.
  • Use class-aware augmentations (random crop, scale jittering).
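
A minimal sketch of the head swap and initial freeze, assuming torchvision >= 0.13 (num_classes here includes the background class and is illustrative):
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 6  # e.g., 5 object classes + background (illustrative)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.COCO_V1
)

# Swap the box predictor for one sized to our classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Freeze the backbone at first; only the detection heads train
for p in model.backbone.parameters():
    p.requires_grad = False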

Practical setup tips

  • Normalize inputs using the pre-trained model's mean/std and expected resolution.
  • Use early stopping on validation loss or F1/mAP and keep the best checkpoint.
  • For small data: strong augmentations (color jitter, flips, mixup/cutmix) and regularization (weight decay, label smoothing).
  • BatchNorm: if backbone is frozen, keep BN layers in eval mode to avoid changing running stats.
  • Schedulers: cosine decay with warmup is a solid default (see the sketch after this list).
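
Two of these tips in code: using the checkpoint's bundled preprocessing preset, and warmup followed by cosine decay. A sketch assuming torchvision >= 0.13 and the optimizer from the earlier examples:
from torchvision import models
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Use the transforms bundled with the weights instead of hand-typed mean/std
weights = models.ResNet50_Weights.IMAGENET1K_V2
preprocess = weights.transforms()  # resize, crop, and normalization matching the checkpoint

# 5 warmup epochs, then cosine decay over the remaining 45
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = CosineAnnealingLR(optimizer, T_max=45)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])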

Exercises

Do these before the quick test. Aim for concise, actionable answers.

Exercise 1: Design a fine-tuning plan (paper exercise)

Scenario: 2,000 product photos, 8 classes, some domain shift (studio lighting), limited GPU (single 8GB). Propose a plan containing:

  • Model choice + why
  • Layers to freeze/unfreeze
  • Learning rates per group
  • Augmentations
  • Epochs, batch size, early stopping metric

Write your plan now.

Exercise 2: Linear probe then gradual unfreeze (code sketch)

Fill in the steps:

  1. Load a pre-trained backbone.
  2. Freeze all layers; attach a new classification head.
  3. Train head for N epochs.
  4. Unfreeze top block; set discriminative LRs; reset optimizer.
  5. Train with cosine decay + warmup; early stop on macro F1.

Checklist

  • You normalized inputs using the model's mean/std.
  • You used the correct input size for the chosen model.
  • You reset the optimizer after unfreezing layers.
  • You tracked a class-appropriate metric (e.g., macro F1 for imbalance).

Compare your answers with the solutions in the exercise panel below.

Common mistakes and self-check

  • Training all layers on tiny data leads to overfitting. Self-check: does validation loss diverge while training loss drops?
  • Too high LR for backbone causes catastrophic forgetting. Self-check: sudden metric collapse after unfreezing?
  • Forgetting to keep BN in eval when backbone is frozen. Self-check: validation metrics fluctuate wildly batch-to-batch.
  • Mismatched input size/normalization. Self-check: unusually low accuracy from the first epoch; verify transforms.
  • Not resetting optimizer after changing trainable layers. Self-check: optimizer still holds momentum for frozen params.
  • Ignoring class imbalance. Self-check: accuracy looks fine but minority class F1 is poor; use weighted loss or focal loss.
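
For the imbalance point, a minimal class-weighted loss sketch (the per-class counts are illustrative; weights scale inversely with class frequency):
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 50.0, 50.0])        # illustrative per-class sample counts
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.1)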

Practical projects

Project 1: Flowers classification (small data)
  • Start with linear probe on ResNet/EfficientNet.
  • Add strong augmentations; report accuracy and macro F1.
  • Gradually unfreeze top block; compare gains.
Project 2: Industrial defect visual check
  • Binary classification with high precision target.
  • Use class-weighted loss, heavy augmentations, and early stopping on AUROC.
  • Try ViT vs CNN; document trade-offs in compute vs accuracy.
Project 3: Detector adaptation
  • Fine-tune a pre-trained Faster R-CNN on a small custom dataset.
  • Freeze backbone at first; unfreeze top layers if mAP plateaus.
  • Apply scale jitter and random crops; track mAP@0.5.

Learning path

  1. Revise CNN/ViT basics (1 day): layers, shapes, activations.
  2. Set up linear probing baseline (0.5–1 day): frozen backbone + new head.
  3. Partial fine-tuning (1–2 days): unfreeze top block, discriminative LR, scheduler.
  4. Advanced techniques (1–2 days): label smoothing, mixup/cutmix, focal loss for imbalance.
  5. Evaluation & reporting (0.5 day): choose metrics, run ablations, write a short model card.

Next steps

  • Explore parameter-efficient fine-tuning (adapters/LoRA) to reduce trainable params.
  • Try self-supervised pretraining checkpoints (e.g., MoCo/MAE-style) when labels are scarce.
  • Experiment with knowledge distillation to compress your fine-tuned model.
  • Optimize for deployment: pruning and quantization after you lock metrics.

Mini challenge

In 2 hours, create a linear-probe baseline and then unfreeze one top block. Report: metric, confusion matrix, and 3 bullet insights about errors. Keep compute minimal.


Practice Exercises


Instructions

Scenario: 2,000 product photos, 8 classes, studio lighting (moderate domain shift), single 8GB GPU. Propose a plan with:

  • Model choice + brief justification
  • Which layers to freeze/unfreeze
  • Learning rates per group
  • Augmentations
  • Epochs, batch size, early stopping metric

Expected Output
A short, structured plan (5–10 bullets) that is feasible on a single 8GB GPU and addresses domain shift and overfitting risk.

Transfer Learning For Vision — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

