Who this is for
Applied Scientists and ML Engineers who build search, recommendation, ranking, classification, or anomaly detection systems and need robust embeddings/features that transfer across tasks.
Prerequisites
- Vectors and similarity (dot product, cosine similarity, Euclidean distance)
- Basic ML training concepts (loss, regularization, overfitting)
- Familiarity with neural encoders (CNNs/Transformers) at a high level
Why this matters
Representation learning is how modern systems turn raw data (text, images, audio, graphs) into compact vectors that power:
- Semantic search and retrieval (find similar items/users/queries)
- Recommendations and ranking (learn interests, intent, and item similarity)
- Cold-start and transfer (pretrain once, adapt to many tasks)
- Anomaly/dedup detection (spot outliers, near-duplicates)
- Clustering and analytics (group content or users by behavior/meaning)
Real tasks you might handle
- Pretrain a sentence encoder that improves downstream classification with minimal labels
- Design image augmentations that make embeddings invariant to lighting but sensitive to defects
- Evaluate embeddings via kNN, linear probes, and Recall@K on a retrieval benchmark
- Diagnose representation collapse and fix it without changing the dataset
Concept explained simply
A representation is a way to describe data so that simple operations (like dot products) reveal what you care about. Good representations make related items close and unrelated items far, according to your task.
Mental model
Imagine compressing each item (a sentence, image, user session) into a point on a map. The map is useful if distances match your goal: similar meaning = nearby; different meaning = far apart. The training objective shapes this map.
Core building blocks
- Encoders: turn raw inputs into vectors (embeddings).
- Objectives: Contrastive (pull positives together, push negatives apart), Reconstruction (autoencoders, masked modeling), Supervised (classification head shapes penultimate features).
- Invariances vs. equivariances: choose what should not change (e.g., color jitter for semantic image embeddings) and what should change predictably (e.g., rotation for pose).
- Similarity functions: cosine similarity (scale-invariant), dot product, Euclidean distance; temperature scaling affects sharpness (see the sketch after this list).
- Regularization: weight decay, dropout, batch/layer norm; representation-specific (variance/covariance penalties, whitening).
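A minimal NumPy sketch of how cosine similarity and temperature scaling interact; the dimensions and random vectors are purely illustrative, not tied to any particular encoder.

```python
# Cosine similarity and temperature scaling (illustrative sketch, NumPy only).
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Divide each row by its L2 norm so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

rng = np.random.default_rng(0)
q = l2_normalize(rng.normal(size=(1, 8)))        # one query embedding (toy)
items = l2_normalize(rng.normal(size=(5, 8)))    # five item embeddings (toy)

cos = (q @ items.T).ravel()                      # cosine = dot product of unit vectors

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for tau in (1.0, 0.1):
    # Lower temperature sharpens the distribution over items.
    print(f"tau={tau}:", np.round(softmax(cos / tau), 3))
```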
Worked examples
Example 1: Text semantic search with cosine similarity
- Goal: Retrieve product titles that mean the same thing as a query (e.g., "wireless earbuds").
- Representation choice: Pretrained sentence encoder; L2-normalize embeddings to unit length.
- Similarity: Cosine similarity (dot product of normalized vectors).
- Why it works: Cosine ignores absolute scale and focuses on direction, which captures semantic content.
- Evaluation: Compute Recall@10 on a set of query–relevant-item pairs. Add a simple linear probe on top of embeddings for a binary "relevant/not" check.
Mini calculation
If q and d are unit vectors and q·d = 0.92, they are very similar; if q·d = 0.10, they are weakly related. Ranking by q·d gives a fast, effective retrieval baseline.
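A minimal retrieval sketch for this example, assuming you already have query and document embeddings from some sentence encoder; the names (recall_at_k, query_emb, doc_emb, relevant) are illustrative, and the random vectors at the bottom only stand in for real encoder outputs.

```python
# Cosine-similarity retrieval with Recall@K (illustrative sketch).
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def recall_at_k(query_emb, doc_emb, relevant, k=10):
    # relevant[i] is the index of the relevant document for query i (one per query here).
    q = l2_normalize(query_emb)
    d = l2_normalize(doc_emb)
    scores = q @ d.T                              # (n_queries, n_docs) cosine similarities
    topk = np.argsort(-scores, axis=1)[:, :k]     # indices of the k highest-scoring docs
    hits = [relevant[i] in topk[i] for i in range(len(relevant))]
    return float(np.mean(hits))

# Toy check: queries are noisy copies of their relevant documents.
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))
queries = docs[:20] + 0.1 * rng.normal(size=(20, 64))
print("Recall@10:", recall_at_k(queries, docs, relevant=list(range(20)), k=10))
```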
Example 2: Image embeddings for defect detection
- Goal: Make embeddings invariant to lighting and small rotations, but sensitive to surface scratches.
- Augmentations (positives): color jitter, small rotations, random crops that keep the object; no heavy blur (scratches vanish).
- Negatives: different parts or different items.
- Objective: Contrastive (InfoNCE/NT-Xent) with temperature τ around 0.05–0.2; a loss sketch follows this list.
- Evaluation: kNN classification for defect/non-defect; also measure AUROC of distance-to-nearest-neighbor for anomaly detection.
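A minimal PyTorch sketch of the NT-Xent/InfoNCE loss named above, assuming two augmented views per image have already been encoded into z1 and z2; the encoder and augmentation pipeline are omitted, and the random tensors at the bottom are only placeholders.

```python
# NT-Xent / InfoNCE loss over two views of a batch (illustrative sketch).
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    # Normalize so the dot product is cosine similarity.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / tau                                 # (2B, 2B) scaled similarities
    n = z1.size(0)
    # Mask self-similarity so an example is never its own negative.
    sim.fill_diagonal_(float("-inf"))
    # The positive for index i is its other view: i+n for the first half, i-n for the second.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: random "embeddings" standing in for encoder outputs of two augmented views.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(nt_xent(z1, z2, tau=0.1).item())
```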
Common pitfall
Overly strong blur can enforce invariance to the very signal (fine scratches) you need to detect. Align augmentations with the business goal.
Example 3: Linear probe to assess representation quality
- Setup: You have sentence embeddings and a small labeled dataset for sentiment (positive/negative).
- Probe: Train only a logistic regression on fixed embeddings (no encoder updates); see the sketch after this list.
- Interpretation: High probe accuracy → sentiment is linearly separable; low accuracy → either embeddings lack sentiment or labels are noisy/insufficient.
- Next step: If probe is decent, fine-tune the encoder lightly; if poor, revisit pretraining objective or data.
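A minimal linear-probe sketch under these assumptions: fixed sentence embeddings X and binary sentiment labels y, with scikit-learn's LogisticRegression as the probe; the Gaussian blobs at the bottom only stand in for real encoder outputs.

```python
# Linear probe on frozen embeddings (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_probe_accuracy(X, y, seed=0):
    # The encoder is frozen: only the logistic regression on top is trained.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    probe = LogisticRegression(max_iter=1000)
    probe.fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Toy check: two Gaussian blobs standing in for "positive"/"negative" embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.5, 1.0, size=(200, 32)),
               rng.normal(-0.5, 1.0, size=(200, 32))])
y = np.array([1] * 200 + [0] * 200)
print("Probe accuracy:", linear_probe_accuracy(X, y))
```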
Practical notes you will use
- Choosing similarity: Use cosine with normalized embeddings for retrieval. Use Euclidean when absolute scale matters.
- Temperature: Lower τ sharpens contrastive distributions; too low can destabilize training.
- Batch effects: Contrastive methods benefit from larger effective batch sizes (more negatives). Memory banks/queues can help.
- Collapse detection: Watch per-dimension variance and pairwise cosine similarities; near-zero variance or near-identical vectors signal collapse (see the diagnostic sketch after this list).
- Evaluation suite: kNN accuracy, linear probe, clustering metrics (silhouette, NMI), and retrieval (Recall@K, mAP).
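A quick diagnostic sketch for the collapse check above, assuming a NumPy array of embeddings from any encoder; what counts as "healthy" values depends on your encoder and data, so thresholds are left to you.

```python
# Collapse diagnostics: per-dimension variance and mean pairwise cosine (illustrative sketch).
import numpy as np

def collapse_report(emb, eps=1e-12):
    # Per-dimension variance near zero across many dimensions suggests collapse.
    dim_var = emb.var(axis=0)
    # Mean pairwise cosine near 1.0 means embeddings are nearly identical.
    normed = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), eps)
    cos = normed @ normed.T
    off_diag = cos[~np.eye(len(emb), dtype=bool)]
    return {
        "min_dim_variance": float(dim_var.min()),
        "median_dim_variance": float(np.median(dim_var)),
        "mean_pairwise_cosine": float(off_diag.mean()),
    }

# Toy usage: healthy embeddings vs. a collapsed batch (all rows nearly the same vector).
rng = np.random.default_rng(0)
healthy = rng.normal(size=(256, 64))
collapsed = np.tile(rng.normal(size=(1, 64)), (256, 1)) + 1e-3 * rng.normal(size=(256, 64))
print("healthy:  ", collapse_report(healthy))
print("collapsed:", collapse_report(collapsed))
```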
Equivariance vs invariance, simply
Invariance: representation stays the same when input changes in an irrelevant way (e.g., brightness). Equivariance: representation changes predictably (e.g., rotates when the image rotates). Choose based on the task.
Exercises
Do these now. The same items appear below as interactive tasks in the Exercises section of this page.
- Exercise 1 — Design invariances: For an audio keyword-spotting pretrain task, propose augmentations that preserve the keyword but vary speaker and environment. Pick an objective and sampling strategy.
- Exercise 2 — Evaluate embeddings: Draft a plan to evaluate a new image encoder for product similarity: which metrics, splits, and probes will you use?
- Exercise 3 — Fix collapse: Given signs of nearly identical embeddings regardless of input, list diagnostics and concrete fixes.
Checklist before checking solutions:
- Did you state what should be invariant vs sensitive?
- Did you pick a similarity metric and explain why?
- Did you include at least two evaluation metrics (e.g., Recall@K and a linear probe)?
- Did you propose at least two collapse mitigations?
Common mistakes and self-check
- Unaligned augmentations: Using transforms that remove the signal you care about. Self-check: Can a human still recognize the label after augmentation?
- Wrong similarity metric: Using Euclidean on unnormalized vectors leads to scale artifacts. Self-check: L2-normalize and compare rankings.
- Overtrusting visualizations: t-SNE/UMAP can be misleading. Self-check: Prefer quantitative metrics (kNN, probes, Recall@K).
- Ignoring temperature: Too low τ can overfit hard negatives. Self-check: Sweep τ and monitor validation retrieval.
- Not testing transfer: Only measuring pretrain loss. Self-check: Always run a small downstream probe.
Practical projects
- Build a small semantic search demo: index 5–10k texts with normalized embeddings; implement cosine similarity retrieval and report Recall@K on a held-out set.
- Image similarity for duplicates: train a contrastive encoder on product photos; evaluate duplicate detection with precision@K.
- Representation report: compare three encoders via kNN, linear probe, and clustering metrics; summarize trade-offs and pick one for production.
Learning path
- Foundations: distances/similarities, normalization, basic regularization.
- Objectives: contrastive (InfoNCE/NT-Xent, triplet), reconstruction (autoencoders, masked modeling).
- Properties: invariance/equivariance, disentanglement, sparsity, smoothness.
- Evaluation: linear probes, kNN, clustering and retrieval metrics, robustness checks.
- Transfer: freezing vs fine-tuning, adapters/LoRA, domain adaptation.
Before you test
Quick Test is available to everyone; only logged-in users get saved progress.
Mini challenge
Take any pretrained encoder you know. In one page, specify: (1) your target invariances and potential harmful invariances, (2) your similarity and temperature choices, (3) your evaluation suite. Keep it concrete and tied to a real task you care about.