
Feature Extraction And Embeddings

Learn Feature Extraction And Embeddings for Computer Vision Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: January 5, 2026 | Updated: January 5, 2026

Why this skill matters for Computer Vision Engineers

Feature extraction and embeddings turn raw pixels into compact vectors that capture visual meaning. This skill powers visual search, recommendation, near-duplicate detection, clustering large image libraries, and cross-modal retrieval (e.g., text-to-image). In production, strong embeddings cut costs (fewer heavy models per query), scale to millions of images with vector indexes, and enable rapid iteration via re-ranking and evaluation.

Who this is for

  • Computer Vision Engineers moving from classification/detection toward retrieval, search, and large-scale image understanding.
  • ML engineers building content discovery, deduplication, or recommendation features.
  • Researchers and practitioners interested in metric learning and representation learning.

Prerequisites

  • Comfortable with Python and NumPy.
  • Basic PyTorch or TensorFlow for inference/training.
  • Understanding of convolutional or transformer-based vision backbones.
  • Familiarity with cosine similarity, L2 normalization, and train/validation splits.

Quick refresher: embeddings vs. features

Features are intermediate representations from a model. Embeddings are usually the final, fixed-length vectors used for similarity, indexing, and clustering. Good embeddings place similar items close together and dissimilar items far apart under a chosen distance metric (commonly cosine distance or Euclidean).
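
Since both metrics appear throughout this guide, note that for L2-normalized vectors they agree up to a monotone transform: squared Euclidean distance equals 2 − 2 × cosine similarity. A quick NumPy check with random vectors (illustrative only):

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(512); a /= np.linalg.norm(a)
b = rng.standard_normal(512); b /= np.linalg.norm(b)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.isclose(np.sum((a - b) ** 2), 2 - 2 * np.dot(a, b))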

Learning path

  1. Week 1: Build image embeddings using a pre-trained backbone. Normalize vectors. Run basic similarity search with a small dataset.
  2. Week 2: Scale to a vector index (Flat, IVF, HNSW). Measure recall@K and latency. Detect near-duplicates via thresholds.
  3. Week 3: Introduce hard negative mining and fine-tuning. Add a re-ranker to boost precision at top-K.
  4. Week 4: Evaluate with retrieval metrics (mAP, NDCG, P@K). Cluster the collection. Try cross-modal embeddings for text-to-image retrieval.

Practical roadmap and milestones

Milestone 1 — Solid baseline embeddings
  • Use a pre-trained ResNet or ViT to extract 512–2048D vectors.
  • L2-normalize vectors; prefer cosine similarity for retrieval.
  • Verify basic nearest neighbor results on a small labeled set.
Milestone 2 — Fast search at scale
  • Index 100k–1M vectors with IVF/HNSW. Tune nlist/nprobe or efConstruction/efSearch.
  • Target recall@10 ≥ 0.95 relative to exact search, with a 3–10× speedup (see the recall sketch below).
Milestone 3 — Quality boost with mining and re-ranking
  • Mine hard negatives within mini-batches or from the index.
  • Re-rank top 50–200 candidates using a heavier model or local feature matching.
Milestone 4 — Evaluation and maintenance
  • Track mAP, NDCG, Precision@K, and pairwise duplicate precision/recall.
  • Set up drift checks: embedding norm distribution, nearest-neighbor distance histogram.
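
A minimal sketch for the Milestone 2 recall target: compare the ANN index's neighbors against exact (Flat) search. Here ann_ids and exact_ids are assumed to be (Q, K) arrays of neighbor ids returned by the two indexes:

import numpy as np

def recall_at_k(ann_ids, exact_ids, k=10):
    # Fraction of the exact top-k neighbors that the ANN index also returns
    hits = [len(set(a[:k]) & set(e[:k])) / k for a, e in zip(ann_ids, exact_ids)]
    return float(np.mean(hits))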

Worked examples

Example 1 — Build image embeddings with PyTorch
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import numpy as np

# Load a pre-trained backbone and remove the classifier head
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

def embed_image(path):
    img = Image.open(path).convert('RGB')
    x = transform(img).unsqueeze(0)
    with torch.no_grad():
        v = backbone(x).squeeze(0).numpy()
    # L2-normalize for cosine similarity
    v = v / (np.linalg.norm(v) + 1e-12)
    return v  # 2048-d normalized vector

v1 = embed_image('cat1.jpg')
v2 = embed_image('cat2.jpg')
cos_sim = float(np.dot(v1, v2))
print('Cosine similarity:', cos_sim)

Tip: Always normalize embeddings if you use cosine or inner product search.

Example 2 — Similarity search with a vector index (FAISS, FlatIP)
import faiss
import numpy as np

# Suppose we have N normalized vectors of dim D
D = 2048
index = faiss.IndexFlatIP(D)  # inner product on normalized vectors ~ cosine

# Build an index
db = np.load('db_vectors.npy').astype('float32')  # shape (N, D), already normalized
index.add(db)

# Query
q = np.load('query_vectors.npy').astype('float32')  # shape (Q, D), normalized
k = 5
sims, ids = index.search(q, k)
print('Top-5 ids per query:', ids)
print('Similarities:', sims)

Switch to IVF or HNSW for speed on large datasets, and tune the search parameters until recall stays close to that of the exact Flat index, as in the sketch below.
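
A minimal IVF sketch with FAISS, reusing D, db, q, and k from Example 2; the nlist and nprobe values are starting points to tune, not recommendations:

import faiss

nlist = 1024                      # number of coarse clusters
quantizer = faiss.IndexFlatIP(D)  # assigns vectors to cluster centroids
ivf = faiss.IndexIVFFlat(quantizer, D, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(db)                     # training needs a representative sample
ivf.add(db)
ivf.nprobe = 32                   # more probes -> higher recall, slower
sims, ids = ivf.search(q, k)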

Example 3 — Near-duplicate detection via thresholds
import numpy as np

# db: (N, D) normalized vectors
# Choose a conservative duplicate threshold
DUP_THR = 0.97  # adjust per dataset

# For a small batch, check pairwise duplicates
def find_near_duplicates(batch):
    # batch: (B, D) normalized
    sims = batch @ batch.T
    pairs = []
    B = batch.shape[0]
    for i in range(B):
        for j in range(i+1, B):
            if sims[i, j] >= DUP_THR:
                pairs.append((i, j, float(sims[i, j])))
    return pairs

Start with a high threshold to avoid false positives, then tune based on labeled duplicate pairs.
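
One way to calibrate, assuming you have labeled pairs: sweep candidate thresholds and track precision/recall. Here pair_sims and labels are hypothetical arrays of pair similarities and 0/1 duplicate labels:

import numpy as np

def sweep_thresholds(pair_sims, labels, thresholds=np.linspace(0.90, 0.99, 19)):
    # pair_sims: (P,) cosine similarities; labels: (P,) 1 = duplicate, 0 = not
    results = []
    for t in thresholds:
        pred = pair_sims >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precision = tp / (tp + fp) if tp + fp > 0 else 1.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        results.append((float(t), float(precision), float(recall)))
    return results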

Example 4 — Hard negative mining (in-batch)
import torch

# embeddings: (B, D) normalized image embeddings
# pos_index[i] gives the index of the positive for anchor i in the batch

def hardest_negatives(embeddings, pos_index):
    with torch.no_grad():
        sims = embeddings @ embeddings.T  # cosine similarity
        B = embeddings.size(0)
        hard_neg_idx = []
        for i in range(B):
            sims_i = sims[i]
            sims_i[i] = -1.0  # ignore self
            sims_i[pos_index[i]] = -1.0  # ignore positive
            hard_neg = torch.argmax(sims_i).item()  # most similar wrong item
            hard_neg_idx.append(hard_neg)
    return hard_neg_idx

Use mined negatives to train with contrastive/triplet losses and improve retrieval quality.
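
A sketch of how the mined indices might feed a triplet-style loss on cosine similarities; the anchor/positive pairing and the 0.2 margin are assumptions to adapt to your setup:

import torch
import torch.nn.functional as F

def triplet_loss_from_mined(embeddings, pos_index, neg_index, margin=0.2):
    # embeddings: (B, D) normalized; pos_index/neg_index: per-anchor indices
    anchors = embeddings
    positives = embeddings[pos_index]
    negatives = embeddings[neg_index]
    pos_sim = (anchors * positives).sum(dim=1)  # similarity to positives
    neg_sim = (anchors * negatives).sum(dim=1)  # similarity to mined negatives
    # Hinge: penalize when the negative is not at least `margin` less similar
    return F.relu(neg_sim - pos_sim + margin).mean()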

Example 5 — Cross-modal retrieval with a CLIP-like model
# Pseudocode to illustrate the flow; text_encoder is a placeholder for a
# CLIP-like model whose image and text encoders produce normalized
# embeddings in the same space
import numpy as np

def text_to_image_search(text_queries, image_vecs):
    tq = text_encoder(text_queries)  # (T, D), normalized
    sims = tq @ image_vecs.T         # (T, N) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :10]  # top-10 image ids per query
    return topk

Cross-modal embeddings enable text-to-image and image-to-text search with the same index.
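
For a runnable starting point, here is a minimal sketch using Hugging Face transformers, assuming the openai/clip-vit-base-patch32 checkpoint (any CLIP-style model with a shared embedding space works the same way):

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')
processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')
model.eval()

texts = ['a red dress', 'a leather boot']
inputs = processor(text=texts, return_tensors='pt', padding=True)
with torch.no_grad():
    t = model.get_text_features(**inputs)  # (T, D)
t = t / t.norm(dim=-1, keepdim=True)       # normalize for cosine search
# t can now be searched against normalized image embeddings in the same index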

Example 6 — Re-ranking with local feature matching (ORB)
import cv2
import numpy as np

# After coarse retrieval by embeddings, re-rank top-50 using keypoint matches

def orb_match_score(imgA, imgB):
    # Higher score = better match; no descriptors/matches scores worst
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(imgA, None)
    kp2, des2 = orb.detectAndCompute(imgB, None)
    if des1 is None or des2 is None:
        return float('-inf')
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(des1, des2)
    if not matches:
        return float('-inf')
    matches = sorted(matches, key=lambda m: m.distance)
    # Lower Hamming distance is better, so negate the mean of the best 30
    return -np.mean([m.distance for m in matches[:30]])

Local matching boosts precision for near-duplicates or products with small differences.

Drills and exercises

  • Extract embeddings for 1k images; plot the norm distribution and confirm normalization.
  • Compare cosine vs Euclidean distances on the same normalized vectors.
  • Build a Flat index and measure latency/recall vs brute-force NumPy search.
  • Switch to IVF or HNSW; tune parameters to reach ≥95% recall@10.
  • Label 100 duplicate/non-duplicate pairs; find a working similarity threshold.
  • Implement in-batch hard negative mining and confirm loss decreases.
  • Add a re-ranker and measure change in Precision@5.
  • Compute mAP and NDCG for your validation split.
  • Cluster embeddings (k-means); inspect 10 random clusters qualitatively (see the clustering sketch after this list).
  • Run cross-modal retrieval: 20 text queries → top-10 images; review failures.
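
For the clustering drill, a minimal k-means sketch with FAISS; k = 50 and niter = 20 are arbitrary starting points:

import faiss
import numpy as np

db = np.load('db_vectors.npy').astype('float32')  # (N, D) normalized vectors
kmeans = faiss.Kmeans(db.shape[1], 50, niter=20, verbose=False)
kmeans.train(db)
_, cluster_ids = kmeans.index.search(db, 1)  # nearest centroid per vector
print(np.bincount(cluster_ids.ravel()))      # cluster sizes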

Common mistakes and debugging tips

  • Not normalizing embeddings when using cosine/inner product. Fix: L2-normalize at both index build and query time.
  • Comparing metrics across different candidate set sizes. Fix: Keep K and candidate pools consistent.
  • Too-aggressive ANN settings causing recall collapse. Fix: Increase nprobe (IVF) or efSearch (HNSW) and re-measure (see the HNSW sketch after this list).
  • Mismatched preprocessing between index and query pipelines. Fix: Share the exact transform code and versions.
  • Thresholds copied from another dataset. Fix: Calibrate on your labels; start conservative; iterate.
  • Ignoring class imbalance in evaluation. Fix: Use mAP/NDCG and report metrics per category when relevant.
  • Overfitting during fine-tuning with only easy negatives. Fix: Mine hard negatives or use semi-hard strategies.
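
A minimal HNSW sketch with FAISS showing where those knobs live, reusing D, db, q, and k from Example 2; the M, efConstruction, and efSearch values are starting points, not recommendations:

import faiss

hnsw = faiss.IndexHNSWFlat(D, 32, faiss.METRIC_INNER_PRODUCT)  # M = 32 links
hnsw.hnsw.efConstruction = 200  # build-time quality/speed trade-off
hnsw.add(db)                    # HNSW needs no separate training step
hnsw.hnsw.efSearch = 64         # raise to recover recall, at latency cost
sims, ids = hnsw.search(q, k)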

Mini project: Visual search and dedup for a product catalog

  1. Collect 5k–20k product images; label 200 duplicate pairs and 200 non-duplicate similar pairs.
  2. Extract normalized embeddings with a pre-trained model; save as float32 vectors.
  3. Build an IVF or HNSW index; target 95% recall@10 vs Flat.
  4. Implement near-duplicate detection with a tuned cosine threshold.
  5. Add re-ranking for top-50 candidates using local ORB matches or a small binary classifier.
  6. Evaluate: Precision@5, Recall@50, mAP, duplicate precision/recall. Track latency.
  7. Optionally fine-tune with contrastive loss using mined hard negatives; re-evaluate.

Evaluation: retrieval metrics to track
  • Precision@K and Recall@K per query.
  • mAP and NDCG on labeled query-gallery splits.
  • Duplicate detection precision/recall at a fixed threshold.
  • Latency: P50/P95 for build and query; index size on disk.
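
A minimal sketch of per-query average precision with binary relevance; mAP is its mean over queries:

import numpy as np

def average_precision(ranked_rel):
    # ranked_rel: 1/0 relevance of results in ranked order, for one query
    rel = np.asarray(ranked_rel, dtype=float)
    if rel.sum() == 0:
        return 0.0
    prec_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((prec_at_i * rel).sum() / rel.sum())

# mAP = np.mean([average_precision(r) for r in per_query_relevance_lists])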

Practical projects

  • Personal photo search: retrieve similar photos and cluster events.
  • Brand logo monitoring: detect duplicates and near-duplicates from social images.
  • Fashion recommendation: image-to-image retrieval with re-ranking by keypoint matches.

Subskills

  • Building Image Embeddings — Extract compact, normalized vectors from images using modern backbones.
  • Similarity Search For Images — Use vector indexes for fast nearest neighbors at scale.
  • Vector Index Concepts — Understand Flat, IVF, HNSW trade-offs and tuning.
  • Hard Negative Mining Basics — Improve training by selecting challenging negatives.
  • Reranking And Retrieval Evaluation — Boost precision with re-rankers and measure with mAP/NDCG.
  • Near Duplicate Detection — Calibrate thresholds for robust duplicate finding.
  • Clustering Image Collections — Group images to explore and deduplicate libraries.
  • Cross Modal Embeddings Basics — Align text and image spaces for cross-modal search.

Next steps

  • Pick one project above and complete it end-to-end with metrics and a short report.
  • Try two different backbones and compare retrieval quality vs latency.
  • Automate evaluation and drift checks; schedule periodic re-indexing.

Skill exam

Take the exam below to check mastery. Everyone can take it for free; if you are logged in, your progress and results will be saved.

Feature Extraction And Embeddings — Skill Exam

This exam checks your understanding of embeddings, similarity search, indexes, mining, re-ranking, evaluation, and cross-modal basics. Rules: closed-notes style; no time limit here. Aim for at least 70% to pass. You can retry as many times as you like.

13 questions · 70% to pass
