Why this matters
Reranking and context selection determine whether your RAG system feeds the LLM the most useful evidence. Strong reranking reduces hallucinations, cuts token costs, and improves answer accuracy.
- Support QA: pick the few most relevant policy snippets from hundreds of pages.
- Code assistant: surface the exact function docstring instead of a whole file.
- Analytics copilot: prioritize the freshest dashboard notes over outdated ones.
Who this is for
- NLP engineers building RAG chatbots, assistants, and search interfaces.
- Data scientists evaluating retrieval pipelines and precision/recall trade-offs.
- Backend engineers integrating retrieval, ranking, and LLM orchestration.
Prerequisites
- Basic RAG pipeline knowledge: chunking, embeddings, vector and keyword retrieval.
- Comfort with cosine similarity, BM25 (or similar), and simple scoring formulas.
- Familiarity with latency/cost constraints and token budgeting.
Concept explained simply
Think of reranking as a second opinion. Retrieval gives you a rough top-k. Reranking reorders those candidates using better signals (like a cross-encoder) and picks the final few chunks to send to the LLM.
Mental model: a funnel.
- Wide: Retriever returns many candidates (high recall, lower precision).
- Narrow: Reranker scores and reorders (higher precision).
- Final: Context selector builds a compact, diverse, non-redundant window.
Key terms
- Top-k: number of candidates taken from the retriever.
- Reranker: model or rule that reorders candidates (e.g., cross-encoder).
- MMR: Maximal Marginal Relevance; increases diversity and reduces redundancy.
- Hit@k / Recall@k: whether a relevant item appears in top-k.
- NDCG/MRR: ranking quality metrics emphasizing order.
What is a cross-encoder reranker?
A cross-encoder takes the query and a single candidate together and outputs a relevance score. It is slower than a bi-encoder but usually more accurate for small k (e.g., k ≤ 50).
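If you go the cross-encoder route, a minimal sketch with the sentence-transformers CrossEncoder class looks like this. The model name is one publicly available option, not a recommendation, and the load-once pattern is an assumption; adapt to your own stack.

```python
# Minimal cross-encoder reranking sketch (assumes the sentence-transformers package).
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # load once, reuse per query

def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[tuple[str, float]]:
    """Score each (query, candidate) pair jointly and return the top_n by score."""
    scores = model.predict([(query, c) for c in candidates])  # one relevance score per pair
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_n]]
```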
Workflow: from retrieval to final context
- Ingest & chunk: Split documents into coherent, small chunks (e.g., 200–400 tokens) with overlap.
- Retrieve candidates: Use dense, sparse, or hybrid retrieval to get top-k (e.g., 50).
- Normalize & filter: Normalize scores per source; drop exact duplicates and near-duplicates.
- Score features: Combine signals: BM25, dense similarity, source priority, recency, section type.
- Rerank: Apply a cross-encoder or a weighted-rule to reorder the top-k.
- Diversify: Use MMR or clustering to avoid redundant chunks.
- Select context: Pack the highest-value, diverse chunks into the token budget; keep citations/IDs.
- Validate: Dry-run a few queries and check whether the selected evidence actually supports correct answers.
- Log: Store scores, selections, and outcomes for iteration.
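The normalize-and-filter step is where mixed-source pipelines most often go wrong. Here is a minimal sketch of per-source min-max normalization plus simple duplicate filtering; the candidate dict fields and the near-duplicate threshold are assumptions to adapt to your own schema.

```python
# Per-source score normalization and duplicate filtering (sketch).
# Candidates are assumed to look like {"id": ..., "text": ..., "source": ..., "score": ...}.
from difflib import SequenceMatcher

def normalize_per_source(candidates: list[dict]) -> list[dict]:
    """Rescale raw scores to [0, 1] within each source so sources are comparable."""
    by_source: dict[str, list[dict]] = {}
    for c in candidates:
        by_source.setdefault(c["source"], []).append(c)
    for group in by_source.values():
        lo = min(c["score"] for c in group)
        hi = max(c["score"] for c in group)
        for c in group:
            c["norm_score"] = 0.0 if hi == lo else (c["score"] - lo) / (hi - lo)
    return candidates

def drop_duplicates(candidates: list[dict], near_dup_threshold: float = 0.9) -> list[dict]:
    """Drop exact duplicates and near-duplicates by character-level similarity.

    Quadratic in the number of candidates, which is fine for a modest top-k.
    """
    kept: list[dict] = []
    for c in candidates:
        is_dup = any(
            SequenceMatcher(None, c["text"], k["text"]).ratio() >= near_dup_threshold
            for k in kept
        )
        if not is_dup:
            kept.append(c)
    return kept
```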
Worked examples
Example 1 — FAQ bot for HR policies
Query: "How many vacation days can I carry over?" Retriever returns 20 chunks. Signals available: BM25, dense sim, source (policy handbook > forum), recency.
- Rule: score = 0.35*bm25 + 0.45*dense + 0.15*source_priority + 0.05*recency_boost (implemented in the sketch after this example).
- After scoring, pick top 8, then apply MMR to select final 5, ensuring at least two distinct sections (policy index + detailed clause).
- Result: concise context with exact clause and examples; the LLM returns a precise, cited answer.
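A minimal sketch of the weighted rule above; the candidate field names are illustrative, and the recency boost matches the formula used in Exercise 1 later in this lesson.

```python
# Weighted-rule scoring for Example 1 (sketch; field names are illustrative).
def recency_boost(recency_days: int, window: int = 30) -> float:
    """Linearly decay from 1.0 (updated today) to 0.0 at `window` days old."""
    return max(0.0, (window - recency_days) / window)

def rule_score(c: dict) -> float:
    """score = 0.35*bm25 + 0.45*dense + 0.15*source_priority + 0.05*recency_boost"""
    return (
        0.35 * c["bm25"]
        + 0.45 * c["dense"]
        + 0.15 * c["source_priority"]
        + 0.05 * recency_boost(c["recency_days"])
    )

# Usage: score all retrieved chunks, keep the top 8, then diversify with MMR.
# top8 = sorted(chunks, key=rule_score, reverse=True)[:8]
```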
Example 2 — Code assistant for Python repo
Query: "How do I initialize the Client with OAuth?" Retriever returns 30 snippets mixing README, code, and tests.
- Boost docstrings and README sections; downweight tests (see the source-type weighting sketch after this example).
- Run a cross-encoder reranker over the 30 candidates; then exclude near-duplicate snippets of the same function.
- Final pack: docstring + usage snippet + config section. The LLM answers with correct parameters and a short example.
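One simple way to express those boosts is a multiplier per snippet type; the multipliers and the `kind` field below are illustrative assumptions, not tuned values.

```python
# Source-type weighting for code search (sketch; multipliers are arbitrary starting points).
TYPE_WEIGHTS = {
    "docstring": 1.3,  # boost API docstrings
    "readme": 1.15,    # boost README/usage sections
    "code": 1.0,       # neutral
    "test": 0.6,       # downweight test files
}

def adjust_for_type(snippet: dict) -> float:
    """Scale the snippet's base relevance score by its source type."""
    return snippet["score"] * TYPE_WEIGHTS.get(snippet["kind"], 1.0)
```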
Example 3 — Multi-source search with freshness
Query: "Latest refund policy for digital purchases" across knowledge base and release notes.
- Normalize per-source scores; add a recency boost for documents updated in the last 30 days (see the sketch after this example).
- Weighted sum pre-rank → cross-encoder rerank → MMR to keep one policy and one release note.
- Outcome: includes updated clause; avoids including the obsolete version.
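If your sources carry real timestamps rather than precomputed ages, the 30-day boost can be derived from the update date directly; a small sketch, assuming an ISO-format `updated_at` field.

```python
# Recency boost from an ISO timestamp (sketch; assumes a field like "updated_at": "2024-05-01").
from datetime import date, datetime

def recency_boost_from_date(updated_at: str, window_days: int = 30,
                            today: date | None = None) -> float:
    """Return 1.0 for documents updated today, decaying linearly to 0.0 at window_days."""
    today = today or date.today()
    age_days = (today - datetime.fromisoformat(updated_at).date()).days
    return max(0.0, (window_days - age_days) / window_days)
```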
Choosing a reranker and budget tuning
- Cross-encoder: best quality for small k; adds 10–50 ms per pair on GPUs depending on model size.
- Lightweight rule-based: fastest; good when signals are strong and clean.
- Hybrid: rule pre-filter to top-20, then cross-encode to top-5.
- Latency tips: cache frequent query–candidate pairs, batch reranker calls, and keep k modest (e.g., ≤ 50); a small caching-and-batching sketch follows.
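A small caching-and-batching wrapper around a cross-encoder, as a sketch; the cache key (query text plus full candidate text) is an assumption, and in production you would likely key on stable chunk IDs instead.

```python
# Caching and batching cross-encoder calls (sketch; assumes sentence-transformers).
from sentence_transformers import CrossEncoder

_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
_cache: dict[tuple[str, str], float] = {}  # (query, candidate text) -> score

def cached_rerank_scores(query: str, docs: list[str], batch_size: int = 32) -> list[float]:
    """Score docs against the query, reusing cached pairs and batching the rest."""
    missing = [d for d in docs if (query, d) not in _cache]
    if missing:
        scores = _model.predict([(query, d) for d in missing], batch_size=batch_size)
        for d, s in zip(missing, scores):
            _cache[(query, d)] = float(s)
    return [_cache[(query, d)] for d in docs]
```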
Metrics and evaluation
- Offline: Recall@k, Hit@k, MRR, NDCG@k on labeled query-chunk pairs (simple implementations are sketched below).
- Online: Answer accuracy with human or LLM grading, citation correctness, latency, token cost.
- Data: include hard negatives (near but wrong) to stress-test reranker and MMR.
- Ablate: compare "retriever-only" vs "+ reranker" vs "+ reranker + MMR".
Common mistakes and self-check
- Using very large chunks, causing noise and token waste.
- Mixing scores across sources without normalization.
- No deduplication → repeated content crowds out diverse evidence.
- Ignoring intent and section types (e.g., examples vs definitions).
- Over-tuning for a few queries; poor generalization.
- Not measuring latency and cost alongside accuracy.
Self-check:
- [ ] Are scores normalized per source before combining?
- [ ] Do top results come from varied sections, not duplicates?
- [ ] Does each selected chunk directly support an answer sentence?
- [ ] Is the final pack within token budget with room for the prompt?
- [ ] Do metrics improve vs a retriever-only baseline?
Practical projects
- Company policy QA: Implement hybrid reranking + MMR; target +10% Hit@5 vs baseline.
- Code search assistant: Cross-encode top-30 snippets; penalize test files; measure citation correctness.
- Reviews search: Add recency and rating signals; ensure final pack includes both summary and a representative review.
Exercises
Do these in a notebook or spreadsheet. They mirror the exercises section below.
Exercise 1 — Weighted reranking and selection (ex1)
For candidates A–F with features below, compute a score and pick the top-4 for context.
- Score = 0.35*bm25 + 0.45*dense + 0.15*source_priority + 0.05*recency_boost
- recency_boost = max(0, (30 - recency_days)/30)
Data:
- A: bm25=0.82, dense=0.60, source_priority=1, recency_days=14
- B: bm25=0.75, dense=0.66, source_priority=0, recency_days=5
- C: bm25=0.70, dense=0.88, source_priority=1, recency_days=20
- D: bm25=0.90, dense=0.40, source_priority=0, recency_days=2
- E: bm25=0.68, dense=0.79, source_priority=1, recency_days=9
- F: bm25=0.60, dense=0.62, source_priority=0, recency_days=35
Output: list the top-4 IDs in order.
Exercise 2 — MMR diversification (ex2)
Given query-to-candidate similarities and pairwise candidate similarities, select k=3 with MMR (lambda=0.7):
- sim(q, d1..d5) = [0.82, 0.80, 0.78, 0.62, 0.60]
- Pairwise sim (symmetric): s12=0.85, s13=0.40, s14=0.20, s15=0.10, s23=0.45, s24=0.15, s25=0.05, s34=0.30, s35=0.25, s45=0.50
At each step, pick the argmax of: 0.7*sim(q, di) - 0.3*max_j sim(di, dj) over the already-selected dj; treat the redundancy term as 0 for the first pick. A generic helper is sketched after this exercise.
Output: chosen IDs in order.
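A generic greedy MMR helper you can adapt in your notebook; the dict-based inputs are an assumption chosen to match the exercise data, and the redundancy term is treated as 0 for the first pick.

```python
# Greedy MMR selection (sketch). query_sim maps candidate id -> sim(q, d);
# pair_sim maps frozenset({id_i, id_j}) -> sim(d_i, d_j).
def mmr_select(query_sim: dict[str, float],
               pair_sim: dict[frozenset, float],
               k: int = 3,
               lam: float = 0.7) -> list[str]:
    selected: list[str] = []
    remaining = set(query_sim)
    while remaining and len(selected) < k:
        best_id, best_score = None, float("-inf")
        for cand in remaining:
            redundancy = max(
                (pair_sim.get(frozenset({cand, s}), 0.0) for s in selected),
                default=0.0,  # first pick: nothing selected yet, so no penalty
            )
            score = lam * query_sim[cand] - (1 - lam) * redundancy
            if score > best_score:
                best_id, best_score = cand, score
        selected.append(best_id)
        remaining.remove(best_id)
    return selected

# Example call with the exercise's query similarities (fill in pair_sim from the data above):
# mmr_select({"d1": 0.82, "d2": 0.80, "d3": 0.78, "d4": 0.62, "d5": 0.60}, pair_sim, k=3)
```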
Mini challenge
You have a 1,500-token budget. Retriever returns 40 chunks from mixed sources (product docs, blog posts, community answers). Design a selection rule combining: normalized dense score, source priority (docs > blog > community), and MMR. Describe your weights, expected top-5 composition, and how you would validate impact in 1–2 days.
Learning path
- Before: Text chunking and hybrid retrieval; embeddings quality and indexing.
- Now: Reranking and context selection (this lesson).
- Next: Prompting with citations, guardrails, and evaluation pipelines.
Next steps
- Implement a simple weighted reranker; log metrics against a retriever-only baseline.
- Introduce a cross-encoder for the final top-20; measure impact on Hit@5 and latency.
- Iterate on the MMR lambda and chunk sizes; keep an eye on token spend.
When you are ready, take the Quick Test below. It is available to everyone; sign in to save your progress.
Quick Test
Answer a few questions to check your understanding. Your score is saved if you are signed in.