luvv to helpDiscover the Best Free Online Tools
Topic 4 of 8

Data Leakage Prevention

Learn Data Leakage Prevention for free with explanations, exercises, and a quick test (for Prompt Engineer).

Published: January 8, 2026 | Updated: January 8, 2026

Why this matters

As a Prompt Engineer, your prompts and context often touch real user data, evaluations, or internal instructions. A single careless example, log, or retrieval filter can leak private information, answers, or system prompts. Preventing leakage protects users, your company, and the validity of your model evaluations.

  • Real tasks: write few-shot prompts without giving away answers; design redaction for PII; set safe retrieval filters for RAG; configure logging that keeps signals but not secrets.
  • Impact: avoids privacy breaches, preserves fair evaluations, and increases trust in your AI product.

Concept explained simply

Think of your AI system as a conversation room with windows. Leakage happens when something that must stay inside (private data, answers, keys, system rules) is visible through a window. Prevention means covering windows you don’t need, blurring sensitive parts, and controlling who can look in.

Mental model

  • Minimize: only bring the minimum necessary info into the room.
  • Mask: if you must bring sensitive info, cover it (redact/mask/abstract).
  • Fence: separate rooms (per user, per tenant, per session) so content never crosses by accident.
  • Test: regularly walk outside and see what can be seen (red-team prompts and canary checks).

What counts as leakage?

  • Train/eval contamination: evaluation items or answers appear in training data, examples, or hints.
  • PII/secrets exposure: emails, phone numbers, tokens, API keys, or internal docs show up in outputs, logs, or prompts.
  • Cross-tenant/context bleed: content from user A appears in responses for user B due to shared memory/history.
  • System prompt reveal: hidden instructions exposed via prompt injection or direct queries.

Prevention toolkit (step-by-step)

  1. Classify data
    • Never-share: secrets, tokens, private keys, raw credentials, highly sensitive PII.
    • Restricted: customer identifiers, emails, addresses, phone numbers.
    • Public/safe: already public info or abstracted stats.
  2. Minimize inputs
    • Only include fields strictly needed for the task.
    • Replace long histories with short summaries.
  3. Redact or mask
    • Pattern-based redaction: emails, phone numbers, card numbers, SSNs, access tokens.
    • Use placeholders like [EMAIL], [PHONE], [ORDER_ID], and keep a reversible map outside model context if needed.
    Redaction checklist
    • Emails → [EMAIL]
    • Phone numbers → [PHONE]
    • Payment data → [CARD]
    • Access tokens/keys → [SECRET]
    • Exact names/addresses → [NAME]/[ADDRESS]
  4. Fence memory and sessions
    • Scope memory by user and tenant.
    • Use short, ephemeral sessions; trim histories.
    • Do not reuse conversation history across users.
  5. Safe retrieval (RAG)
    • Filter by user/tenant metadata before similarity search results are shown to the model.
    • Avoid global indexes without access filters.
  6. Hide sensitive reasoning
    • Avoid exposing chain-of-thought for sensitive tasks. Prefer brief, non-sensitive rationales or no rationale.
  7. Canary and adversarial tests
    • Seed harmless canary tokens (e.g., CANARY-ALPHA-93) in restricted docs to detect leaks if they appear in outputs.
    • Probe with injection attempts to reveal system prompts—ensure the model refuses.
  8. Logging with care
    • Log events and categories, not raw secrets or full PII.
    • Apply the same redaction to logs as to prompts.
  9. Evaluate regularly
    • Run PI/secret detectors on outputs.
    • Check evaluation sets for overlap with examples/training.

Worked examples

Example 1 — Few-shot label leakage

Bad prompt (leaks exact answers from evaluation set):

System: You grade product reviews as Positive or Negative.
User: Classify:
Q: "This phone is terrible, battery dies" → A: Negative
Q: "I love the camera and speed" → A: Positive
Q: "This phone is terrible, battery dies" →

Fix:

System: You grade product reviews as Positive or Negative.
User: Learn from these patterns (no overlap with evaluation items):
Example: "Horrible battery life and slow" → Negative (focus on faults)
Example: "Fast, great camera, very happy" → Positive (focus on praise)
Now classify without repeating examples:
Q: "This phone is terrible, battery dies" →

We removed identical evaluation text and used abstracted patterns.

Example 2 — RAG with PII

Original context (unsafe):

Ticket #4217 by Alice Brown (alice.brown@example.com, +1 202-555-0135)
Issue: Card charged twice on 2024-05-03.
Last 4 digits: 8421

Redacted context (safe):

Ticket [TICKET_ID]
Customer: [NAME], [EMAIL], [PHONE]
Issue: Card charged twice on [DATE].
Card: [CARD_LAST4]

Prompt template (minimized):

System: You are a support assistant.
User: Summarize the issue and propose next steps without revealing personal details. Use placeholders as is.

Example 3 — Cross-tenant memory

Problem: A shared memory store caches helpful tips from all users and sometimes surfaces other customers’ details.

Fix: Partition memory by tenant and user, add a short TTL, and summarize before storing. Do not store raw PII.

Memory checklist
  • Key by tenant_id + user_id
  • Short TTL or rolling window
  • Summarize; avoid raw PII
  • No cross-tenant retrieval

Example 4 — Canary detection

Embed a canary like CANARY-ALPHA-93 in a restricted doc. If it appears in outputs, investigate filters and prompts immediately.

Exercises

Progress and saving: The quick test is available to everyone; only logged-in users get saved progress.

Exercise 1 — Redact sensitive data

Rewrite the following prompt so it contains no direct PII while keeping meaning:

"Contact Jane Miller at jane.miller@craftco.com, phone +44 20 7946 0958, about order 1189. Her card ending 9923 failed."
  • Use placeholders like [NAME], [EMAIL], [PHONE], [ORDER_ID], [CARD_LAST4].
  • Keep all necessary task info.
Tip
  • Identify PII items first.
  • Replace with placeholders consistently.
  • Do not add new info.

Exercise 2 — Remove label leakage in few-shot

Design 2 few-shot examples for sentiment analysis that do not include or closely paraphrase the evaluation item:

Evaluation item: "Delivery took ages and the box was damaged"
  • Your examples should show the pattern but not overlap semantically too closely.
  • Include a brief parenthetical rationale, not chain-of-thought.
Tip
  • Vary vocabulary substantially.
  • Teach the decision rule instead of the answer.

Common mistakes and self-check

  • Including evaluation answers in examples. Self-check: Do my examples match any evaluation text?
  • Over-sharing context to be “safe.” Self-check: Can the model solve the task if I remove each field? If yes, remove it.
  • Redacting in prompts but not in logs. Self-check: Are logs subjected to the same redaction rules?
  • Using global retrieval without filters. Self-check: Are queries filtered by user/tenant before ranking?
  • Exposing hidden instructions. Self-check: Does the model refuse when asked to reveal system prompts?

Practical projects

  • Build a redaction preprocessor: given text with PII, output masked text plus a reversible mapping stored outside the model context. Acceptance: zero raw PII tokens in model inputs.
  • Safe RAG demo: index two tenants’ docs with tenant filters; show that each tenant can only retrieve its own docs. Acceptance: canary terms from tenant A never appear for tenant B.
  • Leakage test suite: create prompts that attempt to extract system instructions or PII. Acceptance: 0 critical leaks across 50 adversarial prompts.

Mini challenge

Write a one-paragraph guidance block for your team describing how to choose placeholders for PII and where to store the mapping. Keep it actionable and tool-agnostic.

Learning path

  • Start: Redaction and masking patterns.
  • Next: Safe RAG filtering and metadata design.
  • Then: Evaluation for leakage and canary monitoring.
  • Finally: Production logging policies and privacy reviews.

Who this is for

  • Prompt engineers designing system prompts, few-shot examples, and RAG prompts.
  • Data/ML practitioners integrating models with user data.

Prerequisites

  • Basic prompt engineering skills (system/user roles, few-shot patterns).
  • Familiarity with PII types and privacy basics.

Next steps

  • Implement a redaction pass in your prompt pipeline.
  • Add tenant/user filters to any retrieval step.
  • Create a small leak test set and run it before each release.

Practice Exercises

2 exercises to complete

Instructions

Rewrite the following so it contains no direct PII while keeping meaning. Use placeholders [NAME], [EMAIL], [PHONE], [ORDER_ID], [CARD_LAST4].

"Contact Jane Miller at jane.miller@craftco.com, phone +44 20 7946 0958, about order 1189. Her card ending 9923 failed."
  • Identify all PII and replace consistently.
  • Do not lose task-relevant info.
Expected Output
Contact [NAME] at [EMAIL], phone [PHONE], about order [ORDER_ID]. Card ending [CARD_LAST4] failed.

Data Leakage Prevention — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

6 questions70% to pass

Have questions about Data Leakage Prevention?

AI Assistant

Ask questions about this tool