luvv to helpDiscover the Best Free Online Tools
Topic 3 of 8

Handling PII In Reports

Learn Handling PII In Reports for free with explanations, exercises, and a quick test (for BI Developer).

Published: December 24, 2025 | Updated: December 24, 2025

Why this matters

As a BI Developer, you often work with customer records, employee data, or patient events. Reports can accidentally expose names, emails, addresses, or IDs. Properly handling PII protects users, reduces legal risk, and builds trust. Typical tasks you will face:

  • Mask emails and phone numbers in operational dashboards.
  • Show detailed data only to authorized roles via row- and column-level security.
  • Publish safe, aggregated insights for wide audiences.
  • Audit access so every sensitive view is traceable.
Note on progress

The quick test on this page is available to everyone. Only logged-in users will have their progress saved.

Concept explained simply

PII (Personally Identifiable Information) is any data that can identify a person on its own (direct identifiers) or when combined with other data (quasi-identifiers). Handling PII in reports means minimizing exposure, controlling who can see it, and transforming it when full detail isn’t necessary.

Mental model

Think of PII like a window with frosted glass:

  • Clear glass (raw PII): Full details, only for a few authorized people.
  • Frosted glass (masked/pseudonymized): Enough to work, but identity is protected.
  • Curtains drawn (aggregates only): No row-level PII visible.

What counts as PII

  • Direct identifiers: full name, email, phone, national ID/passport, precise home address, credit card number.
  • Quasi-identifiers: birth date, ZIP/postal code, gender, IP address, device ID, rare job title.
  • Sensitive attributes often protected alongside PII: health data, financial balances, exact geolocation.
Principles you should follow
  • Purpose limitation: Only show PII that supports the report’s purpose.
  • Data minimization: Default to less data; reveal more only when justified.
  • Defense in depth: Combine transformations (masking, pseudonymization) with access controls and auditing.
  • Retention: Don’t keep PII longer than needed for reporting.

Design toolbox for PII-safe reporting

  • Minimize columns: Drop unneeded PII early in the model.
  • Column-level controls: Hide or mask specific columns based on viewer role.
  • Row-level security (RLS): Restrict which rows a viewer can see (e.g., their territory only).
  • Masking: Partially show data (e.g., jo***@company.com, +1-***-***-1234).
  • Pseudonymization: Replace identifiers with stable tokens (Customer #A12F). Keep the mapping in a restricted location.
  • Aggregation and thresholds: Show only grouped metrics with minimum cohort size (e.g., k ≥ 10) to prevent re-identification.
  • Encryption and tokenization: Store and move sensitive data securely. (Implementation details vary by platform.)
  • Auditing and alerts: Log who views what; review anomalies.

Worked examples

Example 1: Email masking in a support dashboard

Goal: Agents see customer issues but not full emails unless they’re in a privileged role.

Approach:

  • Column-level rule: If viewer has role = "Privileged", show email; else show masked.
  • Masking pattern: first 2 characters + *** + domain, e.g., jo***@example.com.
// Pseudocode expression for a computed column in the model
email_shown = if(hasRole("Privileged"), email, left(email,2) + "***@" + split(email,"@")[1])

Result: Most viewers see masked emails; limited users see full emails.

Example 2: Pseudonymized customer IDs for product analytics

Goal: Product managers analyze behavior without seeing real identities.

Approach:

  • Create a surrogate key (e.g., hash with salt or a token service) for CustomerID.
  • Expose only the token in the analytics model. Store the mapping in a secured area accessible to data stewards only.
// Pseudocode
customer_token = tokenService.issue(CustomerID) // stable, non-reversible token for analytics

Result: Reports remain useful (cohort analysis, funnels) while hiding real identities.

Example 3: Aggregate-only people analytics with thresholds

Goal: HR dashboards show attrition rates by department but never reveal small groups.

Approach:

  • Aggregate at department/month.
  • Apply k-anonymity threshold: if group size < 10, suppress or show "Insufficient data".
  • Optional: Add small noise to reduce re-identification risk for borderline groups.
// Pseudocode
if(group_count < 10) then display = "Insufficient data" else display = metric_value

Result: Leaders see trends without exposing individuals.

Step-by-step: implement a PII-safe report

  1. Classify data: Mark columns as Direct Identifier, Quasi-identifier, or Non-PII.
  2. Minimize: Remove unused PII at the source or staging layer.
  3. Decide visibility: Which roles need raw, masked, or aggregate views?
  4. Add transformations: Masking, pseudonymization, or aggregation thresholds.
  5. Apply RLS and column policies: Enforce who sees which rows and columns.
  6. Validate with test users: Try multiple roles; confirm no leakage in tooltips, exports, or drill-through.
  7. Enable auditing: Log report views and sensitive column access.
  8. Review regularly: Re-check as metrics, roles, or data evolve.
Self-check: before publishing
  • Can any user infer a person’s identity from small groups or outliers?
  • Do exports/downloads preserve masking and RLS rules?
  • Are tooltips, drill-through pages, and usage logs safe?

Exercises

Try these tasks. Then check the solutions below or in the Exercises panel.

Exercise 1 (ex1): Email masking rule

Create a computed field email_shown that displays:

  • Full email for viewers in role "Compliance".
  • Masked email (first 2 chars + *** + domain) for all others.

Edge cases: emails shorter than 3 chars; malformed strings without "@".

Exercise 2 (ex2): RLS + column policy

Define policies so that:

  • Role "SalesManager": sees only rows for their Region and full phone numbers.
  • Role "Executive": sees all rows but only masked phone numbers.
  • All others: see aggregates only (no row-level views).
Checklist before you move on
  • All direct identifiers either masked, tokenized, or restricted.
  • Quasi-identifiers only shown at safe aggregation with thresholds.
  • RLS tested with at least two roles.
  • Exports honor the same rules.
  • Audit logging enabled.

Common mistakes and how to avoid them

  • Masking only in visuals: If someone exports underlying data, raw PII appears. Fix: Transform or secure at the model level, not just the chart.
  • Forgetting tooltips/drill-through: Hidden PII can resurface. Fix: Audit all interactions.
  • Small group leaks: Showing groups of 1–2 reveals identity. Fix: Minimum cohort sizes and suppression.
  • Reversible pseudonyms: Using predictable hashes without a salt. Fix: Use salted hashing or token services; restrict mapping table.
  • Static thinking: Roles evolve. Fix: Regular reviews and access recertification.
Quick self-audit

Pick one published report. Can a general user:

  • See raw emails/phones via export?
  • Drill into small groups revealing a unique person?
  • Access a mapping table or raw source by mistake?

If yes to any, pause and fix before wider release.

Practical projects

  • Project 1: Convert a customer service dashboard to PII-safe by adding masking and RLS; document the policy matrix by role.
  • Project 2: Build a pseudonymized analytics model with a restricted mapping location; demonstrate cohort analysis without identities.
  • Project 3: Create an aggregate-only HR report with k-threshold suppression and validate against re-identification attempts.

Mini challenge

You must demo a product dashboard to 200 viewers from mixed departments. Required: no raw identifiers; product team still needs user-level funnels post-demo.

Propose a plan
  • Public demo: aggregate-only pages with k-thresholds; all PII columns removed.
  • Internal product pages: pseudonymized user tokens with RLS limited to product roles.
  • Exports: disabled for demo; internal exports restricted and logged.

Tip: Keep a short "PII Policy" section in your report documentation for reviewers.

Who this is for

  • BI Developers and Analysts publishing to broad audiences.
  • Analytics Engineers preparing semantic models.
  • Team leads approving dashboards with sensitive data.

Prerequisites

  • Basic SQL or data modeling concepts.
  • Familiarity with your BI tool’s data model, row-level, and column-level controls.
  • Understanding of role-based access in your organization.

Learning path

  1. Identify and classify PII in your datasets.
  2. Add masking/pseudonymization at the model layer.
  3. Implement RLS and column permissions; test with role accounts.
  4. Set aggregation thresholds; review exports and drill paths.
  5. Enable auditing and schedule periodic access reviews.

Next steps

  • Harden one existing report using the checklist.
  • Share a short policy note with your stakeholders.
  • Take the quick test to confirm your understanding.

Practice Exercises

2 exercises to complete

Instructions

Build a computed field email_shown that:

  • For role "Compliance": returns the full email.
  • For all other roles: returns first 2 chars + *** + domain (e.g., jo***@example.com).

Handle edge cases:

  • If email has no "@", return "masked".
  • If email length < 3, show first char + *** (if possible) + domain or just "masked".

Demonstrate the output for inputs: "john.smith@acme.com", "li@x.io", "a@b.com", and "brokenstring" for both role types.

Expected Output
Two sets of results (Compliance vs Other) showing fully visible emails for Compliance and correctly masked values for others; malformed input yields 'masked'.

Handling PII In Reports — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

7 questions70% to pass

Have questions about Handling PII In Reports?

AI Assistant

Ask questions about this tool