Who this is for
Backend and platform engineers who read/write user data, design APIs, manage logs, or touch databases. If you build or operate services that could store, process, or transmit user information, this is for you.
Prerequisites
- Basic understanding of HTTP APIs and REST/JSON.
- Comfort with server logs and environment variables.
- Basic SQL and familiarity with application config/secrets.
Why this matters
Privacy is not abstract—it changes daily engineering work. Real tasks you will face:
- Designing a signup endpoint that avoids logging passwords or full emails.
- Splitting PII into a dedicated table, encrypting it, and limiting access.
- Setting retention so old support logs with user info are deleted automatically.
- Answering a data deletion request: find all places the user’s data lives and remove it.
Note: This content is for engineering practice, not legal advice. Follow your company’s privacy/legal guidance and applicable laws (e.g., GDPR/CCPA).
Concept explained simply
Personally Identifiable Information (PII) is any data that can identify a person. Examples: name, email, phone, exact address, government ID, IP address in some contexts, device identifiers. Sensitive data (like health, biometrics, precise location, financial data) needs even stronger protection.
Handling PII well means you only collect what you need, protect it in transit and at rest, minimize exposure (especially in logs), and delete it when it’s no longer required.
Mental model
- Data funnel: Only let necessary data enter your system. Everything else gets dropped or masked at the edge.
- PII map: Know where PII flows: ingest → process → store → share → log → backup.
- Blast radius: Assume something will leak. Design so that leaks expose as little as possible (masking, tokenization, separation, least privilege).
Key principles to apply
1) Data minimization
Collect only the fields you truly need for the stated purpose. Make optional fields genuinely optional. If a unique identifier suffices, do not store a full name or address.
2) Purpose limitation
Use PII only for the purpose it was collected. No silent repurposing (e.g., exporting user emails for unrelated marketing).
3) Security by default
- HTTPS/TLS everywhere. No plaintext secrets in transit.
- Encrypt at rest where feasible (DB column-level or full-disk, plus strong key management).
- Limit who can read PII (least privilege, role-based access).
4) Logging without secrets
- Never log passwords, tokens, full card numbers, full addresses, or full emails.
- Mask: john.doe@example.com → j***@example.com; +1-202-555-0147 → *********47.
- Redact request bodies and headers known to contain PII.
5) Retention and deletion
- Set expiration for logs and backups containing PII.
- Implement deletion workflows (user-initiated or automatic after retention).
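As a rough illustration of automatic retention, the sketch below deletes rows older than a fixed window. The `support_logs` table, its columns, and the 90-day window are assumptions made for this example; take the real window from your retention policy.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # assumption: illustrative value, not a recommendation


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete support log rows older than the retention window; return the count."""
    # Timestamps are stored as ISO-8601 UTC strings in one consistent format,
    # so plain string comparison orders them correctly.
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    cur = conn.execute("DELETE FROM support_logs WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo with an in-memory database and a hypothetical support_logs table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE support_logs (id INTEGER PRIMARY KEY, body TEXT, created_at TEXT)"
)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    ("old ticket", (now - timedelta(days=120)).isoformat()),
    ("recent ticket", (now - timedelta(days=5)).isoformat()),
]
conn.executemany("INSERT INTO support_logs (body, created_at) VALUES (?, ?)", rows)

deleted = purge_expired(conn, now)
remaining = [row[0] for row in conn.execute("SELECT body FROM support_logs")]
```

Passing `now` in as a parameter keeps the job easy to test; a scheduler would call it with the current UTC time.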
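6) Pseudonymization, tokenization, hashing, encryption
- Pseudonymization (tokenization): replace direct identifiers with tokens, and keep the token-to-identifier mapping in a separate, restricted store.
- Hashing: one-way, so you can compare values without storing them (good for emails used as a lookup key). For lookups, use a keyed hash such as HMAC with a secret key rather than a per-record random salt, so equal inputs produce equal digests; unkeyed hashes of emails or phone numbers are easy to brute-force.
- Encryption: reversible; use for data that must be shown back to the user.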
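A minimal sketch of two of these techniques, using only the Python standard library. It uses a keyed hash (HMAC-SHA256) in place of a salted hash, since that is one way to keep hashed emails usable as a lookup key. `LOOKUP_KEY`, `email_lookup_hash`, and `TokenVault` are hypothetical names for this example; a real key would come from a secrets manager or KMS, and a real vault would be a separate, access-controlled store rather than an in-memory dict.

```python
import hashlib
import hmac
import secrets

# Assumption: demo key only; load the real key from a secrets manager or KMS.
LOOKUP_KEY = b"demo-only-key-do-not-use"


def email_lookup_hash(email: str, key: bytes = LOOKUP_KEY) -> str:
    """Keyed one-way hash: equal emails give equal digests, so it works for lookups."""
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Toy pseudonymization store: maps random tokens back to original values."""

    def __init__(self):
        self._by_token = {}

    def tokenize(self, value: str) -> str:
        # The token is random, so it reveals nothing about the value it stands for.
        token = "tok_" + secrets.token_hex(16)
        self._by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


vault = TokenVault()
token = vault.tokenize("alexa.ren@example.com")
digest = email_lookup_hash("Alexa.Ren@example.com")
```

Other tables and logs would store `token` or `digest`; only the vault (or the holder of the key) can link them back to a person.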
7) Data subject rights (engineering view)
Users may request access, correction, or deletion. Engineering must make data findable and deletable across primary DBs, caches, logs (where possible), and backups (per policy).
Worked examples
Example 1: Safe logging
Problem: An auth endpoint logs request bodies including emails and phone numbers.
Before (unsafe)
POST /login body: {"email":"alexa.ren@example.com","password":"hunter2","phone":"+44-7700-900123"}
After (safer)
POST /login body: {"email":"a***@example.com","password":"[REDACTED]","phone":"**********23"}
Headers: {"Authorization":"[REDACTED]"}
Implementation tip: provide a redaction layer that pattern-matches emails, phones, tokens, and fields like password, ssn, authorization.
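One possible shape for such a redaction layer is sketched below. The key list, the email pattern, and the function names are assumptions for illustration, not a complete catalogue of sensitive fields. It returns a redacted copy so the original payload can still be processed normally.

```python
import re

# Assumption: extend this set with the sensitive field names used in your APIs.
SENSITIVE_KEYS = {"password", "ssn", "authorization", "otp", "token"}
EMAIL_RE = re.compile(r"^([^@\s])[^@\s]*(@[^@\s]+)$")


def mask_email(value: str) -> str:
    """Keep the first character and the domain: john.doe@x.com -> j***@x.com."""
    match = EMAIL_RE.match(value)
    return f"{match.group(1)}***{match.group(2)}" if match else "[REDACTED]"


def mask_phone(value: str) -> str:
    """Keep only the last two digits; every other digit becomes '*'."""
    digits = [ch for ch in value if ch.isdigit()]
    if len(digits) <= 2:
        return "[REDACTED]"
    return "*" * (len(digits) - 2) + "".join(digits[-2:])


def redact(payload):
    """Return a redacted copy of a JSON-like payload; the input is not modified."""
    if isinstance(payload, dict):
        out = {}
        for key, value in payload.items():
            name = key.lower()
            if name in SENSITIVE_KEYS:
                out[key] = "[REDACTED]"
            elif name == "email" and isinstance(value, str):
                out[key] = mask_email(value)
            elif name == "phone" and isinstance(value, str):
                out[key] = mask_phone(value)
            else:
                out[key] = redact(value)
        return out
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    # Catch email-shaped strings stored under unexpected field names.
    if isinstance(payload, str) and EMAIL_RE.match(payload):
        return mask_email(payload)
    return payload


body = {"email": "john.doe@example.com", "password": "hunter2", "phone": "+1-202-555-0147"}
safe_body = redact(body)
safe_headers = redact({"Authorization": "Bearer abc"})
```

Log `safe_body` and `safe_headers`, never the originals. Matching on field names is a deny-list and will miss fields it does not know about, so an allow-list of loggable fields is the stricter option.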
Example 2: Split PII into a separate table
Store operational data and PII separately to reduce exposure.
-- Public/operational table
CREATE TABLE accounts (
  id UUID PRIMARY KEY,
  username TEXT UNIQUE NOT NULL,
  created_at TIMESTAMPTZ NOT NULL
);
-- PII table with tighter access control
CREATE TABLE account_pii (
  account_id UUID REFERENCES accounts(id) ON DELETE CASCADE,
  email_enc BYTEA NOT NULL,
  phone_enc BYTEA,
  address_enc BYTEA,
  PRIMARY KEY(account_id)
);
-- App roles: app_read cannot SELECT account_pii; app_pii_read can, but only for own tenant.
Store encryption keys via a managed KMS; rotate keys periodically.
Example 3: API design with minimization
Signup form asks for email and password. Phone is optional for 2FA only. Your API should:
- Validate and reject unexpected fields.
- Never return full email in responses—return masked value.
- Default to not collecting phone unless 2FA is enabled.
// Response payload
{"id":"b5d...","email_mask":"a***@example.com","2fa_enabled":false}
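The three rules above could be sketched as follows. `validate_signup` and `signup_response` are hypothetical helpers, and a real service would also validate formats and hash the password before storing it.

```python
ALLOWED_FIELDS = {"email", "password", "phone"}
REQUIRED_FIELDS = {"email", "password"}


def validate_signup(body: dict, two_factor_enabled: bool = False) -> dict:
    """Return only the fields the service needs; raise ValueError otherwise."""
    unexpected = set(body) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    missing = REQUIRED_FIELDS - set(body)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    accepted = {"email": body["email"], "password": body["password"]}
    # Phone is kept only when 2FA is on; otherwise it is dropped at the edge.
    if two_factor_enabled and "phone" in body:
        accepted["phone"] = body["phone"]
    return accepted


def signup_response(account_id: str, email: str, two_factor_enabled: bool) -> dict:
    """Build the response payload with a masked email instead of the full address."""
    local, _, domain = email.partition("@")
    return {
        "id": account_id,
        "email_mask": f"{local[:1]}***@{domain}",
        "2fa_enabled": two_factor_enabled,
    }


accepted = validate_signup(
    {"email": "alexa.ren@example.com", "password": "hunter2", "phone": "+44-7700-900123"}
)
response = signup_response("b5d", "alexa.ren@example.com", False)
```

Rejecting unknown fields, rather than silently ignoring them, makes it visible when a client starts sending data the service never asked for.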
Example 4: Deletion workflow
- Mark account for deletion (grace period for recovery, e.g., 14 days).
- Queue a job to purge from: main DB, PII table, caches, analytics identifiers, and scheduled log partitions.
- Document what remains in immutable backups per policy and when those expire.
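The first two steps can be sketched with SQLite standing in for the real database and a plain dict standing in for a cache. The `deletion_requested_at` column, the `account:<id>` cache key format, and the function names are assumptions for this example; analytics identifiers, log partitions, and backups would need their own purge steps.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=14)  # the example grace period from the steps above


def request_deletion(conn, account_id: str, now: datetime) -> None:
    """Step 1: mark the account; nothing is removed until the grace period ends."""
    conn.execute(
        "UPDATE accounts SET deletion_requested_at = ? WHERE id = ?",
        (now.isoformat(), account_id),
    )
    conn.commit()


def purge_due(conn, cache: dict, now: datetime) -> list:
    """Step 2: hard-delete accounts whose grace period has passed; return their ids."""
    cutoff = (now - GRACE_PERIOD).isoformat()
    due = [row[0] for row in conn.execute(
        "SELECT id FROM accounts "
        "WHERE deletion_requested_at IS NOT NULL AND deletion_requested_at <= ?",
        (cutoff,),
    )]
    for account_id in due:
        # ON DELETE CASCADE removes the matching account_pii row as well.
        conn.execute("DELETE FROM accounts WHERE id = ?", (account_id,))
        cache.pop(f"account:{account_id}", None)
    conn.commit()
    return due


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces cascades only when enabled
conn.executescript("""
CREATE TABLE accounts (id TEXT PRIMARY KEY, username TEXT, deletion_requested_at TEXT);
CREATE TABLE account_pii (
    account_id TEXT PRIMARY KEY REFERENCES accounts(id) ON DELETE CASCADE,
    email_enc BLOB NOT NULL
);
INSERT INTO accounts (id, username) VALUES ('a1', 'alexa'), ('b2', 'rachel');
INSERT INTO account_pii VALUES ('a1', x'00'), ('b2', x'01');
""")
cache = {"account:a1": {"username": "alexa"}, "account:b2": {"username": "rachel"}}

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
request_deletion(conn, "a1", t0)
purged_early = purge_due(conn, cache, t0 + timedelta(days=13))  # still in grace period
purged_late = purge_due(conn, cache, t0 + timedelta(days=15))   # grace period over
pii_left = [row[0] for row in conn.execute("SELECT account_id FROM account_pii")]
```

Cancelling a deletion during the grace period would simply set `deletion_requested_at` back to NULL.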
Exercises (hands-on)
Exercise 1: Redact PII from logs
Goal: Given raw logs, produce a redacted version that hides emails, passwords, tokens, and phone numbers.
[INFO] login: body {"email":"rachel.green@example.com","password":"supersecret","otp":"123456"}
[DEBUG] header Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
[WARN] user update: {"phone":"+1-202-555-0198","address":"742 Evergreen Terrace"}
Hint 1
Mask emails to first character + *** + domain. Redact secrets fully.
Hint 2
Phone masking: keep last two digits only. Replace others with *.
Show solution
[INFO] login: body {"email":"r***@example.com","password":"[REDACTED]","otp":"[REDACTED]"}
[DEBUG] header Authorization: [REDACTED]
[WARN] user update: {"phone":"*********98","address":"742 Evergreen Terrace"}
Exercise 2: Minimal PII schema
Goal: Propose a table design and access policy that separates operational data from PII for a simple ecommerce app (users place orders, support agents view shipping info).
- Developers and services can read orders without PII.
- Only support role can read shipping address and phone.
- Encrypt PII columns and plan deletion.
Hint
Use a PII table keyed by user_id. Link orders to user_id only. Restrict SELECT on PII to support role. Plan ON DELETE CASCADE when user is deleted.
Show solution
CREATE TABLE users (id UUID PRIMARY KEY, created_at TIMESTAMPTZ NOT NULL);
CREATE TABLE user_pii (
  user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  email_enc BYTEA NOT NULL,
  phone_enc BYTEA,
  shipping_address_enc BYTEA
);
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  -- SET NULL keeps the order row while still allowing the user row to be deleted
  user_id UUID REFERENCES users(id) ON DELETE SET NULL,
  total_cents INT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL
);
-- Roles:
-- role app_read: SELECT on orders only.
-- role support_read: SELECT on user_pii and limited JOIN to show masked email in UI.
-- Encryption keys: managed KMS; rotate; limit decrypt permission to support service.
-- Deletion: deleting users cascades to user_pii; schedule order anonymization after retention.
PII-safe release checklist
- Inputs validated and unexpected fields rejected
- Logs redact emails, phones, tokens, passwords
- PII stored separately with encryption at rest
- Access controlled with least privilege
- Retention and deletion tasks scheduled and tested
- Backups/analytics considered in deletion plan
- Secrets and keys not embedded in code or logs
Common mistakes and self-check
- Logging full identifiers. Self-check: Search logs for patterns like @, 16+ digits, or "+" followed by digits.
- Collecting extra fields. Self-check: For each field, write its purpose. If you can’t, remove it or make it optional.
- Storing PII in the main table. Self-check: Ensure direct identifiers live in a separate, restricted table.
- No deletion path. Self-check: Can you delete or anonymize a user in one workflow, including caches and analytics IDs?
- Weak key handling. Self-check: Are keys in env vars or KMS with rotation, not hard-coded?
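The log self-check from the first item can be automated roughly as below. The three patterns are heuristics chosen for this sketch (they will produce both false positives and misses), and `find_leaks` is a hypothetical helper name.

```python
import re

# Heuristic patterns for the self-check: "@", 16+ digits, "+" followed by digits.
LEAK_PATTERNS = {
    # Two or more plain characters before "@", so masked values like r***@... pass.
    "email": re.compile(r"[A-Za-z0-9._%+-]{2,}@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "long_digits": re.compile(r"\b\d{16,}\b"),
    "phone": re.compile(r"\+\d[\d\-\s]{6,}\d"),
}


def find_leaks(lines):
    """Return (line_number, pattern_name) pairs for lines that look like raw PII."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in LEAK_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


sample = [
    '[INFO] login: body {"email":"rachel.green@example.com","password":"[REDACTED]"}',
    '[INFO] login: body {"email":"r***@example.com"}',
    '[WARN] user update: {"phone":"+1-202-555-0198"}',
    '[WARN] user update: {"phone":"*********98"}',
    "[INFO] payment card 4111111111111111 accepted",
]
hits = find_leaks(sample)
```

Running a check like this against a sample of staging logs in CI is one way to catch a redaction regression before release.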
Practical projects
Project 1: Log redaction middleware
- Identify sensitive fields and patterns (email, phone, token, password).
- Write a middleware that clones the payload and redacts before logging.
- Add tests: inputs → expected masked outputs.
Project 2: PII vault service
- Create a microservice that stores/retrieves PII by tokenized ID.
- Expose only necessary endpoints; require service-to-service auth.
- Measure: logs contain only tokens, not raw PII.
Project 3: Retention job
- Tag rows with created_at/last_seen.
- Scheduled job soft-deletes then hard-deletes/archives after policy window.
- Emit metrics and a deletion report for audits.
Learning path
- Before: Secure coding basics → AuthN/AuthZ → Secrets management.
- Now: Privacy and PII handling basics (this lesson).
- Next: Data retention and lifecycle, Key management/KMS, Audit logging, Incident response.
Next steps
- Implement a redaction layer in your staging environment.
- Refactor your schema to separate PII where possible.
- Set log and backup retention aligned with policy.
- Run the Quick Test below to check your understanding.
Mini challenge
You receive a request to export a user’s data. Your system has: accounts, account_pii, orders (no PII), and masked logs. What do you include?
Think it through
- All data in accounts for the user.
- Decrypted fields from account_pii where policy allows returning to the user.
- Order metadata tied to the user (without introducing new PII).
- Do not include logs; they are operational and masked.
Document the export query and ensure access is audited.