Why this matters
Compliance mapping is how a Data Architect translates laws and standards into concrete data platform controls. Teams rely on you to show exactly which encryption settings, access rules, retention jobs, and logs satisfy requirements from GDPR, HIPAA, PCI DSS, ISO 27001, or SOC 2.
- Real task: decide which data stores must encrypt specific fields and document proof.
- Real task: map regional data residency needs to cloud regions and data movement policies.
- Real task: connect privacy requests (access, deletion) to technical workflows in the data lake and warehouse.
- Real task: prepare evidence for audits without stopping delivery work.
Concept explained simply
Compliance mapping links each requirement (what the rule says) to a control (what we do) and to evidence (how we prove it).
Requirement → Control → Implementation → Evidence → Owner → Status
- Requirement: a statement from a law/standard (e.g., "Encrypt card data at rest").
- Control: a policy or technical safeguard (e.g., "All PAN columns use AES-256").
- Implementation: exact config/code (e.g., KMS key, table-level encryption setting).
- Evidence: logs, screenshots, configs, tickets that prove it.
- Owner/Status: who is accountable and whether it’s done.
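The chain above maps naturally onto a record type. A minimal sketch of one traceability-matrix row as a Python dataclass (field names follow the chain; the sample values are illustrative, not a real configuration):

```python
from dataclasses import dataclass

@dataclass
class TraceabilityEntry:
    """One row of a compliance traceability matrix."""
    requirement: str       # what the rule says
    control: str           # what we do
    implementation: str    # exact config/code reference
    evidence: list[str]    # proof artifacts (logs, tickets, configs)
    owner: str             # accountable person or team
    status: str = "planned"  # planned | in-progress | implemented | gap

# Illustrative row; key names and team are examples, not real settings.
entry = TraceabilityEntry(
    requirement="Encrypt card data at rest",
    control="All PAN columns use AES-256",
    implementation="KMS key pan-at-rest; column encryption on payments.pan",
    evidence=["table DDL snapshot", "KMS key policy export"],
    owner="Payments Data team",
)
print(entry.status)
```

Keeping every row in this shape is what makes later gap analysis and audit prep mechanical rather than ad hoc.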
Core terms you will see
- Control baseline: a standard set of controls you map to (ISO 27001 Annex A, SOC 2 TSC, NIST 800-53, CIS).
- Data classification: labeling data (Public/Internal/Confidential/Restricted/PHI/PAN) to decide control strength.
- Data residency: where data is stored/processed (region-level constraints).
- Retention: how long to keep data and when to delete or anonymize.
- Data Subject Request (DSR): access, correction, deletion requests under privacy laws.
- Traceability matrix: the list linking requirements to controls and evidence.
Step-by-step workflow
- Define scope: list systems, datasets, data types (PII/PHI/PAN), regions, processors, and business processes impacted.
- Identify applicable rules: by geography, industry, customers, and contracts (e.g., GDPR for EU users, PCI DSS where card data exists).
- Pick a control baseline: choose one canonical set (e.g., ISO 27001 or NIST 800-53). It keeps mapping consistent across multiple regulations.
- Normalize requirements: rewrite each requirement as a clear control statement. Example: "Encrypt sensitive data at rest using managed keys; key rotation ≥ every 12 months."
- Design controls: decide policies and technical measures (encryption standards, RBAC patterns, logging, retention jobs, data residency constraints).
- Map to implementations: attach exact configurations (KMS key IDs, IAM roles, table/column encryption flags, masking/UDMs, pipeline steps, SIEM log sources).
- Plan evidence: decide what proves control operation (logs, config snapshots, successful job runs, tickets, sign-offs). Automate capture where possible.
- Gap analysis: mark unmet requirements, risk-rate them, and define remediation owners and due dates.
- Monitor and update: set periodic reviews (e.g., quarterly) and tie the matrix to change management so new datasets, regions, or vendors trigger re-mapping.
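The gap-analysis step above can be sketched as a small pass over the matrix. A minimal example, assuming each row is a dict with "requirement", "status", "risk" (1 = low to 3 = high), and "owner"; the rows and risk scale are illustrative:

```python
def open_gaps(matrix):
    """Return unmet requirements, highest risk first, for remediation planning."""
    gaps = [row for row in matrix if row["status"] != "implemented"]
    return sorted(gaps, key=lambda row: -row["risk"])

# Illustrative matrix rows; names and risk ratings are examples.
matrix = [
    {"requirement": "Encrypt PAN at rest", "status": "implemented", "risk": 3, "owner": "Payments"},
    {"requirement": "90-day log retention", "status": "gap", "risk": 2, "owner": "SecEng"},
    {"requirement": "DSR deletion workflow", "status": "in-progress", "risk": 3, "owner": "Data Platform"},
]
for row in open_gaps(matrix):
    print(f"{row['requirement']} -> {row['owner']} (risk {row['risk']})")
```

Sorting by risk gives remediation planning a defensible order: the highest-risk unmet requirement gets an owner and due date first.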
Worked examples
Example 1: GDPR — Right to Erasure in a data lake/warehouse
- Requirement: Individuals can request deletion of their personal data.
- Control: Implement DSR deletion workflow across raw, curated, and analytics layers.
- Implementation: Create a subject ID key; build jobs to locate and delete/anonymize rows in object storage and warehouse tables; configure soft-delete window then purge.
- Evidence: Deletion job run logs, ticket linking request ID to job run ID, warehouse row count diffs, audit trail of approval.
- Owner/Status: Data Platform team; implemented and reviewed quarterly.
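The deletion step of this workflow can be sketched with an in-memory SQLite stand-in for the warehouse. This assumes tables are reachable via a DB-API connection and share a subject_id column; the schema, request ID, and table name are illustrative:

```python
import sqlite3
import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (subject_id TEXT, order_id TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("u1", "o1"), ("u1", "o2"), ("u2", "o3")])

def erase_subject(conn, subject_id, tables, request_id):
    """Delete a subject's rows and return an audit record tying request to effect."""
    audit = {"request_id": request_id, "subject_id": subject_id,
             "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
             "deleted": {}}
    for table in tables:
        cur = conn.execute(f"DELETE FROM {table} WHERE subject_id = ?", (subject_id,))
        audit["deleted"][table] = cur.rowcount  # row-count diff doubles as evidence
    conn.commit()
    return audit

audit = erase_subject(conn, "u1", ["orders"], request_id="DSR-1042")
print(audit["deleted"])  # {'orders': 2}
```

Returning the per-table row counts alongside the request ID is exactly the evidence link the matrix calls for: ticket, job run, and effect in one record.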
Example 2: PCI DSS — Encrypt PAN at rest
- Requirement: PAN must be unreadable anywhere it is stored.
- Control: Strong encryption with centralized key management; limit where PAN is stored.
- Implementation: Column-level encryption for PAN; tokenize in ingest; KMS-managed keys; strict IAM policies; prevent exports to non-compliant zones.
- Evidence: KMS key policy screenshot, table DDL showing encryption, tokenization transform code, SIEM alerts for access attempts.
- Owner/Status: Payments Data team; gap: historical backfill migration in progress.
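The tokenize-at-ingest step can be sketched with a keyed hash, so raw PAN never reaches the analytics zone. This is a minimal illustration, not a PCI-validated tokenization scheme; the key is a placeholder where a KMS-managed secret would be fetched at runtime:

```python
import hmac
import hashlib

TOKEN_KEY = b"kms-managed-secret"  # assumption: retrieved from KMS in production

def tokenize_pan(pan: str) -> dict:
    """Replace a PAN with a deterministic keyed token; keep last4 for support lookups."""
    token = hmac.new(TOKEN_KEY, pan.encode(), hashlib.sha256).hexdigest()
    return {"pan_token": token, "pan_last4": pan[-4:]}

record = tokenize_pan("4111111111111111")  # a well-known test card number
print(record["pan_last4"])
```

Determinism matters here: the same PAN always yields the same token, so joins across datasets still work without exposing the card number.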
Example 3: HIPAA — Audit controls for PHI
- Requirement: Record and examine activity in systems that contain PHI.
- Control: Centralized logging for access and query events; restricted admin actions.
- Implementation: Enable warehouse audit logs, object access logs, and DB query history to SIEM; dashboards and alert rules for anomalous access.
- Evidence: SIEM dashboard screenshot, weekly alert report, sampling of access logs tied to user IDs.
- Owner/Status: Security Engineering; operating with monthly review.
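An alert rule like the one above can be sketched as a threshold over access events. This assumes each log event is a dict with "user" and "table"; the table-name prefix and threshold are illustrative, not HIPAA-mandated values:

```python
from collections import Counter

def flag_anomalous_access(events, threshold=100):
    """Flag users whose PHI-table access count exceeds a per-window threshold."""
    counts = Counter(e["user"] for e in events if e["table"].startswith("phi_"))
    return [user for user, n in counts.items() if n > threshold]

# Illustrative events: a service account reads heavily, an analyst lightly.
events = ([{"user": "svc_etl", "table": "phi_claims"}] * 150
          + [{"user": "analyst1", "table": "phi_claims"}] * 5)
print(flag_anomalous_access(events))  # ['svc_etl']
```

In practice the same shape runs as a SIEM query over centralized logs; the point is that "examine activity" becomes a concrete, repeatable rule with output you can file as evidence.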
Design patterns and checklists
Useful patterns
- Golden source approach: enforce controls at the earliest durable store and propagate downstream via policy-as-code.
- Tag-driven governance: classify datasets with tags (e.g., sensitivity, region) to auto-apply encryption, masking, and retention.
- Privacy-by-design pipelines: standard components for PII discovery, masking, and deletion used across all ETL jobs.
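Tag-driven governance can be sketched as a lookup from classification tags to controls. The policy values and dataset below are illustrative defaults, not a published baseline:

```python
# Map sensitivity tags to the controls they imply (illustrative values).
POLICY = {
    "Restricted": {"encryption": "AES-256", "masking": True,  "retention_days": 365},
    "Internal":   {"encryption": "AES-256", "masking": False, "retention_days": 730},
}

def controls_for(dataset: dict) -> dict:
    """Resolve the controls a dataset must receive from its tags."""
    base = dict(POLICY[dataset["sensitivity"]])
    base["region_lock"] = dataset["region"]  # residency follows the region tag
    return base

c = controls_for({"name": "payments.pan", "sensitivity": "Restricted", "region": "eu-west-1"})
print(c)
```

Because controls derive from tags rather than per-dataset decisions, onboarding a new dataset reduces to classifying it correctly, and policy-as-code tooling can enforce the rest.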
Quick checklist
- Data inventory lists systems, data classes, regions.
- Applicable regulations and contracts identified.
- Single control baseline chosen (and documented).
- Each requirement mapped to a control statement.
- Implementation details include exact configs and code locations.
- Evidence items defined and automatically captured when possible.
- Gaps tracked with owners and due dates.
- Review cadence established; changes trigger re-mapping.
Common mistakes and self-check
- Mistake: Mapping to vague policies only.
  Self-check: Can you point to a specific config, job, or code path?
- Mistake: Ignoring data copies (exports, caches).
  Self-check: Do you track all replicas and downstream sinks?
- Mistake: One-off evidence collection during audits only.
  Self-check: Is evidence auto-collected and time-stamped regularly?
- Mistake: Over-scoping encryption and breaking analytics.
  Self-check: Does masking or tokenization satisfy the requirement while keeping data queryable?
- Mistake: No owner for each control.
  Self-check: Is an accountable person named, with a review frequency?
Exercises
Do these to build muscle memory. Compare your answers with the solutions.
Exercise 1: Build a mini mapping
Scenario: You store EU customer emails and order IDs in a cloud data warehouse; product analytics tool receives event streams. Map 4 requirements to controls and evidence.
- Identify applicable rules and baseline.
- Propose control statements for: encryption at rest, access control, data residency, and deletion requests.
- Tie each to specific implementations and evidence.
Exercise 2: Gap analysis
Given: Access logs are enabled but retained only 7 days; retention policy requires 90 days. Identify the gap, risk, and remediation steps with owners and due dates.
Need a hint?
- Think about required vs. actual retention.
- Consider storage costs and SIEM pipelines.
- Add validation: test that logs older than 90 days exist.
Check your answers
- Each requirement ties to a concrete system config or code artifact.
- Evidence items are verifiable without manual digging.
- Owners and review cadence are included.
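The validation hint for Exercise 2 can be sketched as a single check: does the oldest retained log reach back at least as far as the policy requires? Dates and the 90-day figure come from the exercise; the function name is an illustration:

```python
import datetime

def retention_met(oldest_log_ts, required_days=90, now=None):
    """True if retained logs reach back at least required_days from now."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return (now - oldest_log_ts).days >= required_days

# With 7-day retention (the given state), the check fails: that is the gap.
now = datetime.datetime(2024, 6, 1, tzinfo=datetime.timezone.utc)
oldest = datetime.datetime(2024, 5, 25, tzinfo=datetime.timezone.utc)
print(retention_met(oldest, now=now))  # False -> gap confirmed
```

Run on a schedule, this check becomes both the remediation acceptance test and ongoing evidence that the control keeps operating.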
Mini challenge
You onboard a new dataset with partial card data (last 4 digits) and customer emails in EU and US regions. Draft three controls covering storage, access, and deletion, including implementation details and evidence.
Suggested approach
- Storage: Tokenize identifiers; ensure region-locked buckets/warehouses; encrypt with region-specific KMS keys.
- Access: RBAC roles per region; conditional masking for emails in analytics views.
- Deletion: DSR workflow keyed by customer ID; job purges EU/US partitions; logs and ticket links as evidence.
Practical projects
- Project 1: Build a traceability matrix for one product domain (10–15 requirements) with owners, implementations, and evidence items.
- Project 2: Create a privacy-by-design ETL template that includes PII tagging, masking, and deletion hooks.
- Project 3: Automate evidence snapshots (configs, key policies, log retention settings) on a schedule and store them in a read-only bucket.
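Project 3's snapshot step can be sketched as: fetch configs, timestamp them, and hash the payload so evidence is tamper-evident. Here fetch_config is a stand-in for real API calls (e.g., reading a key policy or log-retention setting); the config contents are illustrative:

```python
import json
import hashlib
import datetime

def fetch_config(name):
    """Stand-in for a real config-fetching API call."""
    return {"name": name, "rotation_days": 365}

def snapshot(names):
    """Capture configs with a timestamp and content hash for audit evidence."""
    body = {name: fetch_config(name) for name in names}
    payload = json.dumps(body, sort_keys=True)
    return {
        "captured_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload.encode()).hexdigest(),
        "configs": body,
    }

snap = snapshot(["pan-at-rest-key"])
print(snap["sha256"][:8])
```

Writing each snapshot to a read-only bucket, as the project suggests, means the hash plus timestamp can later prove what the configuration was at audit time.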
Who this is for
- Data Architects defining platform controls.
- Data Engineers implementing pipelines with privacy/security requirements.
- Security/Compliance partners who need technical traceability.
Prerequisites
- Basic knowledge of cloud storage, data warehouses, and IAM/RBAC.
- Familiarity with encryption, logging, and data lifecycle concepts.
- Ability to read policy/control statements and translate them to configs.
Learning path
- Data classification and inventory.
- Choose a control baseline and normalize requirements.
- Implement core controls: encryption, RBAC, logging, retention.
- Automate evidence and build the traceability matrix.
- Run gap analysis; set remediation and reviews.
Next steps
- Apply the workflow to one live system and capture proof.
- Add automatic checks for drift (e.g., config snapshots compared weekly).
- Expand the matrix across additional datasets and vendors.
Quick Test