Who this is for
Data Platform Engineers who need to design, operate, or improve data systems that handle personal or regulated data. Also useful for analytics engineers and platform SREs partnering with security or compliance teams.
Prerequisites
- Basic understanding of data lakes/warehouses and access control concepts (roles, permissions).
- Familiarity with PII concepts (names, emails, IDs).
- Basic SQL and understanding of data pipelines (batch/streaming).
Why this matters
- You will be asked to implement controls for privacy laws (e.g., GDPR-like requests), retention, and auditing.
- Stakeholders (security, legal, data owners) expect evidence that controls exist and work.
- Controls reduce the risk of breaches, fines, and loss of customer trust.
Real tasks you might own
- Classify datasets and tag PII columns for masking.
- Set role-based access for finance, marketing, and external partners.
- Implement data retention and deletion workflows for user accounts.
- Encrypt data at rest and enforce TLS for in-transit data.
- Enable, retain, and review audit logs for access and schema changes.
Concept explained simply
Compliance controls are guardrails that define who can access data, how it is protected, how long it is kept, and how you prove it. They turn legal and security requirements into concrete configurations, processes, and evidence.
Mental model: The 6 questions
- What data is sensitive? (classification)
- Where does it live and flow? (inventory and lineage)
- Who can access it and why? (RBAC/ABAC + approvals)
- How is it protected? (encryption, masking, network)
- How long is it kept? (retention/deletion)
- How is it proven? (audit logs, monitoring, reviews)
Core controls you will use
1) Data classification and inventory
- Tag data domains (Customer, Finance, HR).
- Tag sensitivity (Public, Internal, Confidential, Restricted).
- Tag PII/PHI columns (Email, Phone, SSN-like IDs).
- Outcome: You can filter assets by sensitivity and apply controls consistently.
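A classification registry only works if tags stay consistent. A minimal sketch of an automated check (dataset names, tag values, and the rule that PII columns imply Restricted are all illustrative assumptions):

```python
# Allowed sensitivity values; anything else in the registry is flagged.
ALLOWED_SENSITIVITY = {"Public", "Internal", "Confidential", "Restricted"}

# Illustrative registry entries: dataset name, sensitivity tag, PII columns.
registry = [
    {"dataset": "customer_profile", "sensitivity": "Restricted", "pii_columns": ["email", "phone"]},
    {"dataset": "orders", "sensitivity": "Confidential", "pii_columns": []},
]

def validate(entries):
    """Return a list of problems; an empty list means the registry is consistent."""
    problems = []
    for e in entries:
        if e["sensitivity"] not in ALLOWED_SENSITIVITY:
            problems.append(f"{e['dataset']}: unknown sensitivity {e['sensitivity']!r}")
        if e["pii_columns"] and e["sensitivity"] != "Restricted":
            problems.append(f"{e['dataset']}: PII columns require Restricted sensitivity")
    return problems

print(validate(registry))  # -> []
```

A check like this can run in CI whenever the registry changes, so tagging drift is caught before controls are applied inconsistently.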
2) Access control (RBAC/ABAC)
- Roles map to job functions (e.g., Marketing_Analyst_Read).
- Policies restrict Restricted data to minimal roles; approvals required.
- Service accounts separated from human users; least privilege by default.
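The least-privilege idea above can be sketched as a deny-by-default grant table (role and table names are hypothetical):

```python
# Roles map to exactly the tables/columns a job function needs, nothing more.
ROLE_GRANTS = {
    "Marketing_Analyst_Read": {"orders": {"order_id", "product_id", "date"}},
    "Support_PII_Read": {"customer_profile": {"email", "phone"}},
}

def can_read(role, table, column):
    """Least privilege by default: anything not explicitly granted is denied."""
    return column in ROLE_GRANTS.get(role, {}).get(table, set())

print(can_read("Marketing_Analyst_Read", "orders", "order_id"))        # True
print(can_read("Marketing_Analyst_Read", "customer_profile", "email")) # False
```

Real platforms express this in warehouse GRANT statements or policy engines, but the decision logic is the same: absence of a grant means no access.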
3) Encryption
- At rest: enable managed encryption for storage; rotate keys periodically.
- In transit: enforce TLS for all data movement.
- Key management: restrict key usage to necessary services.
4) Retention and deletion
- Define retention periods by dataset class (e.g., Raw events: 90 days, Aggregates: 3 years).
- Automate deletion or archival; verify with logs and reports.
- Support user deletion requests across systems.
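The retention decision itself is simple date arithmetic. A sketch, assuming the illustrative windows from above (90 days for raw events, 3 years for aggregates):

```python
from datetime import date, timedelta

# Retention windows per dataset class, in days (values are illustrative).
RETENTION_DAYS = {"raw_events": 90, "aggregates": 3 * 365}

def is_expired(dataset, partition_date, today):
    """A partition is expired once it is older than the dataset's window."""
    return (today - partition_date).days > RETENTION_DAYS[dataset]

today = date(2024, 6, 1)
print(is_expired("raw_events", today - timedelta(days=91), today))  # True
print(is_expired("raw_events", today - timedelta(days=30), today))  # False
```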
5) Masking and tokenization
- Default masked views for PII; unmask only for permitted roles.
- Tokenize high-risk identifiers; keep vault separate from analytics.
6) Audit logging and monitoring
- Record access, policy changes, schema changes, and data deletions.
- Send critical events to a central log; retain for a defined period.
- Regular reviews: monthly or quarterly.
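A structured audit event makes central logging and later review practical. A sketch of the event shape (field names and the in-memory sink are illustrative; real systems ship JSON lines to a log pipeline):

```python
import json
from datetime import datetime, timezone

audit_log = []  # stand-in for a central log sink

def record_event(actor, action, obj, request_id=None):
    """Append one structured audit event; JSON lines are easy to ship and query."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "object": obj,
        "request_id": request_id,
    }
    audit_log.append(json.dumps(event))
    return event

record_event("svc_etl", "SELECT", "warehouse.customer_profile.email")
record_event("alice", "GRANT", "role:PII_Unmasked", request_id="TICKET-123")
```

Tying grants to a `request_id` is what lets a reviewer later match a privilege change to its approval ticket.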
7) Data sharing and consent
- Share only necessary fields; remove or mask PII by default.
- Document legal basis or consent handling where applicable.
8) Third parties and data flows
- Maintain a registry of destinations and purposes.
- Apply the same controls to extracts (encryption, access, retention).
9) Data localization
- Keep data in allowed regions where required; restrict cross-region copy.
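Restricting cross-region copies reduces to a deny-by-default allow-list check. A sketch (dataset and region names are hypothetical):

```python
# Illustrative allow-list of permitted regions per dataset.
ALLOWED_REGIONS = {"customer_data": {"eu-west-1", "eu-central-1"}}

def copy_allowed(dataset, dest_region):
    """Deny by default: unknown datasets have no permitted destinations."""
    return dest_region in ALLOWED_REGIONS.get(dataset, set())

print(copy_allowed("customer_data", "eu-west-1"))  # True
print(copy_allowed("customer_data", "us-east-1"))  # False
```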
Worked examples
Example A: Masking PII in the analytics warehouse
- Classify columns: email, phone, dob as Restricted PII.
- Create a masked view that shows partial email for most roles.
- Grant unmasked access only to a named role with approval.
- Enable logging for all SELECTs on PII tables.
Acceptance criteria:
- Non-privileged users see masked values.
- Access attempts to the unmasked view are logged.
- Role grants require a ticket/approval reference.
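The masked-view pattern can be sketched end to end; SQLite stands in for the warehouse here, and the table and mask format are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_profile (id INTEGER, email TEXT)")
conn.execute("INSERT INTO customer_profile VALUES (1, 'alice@example.com')")

# Masked view: first character of the local part, then '***@' and the domain.
# Most roles are granted the view; only the privileged role sees the table.
conn.execute("""
    CREATE VIEW customer_profile_masked AS
    SELECT id,
           substr(email, 1, 1) || '***@'
               || substr(email, instr(email, '@') + 1) AS email
    FROM customer_profile
""")

row = conn.execute("SELECT email FROM customer_profile_masked").fetchone()
print(row[0])  # a***@example.com
```

In a real warehouse, grants on the base table go only to the privileged unmasking role; everyone else gets the view, so masking cannot be bypassed by BI-tool settings.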
Example B: 90-day retention for raw event data
- Set retention policy: Raw events retained 90 days, then deleted.
- Implement a scheduled job that deletes partitions older than 90 days.
- Generate a deletion report with counts per day.
- Store reports and job logs for 12 months.
Acceptance criteria:
- No partitions older than 90 days exist.
- Deletion logs and reports are accessible to compliance reviewers.
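The partition-deletion job above can be sketched as follows; partitions are modeled as a date-keyed dict with row counts, and both the window and the data are illustrative:

```python
from datetime import date, timedelta

# Date-partitioned raw events: partition date -> row count (illustrative).
partitions = {date(2024, 6, 1) - timedelta(days=d): d * 10 for d in (10, 95, 120)}

def run_retention(parts, today, days=90):
    """Delete partitions older than the window; return a deletion report
    (partition date -> rows deleted) to keep as evidence."""
    cutoff = today - timedelta(days=days)
    report = {}
    for pdate in sorted(parts):
        if pdate < cutoff:
            report[pdate.isoformat()] = parts.pop(pdate)
    return report

report = run_retention(partitions, date(2024, 6, 1))
print(report)  # the two partitions older than 90 days, with their row counts
```

The report, not the deletion itself, is what you show reviewers: it records what was removed and when, after the data is gone.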
Example C: Right-to-delete (user erasure) workflow
- Receive user_id from request system.
- Locate user_id across lake, warehouse, and derived tables via lineage or registry.
- Delete or anonymize records; reprocess affected aggregates.
- Produce evidence: job run ID, tables touched, before/after counts.
Acceptance criteria:
- All systems are updated within the defined SLA (e.g., 30 days).
- The evidence bundle is stored with the request ID.
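A sketch of the erasure run and its evidence bundle; table names, the in-memory data, and the evidence fields are all illustrative assumptions:

```python
import uuid

# Stand-in for tables located via lineage or a registry.
tables = {
    "lake.events": [{"user_id": 7, "event": "click"}, {"user_id": 8, "event": "view"}],
    "warehouse.customer_profile": [{"user_id": 7, "email": "x@example.com"}],
}

def erase_user(user_id, request_id):
    """Delete the user's rows everywhere and return an evidence bundle:
    job run ID, tables touched, before/after counts."""
    evidence = {"request_id": request_id, "job_run_id": str(uuid.uuid4()), "tables": {}}
    for name, rows in tables.items():
        before = len(rows)
        rows[:] = [r for r in rows if r["user_id"] != user_id]
        evidence["tables"][name] = {"before": before, "after": len(rows)}
    return evidence

bundle = erase_user(7, "REQ-1042")
```

The before/after counts per table are what make the bundle auditable: a reviewer can confirm the request touched every registered location.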
Example D: Controlled data share to a vendor
- Classify required fields; exclude PII when not strictly needed.
- Create a sanitized export view with only approved fields.
- Encrypt export; restrict access to a dedicated service account.
- Log all exports and review monthly.
Acceptance criteria:
- Only the approved schema is shared.
- All export operations are logged and reviewed.
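A minimal sketch of the sanitized export view, again using SQLite as a stand-in warehouse; the `orders` schema and approved column list are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER, product_id INTEGER, quantity INTEGER,
    price REAL, date TEXT, email TEXT)""")
conn.execute("INSERT INTO orders VALUES (1, 42, 2, 9.99, '2024-06-01', 'a@example.com')")

# Export view exposes only the approved, non-PII columns; the email column
# never reaches the vendor-facing object.
conn.execute("""
    CREATE VIEW orders_export AS
    SELECT order_id, product_id, quantity, price, date FROM orders
""")

cols = [d[0] for d in conn.execute("SELECT * FROM orders_export").description]
print(cols)  # ['order_id', 'product_id', 'quantity', 'price', 'date']
```

Granting the vendor's dedicated service account read access to the view only, never the base table, is what enforces the "only approved schema" criterion.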
Exercises
Exercise 1: Minimal controls for a customer data mart
Design a minimal, practical compliance control set for a new customer data mart that includes customer_profile, orders, and support_tickets tables.
- Classify datasets and sensitive columns.
- Define roles and access (who can read which tables/columns).
- Define masking rules and when unmasking is allowed.
- Define retention per table and audit logging scope.
- Write 3–5 acceptance criteria you could show to auditors.
Hints
- Start with simple tags: Restricted for PII columns; Confidential for non-PII business data.
- Default to masked views; create a single privileged role for unmasking with approvals.
Exercise 2: Retention and deletion workflow
Create a retention plan for raw_events and customer_profile tables and describe the deletion workflow.
- Retention windows for each table.
- How deletion runs (schedule, criteria) and how you verify success.
- What logs and reports you keep and for how long.
- How you handle re-processing of aggregates after deletions.
Hints
- Use partition-based deletes for time-series data.
- Keep deletion evidence separate from the data being deleted.
Self-check checklist
- I used classification tags consistently across datasets.
- Least-privilege roles are clearly defined.
- Masking rules cover all PII columns by default.
- Retention windows are clear and automated.
- Audit logging scope and review cadence are documented.
Common mistakes and self-check
- Inconsistent tagging: Fix by defining allowed values and adding automated checks.
- Too-broad roles: Split roles by function and data domain.
- Masking only in BI tools: Enforce masking in the warehouse/lake too.
- Deletion without evidence: Always produce reports with IDs, counts, and timestamps.
- No review of logs: Schedule monthly reviews and track findings.
Quick self-audit
- Pick one sensitive table. Can you prove who accessed it last month?
- Pick one PII column. Can you show the masked and unmasked paths?
- Pick one dataset. Can you show when its retention job last ran and what it deleted?
Practical projects
- Build a data classification registry: a simple table holding dataset, owner, sensitivity, PII columns, retention, and review date. Populate for 10 datasets.
- Implement masked views for three PII tables and a privileged unmasking role with request ID enforcement.
- Create a retention job for raw events with daily deletion and a weekly summary report.
Learning path
- Start: Compliance Controls Basics (this lesson).
- Next: Implementing Data Masking and Tokenization.
- Then: Access Governance and Just-in-Time Access.
- Advanced: Automated Lineage, Data Loss Prevention, and Continuous Compliance.
Next steps
- Write a one-page control summary for your current platform covering classification, access, masking, encryption, retention, and logging.
- Pick one dataset and apply all six control areas end-to-end this week.
- Take the Quick Test below to verify your understanding.
Mini challenge
You must share order analytics with a partner. Draft a minimal control plan that includes classification, minimized schema, masking, encryption, logging, and a 90-day access review. Keep it under 10 bullet points.
Reference outline
- Classify: Confidential dataset, no PII shared.
- Schema: Only order_id, product_id, quantity, price, date.
- Masking: N/A (no PII); verify no join keys reveal identities.
- Encryption: Encrypted at rest and in transit; dedicated service account.
- Access: Partner role limited to read-only on shared view.
- Logging: All SELECTs on the view logged and reviewed monthly.
- Retention: Share refreshed daily; logs retained 12 months.
- Review: Quarterly access review with ticketed approvals.