Why this skill matters for Data Platform Engineers
Data Access and Security ensures the right people get the right data at the right time—safely. As a Data Platform Engineer, you design controls that protect sensitive data, enable governed self-serve analytics, and meet compliance requirements without blocking delivery. Mastering identity and permissions, network boundaries, encryption, secrets, and auditability unlocks trusted, scalable platforms.
- Enable least-privilege access for engineers, analysts, and services
- Protect PII via row/column controls, masking, and tokenization
- Keep data private with segmented networks and private connectivity
- Secure data in transit/at rest; manage keys and secret rotation
- Prove compliance with centralized logging and access reviews
Who this is for
- Data Platform Engineers and Data Engineers building secure data lakes/warehouses
- Analytics engineers needing governed access patterns
- Ops/SREs supporting data services in production
Prerequisites
- Basic SQL (DDL/DML), database roles/permissions
- Familiarity with cloud or on-prem networking concepts (VPC/VNet, CIDR, security groups/firewalls)
- Comfort with a scripting language (e.g., Python or Bash) for automation
Learning path
- Foundations
- Understand identities: human vs. service principals
- Learn least-privilege, separation of duties, and defense-in-depth
- Access control
- Implement IAM roles, policies, and role-based access control (RBAC)
- Practice row- and column-level security
- Network security
- Segment environments (dev/test/prod)
- Use private connectivity (private endpoints/peering) and allowlists
- Encryption + secrets
- Enable at-rest and in-transit encryption
- Manage keys in a KMS; rotate credentials and secrets
- Monitoring + compliance
- Centralize audit logs and set alerts
- Run access reviews and document controls
Milestone checklist
- Create a least-privilege IAM role for read-only analytics
- Apply RLS by tenant and mask PII columns
- Restrict data warehouse access to a private network
- Enable encryption at rest + enforce TLS in transit
- Rotate a service credential and update dependent jobs
- Enable audit logging and complete an access review
Worked examples
1) Least-privilege IAM role for read-only analytics
Goal: Grant analysts read-only access to a specific dataset and nothing else.
{
"Version": "2022-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": [
"arn:example:s3:::company-analytics",
"arn:example:s3:::company-Attach to role AnalystReadOnly. Do not include write or delete permissions.
Validation tips
- Attempt a write; it must fail
- Ensure read access is limited to the curated path
2) Row-level security (RLS) by tenant in PostgreSQL
Goal: Analysts see only rows for their tenant.
-- Table has tenant_id column
ALTER TABLE sales ENABLE ROW LEVEL SECURITY;
-- Map DB user to tenant context (example via setting statement)
SET app.tenant_id = 'tenant_a';
-- Policy: only rows for the current tenant
CREATE POLICY tenant_isolation ON sales
USING (tenant_id = current_setting('app.tenant_id')::text);
Test with different app.tenant_id values. Non-matching rows should be invisible.
3) Column masking via a view (PII)
Goal: Show masked email to most users; show full email only to a privileged role.
-- Role that can see PII
CREATE ROLE pii_reader;
-- Masking function
CREATE OR REPLACE FUNCTION mask_email(email text) RETURNS text AS $$
SELECT regexp_replace($1, '(^.).*(@.*$)', '\\1***\\2');
$$ LANGUAGE SQL IMMUTABLE;
-- Base table
CREATE TABLE customers(id bigint, email text, name text);
-- Masked view
CREATE OR REPLACE VIEW customers_safe AS
SELECT
id,
CASE
WHEN pg_has_role(current_user, 'pii_reader', 'USAGE') THEN email
ELSE mask_email(email)
END AS email,
name
FROM customers;
Grant most users access to customers_safe; only pii_reader sees full emails.
4) Network segmentation allowlist
Goal: Restrict data warehouse access to only a private subnet and a bastion.
# Pseudo security group rules
INGRESS: allow tcp/443 from 10.10.0.0/24 (private app subnet)
INGRESS: allow tcp/443 from 10.20.0.10 (bastion host)
EGRESS: restrict to known destinations (object storage, KMS) over TLS
Result: No public access; only approved sources reach the warehouse.
5) Envelope encryption with KMS (concept)
Goal: Encrypt files with a Data Encryption Key (DEK) wrapped by a Key Encryption Key (KEK) from KMS.
- Generate a random DEK locally
- Ask KMS to wrap the DEK using KEK; store wrapped DEK next to the file
- Encrypt the file with the DEK (AES-256-GCM)
- To decrypt: unwrap DEK via KMS, then decrypt the file
# Pseudocode
DEK = random_bytes(32)
wrapped = KMS.wrap(KEK_ID, DEK)
write(file.enc, aes_gcm_encrypt(DEK, file))
write(file.key, wrapped)
Operational note
Rotate KEK in KMS. You usually re-wrap DEKs rather than re-encrypt all data.
Drills and quick exercises
- Create a read-only role for a single schema and prove writes are denied
- Write an RLS policy restricting access by
department_id - Implement a masking view for phone numbers
- Configure TLS-only connections and verify with a failed non-TLS attempt
- Rotate a service account password and update one scheduled job
- Enable audit logs and find: who queried table X in the last 24 hours?
Common mistakes and debugging tips
Over-broad permissions
Start with read-only; add narrowly scoped write as needed. Use policy simulators where available to test.
Forgetting default deny
Many systems allow implicit access. Prefer a default-deny posture and explicitly grant.
RLS policy not applied
Ensure RLS is enabled on the table and that the session context (e.g., tenant ID) is set before queries.
Masking bypass via direct table access
Grant users the view but not the base table. Audit for unexpected table access.
Secrets in code repos
Use a secrets manager and environment variables. Scan repos; rotate any exposed secret immediately.
Unmonitored logs
Centralize and alert on privileged events (GRANTs, key use, failed logins). No alerting = blind spots.
Mini project: Secure analytics sandbox
Build a governed analytics sandbox for a marketing team.
- Scope: Read-only access to curated datasets; masked PII; tenant-based isolation if multi-brand
- Network: Private connectivity from the analytics subnet only
- Encryption: At rest enabled; KEK in KMS; enforce TLS
- Secrets: Use a secrets manager for the BI service principal; rotate it
- Audit: Enable query/access logs and produce a weekly access review
Acceptance criteria
- Marketing analysts can query views but cannot modify tables
- Emails appear masked unless queried by
pii_reader - Public access attempts fail
- All queries are logged with user identity
Practical projects
- Implement RLS + CLS for a customer 360 dataset and document controls
- Create a data product access package: role definitions, views, and audit dashboards
- Automate monthly access reviews with a script that compares grants vs. HR roster
Subskills
- IAM Roles And Least Privilege — define roles, policies, groups; default-deny; narrow grants
- Row Column Level Security Concepts — restrict by row predicates and mask sensitive columns
- Network Segmentation And Private Connectivity — isolate environments; private endpoints/peering
- Encryption And Key Management Basics — at rest, in transit, key lifecycles, envelope encryption
- Secrets Rotation — scheduled rotation, zero-downtime cutovers, revocation playbooks
- Audit Logging And Access Reviews — centralized logs, alerts, periodic certifications
- PII Tagging Masking Tokenization — classify data, dynamic masking, irreversible tokens
- Compliance Controls Basics — align controls to common standards; document and test
Next steps
- Work through the subskills in order, practicing on a sandbox
- Finish the mini project and have a peer attempt to bypass controls
- Take the skill exam to check your understanding (progress is saved for logged-in users)