Why this matters
As a Data Architect, you define how people and systems access data. Good IAM and least-privilege design prevents data leaks, limits the blast radius of incidents, and keeps auditors happy without slowing teams down.
- Protect sensitive columns (PII, financials) while enabling analytics.
- Grant pipelines just enough rights to read, transform, and write data.
- Support break-glass access for emergencies with full auditability.
- Scale securely across warehouses, lakes, streaming, and BI tools.
Concept explained simply
Identity and Access Management (IAM) controls who (identity) can do what (permission) on which resource (scope), under which conditions (context).
Least privilege means every identity has the minimum permissions needed to perform a task—no more, no less, and only for as long as needed.
Quick glossary
- Identity: user, group, service account, machine identity.
- Permission: an action like read, write, create, delete.
- Role: a bundle of permissions for a job function.
- Policy: a rule that grants or denies permissions to identities on resources.
- Scope: where a policy applies (account, project, dataset, table, column).
- Condition: context limiters (time, network, environment, data tag).
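The glossary terms map cleanly onto a small data model. The sketch below is platform-agnostic and purely illustrative (the class and field names are assumptions, not any vendor's API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Permission:
    action: str        # e.g. "read", "write", "delete"
    resource: str      # the scope it applies to, e.g. "analytics/sales_agg"

@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)   # a bundle of Permissions

@dataclass
class Policy:
    identity: str                      # user, group, or service account
    role: Role
    condition: Optional[str] = None    # context limiter, e.g. "network == corp-vpn"

# A read-only role granted to a group, with no condition.
reader = Role("reader", {Permission("read", "analytics/sales_agg")})
grant = Policy(identity="group:marketing-analysts", role=reader)
print(grant.role.name)  # reader
```

Keeping these four concepts separate (identity, permission, scope, condition) is what lets you reason about grants mechanically later on.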
Mental model
Think of your data platform as a building:
- Lobby = project/account. Doors = datasets, tables, streams, buckets.
- Badges = identities. Badge permissions = roles.
- Janitor key ring = service accounts with exact room keys.
- Security rulebook = policies with conditions (time-bound, device, tag).
- Visitor pass = just-in-time temporary access.
If someone doesn’t need a key, don’t issue it. If they need it rarely, make it temporary and logged.
Core principles and patterns
- Deny-by-default: access is explicitly granted, never implied.
- Small, task-oriented roles: read-only, data-prep-writer, data-owner; avoid catch-all admin.
- Separation of duties: no single role can develop, deploy, and approve production changes.
- Service accounts per workload: one pipeline, one identity, scoped to its datasets.
- Attribute-based controls: use data tags (PII, region) and user attributes (department) where supported.
- Time-bound elevation: just-in-time access with automatic expiry for rare tasks.
- Break-glass: emergency role with strong approval, MFA, and complete audit trail.
- Audit and recertification: review grants and logs periodically; remove stale access.
- Data-layer controls: combine dataset/table grants with column masking and row-level filters.
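The time-bound elevation principle above can be made concrete in a few lines. This is a minimal sketch of just-in-time grants with automatic expiry; the in-memory grant store and function names are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

def grant_jit(grants, identity, role, minutes):
    """Record a temporary grant that expires automatically."""
    grants[(identity, role)] = datetime.now(timezone.utc) + timedelta(minutes=minutes)

def has_role(grants, identity, role):
    """Deny by default: no grant, or an expired grant, means no access."""
    expiry = grants.get((identity, role))
    return expiry is not None and datetime.now(timezone.utc) < expiry

grants = {}
grant_jit(grants, "alice", "data-owner", minutes=30)
print(has_role(grants, "alice", "data-owner"))  # True
print(has_role(grants, "bob", "data-owner"))    # False
```

In a real platform the expiry would be enforced by the IAM service itself (and logged), but the logic is the same: elevation is an event with an end time, not a permanent state.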
Worked examples
Example 1 — Analytics warehouse RBAC
Goal: Marketing analysts query sales trends but must not see PII.
- Create roles:
- analyst_viewer: SELECT on sales_agg schema only.
- pii_masked_viewer: SELECT on curated schema with column masking policy on PII columns.
- data_steward: can ALTER masking policies and manage tags, not full admin.
- Grant:
- Marketing analysts -> analyst_viewer.
- Data stewards -> data_steward.
- If analysts need customer segmentation on PII, assign pii_masked_viewer with masking that shows hashed_email instead of email.
- Result: Analysts work freely on aggregates while PII is protected by default.
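Example 1's masking rule can be sketched as follows. The role and column names follow the example; the hashing scheme (truncated SHA-256) is an assumption for illustration only:

```python
import hashlib

def mask_email(email: str) -> str:
    """Deterministic hash so analysts can still join/segment on it."""
    return hashlib.sha256(email.encode()).hexdigest()[:16]

def project_row(row: dict, role: str) -> dict:
    """pii_masked_viewer sees hashed_email instead of the raw email column."""
    row = dict(row)
    if role == "pii_masked_viewer":
        row["hashed_email"] = mask_email(row.pop("email"))
    return row

row = {"customer_id": 7, "email": "ada@example.com", "segment": "loyal"}
masked = project_row(row, "pii_masked_viewer")
print("email" in masked, "hashed_email" in masked)  # False True
```

A deterministic hash preserves equality (useful for segmentation and joins) while never exposing the raw value; in production you would use your warehouse's native masking policies rather than post-processing rows.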
Example 2 — ETL service account with scoped rights
Goal: Nightly pipeline reads raw/sales, writes curated/sales, cannot delete.
- Create service account: svc-etl-sales.
- Grant permissions:
- storage.objects.get/list on raw/sales/*.
- storage.objects.create on curated/sales/*.
- Explicit deny storage.objects.delete everywhere.
- Rotation: use short-lived tokens via workload identity federation; avoid static keys.
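Example 2's grants can be written down as a single policy document and checked mechanically. The JSON-like shape below is illustrative, not any specific cloud's schema; note that an explicit deny always wins, and anything unlisted is denied by default:

```python
from fnmatch import fnmatch

etl_policy = {
    "service_account": "svc-etl-sales",
    "statements": [
        {"effect": "allow", "actions": ["storage.objects.get", "storage.objects.list"],
         "resource": "raw/sales/*"},
        {"effect": "allow", "actions": ["storage.objects.create"],
         "resource": "curated/sales/*"},
        {"effect": "deny", "actions": ["storage.objects.delete"],
         "resource": "*"},
    ],
}

def can(policy, action, resource):
    """Deny-by-default evaluation: explicit deny > explicit allow > implicit deny."""
    allowed = False
    for s in policy["statements"]:
        if action in s["actions"] and fnmatch(resource, s["resource"]):
            if s["effect"] == "deny":
                return False
            allowed = True
    return allowed

print(can(etl_policy, "storage.objects.get", "raw/sales/2024.parquet"))        # True
print(can(etl_policy, "storage.objects.delete", "curated/sales/old.parquet"))  # False
```
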
Example 3 — Column masking + row filters
Goal: Regional analysts can see only their region’s rows; PII masked for everyone except compliance.
- Tag PII columns: email, phone -> tag PII=High.
- Row-level policy: region = user.region attribute.
- Masking policy: if role != compliance_reader then mask(email) and mask(phone).
- Grant compliance_reader only to the compliance group with approval flow.
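Combining Example 3's two controls looks like this in miniature. The in-memory table and user attributes are illustrative; role and column names follow the example:

```python
import hashlib

ROWS = [
    {"region": "EU", "email": "a@x.com", "phone": "111", "amount": 10},
    {"region": "US", "email": "b@x.com", "phone": "222", "amount": 20},
]

def mask(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def query(rows, user):
    out = []
    for r in rows:
        if r["region"] != user["region"]:          # row-level policy
            continue
        r = dict(r)
        if user["role"] != "compliance_reader":    # masking policy
            r["email"], r["phone"] = mask(r["email"]), mask(r["phone"])
        out.append(r)
    return out

eu_analyst = {"region": "EU", "role": "analyst_viewer"}
result = query(ROWS, eu_analyst)
print(len(result), result[0]["email"] == "a@x.com")  # 1 False
```

The row filter and the mask are independent layers: the analyst never receives out-of-region rows at all, and within their region never receives raw PII.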
Step-by-step: design minimal IAM for a new data product
- List actors and workloads: humans (analysts, engineers, stewards), services (ingest, transform, BI).
- Classify data: public, internal, confidential, restricted (PII/PHI).
- Define roles per task: reader, writer, owner, steward; avoid catch-all custom roles.
- Map resources and scopes: projects, buckets, schemas, tables, columns, streams.
- Apply deny-by-default then add only required permissions.
- Add conditions: time-bound elevation, network or environment conditions, data tags.
- Design break-glass: who can request, time limit, MFA, audit.
- Implement logging: capture access logs for datasets, storage, and IAM changes.
- Pilot and test: run access dry-runs; verify least privilege with sample queries.
- Schedule recertification: quarterly reviews to remove stale grants.
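The "pilot and test" step benefits from codifying expected outcomes as an allow/deny matrix and asserting them against your access checker. The sketch below uses a stand-in checker; in practice you would call your platform's dry-run or policy-simulation API:

```python
EXPECTED = [
    # (identity, action, resource, expected outcome)
    ("analyst_viewer", "select", "sales_agg.trends", True),
    ("analyst_viewer", "select", "curated.customers_pii", False),
    ("svc-etl-sales", "delete", "curated/sales/x", False),
]

def run_dry_run(check):
    """Return every (identity, action, resource) whose outcome differs from expectations."""
    return [(i, a, r) for i, a, r, want in EXPECTED if check(i, a, r) != want]

# Stand-in checker (assumption): only the grants listed here exist.
GRANTS = {("analyst_viewer", "select", "sales_agg.trends")}
failures = run_dry_run(lambda i, a, r: (i, a, r) in GRANTS)
print(failures)  # []
```

An empty failure list is the exit criterion for the pilot; any non-empty result is either a missing grant or, more importantly, an over-grant to remove.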
Security controls checklist
- Every identity has a defined purpose (no orphan users or keys).
- Roles are task-scoped; no broad admin in daily use.
- Service accounts are per workload with minimal permissions.
- Sensitive data is tagged and protected by column masking/row filters.
- JIT elevation exists and expires automatically.
- Break-glass access is auditable and requires MFA.
- Access logs are enabled and reviewed.
- Quarterly access recertification is planned.
Common mistakes and self-check
- Mistake: Granting broad admin because it’s “easier.”
  Self-check: List all admin grants; can you replace them with narrower roles?
- Mistake: Shared service accounts across pipelines.
  Self-check: Map service accounts to workloads 1:1; rotate any shared credentials.
- Mistake: Protecting only at the dataset level.
  Self-check: Do you mask PII columns and apply row filters where needed?
- Mistake: Permanent elevation for rare tasks.
  Self-check: Audit the last 90 days for unused high-privilege grants; convert them to JIT.
- Mistake: No emergency access plan.
  Self-check: Can you describe who gets access, for how long, and how it’s audited, within 1 minute?
Exercises
Do Exercises 1–2 and compare your answers with the solutions provided below each exercise in the Exercises panel. The exercises are available to everyone; only logged-in users will have their answers and progress saved.
- Exercise 1 (Role model design): Create roles and a minimal permission matrix for a mixed dataset with PII. Expected: clear roles, grants, and masking rules.
- Exercise 2 (Policy writing): Write a deny-by-default policy that allows a service account to read from raw/sales and write to curated/sales, with no delete.
Mini challenge
In one paragraph, describe how you would onboard a new analytics team for a sensitive dataset without risking PII exposure. Mention roles, data tags, and temporary elevation.
Practical projects
- Least-Privilege Blueprint: Document your platform’s standard roles, with scopes and example grants for warehouse, lake, and BI.
- Data Sensitivity Tagging: Tag at least 10 key tables and 5 PII columns; implement masking and verify with test queries.
- JIT Access Pilot: Set up a time-bound elevation flow for on-call engineers and test during a simulated incident.
- Quarterly Review Script: Build a script or query that lists unused permissions and stale accounts over 60 days, then remove them.
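The Quarterly Review Script project could start from a sketch like this one, which flags grants unused for more than 60 days. The log shape (grant tuples plus a last-used timestamp map) is an assumption; real input would come from your platform's access logs:

```python
from datetime import datetime, timedelta, timezone

def stale_grants(grants, last_used, now=None, days=60):
    """Return grants with no recorded use inside the lookback window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    never = datetime.min.replace(tzinfo=timezone.utc)   # treat "never used" as stale
    return [g for g in grants if last_used.get(g, never) < cutoff]

now = datetime.now(timezone.utc)
grants = [("alice", "writer"), ("svc-old", "admin")]
last_used = {("alice", "writer"): now - timedelta(days=3)}
print(stale_grants(grants, last_used, now))  # [('svc-old', 'admin')]
```

Feeding the output into a removal (or recertification) workflow, rather than deleting blindly, keeps the review auditable.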
Who this is for
- Data Architects and Platform Engineers designing secure data access.
- Data Engineers and Analytics Engineers who manage pipelines and schemas.
- Security/Compliance partners collaborating on data governance.
Prerequisites
- Basic understanding of data warehouses/lakes and BI workflows.
- Familiarity with roles, groups, and service accounts in any cloud/platform.
- Ability to read simple JSON/YAML policies.
Learning path
- Before: Data classification and tagging; Identity fundamentals.
- Now: IAM and least privilege design (this lesson).
- Next: Secure data sharing, secrets management, and audit/monitoring.
Next steps
- Implement the checklist in one real project this week.
- Run a 30-minute tabletop: revoke an over-privileged grant and observe impact.
- Take the quick test to confirm you can apply these patterns.
Quick test
Take the quick test below to check your understanding. Everyone can take it; only logged-in users will have results saved.