Why this matters
Access review processes keep your data warehouse, lakes, and pipelines safe. As a Data Engineer, you grant roles to analysts, manage service accounts for jobs, and publish datasets that may include sensitive fields (PII, financial, health). Regular access reviews reduce data leaks, prevent privilege creep, satisfy audits (SOX/ISO27001), and keep costs down by removing unused access.
- Real tasks you will face:
- Running quarterly reviews of warehouse roles and object permissions
- Recertifying service account access for pipelines
- Cleaning up direct user grants in favor of role-based access
- Proving to auditors that access is appropriate and reviewed
Who this is for
- Data Engineers and Platform Engineers managing IAM for data platforms
- Analytics Engineers who own data models and role grants
- Team leads who approve access requests
Prerequisites
- Basic understanding of RBAC (roles -> privileges -> objects) and least privilege
- Familiarity with your data platform's audit/query logs
- Comfort with SQL and reading IAM policies
Concept explained simply
Access review is a recurring check that each person or service still needs the access they have. You verify, remove, or adjust access, then record evidence.
Mental model: Think of your platform as a building with many rooms. Roles are keys. Access review is walking the halls every quarter, confirming who still needs which keys, taking back unused ones, and noting what you changed.
Access review lifecycle (end-to-end)
1) Define scope
Prioritize high-risk data (PII/finance), production systems, and privileged roles. Decide cadence (e.g., quarterly for high-risk, semi-annual for low-risk).
2) Inventory entitlements
Export users, groups/roles, privileges, and target objects. Flag direct user grants; prefer role-based access.
3) Pull usage evidence
Collect last login, last query, object usage, pipeline run history. Evidence should be time-bounded (e.g., last 90 days).
4) Risk-based prioritization
Review first: owners of sensitive data, powerful roles (admin/write), and dormant accounts.
5) Send review tasks to owners
Manager and data owner attest: Keep / Modify / Remove, with justification. Set deadlines.
6) Decide changes
Prepare removals, downgrades (write -> read), and time-bound temporary access. Handle exceptions clearly.
7) Apply changes safely
Stage changes off-hours; ensure rollback. Avoid breaking pipelines. Use change tickets.
8) Record evidence
Store reviewer decisions, evidence snapshots, and change logs. Keep timestamps and reviewer names.
9) Report metrics
Revocation rate, stale access reduced, time to complete review, exception count, break-glass tests passed.
10) Iterate & automate
Automate exports, evidence gathering, and time-bound grants. Shift to continuous reviews for high-risk items.
Worked examples
Example 1: Quarterly warehouse review (roles and objects)
Scenario: You manage a warehouse with roles ANALYST, DATA_ENGINEER, and FINANCE_READ. You must review object-level grants and user role memberships.
Evidence snapshot (last 90 days):
- user: alice, roles: ANALYST, last_query: 10 days ago
- user: bob, roles: ANALYST, DATA_ENGINEER, last_query: 120 days ago
- user: cara, roles: FINANCE_READ, last_query: 5 days ago
- direct grant found: user bob -> SELECT on sensitive schema (bypasses roles)
Decisions:
- alice: keep ANALYST
- bob: remove DATA_ENGINEER, remove direct SELECT; keep ANALYST only if manager approves business need
- cara: keep FINANCE_READ
Changes:
- Revoke role DATA_ENGINEER from bob
- Revoke direct grant to bob; add to role if needed via role only
Evidence stored: query history, reviewer approvals, SQL change scripts
Example 2: Object storage bucket with PII
Scenario: A bucket contains PII. Several users have write access, and one temporary contractor was given read for a project that ended.
Findings:
- 5 engineers have write but only 2 used it in 90 days
- contractor_jin had READ, last access 150 days ago
- A group "analytics-temp" has wildcard access to the bucket
Decisions:
- Downgrade 3 engineers to READ
- Remove contractor_jin access (expired)
- Replace wildcard group access with path-scoped READ on necessary prefixes only
Evidence: access logs, project end date, approvals
Example 3: Service accounts for ETL
Scenario: Two service accounts power Airflow jobs. One account no longer runs any DAGs.
Findings:
- svc_airflow_prod: last DAG run 1 day ago -> keep
- svc_airflow_legacy: last run 200 days ago -> no usage
Decision:
- Disable svc_airflow_legacy, schedule deletion after 30 days if no rollback request.
- Remove write grants where read-only is sufficient for data pulls.
Evidence: DAG run history, app owner attestation, change tickets.
How to run a basic access review (checklist)
- ā Define scope and risk tiers
- ā Export current entitlements (users, roles, privileges)
- ā Pull usage evidence (last login/query/object access)
- ā Identify direct user grants to replace with roles
- ā Propose removals/downgrades/time-bound access
- ā Send to managers/data owners for attestation
- ā Apply changes with rollback plan
- ā Store evidence and signed-off decisions
- ā Report metrics and exceptions
- ā Automate parts for next cycle
Exercises
Do these in order. They mirror the exercise cards below.
Exercise 1: Plan a quarterly warehouse access review
Using a small dataset (users, roles, last used), decide removals/downgrades, replace direct grants with roles, and draft change/evidence notes. See the exercise card below for the dataset and expected output.
Exercise 2: Create a time-bound access playbook
Design a request-and-approval flow for a sensitive bucket with 30-day expiring access, including rollback and evidence capture.
Common mistakes and self-check
- Mistake: Keeping direct user grants. Fix: Use roles and groups; no direct object grants to users.
- Mistake: No evidence captured. Fix: Always save usage logs and approvals with timestamps.
- Mistake: Break pipelines by revoking service account access abruptly. Fix: Stage changes, monitor, and keep rollback.
- Mistake: One-time cleanup only. Fix: Set a recurring cadence with reminders and SLAs.
- Mistake: Same review depth for all assets. Fix: Risk-based prioritization (sensitive first).
Self-check: Can you explain each removal with a usage metric and a reviewer decision? Can a new auditor reproduce your evidence trail?
Practical projects
- Build a script to export entitlements and last-used evidence across your data platform, saving CSV snapshots per review cycle.
- Create a policy-as-code template for time-bound access (e.g., 30 days) with automatic revocation and alerting.
- Implement a lightweight attestation form (HTML or form tool) that records Keep/Modify/Remove with justification and manager/data owner names.
Learning path
- Start: Access basics (RBAC/ABAC, least privilege)
- Then: Access review processes (this lesson)
- Next: Data classification and masking; Audit logging and monitoring; Secrets and service account governance
Metrics and evidence to keep
- Revocation rate (% of accesses removed) and downgrade count
- Stale access reduced (before vs after)
- Time to complete review and overdue items
- Exception approvals and expiry dates
- Break-glass account test results
Quick Test
You can take the quick test for free. Progress and results are saved only if you are logged in.
Next steps
- Automate evidence gathering and time-bound grants
- Expand reviews to include data masking policies and lineage impacts
- Introduce continuous reviews for high-risk datasets
Mini challenge
Pick one production dataset tagged as sensitive. In one page, propose a review plan: scope, evidence sources, reviewers, revocation rules, and metrics. Keep it practical and implementable within two weeks.