How to learn Access Review Processes for Security And Governance in Data Engineer for free

Why this matters

Access review processes keep your data warehouse, lakes, and pipelines safe. As a Data Engineer, you grant roles to analysts, manage service accounts for jobs, and publish datasets that may include sensitive fields (PII, financial, health). Regular access reviews reduce data leaks, prevent privilege creep, satisfy audits (SOX/ISO27001), and keep costs down by removing unused access.

Real tasks you will face:
- Running quarterly reviews of warehouse roles and object permissions
- Recertifying service account access for pipelines
- Cleaning up direct user grants in favor of role-based access
- Proving to auditors that access is appropriate and reviewed

Who this is for

Data Engineers and Platform Engineers managing IAM for data platforms
Analytics Engineers who own data models and role grants
Team leads who approve access requests

Prerequisites

Basic understanding of RBAC (roles -> privileges -> objects) and least privilege
Familiarity with your data platform's audit/query logs
Comfort with SQL and reading IAM policies

Concept explained simply

Access review is a recurring check that each person or service still needs the access they have. You verify, remove, or adjust access, then record evidence.

Mental model: Think of your platform as a building with many rooms. Roles are keys. Access review is walking the halls every quarter, confirming who still needs which keys, taking back unused ones, and noting what you changed.

Access review lifecycle (end-to-end)

1) Define scope

Prioritize high-risk data (PII/finance), production systems, and privileged roles. Decide cadence (e.g., quarterly for high-risk, semi-annual for low-risk).

2) Inventory entitlements

Export users, groups/roles, privileges, and target objects. Flag direct user grants; prefer role-based access.

3) Pull usage evidence

Collect last login, last query, object usage, pipeline run history. Evidence should be time-bounded (e.g., last 90 days).

4) Risk-based prioritization

Review first: owners of sensitive data, powerful roles (admin/write), and dormant accounts.

5) Send review tasks to owners

Manager and data owner attest: Keep / Modify / Remove, with justification. Set deadlines.

6) Decide changes

Prepare removals, downgrades (write -> read), and time-bound temporary access. Handle exceptions clearly.

7) Apply changes safely

Stage changes off-hours; ensure rollback. Avoid breaking pipelines. Use change tickets.

8) Record evidence

Store reviewer decisions, evidence snapshots, and change logs. Keep timestamps and reviewer names.

9) Report metrics

Revocation rate, stale access reduced, time to complete review, exception count, break-glass tests passed.

10) Iterate & automate

Automate exports, evidence gathering, and time-bound grants. Shift to continuous reviews for high-risk items.

Worked examples

Example 1: Quarterly warehouse review (roles and objects)

Scenario: You manage a warehouse with roles ANALYST, DATA_ENGINEER, and FINANCE_READ. You must review object-level grants and user role memberships.

Evidence snapshot (last 90 days):
- user: alice, roles: ANALYST, last_query: 10 days ago
- user: bob, roles: ANALYST, DATA_ENGINEER, last_query: 120 days ago
- user: cara, roles: FINANCE_READ, last_query: 5 days ago
- direct grant found: user bob -> SELECT on sensitive schema (bypasses roles)
Decisions:
- alice: keep ANALYST
- bob: remove DATA_ENGINEER, remove direct SELECT; keep ANALYST only if manager approves business need
- cara: keep FINANCE_READ
Changes:
- Revoke role DATA_ENGINEER from bob
- Revoke direct grant to bob; add to role if needed via role only
Evidence stored: query history, reviewer approvals, SQL change scripts

Example 2: Object storage bucket with PII

Scenario: A bucket contains PII. Several users have write access, and one temporary contractor was given read for a project that ended.

Findings:
- 5 engineers have write but only 2 used it in 90 days
- contractor_jin had READ, last access 150 days ago
- A group "analytics-temp" has wildcard access to the bucket
Decisions:
- Downgrade 3 engineers to READ
- Remove contractor_jin access (expired)
- Replace wildcard group access with path-scoped READ on necessary prefixes only
Evidence: access logs, project end date, approvals

Example 3: Service accounts for ETL

Scenario: Two service accounts power Airflow jobs. One account no longer runs any DAGs.

Findings:
- svc_airflow_prod: last DAG run 1 day ago -> keep
- svc_airflow_legacy: last run 200 days ago -> no usage
Decision:
- Disable svc_airflow_legacy, schedule deletion after 30 days if no rollback request.
- Remove write grants where read-only is sufficient for data pulls.
Evidence: DAG run history, app owner attestation, change tickets.

How to run a basic access review (checklist)

☐ Define scope and risk tiers
☐ Export current entitlements (users, roles, privileges)
☐ Pull usage evidence (last login/query/object access)
☐ Identify direct user grants to replace with roles
☐ Propose removals/downgrades/time-bound access
☐ Send to managers/data owners for attestation
☐ Apply changes with rollback plan
☐ Store evidence and signed-off decisions
☐ Report metrics and exceptions
☐ Automate parts for next cycle

Exercises

Do these in order. They mirror the exercise cards below.

Exercise 1: Plan a quarterly warehouse access review

Using a small dataset (users, roles, last used), decide removals/downgrades, replace direct grants with roles, and draft change/evidence notes. See the exercise card below for the dataset and expected output.

Exercise 2: Create a time-bound access playbook

Design a request-and-approval flow for a sensitive bucket with 30-day expiring access, including rollback and evidence capture.

Common mistakes and self-check

Mistake: Keeping direct user grants. Fix: Use roles and groups; no direct object grants to users.
Mistake: No evidence captured. Fix: Always save usage logs and approvals with timestamps.
Mistake: Break pipelines by revoking service account access abruptly. Fix: Stage changes, monitor, and keep rollback.
Mistake: One-time cleanup only. Fix: Set a recurring cadence with reminders and SLAs.
Mistake: Same review depth for all assets. Fix: Risk-based prioritization (sensitive first).

Self-check: Can you explain each removal with a usage metric and a reviewer decision? Can a new auditor reproduce your evidence trail?

Practical projects

Build a script to export entitlements and last-used evidence across your data platform, saving CSV snapshots per review cycle.
Create a policy-as-code template for time-bound access (e.g., 30 days) with automatic revocation and alerting.
Implement a lightweight attestation form (HTML or form tool) that records Keep/Modify/Remove with justification and manager/data owner names.

Learning path

Start: Access basics (RBAC/ABAC, least privilege)
Then: Access review processes (this lesson)
Next: Data classification and masking; Audit logging and monitoring; Secrets and service account governance

Metrics and evidence to keep

Revocation rate (% of accesses removed) and downgrade count
Stale access reduced (before vs after)
Time to complete review and overdue items
Exception approvals and expiry dates
Break-glass account test results

Quick Test

You can take the quick test for free. Progress and results are saved only if you are logged in.

Next steps

Automate evidence gathering and time-bound grants
Expand reviews to include data masking policies and lineage impacts
Introduce continuous reviews for high-risk datasets

Mini challenge

Pick one production dataset tagged as sensitive. In one page, propose a review plan: scope, evidence sources, reviewers, revocation rules, and metrics. Keep it practical and implementable within two weeks.

Instructions

You manage a warehouse. Use the dataset below to decide what to keep, remove, or downgrade. Replace any direct grants with role-based access. Draft the change log and the attestation message you will send to reviewers.

Users (last 90 days):
user,roles,last_query_days,manager,resource_owner
alice,[ANALYST],10,lea,warehouse_owner
bob,[ANALYST,DATA_ENGINEER],120,lea,warehouse_owner
cara,[FINANCE_READ],5,dave,finance_owner
devon,[ANALYST],200,lea,warehouse_owner
Findings: direct_grants -> bob: SELECT on sensitive_schema.table_x
Objects:
- sensitive_schema.table_x (PII)
- mart.sales (non-PII)
Rules to apply:
- Remove if no use in 90 days unless exception is justified
- Downgrade write->read when no writes in 90 days
- No direct grants to users; use roles only

Deliverables:

Proposed removals/downgrades/time-bound access
SQL or pseudo-commands to implement
Attestation email text for managers/data owners
Evidence list: what you will store and where

Menu

Access Review Processes

Table of Contents