Audit Logging And Access Reviews

Who this is for

This lesson is for Platform Engineers and Security-minded Backend Engineers who need reliable audit trails and a repeatable access review process across services, clouds, and internal tools.

Prerequisites

Basic understanding of identity and access management (users, roles, groups, service accounts)
Familiarity with application logging and centralized log collection
Comfort with JSON and structured logs

Why this matters

Real platform tasks where this is critical:

Investigating production incidents: knowing exactly who changed what and when
Proving compliance: SOC 2/ISO 27001 often require audit trails and periodic access recertification
Containing breaches: rapid detection of privilege escalations, break-glass use, and mass data exports
Least-privilege at scale: removing unused access safely and consistently

Concept explained simply

Audit logging: a trustworthy CCTV for your systems. It records who did what, to which resource, when, where, and whether it succeeded.

Access reviews: regular spring-cleaning of permissions so only the right people and services keep the access they truly need.

Mental model

Events are receipts: each immutable, timestamped, and signed record is a receipt for a sensitive action.
Reviews are recurring checkups: schedule them, notify owners, remove unused access, capture evidence.

Core design: audit events and reviews

Event taxonomy (recommended fields)

timestamp (UTC, ISO 8601)
actor: {type: user|service, id, display, org_id, auth_method, mfa: true|false}
action: verb in past tense (granted_role, rotated_secret, exported_data)
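resource: {type, id, name, tenant_id}
subject: {type, id} when the action targets another identity (e.g., the user who receives a role)
outcome: success|failure, plus error_code if any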
request: {id, ip, user_agent, location, trace_id/correlation_id}
context: reason, ticket_id/change_id, previous_value/new_value for config changes
integrity: {sequence, hash, prev_hash} for tamper-evident chaining
retention_hint: normal|extended
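The field list above can be enforced where events are produced. The sketch below is one minimal way to do that in Python; `make_event`, `REQUIRED_FIELDS`, and the validation rules are hypothetical names for illustration, not an existing library API. It returns a plain dict so any JSON log shipper can serialize it.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

REQUIRED_FIELDS = ("timestamp", "actor", "action", "resource", "outcome", "request")


def make_event(actor: dict, action: str, resource: dict, outcome: str,
               request_id: Optional[str] = None, **extra) -> dict:
    """Build an audit event dict that follows the recommended taxonomy.

    Hypothetical helper: validates the enumerated fields and fills in the
    UTC timestamp and a request id when the caller has none to propagate.
    """
    if actor.get("type") not in ("user", "service"):
        raise ValueError("actor.type must be 'user' or 'service'")
    if outcome not in ("success", "failure"):
        raise ValueError("outcome must be 'success' or 'failure'")
    event = {
        # UTC, ISO 8601, second precision with a trailing 'Z'
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": actor,
        "action": action,
        "resource": resource,
        "outcome": outcome,
        # Prefer the caller's request id so multi-service workflows correlate
        "request": {"id": request_id or f"req-{uuid.uuid4().hex[:8]}"},
    }
    event.update(extra)  # optional: subject, context, integrity, retention_hint
    return event


event = make_event(
    actor={"type": "user", "id": "alice@corp", "auth_method": "sso", "mfa": True},
    action="granted_role",
    resource={"type": "role", "id": "admin", "tenant_id": "marketing"},
    outcome="success",
    request_id="req-9d2f",
    subject={"type": "user", "id": "bob@corp"},
)
```

Passing an upstream request id instead of generating one is what lets a single action be traced across services.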

Keep payloads minimal. For sensitive values, store references or one-way hashes instead of raw data.
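A minimal sketch of the "one-way hash" option, assuming a secret key is available from your secret manager. It deliberately uses a keyed HMAC-SHA256 rather than a plain hash: a plain SHA-256 of an email address can be confirmed by anyone who guesses the address, while the keyed version cannot be checked without the key. `fingerprint` and `redact` are hypothetical helpers.

```python
import hashlib
import hmac


def fingerprint(value: str, key: bytes) -> str:
    """Keyed one-way fingerprint of a sensitive value.

    Assumption: `key` is loaded from a secret manager and never stored with
    the logs, so log readers cannot confirm guesses of low-entropy values.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "hmac-sha256:" + digest


def redact(payload: dict, sensitive: set, key: bytes) -> dict:
    """Copy `payload`, replacing the listed top-level fields with fingerprints."""
    return {k: fingerprint(str(v), key) if k in sensitive else v
            for k, v in payload.items()}


key = b"example-key-from-secret-manager"  # placeholder, not a real key
safe = redact({"email": "bob@corp", "plan": "pro"}, {"email"}, key)
```

Because equal inputs give equal fingerprints under the same key, investigators can still correlate events about the same value without seeing it.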

Coverage checklist

Identity lifecycle: user/service create, update, disable, delete
Privilege changes: role/grant/revoke, group membership, policy edits
Auth events: login success/failure, MFA status, token mint/refresh
Secrets: create/rotate/revoke/read (at least log access intent)
Production changes: deploys, config changes, data export/import
High-risk flows: break-glass, just-in-time elevation, impersonation
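One way to keep the coverage checklist honest is to compare it with the actions your pipeline actually receives, for example the distinct `action` values seen over the last week. The action names in this sketch are hypothetical placeholders grouped by the checklist categories; substitute your own taxonomy.

```python
# Hypothetical action names per checklist category; adapt to your taxonomy.
COVERAGE = {
    "identity_lifecycle": {"created_user", "updated_user", "disabled_user", "deleted_user"},
    "privilege_changes": {"granted_role", "revoked_role", "added_group_member", "edited_policy"},
    "auth_events": {"logged_in", "failed_login", "minted_token", "refreshed_token"},
    "secrets": {"created_secret", "rotated_secret", "revoked_secret", "read_secret"},
    "production_changes": {"deployed", "changed_config", "exported_data", "imported_data"},
    "high_risk_flows": {"used_break_glass", "elevated_access", "started_impersonation"},
}


def coverage_gaps(observed_actions) -> dict:
    """Map each checklist category to expected actions never seen in a log sample."""
    seen = set(observed_actions)
    gaps = {}
    for category, expected in COVERAGE.items():
        missing = expected - seen
        if missing:
            gaps[category] = sorted(missing)
    return gaps
```

An empty result means every checklist item produced at least one event in the sample; a missing action may also just be rare, so treat gaps as prompts to check instrumentation rather than proof it is absent.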

Storage and integrity

Append-only: ship to a centralized store; consider write-once storage (object lock) for compliance
Tamper-evident: hash chain per stream or per tenant; verify regularly
Retention policy: e.g., 400–730 days online, then archive; document exceptions
Access to logs: separation of duties; keep reader and admin roles distinct; monitor access to the logs themselves
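Retention can be driven by the `retention_hint` field from the taxonomy. This sketch assumes `normal` maps to the low end of the example range (400 days) and `extended` to the high end (730 days); both numbers and the `storage_tier` helper are illustrative, and the actual move to archive storage is left to your platform.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "normal" uses the low end of the example range and
# "extended" the high end; substitute your documented policy.
ONLINE_DAYS = {"normal": 400, "extended": 730}


def parse_utc(ts: str) -> datetime:
    """Parse the 'YYYY-MM-DDTHH:MM:SSZ' format used in the examples."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def storage_tier(event: dict, now: datetime) -> str:
    """Return 'online' while the event is inside its online window, else 'archive'."""
    days = ONLINE_DAYS[event.get("retention_hint", "normal")]
    age = now - parse_utc(event["timestamp"])
    return "online" if age < timedelta(days=days) else "archive"
```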

Access reviews (recertification)

Scope: apps, roles, groups, privileged systems, production data access
Cadence: monthly for high-risk, quarterly for others; auto-schedule
Ownership: each resource has a reviewer (system owner); fallback to security/platform
Evidence: decisions with reason (keep/remove), date, reviewer identity, linked ticket/change
Signals: last-used data to suggest revocations; flag SoD (segregation of duties) violations
Revocation path: fast and reversible (grace window), with alerts
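Scheduling can be automated from a resource inventory. This is a minimal sketch, assuming each resource record carries an `id`, an optional `owner`, and a `risk` of `high` or `normal`; the record shape, the `security-platform` fallback name, and the choice to make reviews due on the first of the month are all assumptions.

```python
from datetime import date

# Cadence from the text: monthly for high-risk scopes, quarterly for the rest.
CADENCE_MONTHS = {"high": 1, "normal": 3}
FALLBACK_REVIEWER = "security-platform"  # hypothetical fallback team


def first_of_month_after(d: date, months: int) -> date:
    """First day of the month `months` after `d` (avoids invalid dates like Feb 30)."""
    years, month_index = divmod(d.month - 1 + months, 12)
    return date(d.year + years, month_index + 1, 1)


def schedule_review(resource: dict, last_review: date) -> dict:
    """Create the next review task for one resource (role, group, app, ...)."""
    months = CADENCE_MONTHS[resource.get("risk", "normal")]
    return {
        "resource_id": resource["id"],
        # Each resource has a reviewer; fall back to security/platform
        "reviewer": resource.get("owner") or FALLBACK_REVIEWER,
        "due": first_of_month_after(last_review, months),
        "decisions": [],  # keep/remove, reason, reviewer, date, linked ticket
    }
```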

Worked examples

Example 1 — Designing an event for a role grant

Scenario: User alice@corp grants the admin role to bob@corp in service "app-cms".

{
  "timestamp": "2025-11-03T10:12:34Z",
  "actor": {"type": "user", "id": "alice@corp", "display": "Alice", "auth_method": "sso", "mfa": true},
  "action": "granted_role",
  "resource": {"type": "role", "id": "admin", "name": "Administrator", "tenant_id": "marketing"},
  "subject": {"type": "user", "id": "bob@corp"},
  "outcome": "success",
  "request": {"id": "req-9d2f", "ip": "203.0.113.5", "trace_id": "tr-1ab2"},
  "context": {"reason": "oncall coverage", "ticket_id": "CHG-1456"},
  "integrity": {"sequence": 1042, "hash": "h-xyz", "prev_hash": "h-xyw"}
}

Note the subject field to distinguish who received the role from the actor who granted it.
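Separating actor from subject also makes simple detections straightforward. As an illustration, this hypothetical check flags an identity that grants a role to itself, a common privilege-escalation signal; it runs on a trimmed copy of the event above.

```python
import json

# Trimmed copy of the Example 1 event
example = json.loads("""{
  "actor": {"type": "user", "id": "alice@corp"},
  "action": "granted_role",
  "resource": {"type": "role", "id": "admin", "tenant_id": "marketing"},
  "subject": {"type": "user", "id": "bob@corp"}
}""")


def is_self_grant(event: dict) -> bool:
    """True when an identity granted a role to itself."""
    subject = event.get("subject")
    return (event.get("action") == "granted_role"
            and subject is not None
            and subject.get("id") == event["actor"].get("id"))
```

Without a separate subject field, this check would have to parse free-text context, which is exactly the ambiguity the field removes.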

Example 2 — Tamper-evident chain

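Create a per-tenant sequence and compute hash = HMAC(key, prev_hash + event_body), where event_body is a canonical serialization of the event without its integrity field and key is a secret kept outside the log store (without a secret key, anyone able to edit the logs could recompute the whole chain). Store sequence, hash, and prev_hash in each event. A verifier replays the chain and alerts on sequence gaps or hash mismatches.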
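A minimal Python sketch of the chain, assuming a single chain for one tenant (in practice keep one list and one sequence per tenant) and JSON with sorted keys as the canonical serialization. `GENESIS`, `append`, and `verify` are hypothetical names; the all-zero genesis value is an arbitrary convention for the first `prev_hash`.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # conventional prev_hash for the first event in a chain


def canonical(body: dict) -> bytes:
    """Stable serialization so producer and verifier hash identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def chain_mac(key: bytes, prev_hash: str, body: dict) -> str:
    """hash = HMAC(key, prev_hash + canonical(event_body))."""
    return hmac.new(key, prev_hash.encode("ascii") + canonical(body), hashlib.sha256).hexdigest()


def append(chain: list, body: dict, key: bytes) -> dict:
    """Attach integrity fields to `body` and append it to the chain."""
    if chain:
        prev_hash = chain[-1]["integrity"]["hash"]
        sequence = chain[-1]["integrity"]["sequence"] + 1
    else:
        prev_hash, sequence = GENESIS, 1
    event = dict(body, integrity={"sequence": sequence,
                                  "hash": chain_mac(key, prev_hash, body),
                                  "prev_hash": prev_hash})
    chain.append(event)
    return event


def verify(chain: list, key: bytes) -> list:
    """Replay the chain and return a list of problems (empty means intact)."""
    problems = []
    prev_hash, expected_seq = GENESIS, 1
    for event in chain:
        integ = event["integrity"]
        body = {k: v for k, v in event.items() if k != "integrity"}
        seq = integ["sequence"]
        if seq != expected_seq:
            problems.append(f"gap: expected seq {expected_seq}, got {seq}")
        if integ["prev_hash"] != prev_hash:
            problems.append(f"seq {seq}: prev_hash mismatch")
        if not hmac.compare_digest(chain_mac(key, prev_hash, body), integ["hash"]):
            problems.append(f"seq {seq}: hash mismatch")
        # Continue from the stored values so one bad event is not reported repeatedly
        prev_hash, expected_seq = integ["hash"], seq + 1
    return problems
```

Editing a field, deleting an event in the middle, or verifying with the wrong key all produce problem entries. Deleting events from the end of the chain leaves a valid shorter chain, so also compare the latest sequence and hash against a checkpoint stored elsewhere.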

Example 3 — Access review with usage signals

For group "prod-db-readers":

Last used: query audit logs for SELECT events by each member in the past 90 days
Reviewer sees suggested removals for members with zero usage
Reviewer approves removals; system executes revocations and logs decisions with evidence
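The usage signal in this example can be computed directly from audit events. The sketch assumes database reads are logged with a hypothetical `executed_select` action whose `resource.id` is the database the group grants access to (`prod-db` below is illustrative). Counting only successful queries is a design choice: failed attempts do not show that the access is needed.

```python
from datetime import datetime, timedelta, timezone

USAGE_ACTION = "executed_select"  # hypothetical action name for DB read events


def parse_utc(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def suggest_removals(members, events, resource_id, now, window_days=90):
    """Members with no successful SELECT on `resource_id` inside the window."""
    cutoff = now - timedelta(days=window_days)
    active = {
        e["actor"]["id"]
        for e in events
        if e["action"] == USAGE_ACTION
        and e["resource"]["id"] == resource_id
        and e["outcome"] == "success"
        and parse_utc(e["timestamp"]) >= cutoff
    }
    return sorted(set(members) - active)
```

The result is only a suggestion list for the reviewer; the decision and its reason are still recorded as evidence.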

Step-by-step implementation

Define event taxonomy and risk levels; agree on naming and required fields.
Instrument producers: build a small library to emit structured events with correlation and integrity fields.
Centralize: ship to a log platform or SIEM; index key fields (actor.id, action, resource.id, tenant_id).
Secure storage: enable append-only or object lock; restrict write and admin paths; log access to logs.
Dashboards & alerts: create panels for high-risk actions and authentication anomalies.
Access review workflow: define owners, schedule, decision UI (or CSV), automated revocations, and evidence archive.
Runbook & drills: simulate a privilege escalation and verify you can reconstruct the timeline from logs.
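The drill in the last step comes down to filtering and ordering events. A minimal sketch, assuming events follow the taxonomy and use the fixed `YYYY-MM-DDTHH:MM:SSZ` timestamp format from Example 1; `timeline` is a hypothetical helper.

```python
def timeline(events, trace_id=None, actor_id=None):
    """Filter audit events by trace or actor and return them as ordered lines."""
    selected = [
        e for e in events
        if (trace_id is None or e.get("request", {}).get("trace_id") == trace_id)
        and (actor_id is None or e["actor"]["id"] == actor_id)
    ]
    # Fixed-format UTC timestamps sort lexicographically in time order;
    # the integrity sequence breaks same-second ties within one chain.
    selected.sort(key=lambda e: (e["timestamp"], e.get("integrity", {}).get("sequence", 0)))
    return [
        f'{e["timestamp"]} {e["actor"]["id"]} {e["action"]} '
        f'{e["resource"]["type"]}:{e["resource"]["id"]} {e["outcome"]}'
        for e in selected
    ]
```

If the drill cannot produce a complete timeline this way, the gap usually points to a missing correlation ID or an uninstrumented action.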

Exercises

Complete these hands-on tasks.

Exercise 1 — Design an audit event schema

Create a minimal JSON schema and one example for a privileged config change. Include timestamp, actor, action, resource, outcome, request.id, and integrity fields.

Exercise 2 — Plan a quarterly access review

Define scope, reviewers, usage signals, decisions, revocation steps, evidence storage, and metrics. Produce a 7-step checklist.

Self-check checklist

Event includes correlation ID and UTC timestamp
Clear separation between actor and subject
Tamper-evident integrity fields present
Access review has owners, cadence, usage signals, and evidence plan
Defined fast revocation with audit trails

Common mistakes and how to self-check

Too much or too little logging: log sensitive actions with context; avoid dumping entire payloads with PII.
No correlation IDs: add request.id or trace_id to connect multi-service workflows.
Mutable logs: without append-only or hash chains, evidence can be challenged. Make tampering detectable.
Inconsistent timestamps: always UTC ISO 8601; ensure time sync.
No tenant or org scoping: include tenant_id to separate customers/environments.
Skipping service accounts: review machine identities as rigorously as humans.
Reviews without revocation: ensure a clear, fast, reversible removal path.
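For the timestamp mistake, one option is to normalize at ingestion. This sketch assumes producers send ISO 8601 strings with an explicit offset; it rejects naive timestamps rather than guessing their zone and drops sub-second precision to match the examples in this lesson. `to_utc_iso` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_utc_iso(ts: str) -> str:
    """Normalize an offset-aware ISO 8601 timestamp to 'YYYY-MM-DDTHH:MM:SSZ'."""
    # Older Python versions do not parse a trailing 'Z', so map it to +00:00
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # A naive timestamp's zone is unknowable; reject instead of guessing
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```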

Practical projects

Build an audit pipeline: producer library + centralized index + dashboard for high-risk actions
Implement a tamper-evident verifier that checks hash chains nightly and logs results
Access review run: export membership of one critical group, join with 90-day usage, run review, and document outcomes
Break-glass flow: create a time-bound elevation with auto-expiry and mandatory reason, fully logged
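A compact sketch of the break-glass project, assuming an `emit` callback that ships events to your central log and a hypothetical four-hour maximum. The helper names, the `used_break_glass` and `expired_break_glass` actions, and the `break-glass-expirer` service identity are all illustrative; the actual role removal in the target system is not shown.

```python
from datetime import datetime, timedelta, timezone

MAX_ELEVATION = timedelta(hours=4)  # hypothetical policy limit


def fmt(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def start_break_glass(actor_id, role, reason, ticket_id, duration, now, emit):
    """Grant a time-bound elevation; refuse it without a reason."""
    if not reason.strip():
        raise ValueError("break-glass requires a reason")
    if duration > MAX_ELEVATION:
        raise ValueError("requested duration exceeds the policy maximum")
    grant = {"actor": actor_id, "role": role, "expires_at": now + duration}
    emit({"timestamp": fmt(now), "actor": {"type": "user", "id": actor_id},
          "action": "used_break_glass", "resource": {"type": "role", "id": role},
          "outcome": "success",
          "context": {"reason": reason, "ticket_id": ticket_id,
                      "expires_at": fmt(grant["expires_at"])}})
    return grant


def expire_grants(grants, now, emit):
    """Log each expired grant as revoked and return the grants still active."""
    active = []
    for g in grants:
        if now >= g["expires_at"]:
            # A real implementation would remove the role in the target system here
            emit({"timestamp": fmt(now),
                  "actor": {"type": "service", "id": "break-glass-expirer"},
                  "action": "expired_break_glass",
                  "resource": {"type": "role", "id": g["role"]},
                  "subject": {"type": "user", "id": g["actor"]},
                  "outcome": "success"})
        else:
            active.append(g)
    return active
```

Running `expire_grants` on a short schedule gives the auto-expiry, and both the elevation and its end appear in the audit trail with the same role and user.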

Learning path

Start: Event taxonomy and logging standards
Next: Centralized collection and integrity
Then: Dashboards and alerts for high-risk actions
Finally: Access review workflow and automation

Next steps

Finish the exercises above
Pick one practical project and implement it this week

Mini challenge

Within 48 hours, instrument one high-risk action in any service with the full event schema, ship it to your central logs, and create a simple saved search that alerts on failures.
