
Security Reviews And Threat Modeling

Learn security reviews and threat modeling for free, with explanations, exercises, and a quick test tailored to Data Architects.

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

As a Data Architect, your designs move sensitive data across ingestion, storage, processing, and analytics. A single missed threat can lead to data leaks, downtime, or compliance fines. Security reviews and threat modeling help you find risks early and bake in practical mitigations: encryption, access control, isolation, and monitoring.

  • Real tasks you will face: approving a new PII ingestion pipeline, connecting a BI tool to a warehouse, enabling cross-account data sharing, onboarding a third-party connector, or handling schema evolution in streaming.
  • Threat modeling reduces rework, clarifies responsibilities, and improves auditability.

Who this is for

  • Data Architects and Platform Engineers designing or reviewing data pipelines, lakes, warehouses, and streaming systems.
  • Tech leads who need a repeatable, lightweight review process.

Prerequisites

  • Basic understanding of data platform components (ingestion, queue/stream, storage, compute, warehouse, BI).
  • Familiarity with authentication/authorization and encryption concepts.

Concept explained simply

Threat modeling is a structured way to ask: what can go wrong, what are we doing about it, and is that enough? A security review is the meeting and documentation ritual that turns those answers into decisions and backlog items.

Mental model

Use a map-and-attack mindset:

  • Map the system: draw data flows, assets, and trust boundaries.
  • Attack the map: enumerate threats with frameworks like STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) and LINDDUN for privacy (Linkability, Identifiability, Non-repudiation, Detectability, Information disclosure, Unawareness, Non-compliance).
  • Decide: rate risk (likelihood Ă— impact), choose mitigations, document owners and timelines.

Quick glossary
  • Trust boundary: where different levels of trust meet (e.g., internet to VPC, app to data lake).
  • Asset: something valuable (PII dataset, service account, encryption keys).
  • Control: technical or process safeguard (TLS, IAM, masking, monitoring).
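
The "attack the map" step above can be sketched as a simple checklist generator: walk every flow in the diagram and ask one question per STRIDE category. This is a minimal illustration; the flow names are hypothetical, and a real review would pair each prompt with notes and a decision.

```python
# STRIDE categories from the mental model above.
STRIDE = [
    "Spoofing", "Tampering", "Repudiation",
    "Information Disclosure", "Denial of Service",
    "Elevation of Privilege",
]

def stride_checklist(elements):
    """One (element, threat) prompt per STRIDE category, per element."""
    return [(el, threat) for el in elements for threat in STRIDE]

# Hypothetical flows from a data-platform DFD.
flows = ["API -> Ingestion", "Ingestion -> Data Lake", "ETL -> Warehouse"]
checklist = stride_checklist(flows)
# 3 flows x 6 categories = 18 prompts to walk through in a review
```

The same pattern works for LINDDUN: swap in the privacy categories and run the list against each data store.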

A simple step-by-step process

  1. Scope: define the feature, data categories (PII, PCI, health), and success criteria.
  2. Map: draw a data flow diagram (DFD). Mark trust boundaries and assets.
  3. Enumerate threats: run STRIDE for security and LINDDUN for privacy across each data flow and store.
  4. Rate risks: score likelihood and impact (e.g., Low/Med/High). Prioritize High-High first.
  5. Mitigate: select controls (encryption, IAM, network isolation, tokenization, data minimization, logging).
  6. Decide: record accepted, mitigated, or deferred risks with owners and dates.
  7. Validate: tabletop run-through; add tests/monitoring to catch regressions.

Lightweight scoring rubric (use consistently)
  • Likelihood: Low (rare skill/access), Medium (possible via misconfig), High (common misstep or public exposure).
  • Impact: Low (non-sensitive, limited blast radius), Medium (internal data/partial outage), High (PII/financial/availability for many users).
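
The rubric above can be made concrete with a few lines of code. This is a minimal sketch: the numeric mapping (Low=1, Medium=2, High=3) is an assumption for illustration, not a standard, and the example threats are hypothetical.

```python
# Assumed numeric mapping for the Low/Medium/High rubric above.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_score(likelihood, impact):
    """Likelihood x impact, per the rubric."""
    return LEVELS[likelihood] * LEVELS[impact]

def prioritize(threats):
    """threats: list of (name, likelihood, impact); highest risk first."""
    return sorted(threats, key=lambda t: risk_score(t[1], t[2]), reverse=True)

ranked = prioritize([
    ("PII in logs", "High", "High"),        # score 9
    ("Topic flood DoS", "Medium", "Medium"),  # score 4
    ("Spoofed API token", "Low", "High"),     # score 3
])
# ranked[0] is the High/High threat: address it first
```

Whatever scale you choose, the point of step 4 is consistency: the same scoring applied across reviews makes priorities comparable over time.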

Worked examples

Example 1: PII ingestion to data lake

Context: Batch ingestion from a public API -> ingestion service -> object storage (data lake) -> ETL -> warehouse.

  • Threats (STRIDE):
    • Spoofing: fake API source tokens.
    • Tampering: data altered in transit to storage.
    • Info disclosure: PII exposed in logs or non-prod copies.
    • DoS: ingestion spikes exhaust compute/quota.
    • EoP: over-privileged service role writes to all buckets.
  • Privacy (LINDDUN):
    • Identifiability/linkability via persistent identifiers across datasets.
    • Non-compliance: retention longer than policy; missing consent.
  • Mitigations:
    • Mutual TLS, signed requests, narrow IAM roles, bucket policies with encryption at rest (KMS).
    • Mask PII in logs; separate prod/non-prod datasets with scrubbed fixtures.
    • Lifecycle rules and retention policies; tokenization for BI.
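
The "mask PII in logs" mitigation can be sketched with a standard-library logging filter. This is a simplified illustration: the regex only catches email addresses, and real PII detection needs broader patterns (names, account numbers, tokens).

```python
import logging
import re

# Simplified email pattern; real detection needs more patterns than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

class PiiMaskFilter(logging.Filter):
    """Redact email addresses from log messages before they are emitted."""
    def filter(self, record):
        record.msg = EMAIL_RE.sub("[REDACTED]", str(record.msg))
        return True  # never drop the record, only scrub it

logger = logging.getLogger("ingestion")
logger.addFilter(PiiMaskFilter())
logger.warning("Failed row for user alice@example.com")
# The address is redacted before the record reaches any handler.
```
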

Example 2: Kafka streaming with consumer groups

Context: Producers -> Kafka -> stream processing -> feature store.

  • Threats:
    • Tampering: unauthenticated producer pushes poisoned events.
    • DoS: topic flood; consumer lag grows.
    • Info disclosure: plaintext traffic or open security groups.
    • EoP: consumer service account reads all topics.
  • Mitigations:
    • SASL authentication, ACLs per topic, network rules.
    • Quotas and retention; autoscaling consumers; DLQ for bad events.
    • TLS in transit; secret management; least-privilege roles.
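
The DLQ mitigation above boils down to: validate each event, and route failures aside instead of crashing the consumer. A minimal sketch, with a hypothetical required-field schema:

```python
# Hypothetical schema: fields every event must carry.
REQUIRED_FIELDS = {"event_id", "user_id", "timestamp"}

def route(events):
    """Split events into (valid, dead_letter) instead of failing the batch."""
    ok, dlq = [], []
    for event in events:
        if REQUIRED_FIELDS <= event.keys():
            ok.append(event)
        else:
            dlq.append({"event": event, "reason": "missing required fields"})
    return ok, dlq

valid, dead = route([
    {"event_id": 1, "user_id": "u1", "timestamp": 1700000000},
    {"event_id": 2},  # poisoned/incomplete event goes to the DLQ
])
```

In a real pipeline the dead-letter list would be a separate Kafka topic with its own retention and alerting, so bad events are inspectable without blocking consumers.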

Example 3: BI tool connected to warehouse

Context: Analysts use a BI tool with service accounts. Data includes customer tables and derived aggregates.

  • Threats:
    • Info disclosure: direct table access to raw PII.
    • Repudiation: no audit of who queried what.
    • Non-compliance: exporting full tables to spreadsheets.
  • Mitigations:
    • Row/column-level security; views over raw tables; data masking.
    • Query audit logs with alerts; just-in-time privileged access.
    • Disable exports for sensitive datasets; aggregate-only sharing.
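
Data masking for BI-facing views can be as simple as the two helpers below. These are illustrative, not a complete masking policy: in practice, masking is usually enforced in the warehouse (masked views or column policies) rather than in application code.

```python
def mask_email(email: str) -> str:
    """Keep only the domain: 'alice@example.com' -> '***@example.com'."""
    _, _, domain = email.partition("@")
    return f"***@{domain}" if domain else "***"

def mask_account(number: str) -> str:
    """Show only the last four characters of an account number."""
    return "*" * max(len(number) - 4, 0) + number[-4:]
```

Masked values stay joinable within a report (same input, same output) while keeping raw PII out of analyst hands.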

How to run a security review meeting

  • People: author (design owner), security champion, data platform rep, privacy/compliance rep, observer.
  • Inputs: one-page context, DFD with trust boundaries, data classification, draft controls.
  • Agenda (30–45 min):
    • 5 min scope and assumptions.
    • 10 min walk the DFD.
    • 15 min threat enumeration (STRIDE + LINDDUN).
    • 10 min decisions: mitigations, owners, timelines.
    • 5 min validation plan and follow-ups.

Templates you can copy

One-page review template
Context: feature/pipeline summary
Data: categories (PII/PCI/PHI), sources, destinations
Diagram: DFD with trust boundaries
Assumptions: key dependencies and out-of-scope
Threats: top 5 with rationale
Controls: selected mitigations by component
Decisions: accept/mitigate/defer with owners and dates
Validation: tests, monitoring, tabletop date

DFD quick notation
[External] -> (Service) -> [Queue/Topic] -> (ETL/Job) -> [Storage]
Trust boundary: =======
Annotate assets: PII, keys, secrets
Annotate controls: TLS, IAM, KMS, RLS/CLS, tokenization
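
The notation above can also be kept as data, which makes trust-boundary checks mechanical: assign each component a zone and flag every flow whose endpoints differ. The zone names and components here are assumptions matching the sample DFD, not a prescribed taxonomy.

```python
# Assumed trust zones for the sample DFD components above.
ZONES = {
    "External API": "internet",
    "Ingestion Service": "vpc",
    "Queue": "vpc",
    "ETL Job": "vpc",
    "Data Lake": "managed",
    "Warehouse": "managed",
}

def boundary_crossings(flows):
    """Flows that cross a trust boundary deserve extra review attention."""
    return [(src, dst) for src, dst in flows if ZONES[src] != ZONES[dst]]

flows = [
    ("External API", "Ingestion Service"),  # internet -> vpc: crossing
    ("Ingestion Service", "Queue"),         # vpc -> vpc: internal
    ("ETL Job", "Data Lake"),               # vpc -> managed: crossing
]
crossings = boundary_crossings(flows)
```
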

Exercises

Do these before the quick test. Keep your notes; you will reuse them.

  1. Exercise 1: Draw a DFD for an ingestion-to-warehouse flow and list threats using STRIDE/LINDDUN. Prioritize them and pick the top three mitigations.
  2. Exercise 2: Given a set of threats, score the risks, propose controls, and write review decisions with owners and timelines.

Self-check checklist
  • DFD shows all external actors, data stores, and trust boundaries.
  • Each flow has at least one STRIDE and one LINDDUN consideration.
  • Risks are prioritized with a clear rationale.
  • Mitigations map to specific components and are testable.
  • Decisions include owners and due dates.

Common mistakes and how to self-check

  • Missing non-production risk: ensure scrubbed data in dev/test; forbid production PII in sandboxes.
  • Over-privileged roles: review IAM policies for least privilege; rotate keys.
  • Logging leaks: verify logs and metrics do not include sensitive values.
  • Unclear ownership: every mitigation has an owner and a date.
  • No validation: add tests (e.g., automated checks for encryption at rest, RLS/CLS policies) and monitoring alerts.
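
The "automated checks" idea can be sketched as a policy test over an inventory export. The inventory format here is an assumption; in practice this data would come from your cloud provider's API or your IaC state, and the check would run in CI.

```python
def unencrypted_buckets(inventory):
    """Names of buckets missing a customer-managed encryption key."""
    return [b["name"] for b in inventory if not b.get("kms_key")]

# Hypothetical inventory export; field names are illustrative.
inventory = [
    {"name": "raw-pii", "kms_key": "arn:aws:kms:...:key/abc"},
    {"name": "scratch", "kms_key": None},  # violation: no encryption key
]
violations = unencrypted_buckets(inventory)
# A CI job would fail the build if this list is non-empty.
```
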

Practical projects

  • Secure Data Lake Starter: baseline bucket policies, encryption, access patterns, and lifecycle rules applied via IaC.
  • Streaming Guardrails: Kafka ACLs, quotas, TLS, consumer lag alerting, and DLQ pattern.
  • Warehouse Safety Kit: implement column masking, row-level security, and audit logging with example roles.

Learning path

  • Start: Threat modeling basics (STRIDE, LINDDUN) and simple DFDs.
  • Next: Cloud IAM and network isolation patterns for data platforms.
  • Then: Data privacy techniques (masking, tokenization, minimization, retention).
  • Advance: Automating security checks in CI and platform policies.

Mini challenge

You must enable data sharing with a partner for weekly aggregates. Write two options: (1) share aggregate tables only, (2) share a view with row/column filters. For each, list 3 threats and 3 mitigations, then recommend one option with rationale.

Practice Exercises

2 exercises to complete

Instructions

Draw a simple DFD for this scenario: Public API -> Ingestion Service -> Queue -> ETL Job -> Data Lake -> Warehouse -> BI Tool. Mark trust boundaries (internet to VPC; VPC to managed services; prod to non-prod). Identify assets (PII dataset, service accounts, keys).

  1. For each flow and store, list at least one threat using STRIDE.
  2. List at least three privacy threats using LINDDUN.
  3. Prioritize the top five risks (likelihood Ă— impact) and propose one mitigation each.

Expected Output
A diagram or structured list of components with at least 8 threats (security + privacy), a prioritized top-5 risk list, and five mapped mitigations tied to components.

Security Reviews And Threat Modeling — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
