Defining Target Data Architecture

Learn how to define a target data architecture for free, with explanations, exercises, and a quick test (for Data Architects).

Published: January 18, 2026 | Updated: January 18, 2026

Who this is for

Current and aspiring Data Architects, Senior Data Engineers, and Platform Leads who need to define a clear, pragmatic target data architecture that aligns with business outcomes.

Why this matters

Defining a target data architecture sets the north star for data platforms. It guides investment, prevents tool sprawl, and ensures data products meet reliability, cost, and compliance goals. Real tasks you will face:

  • Choosing between batch, micro-batch, and streaming for critical use cases.
  • Designing conceptual and logical diagrams for stakeholders, engineers, and security.
  • Encoding principles, SLAs/SLIs, and governance into the platform.
  • Sequencing a migration roadmap from current to target state.

Concept explained simply

Target data architecture is a technology-agnostic blueprint of how data will flow, be governed, and be served to users. You define capabilities, quality attributes, and interaction patterns first—then map technologies.

Mental model: Flow + Planes + Guardrails

  • Flow: Ingest → Store → Process → Serve → Observe. Optimize this loop for latency, cost, and reliability.
  • Planes: Data plane (movement/storage), Control plane (orchestration/catalog/policies), Governance plane (security, privacy, quality).
  • Guardrails: Principles and NFRs (non-functional requirements) that every solution must meet.
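The Flow loop above can be sketched as a tiny pipeline skeleton. This is an illustrative Python sketch, not a real framework; the stage functions and the Record type are invented for the example:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Record:
    # Minimal illustrative payload moving through the loop
    source: str
    payload: dict
    tags: list = field(default_factory=list)

def ingest(raw: dict) -> Record:
    return Record(source=raw.get("source", "unknown"), payload=raw)

def store(rec: Record) -> Record:
    rec.tags.append("stored:raw")          # e.g. land in the raw zone
    return rec

def process(rec: Record) -> Record:
    rec.tags.append("processed:curated")   # e.g. cleanse and conform
    return rec

def serve(rec: Record) -> Record:
    rec.tags.append("served:mart")         # e.g. publish to a mart or API
    return rec

def observe(rec: Record) -> Record:
    rec.tags.append("observed")            # emit metrics per stage
    return rec

STAGES: list[Callable[[Record], Record]] = [store, process, serve, observe]

def run_flow(raw: dict) -> Record:
    rec = ingest(raw)
    for stage in STAGES:
        rec = stage(rec)
    return rec

rec = run_flow({"source": "orders", "order_id": 42})
print(rec.tags)  # stages applied in order
```

Each stage stays swappable: replacing a micro-batch `process` with a streaming variant changes one function, not the loop.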

Common NFRs to fix early

  • Freshness/latency targets (e.g., T+5 min for ops dashboards, T+24h for finance).
  • Reliability (e.g., 99.9% successful runs/month).
  • Scalability (peak volumes, concurrency).
  • Cost per query/pipeline, storage tiers.
  • Compliance (PII handling, residency, retention, lineage).
  • Security (RBAC/ABAC, encryption in transit/at rest, key management).
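A practical way to make these NFRs guardrails rather than slideware is to capture them as structured, checkable targets that pipelines can be evaluated against. A minimal sketch; the field names and threshold values are assumptions for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NfrTargets:
    max_freshness_minutes: float   # e.g. 5 for ops dashboards, 1440 for finance
    min_success_rate: float        # e.g. 0.999 successful runs/month
    max_cost_per_query_usd: float  # budget guardrail

def meets_targets(targets: NfrTargets, observed_freshness_min: float,
                  observed_success_rate: float, observed_cost_usd: float) -> list[str]:
    """Return the list of violated guardrails (empty means compliant)."""
    violations = []
    if observed_freshness_min > targets.max_freshness_minutes:
        violations.append("freshness")
    if observed_success_rate < targets.min_success_rate:
        violations.append("reliability")
    if observed_cost_usd > targets.max_cost_per_query_usd:
        violations.append("cost")
    return violations

ops = NfrTargets(max_freshness_minutes=5, min_success_rate=0.999,
                 max_cost_per_query_usd=0.10)
print(meets_targets(ops, observed_freshness_min=7.5,
                    observed_success_rate=0.9995, observed_cost_usd=0.04))
# → ['freshness']
```

The same structure can later feed automated checks in CI or observability dashboards.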

A practical process to define the target state

  1. Capture business outcomes and use cases: decisions to support, SLAs, data domains, regulatory constraints.
  2. Set principles: e.g., "data as a product", "open formats first", "privacy by design", "automate everything".
  3. Define capability map: ingestion, storage, processing, serving, metadata, quality, governance, observability, CI/CD, cost management.
  4. Sketch conceptual architecture: components and planes, no vendor logos.
  5. Design logical architecture: flows, interfaces, policies, patterns (CDC, event streams, batch).
  6. Map patterns to use cases: per use case choose batch/stream, modeling approach, and serving method.
  7. Evaluate technology options: fit/gap against NFRs, TCO, team skills.
  8. Plan migration: interim states, dependencies, risk mitigation, KPIs.

Deliverables checklist

  • [ ] Principles (1 page)
  • [ ] Capability map (one diagram)
  • [ ] Conceptual and logical diagrams
  • [ ] Pattern catalog with selection criteria
  • [ ] Decision log and risk register
  • [ ] Roadmap with milestones and KPIs
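The decision log from the checklist can be as light as a list of structured records. A sketch assuming a Nygard-style architecture decision record (ADR) shape; the exact fields are a suggestion, not a standard:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Decision:
    # Lightweight ADR-style entry; fields are illustrative
    title: str
    context: str
    choice: str
    alternatives: list[str]
    consequences: str
    decided_on: date = field(default_factory=date.today)
    status: str = "accepted"

decision_log: list[Decision] = []
decision_log.append(Decision(
    title="Serving layer for BI",
    context="Dashboards need T+10 min freshness at low cost per query.",
    choice="Micro-batch loads into a semantic layer",
    alternatives=["Streaming materialized views", "Direct query on raw"],
    consequences="Cheaper than streaming; freshness capped at the batch interval.",
))

# A roadmap review can filter the log by status
accepted = [d.title for d in decision_log if d.status == "accepted"]
print(accepted)  # → ['Serving layer for BI']
```

Keeping alternatives and consequences next to each choice makes later re-evaluation (step 7's fit/gap analysis) much cheaper.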

Worked examples

1) E-commerce near-real-time inventory and customer 360

Drivers: low stock alerts in under 2 minutes; marketing audience building within 1 hour; PII protection.

  • Ingest: CDC from OLTP for orders, event stream for clicks.
  • Store: lakehouse (open formats + ACID tables) for raw/curated; warm warehouse-style serving for BI.
  • Process: streaming for inventory; micro-batch for 360 enrichment.
  • Serve: semantic layer for BI; feature extracts for campaigns.
  • Governance: data catalog, tag-based access, PII masking.

Why this works

Streaming satisfies sub-2-minute inventory latency; micro-batch keeps costs balanced for 360; open formats prevent lock-in; masking handles PII.

2) Finance with data residency and auditability

Drivers: GDPR, right to be forgotten, regional processing, strong lineage.

  • Ingest: secure transfer with schema validation.
  • Store: regional data zones; immutable raw + curated with retention policies.
  • Process: audit-friendly pipelines with change history.
  • Serve: standardized marts; restricted PII views.
  • Governance: policy engine, lineage, access reviews, deletion workflows.

Why this works

Regional storage and policy enforcement satisfy residency; immutable logs and lineage support audits; controlled views protect PII.
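The residency guardrail in this example boils down to a policy check on every write. A minimal, fail-closed sketch; the region names and policy table are invented for the example:

```python
# Illustrative residency guard: every write must land in a region
# permitted for the data subject's residency.
RESIDENCY_POLICY = {
    "eu": {"allowed_regions": {"eu-west", "eu-central"}},
    "us": {"allowed_regions": {"us-east"}},
}

def residency_allowed(subject_region: str, storage_region: str) -> bool:
    policy = RESIDENCY_POLICY.get(subject_region)
    if policy is None:
        return False  # fail closed: unknown residency is rejected
    return storage_region in policy["allowed_regions"]

print(residency_allowed("eu", "eu-west"))   # → True
print(residency_allowed("eu", "us-east"))   # → False
```

In a real platform this check would live in the policy engine and be evaluated centrally, not per pipeline; the fail-closed default is the important design choice.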

3) IoT telemetry + ML features

Drivers: millions of events/minute, second-level anomaly detection, feature reuse.

  • Ingest: event streaming for device telemetry.
  • Store: time-series friendly storage + lakehouse for long-term.
  • Process: streaming aggregations; micro-batch for feature computation.
  • Serve: low-latency API endpoints; feature store for ML reuse.
  • Observability: lag, throughput, data quality checks.

Why this works

Event streams meet high frequency; feature store decouples ML from pipelines; observability ensures SLA adherence.
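The observability bullet (lag against a second-level SLA) can be made concrete with a small lag check. Timestamps are epoch seconds and the 2-second target is illustrative:

```python
# Observability sketch: check end-to-end event lag against a second-level SLA.
def lag_seconds(event_time: float, processed_time: float) -> float:
    return processed_time - event_time

def sla_breaches(events: list[tuple[float, float]], max_lag_s: float) -> int:
    """Count events whose end-to-end lag exceeds the SLA target."""
    return sum(1 for ev, done in events if lag_seconds(ev, done) > max_lag_s)

events = [(100.0, 100.8), (101.0, 101.5), (102.0, 105.2)]  # (event, processed)
print(sla_breaches(events, max_lag_s=2.0))  # → 1
```

The same breach count, tracked per window, is what an alerting rule would fire on.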

How to choose patterns quickly

  • If freshness target ≤ 5 minutes → streaming/event-driven; otherwise micro-batch or batch.
  • Dimensional modeling for BI marts; medallion/layered for lakehouse; data vault for auditability and change tracking.
  • Serving: BI dashboards via SQL/semantic layer; ML via feature store; apps via APIs.
  • Cost: prefer micro-batch when minute-level freshness is not required; tier cold data to cheaper storage.
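The freshness rule above can be encoded directly as a small selection helper. The 5-minute streaming cutoff comes from the list; the 60-minute micro-batch cutoff is an assumption for the sketch:

```python
def choose_processing_pattern(freshness_target_min: float) -> str:
    """Map a freshness target to a processing pattern, per the rules above.

    ≤5 min → streaming; ≤60 min → micro-batch (assumed cutoff); else batch.
    """
    if freshness_target_min <= 5:
        return "streaming"
    if freshness_target_min <= 60:
        return "micro-batch"
    return "batch"

print(choose_processing_pattern(2))     # → streaming
print(choose_processing_pattern(10))    # → micro-batch
print(choose_processing_pattern(1440))  # → batch
```

Codifying the rule keeps per-use-case pattern choices consistent and reviewable instead of ad hoc.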

Self-check questions

  • What is the strictest freshness requirement? Design to that.
  • Where is PII? How is it protected end-to-end?
  • What happens on failure? Retries, idempotency, alerting?
  • Can we trace lineage from report back to source?
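The last self-check (lineage from report back to source) amounts to walking an upstream edge map. A toy sketch; in practice the edges would come from the catalog, and the node names here are invented:

```python
# Lineage sketch: child -> parents it was derived from
LINEAGE = {
    "churn_report": ["churn_mart"],
    "churn_mart": ["curated_events", "curated_customers"],
    "curated_events": ["raw_events"],
    "curated_customers": ["raw_crm"],
}

def trace_to_sources(node: str) -> set[str]:
    parents = LINEAGE.get(node, [])
    if not parents:          # no upstream edges: this is a source
        return {node}
    sources: set[str] = set()
    for parent in parents:
        sources |= trace_to_sources(parent)
    return sources

print(sorted(trace_to_sources("churn_report")))  # → ['raw_crm', 'raw_events']
```

If this walk cannot be completed for a report, the lineage self-check fails and the gap goes on the risk register.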

Prerequisites

  • Comfort with data pipelines, SQL, and basic distributed systems concepts.
  • Understanding of data governance basics (privacy, access control, lineage).
  • Ability to read/create simple architecture diagrams.

Learning path

  1. Gather business outcomes and NFRs; write crisp principles.
  2. Draft capability map and conceptual diagram.
  3. Define per-use-case patterns (batch/stream, modeling, serving).
  4. Add logical flows, interfaces, governance controls.
  5. Evaluate tech options against NFRs and team skills.
  6. Create migration roadmap with interim states and KPIs.
  7. Review with stakeholders; iterate and finalize decision log.

Exercises

Do these to lock in the skill. Write your answers; compare with the solutions provided.

Exercise 1 — Map NFRs to architecture choices

Scenario: A subscription business needs churn dashboards refreshed within 10 minutes, the daily active user report must stay cheap to produce, and PII must be masked for analysts.

Tasks:
  • Write 3–5 principles that guide your design.
  • Set explicit targets for freshness, reliability, and cost.
  • Choose ingestion/processing/serving patterns and justify them.
  • Describe end-to-end PII handling (tagging, masking, audit).

Expected output: a one-page proposal listing principles, NFR targets (freshness, reliability, cost), selected patterns (e.g., micro-batch + semantic layer), and PII controls.

Exercise 2 — Sketch a target conceptual and logical design

Scenario: A logistics firm wants real-time vehicle tracking (≤ 30 sec), hourly ETA predictions, and monthly finance reconciliations.

Tasks: list capabilities, draw a conceptual diagram (text is fine), outline logical flows (what triggers what), and specify observability signals (lag, failure, cost).

Submission checklist
  • [ ] Clear principles and NFRs
  • [ ] Capability map and target freshness per use case
  • [ ] Conceptual diagram components (no vendors)
  • [ ] Logical flows and governance controls
  • [ ] KPIs to track success

Common mistakes and how to self-check

  • Starting with tools: Self-check: can you explain the why (use cases/NFRs) without naming a product?
  • Overusing real-time: Self-check: which decisions truly need sub-minute latency?
  • Ignoring governance: Self-check: where is PII tagged, masked, and audited?
  • No interim states: Self-check: what is the smallest deployable step that delivers value safely?
  • Unmeasured success: Self-check: what KPIs will prove the architecture works (e.g., freshness achieved, cost per query)?

Practical projects

  • Retailer blueprint: pick two use cases (marketing 360 and inventory). Produce principles, capability map, conceptual/logical diagrams, and a 2-phase migration plan.
  • Latency lab: build a tiny pipeline with both micro-batch and stream paths; measure end-to-end freshness and cost footprint; recommend the cheaper path that meets targets.
  • Governance first: implement a tagging scheme for PII and a masking policy across raw/curated/serving layers; validate with a sample dataset.
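The governance-first project can start as small as a tag lookup plus a deterministic mask. A sketch with invented tags and a truncated-hash pseudonym; a production platform would use a vetted tokenization or masking service instead:

```python
import hashlib

# Tag-based masking sketch: columns tagged "pii" are masked before serving.
COLUMN_TAGS = {"email": {"pii"}, "order_total": set(), "user_id": {"pii"}}

def mask(value: str) -> str:
    # Deterministic pseudonym so joins still work after masking
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def serve_row(row: dict) -> dict:
    return {
        col: mask(str(val)) if "pii" in COLUMN_TAGS.get(col, set()) else val
        for col, val in row.items()
    }

row = {"email": "a@example.com", "order_total": 42.5, "user_id": "u-1"}
print(serve_row(row)["order_total"])  # → 42.5 (non-PII passes through)
```

Applying the same tag lookup in raw, curated, and serving layers is what makes the policy "end-to-end" rather than a serving-layer afterthought.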

Next steps

  • Deepen security and governance patterns (policy as code, lineage, retention).
  • Refine modeling choices (dimensional, data vault, medallion) per use case.
  • Collaborate with platform engineering to codify the target state as IaC.

Mini challenge

Pick one of the worked examples and write a two-step migration plan: Phase 1 (90 days) and Phase 2 (next 90 days). Include risks and KPIs.

Example answer

Phase 1: Implement streaming ingest for critical feed, curated table with masking, and a basic semantic layer for the top dashboard; KPIs: freshness ≤ 2 min, pipeline success ≥ 99.5%.

Phase 2: Expand domain coverage, add lineage and automated quality checks, optimize cost tiers; KPIs: coverage +50%, alert MTTR < 15 min, cost per query -30%.



Defining Target Data Architecture — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

