What is Data Architecture Strategy (for a Data Architect)?
Data Architecture Strategy defines how your organization collects, stores, shares, governs, and activates data to achieve measurable business outcomes. As a Data Architect, you convert goals (e.g., faster analytics, ML enablement, compliance) into an actionable target architecture, a standards-based platform roadmap, and a realistic build-vs-buy plan.
What this unlocks for you
- Clear target state and migration path that stakeholders can fund and follow.
- Coherent technology choices aligned to scale, cost, and compliance needs.
- Data product model so domains can deliver reliable, discoverable datasets.
- Quarter-by-quarter roadmap that reduces risk and shows incremental value.
Who this is for
- Data Architects defining or modernizing an enterprise data platform.
- Senior Data/Analytics Engineers growing into architecture leadership.
- Platform Leads responsible for standardizing tooling and practices.
Prerequisites
- Comfort with data warehousing, batch/stream processing, and data modeling.
- Basic understanding of cloud services (object storage, compute, networking).
- Familiarity with CI/CD, IaC concepts, and observability basics.
Outcomes
- Define a target data architecture tied to business domains and outcomes.
- Evaluate build vs buy with a transparent scoring model.
- Author a platform roadmap with standards and adoption milestones.
- Create data product contracts with SLOs for quality and freshness.
- Plan for scale and cost with simple capacity and TCO models.
Learning path
1) Clarify business outcomes and domains
Identify key value streams, target use cases, and domain boundaries. Define success metrics.
2) Draft your target architecture
Map ingestion, storage, processing, serving, governance, and observability. Note key SLOs.
3) Create standards and selection criteria
Document conventions (naming, schemas, data contracts), SLAs/SLOs, and tech selection rules.
4) Evaluate build vs buy
Score options with weighted criteria: time-to-value, TCO, scalability, operability, lock-in, compliance.
5) Plan roadmap and costs
Quarterly increments with clear deliverables, pilots, adoption gates, and cost/scaling assumptions.
6) Align stakeholders
Review trade-offs, publish principles, agree on metrics and a change process.
Worked examples
Example 1 — Target architecture outline
# Domains: Sales, Marketing, Finance
# Core capabilities
- Ingestion: batch (files), streaming (events)
- Storage: object storage (raw), warehouse (curated), feature store (ML)
- Processing: orchestration, streaming jobs, dbt/ELT
- Serving: BI semantic layer, APIs, reverse ETL
- Governance: catalog, lineage, PII tagging, RBAC, data contracts
- Observability: data quality checks, freshness SLOs, cost dashboards
# ASCII view
[Sources] --(ingest)--> [Raw/Obj Store] --(ELT)--> [Warehouse]
    |
    +--(events)--> [Stream] --(proc)--> [Feature Store]
[Warehouse] --(BI/SQL)--> [Dashboards]
[Warehouse] --(Reverse ETL)--> [Ops Apps]
[Contracts + Catalog] span all layers
Mini task: list non-functional requirements (NFRs)
Write down target SLOs: availability, latency, freshness, data quality, and recovery time. Use numeric targets.
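For example, a minimal NFR sheet might look like the sketch below; the numbers are illustrative assumptions, not recommendations.
# Illustrative NFR/SLO targets (assumed values; tune to your context)
nfr_targets = {
    "availability_percent": 99.9,           # platform availability
    "query_latency_p95_seconds": 5,         # BI query latency at p95
    "freshness_minutes": 60,                # max lag of curated tables behind sources
    "quality_check_pass_rate_percent": 99,  # share of quality checks passing
    "recovery_time_objective_hours": 4,     # time to restore service after an incident
}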
Example 2 — Build vs Buy scoring
{
  "criteria_weights": {"time_to_value": 0.30, "TCO": 0.25, "scalability": 0.20, "operability": 0.15, "lock_in": 0.10},
  "scores": {
    "build": {"time_to_value": 2, "TCO": 3, "scalability": 4, "operability": 3, "lock_in": 5},
    "buy": {"time_to_value": 5, "TCO": 4, "scalability": 4, "operability": 4, "lock_in": 2}
  }
}
# Weighted total (0-5 scale, higher is better; lock_in is scored so that 5 = least lock-in)
# build = 2*.30 + 3*.25 + 4*.20 + 3*.15 + 5*.10 = 0.60 + 0.75 + 0.80 + 0.45 + 0.50 = 3.10
# buy = 5*.30 + 4*.25 + 4*.20 + 4*.15 + 2*.10 = 1.50 + 1.00 + 0.80 + 0.60 + 0.20 = 4.10
# Decision: Buy (higher total: faster time-to-value and solid TCO, accepting more lock-in)
Mini task: adapt the weights
Re-weight criteria for your context. If compliance matters most, increase that weight and re-score.
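If it helps with re-scoring, a small Python sketch can compute the weighted totals from the JSON above; it assumes the same 0-5 scale and criteria names.
# Weighted scoring helper (mirrors the structure of the JSON above)
weights = {"time_to_value": 0.30, "TCO": 0.25, "scalability": 0.20,
           "operability": 0.15, "lock_in": 0.10}
scores = {
    "build": {"time_to_value": 2, "TCO": 3, "scalability": 4, "operability": 3, "lock_in": 5},
    "buy":   {"time_to_value": 5, "TCO": 4, "scalability": 4, "operability": 4, "lock_in": 2},
}

def weighted_total(option_scores):
    # Sum of score * weight across all criteria
    return sum(option_scores[criterion] * weight for criterion, weight in weights.items())

for option, option_scores in scores.items():
    print(option, round(weighted_total(option_scores), 2))
# build 3.1, buy 4.1 -- matches the worked totals above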
Example 3 — Platform roadmap (4 quarters)
- Q1: Raw zone + catalog + identity/RBAC. Pilot ingestion for Sales.
- Q2: Curated warehouse + dbt standards + data contracts v1. First BI dashboards.
- Q3: Streaming pipeline + quality SLOs + cost FinOps dashboard.
- Q4: Feature store + reverse ETL + multi-domain onboarding playbook.
# Naming convention (example)
raw.<domain>_<source>_<entity>__y=<YYYY>/m=<MM>/d=<DD>
wh.<domain>.<model_type>_<entity> (e.g., wh.sales.dim_customer)
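As a quick sanity check on the raw-zone convention, a small helper (hypothetical function name, assuming daily partitioning) can generate example paths:
from datetime import date

def raw_path(domain: str, source: str, entity: str, d: date) -> str:
    # Follows raw.<domain>_<source>_<entity>__y=<YYYY>/m=<MM>/d=<DD>
    return f"raw.{domain}_{source}_{entity}__y={d:%Y}/m={d:%m}/d={d:%d}"

print(raw_path("sales", "crm", "opportunity", date(2025, 3, 1)))
# raw.sales_crm_opportunity__y=2025/m=03/d=01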
Example 4 — Technology selection criteria (YAML)
must_have:
  - supports_sql_acid
  - columnar_storage
  - fine_grained_rbac
  - region_residency_controls
nice_to_have:
  - time_travel
  - built_in_lineage
  - serverless_auto_scaling
constraints:
  - budget_tier: medium
  - talent_pool: strong_sql_medium_python
evaluation:
  - benchmark_tpch_sf100
  - security_review
  - operability_walkthrough
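One way to apply these criteria is a simple gate: reject any candidate missing a must-have, then rank the survivors by nice-to-have coverage before the deeper evaluation steps. A rough sketch follows; the candidate names and capability lists are assumptions for illustration.
must_have = {"supports_sql_acid", "columnar_storage", "fine_grained_rbac", "region_residency_controls"}
nice_to_have = {"time_travel", "built_in_lineage", "serverless_auto_scaling"}

# Hypothetical capability assessments; replace with your own findings
candidates = {
    "warehouse_a": {"supports_sql_acid", "columnar_storage", "fine_grained_rbac",
                    "region_residency_controls", "time_travel"},
    "warehouse_b": {"supports_sql_acid", "columnar_storage", "serverless_auto_scaling"},
}

for name, capabilities in candidates.items():
    missing = must_have - capabilities
    if missing:
        print(f"{name}: rejected, missing {sorted(missing)}")
    else:
        print(f"{name}: passes, nice-to-haves {len(nice_to_have & capabilities)}/{len(nice_to_have)}")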
Example 5 — Cost and scalability planning
# Simple monthly TCO model (illustrative)
warehouse_compute_hours = 400 # hrs
warehouse_price_per_hour = 4.0 # $/hr
object_storage_tb = 50 # TB
object_storage_price = 20 # $/TB
stream_msgs_million = 800 # million msgs
stream_price_per_million = 0.40 # $/M
monthly_cost = (warehouse_compute_hours * warehouse_price_per_hour) \
    + (object_storage_tb * object_storage_price) \
    + (stream_msgs_million * stream_price_per_million)
# monthly_cost = 1600 + 1000 + 320 = $2,920
Plan thresholds: when cost > budget or p95 latency > SLO, trigger scaling or optimization playbooks.
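To see where those thresholds might land as volumes grow (and to support the 3x estimate in the mini project below), the same model can be projected at higher scales. The sketch assumes all three cost drivers grow linearly with data volume, which is a simplification.
# Linear projection of the model above (assumes every driver scales with data volume)
def projected_monthly_cost(scale: float) -> float:
    return (400 * scale * 4.0) + (50 * scale * 20) + (800 * scale * 0.40)

for scale in (1, 3, 10):
    print(f"{scale}x volume: ${projected_monthly_cost(scale):,.0f}/month")
# 1x: $2,920   3x: $8,760   10x: $29,200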
Example 6 — Data product contract (JSON)
{
  "product": "customer_ltv",
  "version": "1.2.0",
  "owner": "growth_analytics",
  "schema": {
    "customer_id": "string",
    "ltv_usd": "number",
    "as_of_date": "date"
  },
  "SLOs": {
    "freshness_minutes": 60,
    "availability_percent": 99.5,
    "accuracy_percent": 98
  },
  "quality_checks": ["not_null(customer_id)", "non_negative(ltv_usd)"]
}
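To make the contract actionable, a minimal validation sketch like the one below can run the schema and quality checks against a record; the record shape and helper are assumptions, not a full framework.
# Minimal contract validation sketch for customer_ltv (illustrative only)
expected_types = {"customer_id": str, "ltv_usd": (int, float), "as_of_date": str}

def validate(record: dict) -> list:
    errors = []
    for field, expected in expected_types.items():
        if field not in record or not isinstance(record[field], expected):
            errors.append(f"schema violation: {field}")
    # Quality checks from the contract
    if not record.get("customer_id"):
        errors.append("not_null(customer_id) failed")
    if isinstance(record.get("ltv_usd"), (int, float)) and record["ltv_usd"] < 0:
        errors.append("non_negative(ltv_usd) failed")
    return errors

print(validate({"customer_id": "c-001", "ltv_usd": 120.5, "as_of_date": "2025-03-01"}))  # []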
Mini task: define one more product
Create a contract for a "marketing_attribution" product: list fields, SLOs, and 2 quality checks.
Drills and exercises
- Write three business outcomes and a measurable KPI for each.
- Sketch your target architecture and mark cross-cutting concerns.
- Define five tech selection must-haves and three nice-to-haves.
- Create a decision matrix for one build-vs-buy choice.
- Draft a 2-quarter roadmap with clear adoption gates.
- Write a data product contract with 2 SLOs and 2 quality checks.
- Estimate monthly TCO with at least three cost drivers.
Common mistakes and debugging tips
Jumping to tools before outcomes
Tip: Start with domains and use cases. Write success metrics first; only then shortlist tools.
Ignoring operating model
Tip: Define who owns data products, on-call, and access controls. Add these to your standards.
Underestimating cost at scale
Tip: Model 3x and 10x data volumes. Add budget alarms and autoscaling policies.
Vague contracts
Tip: Make schemas explicit, versioned, and tied to SLOs. Add breaking change rules.
One-time roadmap
Tip: Review quarterly. Use metrics (adoption, incident rate, cost) to adjust priorities.
Mini project: Design a platform strategy for a subscription app
Goal: Enable churn prediction and revenue analytics within 2 quarters.
Deliverables
- Target architecture diagram and NFRs.
- Tech selection criteria + build-vs-buy decision for streaming and warehouse.
- Two data product contracts: subscriptions, churn_features.
- Q1–Q2 roadmap with adoption gates and TCO estimate.
Acceptance checklist
- Each product has schema, ownership, and SLOs.
- Roadmap includes at least one pilot and a rollback plan.
- TCO modeled at current and 3x scale.
- Risks and mitigations listed (top 5).
Next steps
- Pick one domain and draft a minimal target architecture with SLOs.
- Write your first data product contract and add two quality checks.
- Estimate a 2-quarter roadmap and TCO at 1x and 3x scale.
- Then, take the Skill Exam below to validate your understanding.