What a Data Architect does
Data Architects design how data is moved, stored, secured, and made useful across an organization. You define standards and patterns, make platform choices, and guide teams to implement reliable, scalable, and compliant data systems.
- Day-to-day: shape data models and integration patterns, review designs, set governance rules, guide build vs. buy decisions, and align stakeholders.
- Typical deliverables: architecture blueprints, conceptual/logical/physical models, data contracts, governance policies, lineage and catalog standards, performance and cost guidelines, and reference implementations.
Example daily schedule
- 9:00 Stand-up with data engineers and analysts
- 10:00 Review proposed schema changes and data contracts
- 11:00 Meet with the security team to align on PII controls
- 13:00 Deep dive on performance/cost of a warehouse workload
- 15:00 Write/approve Architecture Decision Records (ADRs)
- 16:00 Mentoring and backlog triage
Who this is for
- Engineers or analysts who enjoy systems thinking, data modeling, and long-term platform strategy.
- People who like clarifying ambiguity, documenting decisions, and aligning diverse teams.
- Those who value reliability, data quality, security, and measurable business outcomes.
Prerequisites
- Comfort with databases (relational and/or cloud data warehouses) and SQL basics.
- Understanding of batch and streaming integration concepts.
- Ability to read/write technical documentation and communicate trade-offs clearly.
Nice-to-have (optional)
- Experience with ETL/ELT tools and orchestration
- Exposure to data governance and security frameworks
- Basic cloud knowledge (storage, compute, IAM)
Hiring expectations by level
Junior / Associate Data Architect
- Supports modeling and documentation under guidance
- Implements reference patterns, learns governance and security basics
- Contributes to small components and POCs
Mid-level Data Architect
- Owns a domain’s data models and integration patterns
- Drives data contracts and quality/observability standards
- Partners with engineers to deliver scalable, cost-aware solutions
Senior / Lead Data Architect
- Sets cross-domain standards and long-term platform strategy
- Leads complex migrations (e.g., on-prem to cloud)
- Balances security, privacy, performance, and cost; mentors others
Salary and growth
- Junior/Associate: ~$80k–$120k
- Mid-level: ~$110k–$160k
- Senior/Lead/Principal: ~$150k–$220k+
- Manager/Head of Data Architecture: ~$170k–$250k+
- Contract/Freelance: ~$90–$160/hr
These figures vary by country and company; treat them as rough ranges.
Where you can work
- Industries: finance, healthcare, retail, tech, manufacturing, government, SaaS
- Teams: data platform, analytics engineering, data governance, security, enterprise architecture
- Work modes: product companies, consultancies, agencies, and internal platform teams
Skill map
As a Data Architect, you’ll combine strategy, modeling, integration, governance, security, and delivery skills. The skills below match the Skills section of this page.
- Data Architecture Strategy: principles, roadmaps, and decision frameworks
- Conceptual and Logical Data Modeling: business entities and relationships
- Physical Data Modeling and Storage Design: schemas, indexing/partitioning
- Dimensional Modeling for Analytics: star/snowflake, facts, slowly changing dimensions
- Integration Architecture (ETL/ELT): batch vs. streaming, CDC, orchestration
- Data Governance and Stewardship: policies, roles, compliance, lifecycle
- Security and Privacy Architecture: PII/PHI controls, encryption, access models
- Data Quality and Observability: rules, SLAs/SLOs, monitoring and alerting
- Metadata and Lineage Architecture: catalogs, lineage, data contracts
- Performance and Scalability: workload management, cost optimization
- Architecture Delivery and Communication: ADRs, roadmaps, stakeholder alignment
Mini task: Model a domain
Sketch a conceptual model for a ride-sharing platform. Identify entities (Rider, Driver, Trip, Payment), key relationships (Rider—Trip, Driver—Trip), and 3–5 business rules (e.g., one Trip has one Rider but can have multiple status changes). Then propose 2–3 quality checks (e.g., trip_end_time >= trip_start_time).
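One way to make the sketch concrete is to write the entities down as plain Python dataclasses and express a couple of the rules as checks. This is a minimal sketch, not a reference model: the field names, the `StatusChange` entity, and the `check_trip` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class Rider:
    rider_id: str


@dataclass
class Driver:
    driver_id: str


@dataclass
class StatusChange:
    status: str            # e.g. "requested", "started", "completed" (hypothetical values)
    changed_at: datetime


@dataclass
class Trip:
    trip_id: str
    rider_id: str          # business rule: one Trip has exactly one Rider
    driver_id: str
    trip_start_time: datetime
    trip_end_time: Optional[datetime] = None   # None while the trip is in progress
    status_changes: List[StatusChange] = field(default_factory=list)


@dataclass
class Payment:
    payment_id: str
    trip_id: str           # each Payment belongs to one Trip
    amount_cents: int


def check_trip(trip: Trip, rider_ids: Set[str], driver_ids: Set[str]) -> List[str]:
    """Return the names of the quality checks this trip fails."""
    failures = []
    # Quality check from the task: trip_end_time >= trip_start_time.
    if trip.trip_end_time is not None and trip.trip_end_time < trip.trip_start_time:
        failures.append("end_before_start")
    # Referential checks: the Rider and Driver must be known entities.
    if trip.rider_id not in rider_ids:
        failures.append("unknown_rider")
    if trip.driver_id not in driver_ids:
        failures.append("unknown_driver")
    return failures
```

Writing the rules as code early tends to expose gaps in the conceptual model, such as whether a Trip may exist before a Driver is assigned.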
Practical portfolio projects
- Data Platform Blueprint
  Deliverables: current-state vs. target-state diagrams, principles, NFRs, platform choices, ADRs, and a 3–6 month roadmap.
- Analytics Warehouse with Dimensional Modeling
  Deliverables: star schema for a business domain (e.g., Orders), SCD handling, data contracts, and sample queries with performance notes.
- Streaming Integration (CDC to Analytics)
  Deliverables: CDC design, idempotent processing, schema evolution policy, replay strategy, and end-to-end lineage diagram.
- Governance and Catalog Rollout
  Deliverables: data classification scheme, ownership (stewards), glossary terms, catalog structure, and steward workflow.
- Performance and Cost Optimization Plan
  Deliverables: workload profiling, partitioning/clustering plan, storage tiering, and a before/after cost-performance comparison.
- Quality and Observability Framework
  Deliverables: tiered SLAs/SLOs, data tests, alert thresholds, escalation paths, and dashboard mockups.
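For the streaming integration project, idempotent processing and replay are easier to reason about with a small example. The sketch below assumes each change event carries a key and a per-key increasing sequence number; the event shape (`key`, `seq`, `op`, `row`) and the `apply_cdc_event` helper are hypothetical and not tied to any particular CDC tool.

```python
from typing import Any, Dict


def apply_cdc_event(
    table: Dict[str, Any],
    versions: Dict[str, int],
    event: Dict[str, Any],
) -> bool:
    """Apply one change event to an in-memory table.

    Returns True if the event changed state, False if it was skipped
    as a duplicate or stale event.
    """
    key = event["key"]
    seq = event["seq"]
    # Idempotency guard: ignore anything at or below the last applied
    # sequence number for this key, so replays are harmless.
    if seq <= versions.get(key, -1):
        return False
    if event["op"] == "delete":
        table.pop(key, None)
    else:
        # Inserts and updates are both treated as upserts.
        table[key] = event["row"]
    # The version is kept even after a delete, so a stale upsert
    # replayed later cannot resurrect the row.
    versions[key] = seq
    return True
```

With this guard, replaying the same batch of events a second time leaves the table unchanged, which is the property a replay strategy usually depends on.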
Mini task: Write one ADR
Decision: ELT vs. ETL for the new warehouse. Record the context, options, decision, consequences (performance, cost, team skills), date, and owner.
Interview preparation checklist
- Explain conceptual vs. logical vs. physical models with a simple example.
- Design a star schema and justify grain, facts, and dimensions.
- Describe your approach to CDC, schema evolution, and idempotency.
- Discuss PII handling: classification, masking, key management, and auditability.
- Show how you measure data quality and define SLAs/SLOs.
- Walk through an ADR you wrote and the trade-offs involved.
- Estimate and optimize a workload for performance and cost.
- Describe lineage and catalog strategies for self-serve analytics.
- Communicate a migration plan with milestones and rollback steps.
Practice whiteboard prompt
Design a data platform for a subscription business. Include ingestion (batch + events), storage layers, serving patterns, governance, and how you enforce contracts. Call out at least three NFRs and how you’ll validate them.
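When the prompt asks how contracts are enforced, it can help to show the idea in a few lines. The sketch below treats a contract as a mapping from field name to expected Python type and validates one record against it. The contract contents and the `validate` function are assumptions for illustration; real platforms often use a schema registry or a schema language instead.

```python
from typing import Any, Dict, List

# Hypothetical contract for a subscription event: field name -> expected type.
CONTRACT: Dict[str, type] = {
    "subscription_id": str,
    "plan": str,
    "monthly_price_cents": int,
}


def validate(record: Dict[str, Any], contract: Dict[str, type] = CONTRACT) -> List[str]:
    """Return a list of contract violations; an empty list means the record conforms."""
    errors = []
    for name, expected in contract.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    return errors
```

A producer could run a check like this before publishing, and a consumer could run it on ingestion, routing failing records to a quarantine area rather than dropping them silently.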
Common mistakes (and how to avoid them)
- Big-bang designs with no delivery proof: favor iterative delivery with reference implementations.
- Over-modeling early: start conceptual, validate with real queries, then refine.
- Ignoring data quality and contracts: define tests and contracts before integrating consumers.
- One-size-fits-all tooling: select patterns per workload (OLTP vs. analytics vs. streaming).
- Neglecting security/privacy early: classify data and define controls up front.
- Unclear ownership: assign stewards and producers/consumers with clear responsibilities.
Mini task: Quality-first plan
Pick one domain and write 5 concrete data tests (nulls, ranges, referential integrity, duplicates, late-arriving data), and name who triages failures.
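The five test types can be sketched in plain Python over a list of records. This is a minimal sketch assuming a hypothetical Orders domain; the field names, the amount bounds, and the `run_checks` helper are illustrative choices, and in practice the same rules would usually live in a data testing tool or in SQL.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Set


def run_checks(
    orders: List[Dict[str, Any]],
    customer_ids: Set[str],
    max_lateness: timedelta,
) -> Dict[str, bool]:
    """Run five basic data tests; each result is True when the test passes."""
    order_ids = [o["order_id"] for o in orders]
    return {
        # Nulls: every order must reference a customer.
        "no_null_customer": all(o["customer_id"] is not None for o in orders),
        # Ranges: amounts must fall inside assumed business bounds.
        "amount_in_range": all(0 <= o["amount"] <= 10_000 for o in orders),
        # Referential integrity: non-null customer_id must exist upstream.
        "customer_exists": all(
            o["customer_id"] in customer_ids
            for o in orders
            if o["customer_id"] is not None
        ),
        # Duplicates: order_id must be unique.
        "no_duplicate_ids": len(order_ids) == len(set(order_ids)),
        # Late-arriving data: load time must be within the allowed lag.
        "arrived_on_time": all(
            o["loaded_at"] - o["event_time"] <= max_lateness for o in orders
        ),
    }
```

Each key in the result can be mapped to an owner in the triage plan, so a failing test has a named person or team responsible for it.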
Learning path
Recommended sequence that balances fundamentals and delivery:
- Conceptual and Logical Data Modeling
- Dimensional Modeling for Analytics
- Integration Architecture (ETL/ELT)
- Data Quality and Observability
- Metadata and Lineage Architecture
- Security and Privacy Architecture
- Performance and Scalability
- Data Governance and Stewardship
- Physical Data Modeling and Storage Design
- Data Architecture Strategy
- Architecture Delivery and Communication
4-week starter plan
- Week 1: Conceptual/logical modeling + mini project
- Week 2: Dimensional modeling + write 10 validation tests
- Week 3: Integration patterns + CDC design note
- Week 4: Security/privacy basics + one ADR and a small roadmap
Focus sprints
- Cost and performance sprint: profile, partition strategy, right-size compute, caching
- Governance sprint: catalog setup, ownership, classification, access policies
Next steps
- Pick one portfolio project and deliver a thin slice within two weeks.
- Document everything: models, ADRs, contracts, test plans, runbooks.
- Practice the interview checklist aloud and refine your artifacts.
Pick a skill to start — see the Skills section below.