
Topic 4 of 8

Architecture Diagrams And RFCs

Learn architecture diagrams and RFCs for free with explanations, exercises, and a quick test (for Data Architects).

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

As a Data Architect, you turn business and data needs into clear designs others can build safely and efficiently. Two core tools help you communicate and drive decisions: architecture diagrams and RFCs (Requests for Comments). Diagrams give a shared visual model of systems and data flows. RFCs document context, options, trade-offs, and decisions. Together they reduce misunderstandings, speed up reviews, and create a durable record.

  • Real tasks you will do: map ingestion pipelines, document trust boundaries for PII, present options for warehouse vs. lakehouse, align teams on migration plans, and justify tool choices with trade-offs.
  • Outcome: faster approvals, fewer reworks, and predictable delivery.

Concept explained simply

Diagrams show what exists and how it connects. RFCs explain why choices were made.

  • Diagrams: pictures of components, data flows, and boundaries to align engineers, analysts, and stakeholders.
  • RFCs: short, structured documents proposing a change, comparing options, and recording a decision and its consequences.

Mental model

Think of your work as a map and a travel note:

  • The map (diagram) helps everyone navigate.
  • The travel note (RFC) explains why you chose a route, what you traded off, and what to watch out for.

Core artifacts you will create

1) Architecture diagrams
  • C4 model levels (use what helps): System Context, Container, Component.
  • Data flow diagrams: sources, transformations, sinks; directions and protocols.
  • Lineage views: how a dataset is derived; include trust zones and PII tags.
  • Supporting: sequence diagrams for interactions, ERDs for data structures.
2) RFCs (Requests for Comments)
  • Purpose: propose or change an architecture with clear trade-offs.
  • Sections: Context, Problem, Goals/Non-goals, Constraints, Options, Decision, Consequences, Risks, Rollout, Open questions.
  • Lifecycle: Draft → Review → Decision → Implement → Record learnings.
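The section list above can be turned into a reusable scaffold so every RFC in the repository starts from the same skeleton. The helper below is a hypothetical sketch (the function name and header fields are assumptions, not part of any standard tool); it emits a Markdown skeleton you can commit to version control next to your diagrams.

```python
# Hypothetical helper: scaffold an RFC skeleton with the standard sections.
RFC_SECTIONS = [
    "Context", "Problem", "Goals / Non-goals", "Constraints",
    "Options", "Decision", "Consequences", "Risks",
    "Rollout", "Open questions",
]

def rfc_skeleton(title: str, author: str, status: str = "Draft") -> str:
    """Return a Markdown RFC skeleton ready to fill in and commit."""
    header = [f"# RFC: {title}", f"Author: {author}", f"Status: {status}", ""]
    body = []
    for section in RFC_SECTIONS:
        body += [f"## {section}", "", "TODO", ""]
    return "\n".join(header + body)

print(rfc_skeleton("Warehouse vs. Lakehouse", "data-architecture@example.com"))
```

Keeping the skeleton in code (rather than a copy-pasted document) makes it easy to evolve the template for all future RFCs at once.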

Standards and conventions

  • Naming: consistent product and dataset names; avoid ambiguous acronyms.
  • Legend: shapes, colors, and line styles explained in a small legend box.
  • Boundaries: draw trust zones (Public, Corp, Restricted) and note PII/sensitive tags.
  • Arrows: show direction and protocol (e.g., JDBC, CDC, HTTPS). Label frequency and latency (e.g., hourly, sub-second).
  • Versioning: store source diagrams and RFCs in version control; put a version and date on the document.
  • Single source of truth: one canonical diagram per scope; link variants from it (scope-specific views).
  • Accessibility: simple shapes, limited color palette, descriptive text; avoid tiny fonts.
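One way to satisfy the versioning convention is diagrams-as-code: the diagram source is plain text, so it diffs and reviews like any other file. The sketch below emits Graphviz DOT with directed, labeled arrows (protocol and frequency on each edge); the component names are illustrative, and real diagrams would also add trust-zone clusters and a legend.

```python
# Illustrative diagrams-as-code: emit Graphviz DOT with labeled, directed
# edges so the diagram source can live in version control next to the RFC.
edges = [
    ("AppDB", "S3Staging", "JDBC extract, nightly"),
    ("S3Staging", "Spark", "Parquet read, nightly"),
    ("Spark", "Warehouse", "Warehouse API load, by 06:00"),
]

def to_dot(edges):
    lines = ["digraph pipeline {", "  rankdir=LR;"]  # left-to-right flow
    for src, dst, label in edges:
        lines.append(f'  {src} -> {dst} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(edges))
```

Rendering the output with `dot -Tpng` produces the picture, but the reviewable artifact is the text.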

Worked examples

Example 1: Batch warehouse ingestion


Goal: Nightly ingest from App DB to Warehouse.

  • Containers: App DB (Postgres), Orchestrator (Airflow), Object Storage (S3), Compute (Spark), Warehouse (BigQuery/Redshift), BI Tool.
  • Flow: App DB → Extract to S3 (CSV/Parquet) → Spark transform → Load to Warehouse → BI dashboards refresh.
  • Labels: Frequency: nightly; SLT (service-level target): loaded by 6 AM; Protocols: JDBC, S3 API, Warehouse API.
  • Trust zones: App DB and S3 in Corp zone, Warehouse in Restricted zone; PII masked in transform step.

Key trade-off: simple and cost-effective, but not real-time.
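The "PII masked in transform step" label above might translate into code like this. It is a minimal sketch: the column names and the inline salt are assumptions, and a production pipeline would pull the salt from a secrets manager or use a tokenization service instead.

```python
import hashlib

# Illustrative PII masking for the transform step: salted-hash identifying
# columns so values stay joinable but are no longer readable.
SALT = b"example-salt"  # assumption: in practice, load from a secrets manager
PII_COLUMNS = {"email", "phone"}

def mask_row(row: dict) -> dict:
    masked = {}
    for col, value in row.items():
        if col in PII_COLUMNS and value is not None:
            digest = hashlib.sha256(SALT + str(value).encode()).hexdigest()
            masked[col] = digest[:16]  # truncated hash keeps joins stable
        else:
            masked[col] = value
    return masked

print(mask_row({"user_id": 42, "email": "ana@example.com", "plan": "pro"}))
```

Because the hash is deterministic for a given salt, downstream joins on the masked column still work, which is why this style of masking fits a Corp-to-Restricted boundary crossing.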

Example 2: Real-time CDC with schema governance


Goal: Sub-second ingestion for real-time metrics.

  • Containers: Source DB (MySQL), CDC (Debezium), Kafka, Schema Registry, Stream Processor (Flink), Lakehouse (Delta/Iceberg), Serving Store (Elasticsearch), Metrics Service.
  • Flow: MySQL binlog → Debezium → Kafka (with Schema Registry) → Flink transforms → Lakehouse and Serving Store.
  • Labels: Latency: < 2s; Data contracts enforced via schemas; PII hashing in Flink.
  • Boundaries: Internet ingress isolated; Restricted zone for PII.

Trade-offs: Low latency and fresh data vs. higher ops complexity.
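The "data contracts enforced via schemas" label above is normally handled by a schema registry with Avro or Protobuf; as a minimal stand-in for the idea, this sketch checks each record against a declared contract (field names and types here are made up for the example).

```python
# Minimal sketch of a data contract check, standing in for what a schema
# registry (e.g., Avro + Schema Registry) enforces on each streamed record.
CONTRACT = {
    "order_id": int,
    "amount_cents": int,
    "currency": str,
}

def validate(record: dict, contract: dict = CONTRACT) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

print(validate({"order_id": 7, "amount_cents": 1999, "currency": "EUR"}))
print(validate({"order_id": "7", "amount_cents": 1999}))
```

A real registry adds what this sketch cannot: compatibility rules between schema versions, so producers cannot break consumers silently.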

Example 3: Governance and PII controls


Goal: Ensure access control and masking for sensitive data.

  • Add a Data Catalog + Policy Engine (e.g., tags, row/column-level policies).
  • Diagram layers: ingest, storage, processing, access; overlay tags: PII, PCI.
  • Data lineage: Raw → Clean → Curated; policy enforcement on Curated only.

Trade-offs: Strong compliance vs. added complexity in data access paths.
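The policy engine described above can be pictured as tag-based column filtering on the Curated layer. This is a hypothetical sketch (the tags, roles, and column names are invented for illustration); real catalogs evaluate far richer row- and column-level policies.

```python
# Hypothetical policy check: a role sees a column only if it is cleared
# for every tag on that column. Enforcement applies to the Curated layer.
COLUMN_TAGS = {"email": {"PII"}, "card_last4": {"PCI"}, "country": set()}
ROLE_ALLOWED_TAGS = {"analyst": set(), "compliance": {"PII", "PCI"}}

def visible_columns(role: str) -> list[str]:
    allowed = ROLE_ALLOWED_TAGS.get(role, set())  # unknown roles get nothing extra
    return [col for col, tags in COLUMN_TAGS.items() if tags <= allowed]

print(visible_columns("analyst"))     # untagged columns only
print(visible_columns("compliance"))  # cleared for PII and PCI
```

Note the design choice: the default for an unknown role is the empty tag set, so new roles fail closed rather than open.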

Example 4: RFC excerpt—Warehouse vs. Lakehouse


Context: Analytics team needs scalable storage and open file formats. Current warehouse has high cost at peak.

  • Goals: lower cost, maintain SQL usability, support ML-ready files.
  • Options: A) Stay on warehouse; B) Lakehouse with open table format; C) Hybrid (warehouse for BI, lakehouse for ML).
  • Evaluation: Cost, performance, governance, complexity, migration risk.
  • Decision: C) Hybrid for 12 months, reevaluate after adoption metrics.
  • Consequences: Two systems to operate; lower storage cost for ML; keep BI speed.
  • Risks: Skill gaps; Mitigation: training and phased rollout.
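The evaluation step in the RFC above can be made explicit with a small weighted decision matrix. The weights and the 1-5 scores below are illustrative only; in a real RFC they would be agreed with stakeholders and backed by evidence, but the mechanics are the same.

```python
# Illustrative decision matrix for the warehouse-vs-lakehouse RFC.
# Criteria weights and per-option scores (1-5, higher is better) are made up.
WEIGHTS = {"cost": 0.3, "performance": 0.2, "governance": 0.2,
           "complexity": 0.15, "migration_risk": 0.15}
OPTIONS = {
    "A: stay on warehouse": {"cost": 1, "performance": 4, "governance": 4,
                             "complexity": 5, "migration_risk": 5},
    "B: lakehouse":         {"cost": 4, "performance": 3, "governance": 3,
                             "complexity": 2, "migration_risk": 2},
    "C: hybrid":            {"cost": 4, "performance": 4, "governance": 4,
                             "complexity": 3, "migration_risk": 3},
}

def score(option_scores: dict) -> float:
    """Weighted sum of criterion scores, rounded for display."""
    return round(sum(WEIGHTS[c] * s for c, s in option_scores.items()), 2)

for name, scores in OPTIONS.items():
    print(name, score(scores))
```

Publishing the matrix in the RFC makes the decision auditable: reviewers can challenge a weight or a score instead of arguing about the conclusion.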

Exercises

Do these in order. They mirror the graded exercises below. Use the checklist to self-review.

Exercise 1: Draw a batch pipeline container and data flow diagram

Scenario: Move data nightly from an app database to a warehouse with transforms.

  • Include: source, extract, staging, transform, load, access.
  • Add labels: frequency, SLT, protocols, trust boundaries, PII handling.
  • Deliverables: A one-page diagram and a 5-bullet rationale.

Exercise 2: Write a short RFC comparing two ingestion options

Scenario: Choose between self-managed Kafka and a managed streaming service for CDC.

  • Fill sections: Context, Goals/Non-goals, Constraints, Options (A/B), Decision, Consequences, Risks, Rollout, Open questions.
  • Keep it under 2 pages; use objective trade-offs (cost, latency, ops, vendor lock-in).

Exercise checklist

  • [ ] Diagram has a legend and labeled arrows.
  • [ ] Trust boundaries and PII are clearly marked.
  • [ ] Frequency/latency labels match requirements.
  • [ ] RFC states goals and non-goals explicitly.
  • [ ] At least two options with trade-offs are compared with evidence.
  • [ ] Decision and consequences are clear and testable.
  • [ ] Risks have mitigations and a phased rollout.

Common mistakes and self-check

  • Too much detail: Diagram shows every table/topic. Self-check: Can a new engineer grasp it in 60 seconds? If not, simplify or layer.
  • No legend: Readers guess meanings. Add a small legend for shapes/colors/arrows.
  • Missing boundaries: Security reviewers need trust zones. Draw them and tag sensitive data.
  • Unstated non-goals: RFCs drift in scope. Add Non-goals to keep focus.
  • Option bias: Only one option presented. Always include at least two and compare against the same criteria.
  • No consequences: Decisions seem costless. List trade-offs so stakeholders accept impact.
  • Stale artifacts: Diagrams diverge from reality. Put version/date and update when key changes happen.

Practical projects

  • Project 1: Batch analytics stack. Create diagrams and an RFC to move nightly data from OLTP to a cloud warehouse with masking. Include a cost vs. latency trade-off.
  • Project 2: Real-time metrics. Propose a CDC pipeline using streaming; include schema governance and a phased rollout.
  • Project 3: Data governance overlay. Add access policies and lineage to an existing diagram; write an RFC for policy rollout with test plan.

Learning path

  • Step 1: Learn diagram layers (context, container, data flow). Practice on a small system.
  • Step 2: Add security and governance overlays (trust zones, PII tags).
  • Step 3: Write short RFCs (1–2 pages). Start with a real change request.
  • Step 4: Facilitate reviews. Invite feedback, capture decisions and open questions.
  • Step 5: Version and maintain. Update artifacts as systems evolve.

Who this is for

  • Data Architects and Senior Data Engineers who need to communicate designs.
  • Analytics Engineers shaping data models and pipelines.
  • Tech Leads coordinating across platform, data, and product teams.

Prerequisites

  • Basic understanding of data platforms (OLTP vs. OLAP, batch vs. streaming).
  • Familiarity with one cloud or data stack.
  • Comfort writing concise technical documents.

Next steps

  • Complete the exercises and then take the quick test below.
  • Progress note: The quick test is available to everyone; only logged-in users get saved progress.
  • After passing, pick a Practical project and get a peer review on your RFC.

Mini challenge

In one page, redesign a current pipeline diagram to show trust boundaries and PII flows. Add a 5-bullet RFC addendum listing risks, mitigations, and a 2-week rollout plan. Keep it crisp and decision-ready.

Practice Exercises

2 exercises to complete

Instructions

Scenario: Nightly ingest from an app database to a warehouse with transformations.

  • Include components: source DB, extractor, staging storage, transformer, warehouse, BI.
  • Add arrows with frequency, latency target, and protocols.
  • Mark trust boundaries (e.g., Corp, Restricted) and PII handling (masking/hash).
  • Create a short rationale (5 bullets) explaining key choices.
Expected Output
A one-page diagram with legend and labels, plus a 5-bullet rationale covering frequency, SLT, PII handling, and key trade-offs.

Architecture Diagrams And RFCs — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

