Who this is for
Backend engineers, tech leads, and collaborators who need to capture and communicate technical decisions clearly and keep them discoverable over time.
Prerequisites
- Basic familiarity with your team's tech stack (services, databases, infrastructure).
- Ability to compare options using pros/cons and constraints.
- Comfort writing concise technical notes.
Why this matters
Real backend work involves trade-offs. You choose databases, protocols, release strategies, and failover approaches under deadlines. Without a lightweight record of why a choice was made, teams repeat debates, break compatibility, or reverse decisions after staff changes. Architecture Decision Records (ADRs) and decision logs:
- Speed onboarding: new teammates can see the "why" behind the current architecture.
- Reduce rework: prevents re-arguing settled issues.
- Enable audits and compliance: trace changes and approvals.
- Improve incident response: quickly recall why a risky trade-off was accepted.
Concept explained simply
An ADR is a short document that captures one significant technical decision, with its context, options, decision, and consequences. A decision log is a running list of smaller or operational decisions (date, decision, owner, brief rationale), often linked to deeper ADRs when needed.
Mental model
Think of your system as a tree of choices. ADRs are the labeled signposts at the big forks in the road; decision logs are the mileage notes along the path. Both keep future travelers from getting lost.
When to write an ADR vs only a decision log
- Write an ADR when the decision affects architecture, long-term cost, scalability, or developer workflow; or is hard to reverse.
- Use only a decision log entry when the decision is small, operational, easily reversed, or time-bound.
Lightweight ADR template you can copy
Title: <concise statement of the decision> Status: Proposed | Accepted | Deprecated | Superseded by #<ADR> Date: YYYY-MM-DD Context: <problem, constraints, drivers> Options Considered: <2–4 options with brief pros/cons> Decision: <the chosen option> Consequences: <positive, negative, follow-ups, risks> Deciders: <roles or names> References: <tickets, diagrams, docs>
Step-by-step: writing a solid ADR
Write 2–4 sentences: what hurts now, what constraints exist (SLA, time, budget, compliance), and success criteria.
Gather 2–4 viable options that meet constraints. Include a “do nothing” baseline.
For each option, list pros, cons, and risks in your context. Be specific (e.g., “adds 20–40ms inter-DC latency”).
State the chosen option and the decisive reasons tied to constraints and goals.
Call out follow-up tasks, migrations, ownership changes, and risks you accept.
Mark Proposed/Accepted and list deciders. Update to Deprecated/Superseded if things change.
Worked examples
Example 1: Service-to-service protocol (REST vs gRPC)
Title: Adopt gRPC for internal service-to-service calls
Status: Accepted
Date: 2026-01-15
Context: Internal calls between services add ~30ms per hop via REST+JSON. Need type safety, streaming, and lower latency. External APIs must remain REST.
Options Considered:
1) Keep REST+JSON
+ No changes to clients; easy debugging.
- Higher payload size; no streaming; brittle contracts.
2) gRPC over HTTP/2 (Chosen)
+ Smaller payloads; codegen; streaming; contract-first.
- Requires gateway for REST clients; new ops knowledge.
3) Message broker only
+ Async resilience; decoupling.
- Not suitable for request/response paths.
Decision: Adopt gRPC for internal calls; maintain REST at edge via gateway.
Consequences: Build gateway; update CI to generate stubs; train team; measure latency improvements; fallback path documented.
Deciders: Backend TL, Platform TL, SRE lead
References: Design doc #42; latency dashboard
Example 2: Data store for audit logs
Title: Use Append-Only Object Storage for Audit Logs
Status: Accepted
Date: 2026-01-12
Context: Must retain 7 years of write-once audit logs; queries are rare and batch-like. Cost and immutability are priorities.
Options:
1) Relational DB
+ Familiar; SQL queries.
- High storage cost; risk of accidental mutation.
2) Object storage + Parquet (Chosen)
+ Cheap at scale; append-only; good for batch queries.
- Higher query latency; need ETL.
3) Dedicated log DB
+ Fast queries.
- Vendor lock-in; cost.
Decision: Store logs as Parquet in object storage; query via periodic ETL to warehouse.
Consequences: Build ETL jobs; define schema evolution policy; add integrity checks; document restoration runbooks.
Deciders: Data Eng Lead, Backend TL
Example 3: Feature flag strategy
Title: Standardize on Server-Side Feature Flags
Status: Accepted
Date: 2026-01-10
Context: Inconsistent rollout methods (env vars, ifdefs) slow releases and complicate rollbacks.
Options:
1) Keep mixed approaches
- Hard to audit; no gradual rollout.
2) Client-side flags
+ Works for web;
- Leaks logic to clients; mobile releases are slow.
3) Server-side flags via a central service (Chosen)
+ Central control; gradual rollout; auditing.
- Requires shared SDKs and governance.
Decision: Use server-side flags with a central service and SDKs.
Consequences: Create flag naming conventions; add kill-switch policy; monitor flag debt; training session.
Deciders: Eng Manager, QA Lead, SRE
Decision log entries (smaller decisions)
Example decision log entries
2026-01-09 | Owner: SRE | Decision: Increase connection pool from 50 to 100 in payments API | Rationale: CPU 40%, pool saturation; mitigates 503s | Review date: 2026-02-09 2026-01-14 | Owner: Backend TL | Decision: Disable batch job retry on 429 | Rationale: Causing thundering herd; temporary until rate limits adjusted | Follow-up: ADR #15
Quality checklist for ADRs
- Problem and constraints are stated in plain language.
- Includes a "do nothing" baseline option.
- Trade-offs are tied to measurable impacts (latency, cost, risk).
- Decision clearly maps to constraints and goals.
- Consequences list follow-up tasks and risks.
- Status is set and deciders are named.
- Short (1–2 pages); links to deeper docs if needed.
Exercises
Do these now. They mirror the exercises below. Aim for concise, decision-focused writing.
- Exercise ex1: Write an ADR proposing a switch from REST+JSON to gRPC for internal service calls in a growing microservice system. Include context (latency, type safety), options (REST, gRPC, message broker), the decision, and consequences (gateway, codegen, training). Target 250–400 words.
- Exercise ex2: Create three decision log entries from a noisy chat thread about scaling a read replica: increase instance size, add read-only endpoint, and set monitoring alert thresholds. For each, write date, owner, decision, short rationale, and follow-up.
Self-check
- Can a new teammate understand the problem and the chosen option in under 2 minutes?
- Are trade-offs concrete and contextual, not generic?
- Did you list at least one risk and a follow-up task?
- Is status clear (Proposed/Accepted), with named deciders?
Common mistakes and how to spot them
- Vague context: “Needs to scale” without metrics. Fix by adding constraints (e.g., “handle 5k RPS, p95 < 150ms”).
- Option bias: Presenting only the desired option. Fix by adding a baseline and at least one realistic alternative.
- Hand-wavy consequences: No follow-ups listed. Fix by adding specific tasks, owners, and risks.
- Stale status: ADR never updated after reversal. Fix by marking Deprecated or Superseded and linking the new ADR.
- Too long: Treating ADRs as novels. Fix by moving details to references; keep ADR scannable.
Practical projects
- Create a repo folder called /adr with three ADRs: protocol choice, data storage choice, and rollout strategy. Use the template above.
- Set up a team-visible decision log (e.g., README section). Add five entries from the last sprint’s changes.
- Run a “decision review” session: pick two ADRs, check them against the quality checklist, and propose improvements.
Learning path
- Learn the ADR template and practice with a small decision.
- Capture one real team decision this week as an ADR; get peer review.
- Start a decision log and add every small operational choice for two weeks.
- Introduce statuses and link related ADRs and logs for traceability.
- Periodically prune: deprecate or supersede outdated ADRs.
Mini challenge
Your team debates rate limiting: per-IP at the edge vs per-user in the service vs token bucket per endpoint. Draft a 200–300 word ADR picking one. Include context (traffic spikes), options, decision, and at least two consequences.
Next steps
- Apply the template to a current initiative and request a 10-minute review from peers.
- Schedule a monthly 15-minute “decision hygiene” review to update statuses.
- Take the quick test below to check your understanding.
Quick Test
Everyone can take the test for free. Logged-in users will have their progress saved automatically.