Menu

Topic 2 of 7

Writing Technical Specs

Learn Writing Technical Specs for free with explanations, exercises, and a quick test (for Backend Engineer).

Published: January 20, 2026 | Updated: January 20, 2026

Why this matters

Backend engineers spend real time aligning on requirements, risk, and impact. A good technical spec:

  • Aligns product, backend, frontend, data, security, and ops early.
  • Catches risks before code: data loss, migrations, performance, costs, security/privacy.
  • Speeds reviews and implementation with a clear contract (APIs, schemas, SLAs).
  • Reduces rework by stating goals and non-goals up front.
  • Makes rollout safe with a migration and rollback plan.

Concept explained simply

A technical spec is a shared contract for how a change will work and how you will deliver it. It focuses on what, why, and how safely, not just code.

Mental model

  • North star: goals and success metrics.
  • Guardrails: non-goals, constraints, NFRs (latency, reliability, security, cost).
  • Map: architecture, data flows, API/schema, rollout, and observability.

What to include in a solid backend spec

  • Summary: one paragraph of the change and impact.
  • Context & Problem: current behavior, pain, and drivers.
  • Goals & Non-Goals: what must be achieved and what is explicitly out of scope.
  • Functional Requirements: user/system behaviors and acceptance criteria.
  • Non-Functional Requirements (NFRs): latency, throughput, durability, availability, security/privacy, cost budgets.
  • Architecture Overview: components, data stores, queues; include a simple diagram description.
  • API & Schema Changes: endpoints, methods, request/response, status codes, DB schema, versioning strategy.
  • Data Flows & State: read/write paths, consistency model, idempotency, transactions.
  • Failure Modes & SLAs: timeouts, retries, backoff, circuit breakers, error handling, SLAs/SLOs.
  • Dependencies: services, third parties, feature flags, configs.
  • Risks & Trade-offs: alternatives considered, chosen approach and why, mitigations.
  • Migration & Backward Compatibility: phased rollout, dual-writes, backfills, compatibility windows.
  • Security & Privacy: authn/z, secrets, PII handling, logging redaction, compliance notes.
  • Capacity & Scaling: load assumptions, sizing math, growth plan.
  • Observability: logs, metrics, traces, dashboards, alerts.
  • Testing Plan: unit/integration/e2e, load tests, chaos/failure testing.
  • Rollout & Rollback Plan: flags, canary, monitoring gates, clear rollback steps.
  • Open Questions: items to resolve before build.
  • Timeline & Estimates: milestones; identify critical path.
  • Appendix: glossary, diagram links-in-text (describe), data samples.

Step-by-step: how to write it

  1. Draft the summary last, write the problem first.
    Why

    Clarity on the problem shapes the entire design. The summary becomes concise after you decide the approach.

  2. List goals and non-goals.
    Tip

    Make boundaries explicit: “Not building admin UI now.” “No cross-region replication in v1.”

  3. Capture functional requirements.
    Tip

    Use Given/When/Then for acceptance criteria to remove ambiguity.

  4. Set NFRs early.
    Tip

    Latency targets, throughput, error budgets, cost caps. These drive architecture choices.

  5. Sketch the architecture.
    Tip

    Describe components and data flow in text if no diagram tool is available.

  6. Define API and schema changes precisely.
    Tip

    Include request/response examples, status codes, versioning, and idempotency keys.

  7. Plan failure handling and observability.
    Tip

    Time out, retry with backoff, add metrics and alerts for each key path.

  8. Write migration, rollout, and rollback.
    Tip

    Feature flags, canaries, data backfills, and a single-step rollback are your safety net.

  9. List risks and alternatives with reasoning.
    Tip

    Make the trade-offs visible: simplicity vs. performance, consistency vs. availability.

  10. Review, refine, and get sign‑off.
    Tip

    Ask reviewers to challenge assumptions, missing NFRs, and failure modes.

Worked examples

Example 1: Add per‑client rate limiting to public API
  • Summary: Introduce token-bucket rate limiting per API key at the gateway, target 95p latency < 150ms at 300 RPS per key.
  • Goals: Prevent abuse and protect downstream services; Non-goals: UI rate-limit dashboards (v1).
  • NFRs: 99.9% availability; 95p latency < 150ms; cost < $200/month incremental.
  • Architecture: API Gateway with Redis for counters; local leaky-bucket fallback on cache miss.
  • API: 429 with Retry-After on limit exceed; response includes remaining tokens header.
  • Failure: On Redis outage, degrade to local in‑memory limits; alert on 429 burst.
  • Observability: Metrics—RPS by key, 429 rate, Redis latency, token bucket saturation.
  • Rollout: Shadow mode → 1% enforce → 25% → 100%; rollback flips flag to shadow.
Example 2: Migrate /export reports to a gRPC service
  • Summary: Extract CPU-heavy export job from monolith to a gRPC service with a queue.
  • Goals: Reduce monolith p95 CPU by 30%. Non-goals: New export formats.
  • Functional: POST /v1/exports returns job_id; client polls GET /v1/exports/{id}.
  • NFRs: Start < 2s, complete 95% < 2 min, 99% success.
  • Architecture: API → enqueue → worker → object storage; status in Postgres.
  • Schema: exports(id uuid, status, created_at, owner_id, version).
  • Migration: Dual-write status in monolith and new DB for 1 week; cutover flag.
  • Risks: Cross-service latency; Mitigation: co-locate worker with queue.
Example 3: Add soft-delete to Orders
  • Problem: Business needs to hide orders without losing audit.
  • Schema change: add deleted_at TIMESTAMP NULL; unique indexes include (deleted_at).
  • Functional: GET excludes soft-deleted by default; admin can include=true.
  • NFRs: No data loss; migrations online; p95 queries unaffected (< 120ms).
  • Migration: Backfill default NULL; roll forward by deploying read filters first.
  • Rollback: Revert read filters via flag; no destructive changes.

Checklists

Pre-review checklist

  • Problem, goals, non-goals are explicit.
  • Functional requirements use acceptance criteria.
  • NFRs are measurable and testable.
  • API/schema changes are versioned and idempotent.
  • Failure modes and observability are covered.
  • Risks, alternatives, and trade-offs are documented.
  • Rollout and rollback are safe and simple.

Pre-implementation checklist

  • Open questions resolved or time-boxed.
  • Capacity math reviewed.
  • Security/privacy reviewed.
  • Test plan and monitoring dashboards defined.
  • Owner and timeline agreed.

Exercises

These mirror the graded tasks below. Do them here first; then submit in the exercises section.

Exercise 1: Rewrite a vague request into Goals/Non‑Goals and Functional Requirements

Request: "Make the API faster and stop abuse."

  • Write 2–3 goals and 2 non-goals.
  • Write 3 functional requirements with acceptance criteria.
Exercise 2: Define NFRs, capacity, and rollback

Feature: New endpoint POST /v1/invoices to create invoices.

  • Choose p95 latency, throughput, and availability targets.
  • Show a quick capacity calc for peak (e.g., 100 RPS).
  • Describe a one-step rollback.
  • Self-check: Can a peer implement from your spec without a meeting?
  • Self-check: Would your rollback work during a partial outage?

Common mistakes and how to self-check

  • Only describing happy paths. Fix: List failure modes and timeouts.
  • Skipping NFRs. Fix: Add latency, throughput, and availability targets.
  • Under-specifying API versioning. Fix: Include version, compatibility window, and deprecation plan.
  • No rollback. Fix: Add a simple, documented rollback step.
  • Hand-wavy capacity. Fix: Do back-of-the-envelope math.
  • Vague risks. Fix: Write explicit trade-offs and mitigations.

Practical projects

  • One-pager spec: Add idempotent retries to payment webhook processing.
  • Full spec: Introduce a write-through cache for product reads.
  • Spec refactor: Take a past incident and write the spec that would have prevented it.

Who this is for

Backend engineers, tech leads, and SREs who plan, review, or implement service changes.

Prerequisites

  • Comfort with HTTP/REST (or gRPC), data modeling, and basic distributed systems concepts.
  • Familiarity with service monitoring and deployment practices.

Learning path

  • Start: Write a one-pager for a small feature.
  • Next: Produce a full spec with NFRs and rollout for a service change.
  • Advance: Add performance modeling and cost analysis.

Next steps

  • Complete the exercises and compare with solutions.
  • Take the quick test to check understanding.
  • Apply the checklists on your next real task.

Mini challenge

Time-box 30 minutes to draft a one-page spec for adding request pagination to a list endpoint. Include goals, NFRs, API changes, failure modes, and a rollback.

Quick Test

You can take the quick test below for everyone. If you are logged in, your progress will be saved.

Practice Exercises

2 exercises to complete

Instructions

Request: "Make the API faster and stop abuse." Turn this into a spec slice:

  • 2–3 Goals and 2 Non‑Goals.
  • 3 Functional Requirements with acceptance criteria (Given/When/Then).
  • Keep it concise and testable.
Expected Output
A short section with Goals, Non‑Goals, and 3 acceptance-criteria style functional requirements.

Writing Technical Specs — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Writing Technical Specs?

AI Assistant

Ask questions about this tool