Feature Flags Basics

Published: January 20, 2026 | Updated: January 20, 2026

Why this matters

Feature flags let you deploy code continuously while controlling who sees changes. This reduces risk, speeds up releases, and enables safe experiments.

  • Turn features on/off instantly (kill switches) without redeploying.
  • Roll out gradually (1% → 5% → 25% → 100%) and monitor impact.
  • Target specific users, regions, or environments.
  • Run A/B tests to validate product and performance assumptions.

Who this is for

  • Backend Engineers shipping APIs and services with CI/CD.
  • Platform/DevOps engineers enabling safe deploys.
  • Any engineer who wants to decouple deployment from release.

Prerequisites

  • Basic understanding of deployments and environments (dev, staging, prod).
  • Comfort with reading simple code/config files.
  • Familiarity with metrics/logs and incident response.

Concept explained simply

A feature flag is a conditional switch in code that decides whether to execute a new path (feature) for some users or conditions. You ship the code "dark" (deployed but inactive), keep it off by default, then turn it on in a controlled way.

Mental model

Think of a ramp meter, the traffic light on a freeway on-ramp. Cars (users) are let in gradually to avoid traffic jams (incidents). You can stop the flow instantly (kill switch) if there's a crash (error spike).

Core building blocks

  • Flag key: stable identifier, e.g., "checkout.v2".
  • Description and owner: why it exists; who cleans it up.
  • Default: the value used when no rule matches or the flag cannot be evaluated; off for new features, for safety.
  • Targeting rules: who/when sees the feature (segments, percentages, attributes).
  • Environments: separate settings for dev/staging/prod.
  • Variations: not just on/off; can be values like numbers or strings.
  • Evaluation: where the decision happens (server or client).
  • Telemetry: logs/metrics to measure impact.
  • Expiration: date to remove the flag once done.
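To make the building blocks concrete, here is a minimal sketch of a flag record as a Python dataclass. The class name `FlagDefinition`, its field names, the per-environment percentage layout, and the example owner and date are all hypothetical; real flag services store richer targeting rules.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class FlagDefinition:
    """Hypothetical flag record mirroring the building blocks listed above."""
    key: str                       # stable identifier, e.g. "checkout.v2"
    description: str               # why the flag exists
    owner: str                     # who is responsible for cleaning it up
    default: Any = False           # used when no rule matches or config is missing
    variations: List[Any] = field(default_factory=lambda: [False, True])
    # Per-environment rollout percentage (0-100) for the "on" variation.
    rollout_percent: Dict[str, int] = field(default_factory=dict)
    expires: Optional[date] = None  # planned removal date

    def percent_for(self, env: str) -> int:
        # Environments without an explicit rule get 0%, i.e. everyone gets the default.
        return self.rollout_percent.get(env, 0)

    def is_expired(self, today: date) -> bool:
        return self.expires is not None and today > self.expires


# Illustrative values only.
checkout_v2 = FlagDefinition(
    key="checkout.v2",
    description="New checkout flow",
    owner="payments-team",
    rollout_percent={"dev": 100, "staging": 100, "prod": 5},
    expires=date(2026, 3, 31),
)
```

Keeping owner and expiry in the same record as the rules makes it easy to lint for stale flags later.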

Worked examples

Example 1 — Gradual rollout

We introduce a new pricing endpoint. Off by default, progressively enabled to a percentage of traffic.

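// Pseudocode
// The rollout percentage (1%, 5%, ...) lives in the flag's prod targeting rule,
// so it can change without a redeploy; enabledFor() buckets user.id deterministically.
flag = flags.get("pricing.v2", default=false)
if (flag.enabledFor(user.id)) {
  return pricingV2()
} else {
  return pricingV1()
}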

Plan: 1% for 30 minutes, check error rate/latency; then 5%, 25%, 50%, 100%.
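A percentage check like `enabledFor` needs deterministic bucketing. The sketch below hashes the flag key together with the user ID using SHA-256 and maps the result into 10,000 buckets, so 0.01% granularity is possible. The function names, the `"key:user"` salt format, and the bucket count are assumptions for illustration; flag SDKs each use their own scheme.

```python
import hashlib


def bucket(flag_key: str, user_id: str, buckets: int = 10_000) -> int:
    """Map (flag key, user ID) to a stable bucket in [0, buckets)."""
    # Salting with the flag key keeps different flags' rollouts independent.
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % buckets


def enabled_for(flag_key: str, user_id: str, percent: float) -> bool:
    """True if this user falls inside the rollout percentage (0-100)."""
    # With 10,000 buckets, percent * 100 is the number of "on" buckets.
    return bucket(flag_key, user_id) < percent * 100
```

Because the threshold only grows as the percentage increases, every user enabled at 1% stays enabled at 5%, 25%, and so on; nobody flips back to the old path mid-rollout.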

Example 2 — Kill switch

A cache warmer occasionally overloads Redis. Wrap it with a flag for instant disable.

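// default=true: the warmer is existing behavior, so it keeps running unless the switch is flipped off
if (flags.isEnabled("infra.cacheWarmer", default=true)) {
  runCacheWarmer()
}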

If Redis CPU spikes, flip the flag off without deploying.
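A runnable version of the kill switch, assuming a hypothetical in-memory `FlagStore` in place of a real flag service. `available` simulates the service being reachable so the fallback to the coded default can be exercised.

```python
from typing import Callable, Dict


class FlagStore:
    """Hypothetical in-memory flag store; a real one would be backed by a flag service."""

    def __init__(self) -> None:
        self._values: Dict[str, bool] = {}
        self.available = True  # simulate the flag service being reachable

    def set(self, key: str, value: bool) -> None:
        self._values[key] = value

    def is_enabled(self, key: str, default: bool) -> bool:
        if not self.available:
            return default  # service down: use the default coded at the call site
        return self._values.get(key, default)


def maybe_run_cache_warmer(flags: FlagStore, run: Callable[[], None]) -> bool:
    # default=True: the warmer is existing behavior, so it keeps running
    # unless someone explicitly flips the kill switch off.
    if flags.is_enabled("infra.cacheWarmer", default=True):
        run()
        return True
    return False
```

Note the trade-off: if the flag service goes down while the switch is off, this sketch falls back to on. Caching the last-known value in the client avoids silently re-enabling a feature you just killed.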

Example 3 — Configuration flag

Use a numeric variation to control a concurrency limit.

limit = flags.getNumber("search.maxConcurrent", default=8)
semaphore = new Semaphore(limit)

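Adjust at runtime based on system load. Note that a semaphore sized once at startup will not pick up later changes; the flag must be re-read (or the limiter rebuilt) for a new value to take effect.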
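One way to let a numeric flag take effect immediately is to swap the fixed-size semaphore for a non-blocking counter that re-reads the limit on every acquire. In this sketch `read_limit` is a hypothetical stand-in for `flags.getNumber(...)`, and the 1 to 64 bounds are illustrative; invalid values fall back to the default.

```python
import threading
from typing import Callable


class FlagLimiter:
    """Concurrency limiter whose cap is re-read from a numeric flag on each acquire."""

    def __init__(self, read_limit: Callable[[], object], default: int = 8,
                 lo: int = 1, hi: int = 64) -> None:
        self._read_limit = read_limit
        self._default, self._lo, self._hi = default, lo, hi
        self._in_flight = 0
        self._lock = threading.Lock()

    def current_limit(self) -> int:
        value = self._read_limit()
        # Reject non-integers (bool is a subclass of int, so exclude it explicitly).
        if isinstance(value, bool) or not isinstance(value, int):
            return self._default
        # Clamp so a bad config cannot set 0 or an unbounded limit.
        return max(self._lo, min(self._hi, value))

    def try_acquire(self) -> bool:
        with self._lock:
            if self._in_flight >= self.current_limit():
                return False  # caller should reject or queue the request
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1
```

Lowering the limit below the current in-flight count does not cancel running work; new requests are refused until enough of them finish.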

Implementation patterns

  • Server-side evaluation (recommended for backend): evaluate on the server and log decisions; avoids exposing hidden features and allows centralized control.
  • Caching: cache flag configs and refresh periodically; fall back to safe defaults if the flag service is unreachable.
  • Deterministic bucketing: for percentage rollouts, hash a stable key (e.g., user ID) so users consistently see the same variation.
  • Observability: tag logs/metrics with flag key and variation. Watch error rate, latency, CPU, and key business metrics.
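The caching pattern can be sketched as a small client that refreshes at most once per TTL, keeps the last-known-good config when a fetch fails, and returns the caller's default only if it has never loaded a config. `CachedFlags` and the `fetch` callable are hypothetical; the injectable clock is there so the behavior can be tested.

```python
import time
from typing import Callable, Dict, Optional


class CachedFlags:
    """Caches flag config returned by `fetch` (a hypothetical flag-service call)."""

    def __init__(self, fetch: Callable[[], Dict[str, object]], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._config: Optional[Dict[str, object]] = None
        self._fetched_at = float("-inf")  # forces a fetch on first use

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if now - self._fetched_at < self._ttl:
            return
        try:
            self._config = dict(self._fetch())
        except Exception:
            pass  # keep last-known-good config (or None if we never had one)
        self._fetched_at = now  # wait a full TTL before retrying either way

    def get(self, key: str, default: object) -> object:
        self._refresh_if_stale()
        if self._config is None:
            return default  # never loaded: fall back to the safe coded default
        return self._config.get(key, default)
```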

Safe rollout steps

  1. Add code paths guarded by a flag. Keep default off.
  2. Ship to staging: enable 100% on staging; run tests and load checks.
  3. Prod canary: enable 1% (deterministic). Monitor technical and business metrics.
  4. Increase gradually if healthy: 1% → 5% → 25% → 50% → 100%.
  5. If issues arise: flip flag off immediately; investigate; fix; retry.
  6. After full release: remove the old path and delete the flag.

Risk checklist

  • Is default safe (off)?
  • Do we have kill switch and quick rollback?
  • Are error budgets and thresholds defined?
  • Is telemetry (metrics/logs/traces) in place?
  • Is there an owner and a removal date?
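The rollout steps and the thresholds from the checklist can be combined into one decision function. This is a sketch: the step list follows the text above, while the 1% error-rate and 300 ms p95 defaults are only the example thresholds used in this lesson's exercise. Hold times between steps (e.g. 30 minutes) stay a human or scheduler decision.

```python
from dataclasses import dataclass
from typing import List

ROLLOUT_STEPS: List[int] = [1, 5, 25, 50, 100]


@dataclass
class Health:
    error_rate: float      # fraction of failed requests, e.g. 0.004 means 0.4%
    p95_latency_ms: float


@dataclass
class Thresholds:
    max_error_rate: float = 0.01       # example threshold: errors < 1%
    max_p95_latency_ms: float = 300.0  # example threshold: p95 < 300 ms


def next_percent(current: int, health: Health, limits: Thresholds) -> int:
    """Return the next rollout percentage, or 0 to trip the kill switch."""
    if (health.error_rate >= limits.max_error_rate
            or health.p95_latency_ms >= limits.max_p95_latency_ms):
        return 0  # unhealthy: flag off, investigate, fix, retry
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return current  # already at 100%: next step is removing the flag
```

With the mini challenge's numbers (error rate 1.6% at 10%), `next_percent(10, Health(0.016, 150), Thresholds())` returns 0 even though latency improved.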

Security and privacy

  • Do not expose sensitive flags to clients; evaluate on server and send only the resulting behavior or non-sensitive config.
  • Authenticate flag management; log all changes with actor and timestamp.
  • Fail closed: if flag service is unavailable, prefer safe defaults.
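Two of these practices in code: the server evaluates flags and sends clients only allowlisted results, and every change is recorded with actor and timestamp. The allowlist contents, `client_payload`, `set_flag`, and the in-memory `audit_log` are hypothetical; authenticating the actor is assumed to happen before `set_flag` is called.

```python
import datetime as dt
from typing import Callable, Dict, List, Set

# Hypothetical allowlist: only these flags' evaluated values may be sent to clients.
CLIENT_VISIBLE: Set[str] = {"checkout.v2", "ui.newSearchBox"}


def client_payload(evaluate: Callable[[str, str], object], user_id: str) -> Dict[str, object]:
    """Evaluate on the server; return only allowlisted results, never rules or segments."""
    return {key: evaluate(key, user_id) for key in sorted(CLIENT_VISIBLE)}


audit_log: List[Dict[str, object]] = []


def set_flag(store: Dict[str, object], key: str, value: object, actor: str) -> None:
    """Change a flag and record who changed what, and when (UTC)."""
    audit_log.append({
        "key": key,
        "old": store.get(key),
        "new": value,
        "actor": actor,
        "at": dt.datetime.now(dt.timezone.utc).isoformat(),
    })
    store[key] = value
```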

Observability

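  • Emit metrics tagged with flag key and variation: an evaluation counter such as flag_evaluations{key,variation}, plus error counts and latency distributions per variation.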
  • Correlate flag changes with incidents using change logs.
  • Use SLOs to decide rollout pace and when to stop.
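A minimal sketch of tagging telemetry with the flag key and variation. Plain `Counter` and `defaultdict` objects stand in for a real metrics library so the example stays self-contained; `run_with_flag` is a hypothetical helper name.

```python
import time
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List, Tuple, TypeVar

T = TypeVar("T")

evaluations: Counter = Counter()  # flag_evaluations{key,variation}
errors: Counter = Counter()       # errors{key,variation}
latencies_ms: DefaultDict[Tuple[str, str], List[float]] = defaultdict(list)


def run_with_flag(key: str, variation: str, handler: Callable[[], T]) -> T:
    """Run the handler for the chosen variation, recording metrics tagged with it."""
    labels = (key, variation)
    evaluations[labels] += 1
    start = time.perf_counter()
    try:
        return handler()
    except Exception:
        errors[labels] += 1
        raise  # record the failure, but let the caller handle it
    finally:
        latencies_ms[labels].append((time.perf_counter() - start) * 1000)
```

Comparing `errors` and `latencies_ms` between the "on" and "off" labels is what tells you whether the new path is healthy enough to advance the rollout.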

Exercises

Exercise 1 (ex1) — Design a safe rollout

Design a rollout plan for a new endpoint guarded by flag "orders.v3". Include: default, targeting, percent steps, metrics to watch, rollback criteria, and cleanup plan.

Hints
  • Start with off by default.
  • Pick deterministic bucketing (hash user ID).
  • Define thresholds, e.g., error rate < 1%, p95 latency < 300ms.
  • Checklist: Default off and safe fallback.
  • Checklist: Monitoring dashboards ready.
  • Checklist: Rollout steps and ownership defined.
  • Checklist: Removal date scheduled.

Common mistakes

  • Leaving flags in place forever: dead branches pile up and the code rots. Self-check: does each flag have an owner and removal date?
  • Client-side secrets: exposing hidden features. Self-check: are sensitive decisions evaluated on the server?
  • Non-deterministic rollout: users flip between versions. Self-check: is bucketing based on stable IDs?
  • No observability: you can’t see impact. Self-check: do logs/metrics include flag keys and variations?
  • No safe default: outages when flag service fails. Self-check: are defaults and fallbacks defined?

Practical projects

  • Add a kill switch around a heavy background job; simulate failure and flip it off.
  • Implement percentage rollout using hashing on user ID; prove determinism in logs.
  • Create a small config-driven flag file (JSON/YAML) with owner, default, rules, and an expiry note; write a linter that warns on expired flags.
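A starting point for the linter project, using JSON because it is in the standard library. The file layout (a top-level `"flags"` list with `key`, `owner`, `default`, `rules`, and an ISO `expires` date) and the required-field set are assumptions you can adapt.

```python
import json
from datetime import date
from typing import List

REQUIRED = ("key", "owner", "default", "expires")


def lint_flags(config_json: str, today: date) -> List[str]:
    """Return warnings for flags missing metadata or past their expiry date.

    Assumes a hypothetical layout: {"flags": [{"key": ..., "owner": ...,
    "default": ..., "rules": [...], "expires": "YYYY-MM-DD"}]}.
    """
    warnings: List[str] = []
    for flag in json.loads(config_json).get("flags", []):
        key = flag.get("key", "<missing key>")
        missing = [name for name in REQUIRED if name not in flag]
        if missing:
            warnings.append(f"{key}: missing {', '.join(missing)}")
        expires = flag.get("expires")
        if expires and date.fromisoformat(expires) < today:
            warnings.append(f"{key}: expired on {expires}, remove the flag and the old code path")
    return warnings
```

Running this in CI and failing the build on warnings turns "remember to clean up" into an enforced rule.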

Learning path

  • Start: Feature Flags Basics (this lesson).
  • Next: Rollout strategies, canary and blue/green.
  • Then: Observability and incident response basics.
  • Advanced: Experiment design and statistical guardrails.

Next steps

  • Integrate flags into your CI/CD pipeline: enable on staging post-deploy, disable on incident.
  • Automate flag cleanup reminders in code review.
  • Pair with SLOs to drive rollout decisions.

Mini challenge

You deploy a new rate limiter behind a flag. After enabling at 10%, p95 latency improves but error rate rises from 0.2% to 1.6%. What do you do? Write a 3-step action plan (include kill switch, diagnostics you’d check, and a safer retry plan).

Practice Exercises

Instructions

You’re launching a new Orders API version behind flag "orders.v3". Produce a one-page rollout plan that includes:

  • Flag metadata: key, description, owner, default.
  • Targeting rules and deterministic bucketing source.
  • Rollout steps with time windows (1% → 5% → 25% → 50% → 100%).
  • Metrics to watch (error rate, p95 latency, key business KPIs).
  • Rollback criteria and exact steps to disable.
  • Cleanup plan: when and how to remove old code and the flag.

Expected Output

A concise rollout plan document with defaults, rules, stepwise percentages, thresholds (e.g., errors <1%, p95 <300ms), rollback instructions, and a removal date.
