Why this matters
Feature flags let you deploy code continuously while controlling who sees changes. This reduces risk, speeds up releases, and enables safe experiments.
- Turn features on/off instantly (kill switches) without redeploying.
- Roll out gradually (1% → 5% → 25% → 100%) and monitor impact.
- Target specific users, regions, or environments.
- Run A/B tests to validate product and performance assumptions.
Who this is for
- Backend engineers shipping APIs and services with CI/CD.
- Platform/DevOps engineers enabling safe deploys.
- Any engineer who wants to decouple deployment from release.
Prerequisites
- Basic understanding of deployments and environments (dev, staging, prod).
- Comfort with reading simple code/config files.
- Familiarity with metrics/logs and incident response.
Concept explained simply
A feature flag is a conditional switch in code that decides whether to execute a new path (feature) for some users or conditions. You ship the code dark, keep it off by default, then turn it on in a controlled way.
Mental model
Think of a traffic metering light on a freeway on-ramp. Cars (users) are allowed in gradually to avoid traffic jams (incidents). You can stop the flow instantly (kill switch) if there’s a crash (error spike).
Core building blocks
- Flag key: stable identifier, e.g., "checkout.v2".
- Description and owner: why it exists; who cleans it up.
- Default: off by default for safety.
- Targeting rules: who/when sees the feature (segments, percentages, attributes).
- Environments: separate settings for dev/staging/prod.
- Variations: not just on/off; can be values like numbers or strings.
- Evaluation: where the decision happens (server or client).
- Telemetry: logs/metrics to measure impact.
- Expiration: date to remove the flag once done.
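The building blocks above can be modeled as a small data structure. A minimal sketch in Python; the field names are illustrative, not any specific vendor's schema:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class FeatureFlag:
    key: str                        # stable identifier, e.g. "checkout.v2"
    description: str                # why the flag exists
    owner: str                      # who is responsible for cleanup
    default: bool = False           # off by default for safety
    percent: int = 0                # percentage rollout, 0-100
    environments: dict = field(default_factory=dict)  # per-env overrides
    expires: Optional[date] = None  # removal date once fully released

# Example definition for a checkout rewrite (dates and team name are made up)
checkout_v2 = FeatureFlag(
    key="checkout.v2",
    description="New checkout flow",
    owner="payments-team",
    expires=date(2026, 6, 30),
)
```

Keeping owner and expiry alongside the flag itself makes the later cleanup step auditable.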
Worked examples
Example 1 — Gradual rollout
We introduce a new pricing endpoint. It ships off by default and is progressively enabled for a percentage of traffic.
// Pseudocode
if (flags.enabledFor("pricing.v2", user.id, env="prod", percent=5, default=false)) {
  return pricingV2()
} else {
  return pricingV1()
}
Plan: 1% for 30 minutes, check error rate/latency; then 5%, 25%, 50%, 100%.
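The percentage check above relies on deterministic bucketing. A minimal sketch in Python, assuming a SHA-256 hash of the flag key plus user ID (the function name `enabled_for` is illustrative):

```python
import hashlib

def enabled_for(flag_key: str, user_id: str, percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) and compare to percent."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# The same user always gets the same answer for a given flag
assert enabled_for("pricing.v2", "user-42", 5) == enabled_for("pricing.v2", "user-42", 5)

# Raising the percentage never removes a user who was already enabled
user = "user-42"
if enabled_for("pricing.v2", user, 5):
    assert enabled_for("pricing.v2", user, 25)
```

Because the bucket comparison is `bucket < percent`, raising the percentage only adds users; nobody flips back to the old version mid-rollout.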
Example 2 — Kill switch
A cache warmer occasionally overloads Redis. Wrap it with a flag for instant disable.
if (flags.isEnabled("infra.cacheWarmer", default=true)) {
  runCacheWarmer()
}
If Redis CPU spikes, flip the flag off without deploying.
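The kill-switch pattern can be sketched in Python; `flag_service` is a hypothetical client, and the point is that an unreachable flag service falls back to the stated default instead of crashing the job:

```python
def is_enabled(flag_service, key: str, default: bool) -> bool:
    """Evaluate a flag, falling back to the default if the service fails."""
    try:
        return flag_service.get_bool(key)
    except Exception:
        # Flag service unreachable: use the configured default
        return default

class DownService:
    """Simulates an unreachable flag service."""
    def get_bool(self, key):
        raise ConnectionError("flag service unreachable")

# With the service down, the cache warmer keeps its configured default
if is_enabled(DownService(), "infra.cacheWarmer", default=True):
    pass  # runCacheWarmer() would go here
```

The try/except boundary is what makes the flag safe to put in front of critical code paths.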
Example 3 — Configuration flag
Use a numeric variation to control a concurrency limit.
limit = flags.getNumber("search.maxConcurrent", default=8)
semaphore = new Semaphore(limit)
Adjust at runtime based on system load.
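Example 3 translated into runnable Python; here the flag store is just a dict standing in for a real flag service, and `get_number` is an illustrative helper:

```python
from threading import Semaphore

flags = {"search.maxConcurrent": 8}  # stand-in for a real flag store

def get_number(key: str, default: int) -> int:
    """Read a numeric variation, falling back to the default on bad values."""
    value = flags.get(key, default)
    return value if isinstance(value, int) else default

limit = get_number("search.maxConcurrent", default=8)
semaphore = Semaphore(limit)

def run_search(query: str) -> str:
    with semaphore:  # at most `limit` concurrent searches
        return f"results for {query}"
```

To adjust concurrency at runtime, a real implementation would re-read the flag and rebuild the semaphore, rather than mutating a live one.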
Implementation patterns
- Server-side evaluation (recommended for backend): evaluate on the server and log decisions; avoids exposing hidden features and allows centralized control.
- Caching: cache flag configs and refresh periodically; fall back to safe defaults if the flag service is unreachable.
- Deterministic bucketing: for percentage rollouts, hash a stable key (e.g., user ID) so users consistently see the same variation.
- Observability: tag logs/metrics with flag key and variation. Watch error rate, latency, CPU, and key business metrics.
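The caching pattern above can be sketched as a small TTL cache; `fetch_config` is an assumed callable standing in for a call to the flag service:

```python
import time

class FlagCache:
    """Cache flag configs; refresh after a TTL; keep the last good config on failure."""

    def __init__(self, fetch_config, ttl_seconds: float, defaults: dict):
        self._fetch = fetch_config     # callable returning the full flag config
        self._ttl = ttl_seconds
        self._config = dict(defaults)  # safe defaults until the first fetch
        self._fetched_at = None

    def get(self, key: str, default):
        now = time.monotonic()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            try:
                self._config = self._fetch()
                self._fetched_at = now
            except Exception:
                self._fetched_at = now  # back off; serve the last known config
        return self._config.get(key, default)

# Serves fetched values when the service is up...
cache = FlagCache(lambda: {"pricing.v2": True}, ttl_seconds=30, defaults={})

# ...and safe defaults when it is not.
def down():
    raise ConnectionError("flag service unreachable")

fallback = FlagCache(down, ttl_seconds=30, defaults={"pricing.v2": False})
```

Note that a failed refresh still advances the timestamp, so an outage produces one attempt per TTL window rather than a retry storm.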
Safe rollout steps
- Add code paths guarded by a flag. Keep default off.
- Ship to staging: enable 100% on staging; run tests and load checks.
- Prod canary: enable 1% (deterministic). Monitor technical and business metrics.
- Increase gradually if healthy: 1% → 5% → 25% → 50% → 100%.
- If issues arise: flip flag off immediately; investigate; fix; retry.
- After full release: remove the old path and delete the flag.
Risk checklist
- Is default safe (off)?
- Do we have kill switch and quick rollback?
- Are error budgets and thresholds defined?
- Is telemetry (metrics/logs/traces) in place?
- Is there an owner and a removal date?
Security and privacy
- Do not expose sensitive flags to clients; evaluate on server and send only the resulting behavior or non-sensitive config.
- Authenticate flag management; log all changes with actor and timestamp.
- Fail closed: if the flag service is unavailable, fall back to safe defaults.
Observability
- Emit counters: flag_evaluations{key,variation}, errors, and latencies.
- Correlate flag changes with incidents using change logs.
- Use SLOs to decide rollout pace and when to stop.
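The evaluation counter above can be sketched with a plain `Counter` keyed by (flag key, variation); a real setup would emit to your metrics library (Prometheus, StatsD, etc.):

```python
from collections import Counter

flag_evaluations = Counter()  # stands in for flag_evaluations{key,variation}

def record_evaluation(key: str, variation: str) -> None:
    flag_evaluations[(key, variation)] += 1

def evaluate(key: str, enabled: bool) -> bool:
    """Evaluate a flag and record which variation was served."""
    record_evaluation(key, "on" if enabled else "off")
    return enabled

evaluate("pricing.v2", True)
evaluate("pricing.v2", True)
evaluate("pricing.v2", False)
```

Tagging every evaluation with key and variation is what lets you split error rate and latency by variation during a rollout.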
Exercises
These mirror the interactive exercise below. Everyone can take them; only logged-in users will have progress saved.
Exercise 1 (ex1) — Design a safe rollout
Design a rollout plan for a new endpoint guarded by flag "orders.v3". Include: default, targeting, percent steps, metrics to watch, rollback criteria, and cleanup plan.
Hints
- Start with off by default.
- Pick deterministic bucketing (hash user ID).
- Define thresholds, e.g., error rate < 1%, p95 latency < 300ms.
- Checklist: Default off and safe fallback.
- Checklist: Monitoring dashboards ready.
- Checklist: Rollout steps and ownership defined.
- Checklist: Removal date scheduled.
Common mistakes
- Leaving flags forever: code rots. Self-check: does each flag have an owner and removal date?
- Client-side secrets: exposing hidden features. Self-check: are sensitive decisions evaluated on the server?
- Non-deterministic rollout: users flip between versions. Self-check: is bucketing based on stable IDs?
- No observability: you can’t see impact. Self-check: do logs/metrics include flag keys and variations?
- No safe default: outages when flag service fails. Self-check: are defaults and fallbacks defined?
Practical projects
- Add a kill switch around a heavy background job; simulate failure and flip it off.
- Implement percentage rollout using hashing on user ID; prove determinism in logs.
- Create a small config-driven flag file (JSON/YAML) with owner, default, rules, and an expiry note; write a linter that warns on expired flags.
Learning path
- Start: Feature Flags Basics (this lesson).
- Next: Rollout strategies, canary and blue/green.
- Then: Observability and incident response basics.
- Advanced: Experiment design and statistical guardrails.
Next steps
- Integrate flags into your CI/CD pipeline: enable on staging post-deploy, disable on incident.
- Automate flag cleanup reminders in code review.
- Pair with SLOs to drive rollout decisions.
Mini challenge
You deploy a new rate limiter behind a flag. After enabling at 10%, p95 latency improves but error rate rises from 0.2% to 1.6%. What do you do? Write a 3-step action plan (include kill switch, diagnostics you’d check, and a safer retry plan).
About progress and tests
The Quick Test is available to everyone. Only logged-in learners have their answers and progress saved.