How to learn Feature Flags For APIs for Platform And Delivery in API Engineer for free

Why this matters

Feature flags let API engineers ship safely: you can enable features for a subset of users, keep a kill switch for risky integrations, and separate deploy from release. This reduces outages and accelerates learning.

Release new behaviors gradually without breaking clients.
Roll back instantly if metrics go red, without redeploying.
Target tenants, regions, or traffic slices for experiments.

Who this is for

API engineers and platform engineers introducing or improving flag-driven releases.
Backend developers who need safe rollouts, canaries, or kill switches.
Tech leads defining release policies and guardrails.

Prerequisites

Comfort with HTTP APIs (status codes, headers, caching basics).
Basic knowledge of server-side programming and configuration management.
Familiarity with logs/metrics and reading dashboards.

Concept explained simply

A feature flag is a conditional switch in code that chooses between behaviors at runtime. Instead of deploying new code to change behavior, you flip a flag value. Server-side flags are evaluated on the backend, so clients see consistent, authoritative behavior.

Types you will use:

Boolean: on/off (e.g., enable payment provider B).
Multivariate: one of several values (e.g., pick algorithm A/B/C).
Percentage rollout: enable for X% of traffic with stickiness (e.g., by user or tenant).
Targeted: enable for a list (tenant IDs, regions, plans).

Mental model

Think of flags as circuit breakers and traffic routers:

Decision point: "Which path should this request take?"
Inputs: global defaults, environment, targeting rules, and identity (tenant/user/service).
Outputs: the chosen behavior and telemetry events you emit to observe impact.

Quick visual model (text)

Request -> Identify (tenant/user/service) -> Evaluate flag rules -> Select behavior
        -> Execute behavior -> Emit metrics/logs -> Cache decision (short TTL)
        -> Fallback to safe default when evaluation fails

Key patterns and decisions

Evaluation location: Prefer server-side evaluation for APIs. Never trust client-side flags to protect sensitive paths.
Stickiness: Use a stable key (tenantId, userId) to avoid flapping when doing percentage rollouts.
Fallbacks: If the flag system is unavailable, use the last-known value or a safe default (usually the old, proven path) to preserve contract and availability.
Observability: Emit counters for on/off path selection, errors, and latency per variant.
Caching: Cache evaluated decisions briefly (e.g., 30–120 seconds) to reduce latency; avoid caching that breaks targeting.
Compatibility: Flags should not break the public API contract. For contract-breaking changes, use versioning; use flags to prepare the path and gather signals.
Security: Never expose raw internal flag names/values to clients; expose only intentional, stable behavior indicators if needed.

Worked examples

Example 1: Kill switch for a flaky downstream provider

Scenario: You have /payments that calls ProviderA. Add a kill switch to stop calling ProviderA when error rate spikes.

// Pseudocode (Node/Express style)
app.post('/payments', async (req, res) => {
  const useProviderA = flags.getBoolean('payments.providerA.enabled', { tenantId: req.tenantId }, { default: true });
  if (!useProviderA) {
    return res.status(503).json({ message: 'Payments temporarily unavailable. Please retry later.' });
  }
  try {
    const result = await providerA.charge(req.body);
    return res.status(200).json(result);
  } catch (e) {
    metrics.count('payments.providerA.errors');
    return res.status(502).json({ message: 'Upstream error' });
  }
});

Fallback: If flag service is down, default true or last-known value. In an incident, ops flips it to false.
Observation: Monitor 5xx, saturation, and the flag flip event.

Example 2: Gradual rollout of new response field

Scenario: Add field beta_score to /recommendations but avoid surprises.

// Guard behavior behind a flag but keep the contract stable
const enabled = flags.getPercent('recs.beta_score.rollout', { userId }, { default: 0, stickinessKey: 'userId' });
const data = await service.getRecommendations(userId);
if (enabled) {
  data.beta_score = await service.computeBetaScore(userId);
}
// Contract safety: If clients are sensitive, place field under a vendor-specific header gate and Vary on that header.
res.setHeader('Vary', 'X-Beta-Features');

Notes:

Use a Vary header if response shape depends on a request header to keep caches correct.
Keep a default that returns a valid response without the new field.

Example 3: Tenant-targeted early access

Scenario: Enable a faster search path for specific enterprise tenants.

// Pseudocode (Go-ish)
func Search(w http.ResponseWriter, r *http.Request) {
  tenantId := r.Header.Get("X-Tenant-Id")
  fastPath := flags.GetBoolean("search.fast.enabled", map[string]string{"tenantId": tenantId}, true)
  var results []Item
  if fastPath {
    results = index.FastSearch(r.Context(), r.URL.Query())
  } else {
    results = index.SafeSearch(r.Context(), r.URL.Query())
  }
  // Emit telemetry for both variants
  metrics.Count("search.variant", map[string]string{"fast": strconv.FormatBool(fastPath)})
  json.NewEncoder(w).Encode(results)
}

Notes:

Use an allowlist of tenant IDs in the flag rule.
Track latency and errors per variant to decide rollout.

Step-by-step: Add a flag to an existing API

Define the goal: What risk are you reducing? How will you know it works? List metrics and exit criteria.
Choose flag type: boolean, multivariate, or percentage with stickiness.
Pick identity: tenantId or userId for targeting; avoid IP as identity for stickiness.
Decide defaults and fallback: Off by default for new features; on or off for kill switches based on safety. Cache last-known values.
Add code guard: Place the decision at the smallest scope that contains the risk.
Emit telemetry: Counters for selected variant, errors, latency, and an audit log on flips.
Warm up and dry run: Validate both paths in staging; run shadow traffic if possible.
Roll out gradually: 1% → 5% → 25% → 50% → 100%, verifying SLOs at each step.
Cleanup: Remove flag and dead code once fully rolled out and stable.

Checklist: Safe rollout Definition of Done

Flag has clear owner and purpose.
Defaults and fallbacks defined and tested.
Stickiness configured (user/tenant) for percentage rollouts.
Logs/metrics per variant in place.
Cache behavior understood; Vary header set if response differs by request input.
Runbook for flipping/rollback documented.
Cleanup task created with a date.

Common mistakes and self-checks

Mistake: Client-side gating for sensitive server behavior. Self-check: Could a malicious client bypass this? If yes, move evaluation server-side.
Mistake: Missing stickiness causing users to flip variants. Self-check: Same user gets same decision across requests?
Mistake: Breaking caches with mixed responses. Self-check: If request input affects response variant, is Vary set correctly?
Mistake: No fallback for flag unavailability. Self-check: Kill the flag service in a test env; do you still serve a safe response?
Mistake: Long-lived flags and dead code. Self-check: Is there a cleanup date and owner? Add it now.

Exercises

These mirror the tasks below. Use the hints if stuck. Everyone can take the test; only logged-in users will have their progress saved.

Exercise 1: Design a safe flag for a header change (matches ex1)

You want to add an X-Request-ID response header to all endpoints. Some proxies are sensitive to new headers.

Design the flag plan: type, targeting, default, fallback, telemetry.
Define acceptance checks for a 0% → 25% → 100% rollout.

Exercise 2: Implement percent rollout with stickiness (matches ex2)

Write a function enabled(userId, percent) that uses userId % 100 < percent to decide. For userIds [129, 230, 345, 478] and percent=30, compute ON/OFF decisions.

Practical projects

Build a tiny flag evaluator service: In-memory rules (boolean, percentage by userId), HTTP endpoint to read rules, and a cache with last-updated timestamp.
Add a kill switch to an API that calls a third-party. Simulate upstream failures and demonstrate instant rollback via flag flip.
Implement a safe header-gated feature: Add a beta header that enables an extra response field. Ensure Vary is correct and caches behave as expected.

Learning path

Start: Server-side flag basics and safe defaults (this lesson).
Next: Observability for releases—latency, error budgets, variant tagging.
Then: Progressive delivery patterns—canary, blue/green, shadow traffic.
Advanced: Experimentation and multivariate flags while preserving API contracts.

Next steps

Pick one endpoint in your service and add a non-breaking flag around a small change.
Set up metrics per variant and perform a tiny (1%) rollout.
Schedule cleanup once you reach 100%.

Mini challenge

You must change a default timeout from 2s to 5s on a read endpoint without hurting latency SLOs. Sketch a flag plan: what to measure, rollout steps, and the fallback if p95 latency regresses by 10%.

Menu

Feature Flags For APIs

Table of Contents

Why this matters

Who this is for

Prerequisites

Concept explained simply

Mental model

Key patterns and decisions

Worked examples

Step-by-step: Add a flag to an existing API

Checklist: Safe rollout Definition of Done

Common mistakes and self-checks

Exercises

Practical projects

Learning path

Next steps

Mini challenge

Practice Exercises

Design a safe flag for adding X-Request-ID response header

Instructions

Expected Output

Implement percent rollout with stickiness

Feature Flags For APIs — Quick Test

Have questions about Feature Flags For APIs?

AI Assistant