Topic 6 of 8

Rate Limiting And Throttling Basics

Learn Rate Limiting And Throttling Basics for free with explanations, exercises, and a quick test (for Backend Engineers).

Published: January 20, 2026 | Updated: January 20, 2026

Why this matters

As a Backend Engineer, you must keep APIs responsive, fair, and affordable. Rate limiting and throttling protect your services from abuse, noisy neighbors, traffic spikes, and cost overruns. Real tasks you will encounter include:

  • Setting per-user, per-IP, or per-API-key limits to prevent abuse.
  • Adding burst tolerance so legitimate clients can spike briefly without harming others.
  • Returning correct HTTP 429 responses with actionable headers (e.g., Retry-After).
  • Protecting downstream systems (databases, queues) with concurrency limits.
  • Designing limits that balance user experience and platform safety.

Who this is for

  • Backend developers building public or internal APIs.
  • Engineers integrating gateways or reverse proxies.
  • Anyone responsible for API reliability and cost control.

Prerequisites

  • Basic HTTP knowledge (status codes, headers).
  • Familiarity with API authentication (API keys, tokens).
  • Comfort with basic data structures (counters, queues).

Concept explained simply

Rate limiting is controlling how many requests a client can make in a time window. Throttling is intentionally slowing down or reducing throughput so the system stays healthy. Think of the API as a highway: throttling is the speed limit, and rate limiting is the toll booth that admits only a certain number of cars per minute.

Mental model

  • Capacity: How much total work your system can handle per unit time.
  • Fairness: Ensure one client cannot starve others.
  • Elastic bursts: Allow short spikes if there is room; clamp down if the system risks overload.
  • Feedback: Tell clients when to retry and how to behave (backoff).

Core definitions and choices

Fixed window counter

Count requests in discrete windows (e.g., per minute). Simple but suffers from boundary issues (bursts at window edges).
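A minimal in-memory sketch of this idea (class and method names are illustrative, not a specific library):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per key in discrete wall-clock windows."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window_start) -> count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        bucket = (key, window_start)
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowLimiter(limit=100)
results = [limiter.allow("key-1", now=30) for _ in range(101)]
# 100 allowed; the 101st in the same window is rejected.
# A request at now=60 falls into a fresh window and is allowed again.
```

Note that old (key, window_start) entries accumulate forever here; a production version would expire stale windows.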

Sliding window

Measures across a moving interval. More accurate fairness than fixed windows. Two common forms: sliding log (per-request timestamps) and sliding window counter (two adjacent windows weighted by time).
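A sketch of the sliding window counter form (names are illustrative): the previous fixed window's count is weighted by how much of it still overlaps the moving interval.

```python
from collections import defaultdict

class SlidingWindowCounter:
    """Approximate sliding window: current window count plus the
    previous window's count weighted by its remaining overlap."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window_start) -> count

    def allow(self, key, now):
        start = (int(now) // self.window) * self.window
        elapsed_fraction = (now - start) / self.window
        prev = self.counts[(key, start - self.window)]
        weighted = prev * (1 - elapsed_fraction) + self.counts[(key, start)]
        if weighted >= self.limit:
            return False
        self.counts[(key, start)] += 1
        return True
```

With a limit of 120/min, a previous window that used all 120 still occupies 25% of the budget 45 seconds into the next window, so only 90 new requests fit.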

Token bucket

Tokens refill at a steady rate; each request consumes one token. Supports bursts up to bucket size while enforcing average rate.
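A compact sketch of this algorithm (the class is illustrative; timestamps are passed in explicitly to keep it testable):

```python
class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`;
    each allowed request consumes one token."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, capacity=50)
burst = [bucket.allow(now=0) for _ in range(51)]
# the full bucket absorbs a burst of 50; the 51st request is denied
```

The lazy refill on each call avoids any background timer: the bucket only needs the last-seen timestamp.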

Leaky bucket

Queue where requests exit at a constant rate; if the queue overflows, requests are dropped or delayed. Smooths traffic.
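A rough sketch of the drop-on-overflow variant (illustrative names; it drains lazily on arrival and discards fractional leak progress, which a production version would track):

```python
from collections import deque

class LeakyBucket:
    """Requests join a bounded queue and drain at a constant rate;
    arrivals that find the queue full are dropped."""

    def __init__(self, leak_rate, capacity):
        self.leak_rate = leak_rate  # requests drained per second
        self.capacity = capacity    # max queued requests
        self.queue = deque()
        self.last = 0.0

    def offer(self, request, now):
        # Drain whole requests that have leaked since the last drain.
        leaked = int((now - self.last) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last = now
        if len(self.queue) >= self.capacity:
            return False  # overflow: drop (or delay, per policy)
        self.queue.append(request)
        return True
```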

Concurrency limit (throttling)

Caps the number of in-flight operations (e.g., max 50 concurrent DB queries). Excess requests wait or fail fast.
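A fail-fast sketch using a semaphore (names are illustrative; a real service might wait briefly before rejecting):

```python
import threading

class ConcurrencyLimiter:
    """Caps in-flight operations; excess calls fail fast."""

    def __init__(self, max_in_flight):
        self.slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, fn, *args, **kwargs):
        # Fail fast instead of queueing when every slot is busy.
        if not self.slots.acquire(blocking=False):
            raise RuntimeError("throttled: concurrency limit reached")
        try:
            return fn(*args, **kwargs)
        finally:
            self.slots.release()
```

Wrapping each DB query in `limiter.run(...)` with `ConcurrencyLimiter(50)` enforces the 50-concurrent-query cap from the example above.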

Scopes and keys

Decide what you limit on: IP, user, API key, endpoint, plan/tier, or a combination. Use the narrowest scope that matches business goals.

Responses and headers

  • 429 Too Many Requests
  • Retry-After: seconds or HTTP-date
  • Optional: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset

Backoff strategies

Clients should back off on 429 or 503: exponential backoff with jitter avoids synchronized retries.
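A small helper showing the client-side schedule (function name and defaults are illustrative):

```python
import random

def backoff_delays(attempts, base=2.0, factor=2.0, cap=60.0, jitter=0.2):
    """Exponential backoff with proportional +/- jitter, clipped to a cap."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        delay *= 1 + random.uniform(-jitter, jitter)  # de-synchronize retries
        delays.append(min(cap, delay))
    return delays

# e.g. backoff_delays(5) -> roughly [2, 4, 8, 16, 32] seconds, each +/-20%
```

The jitter is what prevents a "thundering herd": without it, every client that received the same 429 retries at the same instant.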

Worked examples

Example 1: Fixed window per-minute

Policy: 100 requests per 60 seconds per API key. Windows start on wall-clock minute.

  • If a client makes 100 requests between 12:00:00–12:00:59, request 101 in that window is rejected with 429.
  • At 12:01:00, the counter resets.
  • Edge issue: A client can send 100 at 12:00:59 and 100 at 12:01:00, effectively 200 in 2 seconds. Consider sliding or token bucket if this is harmful.

Example 2: Token bucket for bursts

Policy: Refill 10 tokens/second, bucket size 50.

  • At t=0 with a full bucket (50 tokens), a client can burst up to 50 immediate requests.
  • After the bucket empties, only 10 requests/second are allowed on average.
  • If the client idles for 10 seconds, it regains up to 50 tokens (capped by bucket size).

Example 3: Sliding window counter

Policy: 120 requests per minute. At 12:00:45, weight the counts of 12:00:00–12:00:59 and 11:59:00–11:59:59 proportionally to time overlap. This reduces boundary bursts while using constant memory.

Example 4: Concurrency throttle

Policy: Max 20 concurrent image-processing jobs per customer.

  • If 20 are running and 5 more arrive, queue or reject the 5 with a clear message.
  • When a job finishes, one queued job starts.
  • Protects CPU/memory while keeping latency predictable.

Design checklist

  • Define goals: protect upstream? downstream? fairness? cost?
  • Choose scope: per key, per user, per IP, per endpoint, per plan.
  • Select algorithm: fixed window, sliding window, token bucket, leaky bucket, concurrency limit.
  • Decide on burst tolerance and average rate.
  • Specify responses: status code, Retry-After, helpful X-RateLimit-* headers.
  • Decide on behavior: reject vs. queue vs. slowdown (throttle).
  • Persistence: single-node memory or distributed store (e.g., cache) if multi-instance.
  • Idempotency: encourage idempotency keys for safe retries.
  • Observability: metrics, logs, alerts for limit breaches.
  • Documentation: communicate limits and how to handle 429.

Exercises

These mirror the tasks in the Exercises section below. Try them before opening solutions.

  1. Design two-tier limits (ID: ex1)
    Set a per-minute and per-day policy. Decide allow/deny for given request counts. Provide sample headers.
  2. Simulate a token bucket (ID: ex2)
    Given events and parameters, mark each request allow or deny.
  3. Compute Retry-After backoff plan (ID: ex3)
    Turn 429 responses into a retry schedule with jitter.
  • Self-check: Compare your answers with the expected outputs and solutions. Verify headers, math, and edge cases.

Common mistakes and self-check

  • Only using fixed windows and ignoring boundary bursts. Self-check: Can a client double-spike across minute boundaries?
  • Limiting only by IP, ignoring API keys or users behind NAT. Self-check: Will shared corporate IPs get unfairly blocked?
  • No clear 429 messaging or Retry-After header. Self-check: Does the client know when to try again?
  • Forgetting concurrency limits for expensive endpoints. Self-check: Are slow endpoints protected from pile-ups?
  • Unbounded queues during throttling. Self-check: Can a spike fill memory? Do you have max queue length and timeouts?
  • Central counter as single point of failure. Self-check: What happens if the counter store is down?
  • No jitter in backoff. Self-check: Do many clients retry simultaneously?

Practical projects

  • Per-key API policy: Implement per-API-key token bucket with headers and 429 handling. Include a config file for different plan tiers.
  • Gateway + app split: Enforce coarse limits at the edge (per IP) and fine-grained per endpoint in the app (per key).
  • Concurrency guard: Add a semaphore around an expensive endpoint; reject after a short wait with a clear error.
  • Observability: Export metrics for allowed/denied counts, queue length, and 429 rate; create alert thresholds.

Learning path

  • Start: Understand fixed vs sliding vs token bucket, and concurrency limits.
  • Build: Implement a simple in-memory limiter; then move to a distributed counter/cache.
  • Harden: Add headers, jittered backoff guidance, and observability.
  • Scale: Partition limits by tenant/plan, and protect downstream systems.

Next steps

  • Do the exercises below; then take the Quick Test.
  • Note: The test is available to everyone. Only logged-in users will have their progress saved.
  • Apply a limiter to one real endpoint in your current project.

Mini challenge

Your API has free and paid plans. Free: 30 req/min burst up to 60. Paid: 120 req/min burst up to 240. Add a concurrency cap of 10 for an expensive /render endpoint. Define the exact headers and 429 messages you will return for both plans, and describe how you prevent queue overload.

Exercise details and solutions

Exercise ex1: Design two-tier limits

Policy: Per-IP: 60/min. Per-API-key: 1000/day. Given a client with 40 requests in the current minute and 990 requests today, decide for 25 more requests arriving now. Draft headers for both allow and deny cases.

Hints
  • Check both scopes: per-minute and per-day.
  • Consider remaining counts and when they reset.
  • Return Retry-After if you deny.
Expected output

First 10 requests allowed (the daily count reaches 1000 from 990); the remaining 15 receive 429 due to the daily limit. The minute window alone would have allowed 20, but the stricter scope wins. Example headers shown in solution.

Solution

Allow the first 10 now: X-RateLimit-Limit: 60; X-RateLimit-Remaining: 10; X-RateLimit-Reset: unix_ts_end_of_minute; X-RateLimit-Day-Limit: 1000; X-RateLimit-Day-Remaining: 0; X-RateLimit-Day-Reset: unix_ts_midnight. For the next 15: 429 with Retry-After: seconds_until_midnight and a body like: {"error":"rate_limit","message":"Per-key 1000/day exceeded. Retry after daily reset."}. The per-minute window never binds here (only 50 of 60 used); the stricter daily scope wins.
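A tiny checker (a hypothetical helper, not part of the exercise API) that evaluates each request sequentially against both scopes confirms the counts:

```python
def decide(n_requests, minute_used, day_used, minute_limit=60, day_limit=1000):
    """Check each request against both scopes; allowed requests
    count against both the minute and the day."""
    decisions = []
    for _ in range(n_requests):
        if minute_used >= minute_limit:
            decisions.append("429-minute")
        elif day_used >= day_limit:
            decisions.append("429-day")
        else:
            minute_used += 1
            day_used += 1
            decisions.append("allow")
    return decisions

decisions = decide(25, minute_used=40, day_used=990)
# 10 "allow" (the daily budget of 10 runs out first), then 15 "429-day"
```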

Exercise ex2: Simulate a token bucket

Bucket size = 50, refill = 10/s. Starting full at t=0s. Request at seconds: [0,0,0,0,1,1,2,3,3,5,10,10,10,12] (14 requests). Mark each as Allow or Deny and track remaining tokens after each event.

Hints
  • Before each request, add refilled tokens since the last request time (capped at bucket size).
  • Consume one token per allowed request.
  • If no tokens remain, Deny.
Expected output

All 14 requests are allowed (A x14); with this refill rate and these idle gaps the bucket never empties. See the solution for the per-event math.

Solution

t=0: start 50. Requests 1–4 at t=0: Allow x4 → 46 left. t=1: refill +10 capped at 50 → 50; two requests: Allow x2 → 48. t=2: +10 → 50; one request: Allow → 49. t=3: +10 → 50; two requests: Allow x2 → 48. t=5: +20 → 50; one request: Allow → 49. t=10: +50 → 50; three requests: Allow x3 → 47. t=12: +20 → 50; one request: Allow → 49. All 14 allowed; tokens never hit zero due to generous refill and idle periods.
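The trace can be verified with a short simulation of the exercise parameters (helper name is illustrative):

```python
def simulate(events, rate=10, capacity=50):
    """Replay request timestamps through a token bucket,
    refilling lazily before each request."""
    tokens, last, results = float(capacity), 0.0, []
    for t in events:
        tokens = min(capacity, tokens + (t - last) * rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            results.append("Allow")
        else:
            results.append("Deny")
    return results

events = [0, 0, 0, 0, 1, 1, 2, 3, 3, 5, 10, 10, 10, 12]
results = simulate(events)
# all 14 requests are allowed; the bucket never drops below 46 tokens
```

By contrast, 51 simultaneous requests at t=0 would exhaust the bucket and produce one Deny.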

Exercise ex3: Compute Retry-After backoff plan

A client gets 429 with Retry-After: 8 (seconds). Propose a 5-attempt plan using exponential backoff with jitter (base 2s, factor 2, cap 60s). Show the absolute wait before each retry given the server hint.

Hints
  • Respect server hint by waiting at least 8s before the first retry.
  • Then apply backoff: 2,4,8,16... seconds with jitter (e.g., +/-20%).
  • Do not exceed the cap (60s).
Expected output

Wait schedule like: 8s, ~2s±jitter, ~4s±jitter, ~8s±jitter, ~16s±jitter, each clipped to cap if necessary.

Solution

One acceptable plan (example): Retry1 after 8s; Retry2 after 2.2s; Retry3 after 3.7s; Retry4 after 8.6s; Retry5 after 17.5s. Jitter is random; ensure you do not go below 0 and do not exceed 60s. Document that the first wait honors server guidance.

Reference: Helpful headers

  • X-RateLimit-Limit: total allowed in the window.
  • X-RateLimit-Remaining: remaining in the current window.
  • X-RateLimit-Reset: epoch time when the window resets.
  • Retry-After: seconds or HTTP-date when retry is safe.

Quick Test

Take the Quick Test to check your understanding. Available to everyone; only logged-in users get saved progress.

Practice Exercises

3 exercises to complete

Instructions

Create a policy with both per-minute and per-day limits.
Policy: Per-IP 60 requests/min. Per-API-key 1000 requests/day.
Scenario: A client has already made 40 requests this minute and 990 today. They attempt 25 more requests now.

  • Decide for each of the 25 whether it is Allowed or 429.
  • Draft the headers for the last Allowed request in the minute window.
  • Draft the 429 response (status and headers) for the first Denied request.
  • Include daily limit headers that show remaining for the day after your decisions.
Expected Output

First 10 of the 25 are Allowed (the daily count reaches 1000). The remaining 15 are 429 with Retry-After equal to the seconds until the daily reset; the minute window (50 of 60 used) never binds. Daily remaining becomes 0.

Rate Limiting And Throttling Basics — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

