Auth Rate Limits And Security Basics

Learn Auth Rate Limits And Security Basics for free with explanations, exercises, and a quick test (for Machine Learning Engineers).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

Exercise 2 (ID: ex2): Implement client retries with idempotency

Write pseudocode that calls POST /v1/predict with:

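An Idempotency-Key header, generated once per logical prediction and reused on every retry of that prediction.
Exponential backoff on 429, honoring Retry-After if present.
Stops on 403. On 401, refreshes the token once and retries if using OAuth2; otherwise stops.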

Tip

Keep each backoff wait bounded (at most 30 s) and cap total attempts at 5.
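One possible shape for the Exercise 2 client, sketched in Python rather than pseudocode. It is a minimal sketch under stated assumptions, not a reference solution: `send` is a hypothetical stand-in for the HTTP call to POST /v1/predict, `get_token` is a hypothetical callable that returns a fresh OAuth2 access token, the 0.5 s base delay is an arbitrary choice, and only the numeric form of Retry-After is parsed.

```python
import random
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

MAX_ATTEMPTS = 5       # total attempts, including the first one
MAX_BACKOFF_S = 30.0   # upper bound on any single wait
BASE_BACKOFF_S = 0.5   # first backoff step (assumed value)


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: Optional[dict] = None


# Hypothetical transport: send(headers, payload) -> Response for POST /v1/predict.
Send = Callable[[Dict[str, str], dict], Response]


def backoff_delay(attempt: int, retry_after: Optional[str]) -> float:
    """Seconds to wait before the next attempt (attempt is 0-based)."""
    if retry_after is not None:
        try:
            return min(float(retry_after), MAX_BACKOFF_S)  # honor the server, but bounded
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    # Exponential backoff with full jitter, capped at MAX_BACKOFF_S.
    ceiling = min(BASE_BACKOFF_S * (2 ** attempt), MAX_BACKOFF_S)
    return random.uniform(0, ceiling)


def predict_with_retries(send: Send, payload: dict, get_token: Callable[[], str],
                         sleep: Callable[[float], None] = time.sleep) -> Response:
    idem_key = str(uuid.uuid4())  # one key per logical prediction, reused on retries
    token = get_token()
    refreshed = False
    resp: Optional[Response] = None
    for attempt in range(MAX_ATTEMPTS):
        headers = {"Authorization": f"Bearer {token}", "Idempotency-Key": idem_key}
        resp = send(headers, payload)
        if resp.status == 401 and not refreshed:
            token = get_token()  # refresh once, then retry without waiting
            refreshed = True
            continue
        if resp.status == 429:
            if attempt < MAX_ATTEMPTS - 1:  # no point sleeping after the last attempt
                sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))
            continue
        return resp  # success, 403, a second 401, or any other status: stop
    assert resp is not None
    return resp
```

Injecting `sleep` keeps the function testable without real waits, and the key is created once outside the loop so every retry of the same prediction carries the same Idempotency-Key.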

Checklist before you try

I can explain API key vs OAuth2 vs JWT.
I know when to return 401 vs 403.
I can describe the token bucket algorithm and quotas.
I know which headers to return for limits.
I can implement idempotency keys.

Common mistakes and how to self-check

Mixing up 401 and 403: 401 means unauthenticated (missing, invalid, or expired token/key); 403 means authenticated but not allowed. Self-check: Remove the credential header entirely. Do you return 401?
No rate-limit headers: Clients cannot adapt. Self-check: Ensure X-RateLimit-* headers and Retry-After are present on 429 responses; the X-RateLimit-* headers can optionally be sent on 200 responses as well.
Logging secrets: Keys/tokens appear in logs. Self-check: Mask Authorization and x-api-key values.
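Unlimited body size: Large payloads can cause out-of-memory (OOM) failures. Self-check: Enforce a maximum body size (check Content-Length and the bytes actually read) and reject oversized requests with 413.
Missing idempotency for POST: Retries cause duplicate work. Self-check: Retry the same request with the same Idempotency-Key. Do you get the identical response without the work running twice?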
Long-lived tokens without rotation: Risky if leaked. Self-check: Verify token lifetimes and rotation policy.
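The first and third self-checks above can be made concrete with a small sketch. It assumes API-key auth via an X-API-Key header and a hypothetical in-memory map from key to allowed scopes; `auth_status`, `redact_headers`, and the scope names are illustrative, not part of any particular framework.

```python
from typing import Dict, Mapping, Optional, Set

# Header names whose values must never reach logs (compared case-insensitively).
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def auth_status(headers: Mapping[str, str],
                key_scopes: Dict[str, Set[str]],
                required_scope: str) -> Optional[int]:
    """Return 401, 403, or None (request allowed) for an API-key request."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {k.lower(): v for k, v in headers.items()}
    key = lower.get("x-api-key")
    if key is None or key not in key_scopes:
        return 401  # no credential or an unknown one: unauthenticated
    if required_scope not in key_scopes[key]:
        return 403  # authenticated, but this key may not use this endpoint
    return None


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the headers that is safe to log."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Usage: call `auth_status` before any model work, and pass request headers through `redact_headers` before they reach a logger, so a leaked log file never contains a usable credential.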

Practical projects

Secure Predictor v1: Build a minimal /v1/predict endpoint with API key auth, token-bucket limiting, and X-RateLimit headers.
OAuth2 Service Caller: Add a small client that obtains a token via the client credentials grant and calls your endpoint, verifying scopes.
Idempotent Inference: Implement idempotency storage (short TTL) and demonstrate safe retries under induced 429s.
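For the Idempotent Inference project, the storage can start as a small in-memory cache. This is a minimal single-process sketch under stated assumptions: the class name, the 10-minute default TTL, and the SHA-256 payload fingerprint are illustrative choices, and it does not guard against two concurrent requests with the same key (a shared store with an atomic set-if-absent would be needed for that).

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple


def fingerprint(payload: dict) -> str:
    """Stable hash of the request body, used to detect key reuse with new data."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


class IdempotencyStore:
    """In-memory idempotency cache with a short TTL (hypothetical, single-process)."""

    def __init__(self, ttl_s: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        # key -> (expiry time, payload fingerprint, cached response)
        self._entries: Dict[str, Tuple[float, str, dict]] = {}

    def _purge(self) -> None:
        now = self.clock()
        expired = [k for k, (exp, _, _) in self._entries.items() if exp <= now]
        for k in expired:
            del self._entries[k]

    def get_or_run(self, key: str, fp: str, run: Callable[[], dict]) -> dict:
        """Return the cached response for key, or call run() once and cache it."""
        self._purge()
        hit = self._entries.get(key)
        if hit is not None:
            _, stored_fp, response = hit
            if stored_fp != fp:
                raise ValueError("Idempotency-Key reused with a different payload")
            return response  # safe retry: same answer, no repeated inference
        response = run()
        self._entries[key] = (self.clock() + self.ttl_s, fp, response)
        return response
```

Rejecting a reused key whose payload differs keeps a client bug from silently receiving a cached answer to a different question; the error would typically map to a 4xx response.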

Quick Test

Take the quick test to check your understanding.

Mini challenge

Your model endpoint occasionally spikes to 3x normal traffic when new users sign up. Propose a combined strategy using token bucket parameters, a daily quota, and short-lived OAuth2 tokens to maintain SLOs without blocking legitimate bursts. Write 5 concise bullet points that describe your configuration choices and the client behavior you expect.

Learning path

Next up: request validation and schema enforcement for model inputs.
Then: observability for model APIs (latency percentiles, 4xx/5xx, and saturation).
Finally: multi-tenant isolation and per-tenant quotas.

Next steps

Complete Exercises 1–2 and the Quick Test.
Add rate-limit headers to your current endpoint and verify in a client.
Set a plan for secret rotation and token lifetimes.

Practice Exercises


Instructions

Create a per-API-key policy for three endpoints:

GET /v1/health — weight 0
POST /v1/predict — weight 3
GET /v1/metrics — weight 1

Requirements:

Average 120 requests/min per key.
Allow a burst of 20 immediate predict requests.
Include response headers to help clients adjust.

Deliverables: token bucket parameters (rate, burst), endpoint weights, and sample headers for a response where 3 tokens remain and the reset is 15 s away.

Expected Output

A clear plan with: rate = 120 tokens/min (2 tokens/s); a burst capacity of at least 60 tokens, so 20 predict requests (20 × 3 = 60 tokens) can go through instantly; weights {health: 0, predict: 3, metrics: 1}; and headers such as X-RateLimit-Limit, X-RateLimit-Remaining: 3, and X-RateLimit-Reset: <epoch ~15 s from now>.
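A weighted token bucket that meets these numbers can be sketched as follows. This is a minimal sketch under stated assumptions, not the only valid answer: capacity is set to exactly 60 tokens, X-RateLimit-Limit reports that capacity, and X-RateLimit-Reset is taken to be the epoch second at which the bucket would be full again. Other APIs define Limit and Reset against a fixed window instead.

```python
import math
import time
from typing import Callable, Dict, Tuple

RATE_PER_S = 120 / 60.0  # 120 tokens/min = 2 tokens/s
CAPACITY = 60.0          # 20 predict requests x weight 3
WEIGHTS = {"GET /v1/health": 0, "POST /v1/predict": 3, "GET /v1/metrics": 1}


class TokenBucket:
    """Per-API-key token bucket; cost per request comes from WEIGHTS."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.time):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so the initial burst is available
        self.updated = clock()

    def _refill(self) -> float:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return now

    def take(self, cost: float) -> Tuple[bool, Dict[str, str]]:
        """Try to spend cost tokens; return (allowed, rate-limit headers)."""
        now = self._refill()
        allowed = self.tokens >= cost
        if allowed:
            self.tokens -= cost
        # Assumed convention: Reset is when the bucket would be full again.
        reset_at = now + (self.capacity - self.tokens) / self.rate
        headers = {
            "X-RateLimit-Limit": str(int(self.capacity)),
            "X-RateLimit-Remaining": str(int(self.tokens)),
            "X-RateLimit-Reset": str(math.ceil(reset_at)),
        }
        if not allowed:
            # Whole seconds until enough tokens exist for this request.
            headers["Retry-After"] = str(math.ceil((cost - self.tokens) / self.rate))
        return allowed, headers
```

At a steady state this allows 40 predict requests per minute (120 / 3). Under the "full again" convention, a response with 3 tokens remaining would report a reset about 28.5 s out ((60 - 3) / 2), not 15 s; the exercise's 15 s figure fits a fixed-window style of Reset, so state whichever convention you choose in your API docs.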