Why this matters
Exercise 2 (ID: ex2) — Implement client retries with idempotency
Write pseudocode that calls POST /v1/predict with:
- Idempotency-Key per logical prediction.
- Exponential backoff on 429, honoring Retry-After if present.
- No retry on 403. On 401, one token refresh followed by one retry if using OAuth2; otherwise stop.
Tip
Keep backoff bounded (max 30 s per wait) and cap total attempts at 5.
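One possible shape for the retry loop, as a minimal Python sketch rather than a reference solution. The names `predict_with_retries`, `backoff_delay`, `send`, and `get_token` are hypothetical, as is the 0.5 s base delay. The transport is injected as `send(headers, payload)` so the logic can be exercised without a network, and only the delta-seconds form of Retry-After is handled.

```python
import random
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]

MAX_ATTEMPTS = 5       # cap on total attempts, from the tip
MAX_BACKOFF_S = 30.0   # cap on a single wait, from the tip
BASE_DELAY_S = 0.5     # assumption: size of the first backoff step


def backoff_delay(attempt: int, retry_after: Optional[str]) -> float:
    """Seconds to wait before the next attempt; attempt counts from 1."""
    if retry_after is not None:
        try:
            # Honor the server hint, but never wait longer than the cap.
            return min(max(float(retry_after), 0.0), MAX_BACKOFF_S)
        except ValueError:
            pass  # HTTP-date form is not parsed here; fall back to backoff
    ceiling = min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_BACKOFF_S)
    return random.uniform(0, ceiling)  # jitter spreads out retrying clients


def predict_with_retries(
    send: Callable[[Mapping[str, str], str], Response],
    payload: str,
    get_token: Callable[[bool], str],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    # One key per logical prediction, reused unchanged on every retry.
    idempotency_key = str(uuid.uuid4())
    token = get_token(False)
    refreshed = False
    for attempt in range(1, MAX_ATTEMPTS + 1):
        headers = {
            "Authorization": f"Bearer {token}",
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/json",
        }
        status, resp_headers, body = send(headers, payload)
        if status == 401 and not refreshed:
            token = get_token(True)  # force one refresh, then retry
            refreshed = True
            continue
        if status == 429 and attempt < MAX_ATTEMPTS:
            # Assumes the transport exposes headers under this exact name.
            sleep(backoff_delay(attempt, resp_headers.get("Retry-After")))
            continue
        # 2xx, 403, a second 401, or anything else: stop and return it.
        return status, resp_headers, body
    return status, resp_headers, body
```

In this sketch the token refresh counts toward the five attempts, and a 403 or a second 401 is returned to the caller immediately. Whether to also retry 5xx responses is a separate decision the exercise leaves open.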
Checklist before you try
- I can explain API key vs OAuth2 vs JWT.
- I know when to return 401 vs 403.
- I can describe token bucket and quotas.
- I know which headers to return for limits.
- I can implement idempotency keys.
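As a refresher on the token bucket item, here is a minimal in-memory sketch. The class name `TokenBucket`, its parameters, and the specific header names `X-RateLimit-Limit` and `X-RateLimit-Remaining` are assumptions for illustration. The `X-RateLimit-*` family is a widely used convention, not a finalized standard, so adapt the names to your API.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Single-process token bucket; not thread-safe (illustrative only)."""

    def __init__(self, capacity: int, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity          # maximum burst size
        self.refill_per_s = refill_per_s  # sustained request rate
        self.clock = clock
        self.tokens = float(capacity)     # start with a full bucket
        self.updated = clock()

    def try_acquire(self) -> Tuple[bool, Dict[str, str]]:
        """Return (allowed, headers to attach to the response)."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now

        allowed = self.tokens >= 1.0
        if allowed:
            self.tokens -= 1.0

        headers = {
            "X-RateLimit-Limit": str(self.capacity),
            "X-RateLimit-Remaining": str(int(self.tokens)),
        }
        if not allowed:
            # Seconds until one whole token is available, rounded up.
            wait_s = (1.0 - self.tokens) / self.refill_per_s
            headers["Retry-After"] = str(math.ceil(wait_s))
        return allowed, headers
```

A server would call `try_acquire()` once per request and respond with 429 plus the returned headers when `allowed` is false. A daily quota is a separate counter layered on top of this; the bucket only shapes short-term bursts.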
Common mistakes and how to self-check
- Mixing 401 and 403: 401 means unauthenticated or bad token/key; 403 means authenticated but not allowed. Self-check: Remove the header entirely—do you return 401?
- No rate-limit headers: Clients cannot adapt. Self-check: Ensure X-RateLimit-* headers are present on 429 (and optionally on 200), and that Retry-After is present on 429.
- Logging secrets: Keys/tokens appear in logs. Self-check: Mask Authorization and x-api-key values.
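- Unlimited body size: Large payloads can cause OOM. Self-check: Enforce a maximum body size (check Content-Length and cap the bytes actually read) and reject oversize requests with 413.
- Missing idempotency for POST: Duplicate work. Self-check: Retry the same request with the same Idempotency-Key. Do you get an identical response?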
- Long-lived tokens without rotation: Risky if leaked. Self-check: Verify token lifetimes and rotation policy.
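Two of the self-checks above, the 401 vs 403 decision and secret masking in logs, can be sketched in a few lines of Python. The function names `auth_status` and `mask_headers`, and the in-memory `known_scopes` mapping from credential to granted scopes, are hypothetical stand-ins for whatever your framework and credential store provide.

```python
from typing import Dict, Mapping, Optional, Set

# Header names whose values must never reach the logs (compared lowercase).
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def auth_status(credential: Optional[str],
                known_scopes: Mapping[str, Set[str]],
                required_scope: str) -> int:
    """Return 401, 403, or 200 for a request (illustrative only)."""
    if not credential or credential not in known_scopes:
        return 401  # missing, malformed, or unrecognized credential
    if required_scope not in known_scopes[credential]:
        return 403  # caller is authenticated but lacks the permission
    return 200


def mask_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Copy headers for logging, replacing secret values with a mask."""
    return {
        name: "***" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Calling `auth_status(None, scopes, "predict")` mirrors the "remove the header entirely" check and should yield 401. Logging `mask_headers(request_headers)` instead of the raw headers keeps keys and tokens out of log files.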
Practical projects
- Secure Predictor v1: Build a minimal /v1/predict endpoint with API key auth, token-bucket limiting, and X-RateLimit headers.
- OAuth2 Service Caller: Add a small client that obtains a token (client credentials) and calls your endpoint, verifying scopes.
- Idempotent Inference: Implement idempotency storage (short TTL) and demonstrate safe retries under induced 429s.
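For the Idempotent Inference project, a starting point for the short-TTL storage might look like the sketch below. `IdempotencyStore` and `run_once` are hypothetical names, and the design is deliberately simplified: it is in-memory, single-process, not thread-safe, never evicts expired entries, and does not check that a replayed key carries the same payload.

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """Caches one response per Idempotency-Key for ttl_s seconds."""

    def __init__(self, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        # key -> (expiry timestamp, stored response body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def run_once(self, key: str, handler: Callable[[], str]) -> str:
        """Run handler for a new key; replay the stored response otherwise."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # duplicate within the TTL: no new work
        response = handler()
        self._entries[key] = (now + self.ttl_s, response)
        return response
```

A request handler would wrap the model call, for example `store.run_once(idempotency_key, lambda: run_model(payload))`, where `run_model` is a placeholder for your inference function. A production version would also need to handle concurrent duplicates that arrive while the first request is still in flight.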
Quick Test
Take the quick test below to check your understanding.
Mini challenge
Your model endpoint occasionally spikes to 3x normal traffic when new users sign up. Propose a combined strategy using token bucket parameters, a daily quota, and short-lived OAuth2 tokens to maintain SLOs without blocking legitimate bursts. Write 5 concise bullet points that describe your configuration choices and the client behavior you expect.
Learning path
- Next up: request validation and schema enforcement for model inputs.
- Then: observability for model APIs (latency percentiles, 4xx/5xx, and saturation).
- Finally: multi-tenant isolation and per-tenant quotas.
Next steps
- Complete Exercises 1–2 and the Quick Test.
- Add rate-limit headers to your current endpoint and verify in a client.
- Set a plan for secret rotation and token lifetimes.