Why this matters
As a Prompt Engineer, your prompts only create value when they power real features: chat assistants, content generation tools, search helpers, or decision aids. Integrating with APIs and products turns a good prompt into a reliable, safe, and measurable capability inside an app or workflow.
- Ship LLM-backed features in web/mobile backends.
- Call tools (functions) to fetch data or take actions.
- Stream partial responses for responsive UIs.
- Handle auth, rate limits, retries, and observability.
- Enforce safety: input validation, output constraints, and redaction.
Who this is for
- Prompt Engineers deploying prompts into real apps or workflows.
- Developers adding LLM features to backend or frontend.
- Product folks prototyping AI features with minimal code.
Prerequisites
- Comfort with HTTP requests (status codes, headers, JSON).
- Basic coding in JavaScript or Python.
- Understanding of prompt roles (system/user/assistant) and tokens.
Concept explained simply
Integrating with APIs and products means wrapping your prompt into a reliable service call. You send structured input, get structured output, and guard everything with limits, logs, and tests so that it works the same way tomorrow as it does today.
Mental model
- Contract: Define request/response shapes you can validate.
- Control: Timeouts, retries, and rate limits to stay stable.
- Context: Add the right data (tools, memory, grounding) just-in-time.
- Compliance: Redact PII, apply policies, and moderate content.
- Clarity: Log prompts, params, and outputs for improvement.
Core building blocks
- Authentication: API keys, secrets in environment variables. Rotate and scope them.
- Request shape: messages array, system instructions, temperature, max tokens.
- Response shape: message content, tool calls, finish reasons, usage.
- Retries: Retry only transient errors (timeouts, 429, 5xx) with backoff and jitter.
- Idempotency: Use idempotency keys for create-like actions to avoid duplicates.
- Streaming: Send partial tokens to the UI for fast feedback.
- Tool calling: Define function schemas; the model suggests when to call; your code executes and returns results.
- Validation: JSON schemas for outputs; fall back to reprompting on invalid JSON (see the sketch after this list).
- Safety: Input redaction, output moderation, allow/deny lists, and user consent for sensitive actions.
- Observability: Correlate logs per request; capture prompts, parameters, latency, and errors (respecting privacy).
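A minimal sketch of the validation building block, assuming a call_llm(messages) helper like the one in Example 1 below, an illustrative title/tags output schema, and the third-party jsonschema package; treat the schema and prompts as placeholders for your own.
# Minimal sketch: validate model output against a schema, reprompt once on failure.
# Assumes call_llm(messages) behaves like the wrapper in Example 1 below.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["title", "tags"]
}

def ask_for_json(messages, max_attempts=2):
    for attempt in range(max_attempts):
        raw = call_llm(messages)
        try:
            data = json.loads(raw)
            validate(instance=data, schema=OUTPUT_SCHEMA)
            return data
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the error back so the model can correct itself on the retry.
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Invalid JSON ({err}). Reply with valid JSON only."}
            ]
    raise ValueError("Model did not return valid JSON after reprompting")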
Mini task: Spot the transient errors
Which should you retry? 429 Too Many Requests (Yes), 500 Internal Server Error (Yes), 400 Bad Request (No), Timeout (Yes).
Worked examples
Example 1 — Basic backend call with retries (Python)
import os, random, time, requests

API_URL = "https://api.example-llm.com/v1/chat/completions"
API_KEY = os.environ.get("LLM_API_KEY")

class TransientError(RuntimeError):
    """Retryable failure: 429 or 5xx from the API."""

def call_llm(messages, max_retries=3):
    delay = 0.5
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
                json={
                    "model": "my-model",
                    "messages": messages,
                    "temperature": 0.3,
                    "max_tokens": 256
                },
                timeout=20
            )
            if resp.status_code in (429, 500, 502, 503, 504):
                raise TransientError(f"Transient {resp.status_code}")
            resp.raise_for_status()  # permanent errors (e.g., 400) fail immediately
            data = resp.json()
            return data["choices"][0]["message"]["content"]
        except (TransientError, requests.Timeout, requests.ConnectionError):
            # Retry only transient failures, with exponential backoff and jitter.
            if attempt == max_retries:
                raise
            time.sleep(delay + random.uniform(0, 0.25))
            delay *= 2

if __name__ == "__main__":
    answer = call_llm([
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize why retries matter in 2 bullets."}
    ])
    print(answer)
- Retries only transient failures (timeouts, 429, 5xx) with exponential backoff and jitter; permanent errors such as 400 fail fast.
- Keeps a fixed request shape for predictable outputs.
Example 2 — Streaming to the browser (JavaScript)
async function streamLLM(messages) {
  const res = await fetch("/api/llm", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages })
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let done = false;
  let all = "";
  while (!done) {
    const { value, done: d } = await reader.read();
    done = d;
    if (value) {
      // { stream: true } handles multi-byte characters split across chunks
      const chunk = decoder.decode(value, { stream: true });
      all += chunk;
      // Append chunk to UI incrementally (e.g., textarea.value += chunk)
    }
  }
  return all;
}
- Use a server endpoint to call the LLM API and stream the response to the client (see the sketch after these notes).
- Great for chat UIs and long generations.
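A minimal sketch of the server side of that /api/llm endpoint, assuming FastAPI and the same example API as Example 1; the "stream" parameter and the chunk format are provider-specific, so treat them as placeholders and adapt to your vendor's streaming protocol.
# Minimal sketch: stream the provider's response through your own endpoint,
# keeping the API key on the server. Chunk format is provider-specific.
import os
import requests
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()
API_URL = "https://api.example-llm.com/v1/chat/completions"
API_KEY = os.environ.get("LLM_API_KEY")

@app.post("/api/llm")
async def llm_proxy(request: Request):
    body = await request.json()

    def generate():
        with requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": "my-model", "messages": body["messages"], "stream": True},
            stream=True,
            timeout=60,
        ) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=None):
                yield chunk  # forward raw bytes; parse or transform here if needed

    return StreamingResponse(generate(), media_type="text/plain")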
Example 3 — Tool calling pipeline
// Define a function schema your model can call
const functions = [
  {
    name: "get_product_price",
    description: "Return price for a product SKU",
    parameters: {
      type: "object",
      properties: { sku: { type: "string" } },
      required: ["sku"]
    }
  }
];

async function handleToolCall(toolCall) {
  if (toolCall.name === "get_product_price") {
    const { sku } = JSON.parse(toolCall.arguments);
    // Call your product DB/service
    const price = 19.99; // Stubbed
    return JSON.stringify({ sku, price, currency: "USD" });
  }
  return "{}";
}

/* Flow:
   1) Send messages + tool definitions.
   2) If the model returns a tool call, execute it.
   3) Send tool result back as a new message.
   4) Get the final user-ready answer. */
- Separates model intent from real action.
- Ensures guardrails: validate inputs before executing a tool.
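One way to enforce that guardrail, shown as a minimal Python sketch: an explicit tool allowlist plus argument validation with the third-party jsonschema package. The schema mirrors the JavaScript definition above; names here are illustrative.
# Minimal sketch: gate tool calls behind an allowlist and validate arguments
# before any tool code runs.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

GET_PRODUCT_PRICE_SCHEMA = {
    "type": "object",
    "properties": {"sku": {"type": "string"}},
    "required": ["sku"],
    "additionalProperties": False
}

ALLOWED_TOOLS = {"get_product_price": GET_PRODUCT_PRICE_SCHEMA}  # explicit allowlist

def safe_tool_args(tool_name, raw_arguments):
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"Tool not allowed: {tool_name}")
    try:
        args = json.loads(raw_arguments)
        validate(instance=args, schema=ALLOWED_TOOLS[tool_name])
    except (json.JSONDecodeError, ValidationError) as err:
        raise ValueError(f"Rejected tool arguments: {err}")
    return args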
Example 4 — Webhook-driven product integration
- Your product receives a webhook event (e.g., user asks for a summary).
- Backend collects relevant context (user ID, document snippet).
- Backend calls LLM with a safe prompt and constraints.
- Backend posts the result back to the product UI.
Mini task: Add idempotency
Create an idempotency key from eventId + userId. Store processed keys; if seen again, return the cached response instead of reprocessing.
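A minimal sketch of that idempotency check, wrapped around the webhook flow from Example 4. It reuses the call_llm wrapper from Example 1 and an in-memory dict for illustration; in production, store keys in a shared cache or database (e.g., Redis).
# Minimal sketch: idempotency for a webhook-driven LLM call.
import hashlib

_processed = {}  # idempotency_key -> cached response (use a shared store in production)

def idempotency_key(event_id, user_id):
    return hashlib.sha256(f"{event_id}:{user_id}".encode()).hexdigest()

def handle_summary_event(event_id, user_id, document_snippet):
    key = idempotency_key(event_id, user_id)
    if key in _processed:
        return _processed[key]  # duplicate delivery: return cached result, skip reprocessing

    result = call_llm([
        {"role": "system", "content": "Summarize the document in 3 bullets."},
        {"role": "user", "content": document_snippet}
    ])
    _processed[key] = result
    return result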
Implementation checklist
- [ ] Secrets stored in env vars, never in client code.
- [ ] Timeouts set (client and server).
- [ ] Retries with exponential backoff and jitter for 429/5xx/timeouts.
- [ ] Idempotency keys for create-like operations.
- [ ] Input validation and output JSON schema checks.
- [ ] Safety filters: redact PII before logs; moderate risky outputs.
- [ ] Observability: trace ID per request; log latency, tokens, errors.
- [ ] Streaming enabled for long generations (optional but recommended).
- [ ] Tool calling gated with explicit allowlist and parameter validation.
- [ ] Graceful fallbacks: user-facing error messages and retry prompts.
Exercises
Build confidence by completing these two tasks. Aim for correctness first, then polish.
Exercise 1: Implement a resilient LLM API wrapper with timeout, retries, and idempotency for a request payload.
- Language: Python or JavaScript.
- Retry 429/5xx/timeouts up to 3 times with exponential backoff.
- Use an idempotency key to avoid duplicate processing.
Exercise 2: Add a tool call to fetch a product price and combine it with the model's final answer.
- Validate tool arguments against a schema.
- Return a final, user-friendly message with price and currency.
Self-check
- Does your wrapper return a clear error after max retries?
- Can you reproduce idempotency by resending the same key?
- Does tool calling reject malformed arguments?
- Are prompts, params, and outputs logged without leaking PII?
Common mistakes
- Skipping validation: leads to brittle, inconsistent outputs. Fix: enforce JSON schema and reprompt on failure.
- Retrying everything: wastes time on permanent errors (400). Fix: retry only transient failures.
- Client-side secrets: exposes keys. Fix: keep secrets on the server.
- No streaming: users wait too long. Fix: stream where possible.
- Poor logging: hard to debug. Fix: add trace IDs and structured logs.
- No safety filters: risk of sensitive data leakage. Fix: redact and moderate.
How to self-check
- Force a 429 and verify backoff works (see the test sketch after this list).
- Send malformed tool arguments and confirm rejection.
- Inspect logs to ensure no raw PII is stored.
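A pytest-style sketch of the first check, assuming the Example 1 wrapper is importable from a module named llm_client (a hypothetical name): mock the first response as a 429, the second as a success, and assert the call was retried.
# Minimal sketch: simulate a 429 followed by a success, assuming call_llm
# from Example 1 lives in a hypothetical module named llm_client.
from unittest import mock

import llm_client

def test_retries_on_429():
    transient = mock.Mock(status_code=429)
    ok = mock.Mock(status_code=200)
    ok.raise_for_status.return_value = None
    ok.json.return_value = {"choices": [{"message": {"content": "hi"}}]}

    with mock.patch("llm_client.requests.post", side_effect=[transient, ok]) as post:
        with mock.patch("llm_client.time.sleep"):  # skip real backoff delays
            answer = llm_client.call_llm([{"role": "user", "content": "ping"}])

    assert answer == "hi"
    assert post.call_count == 2  # one transient failure, one successful retry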
Practical projects
- Contextual FAQ bot: Retrieves answers from a knowledge base, uses tool calling for search, streams replies.
- Content assistant: Drafts posts with a style guide, validates JSON output for title/summary/tags.
- Ops triage helper: Summarizes incidents from logs, calls a tool to look up runbooks, returns a step list.
Learning path
- Master robust API calls: timeouts, retries, idempotency.
- Add streaming and user-friendly partial updates.
- Introduce tool calling with strict schemas and guards.
- Layer in safety (redaction, moderation) and observability.
- Automate evaluations and regression tests for prompts.
Next steps
- Complete both exercises and run through the quick test.
- Pick one practical project and ship a minimal, safe MVP.
- Iterate using logs: improve prompts, tools, and guardrails.
Mini challenge
Add a "tone" parameter to your API wrapper (e.g., friendly, formal) and ensure the model respects it. Validate input against an enum and assert that the output starts with a tone-specific greeting.