Why this matters
Prompt engineers ship reliable AI features. Knowing what models can and cannot do lets you pick the right prompt pattern, add guardrails, and avoid costly failures. Real tasks include:
- Designing prompts that extract structured data from messy text without hallucinations.
- Summarizing long content within a context window limit.
- Building assistants that admit uncertainty when data is missing or too recent.
- Reducing variability across runs for production stability.
Concept explained simply
Large language models (LLMs) are pattern-completers: they continue text based on context. They are great at language understanding and generation, but they aren’t databases, calculators, or search engines by default. Treat them as smart text predictors with helpful latent skills and known error rates.
Mental model
- Autocomplete with a brain: powerful at recognizing and producing patterns learned from data.
- Bounded working memory: only the context window is fully in play; older tokens get truncated.
- Probabilistic: outputs vary; temperature and sampling control randomness.
- Skill adapters: examples, structure, and tools shape outputs into dependable behavior.
What models are good at today
- Summarization, rewriting, and tone/style transformation.
- Information extraction with clear schema and constraints.
- Drafting content and code with guidance and examples.
- Lightweight reasoning and planning when steps are short and verifiable.
- Following format instructions when conflicts are minimized and prompts are concise.
Key limits you must design around
- Hallucination: the model may invent facts when unsure. Mitigate by constraining scope, requiring sources when available, and allowing 'I don’t know'.
- Knowledge gaps: training cutoffs and missing domain data. Add retrieval or ask the model to admit uncertainty.
- Context window: long inputs may be truncated, and earlier instructions can be forgotten. Summarize and chunk (see the sketch after this list).
- Non-determinism: repeated runs can differ. Lower temperature and add structure for consistency.
- Arithmetic and exactness: basic math is error-prone without tools. Prefer calculators or step-checked logic.
- Safety and bias: models may refuse or produce biased content. Clarify benign intent and filter sensitive requests.
- Length bias and formatting drift: long outputs drift from schema. Keep instructions short and enforce structure.
- Latency and cost: bigger prompts and larger models are slower and more expensive. Right-size both the model and the prompt.
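The context-window and cost limits above are easier to respect if you budget tokens before calling the model. Below is a minimal sketch in plain Python (no particular SDK). The ~4-characters-per-token ratio is a rough heuristic, and the function names and 2,000-token budget are illustrative assumptions; swap in your provider's tokenizer for real work.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)


def chunk_text(text: str, max_tokens: int = 2000) -> list[str]:
    """Split text on paragraph breaks so each chunk stays under the token budget."""
    max_chars = max_tokens * 4
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single paragraph longer than the budget is kept whole here;
    # split such paragraphs further in real use.
    return chunks


# Usage: summarize each chunk separately, then summarize the summaries.
# chunks = chunk_text(long_transcript)
```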
Worked examples
Example 1: Summary within limits
Goal: Summarize a long meeting transcript reliably.
Prompt pattern
Task: Summarize the meeting objectively.
Constraints:
- Max 120 words, bullet points.
- No new facts. If uncertain, say 'unclear'.
- Sections: Decisions, Action items, Risks.
Input: [Transcript chunk here]
Output format:
- Decisions: ...
- Action items: ...
- Risks: ...
Why it works
We restricted length, banned fabrication, and defined clear sections. This reduces drift and hallucination.
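When this pattern runs in code, keeping the template in one place stops the constraints from drifting between calls. A minimal sketch using a plain Python string template (the function name is an illustrative assumption, not part of any SDK):

```python
SUMMARY_TEMPLATE = """Task: Summarize the meeting objectively.
Constraints:
- Max 120 words, bullet points.
- No new facts. If uncertain, say 'unclear'.
- Sections: Decisions, Action items, Risks.
Input:
{transcript_chunk}
Output format:
- Decisions: ...
- Action items: ...
- Risks: ..."""


def build_summary_prompt(transcript_chunk: str) -> str:
    """Fill the fixed template with one transcript chunk so the rules never change."""
    return SUMMARY_TEMPLATE.format(transcript_chunk=transcript_chunk.strip())
```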
Example 2: Schema-true extraction
Goal: Extract product review fields.
Prompt pattern
You are extracting fields. Output JSON only with keys: product, rating_int (1-5), sentiment in {'positive','neutral','negative'}, pros (array), cons (array), mentions_refund (true/false).
If a field is missing, use null. Do not infer brand names not stated.
Text: [review]
Return JSON only.
Why it works
We specified exact keys, value ranges, and a 'null if missing' rule so the model does not invent values.
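Because even a 'JSON only' instruction can drift, validate the reply in code before trusting it. A minimal sketch in plain Python; the function name and error messages are illustrative assumptions:

```python
import json

REQUIRED_KEYS = {"product", "rating_int", "sentiment", "pros", "cons", "mentions_refund"}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def validate_review_json(raw_reply: str) -> dict:
    """Parse the model reply and enforce the schema; raise ValueError on any violation."""
    data = json.loads(raw_reply)  # fails fast if the model added extra prose
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"unexpected or missing keys: {set(data) ^ REQUIRED_KEYS}")
    if data["rating_int"] is not None and data["rating_int"] not in (1, 2, 3, 4, 5):
        raise ValueError("rating_int must be 1-5 or null")
    if data["sentiment"] is not None and data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment outside the allowed set")
    for key in ("pros", "cons"):
        if data[key] is not None and not isinstance(data[key], list):
            raise ValueError(f"{key} must be an array or null")
    if data["mentions_refund"] is not None and not isinstance(data["mentions_refund"], bool):
        raise ValueError("mentions_refund must be true, false, or null")
    return data


# Usage: on ValueError, re-prompt once with the error message included, then fall back
# to null fields or route the item to human review.
```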
Example 3: Arithmetic and tool use
Goal: Calculate a discount and tax reliably.
Prompt pattern
Task: Compute final price.
Given: price=129.99, discount=15%, tax=8.875%.
Steps:
1) Subtotal = price * (1 - discount)
2) Final = Subtotal * (1 + tax)
Show your steps and the final price rounded to 2 decimals. If unsure, say 'cannot compute'.
Why it works
We forced explicit steps and allowed refusal if uncertain. For production, prefer a calculator tool or verification step.
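For the production version, the arithmetic itself is best done outside the model, with the model only gathering inputs and explaining the result. A minimal sketch using Python's decimal module (the function name is an illustrative assumption):

```python
from decimal import Decimal, ROUND_HALF_UP


def final_price(price: str, discount_pct: str, tax_pct: str) -> Decimal:
    """Deterministic discount-and-tax calculation, done outside the model."""
    subtotal = Decimal(price) * (Decimal("1") - Decimal(discount_pct) / Decimal("100"))
    total = subtotal * (Decimal("1") + Decimal(tax_pct) / Decimal("100"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(final_price("129.99", "15", "8.875"))  # 120.30
```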
Practical heuristics and knobs
- Constrain the task: Define scope, format, and length. Prefer 'null if missing' over guessing.
- Tune randomness: Lower temperature for extraction and tests; allow slightly higher for brainstorming.
- Chunk and stage: Break big tasks into smaller steps. Summarize before analyzing.
- Use examples: Provide 1–3 short, representative few-shot examples for tricky formats.
- Verify: Ask for intermediate checks (e.g., item counts), validation flags, or confidence labels, and confirm them in code (sketched below).
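To make the 'Verify' heuristic concrete, the checks can live in ordinary code rather than in the prompt. A minimal sketch that tests the Example 1 summary against its own rules; the names and thresholds are illustrative assumptions:

```python
REQUIRED_SECTIONS = ("Decisions:", "Action items:", "Risks:")


def check_summary(summary: str, max_words: int = 120) -> list[str]:
    """Return a list of constraint violations; an empty list means the summary passes."""
    problems = []
    if len(summary.split()) > max_words:
        problems.append(f"over {max_words} words")
    for section in REQUIRED_SECTIONS:
        if section not in summary:
            problems.append(f"missing section '{section}'")
    return problems


# Usage: if check_summary(reply) returns problems, re-prompt once with the violations
# appended, or flag the item for human review.
```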
Exercises (practice)
Do the exercises below now; they mirror the Quick Test at the end.
Exercise 1 — Spot capability vs limit
For each scenario, decide if this is a strength to leverage or a limit to mitigate, and state one prompt tactic.
- A: Write a friendly email reply from bullets.
- B: Give the latest stock price for a company.
- C: Convert 200 reviews into a CSV with fixed columns.
- D: Solve multi-step financial math with exact cents.
Exercise 2 — Rewrite for structured extraction
Extract fields from this review text into strict JSON: 'The bag is sturdy but the zipper stuck twice. I might return it.' Fields: product, rating_int, sentiment, pros[], cons[], mentions_refund.
Exercise 3 — Guardrails for unknowns
Write a prompt template for answering product FAQs with rules: admit uncertainty, avoid made-up specs, request more info if ambiguous, include a confidence label.
Self-check checklist
- Did you add 'null if missing' rules for unknown fields?
- Did you specify output schema and forbid extra text?
- Did you limit length and define sections for summaries?
- Did you control randomness for deterministic tasks?
- Did you include a safe way to say 'I don’t know'?
Common mistakes and how to self-check
- Mistake: Overlong prompts with conflicting rules. Fix: Keep core rules short; move examples after rules.
- Mistake: Asking for facts beyond training data. Fix: Allow refusal or request for sources/data.
- Mistake: Schema drift in long outputs. Fix: Strict JSON-only instruction; keep responses short; chunk large jobs.
- Mistake: Expecting perfect math. Fix: Use explicit steps or external calculation.
- Mistake: Ignoring context limits. Fix: Summarize and stage; avoid repeating entire histories.
Mini challenge
Design a prompt that converts a batch of 5 short support tickets into a table with columns: category, urgency (low/med/high), needs_human (true/false). Include strict rules for ambiguity, a 100-word total limit, and a 'null if missing' policy.
Who this is for
- Aspiring and practicing prompt engineers building LLM-powered features.
- Data/ML engineers and analysts who need reliable text processing.
Prerequisites
- Basic understanding of prompts, temperature, and context windows.
- Comfort with JSON and simple data schemas.
Learning path
- Start: Understanding model capabilities and limits (this lesson).
- Next: Structured prompting patterns and validation.
- Then: Retrieval and tool-assisted prompting for reliability.
Practical projects
- Review-to-JSON pipeline with validation flags and confidence scores.
- Meeting summarizer that enforces sections and word caps.
- FAQ assistant that admits uncertainty and asks clarifying questions.
Next steps
- Complete the exercises and take the quick test.
- Apply these patterns to one real dataset you own.
Note on progress
The quick test is available to everyone; only logged-in users get saved progress.