Why this matters
As a Prompt Engineer, you often need the model to return machine-parseable outputs that feed into pipelines, dashboards, or downstream code. Typical tasks include:
- Extracting entities from messy text into strict JSON for ETL.
- Generating CSV rows for bulk imports (products, contacts, tickets).
- Creating XML/YAML payloads for integrations or configuration.
- Producing consistent schemas for evaluation datasets and test harnesses.
- Enforcing domain-specific formats (e.g., ICD codes, ISO country codes).
Getting the format wrong causes parsing errors, broken automations, and wasted review time. This lesson shows how to reliably lock formats.
Who this is for
- Prompt Engineers and Data/ML folks who hand off LLM outputs to code.
- Analysts building structured datasets from unstructured sources.
- Anyone orchestrating multi-step LLM workflows with strict formats.
Prerequisites
- Basic familiarity with JSON/CSV/XML.
- Comfort writing concise, explicit prompts.
- Optional: experience parsing data with your favorite language.
Concept explained simply
Think of structured prompting as a contract:
- Contract: You define a schema and exact format rules.
- Serializer: The model fills the schema with content.
- Validator: You (or your code) check the output strictly.
Mental model
Structure first, content second. Always tell the model what the container looks like before what to put inside. Use strong constraints and reminders.
Format-lock phrases you can reuse
Return ONLY valid JSON. No comments, no code fences, no extra text.
If unsure, use null. Use double quotes for all keys and strings.
Return ONLY CSV. First row is header. No extra lines. Use commas.
Return ONLY XML. Encode special characters (& < > ") correctly.
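To reuse these phrases consistently, you can keep them in code and always place them before the task. The sketch below is a minimal illustration assuming Python 3.9+. FORMAT_LOCKS and build_prompt are hypothetical names, not part of any library. The helper follows the mental model above: format lock first, rules second, input last.

```python
# Hypothetical prompt-library helper: the format lock always comes first,
# then the schema rules, then the input ("structure first, content second").
FORMAT_LOCKS = {
    "json": (
        "Return ONLY valid JSON. No comments, no code fences, no extra text.\n"
        "If unsure, use null. Use double quotes for all keys and strings."
    ),
    "csv": "Return ONLY CSV. First row is header. No extra lines. Use commas.",
    "xml": 'Return ONLY XML. Encode special characters (& < > ") correctly.',
}


def build_prompt(fmt: str, rules: list[str], text: str) -> str:
    """Assemble lock + rules + input; an unknown fmt raises KeyError on purpose."""
    lock = FORMAT_LOCKS[fmt]
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return f"{lock}\nRules:\n{rule_lines}\nInput:\n{text}"


prompt = build_prompt("csv", ["Header: id,name", "Exactly 2 rows follow."], "chair, lamp")
print(prompt)
```

Keeping the lock phrases in one place means a wording fix propagates to every prompt that uses them.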
Patterns and templates
JSON template
System: You are a formatter that outputs only valid JSON.
User: Extract fields from the text below.
Rules:
- Output ONLY a single JSON object.
- Keys: ["title","tags","priority","due_date","description"]
- Types: title:string; tags:array of strings; priority: one of ["low","medium","high"]; due_date: ISO date string (YYYY-MM-DD) or null; description:string
- Do not include explanations.
Text: "...paste text..."
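On the validator side of the contract, a strict parse plus explicit key, type, and enum checks catches most failures. Here is a minimal sketch for the template above, assuming Python's standard json module. validate_task is a hypothetical helper name, and the checks mirror only the rules listed in the template.

```python
import json
import re
from datetime import date

REQUIRED_KEYS = {"title", "tags", "priority", "due_date", "description"}
PRIORITIES = ("low", "medium", "high")


def validate_task(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the reply passed."""
    try:
        data = json.loads(raw)  # code fences or extra text fail here
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["output must be a single JSON object"]
    errors = []
    if set(data) != REQUIRED_KEYS:
        missing = sorted(REQUIRED_KEYS - set(data))
        extra = sorted(set(data) - REQUIRED_KEYS)
        errors.append(f"keys wrong: missing={missing} extra={extra}")
    if not isinstance(data.get("title"), str):
        errors.append("title must be a string")
    tags = data.get("tags")
    if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
        errors.append("tags must be an array of strings")
    if data.get("priority") not in PRIORITIES:
        errors.append("priority must be low, medium, or high")
    due = data.get("due_date")
    if due is not None:
        if not (isinstance(due, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", due)):
            errors.append("due_date must be YYYY-MM-DD or null")
        else:
            try:
                date.fromisoformat(due)  # rejects impossible dates like 2024-02-30
            except ValueError:
                errors.append(f"due_date {due!r} is not a real calendar date")
    if not isinstance(data.get("description"), str):
        errors.append("description must be a string")
    return errors
```

A non-empty list is a signal to reject the reply or re-prompt, rather than to patch the data by hand.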
CSV template
System: You output only CSV.
User: Create product rows.
Rules:
- First row is the header: id,name,price_usd,category
- Exactly 3 rows follow.
- No extra lines, no quotes unless needed, comma separator.
Input: "..."
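For CSV, let a real CSV parser count rows and fields instead of splitting on commas by hand. Below is a sketch using Python's csv module. validate_csv is a hypothetical name, and the default of 3 rows simply matches the template.

```python
import csv
import io

EXPECTED_HEADER = ["id", "name", "price_usd", "category"]


def validate_csv(raw: str, expected_rows: int = 3) -> list[str]:
    """Check the header, the number of data rows, and the fields per row."""
    rows = list(csv.reader(io.StringIO(raw)))  # comma is the default delimiter
    if not rows:
        return ["output is empty"]
    errors = []
    if rows[0] != EXPECTED_HEADER:
        errors.append(f"header is {rows[0]}, expected {EXPECTED_HEADER}")
    data = rows[1:]
    if any(row == [] for row in data):  # csv.reader yields [] for blank lines
        errors.append("blank lines are not allowed")
    if len(data) != expected_rows:
        errors.append(f"expected {expected_rows} data rows, got {len(data)}")
    for i, row in enumerate(data, start=1):
        if row and len(row) != len(EXPECTED_HEADER):
            errors.append(f"data row {i} has {len(row)} fields, expected {len(EXPECTED_HEADER)}")
    return errors
```

A reply that uses semicolons fails on the header check immediately, which is exactly the "wrong separator" case.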
XML template
System: Output only XML, UTF-8.
User: Produce a <ticket> with child elements title, severity, owner, steps (a list of <step> elements), and labels (a list of <label> elements).
Rules:
- Wrap in a single root <ticket>.
- Escape special characters.
- severity in {"low","medium","high"}.
Input: "..."
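Well-formedness is the first gate for XML. Python's xml.etree.ElementTree refuses unescaped & or <, a missing closing tag, and text before or after the root. The sketch below then checks the rules from the template. validate_ticket is a hypothetical name.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ("title", "severity", "owner", "steps", "labels")
SEVERITIES = ("low", "medium", "high")


def validate_ticket(raw: str) -> list[str]:
    """Check well-formedness, the <ticket> root, required children, and severity."""
    try:
        root = ET.fromstring(raw)  # unescaped "&", unclosed tags, extra text fail here
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != "ticket":
        return [f"root is <{root.tag}>, expected <ticket>"]
    errors = [f"missing <{name}>" for name in REQUIRED_CHILDREN if root.find(name) is None]
    severity = root.findtext("severity")
    if severity is not None and severity not in SEVERITIES:
        errors.append(f"severity {severity!r} is not one of {SEVERITIES}")
    steps = root.find("steps")
    if steps is not None and (len(steps) == 0 or any(c.tag != "step" for c in steps)):
        errors.append("<steps> must hold one or more <step> elements and nothing else")
    labels = root.find("labels")
    if labels is not None and any(c.tag != "label" for c in labels):
        errors.append("<labels> may only hold <label> elements")
    return errors
```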
Worked examples
Example 1 — JSON extraction from messy text
Goal: Extract shipping details into strict JSON.
Prompt
Return ONLY valid JSON. No code fences or comments.
Keys and types:
- order_id: string
- items: array of { sku:string, qty:integer }
- ship_to: { name:string, city:string, country_iso2:string }
- express: boolean
If data is missing, use null (use an empty array for items).
Text:
"Order #A-1049. Need 2x SKU-XL-BLK and one SKU-CAP. Ship: Sam Lee, Berlin, DE. Rush shipping please!"
Possible output
{
"order_id": "A-1049",
"items": [
{"sku": "SKU-XL-BLK", "qty": 2},
{"sku": "SKU-CAP", "qty": 1}
],
"ship_to": {"name": "Sam Lee", "city": "Berlin", "country_iso2": "DE"},
"express": true
}
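Nested schemas need nested checks. Here is a sketch for the Example 1 schema, assuming Python. validate_order and is_int are hypothetical helpers. The is_int guard matters because Python treats True as an integer, and the country check verifies only the two-letter format, not that the code actually exists.

```python
import json
import re


def is_int(value) -> bool:
    # bool is a subclass of int in Python, so True would otherwise pass as 1.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_order(raw: str) -> list[str]:
    """Check the nested order schema; raises json.JSONDecodeError if raw is not JSON."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    errors = []
    if data.get("order_id") is not None and not isinstance(data["order_id"], str):
        errors.append("order_id must be a string or null")
    items = data.get("items")
    if not isinstance(items, list):
        errors.append("items must be an array ([] when missing)")
    else:
        for i, item in enumerate(items):
            ok = isinstance(item, dict) and isinstance(item.get("sku"), str) and is_int(item.get("qty"))
            if not ok:
                errors.append(f"items[{i}] must be {{sku: string, qty: integer}}")
    ship_to = data.get("ship_to")
    if ship_to is not None and not isinstance(ship_to, dict):
        errors.append("ship_to must be an object or null")
    elif ship_to is not None:
        country = ship_to.get("country_iso2")
        # Format check only: it does not confirm the code is assigned.
        if country is not None and not (isinstance(country, str) and re.fullmatch(r"[A-Z]{2}", country)):
            errors.append("country_iso2 must be two uppercase letters or null")
    if data.get("express") is not None and not isinstance(data["express"], bool):
        errors.append("express must be true, false, or null")
    return errors
```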
Example 2 — CSV generation with header
Goal: Generate import-ready product CSV.
Prompt
Return ONLY CSV. First row is header: id,name,price_usd,category
Exactly 3 rows follow. id is an integer; price_usd has exactly two decimals.
No extra lines.
Products to list: minimalist chair, standing desk, task lamp
Categories: furniture, furniture, lighting
Suggested prices: 89.00, 359.00, 49.00
Possible output
id,name,price_usd,category
1,Minimalist Chair,89.00,furniture
2,Standing Desk,359.00,furniture
3,Task Lamp,49.00,lighting
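Once the shape is right, downstream code usually converts each row into typed values. Below is a sketch assuming Python. parse_products is a hypothetical name; it uses Decimal for prices so that values like 89.00 are stored exactly rather than as binary floats.

```python
import csv
import io
import re
from decimal import Decimal

PRICE = re.compile(r"\d+\.\d{2}")  # digits, a dot, exactly two decimals


def parse_products(raw: str) -> list[dict]:
    """Convert product CSV into typed records; raises ValueError on a bad id or price.

    Assumes the header and row count have already been checked.
    """
    records = []
    for row in csv.DictReader(io.StringIO(raw)):
        if not PRICE.fullmatch(row["price_usd"]):
            raise ValueError(f"price_usd {row['price_usd']!r} needs exactly two decimals")
        records.append({
            "id": int(row["id"]),  # int("1.0") or int("A1") raises ValueError
            "name": row["name"],
            "price_usd": Decimal(row["price_usd"]),  # exact, unlike float
            "category": row["category"],
        })
    return records
```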
Example 3 — XML configuration
Goal: Create an XML ticket for a support system.
Prompt
Return ONLY XML. Root <ticket>.
Required children: <title> <severity> <owner> <steps> <labels>.
- severity: one of low, medium, high
- steps: multiple <step> nodes
- labels: multiple <label> nodes
No comments, no prolog, no extra text.
Issue: App crashes after clicking "Export" on large files.
Owner: d.chen
Labels: export, crash, priority
Severity: high
Possible output
<ticket>
<title>Crash when exporting large files</title>
<severity>high</severity>
<owner>d.chen</owner>
<steps>
<step>Open app</step>
<step>Load large file (>500MB)</step>
<step>Click Export</step>
</steps>
<labels>
<label>export</label>
<label>crash</label>
<label>priority</label>
</labels>
</ticket>
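Format checks do not catch a model that quietly changes values it was given. Here is a small sketch, assuming Python and a hypothetical check_ticket_facts helper, that compares owner, severity, and labels in the XML against the values from the input.

```python
import xml.etree.ElementTree as ET


def check_ticket_facts(raw: str, owner: str, severity: str, labels: list[str]) -> list[str]:
    """Compare values copied from the input; raises ET.ParseError if raw is malformed."""
    root = ET.fromstring(raw)
    errors = []
    if root.findtext("owner") != owner:
        errors.append(f"owner is {root.findtext('owner')!r}, expected {owner!r}")
    if root.findtext("severity") != severity:
        errors.append(f"severity is {root.findtext('severity')!r}, expected {severity!r}")
    got = [label.text for label in root.iterfind("labels/label")]
    if got != labels:  # order matters: labels should follow the input order
        errors.append(f"labels are {got}, expected {labels}")
    return errors
```

For Example 3 you would call it with "d.chen", "high", and ["export", "crash", "priority"].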
How to write robust prompts
- State the format first: “Return ONLY valid JSON/CSV/XML.”
- Specify keys/columns and types.
- Constrain values: enumerations, ISO codes, regex hints.
- Define missing-data behavior: null, empty array, or empty string.
- For CSV: header row, separator, quoting rules, exact row count.
- For XML: root element name, element order, escaping rules.
- Ban extra text: no explanations, no code fences, no comments.
- Add a last-line reminder: “If unsure, output null fields, not explanations.”
Common mistakes and self-check
- Including code fences or explanations around the data. Fix: explicitly say “no code fences, no extra text.”
- Smart quotes or trailing commas in JSON. Fix: require straight double quotes for all keys and strings, and forbid trailing commas and comments.
- Wrong separators in CSV (e.g., semicolons). Fix: name the separator.
- Missing header row in CSV. Fix: explicitly require it.
- Invalid enums (e.g., severity: urgent). Fix: list allowed values.
- XML special characters not escaped. Fix: mention escaping explicitly.
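When a strict parse fails, naming the likely cause makes it easier to tighten the prompt. The sketch below is a heuristic helper with the hypothetical name diagnose_json_reply. Its pattern checks can misfire (for example on // inside a URL string), so treat the hints as suggestions, not verdicts.

```python
import json
import re

SMART_QUOTES = "\u201c\u201d\u2018\u2019"
TRAILING_COMMA = re.compile(r",\s*[}\]]")  # heuristic: may also match inside strings


def diagnose_json_reply(raw: str) -> list[str]:
    """Return likely causes when a reply fails strict JSON parsing ([] if it parses)."""
    try:
        json.loads(raw)
        return []  # parses cleanly, nothing to diagnose
    except json.JSONDecodeError as exc:
        causes = [f"parse error at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    text = raw.strip()
    if text.startswith("```"):
        causes.append("code fence around the data")
    elif not text.startswith(("{", "[")):
        causes.append("extra text before the JSON")
    if any(q in raw for q in SMART_QUOTES):
        causes.append("smart quotes instead of straight double quotes")
    if TRAILING_COMMA.search(raw):
        causes.append("trailing comma before } or ]")
    if "//" in raw or "/*" in raw:
        causes.append("possible comments (JSON has none)")
    return causes
```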
Self-check before you use the output
- JSON: Can it parse with a strict JSON parser? Are all keys present?
- CSV: Exactly one header row and N data rows? Right separator? No extra blank lines?
- XML: Single root? Valid nesting? Special characters escaped?
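If you self-check in Python, note that json.loads is more permissive than the JSON spec: by default it accepts NaN and Infinity, and it silently keeps the last value when a key is repeated. Here is a sketch of a stricter wrapper using the documented parse_constant and object_pairs_hook parameters. strict_loads is a hypothetical name.

```python
import json


def _reject_constant(name: str):
    # json calls this for "NaN", "Infinity" and "-Infinity", which are not valid JSON.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicates(pairs):
    # Receives each object's (key, value) pairs in order, before they become a dict.
    keys = [key for key, _ in pairs]
    dupes = sorted({key for key in keys if keys.count(key) > 1})
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    return dict(pairs)


def strict_loads(raw: str):
    """json.loads with NaN/Infinity and duplicate keys rejected (all errors are ValueError)."""
    return json.loads(raw, parse_constant=_reject_constant, object_pairs_hook=_reject_duplicates)
```

Because json.JSONDecodeError is itself a subclass of ValueError, a single except ValueError covers every rejection.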
Exercises
Exercise 1 — JSON incident report
Create a prompt that converts a freeform incident note into strict JSON with this schema:
{
"id": string,
"severity": one of ["low","medium","high"],
"services": array of strings,
"started_at": ISO-8601 datetime or null,
"impact_summary": string
}
Input text to handle:
INC-9087 Major outage on checkout + payments since 09:14 UTC. Users report 5xx. Affected: checkout, payments. Severity: HIGH.
- Write the full prompt that enforces the schema and bans extra text.
- Then provide an example of a correct model output.
Sample solution
Prompt:
Return ONLY valid JSON. No code fences or explanations.
Schema and rules:
- Keys: id (string), severity ("low"|"medium"|"high"), services (array of strings),
started_at (ISO-8601 string or null), impact_summary (string)
- Use double quotes for all keys and strings.
- If a value is uncertain, use null.
Text:
"INC-9087 Major outage on checkout + payments since 09:14 UTC. Users report 5xx. Affected: checkout, payments. Severity: HIGH."
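Expected output example (started_at is null because the note gives a time, 09:14 UTC, but no date, and the rules say to use null when unsure):
{
"id": "INC-9087",
"severity": "high",
"services": ["checkout", "payments"],
"started_at": null,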
"impact_summary": "Users receive 5xx errors on checkout and payments."
}
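The trickiest field in this exercise is started_at, since the note gives a time without a date. Here is a sketch of a focused check, assuming Python. started_at_ok is a hypothetical helper, and requiring an explicit offset (Z or +HH:MM) is an added assumption that suits UTC-based incident data. The regex is deliberately narrow and does not allow fractional seconds.

```python
import re
from datetime import datetime

# Date, time, and an explicit UTC offset ("Z" or +HH:MM); seconds optional.
ISO_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})")


def started_at_ok(value) -> bool:
    """True for null or a full ISO-8601 datetime with an offset."""
    if value is None:
        return True
    if not isinstance(value, str) or not ISO_DATETIME.fullmatch(value):
        return False  # rejects bare times such as "09:14"
    try:
        # Python versions before 3.11 reject a trailing "Z", so normalize it first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False  # e.g. month 13 or hour 25
    return True


print(started_at_ok("09:14"))  # False
print(started_at_ok(None))  # True
```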
Exercise 2 — CSV product feed
Write a prompt that makes the model output ONLY CSV for three rows of products with this exact header and rules:
Header: sku,name,price_usd,in_stock
Rules:
- Exactly 3 data rows.
- price_usd has two decimals.
- in_stock is true or false (lowercase).
- Use comma as separator.
- No extra lines or spaces.
Input products: snow boots 79.99 true; rain jacket 59.00 false; thermal socks 9.50 true
Sample solution
Prompt:
Return ONLY CSV. First row is header: sku,name,price_usd,in_stock
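Exactly 3 rows follow. Use comma as separator. No extra lines and no spaces around commas.
- price_usd has two decimals.
- in_stock is true or false (lowercase).
Data (the input gives no SKUs, so the SKUs below are placeholders supplied by the prompt author):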
- snow boots | SKU-SB-001 | 79.99 | true
- rain jacket | SKU-RJ-002 | 59.00 | false
- thermal socks | SKU-TS-003 | 9.50 | true
Example output:
sku,name,price_usd,in_stock
SKU-SB-001,Snow Boots,79.99,true
SKU-RJ-002,Rain Jacket,59.00,false
SKU-TS-003,Thermal Socks,9.50,true
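The value rules in this exercise (two decimals, lowercase booleans, no stray spaces) can be checked line by line. Below is a sketch assuming Python and the hypothetical helper check_feed. Because it works on raw lines, it assumes fields are never quoted, which holds for this feed.

```python
import re

HEADER = "sku,name,price_usd,in_stock"
FIELD = r"[^,\s](?:[^,]*[^,\s])?"  # non-empty, no commas, no leading/trailing spaces
ROW = re.compile(rf"{FIELD},{FIELD},\d+\.\d{{2}},(?:true|false)")


def check_feed(raw: str, expected_rows: int = 3) -> list[str]:
    """Line-based check of the exercise rules; assumes no quoted fields."""
    lines = raw.replace("\r\n", "\n").split("\n")
    if lines[-1] == "":
        lines.pop()  # tolerate a single trailing newline
    errors = []
    if not lines or lines[0] != HEADER:
        errors.append(f"first line must be exactly {HEADER!r}")
    rows = lines[1:]
    if len(rows) != expected_rows:
        errors.append(f"expected {expected_rows} data rows, got {len(rows)}")
    for n, line in enumerate(rows, start=2):
        if not ROW.fullmatch(line):
            errors.append(f"line {n} breaks a value rule: {line!r}")
    return errors
```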
Exercise checklist
- Format-lock phrase present at the top.
- Keys/columns and types are explicit.
- Enumerations or allowed values are listed.
- Missing-data behavior is defined.
- No code fences, comments, or extra explanations.
Mini challenge
Design a prompt that extracts a job posting into JSON Lines (one line per variant) with fields: title, company, location, salary_min, salary_max, currency, remote (boolean), skills (array). Require EXACTLY 2 variants: a strict extraction (only values stated in the text) and a normalized version (e.g., numeric salaries and consistent currency codes; a salary that is still missing stays null). Output must be two JSON objects separated by a single newline, no extra text.
Tip
- State: “Output ONLY two JSON objects, one per line.”
- Define types and allowed currencies.
- For missing salary, use nulls and keep currency consistent if known.
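A validator for the mini challenge lets you test your prompt on several postings quickly. Below is a sketch assuming Python 3.9+. validate_variants and is_number are hypothetical names, and the currency allow-list is a placeholder you would replace with your own.

```python
import json

FIELDS = {"title", "company", "location", "salary_min", "salary_max",
          "currency", "remote", "skills"}
CURRENCIES = ("USD", "EUR", "GBP")  # hypothetical allow-list; replace with yours


def is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_variants(raw: str) -> list[str]:
    """Expect exactly two JSON objects, one per line (a final newline is tolerated)."""
    lines = raw.removesuffix("\n").split("\n")
    if len(lines) != 2:
        return [f"expected exactly 2 lines, got {len(lines)}"]
    errors = []
    for n, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: not JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict) or set(obj) != FIELDS:
            errors.append(f"line {n}: keys must be exactly {sorted(FIELDS)}")
            continue
        low, high = obj["salary_min"], obj["salary_max"]
        if any(v is not None and not is_number(v) for v in (low, high)):
            errors.append(f"line {n}: salaries must be numbers or null")
        elif low is not None and high is not None and low > high:
            errors.append(f"line {n}: salary_min is greater than salary_max")
        if obj["currency"] is not None and obj["currency"] not in CURRENCIES:
            errors.append(f"line {n}: currency {obj['currency']!r} not allowed")
        if not isinstance(obj["remote"], bool):
            errors.append(f"line {n}: remote must be true or false")
        skills = obj["skills"]
        if not (isinstance(skills, list) and all(isinstance(s, str) for s in skills)):
            errors.append(f"line {n}: skills must be an array of strings")
    return errors
```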
Learning path
- Master format-lock prompts (JSON/CSV/XML) and strict rules.
- Add value constraints and validation thinking (enums, ISO formats).
- Combine with evaluation: test prompts on edge cases and malformed inputs.
Practical projects
- Resume parser: Convert resumes into a hiring JSON schema and build a small validator.
- Support triage: Turn chat transcripts into CSV tickets with severity and tags.
- Catalog normalizer: Generate clean product feeds (CSV) from supplier PDFs.
Next steps
- Introduce automatic validation in your pipeline and iterate prompts when parsing fails.
- Add domain constraints (industry codes, region lists, internal IDs) to improve reliability.
- Prepare a prompt library of reusable format-lock templates for your team.
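Automatic validation pairs naturally with a retry that feeds the parser's errors back to the model. Here is a minimal sketch, assuming Python. call_model stands in for whatever client you actually use (it is hypothetical), and the feedback wording is only one reasonable choice. Cap the attempts so that a prompt which never passes fails loudly instead of looping.

```python
import json
from typing import Callable


def generate_with_retries(
    call_model: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], list[str]],
    max_attempts: int = 3,
) -> str:
    """Re-prompt with the validator's errors until the reply passes or attempts run out."""
    current = prompt
    errors: list[str] = []
    for _ in range(max_attempts):
        reply = call_model(current)
        errors = validate(reply)
        if not errors:
            return reply
        # Keep the original contract and append the concrete parse problems.
        current = (
            prompt
            + "\n\nYour previous reply failed validation:\n"
            + "\n".join(f"- {e}" for e in errors)
            + "\nReturn ONLY the corrected output."
        )
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")


def json_errors(text: str) -> list[str]:
    """Minimal validator: strict parse only."""
    try:
        json.loads(text)
        return []
    except json.JSONDecodeError as exc:
        return [exc.msg]


# Stand-in for a real model client: the first reply is fenced, the second is clean.
replies = iter(['```json\n{"ok": true}\n```', '{"ok": true}'])
print(generate_with_retries(lambda p: next(replies), "Return ONLY valid JSON.", json_errors))
# prints {"ok": true}
```

Logging which errors triggered retries also shows you which format-lock phrases your prompt library still needs.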