
Schema Design For JSON And Tables

Learn Schema Design For JSON And Tables for free with explanations, exercises, and a quick test (for Prompt Engineers).

Published: January 8, 2026 | Updated: January 8, 2026

Why this matters

As a Prompt Engineer, you often ask models to extract or generate structured data (JSON) and sometimes store it in tables for analytics, retrieval, or evaluation. Good schema design prevents brittle prompts, reduces parsing errors, and makes downstream work—like scoring outputs or joining with user data—fast and reliable.

  • Build reliable extraction prompts that always return the same shape.
  • Design tables that let you quickly query results and evaluate model quality.
  • Version and evolve schemas without breaking production flows.

Concept explained simply

A schema is the agreed “shape” of data. For JSON, it’s the set of fields, types, and allowed values. For tables, it’s columns and relationships. A clear shape means fewer surprises and easier automation.

Mental model

  • Objects (nouns) = rows or JSON objects.
  • Relationships (verbs/links) = foreign keys or join tables.
  • Events (things that happen) = append-only log with timestamps and actor IDs.
  • Think read-first: design your shape around the questions you must answer later.

Design workflow

  1. Define use cases: What questions must the data answer? What UI needs it?
  2. List entities and fields: Required vs optional. Enumerate allowed values.
  3. Choose storage: JSON when shape varies or for model I/O. Tables when you need joins, constraints, and fast queries.
  4. Add identifiers: Stable IDs, timestamps, source, and schema_version.
  5. Plan evolution: Default values, backward-compatible changes, deprecation path.

Worked examples

Example 1: FAQs extraction (JSON + tables)

Task: Extract FAQs from product pages for a help assistant.

JSON output shape for the model:

{
  "schema_version": "1.0",
  "source_url": "string",
  "faqs": [
    { "question": "string", "answer": "string", "tags": ["string"], "confidence": 0.0 }
  ]
}

Relational storage for analytics:

  • table: faq_source(source_id PK, source_url, crawled_at)
  • table: faq(faq_id PK, source_id FK, question, answer, confidence, created_at)
  • table: faq_tag(tag_id PK, name UNIQUE)
  • table: faq_to_tag(faq_id FK, tag_id FK, PRIMARY KEY(faq_id, tag_id))

Why: Variable-length tags fit JSON nicely for model I/O; normalized tables give fast tag filters and deduplication.
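The bullet sketch above can be spelled out as DDL. Here is a minimal sketch in SQLite (via Python's sqlite3); the column types and NOT NULL choices are assumptions, so adapt them to your database:

```python
import sqlite3

# FAQ tables from the bullets above, sketched in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faq_source (
    source_id  INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,
    crawled_at TEXT NOT NULL            -- ISO-8601 timestamp
);
CREATE TABLE faq (
    faq_id     INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES faq_source(source_id),
    question   TEXT NOT NULL,
    answer     TEXT NOT NULL,
    confidence REAL,
    created_at TEXT NOT NULL
);
CREATE TABLE faq_tag (
    tag_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE         -- dedupes tag names
);
CREATE TABLE faq_to_tag (
    faq_id INTEGER NOT NULL REFERENCES faq(faq_id),
    tag_id INTEGER NOT NULL REFERENCES faq_tag(tag_id),
    PRIMARY KEY (faq_id, tag_id)        -- many-to-many join table
);
""")
```

The join table is what makes "show all FAQs tagged billing" a single indexed lookup instead of a scan over a tags array.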

Example 2: Tool-call logging (events)

Task: Log tool invocations from an agent for debugging and metrics.

JSON event envelope:

{
  "schema_version": "1.1",
  "event_id": "uuid",
  "session_id": "uuid",
  "type": "tool_call",
  "timestamp": "ISO-8601",
  "tool": "string",
  "input": {},
  "output": {},
  "latency_ms": 0,
  "success": true
}

Tables:

  • session(session_id PK, user_id, started_at)
  • event(event_id PK, session_id FK, type, timestamp, tool, latency_ms, success, payload_json JSON)

Why: Keep payload_json flexible; index (session_id, timestamp) for fast timelines.
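A sketch of this layout in SQLite, with one event inserted and read back as a timeline (column types are assumptions; payload_json is stored as TEXT since SQLite stores JSON as text):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (
    event_id     TEXT PRIMARY KEY,
    session_id   TEXT NOT NULL,
    type         TEXT NOT NULL,
    timestamp    TEXT NOT NULL,
    tool         TEXT,
    latency_ms   INTEGER,
    success      INTEGER,
    payload_json TEXT                  -- raw envelope, kept verbatim
);
CREATE INDEX idx_event_timeline ON event(session_id, timestamp);
""")

# One envelope from the agent, stored both as columns and as raw JSON.
envelope = {"schema_version": "1.1", "event_id": "e1", "session_id": "s1",
            "type": "tool_call", "timestamp": "2026-01-08T12:00:00Z",
            "tool": "search", "input": {}, "output": {},
            "latency_ms": 42, "success": True}
conn.execute(
    "INSERT INTO event VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (envelope["event_id"], envelope["session_id"], envelope["type"],
     envelope["timestamp"], envelope["tool"], envelope["latency_ms"],
     int(envelope["success"]), json.dumps(envelope)))

# Per-session timeline, served by the (session_id, timestamp) index.
rows = conn.execute(
    "SELECT timestamp, tool, latency_ms FROM event "
    "WHERE session_id = ? ORDER BY timestamp", ("s1",)).fetchall()
```

Extracting latency_ms and success into columns lets you compute metrics without parsing JSON, while payload_json preserves everything for debugging.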

Example 3: Evaluation results (rubric + scores)

Task: Score model answers on correctness, tone, and safety.

JSON (LLM output) shape:

{
  "schema_version": "1.0",
  "overall": {"label": "pass" | "fail", "rationale": "string"},
  "criteria": [
    {"name": "correctness", "score": 0-5, "notes": "string"},
    {"name": "tone", "score": 0-5, "notes": "string"},
    {"name": "safety", "score": 0-5, "notes": "string"}
  ]
}

Tables:

  • eval_run(run_id PK, prompt_id, model, created_at)
  • eval_item(item_id PK, run_id FK, input_ref, overall_label, overall_rationale)
  • eval_score(score_id PK, item_id FK, criterion, score, notes)

Why: JSON is easy for the model; tables enable slice-and-dice (by model, criterion, time).
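For instance, a single GROUP BY over eval_score gives per-criterion averages. A minimal SQLite sketch with made-up scores:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eval_score (
    score_id  INTEGER PRIMARY KEY,
    item_id   INTEGER NOT NULL,
    criterion TEXT NOT NULL,
    score     INTEGER NOT NULL,
    notes     TEXT
);
""")
conn.executemany(
    "INSERT INTO eval_score (item_id, criterion, score) VALUES (?, ?, ?)",
    [(1, "correctness", 4), (1, "tone", 5),
     (2, "correctness", 2), (2, "tone", 3)])

# Slice by criterion: average score across all items.
avgs = dict(conn.execute(
    "SELECT criterion, AVG(score) FROM eval_score GROUP BY criterion"))
# avgs -> {"correctness": 3.0, "tone": 4.0}
```

The same one-row-per-criterion layout makes grouping by model or time window equally cheap once eval_run joins in.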

JSON patterns you’ll reuse

  • Discriminator field for union types: { "type": "quote" | "summary", "data": {...} }
  • Required vs optional: explicitly list required keys and provide defaults in your post-processor.
  • Enumerations: constrain fields like status to known values.
  • IDs and timestamps: include id, created_at, and source.
  • Versioning: top-level schema_version for safe evolution.
  • Numbers as numbers: avoid numeric strings unless IDs.
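A post-processor that applies several of these patterns at once might look like this sketch (ALLOWED_STATUS and normalize are illustrative names, not from any library):

```python
# Patterns applied: explicit defaults for optional keys, an enum check,
# numbers kept numeric, IDs kept as strings.
ALLOWED_STATUS = {"open", "closed", "pending"}

def normalize(record: dict) -> dict:
    out = {
        "id": record.get("id"),                  # IDs stay strings
        "status": record.get("status", "open"),  # default for optional key
        "score": record.get("score"),
    }
    if out["status"] not in ALLOWED_STATUS:      # enum constraint
        raise ValueError(f"unknown status: {out['status']!r}")
    if isinstance(out["score"], str):            # numbers as numbers
        out["score"] = float(out["score"])
    return out

normalize({"id": "a1", "score": "0.9"})
# -> {"id": "a1", "status": "open", "score": 0.9}
```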

Table patterns that keep you fast

  • Normalization for relationships: separate entities, use join tables for many-to-many.
  • Composite indexes: index by common filters like (user_id, created_at).
  • Audit-friendly: append-only event logs; do not overwrite model outputs.
  • JSON columns for flexibility: store raw model payload in JSON alongside extracted columns you query often.

Prompting LLMs to emit valid JSON

Use short, strict instructions and show one perfect example.

Reusable instruction block
Return JSON only. No prose.
Keys and types:
- schema_version: string (fixed "1.0")
- items: array of { id: string, name: string, category: one of ["a","b","c"], score: number }
If a value is unknown, use null. Do not add extra keys.
  • State allowed values and units.
  • Say what to do when unknown (null or empty array).
  • Ask for JSON only, no explanations.
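Even with strict instructions, models sometimes wrap the payload in a markdown fence. A small parsing sketch that tolerates that common failure mode (the function name is illustrative):

```python
import json

def parse_model_json(reply: str) -> dict:
    """Parse a reply that should be JSON-only, stripping an optional fence."""
    text = reply.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]      # drop the ```json opener line
        text = text.rsplit("```", 1)[0]    # drop the closing fence
    return json.loads(text)

parse_model_json('{"schema_version": "1.0", "items": []}')
parse_model_json('```json\n{"schema_version": "1.0", "items": []}\n```')
# both return {"schema_version": "1.0", "items": []}
```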

Validation and resilience

  • Post-validate: check required keys, types, and enums.
  • Coerce safe fixes: trim strings, parse numbers, default nulls.
  • Keep the raw text and raw JSON for debugging.
  • Log validation errors with schema_version and sample payload.
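A minimal post-validator sketch for the instruction block above (the SCHEMA dict is an ad-hoc illustration, not JSON Schema; for production consider a library such as jsonschema or pydantic):

```python
# Checks required keys, basic types, and an enum on item categories,
# returning a list of error strings suitable for logging.
SCHEMA = {
    "schema_version": str,
    "items": list,
}
ALLOWED_CATEGORIES = {"a", "b", "c"}

def validate(payload: dict) -> list[str]:
    errors = []
    for key, typ in SCHEMA.items():
        if key not in payload:
            errors.append(f"missing required key: {key}")
        elif not isinstance(payload[key], typ):
            errors.append(f"{key}: expected {typ.__name__}")
    items = payload.get("items")
    if isinstance(items, list):
        for i, item in enumerate(items):
            if not isinstance(item, dict) or \
                    item.get("category") not in ALLOWED_CATEGORIES:
                errors.append(f"items[{i}]: bad category")
    return errors
```

An empty list means the payload passed; anything else gets logged alongside schema_version and the raw payload, per the bullets above.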

Exercises

Mirror of the interactive tasks below. Aim to keep outputs short and precise.

Exercise 1 (ex1): Product comparison extractor

Design a JSON output shape for an LLM that compares two laptops and returns a structured verdict. Include: product names, key specs (cpu, ram_gb, storage_gb, weight_kg), pros (array of strings per product), and an overall recommendation with rationale and confidence (0-1). Add schema_version.

  • Define required vs optional fields.
  • Constrain units (e.g., RAM in GB).
  • Constrain recommendation.label to an enum (one of: "A", "B", "tie").
Peek a hint
  • Keep numbers numeric; use null if unknown.
  • Top-level metadata like schema_version helps migration.

Exercise 2 (ex2): From JSON to tables

You receive JSON survey results with items scored 0–5 and tags. Propose a minimal set of tables to store runs, individual items, and tags, plus one index that makes filtering by tag and date fast. Sketch how one JSON item maps to rows.

  • Identify primary keys.
  • Design the many-to-many between items and tags.
  • Choose a composite index.
Peek a hint
  • Join table pattern: item_to_tag(item_id, tag_id).
  • Prefer (tag_id, created_at): put the equality-filtered column first, the range-filtered column second.

Self-check checklist

  • All required fields identified and justified.
  • Units and enums are explicit.
  • IDs and timestamps present where meaningful.
  • Tables support the top queries you expect.

Common mistakes

  • Allowing free-form values where enums are needed. Self-check: Can you definitively list allowed states?
  • Using strings for numbers. Self-check: Will you sort or compute on it? If yes, it should be numeric.
  • No schema_version. Self-check: How would you add a field without breaking clients?
  • Over-normalizing early. Self-check: Are you joining three tables for a simple report? Consider denormalizing a few columns.
  • Missing indexes. Self-check: Which WHERE clauses appear in your queries? Index those columns.

Practical projects

  • Build a “document to JSON” extractor for policies, then load into tables and run analytics by clause type.
  • Create an agent event log (JSON) with a dashboard backed by tables for latency and error rates.
  • Design an evaluation rubric (JSON) and leaderboard (tables) comparing two models over time.

Mini challenge

Given a content moderation task that returns labels {safe, borderline, unsafe} with reasons and spans (start, end), design both a JSON output and minimal tables to audit decisions per user per day. Keep write-speed high and queries simple.

Think it through
  • JSON: include schema_version, text_id, label (enum), reasons, spans: [{start, end, note}].
  • Tables: moderation_event(event_id, user_id, text_id, label, created_at, payload_json), moderation_span(event_id FK, start, end, note). Index (user_id, created_at).

Learning path

  1. Learn JSON shape design and enums.
  2. Practice normal forms and join tables.
  3. Combine: design JSON for model I/O and mirrored tables for analytics.
  4. Add validation and versioning.
  5. Optimize with indexes and selective denormalization.

Who this is for

  • Prompt Engineers turning model outputs into usable datasets.
  • Data/ML practitioners integrating LLM I/O with analytics.

Prerequisites

  • Basic JSON familiarity.
  • Intro SQL knowledge (SELECT, JOIN).
  • Comfort writing clear prompts.

Next steps

  • Refine a schema you already use: add enums, IDs, and versioning.
  • Add a composite index to your most frequent query.
  • Set up a validator that rejects unexpected keys.


Practice Exercises

2 exercises to complete

Instructions

Design the JSON output shape for an LLM comparing two laptops. Include: product names, specs (cpu, ram_gb, storage_gb, weight_kg), pros per product (array), and an overall recommendation with rationale and confidence (0-1). Add schema_version. Mark required vs optional fields and enforce allowed labels {"A","B","tie"}.

Expected Output
{
  "schema_version": "1.0",
  "products": [
    {
      "id": "A",
      "name": "string",
      "specs": { "cpu": "string", "ram_gb": 0, "storage_gb": 0, "weight_kg": 0.0 },
      "pros": ["string"]
    },
    {
      "id": "B",
      "name": "string",
      "specs": { "cpu": "string", "ram_gb": 0, "storage_gb": 0, "weight_kg": 0.0 },
      "pros": ["string"]
    }
  ],
  "recommendation": { "label": "A|B|tie", "rationale": "string", "confidence": 0.0 }
}

Schema Design For JSON And Tables — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

