Why this matters
LLMs choose words statistically, not according to your domain's standards. A domain glossary and a ruleset make outputs more consistent and safer. As a Prompt Engineer, you will often:
- Define preferred terms and forbid confusing synonyms (e.g., patient vs. client).
- Standardize units, dates, and capitalization (e.g., mmHg, ISO dates).
- Set tone and structure (e.g., short bullet summaries first, then details).
- Redact sensitive information (e.g., remove emails, phone numbers, IDs).
- Add disclaimers or required phrases.
- Create replacement rules for common errors (e.g., report glucose in mg/dL, not mg/mL).
Who this is for
- Prompt Engineers and Applied ML/AI professionals who need reliable, domain-accurate outputs.
- Technical writers and product managers building LLM features for specific industries.
Prerequisites
- Basic prompt design skills (system vs. user prompts, few-shot examples).
- Familiarity with the domain you are targeting, or access to a subject-matter expert (SME).
- Comfort reading small corpora (docs, tickets, emails) to extract terms.
Concept explained simply
A domain glossary is a living list of terms with preferred wording and examples. Rules are enforceable instructions that shape style, formatting, safety, and structure. Together, they act like a language contract for the model.
Mental model: Three gates for every output
- Vocabulary Gate: Does the output use the right terms and definitions?
- Format Gate: Are units, dates, numbers, and capitalization standardized?
- Policy Gate: Is the content safe, on-brand, and compliant (redaction, disclaimers)?
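The three gates can be read as three independent checks run over every output. The following is a minimal sketch, not a complete validator: the synonym map, the slash-date pattern, and the email pattern are illustrative assumptions, and `check_gates` is a hypothetical helper name.

```python
import re

# Hypothetical, illustrative rules; replace with your own glossary and policies.
FORBIDDEN_TERMS = {"client": "patient"}  # Vocabulary Gate: synonym -> preferred term
NON_ISO_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")  # Format Gate: e.g. 03/04/2025
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # Policy Gate: unredacted email


def check_gates(text: str) -> dict[str, list[str]]:
    """Return the problems found at each gate; an empty list means the gate passed."""
    problems = {"vocabulary": [], "format": [], "policy": []}
    for bad, good in FORBIDDEN_TERMS.items():
        if re.search(rf"\b{re.escape(bad)}\b", text, flags=re.IGNORECASE):
            problems["vocabulary"].append(f"use '{good}' instead of '{bad}'")
    for match in NON_ISO_DATE.findall(text):
        problems["format"].append(f"non-ISO date: {match}")
    for match in EMAIL.findall(text):
        problems["policy"].append(f"unredacted email: {match}")
    return problems


report = check_gates("The client was seen on 03/04/2025; contact jo@example.com.")
for gate, issues in report.items():
    print(gate, issues)
```

Keeping the gates separate makes it easier to see which kind of rule an output broke before deciding whether to fix the glossary, the formatting rules, or the safety rules.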
How to build a domain glossary and rules
- Collect raw language: samples of real tasks, tickets, docs, and user intents.
- List candidates: domain nouns, verbs, abbreviations, units, names, and forbidden terms.
- Decide preferred forms: pick one label per concept; add synonyms to map to it.
- Write crisp definitions: one-sentence, user-facing, with do/don’t examples.
- Add formatting policies: dates, numbers, currency, units, capitalization.
- Add guardrails: redaction, banned phrases, mandatory disclaimers.
- Test with real prompts: run 10–20 tasks; note failures; refine entries and rules.
- Version and track changes: mark date, owner, and reason for changes.
Glossary entry template (copy-friendly)
- Term: Preferred label
- Definition: One clear sentence
- Synonyms map to: [list]
- Forbidden terms: [list + reason]
- Examples (good/bad): 1–2 short lines
- Formatting: capitalization, plural, unit or symbol
- Notes: domain-specific nuance
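If the glossary is kept as data rather than free text, the template fields map naturally onto a small record type. This is one possible sketch: the class name `GlossaryEntry`, its field names, and the `to_prompt_line` rendering are assumptions made for illustration, and the sample entry reuses the blood pressure term from the worked examples.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """One glossary entry, mirroring the template fields."""
    term: str                                                # preferred label
    definition: str                                          # one clear sentence
    synonyms: list[str] = field(default_factory=list)        # all map to `term`
    forbidden: dict[str, str] = field(default_factory=dict)  # term -> reason
    examples: list[str] = field(default_factory=list)        # good/bad lines
    formatting: str = ""                                     # capitalization, plural, unit
    notes: str = ""                                          # domain-specific nuance

    def to_prompt_line(self) -> str:
        """Render the entry as one compact line for a system prompt."""
        parts = [f"{self.term}: {self.definition}"]
        if self.synonyms:
            parts.append(f"Say '{self.term}', not: {', '.join(self.synonyms)}.")
        for bad, reason in self.forbidden.items():
            parts.append(f"Never use '{bad}' ({reason}).")
        if self.formatting:
            parts.append(f"Format: {self.formatting}")
        return " ".join(parts)


bp = GlossaryEntry(
    term="Blood pressure (BP)",
    definition="Arterial pressure measured in mmHg.",
    synonyms=["bp", "b.p.", "bloodpressure"],
    forbidden={"tension": "ambiguous"},
    formatting="systolic/diastolic, e.g., 120/80 mmHg.",
)
print(bp.to_prompt_line())
```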
Ruleset template (copy-friendly)
- Style: tone (e.g., concise, confident), voice (active), and structure (summary first).
- Formatting: dates (YYYY-MM-DD), numbers (1,234.56), currency (USD), units (SI).
- Vocabulary enforcement: use preferred terms; auto-replace synonyms.
- Safety: redact PII/PHI patterns; avoid giving medical/financial advice if prohibited.
- Disclaimers: append required statements when specific topics occur.
- Validation & fallback: if unknown term, ask a clarifying question instead of guessing.
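A ruleset like this usually ends up as text in a system prompt. The sketch below shows one way to assemble it; the rule wording, the `RULES` dictionary, and the `build_system_prompt` helper are hypothetical, and the output layout is only one reasonable choice.

```python
# Hypothetical rules; the wording is illustrative, not a required format.
RULES = {
    "Style": "Be concise and confident; use active voice; give the summary first.",
    "Formatting": "Dates as YYYY-MM-DD; numbers as 1,234.56; currency in USD; SI units.",
    "Vocabulary": "Use the preferred glossary terms; replace listed synonyms.",
    "Safety": "Redact emails, phone numbers, and IDs.",
    "Fallback": "If a term is unknown, ask a clarifying question instead of guessing.",
}


def build_system_prompt(rules: dict[str, str], glossary_lines: list[str]) -> str:
    """Combine numbered rules and glossary lines into one system prompt string."""
    numbered = [f"{i}. {name}: {text}" for i, (name, text) in enumerate(rules.items(), 1)]
    return "\n".join(["Follow these rules:", *numbered, "", "Glossary:", *glossary_lines])


prompt = build_system_prompt(RULES, ["Tuition: Cost charged per term for enrollment."])
print(prompt)
```

Numbering the rules makes failures easier to report during testing, because a reviewer can note which rule number an output violated.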
Worked examples
1) Healthcare telemedicine notes
- Glossary snippet:
  - Term: Blood pressure (BP). Definition: Arterial pressure measured in mmHg.
  - Synonyms map: bp, b.p., bloodpressure → Blood pressure (BP)
  - Formatting: Use mmHg; show as systolic/diastolic (e.g., 120/80 mmHg).
  - Forbidden: “tension” (ambiguous).
- Rules:
  - Units: BP in mmHg only; glucose in mg/dL (conventional clinical units, not SI).
  - Safety: Remove names, phone numbers, account IDs from outputs.
  - Structure: Summary (1–2 bullets), then Observations, then Recommendations.
  - Disclaimer: Add “This summary is informational and not a diagnosis.”
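The synonym map and the disclaimer rule from the healthcare example can also be enforced in post-processing, as a safety net behind the prompt. A minimal sketch, assuming a regex-based replacement is acceptable for your notes; `normalize_note` is a hypothetical helper, and the lookbehind is a simple guard so that the "(BP)" inside the preferred term is not replaced again.

```python
import re

PREFERRED = "Blood pressure (BP)"
# Longest alternatives first; skip "BP" that directly follows "(" so the
# preferred form itself is left alone and the function can be re-applied safely.
SYNONYMS = re.compile(r"(?<!\()\b(bloodpressure|b\.p\.|bp)(?!\w)", flags=re.IGNORECASE)
DISCLAIMER = "This summary is informational and not a diagnosis."


def normalize_note(text: str) -> str:
    """Map BP synonyms to the preferred term and append the disclaimer once."""
    text = SYNONYMS.sub(PREFERRED, text)
    if DISCLAIMER not in text:
        text = f"{text.rstrip()}\n{DISCLAIMER}"
    return text


print(normalize_note("Patient bp was 120/80 mmHg."))
```

Regex replacement cannot judge context, so a term such as "tension" that is only sometimes wrong is better flagged for review than replaced automatically.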
2) E-commerce fashion product pages
- Glossary snippet:
  - Term: SKU. Definition: Internal stock-keeping unit identifier.
  - Synonyms map: item code, stock id → SKU
  - Term: Care instructions. Definition: Steps for washing/drying/ironing.
  - Forbidden: “100% guarantee” (legal risk).
- Rules:
  - Tone: Positive and factual; no superlatives without evidence.
  - Sizes: Use EU/US conversions consistently; include a size guide note.
  - Formatting: Title Case for product names; bullet lists for features.
  - Safety: Remove emails/phone numbers from seller notes.
3) Financial risk reports
- Glossary snippet:
  - Term: Value at Risk (VaR). Definition: Loss threshold that is not expected to be exceeded over a given horizon at a stated confidence level.
  - Synonyms map: value-at-risk → Value at Risk (VaR)
  - Formatting: Show confidence as %, horizon in days.
  - Forbidden: “guaranteed returns”.
- Rules:
  - Numbers: Use two decimals; thousands separators.
  - Dates: YYYY-MM-DD.
  - Structure: Executive summary → Key metrics table (described in text) → Risks → Notes.
  - Compliance: Add “Not investment advice.” if recommendations appear.
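The number and date rules in the financial example translate directly into format specifiers. The sketch below is illustrative only: `format_var_line` is a hypothetical helper, the USD prefix is an assumption, and the figures are made-up sample values.

```python
from datetime import date


def format_var_line(var_amount: float, confidence: float, horizon_days: int, as_of: date) -> str:
    """Render one VaR figure using the report's number and date rules."""
    amount = f"{var_amount:,.2f}"   # two decimals, thousands separators
    conf = f"{confidence:.0%}"      # confidence shown as a percentage
    return (
        f"Value at Risk (VaR), {horizon_days}-day, {conf} confidence, "
        f"as of {as_of.isoformat()}: USD {amount}"
    )


# Sample values only; not real data.
print(format_var_line(1234567.891, 0.99, 10, date(2025, 3, 4)))
```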
Common mistakes and self-check
- Mistake: Vague definitions. Fix: Write one sentence; test it with a non-expert.
- Mistake: Too many synonyms left ungrouped. Fix: Map all to a single preferred term.
- Mistake: Inconsistent units/dates. Fix: Declare one canonical format.
- Mistake: Missing safety rules. Fix: Add redaction + disclaimers for risky topics.
- Mistake: Never testing on real tasks. Fix: Validate with 10–20 real prompts and iterate.
Self-check before rollout:
- Can a new team member use the glossary without asking questions?
- Do 95%+ of outputs follow the unit, date, and tone rules automatically?
- Are sensitive fields reliably redacted in tests?
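The 95% target is easier to track when it is measured rather than eyeballed. A minimal sketch, assuming each rule can be expressed as a function that returns True for a compliant output; `pass_rate` and `dates_are_iso` are hypothetical helpers, and the sample outputs are invented.

```python
import re
from typing import Callable


def pass_rate(outputs: list[str], check: Callable[[str], bool]) -> float:
    """Fraction of outputs for which `check` returns True (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(check(o) for o in outputs) / len(outputs)


def dates_are_iso(text: str) -> bool:
    """True when the text contains no slash-style dates such as 09/01/2025."""
    return re.search(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", text) is None


# Invented sample outputs for illustration.
outputs = ["Due 2025-09-01.", "Due 09/01/2025.", "No date here.", "Pay by 2025-10-15."]
rate = pass_rate(outputs, dates_are_iso)
print(f"ISO date compliance: {rate:.0%} (target: 95%)")
```

Rules such as tone are harder to check mechanically and may still need a human reviewer or a rubric.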
Step-by-step mini workflow
- Extract 30–100 candidate terms from real docs.
- Cluster into 10–25 core concepts with preferred labels.
- Write definitions and examples; specify forbidden terms.
- Add rules for tone, structure, formats, and safety.
- Test on representative tasks; note deviations; update rules.
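The first step of this workflow, extracting candidate terms, can start from a simple frequency count. This is a rough sketch under stated assumptions: the stopword list is a tiny illustrative sample, `candidate_terms` is a hypothetical helper, and only single words are counted, so multi-word terms and the final selection still need human review.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on", "my"}


def candidate_terms(docs: list[str], top_n: int = 100) -> list[tuple[str, int]]:
    """Count lowercase word tokens across docs, minus stopwords, most frequent first."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z][a-z'-]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Invented sample tickets for illustration.
docs = ["My tuition payment failed.", "Is the tuition deadline in May?", "Tuition refund?"]
print(candidate_terms(docs, top_n=5))
```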
Exercises
Do these hands-on tasks. The Quick Test at the end checks your understanding. Everyone can take the test.
Exercise 1: Draft a mini glossary and rules for an EdTech Q&A bot
Task: Create a 10–12 term glossary for a university EdTech Q&A bot (topics: enrollment, tuition payments, deadlines). Add 6–8 rules for tone, structure, dates, and safety.
- Include preferred terms, synonyms mapping, and forbidden terms with reasons.
- Define date format and sensitive info redaction.
Expected output: A compact glossary list and a numbered ruleset.
Hints
- Prefer “tuition” over “school fees”; map synonyms.
- Use YYYY-MM-DD; redact student ID, email, phone.
Sample solution (abridged):
Glossary:
- Tuition. Definition: Cost charged per term for enrollment. Synonyms: school fees → Tuition. Forbidden: “price” (too generic).
- Enrollment. Definition: Officially registering for courses. Synonyms: sign-up, registration → Enrollment.
- Financial aid. Definition: Grants, scholarships, or loans that reduce tuition.
- Registrar. Definition: Office managing academic records and registration.
- Student ID. Definition: Unique identifier for a student. Formatting: 8 digits. Safety: redact in outputs.
- Deadline. Definition: Last acceptable date for an action. Formatting: YYYY-MM-DD.
- Waitlist. Definition: Queue for course seats when full.
- Prerequisite. Definition: Course required before another course.
- Semester. Definition: Academic period (Fall, Spring, Summer).
- Transcript. Definition: Official academic record.
Rules:
1) Tone: Helpful, neutral, concise; use active voice.
2) Structure: Summary (1–2 lines) → Steps → Notes.
3) Dates: Use YYYY-MM-DD (e.g., 2026-09-01).
4) Numbers: Use thousands separators and USD for currency.
5) Safety: Redact emails, phone numbers, and Student IDs.
6) Vocabulary: Use preferred terms; replace “school fees” → “tuition”.
7) Fallback: If a policy differs by campus, ask a clarifying question.
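The redaction rule in this solution can be backed by a post-processing pass. A minimal sketch, assuming 8-digit Student IDs (as in the glossary entry) and US-style 3-3-4 phone numbers; the patterns and the `redact` helper are illustrative and will miss other formats.

```python
import re

# Illustrative patterns; emails are handled first, then phones, then IDs.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),  # assumes 3-3-4 format
    ("[STUDENT_ID]", re.compile(r"\b\d{8}\b")),                   # assumes 8 digits
]


def redact(text: str) -> str:
    """Replace each sensitive pattern with its placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("Student 12345678 (ana@uni.edu, 555-123-4567) must pay by 2026-09-01."))
```

The phone pattern is deliberately narrow so that ISO dates such as 2026-09-01 are not mistaken for phone numbers.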
Exercise 2: Create normalization rules for units and dates
Task: Define 6–10 normalization rules for a hardware support bot. Topics: measurements (cm/in), weights (kg/lb), timestamps, product codes.
Expected output: A numbered list of rules with examples.
Hints
- Pick one canonical unit, and always show conversions in parentheses.
- Use 24-hour time and ISO dates.
Sample solution:
- Lengths: Canonical cm; if input in inches, show both (e.g., 25.4 cm (10 in)).
- Weight: Canonical kg; show lb in parentheses with one decimal.
- Temperature: Canonical °C; if °F provided, show both with one decimal.
- Dates: YYYY-MM-DD only; convert from other formats.
- Time: 24-hour, local timezone noted (e.g., 16:30 local).
- Product code: Uppercase, hyphenated (e.g., ZX-120B); strip spaces.
- Decimals: Use period as decimal separator; thousands separators for numbers ≥ 1000.
- Ranges: Use en dash with spaces (e.g., 5 – 10 cm).
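Several of these normalization rules are mechanical enough to express as small functions. The sketch below is one possible reading of them: the helper names are hypothetical, the pound conversion factor is approximate, and the product code pattern (letters, then digits with optional trailing letters) is an assumption based on the single example ZX-120B.

```python
import re
from datetime import datetime

CM_PER_IN = 2.54
LB_PER_KG = 2.20462  # approximate conversion factor


def format_length_from_inches(inches: float) -> str:
    """Canonical cm first, original inches in parentheses."""
    return f"{inches * CM_PER_IN:.1f} cm ({inches:g} in)"


def format_weight_kg(kg: float) -> str:
    """Canonical kg first, pounds in parentheses with one decimal."""
    return f"{kg:g} kg ({kg * LB_PER_KG:.1f} lb)"


def to_iso_date(raw: str, input_format: str) -> str:
    """Convert a date string in a known input format to YYYY-MM-DD."""
    return datetime.strptime(raw, input_format).date().isoformat()


def normalize_product_code(raw: str) -> str:
    """Uppercase, strip spaces, and hyphenate as LETTERS-DIGITS[LETTERS] (assumed pattern)."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = re.fullmatch(r"([A-Z]+)-?(\d+[A-Z]*)", compact)
    if match is None:
        raise ValueError(f"unrecognized product code: {raw!r}")
    return f"{match.group(1)}-{match.group(2)}"


print(format_length_from_inches(10))
print(format_weight_kg(2))
print(to_iso_date("03/04/2025", "%m/%d/%Y"))
print(normalize_product_code(" zx 120b "))
```

Passing the input date format explicitly avoids guessing whether 03/04/2025 means March 4 or April 3.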
Build-and-review checklist
- 10–25 core terms with definitions and examples.
- Synonyms and forbidden terms mapped for each concept.
- One canonical format for dates, numbers, currency, and units.
- Safety rules: redaction + required disclaimers.
- Structure and tone rules with 1–2 examples.
- Validated on 10–20 real tasks; issues logged and fixed.
Practical projects
- Create a glossary and ruleset for a customer support chatbot in cybersecurity (incident, alert, severity, playbook). Validate on 15 tickets.
- Standardize product specs for an IoT catalog; normalize units and add safety (remove device serials).
- Build a ruleset for internal meeting summaries: agenda-first, decisions, action items with owners and due dates.
Learning path
- Start: Collect corpus → extract terms → draft glossary entries.
- Add rules: tone, structure, formatting, and safety.
- Prototype: Run samples through your prompts using the glossary + rules.
- Evaluate: Compare outputs against checklist; iterate.
- Scale: Version the document; share with team; schedule updates.
Mini challenge
Pick any public domain (e.g., travel itineraries). Write 5 glossary entries with synonyms/forbidden terms and 5 rules. Test on two sample prompts. Note one improvement you’d make after reviewing outputs.
Next steps
- Add your glossary and rules to your prompts as system instructions.
- Schedule a monthly review with an SME to update entries and rules.
- Proceed to the next subskill after passing the Quick Test.
Quick Test
The Quick Test is available to everyone.