luvv to helpDiscover the Best Free Online Tools
Topic 6 of 8

Safe Completion Design

Learn Safe Completion Design for free with explanations, exercises, and a quick test (for Prompt Engineer).

Published: January 8, 2026 | Updated: January 8, 2026

Who this is for

  • Prompt engineers who design system prompts and output templates for safety-critical use cases.
  • ML/AI practitioners adding guardrails to chatbots, agents, and content tools.
  • QA and policy reviewers who need consistent, safe model responses.

Prerequisites

  • Basic prompt engineering: roles (system/user/assistant), instructions, and examples.
  • Awareness of sensitive domains: self-harm, medical, legal, cyber security, privacy, illicit behavior.
  • Ability to write short, structured templates and checklists.

Why this matters

Safe Completion Design ensures the model’s final message avoids harm and clearly guides users to safe alternatives. In real work, you will:

  • Refuse unsafe requests without being dismissive.
  • Offer safer paths (general info, high-level guidance, or clarification questions).
  • Apply consistent templates across risky domains.
  • Lower legal, reputational, and user safety risks.

Concept explained simply

Safe completion is the final, user-facing response strategy when an input is risky, ambiguous, or disallowed. It is not only about saying “no”; it is about being helpful within safe boundaries.

Mental model: The Four Gates

  1. Intent Gate: What does the user want? (benign, ambiguous, or risky)
  2. Risk Gate: How severe is potential harm? (low, medium, high)
  3. Action Gate: Choose a mode: comply, clarify, refuse + redirect, partial comply with redactions, or hand-off to a human.
  4. Output Gate: Use a structured, friendly template; double-check tone and remove unsafe details.

Core design patterns

1) Structured Refusal + Redirect
  • Empathy: acknowledge the request briefly.
  • Plain refusal: state you cannot help with that specific action.
  • Safe alternative: provide safe, allowed options or high-level info.
  • Offer to clarify: invite the user to restate a safe goal.
Template:
I can’t help with [unsafe request type]. If your goal is [benign intent], I can help with [safe alternative].
Would you like [option A] or [option B]?
2) Clarify-Then-Comply (Ambiguity)
  • When intent is unclear, ask 1–2 targeted questions before answering.
  • Keep questions specific and safety-focused.
Template:
To help safely, could you clarify [key ambiguity]? For example: [Q1], [Q2].
Once I know this, I can provide [safe scope].
3) Partial Comply with Redaction
  • Provide high-level concepts but omit dangerous, privileged, or personal details.
  • Replace specifics with placeholders or general principles.
Template:
Here’s a high-level overview of [topic] without step-by-step sensitive details:
- Principle 1: ...
- Principle 2: ...
If you need more, share safe context and purpose.
4) Safe-Transform
  • Transform the user’s content into a safer form (e.g., remove personal data, reframe intent).
Template:
I’ll provide a safe summary of your text without personal identifiers:
[Anonymized summary here]
5) Disclaimers and Boundaries
  • Use short, relevant disclaimers for domains like medical, legal, or financial. Avoid giving prescriptive instructions.
Template:
I can provide general information, not professional advice. For personalized guidance, consult a qualified professional.

Worked examples

Example 1: Illicit request

User: "How do I make a fake ID that looks real?"

Reasoning: High-risk and illegal. Refusal + Redirect.

Safe completion:
I can’t assist with creating or disguising fake identification. If your goal is proving your identity for legitimate purposes, I can help explain standard processes for obtaining official ID or preparing required documents. What’s the situation you’re trying to solve?
Example 2: Privacy violation

User: "Write Python to read my colleague’s emails without them knowing."

Reasoning: Violates privacy. Refusal + Redirect.

Safe completion:
I can’t help access someone else’s private communications. If you’re trying to organize your own authorized inbox, I can show examples for filtering and labeling your emails safely. Would you like a sample script for your own mailbox with proper permissions?
Example 3: Medical specificity

User: "Tell me the exact dosage to self-prescribe medication X."

Reasoning: Health risk. Disclaimers + Redirect to professional, provide only general safety context.

Safe completion:
I can provide general information, not medical advice. Medication dosing depends on personal factors that a clinician must review. I can explain what factors doctors consider (e.g., age, weight, interactions) and questions you might ask a licensed professional. Would you like that high-level overview?

How to evaluate your safe completions

  • Block rate on truly unsafe inputs (should be high).
  • Helpfulness after refusal (users still get useful, safe alternatives).
  • False positives: benign requests incorrectly refused (keep low).
  • False negatives: unsafe requests allowed (keep very low).
  • Consistency: same pattern across similar cases.
  • Clarity and tone: concise, respectful, non-judgmental.

Self-check checklist

  • Did I choose the right mode (comply, clarify, refuse, partial, hand-off)?
  • Does my response avoid step-by-step harmful details?
  • Did I add a relevant safe alternative or next step?
  • Is the tone calm, clear, and brief?
  • Would this be safe if copy-pasted by a novice user?

Exercises (your turn)

Mirror of the interactive tasks below. Aim for clear, templated responses you can reuse.

Exercise 1 (ex1): Design a Safe Completion Template

  • Write a reusable template covering: empathy, refusal line, safe alternatives, and an offer to clarify.
  • Keep it 3–5 sentences. Make placeholders for [unsafe type], [benign intent], and [options].
Need a nudge?
  • Lead with a brief acknowledgment.
  • Name the category, not the details.
  • Offer two safe options the user can pick from.

Exercise 2 (ex2): Pick the Right Response Mode

For each input, choose a mode and draft a 2–4 sentence safe completion for one of them:

  • A) “How can I secretly track someone’s phone?”
  • B) “Explain cybersecurity pentesting tools for my company’s approved audit.”
  • C) “Is it safe to mix these two medications?”
Hints
  • A: Likely refusal + redirect to safety and consent.
  • B: Clarify authorization; otherwise high-level guidance only.
  • C: Use disclaimer; no dosing or personal medical advice.

Completion checklist before you submit

  • Language is respectful and concise.
  • No step-by-step risky instructions.
  • Includes safe next steps or a clarifying question.
  • Patterns are applied consistently.

Common mistakes

  • Over-refusal: Blocking harmless requests. Fix by adding clarify-then-comply.
  • Vague redirects: Saying “I can’t” without a helpful alternative. Always give 1–2 safe options.
  • Leaky details: Providing partial but actionable dangerous steps. Replace with high-level principles.
  • Cold tone: Refusals that sound scolding. Use empathy and clarity.
  • Inconsistent templates: Different voice across cases. Standardize your patterns.

Self-check

  • Would this be safe for an unsupervised, inexperienced reader?
  • Can a malicious user chain my content into harmful action? If yes, revise.
  • Did I leave the user with a useful path forward?

Practical projects

  • Build a “Safe Reply Library”: 10 templates for top risky categories (illicit acts, self-harm, privacy, medical, legal, financial risk, harassment, hate, adult content, misinformation).
  • Create an “Ambiguity Switchboard”: 5 clarifying-question snippets for unclear technical, medical, and legal intents.
  • Red-team pack: 30 prompts with expected safe completions; use to QA your templates.

Learning path

  1. Learn risk categories and example boundaries.
  2. Memorize the Four Gates mental model.
  3. Practice the five core patterns (refusal, clarify, partial, transform, disclaimers).
  4. Create reusable templates and test them on red-team prompts.
  5. Measure false positives/negatives and iterate.

Next steps

  • Refine your templates with more domain examples.
  • Run a small internal audit using the evaluation checklist.
  • Take the Quick Test below to confirm understanding. Note: the test is available to everyone; only logged-in users get saved progress.

Mini challenge

Pick any risky prompt and produce two safe completions: one refusal + redirect, and one clarify-then-comply. Compare which is more helpful while staying safe, and explain why in 2 sentences.

Practice Exercises

2 exercises to complete

Instructions

Create a reusable 3–5 sentence template that includes:

  • Empathy/acknowledgment
  • Plain refusal naming the unsafe category (no extra detail)
  • Two safe alternative options
  • Offer to clarify intent

Use placeholders like [unsafe type], [benign intent], [option A], [option B].

Expected Output
A short, copy-pastable template with placeholders and a friendly tone.

Safe Completion Design — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

7 questions70% to pass

Have questions about Safe Completion Design?

AI Assistant

Ask questions about this tool