Who this is for
- NLP Engineers building assistants, agents, and RAG systems that call APIs or internal tools.
- Developers adding reliable function calls to LLM apps.
- Data/ML engineers connecting LLMs to search, databases, and automation.
Prerequisites
- Basic prompt engineering (system/user messages, role of examples).
- JSON fundamentals and simple schema/typing (strings, numbers, enums, required fields).
- High-level understanding of RAG (retrieval + generation loop).
Why this matters
Real NLP engineering tasks require LLMs to take actions, not just chat. Examples:
- Call a flight search API, then summarize options for the user.
- Query a product database via SQL tool and compute totals.
- Run retrieval to ground answers in your documents (RAG) and cite sources.
- Chain tools: search → fetch page → extract → answer.
Getting tool use right means higher accuracy, less hallucination, and production-ready workflows.
Concept explained simply
Tool use (function calling) lets an LLM output a structured request that your application executes. The model does planning and argument-filling; your app does the actual action and returns results for the model to use in its final answer.
Mental model
Think of the LLM as a conductor:
- It reads the audience request (user message).
- It picks which instrument (tool) should play.
- It writes the sheet music (arguments) for that instrument.
- Your app plays the instrument and returns the sound (tool result).
- The conductor arranges the final performance (answer), sometimes repeating the loop with other instruments.
Core components
- Tool registry: names, clear descriptions, and a JSON parameter schema.
- Routing policy: when to call a tool vs. answer directly; how to ask for clarification.
- Execution engine: safely run the tool, validate inputs, handle timeouts/errors.
- Observation handling: feed tool results back to the model to continue reasoning.
- Finalizer: generate the user-facing answer grounded in tool outputs (see the loop sketch after this list).
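To make these components concrete, here is a minimal, provider-agnostic sketch of the loop in Python. call_model is a stand-in for your LLM client and is an assumption, not any specific vendor API (real providers expose dedicated tool-call fields); TOOLS is the whitelist registry.

import json

TOOLS = {}  # tool registry: name -> Python callable (the whitelist)

def register(name):
    """Decorator that adds a function to the tool registry."""
    def deco(fn):
        TOOLS[name] = fn
        return fn
    return deco

def run_agent(messages, call_model, max_steps=5):
    """ReAct-style loop: each model turn is either tool-call JSON or a final answer."""
    for _ in range(max_steps):
        raw = call_model(messages)             # assumption: returns the model's raw text
        try:
            call = json.loads(raw)             # tool-call branch: strict JSON only
        except json.JSONDecodeError:
            return raw                         # not JSON -> treat as the final answer
        name = call.get("tool")
        if name not in TOOLS:                  # whitelist: never execute unknown tools
            messages.append({"role": "tool", "content": f"Unknown tool: {name}"})
            continue
        result = TOOLS[name](**call.get("arguments", {}))
        messages.append({"role": "assistant", "content": raw})
        messages.append({"role": "tool", "content": json.dumps({"tool": name, "result": result})})
    return "Stopped: step limit reached."

The worked examples below register handlers into this TOOLS registry.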
Worked examples
Example 1: Single tool — Weather
Tool declaration:
{
  "name": "get_weather",
  "description": "Get current weather for a city.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City name, e.g., London"},
      "unit": {"type": "string", "enum": ["C", "F"], "description": "Temperature unit"}
    },
    "required": ["city"]
  }
}

Possible model tool call:
{
  "tool": "get_weather",
  "arguments": {"city": "Lisbon", "unit": "C"}
}

The app executes the call and returns an observation:
{
  "tool": "get_weather",
  "result": {"city": "Lisbon", "tempC": 21, "conditions": "Clear"}
}

The final answer uses the observation: "It's 21°C and clear in Lisbon."
Example 2: SQL query tool — Products total
Tool:
{
  "name": "run_sql",
  "description": "Run a safe, parameterized SQL query on the analytics DB.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "SQL with placeholders"},
      "params": {"type": "array", "items": {"type": ["string", "number"]}}
    },
    "required": ["query"]
  }
}

User: "Total revenue for 2023 Q4?"

Model call:
{
  "tool": "run_sql",
  "arguments": {
    "query": "SELECT SUM(amount) AS total FROM revenue WHERE quarter = ? AND year = ?",
    "params": ["Q4", 2023]
  }
}

Observation:

{"total": 1284000}

Final answer: "Total revenue for 2023 Q4 is 1,284,000 (currency as per your DB)."
Note: always parameterize; never let the model send raw, unbounded SQL to a production database.
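A sketch of the execution-side guard for this tool, using Python's built-in sqlite3 driver; the analytics.db file and the read-only policy are assumptions for illustration.

import sqlite3

@register("run_sql")
def run_sql(query: str, params=None):
    if not query.lstrip().upper().startswith("SELECT"):
        return {"error": "Only SELECT statements are allowed."}  # read-only gate
    with sqlite3.connect("analytics.db") as conn:
        rows = conn.execute(query, params or []).fetchall()      # driver-level parameterization
    return {"rows": rows[:100]}                                  # cap result size for the context window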
Example 3: Two-step search → fetch
Tools:
{
  "name": "web_search",
  "description": "Search the web and return result snippets.",
  "parameters": {"type": "object", "properties": {"q": {"type": "string"}}, "required": ["q"]}
}

{
  "name": "http_get",
  "description": "Fetch the content of a URL as text.",
  "parameters": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}
}

The model first calls web_search, inspects the results, then calls http_get on a promising URL. The final answer summarizes the page and cites which snippet/URL the facts came from.
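The same loop handles the chain with no extra orchestration code: each observation goes back to the model, which decides the next call. Sketch handlers follow, where search_api and fetch_text stand in for your real search and HTTP clients (both hypothetical).

@register("web_search")
def web_search(q: str):
    return {"results": search_api(q)}          # hypothetical: [{"title", "url", "snippet"}, ...]

@register("http_get")
def http_get(url: str):
    if not url.startswith("https://"):         # basic allowlist/SSRF guard
        return {"error": "Only https URLs are allowed."}
    return {"text": fetch_text(url)[:5000]}    # hypothetical fetcher; truncate for the context window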
Example 4: RAG retrieval as a tool
Tool:
{
  "name": "retrieve_docs",
  "description": "Semantic search in company handbook; returns doc ids, titles, and text chunks.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "top_k": {"type": "integer", "minimum": 1, "maximum": 10, "default": 3}
    },
    "required": ["query"]
  }
}

Model call:

{"tool": "retrieve_docs", "arguments": {"query": "parental leave policy length", "top_k": 3}}

The observation returns chunks with IDs. The final answer quotes or paraphrases them and lists the chunk IDs used, which reduces hallucination by grounding the response.
Design patterns and prompts
- ReAct loop: Think → Act (tool) → Observe → Repeat → Answer.
- Gate: If unsure or missing required info, ask a clarification question instead of calling a tool.
- Strict output: Instruct the model to output either a tool call JSON or a final answer, never both; see the parsing sketch after the example prompt.
Example system prompt
You are a precise assistant.
- If a tool is needed, output ONLY a JSON object: {"tool": "<name>", "arguments": { ... }}
- Arguments must follow the provided JSON schema exactly.
- If missing required info, ask a brief clarification question instead of calling a tool.
- When you have enough information and no tool is needed, provide a concise final answer.
- Do not fabricate tool names or fields.
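One way to enforce the strict-output rule on the application side is to classify each model turn before acting on it. A minimal sketch, assuming the JSON-only contract from the prompt above:

import json

def parse_model_output(raw: str):
    """Classify a model turn as a tool call, a final answer, or a retry request."""
    text = raw.strip()
    if text.startswith("{") and text.endswith("}"):
        try:
            call = json.loads(text)
            if set(call) == {"tool", "arguments"}:
                return ("tool_call", call)
        except json.JSONDecodeError:
            pass
        return ("retry", "Output was not valid tool-call JSON.")
    if '"tool"' in text:                       # prose mixed with JSON: ambiguous, ask again
        return ("retry", "Do not mix answer text with tool JSON.")
    return ("final", text)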
Safety, validation, and limits
- Validate against schemas; reject or repair malformed JSON (see the sketch after this list).
- Whitelist tool names; never execute arbitrary functions.
- Sanitize inputs for downstream systems (SQL, HTTP, shell); parameterize queries.
- Set timeouts, retries, and rate limits. Log calls and results for auditing.
- Never expose secrets in prompts or tool results.
- For RAG, show which chunks/sources were used in the final answer.
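Schema validation is easiest with an off-the-shelf validator. A sketch using the jsonschema package (pip install jsonschema); weather_schema is assumed to be the parameters object from Example 1.

from jsonschema import Draft202012Validator

def validate_args(schema: dict, arguments: dict) -> list:
    """Return a list of human-readable validation errors (empty means valid)."""
    return [e.message for e in Draft202012Validator(schema).iter_errors(arguments)]

errors = validate_args(weather_schema, {"city": "Lisbon", "unit": "K"})
# e.g. ["'K' is not one of ['C', 'F']"]  (exact wording varies by version)

Feed the error messages back to the model as an observation and ask it to repair the call, rather than failing silently.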
Common mistakes and how to self-check
- Vague tool descriptions → Model picks wrong tool. Self-check: Would a new engineer understand what the tool does in 1 sentence?
- Ambiguous argument types → Invalid or unsafe values. Self-check: Do enums/min/max/examples constrain the space?
- Mixing answer text with tool JSON → Parsing errors. Self-check: Is there a single, JSON-only branch for tool calls?
- Calling tools when info is missing. Self-check: Do prompts instruct to ask clarifying questions?
- Skipping validation → Runtime failures. Self-check: Do you validate and repair before execution?
Practical projects
- Support FAQ agent: retrieve_docs + summarizer; return answers with cited chunk IDs.
- KPI analyst: run_sql tool for parameterized queries; produce dashboards as text.
- Travel helper: chain weather, flight_search, and hotel_search tools; ask clarifications when dates/budget missing.
Exercises
Work through these to lock in the concepts. Solutions are provided, but try each exercise on your own first.
Exercise 1 — Design a tool schema: get_exchange_rate
Define a strict tool for converting currency amounts. Requirements:
- Inputs: amount (number >= 0), from (3-letter code), to (3-letter code), date (optional ISO YYYY-MM-DD).
- If date omitted, use latest rate.
Output a tool declaration JSON with name, description, and parameters schema.
Exercise 2 — Write a system prompt for safe function calling
Write a system message that enforces:
- Tool JSON only when calling a tool.
- Ask one clarifying question if required fields are missing.
- No invented tools/fields; follow schemas exactly.
- Concise final answers grounded in observations.
Exercise 3 — Choose and fill tool calls
Tools available:
{"name":"retrieve_docs","parameters":{"type":"object","properties":{"query":{"type":"string"},"top_k":{"type":"integer","default":3}},"required":["query"]}}
{"name":"get_weather","parameters":{"type":"object","properties":{"city":{"type":"string"},"unit":{"type":"string","enum":["C","F"]}},"required":["city"]}}
{"name":"run_sql","parameters":{"type":"object","properties":{"query":{"type":"string"},"params":{"type":"array","items":{"type":["string","number"]}}},"required":["query"]}}For each user message, decide whether to call a tool and fill arguments.
- A) "What's the weather in Seoul in Fahrenheit today?"
- B) "Show onboarding steps for new hires."
- C) "Sum of expenses for January 2024?"
Exercise checklist
- Tool names/descriptions are unambiguous.
- Parameter schemas constrain types and enums.
- Calls include only required fields plus optional when available.
- No mixed free text inside tool JSON.
Mini challenge
Design an agent that answers product questions using retrieval first, then optional SQL for totals. Rules:
- If the question is definitional ("What is..."), call retrieve_docs.
- If the question asks for numeric aggregation (sum/avg/count), call run_sql.
- When both are relevant, run retrieve_docs first, then run_sql, and combine.
Deliverables: tool descriptions, gating prompt, and two sample conversations (one per path).
Learning path
- Next: Multi-tool orchestration and planning strategies.
- Then: Evaluation of tool-use agents (accuracy, robustness, latency).
- Later: Production hardening (monitoring, cost controls, caching).
Next steps
- Implement one practical project end-to-end with strict schema validation.
- Create 10 synthetic test cases to stress your tool gating logic.
- Run the quick test below.