Rick Lamers
ISO 3166-2
[...]
AKA JSON Mode + JSON Schema
When considering open-source LLMs for tool use, we have two high-level options:
Mixtral 8x22B efficiently tokenizes tool use, enabling accurate execution.
For models that don't natively support function calling, we can implement it through prompt engineering. Here's a simplified example of a system prompt:
You are an AI assistant capable of using tools. When you need to use a tool, respond with a JSON array of tool calls wrapped in <tool_calls> tags, in this format:
<tool_calls>
[
{
"id": "pending",
"type": "function",
"function": {
"name": "function_name"
},
"arguments": {
"arg1": "value1",
"arg2": "value2"
}
}
]
</tool_calls>
Available tools are defined as follows:
<available_tools>
[
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
]
</available_tools>
Instructions:
- Provide all required parameters, even if you're unsure of the value.
- Don't use tools if they're not needed; respond directly in those cases.
Either use a tool as instructed above or reply with text to answer the user's question.
Constrained decoding ensures model outputs adhere to the specified format in the system prompt, enhancing reliability and consistency in tool usage.
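Without constrained decoding, the application has to parse and validate the reply itself. A minimal sketch of that step, assuming the model replies in the <tool_calls> format from the system prompt above; the helper names parse_tool_calls and missing_required are hypothetical, not part of any library:

```python
import json
import re

# Matches the <tool_calls> ... </tool_calls> wrapper used in the system prompt.
TOOL_CALLS_RE = re.compile(r"<tool_calls>\s*(.*?)\s*</tool_calls>", re.DOTALL)


def parse_tool_calls(model_output: str) -> list:
    """Return the tool calls in the model output, or [] for a plain-text reply."""
    match = TOOL_CALLS_RE.search(model_output)
    if match is None:
        return []
    calls = json.loads(match.group(1))  # raises ValueError on malformed JSON
    if not isinstance(calls, list):
        raise ValueError("expected a JSON array inside <tool_calls>")
    return calls


def missing_required(call: dict, tools: list) -> list:
    """List the required parameters of the named tool that the call omits."""
    name = call["function"]["name"]
    schemas = {tool["name"]: tool for tool in tools}
    if name not in schemas:
        raise KeyError(f"unknown tool: {name}")
    provided = call.get("arguments", {})
    required = schemas[name]["parameters"].get("required", [])
    return [param for param in required if param not in provided]


# Abridged copy of the get_current_weather definition from the prompt.
TOOLS = [{
    "name": "get_current_weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}, "unit": {"type": "string"}},
        "required": ["location"],
    },
}]

# Illustrative model reply that forgets the required "location" argument.
reply = (
    '<tool_calls>[{"id": "pending", "type": "function", '
    '"function": {"name": "get_current_weather"}, '
    '"arguments": {"unit": "celsius"}}]</tool_calls>'
)
calls = parse_tool_calls(reply)
print(missing_required(calls[0], TOOLS))  # ['location']
```

Constrained decoding can guarantee the reply parses; checks like missing_required are still useful for anything the grammar does not enforce.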
{
"question": "Find the area of a triangle with a base of 10 units and height of 5 units.",
"function": {
"name": "calculate_triangle_area",
"description": "Calculate the area of a triangle given its base and height.",
"parameters": {
"type": "dict",
"properties": {
"base": {
"type": "integer",
"description": "The base of the triangle."
},
"height": {
"type": "integer",
"description": "The height of the triangle."
},
"unit": {
"type": "string",
"description": "The unit of measure (defaults to 'units' if not specified)"
}
},
"required": ["base", "height"]
}
}
}
{
"calculate_triangle_area": {
"base": [10],
"height": [5],
"unit": ["units", ""]
}
}
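One way to score a model's call against an answer key like the one above. This is a sketch under two assumptions read off the example: each parameter maps to a list of accepted values, and an empty string in that list means the parameter may be omitted. The function name and the shape of the call dict are hypothetical.

```python
def matches_possible_answer(call: dict, possible: dict) -> bool:
    """call: {"name": str, "arguments": dict}
    possible: {function_name: {param: [accepted values]}}"""
    if call["name"] not in possible:
        return False
    accepted = possible[call["name"]]
    args = call["arguments"]
    # Reject arguments the answer key does not mention at all.
    if any(param not in accepted for param in args):
        return False
    for param, values in accepted.items():
        if param in args:
            if args[param] not in values:
                return False
        elif "" not in values:
            # Assumption: "" marks a parameter that may be left out.
            return False
    return True


possible = {
    "calculate_triangle_area": {"base": [10], "height": [5], "unit": ["units", ""]}
}

ok = {"name": "calculate_triangle_area", "arguments": {"base": 10, "height": 5}}
wrong_unit = {
    "name": "calculate_triangle_area",
    "arguments": {"base": 10, "height": 5, "unit": "cm"},
}
print(matches_possible_answer(ok, possible))          # True
print(matches_possible_answer(wrong_unit, possible))  # False
```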
{
"question": "What is the mean of the following numbers: 5, 10, 15, 20, and 25, and can you also tell me the timezone of the coordinate with longitude '120.97388' and latitude '14.6042'?",
"function": [
{
"name": "get_time_zone_by_coord",
"description": "Finds the timezone of a coordinate.",
"parameters": {
"type": "dict",
"properties": {
"long": {
"type": "string",
"description": "The longitude of the coordinate."
},
"lat": {
"type": "string",
"description": "The latitude of the coordinate."
}
},
"required": ["long", "lat"]
}
},
{
"name": "calculate_mean",
"description": "Calculates the mean of a list of numbers.",
"parameters": {
"type": "dict",
"properties": {
"numbers": {
"type": "array",
"items": {
"type": "float"
},
"description": "The list of numbers."
}
},
"required": ["numbers"]
}
}
],
"execution_result": [15.0, "Asia/Manila"],
"execution_result_type": ["exact_match", "exact_match"]
}
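An execution-based check like the one above can be sketched as: run each generated call, then compare the results with the expected values. Everything here is an assumption for illustration: the registry, the check_executions helper, the stubbed timezone lookup (a real one would call an external service), and the choice to compare results in the order the calls were made.

```python
def calculate_mean(numbers):
    return sum(numbers) / len(numbers)


def get_time_zone_by_coord(long, lat):
    # Stub standing in for a real lookup service; it always returns the
    # value expected in the example so the sketch runs offline.
    return "Asia/Manila"


REGISTRY = {
    "calculate_mean": calculate_mean,
    "get_time_zone_by_coord": get_time_zone_by_coord,
}


def check_executions(calls, registry, expected, match_types):
    """Execute every call and compare each result with its expected value."""
    if len(calls) != len(expected):
        return False
    results = [registry[c["name"]](**c["arguments"]) for c in calls]
    for got, want, kind in zip(results, expected, match_types):
        if kind != "exact_match":
            raise NotImplementedError(f"unsupported match type: {kind}")
        if got != want:
            return False
    return True


# Two independent calls the model could emit for the question above.
calls = [
    {"name": "calculate_mean", "arguments": {"numbers": [5, 10, 15, 20, 25]}},
    {"name": "get_time_zone_by_coord",
     "arguments": {"long": "120.97388", "lat": "14.6042"}},
]
print(check_executions(calls, REGISTRY, [15.0, "Asia/Manila"],
                       ["exact_match", "exact_match"]))  # True
```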
Demonstrating the use of multiple tools in a forced sequential order to solve a complex problem.
[
{
"name": "multiply",
"description": "Multiplies two numbers",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "number",
"description": "The first number to multiply"
},
"b": {
"type": "number",
"description": "The second number to multiply"
}
},
"required": ["a", "b"]
}
},
{
"name": "add",
"description": "Adds two numbers",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "number",
"description": "The first number to add"
},
"b": {
"type": "number",
"description": "The second number to add"
}
},
"required": ["a", "b"]
}
},
{
"name": "exponentiate",
"description": "Raises a number to a power",
"parameters": {
"type": "object",
"properties": {
"base": {
"type": "number",
"description": "The base number"
},
"exponent": {
"type": "number",
"description": "The exponent"
}
},
"required": ["base", "exponent"]
}
}
]
Prompt: What is 8 times 6 to the 5th power plus 9?
Challenge: How to deal with latency of multiple round trips? (Hint: server side tools)
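The dependency between the three calls can be sketched as follows. The plan is hard-coded here, whereas in practice each step would come from a separate model turn; the {"step": i} reference convention and the run_sequence helper are hypothetical. The prompt is read with standard precedence, as 8 * 6^5 + 9.

```python
TOOLS = {
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "exponentiate": lambda base, exponent: base ** exponent,
}


def run_sequence(steps):
    """Run tool calls in order; {"step": i} refers to the result of step i."""
    results = []
    for name, args in steps:
        resolved = {
            key: results[value["step"]] if isinstance(value, dict) else value
            for key, value in args.items()
        }
        results.append(TOOLS[name](**resolved))
    return results


plan = [
    ("exponentiate", {"base": 6, "exponent": 5}),  # 6^5
    ("multiply", {"a": 8, "b": {"step": 0}}),      # 8 * (6^5)
    ("add", {"a": {"step": 1}, "b": 9}),           # (8 * 6^5) + 9
]
print(run_sequence(plan))  # [7776, 62208, 62217]
```

Each step needs the previous result, so a client-driven loop costs one round trip per step. Running a loop like this next to the model is one reading of the server-side hint.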
Assessing whether a model can correctly identify when a given function is not relevant to the user's query.
{
"question": "Calculate the volume of the sphere with radius 3 units.",
"function": {
"name": "calculate_park_area",
"description": "Calculate the total area of a park based on the radius of its circular part.",
"parameters": {
"type": "dict",
"properties": {
"radius": {
"type": "float",
"description": "The radius of the circular part of the park."
},
"units": {
"type": "string",
"description": "The units of the radius."
},
"shape": {
"type": "string",
"description": "The shape of the park. Default is 'circle'."
}
},
"required": ["radius", "units"]
}
}
}
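Scoring this category can be as simple as checking that the model made no call at all. A minimal sketch, assuming each response is a dict with an optional "tool_calls" list; that shape and both function names are assumptions for illustration:

```python
def is_correct_abstention(response: dict) -> bool:
    """True when the model answered without calling any tool."""
    return not response.get("tool_calls")


def relevance_accuracy(responses: list) -> float:
    """Fraction of responses that correctly made no tool call."""
    if not responses:
        raise ValueError("no responses to score")
    return sum(is_correct_abstention(r) for r in responses) / len(responses)


responses = [
    # Correct: the park-area function cannot compute a sphere's volume.
    {"content": "None of the available functions computes a sphere's volume.",
     "tool_calls": []},
    # Incorrect: the model forced the irrelevant function anyway.
    {"content": None,
     "tool_calls": [{"name": "calculate_park_area",
                     "arguments": {"radius": 3, "units": "units"}}]},
]
print(relevance_accuracy(responses))  # 0.5
```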
Measuring when a model needs to ask a follow-up question. This can occur when the user's initial prompt doesn't provide enough information to use a given tool.
{
"name": "book_flight",
"description": "Books a flight based on user preferences",
"parameters": {
"type": "object",
"properties": {
"departure_city": {
"type": "string",
"description": "The city the user is departing from"
},
"arrival_city": {
"type": "string",
"description": "The city the user is traveling to"
},
"departure_date": {
"type": "string",
"description": "The date the user wants to depart (YYYY-MM-DD format)"
},
"return_date": {
"type": "string",
"description": "The date the user wants to return, if applicable (YYYY-MM-DD format)"
},
"num_passengers": {
"type": "integer",
"description": "The number of passengers traveling"
},
"class": {
"type": "string",
"enum": ["economy", "business", "first"],
"description": "The class of travel"
}
},
"required": ["departure_city", "arrival_city", "departure_date", "num_passengers", "class"]
}
}
Prompt: Book me a flight to New York next month
Missing: departure_city, num_passengers, class
Desired response: Certainly! I'd be happy to help you book a flight to New York next month. To proceed with the booking, I'll need a few more details: 1. What is your departure city? 2. How many passengers will be traveling? 3. What class of travel would you prefer (economy, business, or first)? Once you provide this information, I'll be able to search for the best flight options for you.
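The "Missing" list above can be derived mechanically from the schema's required parameters. A minimal sketch, assuming the application has already extracted whatever arguments the prompt supplies; the helper names are hypothetical. Following the slide, "next month" is counted as covering departure_date, although a stricter check would also flag it for not being in YYYY-MM-DD format.

```python
# The "required" list from the book_flight schema above.
BOOK_FLIGHT_REQUIRED = [
    "departure_city", "arrival_city", "departure_date", "num_passengers", "class",
]


def missing_parameters(required: list, extracted: dict) -> list:
    """Required parameters with no usable value, in schema order."""
    return [name for name in required if extracted.get(name) in (None, "")]


def follow_up_question(missing: list) -> str:
    """Turn the missing parameter names into a single clarifying question."""
    readable = [name.replace("_", " ") for name in missing]
    return "To book this flight I still need: " + ", ".join(readable) + "."


# What "Book me a flight to New York next month" provides.
extracted = {"arrival_city": "New York", "departure_date": "next month"}

missing = missing_parameters(BOOK_FLIGHT_REQUIRED, extracted)
print(missing)  # ['departure_city', 'num_passengers', 'class']
print(follow_up_question(missing))
```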
https://github.com/microsoft/TypeChat
export type API = {
add(x: number, y: number): number;
sub(x: number, y: number): number;
mul(x: number, y: number): number;
div(x: number, y: number): number;
neg(x: number): number;
id(x: number): number;
unknown(text: string): number;
}
import { API } from "./schema";
function program(api: API) {
const step1 = api.mul(2, 3); // -> independent step
const step2 = api.mul(4, 5); // -> independent step
return api.add(step1, step2); // -> type safe passing of subresults into add
}
Function Calling as ... code generation?
Idea: standard (stateful, sandboxed, WASM) REPL like Code Interpreter/Artifacts
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4-turbo",
"messages": [
{
"role": "user",
"content": "What'\''s the weather like in Boston today?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'
This is what everyone is standardizing on, but how desirable is it?
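For reference, the client side of this format: the response carries the call under choices[0].message.tool_calls, with the arguments as a JSON-encoded string, and each result goes back as a "tool" message that echoes the call id. The sketch below works on a hand-written, abridged response dict with a made-up id; get_current_weather is a hypothetical local stub and tool_messages is an illustrative helper, not part of any SDK.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    # Hypothetical stub; a real implementation would query a weather service.
    return {"location": location, "unit": unit, "forecast": "unknown"}


LOCAL_TOOLS = {"get_current_weather": get_current_weather}


def tool_messages(response: dict) -> list:
    """Run every requested tool call and build the follow-up 'tool' messages."""
    message = response["choices"][0]["message"]
    replies = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = LOCAL_TOOLS[fn["name"]](**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies


# Abridged, illustrative response for the request above.
response = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"location\": \"Boston, MA\"}",
                },
            }],
        },
    }]
}
print(tool_messages(response))
```

The returned messages are appended to the conversation, after the assistant message that requested them, and sent in a second request so the model can produce its final answer.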
Any questions?
Follow me at @RickLamers