Description
Have a look at this demo from CopilotKit (not affiliated; they have an interesting product, and I believe Pydantic AI should let users build experiences like theirs): https://docs.copilotkit.ai/images/copilot-action-example.gif
Background
The demo illustrates how a conversational chatbot can perform actions in the frontend itself (client side).
This is almost the same as what Pydantic AI already enables with tool calling, except that here the tool call request is sent to the client. The client performs the action and sends the response back to the agent, which can then emit a final message indicating success or failure, depending on the client's tool response.
This might sound similar to #1189, but that issue talks about returning a tool call response to the client, whereas this issue is focused on returning a tool call request directly to the client.
Current library limitations
L1. Inability to define Tools without their implementation
Currently, defining a `Tool` requires that a function is specified. This is, of course, quite sensible for most use cases, except when the function is not executed by us.
Also worth keeping in mind: setting aside the implementation of the function, the definition (parameters) of the function will come from the client rather than being statically defined by us.
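To make the gap concrete, here is a minimal sketch of what an implementation-free tool declaration could look like. All of the names here (`ClientTool`, `parameters_json_schema`) are hypothetical, not existing Pydantic AI API; the point is just that name, description, and a client-supplied JSON Schema are enough to advertise the tool to the model:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClientTool:
    """Hypothetical declaration-only tool: no function, since the client runs it."""

    name: str
    description: str
    # JSON Schema for the parameters, typically supplied by the client at runtime
    parameters_json_schema: dict[str, Any] = field(default_factory=dict)


add_todo_tool = ClientTool(
    name='add_todo',
    description='Add an item to the user todo list.',
    parameters_json_schema={
        'type': 'object',
        'properties': {'item': {'type': 'string'}},
        'required': ['item'],
    },
)
```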
L2. Inability to cleanly end an Agent run from a tool call request
`iter` gets us quite far; however, it seems to require manual message construction, which is not that straightforward (at least not documented in a clear way).
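For illustration only, here is a plain-Python sketch (not the actual `iter` API) of the "clean ending" being asked for: walk the run's events and stop as soon as a client tool call appears, handing the pending call to the caller instead of executing it:

```python
def run_until_client_tool(events, client_tool_names):
    """Sketch: stop the run at the first tool call owned by the client."""
    history = []
    for event in events:
        history.append(event)
        if event.get('kind') == 'tool_call' and event['tool'] in client_tool_names:
            # End the run here; the pending call is surfaced to the client,
            # and the accumulated history is preserved for resumption.
            return {'pending_call': event, 'history': history}
    return {'pending_call': None, 'history': history}


# Simulated model output for the sketch
events = [
    {'kind': 'text', 'content': 'Let me add that for you.'},
    {'kind': 'tool_call', 'tool': 'add_todo', 'args': {'item': 'tomatoes'}},
]
result = run_until_client_tool(events, {'add_todo'})
```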
Approaches
This issue is, admittedly, quite awkward from a library perspective. But I believe that from a user-experience perspective, support for flows like this would be a game-changer.
Here are a few approaches that I believe could address this issue:
A1. Ending an Agent run when a client tool is encountered
This is pretty much just having a clear solution to L2, plus having L1 addressed, which would allow us to define a tool without its implementation; we could then simply `End` the run during an Agent's `iter`. Message history is preserved. All is good and well...
...almost. The only thing left to address is how the tool response coming from the client would be consumed (do we manually patch the history ourselves, or do we allow something like a `ClientToolResponse` input?).
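As a sketch of the "manually patch the history" option, assuming hypothetical dict-shaped messages rather than Pydantic AI's actual message types: append a tool-return entry matching the pending call before resuming the run.

```python
def patch_history(history, tool_call_id, content):
    """Append the client's tool result so the next model request can see it."""
    return [*history, {'role': 'tool', 'tool_call_id': tool_call_id, 'content': content}]


# History as it stood when the run ended on the client tool call
history = [
    {'role': 'user', 'content': 'Add tomatoes to my todo list'},
    {'role': 'assistant', 'tool_call_id': 'call_1',
     'tool': 'add_todo', 'args': {'item': 'tomatoes'}},
]

# Once the client reports back, patch and resume
patched = patch_history(history, 'call_1', 'added')
```

A `ClientToolResponse`-style input would hide exactly this bookkeeping behind the library, which is what makes it the friendlier of the two options.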
A2: Having `ClientToolRequest` as an output type
Basically, something like this:
```python
agent = Agent(
    'openai:gpt-4o',
    system_prompt=(
        'You are a useful agent that helps the user manage '
        'their todo list items.'
    ),
    client_tools=[add_todo_tool],
)

# or

agent = Agent(
    'openai:gpt-4o',
    system_prompt=(
        'You are a useful agent that helps the user manage '
        'their todo list items.'
    ),
    output_type=[str, ClientToolRequest(add_todo_tool)],
)

response = agent.run_sync("Add tomatoes to my todo list")
response  # ClientToolRequest(tool=ClientTool(...), args={...})

# then, after the client responds...
response = agent.run_sync(ClientToolResponse(...))
```
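To make the round trip concrete, here is a hedged sketch of the client-side half of A2. The `ClientToolRequest`/`ClientToolResponse` shapes are assumptions matching the proposal above, not existing Pydantic AI API:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ClientToolRequest:
    """Assumed shape of the output the agent would hand to the client."""

    tool_name: str
    tool_call_id: str
    args: dict[str, Any]


@dataclass
class ClientToolResponse:
    """Assumed shape of the result the client sends back to the agent."""

    tool_call_id: str
    content: Any


def dispatch(request: ClientToolRequest,
             handlers: dict[str, Callable[..., Any]]) -> ClientToolResponse:
    """Run the requested action locally and wrap the result for the agent."""
    result = handlers[request.tool_name](**request.args)
    return ClientToolResponse(tool_call_id=request.tool_call_id, content=result)


# Example client-side handler: the action the chatbot cannot run itself
todos: list[str] = []


def add_todo(item: str) -> str:
    todos.append(item)
    return f'added {item!r}'


request = ClientToolRequest('add_todo', 'call_1', {'item': 'tomatoes'})
response = dispatch(request, {'add_todo': add_todo})
# response would then be fed back into the agent run, as sketched above
```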
Should Pydantic AI support this at all?
Despite frontend/client-side tool calling being an incredibly valuable feature, its awkward, non-straightforward fit may suggest that it shouldn't be the library's responsibility at all.
However, I'd still wager that this sort of feature can find its place in this library. Other libraries already have the notion of client-side tool call execution.
Libraries that support client tool calling
CopilotKit
See: Frontend Actions
AI SDK
See: Chatbot Tool Usage. This demonstrates how they define server-side and client-side tools (the latter handled by the client via `onToolCall`).
ElevenLabs
(I have not used this platform; I just happened upon this page when looking into client-side tool calling.)
See: Client tools
Similar / related discussions: