Assistant: Initial pass at implementing a data summary tool for Python #8208
base: main
This looks like a great start. My main suggestion is to rename the API that routes requests to the variables comm to something more generic (it can just query a single session variable at a time), so that we can use it to add more data querying tools without having to modify the Positron API each time.
The other change we will want to make is to handle these tool calls "asynchronously" so that they do not block the functioning of the variables comm. This basically means copying the pattern the data explorer comm uses for the get_column_profiles request (and its corresponding return_column_profiles front-end API, see https://github.com/posit-dev/positron/blob/main/extensions/positron-python/python_files/posit/positron/data_explorer.py#L492-L519).
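To make the asynchronous-response idea concrete, here is a minimal sketch of the shape I have in mind. It is not the data explorer implementation: RecordingComm, AsyncProfileHandler, the return_profiles event name, and min_max are all hypothetical names used only for illustration, and a plain thread pool stands in for whatever job queue the kernel actually uses. The request handler replies right away with a callback id, and the computed result is delivered later as a separate event carrying that same id.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class RecordingComm:
    """Hypothetical stand-in for a comm: records events instead of sending them."""

    def __init__(self):
        self.events = []

    def send_event(self, name, payload):
        self.events.append((name, payload))


class AsyncProfileHandler:
    """Replies to a request immediately and delivers the result later as an event."""

    def __init__(self, comm, compute):
        self.comm = comm
        self.compute = compute
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.pending = {}  # callback_id -> Future, kept so callers can wait or cancel

    def handle_request(self, params):
        callback_id = params.get("callback_id") or str(uuid.uuid4())
        # Schedule the expensive work and return at once, so the comm's
        # message loop is free to service other requests in the meantime.
        self.pending[callback_id] = self.pool.submit(
            self._run, callback_id, params["values"]
        )
        return {"callback_id": callback_id}

    def _run(self, callback_id, values):
        result = self.compute(values)
        # The result travels back as an event tagged with the callback id.
        self.comm.send_event(
            "return_profiles", {"callback_id": callback_id, "result": result}
        )


def min_max(values):
    """Toy 'expensive' computation used only for the demo."""
    return {"min": min(values), "max": max(values)}


comm = RecordingComm()
handler = AsyncProfileHandler(comm, min_max)
reply = handler.handle_request({"callback_id": "abc", "values": [3, 1, 2]})
handler.pending["abc"].result()  # block here only so the demo is deterministic
handler.pool.shutdown()
```

The immediate reply carries only the callback id; the front end would match the later event to the original request by that id.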
extensions/positron-python/python_files/posit/positron/variables_comm.py
"type_display": column.type_display,
"summary_stats": summary_stats,
}
)
It's a good starting point to have this tool surfaced in the variables comm. Since computing summary stats or other computed profiles can be expensive (and thus block other message handling in the variables comm), we'll probably want to separate "expensive" requests (e.g. summary stats, frequency tables, histograms, etc.) from "cheap" requests (like asking for the schema), and make sure that the expensive requests are performed in an asynchronous-response pattern like the get_column_profiles request in the data explorer. This doesn't all have to get done in this PR, so it can be follow-up work.
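As a rough illustration of the cheap/expensive split, the sketch below routes requests by method name. Everything in it is an assumption for the sake of the example: the RequestRouter class, the EXPENSIVE set, and the method names are hypothetical and are not part of the variables comm. Cheap requests run inline, expensive ones go to a worker pool, and both come back as a Future so the caller handles them the same way.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical method names; the real comm may classify requests differently.
EXPENSIVE = frozenset({"get_summary_stats", "get_frequency_table", "get_histogram"})


class RequestRouter:
    """Runs cheap requests inline and expensive ones on a worker pool."""

    def __init__(self, handlers, max_workers=2):
        self.handlers = handlers
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def dispatch(self, method, params):
        handler = self.handlers[method]
        if method in EXPENSIVE:
            # Off the message-handling thread: does not block other requests.
            return self.pool.submit(handler, params)
        # Cheap request: compute now and wrap the value in a finished Future.
        done = Future()
        done.set_result(handler(params))
        return done


router = RequestRouter(
    {
        "get_schema": lambda params: ["a", "b"],
        "get_summary_stats": lambda params: {"count": len(params["values"])},
    }
)
schema = router.dispatch("get_schema", {}).result()
stats = router.dispatch("get_summary_stats", {"values": [1, 2, 3]}).result()
router.pool.shutdown()
```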
extensions/positron-python/python_files/posit/positron/variables.py
src/vs/workbench/api/common/positron/extHost.positron.protocol.ts
This is close to a good stopping point for the initial pass. I think the main thing that needs to get fixed is the return type for the query_variable_data RPC. Since it isn't easy to access all of the data explorer comm types in all the layers where this function is called, we can just return serialized JSON from the function for now (effectively schema: string, column_profiles: string[]).
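For the JSON-string return shape, something along these lines would be enough. This is only a sketch under assumptions: ColumnProfile here is a hypothetical simplified dataclass, not the data explorer comm type, and serialize_query_result is an illustrative helper name. The point is that the schema and each column profile cross the API boundary as opaque strings, so the intermediate layers never need the typed definitions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ColumnProfile:
    """Hypothetical, simplified profile record used only for this example."""

    column_name: str
    type_display: str
    summary_stats: dict = field(default_factory=dict)


def serialize_query_result(schema, profiles):
    """Return the result as {schema: str, column_profiles: list[str]}."""
    return {
        "schema": json.dumps(schema),
        "column_profiles": [json.dumps(asdict(p)) for p in profiles],
    }


result = serialize_query_result(
    {"columns": [{"column_name": "a", "type_display": "int64"}]},
    [ColumnProfile("a", "int64", {"min": 1, "max": 3})],
)
# The consumer decodes each string only where it actually needs the contents.
first_profile = json.loads(result["column_profiles"][0])
```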
# Create a temporary table view with a temporary comm
temp_state = DataExplorerState("temp_summary")
temp_comm = PositronComm.create(target_name="temp_summary", comm_id="temp_summary_comm")
table_view = _get_table_view(value, temp_comm, temp_state, self.kernel.job_queue)
Maybe later we can set up a persistent data explorer comm to use for Assistant tool calls. I realized just now, after my earlier comment about the async column profiles (not needed for now), that these depend on there being a live comm available to send the frontend event with the asynchronous result. We can look more closely at this later.
"description": "Result of the summarize operation",
"type": "object",
"properties": {
"children": {
This is returning a different return type right now (with the schema and column profiles, so a lot more complex). To avoid having to drag along the schema and profile result types (and mainly having to expose these in the Positron runtime / extHost API), I think we can just return the schema and profiles as a serialized JSON string to sidestep this issue for now. It would be good to make these results well-typed everywhere, but there's a bunch of plumbing needed.
I rebased this today and will work on some unit tests on the Python backend portion before it can be merged.
improve logging performance to satisfy linter
clean up code
provide temp comm to satisfy pyright
modify openRPC specs to autogen comms code and fix bug with passing 'path' parameter, also rename summarizeData function to make it more generic
create data explorer helper functions
revert formatting change
… the ext host API
…ving get schema requests
First pass at #7114
Provides Assistant with a getDataSummary tool, currently only implemented for Python, that provides a JSON-structured summary of a data object by using the Positron API to communicate with the variables comm. I updated the variables Python backend to reuse existing functionality from the data explorer. I used the inspectVariables tool as a guide for retrieving info from the variables comm.
Release Notes
New Features
Bug Fixes
QA Notes
@:data-explorer
@:assistant
@:variables
@:plots
@:viewer