Description
June 2025 Update: This issue specifically tracks the Python portion of the problem described below.
If you ask Assistant about some data in your environment, especially if it is large, it will typically try to execute R or Python functions that return plain-text versions of the information.

While this kind of works, these tools (summary and str) do not format information in a way that is intended for an LLM or, often, even legible to one. For example, here's the summary the model asked for. It relies on whitespace formatting and has multiple columns, so it's difficult or impossible for the model to parse it.
summary(diamonds)
carat cut color clarity
Min. :0.2000 Fair : 1610 D: 6775 SI1 :13065
1st Qu.:0.4000 Good : 4906 E: 9797 VS2 :12258
Median :0.7000 Very Good:12082 F: 9542 SI2 : 9194
Mean :0.7979 Premium :13791 G:11292 VS1 : 8171
3rd Qu.:1.0400 Ideal :21551 H: 8304 VVS2 : 5066
Max. :5.0100 I: 5422 VVS1 : 3655
J: 2808 (Other): 2531
depth table price x
Min. :43.00 Min. :43.00 Min. : 326 Min. : 0.000
1st Qu.:61.00 1st Qu.:56.00 1st Qu.: 950 1st Qu.: 4.710
Median :61.80 Median :57.00 Median : 2401 Median : 5.700
Mean :61.75 Mean :57.46 Mean : 3933 Mean : 5.731
3rd Qu.:62.50 3rd Qu.:59.00 3rd Qu.: 5324 3rd Qu.: 6.540
Max. :79.00 Max. :95.00 Max. :18823 Max. :10.740
y z
Min. : 0.000 Min. : 0.000
1st Qu.: 4.720 1st Qu.: 2.910
Median : 5.710 Median : 3.530
Mean : 5.735 Mean : 3.539
3rd Qu.: 6.540 3rd Qu.: 4.040
Max. :58.900 Max. :31.800
This problem was also observed by @jcheng5 when working with DataBot, which is why DataBot converts data to JSON before sending it to the model.
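To make the contrast concrete, here is a rough sketch of how two of the columns from the summary above (carat and cut) could look as JSON. The numbers are copied from that output. The key names ("columns", "type", "stats", "counts") are made up for illustration and are not DataBot's actual schema.

```python
import json

# Values copied from the summary(diamonds) output above.
# The key names below are hypothetical, chosen only for this illustration.
summary = {
    "columns": [
        {
            "name": "carat",
            "type": "numeric",
            "stats": {
                "min": 0.2, "q1": 0.4, "median": 0.7,
                "mean": 0.7979, "q3": 1.04, "max": 5.01,
            },
        },
        {
            "name": "cut",
            "type": "factor",
            "counts": {
                "Fair": 1610, "Good": 4906, "Very Good": 12082,
                "Premium": 13791, "Ideal": 21551,
            },
        },
    ]
}

# Every value is labeled by its key, so nothing depends on column alignment.
payload = json.dumps(summary, indent=2)
print(payload)
```

Each statistic is attached to its column by name rather than by horizontal position, which is what the whitespace-aligned output cannot guarantee once it is tokenized.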
To provide Assistant with better tools for working with data, we should implement a tool that gives it well-structured information about a data set. Specifically:
- Unlike "execute code", the tool need not require confirmation since it is only reading information. This will allow the model to repeatedly look at data without pausing to ask the user to run code to see what the result looks like.
- The tool should return structured data in JSON. Existing models do really well with this format.
- Ideally, the tool should be usable to get a structured representation of any data type. (We might be able to use the existing variables comm?)
- Ideally, the execute code tool could also emit structured JSON when the result of execution is a data frame/table, for the model to consume easily.
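
As a starting point for discussion, the requirements above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not a proposed implementation: the function names (describe_column, describe_table), the JSON keys, and the use of a plain dict of lists as a stand-in for a data frame are all assumptions. A real tool would presumably operate on pandas or polars objects and might go through the variables comm.

```python
import json
import statistics
from collections import Counter


def describe_column(name, values, max_levels=5):
    """Summarize one column as a JSON-serializable dict (hypothetical schema)."""
    present = [v for v in values if v is not None]
    info = {"name": name, "n": len(values), "n_missing": len(values) - len(present)}
    # Treat a column as numeric only if every non-missing value is an int or float.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        info["type"] = "numeric"
        info["stats"] = {
            "min": min(present),
            "median": statistics.median(present),
            "mean": statistics.fmean(present),
            "max": max(present),
        }
    else:
        info["type"] = "categorical"
        # Report only the most frequent levels to keep the payload small.
        info["top_counts"] = dict(Counter(map(str, present)).most_common(max_levels))
    return info


def describe_table(columns):
    """Read-only summary of a column-oriented table, returned as a JSON string."""
    n_rows = max((len(v) for v in columns.values()), default=0)
    return json.dumps({
        "n_rows": n_rows,
        "columns": [describe_column(k, v) for k, v in columns.items()],
    })


# Toy data invented for this example; it is not taken from the diamonds data set.
toy = {
    "carat": [0.25, 0.5, 1.0, None],
    "cut": ["Ideal", "Good", "Ideal", "Fair"],
}
result = json.loads(describe_table(toy))
print(json.dumps(result, indent=2))
```

Because the function only reads its input and returns a string, it fits the "no confirmation needed" requirement. The same function could in principle be called by the execute code tool whenever a result is a table.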