Assistant: Provide a way to "see" data objects (Python) #7114

@jmcphers

Description

June 2025 Update: This issue now tracks only the Python portion of this work.

If you ask Assistant about some data in your environment, especially if it is large, it will typically try to execute R or Python functions that return plain-text versions of the information.

While this kind of works, these tools (summary and str) do not format information in a way that is intended for, or often even legible to, an LLM. For example, here's the summary the model asked for. It relies on whitespace alignment and lays values out in multiple side-by-side columns, so it is difficult or impossible for the model to parse.

summary(diamonds)
     carat               cut        color        clarity     
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
 Max.   :5.0100                     I: 5422   VVS1   : 3655  
                                    J: 2808   (Other): 2531  
     depth           table           price             x         
 Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
 1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
 Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
 Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
 3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
 Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
                                                                 
       y                z         
 Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 4.720   1st Qu.: 2.910  
 Median : 5.710   Median : 3.530  
 Mean   : 5.735   Mean   : 3.539  
 3rd Qu.: 6.540   3rd Qu.: 4.040  
 Max.   :58.900   Max.   :31.800  

This problem was also observed by @jcheng5 when working with DataBot, which is why DataBot converts data to JSON before sending it to the model.
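DataBot's actual conversion is not shown in this issue. As a minimal sketch of the general idea only, assuming the table is available as a list of row dictionaries, something like the following would produce a JSON payload with a row cap. The function name `table_to_json`, the `max_rows` parameter, and the sample values are all hypothetical.

```python
import json


def table_to_json(rows, max_rows=5):
    """Serialize a list of row dicts to JSON, keeping at most max_rows rows.

    Hypothetical helper for illustration; not DataBot's or Positron's API.
    """
    # Collect column names in first-seen order across all rows.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    payload = {
        "total_rows": len(rows),
        "columns": columns,
        "rows": rows[:max_rows],
        "truncated": len(rows) > max_rows,
    }
    return json.dumps(payload, indent=2)


# Illustrative rows with diamonds-like column names (values are made up).
sample = [
    {"carat": 0.30, "cut": "Ideal", "price": 400},
    {"carat": 0.50, "cut": "Good", "price": 900},
    {"carat": 1.20, "cut": "Premium", "price": 5200},
]
print(table_to_json(sample, max_rows=2))
```

Each value stays attached to its column name, so the model does not have to infer structure from whitespace alignment.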

To provide Assistant with better tools for working with data, we should implement a tool that gives it well-structured information about a data set. Specifically:

  • Unlike "execute code", the tool need not require confirmation since it is only reading information. This will allow the model to repeatedly look at data without pausing to ask the user to run code to see what the result looks like.
  • The tool should return structured data in JSON. Existing models do really well with this format.
  • Ideally, the tool should be usable to get a structured representation of any data type. (We might be able to use the existing variables comm?)
  • Ideally, the execute code tool could also emit structured JSON when the result of execution is a data frame/table, for the model to consume easily.
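To make the requirements above concrete, here is a rough sketch of what a read-only "describe data" tool could return. It is not a proposed implementation: it assumes the data arrives as a plain dict mapping column names to lists of values (a real tool would more likely read a pandas DataFrame or go through the variables comm), and the names `summarize_column`, `describe_data`, and `max_levels` are hypothetical.

```python
import json
import statistics
from collections import Counter
from numbers import Real


def summarize_column(values, max_levels=5):
    """Return a JSON-friendly summary of a single column (hypothetical helper)."""
    present = [v for v in values if v is not None]
    summary = {"count": len(values), "missing": len(values) - len(present)}
    # Treat a column as numeric only if every non-missing value is a real number
    # (bool is excluded because it is a subclass of int).
    is_numeric = bool(present) and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary.update(
            type="numeric",
            min=min(present),
            max=max(present),
            mean=statistics.fmean(present),
            median=statistics.median(present),
        )
    else:
        counts = Counter(str(v) for v in present)
        summary.update(
            type="categorical",
            unique=len(counts),
            # Cap the number of reported levels to keep the payload small.
            top=dict(counts.most_common(max_levels)),
        )
    return summary


def describe_data(columns, max_levels=5):
    """Summarize a dict of column name -> list of values as a JSON string."""
    n_rows = max((len(vals) for vals in columns.values()), default=0)
    return json.dumps(
        {
            "rows": n_rows,
            "columns": {
                name: summarize_column(vals, max_levels)
                for name, vals in columns.items()
            },
        },
        indent=2,
    )


# Illustrative input only; the values are made up.
example = {
    "carat": [0.3, 0.5, 1.2, None],
    "cut": ["Ideal", "Good", "Ideal", "Fair"],
}
print(describe_data(example))
```

Because the function only reads its input and returns a string, it fits the "no confirmation needed" requirement. The same payload shape could also be attached to the execute code tool's result when that result is a data frame.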

Labels: area: assistant (Issues related to Positron Assistant)