Description
In #26414 we split the pandas plotting module into a general plotting framework, able to call different backends, and the current matplotlib backend. The idea is that other backends can be implemented more simply, and be used by pandas users through a common API.
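The split can be pictured as a thin front end that only selects a backend by name and delegates the actual drawing to it. The sketch below assumes a hypothetical registry (register_backend) and a single plot(data, kind, **kwargs) entry point per backend; EchoBackend is a toy used for illustration, and none of these names are taken from the pandas code.

```python
from typing import Any, Dict, Protocol


class PlottingBackend(Protocol):
    """Hypothetical interface: every backend exposes a single plot() entry point."""

    def plot(self, data: Any, kind: str, **kwargs: Any) -> Any:
        ...


# Hypothetical registry mapping a backend name to an implementation.
_BACKENDS: Dict[str, PlottingBackend] = {}


def register_backend(name: str, backend: PlottingBackend) -> None:
    """Make a backend selectable by name."""
    _BACKENDS[name] = backend


# In pandas the default would be the matplotlib backend; this toy example
# never registers one, so callers pass backend= explicitly.
def plot(data: Any, kind: str = "line", backend: str = "matplotlib", **kwargs: Any) -> Any:
    """Shared front end: validate the backend name, then delegate everything else."""
    try:
        impl = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown plotting backend: {backend!r}") from None
    return impl.plot(data, kind=kind, **kwargs)


class EchoBackend:
    """Toy backend that reports what it was asked to draw instead of drawing it."""

    def plot(self, data: Any, kind: str, **kwargs: Any) -> Dict[str, Any]:
        return {"kind": kind, "n": len(data), "kwargs": kwargs}


register_backend("echo", EchoBackend())
print(plot([1, 2, 3], kind="bar", backend="echo", title="demo"))
# {'kind': 'bar', 'n': 3, 'kwargs': {'title': 'demo'}}
```

Keeping the front end this small is what makes the backend swappable: everything kind-specific lives behind the backend's plot().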
The API defined by the current matplotlib backend includes the objects listed next, but this API can probably be simplified. Here is the list with questions/proposals:
Non-controversial methods to keep in the API (they provide the Series.plot(kind='line')... functionality):
- LinePlot
- BarPlot
- BarhPlot
- HistPlot
- BoxPlot
- KdePlot
- AreaPlot
- PiePlot
- ScatterPlot
- HexBinPlot
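Since each class above corresponds to one kind= value, a backend could be described as a mapping from kind strings to plot classes, which also makes it easy to check which kinds a backend still lacks. A minimal sketch, assuming a hypothetical PlotBase with a generate() method; the class bodies are toys, not the pandas implementations.

```python
from typing import Any, Dict, List, Type

# kind= strings corresponding to the ten classes listed above
# (scatter and hexbin are DataFrame-only in pandas).
REQUIRED_KINDS = {
    "line", "bar", "barh", "hist", "box", "kde", "area", "pie", "scatter", "hexbin",
}


class PlotBase:
    """Hypothetical base class: one subclass per kind."""

    kind = ""

    def __init__(self, data: List[float], **kwargs: Any) -> None:
        self.data = data
        self.kwargs = kwargs

    def generate(self) -> str:
        raise NotImplementedError


class LinePlot(PlotBase):
    kind = "line"

    def generate(self) -> str:
        return f"line through {len(self.data)} points"


class BarPlot(PlotBase):
    kind = "bar"

    def generate(self) -> str:
        return f"{len(self.data)} bars"


def build_registry(*classes: Type[PlotBase]) -> Dict[str, Type[PlotBase]]:
    """Map each kind string to the class that draws it."""
    return {cls.kind: cls for cls in classes}


def missing_kinds(registry: Dict[str, Type[PlotBase]]) -> List[str]:
    """Kinds a backend still has to implement to cover the whole API."""
    return sorted(REQUIRED_KINDS - set(registry))


registry = build_registry(LinePlot, BarPlot)
print(registry["bar"]([1, 2, 3]).generate())  # 3 bars
print(missing_kinds(registry))
# ['area', 'barh', 'box', 'hexbin', 'hist', 'kde', 'pie', 'scatter']
```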
Plotting functions provided in pandas (e.g. pandas.plotting.andrews_curves(df)):
- andrews_curves
- autocorrelation_plot
- bootstrap_plot
- lag_plot
- parallel_coordinates
- radviz
- scatter_matrix
- table
Should those be part of the API, so that other backends also implement them? Would it make sense to convert them to the .plot format (e.g. DataFrame.plot(kind='autocorrelation')...)? Or does it make sense to keep them out of the API, or move them to a third-party module?
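Converting a standalone function to the .plot format could amount to routing extra kind strings to functions. A minimal sketch, assuming a hypothetical EXTRA_KINDS table and a handler that only computes the points an autocorrelation plot would draw (the sample autocorrelation at each lag) rather than drawing them:

```python
from typing import Any, Callable, Dict, List, Optional


def autocorrelation(values: List[float], lag: int) -> float:
    """Sample autocorrelation at one lag: the quantity an autocorrelation plot draws."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return covariance / variance


def autocorrelation_kind(values: List[float], max_lag: Optional[int] = None, **kwargs: Any) -> List[float]:
    """Hypothetical handler for kind='autocorrelation'; styling kwargs are ignored here."""
    if max_lag is None:
        max_lag = len(values) - 1
    return [autocorrelation(values, lag) for lag in range(1, max_lag + 1)]


# Extra kinds routed to standalone functions (hypothetical table).
EXTRA_KINDS: Dict[str, Callable[..., Any]] = {"autocorrelation": autocorrelation_kind}


def plot(values: List[float], kind: str = "line", **kwargs: Any) -> Any:
    if kind in EXTRA_KINDS:
        return EXTRA_KINDS[kind](values, **kwargs)
    if kind == "line":
        return list(values)  # stand-in for the existing line-plot path
    raise ValueError(f"unsupported kind: {kind!r}")


print(plot([1, 2, 3, 4], kind="autocorrelation"))  # [0.25, -0.3, -0.45]
```

With a table like this, a backend that does not provide an extra kind would simply not list it, which keeps the decision about what belongs in the required API separate from the call syntax.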
Redundant methods that can possibly be removed:
- hist_series
- hist_frame
- boxplot
- boxplot_frame
- boxplot_frame_groupby
In the case of boxplot, we currently have several ways of generating a plot (mostly calling the same code):
1. DataFrame.plot.box()
2. DataFrame.plot(kind='box')
3. DataFrame.boxplot()
4. pandas.plotting.boxplot(df)
Personally, I'd deprecate number 4, and for number 3, either deprecate it or at least not require a separate boxplot_frame method in the backend, and instead try to reuse BoxPlot (the same comments on number 3 apply to hist).
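Deprecating number 4 while reusing the BoxPlot path could look like a thin wrapper that warns and forwards to the kind='box' code path. In this sketch plot_frame is a hypothetical stand-in for that single path, not pandas code. FutureWarning is used because, unlike DeprecationWarning, Python shows it to end users by default.

```python
import warnings
from typing import Any, Dict


def plot_frame(data: Any, kind: str = "line", **kwargs: Any) -> Dict[str, Any]:
    """Stand-in for the single DataFrame.plot(kind=...) code path (hypothetical)."""
    return {"kind": kind, "data": data, "kwargs": kwargs}


def boxplot(data: Any, **kwargs: Any) -> Dict[str, Any]:
    """Old entry point kept as a thin wrapper: warn, then reuse the kind='box' path."""
    warnings.warn(
        "boxplot(df) is deprecated, use df.plot(kind='box') instead",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return plot_frame(data, kind="box", **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = boxplot({"a": [1, 2, 3]}, grid=True)

print(result["kind"], caught[0].category.__name__)  # box FutureWarning
```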
For boxplot_frame_groupby, I didn't check in detail, but could BoxPlot be reused for it?
Functions to register converters:
- register
- deregister
Do those make sense for other backends?
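The pattern behind register/deregister is to swap converters into the backend's type registry while remembering what was there before, so that deregister can restore the original state. A minimal sketch, with a plain dict standing in for the registry and strings standing in for converter objects (both hypothetical; for matplotlib the real registry is matplotlib.units.registry). It only carries over to backends that expose a comparable registry.

```python
import datetime
from typing import Any, Dict

# Stand-in for a backend's type -> converter registry.
unit_registry: Dict[type, Any] = {datetime.date: "backend-date-converter"}

# Converters this library wants installed while it is active (hypothetical).
_OUR_CONVERTERS: Dict[type, Any] = {
    datetime.date: "pandas-date-converter",
    datetime.datetime: "pandas-datetime-converter",
}

_MISSING = object()           # marks "no converter was registered for this type"
_saved: Dict[type, Any] = {}  # original entries, restored by deregister()


def register() -> None:
    """Install our converters, remembering what they replace; safe to call twice."""
    for typ, converter in _OUR_CONVERTERS.items():
        if typ not in _saved:
            _saved[typ] = unit_registry.get(typ, _MISSING)
        unit_registry[typ] = converter


def deregister() -> None:
    """Put back whatever was registered before register() ran."""
    for typ, previous in _saved.items():
        if previous is _MISSING:
            unit_registry.pop(typ, None)
        else:
            unit_registry[typ] = previous
    _saved.clear()


register()
register()  # the second call does not overwrite the saved originals
print(unit_registry[datetime.date])  # pandas-date-converter
deregister()
print(unit_registry)  # {<class 'datetime.date'>: 'backend-date-converter'}
```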
Deprecated in pandas 0.23, to be removed:
- tsplot
To see what each of these functions does in practice, this notebook by @liirusuk may be useful: https://github.com/python-sprints/pandas_plotting_library/blob/master/AllPlottingExamples.ipynb
CC: @pandas-dev/pandas-core @tacaswell, @jakevdp, @philippjfr, @PatrikHlobil
Activity
TomAugspurger commented on Jun 9, 2019
I think we should keep things like autocorrelation out of the swappable backend API.
I think we’ve left things like df.boxplot and hist around because they have slightly different behavior than the .plot API. I wouldn’t recommend making them part of the backend API.
TomAugspurger commented on Jun 9, 2019
Here’s my start on a proposed backend API from a few months ago: TomAugspurger@b07aba2
datapythonista commented on Jun 9, 2019
I think it's worth mentioning that at least hvplot (didn't check the rest) already provides functions like andrews_curves, scatter_matrix, lag_plot, ... Maybe, if we don't want to force all backends to implement those, we can check whether the selected backend implements them, and default to the matplotlib plots otherwise?
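The fallback idea amounts to looking the function up on the selected backend and using the matplotlib version only when it is missing. A minimal sketch, assuming hypothetical backend modules (modelled with SimpleNamespace) and a hypothetical MATPLOTLIB_FALLBACKS table; the functions return strings instead of drawing.

```python
from types import SimpleNamespace
from typing import Any, Callable


def mpl_andrews_curves(frame: Any, class_column: str, **kwargs: Any) -> str:
    """Stand-in for the existing matplotlib implementation."""
    return "matplotlib andrews_curves"


# Functions pandas could fall back to when a backend lacks them (hypothetical table).
MATPLOTLIB_FALLBACKS = {"andrews_curves": mpl_andrews_curves}


def resolve(backend: Any, name: str) -> Callable[..., Any]:
    """Prefer the backend's own implementation, otherwise use the matplotlib one."""
    func = getattr(backend, name, None)
    if callable(func):
        return func
    return MATPLOTLIB_FALLBACKS[name]


# Toy backend modules: one implements andrews_curves, the other does not.
full = SimpleNamespace(andrews_curves=lambda frame, class_column, **kw: "backend andrews_curves")
minimal = SimpleNamespace()

print(resolve(full, "andrews_curves")(None, "Name"))     # backend andrews_curves
print(resolve(minimal, "andrews_curves")(None, "Name"))  # matplotlib andrews_curves
```

One consequence of this design is that a user who selected a non-matplotlib backend can still receive a matplotlib figure for these functions.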
I assumed boxplot and hist behaved exactly the same, and were just shortcuts (e.g. Series.hist() for Series.plot.hist()). The "shortcut" shows the plot grid, but other than that I haven't seen any difference.
TomAugspurger commented on Jun 10, 2019
datapythonista commented on Jun 10, 2019
I think that makes sense, but if we do that, I think we should move them to pandas.plotting.matplotlib.andrews_curves instead of pandas.plotting.andrews_curves.
@TomAugspurger I need to check in more detail, but I think the API you implemented in TomAugspurger@b07aba2 is the one that makes the most sense. I'll work on it once I finish #26753. I'll also experiment with whether it's feasible to move andrews_curves, scatter_matrix... to the .plot() syntax; I think that will make things simpler and easier for everyone (us, third-party libraries, and users).
jakevdp commented on Jun 10, 2019
What's the intention here regarding extra kwargs passed to plotting functions? Should additional backends attempt to duplicate the functionality of all matplotlib-style plot customizations, or should they allow keywords to be passed that correspond to those used by the particular backend?
The first option would be nice in theory, but would require every non-matplotlib plotting backend to essentially implement its own matplotlib conversion layer with a long tail of incompatibilities that would essentially never be complete (speaking from experience as someone who tried to create mpld3 some years back).
The second option is not as nice from the perspective of interchangeability, but would allow other backends to be added with a more reasonable set of expectations.
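The two options can be contrasted in a few lines. In this sketch, a hypothetical, deliberately partial translation table stands for option 1 and a plain pass-through stands for option 2; line_color, line_width and opacity are names for a toy backend, not any real library's keywords.

```python
from typing import Any, Dict

# Option 1: translate matplotlib-style keywords into a toy backend's names.
# The table is hypothetical and, like any such table, incomplete.
MPL_TO_TOY = {"color": "line_color", "linewidth": "line_width", "alpha": "opacity"}


def translate_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    unsupported = sorted(set(kwargs) - set(MPL_TO_TOY))
    if unsupported:
        raise TypeError(f"no equivalent for matplotlib keywords: {unsupported}")
    return {MPL_TO_TOY[key]: value for key, value in kwargs.items()}


# Option 2: forward keywords untouched; the user writes the backend's own names.
def passthrough_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    return dict(kwargs)


print(translate_kwargs({"color": "red", "alpha": 0.5}))
# {'line_color': 'red', 'opacity': 0.5}
print(passthrough_kwargs({"line_color": "red"}))
# {'line_color': 'red'}
try:
    translate_kwargs({"linestyle": "--"})
except TypeError as exc:
    print(exc)  # no equivalent for matplotlib keywords: ['linestyle']
```

The TypeError on linestyle is the "long tail" in miniature: every matplotlib keyword missing from the table is an incompatibility the backend would have to close.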
TomAugspurger commented on Jun 10, 2019
ghost commented on Jun 14, 2019
I'm sorry if this is a stupid question, but if you define a plotting "API" which is basically a group of canned plots, wouldn't every backend produce more or less the same output? What new capability is this meant to enable? Something like a pandas-to-Vega exporter, perhaps?
jakevdp commented on Jun 14, 2019
I don't think it's correct to say that every backend produces more or less the same output.
For example, matplotlib is really good at static charts, but not great at producing portable interactive charts.
On the other hand, bokeh, altair, et al. are great for interactive charts, but aren't quite as mature as matplotlib for static charts.
Being able to produce both with the same API would be a big win.
tacaswell commented on Jun 15, 2019
and also pins Matplotlib down even more than we already are API-wise. I think it makes sense for pandas to declare which style knobs it wants to expose and expect the backend implementations to sort out what that means. This may mean not blindly passing **kwargs through, and instead ensuring that the returned objects are "the right thing" for the given backend, so that users can do after-the-fact style customization.
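Declaring the style knobs up front could look like a small frozen dataclass: pandas validates the keywords it promises to support, each backend interprets them, and the backend's native object is returned so it can be customized afterwards. A minimal sketch; Style, ToyBackend and ToyFigure are hypothetical names, and the choice of title and grid as knobs is only illustrative.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Style:
    """Style knobs pandas would promise to support (hypothetical selection)."""

    title: Optional[str] = None
    grid: bool = False


class ToyFigure:
    """Native object of a toy backend; users may keep customizing it afterwards."""

    def __init__(self) -> None:
        self.title: Optional[str] = None
        self.show_grid = False
        self.layers: List[Any] = []


class ToyBackend:
    def render(self, data: List[float], kind: str, style: Style) -> ToyFigure:
        fig = ToyFigure()
        fig.title = style.title  # the backend decides what "title" means
        fig.show_grid = style.grid
        fig.layers.append((kind, list(data)))
        return fig


def plot(data: List[float], backend: ToyBackend, kind: str = "line", **kwargs: Any) -> ToyFigure:
    # Unknown keywords make Style(...) raise TypeError instead of being passed through.
    style = Style(**kwargs)
    return backend.render(data, kind, style)


fig = plot([1, 2, 3], ToyBackend(), title="Sales", grid=True)
fig.title = "Sales (revised)"  # after-the-fact customization on the returned object
print(fig.title, fig.show_grid)  # Sales (revised) True
```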