
API: Define API for pandas plotting backends #26747

Open
@datapythonista

Description

@datapythonista
Member

In #26414 we split the pandas plotting module into a general plotting framework able to call different backends, and the current matplotlib backend. The idea is that other backends can be implemented more simply, and be used by pandas users through a common API.
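As a rough illustration of "a general plotting framework able to call different backends", here is a minimal sketch that assumes each backend is an importable module selected by name. The helper `load_backend` and the module name `my_backend` are hypothetical and not pandas API; which functions the module must expose is exactly what this issue is trying to define.

```python
import importlib
import sys
import types


def load_backend(name: str) -> types.ModuleType:
    """Import a plotting backend by module name (hypothetical helper)."""
    try:
        # import_module returns the cached entry from sys.modules if present.
        return importlib.import_module(name)
    except ImportError as exc:
        raise ValueError(f"Could not find plotting backend {name!r}") from exc


# Register a stub module in sys.modules so the example runs without
# installing anything; a real backend would be a normally installed package.
stub = types.ModuleType("my_backend")
stub.plot = lambda data, kind, **kwargs: f"{kind} plot of {len(data)} points"
sys.modules["my_backend"] = stub

backend = load_backend("my_backend")
print(backend.plot([1, 2, 3], kind="line"))  # line plot of 3 points
```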

The API defined by the current matplotlib backend includes the objects listed next, but this API can probably be simplified. Here is the list with questions/proposals:

Non-controversial classes to keep in the API (they provide the Series.plot(kind='line')... functionality):

  • LinePlot
  • BarPlot
  • BarhPlot
  • HistPlot
  • BoxPlot
  • KdePlot
  • AreaPlot
  • PiePlot
  • ScatterPlot
  • HexBinPlot
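Each class above backs one `kind` value of `Series.plot`/`DataFrame.plot`. A minimal sketch of that dispatch, assuming a backend exposes one class per kind: the `PlotBase` base class, the `KINDS` table and `plot()` are hypothetical stand-ins, not the actual matplotlib backend code, and `generate()` returns a description instead of drawing anything.

```python
from typing import Any, Dict, Type


class PlotBase:
    """Hypothetical base class: stores data and options, draws on demand."""

    def __init__(self, data: Any, **kwargs: Any) -> None:
        self.data = data
        self.kwargs = kwargs

    def generate(self) -> str:
        # A real backend would build a figure here; we return a description.
        return f"{type(self).__name__}(n={len(self.data)})"


# One subclass per plot kind listed above (bodies omitted in this sketch).
class LinePlot(PlotBase): ...
class BarPlot(PlotBase): ...
class BarhPlot(PlotBase): ...
class HistPlot(PlotBase): ...
class BoxPlot(PlotBase): ...
class KdePlot(PlotBase): ...
class AreaPlot(PlotBase): ...
class PiePlot(PlotBase): ...
class ScatterPlot(PlotBase): ...
class HexBinPlot(PlotBase): ...


KINDS: Dict[str, Type[PlotBase]] = {
    "line": LinePlot, "bar": BarPlot, "barh": BarhPlot, "hist": HistPlot,
    "box": BoxPlot, "kde": KdePlot, "area": AreaPlot, "pie": PiePlot,
    "scatter": ScatterPlot, "hexbin": HexBinPlot,
}


def plot(data: Any, kind: str = "line", **kwargs: Any) -> str:
    """Dispatch Series.plot(kind=...)-style calls to the matching class."""
    try:
        plot_class = KINDS[kind]
    except KeyError:
        raise ValueError(f"{kind!r} is not a valid plot kind") from None
    return plot_class(data, **kwargs).generate()


print(plot([1, 2, 3], kind="line"))  # LinePlot(n=3)
```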

Plotting functions provided in pandas (e.g. pandas.plotting.andrews_curves(df))

  • andrews_curves
  • autocorrelation_plot
  • bootstrap_plot
  • lag_plot
  • parallel_coordinates
  • radviz
  • scatter_matrix
  • table

Should those be part of the API, with other backends also implementing them? Would it make sense to convert them to the .plot format (e.g. DataFrame.plot(kind='autocorrelation'))? Or does it make sense to keep them out of the API, or move them to a third-party module?
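To make the second question concrete, here is a minimal sketch assuming the `.plot()` dispatcher keeps a table of kinds that standalone functions can be added to. `KINDS`, `register_kind` and this `lag_plot` are hypothetical; the function only returns the (y(t), y(t + lag)) points a lag plot would draw, not what `pandas.plotting.lag_plot` actually renders.

```python
from typing import Any, Callable, Dict, List, Tuple

KINDS: Dict[str, Callable[..., Any]] = {}


def register_kind(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator adding a function to the table used by plot(kind=...)."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        KINDS[name] = func
        return func
    return decorator


@register_kind("lag")
def lag_plot(data: List[float], lag: int = 1) -> List[Tuple[float, float]]:
    # A lag plot draws y(t) against y(t + lag); here we just return the points.
    return list(zip(data[:-lag], data[lag:]))


def plot(data: Any, kind: str, **kwargs: Any) -> Any:
    """Hypothetical .plot() entry point dispatching on the kind table."""
    if kind not in KINDS:
        raise ValueError(f"{kind!r} is not a valid plot kind")
    return KINDS[kind](data, **kwargs)


# The same result through either spelling:
print(plot([1, 2, 4, 8], kind="lag"))  # [(1, 2), (2, 4), (4, 8)]
print(lag_plot([1, 2, 4, 8]))          # [(1, 2), (2, 4), (4, 8)]
```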

Redundant methods that can possibly be removed:

  • hist_series
  • hist_frame
  • boxplot
  • boxplot_frame
  • boxplot_frame_groupby

In the case of boxplot, we currently have several ways of generating a plot (calling mainly the same code):

  1. DataFrame.plot.box()
  2. DataFrame.plot(kind='box')
  3. DataFrame.boxplot()
  4. pandas.plotting.boxplot(df)

Personally, I'd deprecate number 4. For number 3, I'd deprecate it, or at least not require a separate boxplot_frame method in the backend and instead try to reuse BoxPlot (the same comments on number 3 apply to hist).
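A minimal sketch of this proposal: every public boxplot entry point funnels into the single `BoxPlot` implementation, so a backend would only need to provide that one. `Frame`, `PlotAccessor` and the module-level `boxplot` are hypothetical stand-ins for `DataFrame`, `DataFrame.plot` and `pandas.plotting.boxplot`; the `calls` list only records which implementation ran.

```python
from typing import Any, List

calls: List[str] = []  # records which implementation actually ran


class BoxPlot:
    """The one box-plot implementation a backend would have to provide."""

    def __init__(self, data: Any, **kwargs: Any) -> None:
        self.data, self.kwargs = data, kwargs

    def generate(self) -> str:
        calls.append("BoxPlot")
        return "box"


class PlotAccessor:
    """Stand-in for DataFrame.plot: callable, with per-kind methods."""

    def __init__(self, frame: "Frame") -> None:
        self._frame = frame

    def __call__(self, kind: str = "line", **kwargs: Any) -> str:
        if kind != "box":
            raise NotImplementedError("only 'box' is sketched here")
        return BoxPlot(self._frame.data, **kwargs).generate()

    def box(self, **kwargs: Any) -> str:          # 1. df.plot.box()
        return self(kind="box", **kwargs)


class Frame:
    def __init__(self, data: Any) -> None:
        self.data = data

    @property
    def plot(self) -> PlotAccessor:               # 2. df.plot(kind='box')
        return PlotAccessor(self)

    def boxplot(self, **kwargs: Any) -> str:      # 3. df.boxplot()
        # No separate boxplot_frame: reuse BoxPlot. The shortcut only changes
        # defaults (DataFrame.boxplot() draws a grid by default).
        kwargs.setdefault("grid", True)
        return self.plot(kind="box", **kwargs)


def boxplot(frame: Frame, **kwargs: Any) -> str:  # 4. pandas.plotting.boxplot(df)
    # Candidate for deprecation; it needs no code of its own.
    return frame.boxplot(**kwargs)


df = Frame([[1, 2], [3, 4]])
results = [df.plot.box(), df.plot(kind="box"), df.boxplot(), boxplot(df)]
print(results)  # ['box', 'box', 'box', 'box']
print(calls)    # ['BoxPlot', 'BoxPlot', 'BoxPlot', 'BoxPlot']
```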

For boxplot_frame_groupby, I didn't check in detail, but I'm not sure whether BoxPlot could be reused for it.

Functions to register converters:

  • register
  • deregister

Do those make sense for other backends?

Deprecated in pandas 0.23, to be removed:

  • tsplot

To see what each of these functions does in practice, this notebook by @liirusuk may be useful: https://github.com/python-sprints/pandas_plotting_library/blob/master/AllPlottingExamples.ipynb

CC: @pandas-dev/pandas-core @tacaswell, @jakevdp, @philippjfr, @PatrikHlobil

Activity

TomAugspurger commented on Jun 9, 2019 (Contributor)

I think we should keep things like autocorrelation out of the swappable backend API.

I think we’ve left things like df.boxplot and hist around because they have slightly different behavior than the .plot API. I wouldn’t recommend making them part of the backend API.

TomAugspurger commented on Jun 9, 2019 (Contributor)

Here’s my start on a proposed backend API from a few months ago: TomAugspurger@b07aba2

datapythonista commented on Jun 9, 2019 (Member, Author)

I think it's worth mentioning that at least hvplot (I didn't check the others) already provides functions like andrews_curves, scatter_matrix, lag_plot, ...

Maybe, if we don't want to force all backends to implement those, we could check whether the selected backend implements them, and fall back to the matplotlib plots otherwise?
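The fallback suggested above could look roughly like this minimal sketch, assuming backends are module-like objects and that a missing attribute means "not implemented". `get_plot_function`, `matplotlib_backend` and `partial_backend` are hypothetical stand-ins built with `types.SimpleNamespace`, not real modules.

```python
import types
from typing import Callable

# Stand-in for the bundled matplotlib backend: implements everything.
matplotlib_backend = types.SimpleNamespace(
    andrews_curves=lambda df: "matplotlib andrews_curves",
    lag_plot=lambda s: "matplotlib lag_plot",
)

# Stand-in for a third-party backend that implements only some functions.
partial_backend = types.SimpleNamespace(
    andrews_curves=lambda df: "third-party andrews_curves",
)


def get_plot_function(backend: object, name: str) -> Callable:
    """Return backend.<name> if it exists, else the matplotlib version."""
    func = getattr(backend, name, None)
    if callable(func):
        return func
    return getattr(matplotlib_backend, name)


print(get_plot_function(partial_backend, "andrews_curves")(None))  # third-party andrews_curves
print(get_plot_function(partial_backend, "lag_plot")(None))        # matplotlib lag_plot
```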

I assumed boxplot and hist behaved exactly the same as their .plot counterparts, and were just shortcuts (e.g. Series.hist() for Series.plot.hist()). The "shortcut" shows the plot grid, but other than that I haven't seen any difference.

TomAugspurger commented on Jun 10, 2019 (Contributor)

datapythonista commented on Jun 10, 2019 (Member, Author)

I think that makes sense, but if we do that, I think we should move them to pandas.plotting.matplotlib.andrews_curves, instead of pandas.plotting.andrews_curves.

@TomAugspurger I need to check in more detail, but I think the API you implemented in TomAugspurger@b07aba2 is the one that makes the most sense. I'll work on it once I finish #26753. I'll also experiment with whether it's feasible to move andrews_curves, scatter_matrix... to the .plot() syntax; I think that will make things simpler and easier for everyone (us, third-party libraries, and users).

jakevdp commented on Jun 10, 2019 (Contributor)

What's the intention here regarding extra kwargs passed to plotting functions? Should additional backends attempt to duplicate the functionality of all matplotlib-style plot customizations, or should they allow keywords to be passed that correspond to those used by the particular backend?

The first option would be nice in theory, but would require every non-matplotlib plotting backend to essentially implement its own matplotlib conversion layer with a long tail of incompatibilities that would essentially never be complete (speaking from experience as someone who tried to create mpld3 some years back).

The second option is not as nice from the perspective of interchangeability, but would allow other backends to be added with a more reasonable set of expectations.
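A minimal sketch of the second option, assuming pandas handles a small set of common arguments itself and forwards everything else untouched to the active backend. The `COMMON_KWARGS` set and the `split_kwargs` helper are hypothetical; the keyword names are only examples.

```python
from typing import Any, Dict, Tuple

# Hypothetical set of arguments every backend would be expected to understand.
COMMON_KWARGS = {"title", "xlabel", "ylabel", "legend", "grid"}


def split_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate backend-agnostic options from backend-specific ones."""
    common = {k: v for k, v in kwargs.items() if k in COMMON_KWARGS}
    specific = {k: v for k, v in kwargs.items() if k not in COMMON_KWARGS}
    return common, specific


# 'linestyle' only means something to matplotlib and 'tools' only to bokeh,
# but both are simply passed through for the active backend to interpret.
common, specific = split_kwargs({"title": "Sales", "linestyle": "--", "tools": "hover"})
print(common)    # {'title': 'Sales'}
print(specific)  # {'linestyle': '--', 'tools': 'hover'}
```

The trade-off is the one described above: code that passes `linestyle` works only with matplotlib, but no backend has to emulate another library's keywords.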

TomAugspurger commented on Jun 10, 2019 (Contributor)

ghost commented on Jun 14, 2019

I'm sorry if this is a stupid question, but if you define a plotting "API" which is basically a group of canned plots, wouldn't every backend produce more or less the same output? What new capability is this meant to enable? Something like a pandas-to-Vega exporter, perhaps?

jakevdp commented on Jun 14, 2019 (Contributor)

I don't think it's correct to say that every backend produces more or less the same output.

For example, matplotlib is really good at static charts, but not great at producing portable interactive charts.

On the other hand, bokeh, altair, et al. are great for interactive charts, but aren't quite as mature as matplotlib for static charts.

Being able to produce both with the same API would be a big win.

tacaswell commented on Jun 15, 2019 (Contributor)

@jakevdp wrote: "The first option would be nice in theory, but would require every non-matplotlib plotting backend to essentially implement its own matplotlib conversion layer with a long tail of incompatibilities that would essentially never be complete (speaking from experience as someone who tried to create mpld3 some years back)."

...and it would also pin Matplotlib down API-wise even more than we already are. I think it makes sense for pandas to declare which style knobs it wants to expose, and to expect the backend implementations to sort out what that means. This may mean not blindly passing **kwargs through, and instead ensuring that the returned objects are "the right thing" for the given backend, so that after-the-fact style customization is possible.
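One reading of this suggestion, as a minimal sketch: pandas declares the style knobs it exposes, rejects anything else instead of blindly forwarding `**kwargs`, and the backend returns its own native object so users can keep customizing it afterwards. `STYLE_KNOBS`, `validate_style` and `FakeFigure` are hypothetical names, not pandas or Matplotlib API.

```python
from typing import Any, Dict

# Hypothetical: the only style options pandas promises across backends.
STYLE_KNOBS = {"title", "grid", "legend", "figsize"}


def validate_style(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Reject options pandas has not declared instead of forwarding them."""
    unknown = sorted(set(kwargs) - STYLE_KNOBS)
    if unknown:
        raise TypeError(f"unsupported plotting options: {', '.join(unknown)}")
    return dict(kwargs)


class FakeFigure:
    """Stand-in for a backend's native object (e.g. a Matplotlib Axes)."""

    def __init__(self, **style: Any) -> None:
        self.style = style

    def set(self, **more: Any) -> "FakeFigure":
        # After-the-fact customization happens on the native object.
        self.style.update(more)
        return self


fig = FakeFigure(**validate_style({"title": "Sales", "grid": True}))
fig.set(linestyle="--")  # backend-specific tweak, applied by the user
print(fig.style)  # {'title': 'Sales', 'grid': True, 'linestyle': '--'}
```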

