
API: engine kw to .plot to enable selectable backends #14130

Closed

Description

@jreback (Contributor)

We have had some conversations in the past (in #8018) about an engine= keyword for .plot(...).
This would allow pandas to redirect plotting to a user-selectable backend, keeping matplotlib as the default.

See chartpy for one way to selectively enable matplotlib, bokeh and plotly,

and, more generically, altair.

What's missing here is when to redirect to seaborn.

So this issue is for discussion:

  1. Should we do this?
  2. What implementation method, with minimal dependencies, would actually do this? (Ideally pandas would add NO dependencies itself, and would only import a particular engine when it is requested, raising if it's not available.)
  3. Should we spin off much of the current pandas plotting code into a separate repo (pandas-plot)?
    That package, rather than matplotlib directly, would then be the default (and would of course carry the matplotlib dependency). This might allow simply removing the vast majority of the custom plotting code from pandas.

Activity

added this to the 0.20.0 milestone on Aug 31, 2016
jorisvandenbossche (Member) commented on Sep 1, 2016

chartpy does this by having a separate implementation of its plotting methods for each engine (like pandas now has for matplotlib, but less extensive).

So, AFAIK, to make something like this work we would either have to implement the other engines in pandas as well (which means more plotting-related code, not something we want?), or expect each engine to implement some kind of plot_dataframe, handling the different chart types, that pandas can delegate to. And I am not sure this is something the different engines would want to do.
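The second option, each engine exposing one entry point that pandas delegates to, can be written down as a `typing.Protocol`. This is a sketch under assumptions: `DataFramePlotter`, `plot_dataframe`, `ToyEngine` and `delegate` are illustrative names only, not an API that any plotting library provides today (requires Python 3.8+).

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataFramePlotter(Protocol):
    """Hypothetical contract an engine would implement for pandas to call."""

    def plot_dataframe(self, data: Any, kind: str, **kwargs: Any) -> Any:
        ...


class ToyEngine:
    """Illustrative engine: records what it was asked to draw instead of drawing."""

    supported = {"line", "bar", "scatter"}

    def plot_dataframe(self, data, kind, **kwargs):
        if kind not in self.supported:
            raise NotImplementedError(f"ToyEngine cannot draw {kind!r} charts")
        return {"kind": kind, "n_rows": len(data), **kwargs}


def delegate(engine, data, kind, **kwargs):
    """What the pandas side of the delegation could look like."""
    # runtime_checkable only checks that the method exists, not its signature.
    if not isinstance(engine, DataFramePlotter):
        raise TypeError(f"{type(engine).__name__} does not implement plot_dataframe")
    return engine.plot_dataframe(data, kind, **kwargs)
```

The open question above is exactly whether engines would agree to ship something like `plot_dataframe`; the Protocol just makes that obligation explicit.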

tacaswell (Contributor) commented on Sep 2, 2016

With mpl, we have been working to better support pandas input natively in all of our plotting routines (the data kwarg, automatic index extraction, automatic label extraction).

It is not too hard now to write DataFrame-aware functions that do mostly sensible things (ex) with matplotlib. I suspect that if you started from scratch with mpl 1.5+, the mpl version of the pandas plotting code would be much shorter and clearer.

My suggestion would be to pull the current pandas plotting code out into its own project and refactor it into functions that look like

def some_chart_type(df, optional=backend, independent=input, *, backend=dependent, keyword=args):

and use that as a reference implementation of the plotting API that backends need to expose to pandas for use in the plot accessor.
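A minimal sketch of one such function, following the signature shape above: backend-independent inputs first, matplotlib-specific options after the bare `*`. The function name `line` and its defaults are assumptions for illustration, not existing pandas or matplotlib API; it needs pandas and matplotlib installed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def line(df, x=None, y=None, *, ax=None, **mpl_kwargs):
    """Hypothetical reference implementation of a line chart.

    Backend-independent inputs (df, x, y) come first; everything after the
    bare * (ax and any extra matplotlib keywords) is backend-specific.
    """
    if ax is None:
        _, ax = plt.subplots()
    # Default x to the index, as DataFrame.plot.line does.
    xs = df.index if x is None else df[x]
    # y is a single column name here; default to every column except x.
    columns = [y] if y is not None else [c for c in df.columns if c != x]
    for col in columns:
        ax.plot(xs, df[col], label=str(col), **mpl_kwargs)
    ax.set_xlabel(str(x) if x is not None else str(df.index.name or ""))
    ax.legend()
    return ax


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
ax = line(df, x="a", linestyle="--")
```

Another backend would expose a function with the same leading arguments and swap the keyword-only tail for its own options.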

modified the milestones: 0.20.0, Next Major Release on Mar 23, 2017
dhirschfeld (Contributor) commented on Sep 26, 2017

This may also be of interest to @santosjorge

TomAugspurger (Contributor) commented on Sep 28, 2017

Some quick thoughts follow. Curious to hear others'.

Pandas Plotting

Goal: define a system for multiple backends (matplotlib, Bokeh, Plotly, Altair
etc.) to take over DataFrame.plot.

Note that libraries can already achieve this, to an extent, with
DataFrame.pipe(func, **kwargs): func gets the DataFrame as its first
argument, plus all the kwargs, and it's completely up to func what happens then.
This proposal is about the main .plot method, which is organized around chart types.
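For comparison, the `pipe` route needs nothing from pandas beyond `DataFrame.pipe` itself. In this sketch `line_spec` is a hypothetical third-party function that returns a Vega-Lite-style dict instead of drawing, so it runs with pandas alone.

```python
import pandas as pd


def line_spec(df, x, y, title=None):
    """Hypothetical library function: turn a DataFrame into a Vega-Lite-style spec."""
    spec = {
        "mark": "line",
        "data": {"values": df.to_dict(orient="records")},
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        },
    }
    if title is not None:
        spec["title"] = title
    return spec


df = pd.DataFrame({"t": [0, 1, 2], "v": [1.0, 4.0, 9.0]})
# pipe passes df as the first argument and forwards the keyword arguments.
spec = df.pipe(line_spec, x="t", y="v", title="squares")
```

The library owns everything after the call; what `pipe` cannot give it is the `df.plot.<kind>` namespace users already know.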

Overview of the implementation

DataFrame implements .plot as an AccessorProperty. This makes .plot
into a namespace with various plot methods. Currently, we define

['area', 'bar', 'barh', 'box', 'density', 'hexbin', 'hist', 'line',
 'pie', 'scatter']

(scatter and hexbin are DataFrame-only; the rest are also defined on Series.plot).
For backwards compatibility, plot is also callable, and is equivalent to .plot.line.
These methods call matplotlib axes plotting methods.
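The "namespace of chart methods that is also callable" shape can be reproduced with pandas' public accessor hook, `pd.api.extensions.register_dataframe_accessor`. In this sketch the accessor name `sketchplot` and the class `SketchPlotMethods` are hypothetical, and the methods return a description of the call rather than drawing anything.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("sketchplot")
class SketchPlotMethods:
    """Hypothetical stand-in for FramePlotMethods: a namespace of chart methods."""

    def __init__(self, pandas_obj):
        self._data = pandas_obj  # the DataFrame .sketchplot was accessed on

    def __call__(self, *args, **kwargs):
        # Calling the namespace directly falls back to a line chart.
        return self.line(*args, **kwargs)

    def line(self, x=None, y=None, **kwds):
        return ("line", x, y, kwds)

    def bar(self, x=None, y=None, **kwds):
        return ("bar", x, y, kwds)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
```

`df.sketchplot()` and `df.sketchplot.line()` then hit the same code path, mirroring how plot() relates to .plot.line.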

User API

A user-configurable option

pandas.options.plotting.backend = {'matplotlib', 'bokeh', 'altair', 'plotly', ... }

would be the main entry point for users. Users would set it globally

pd.options.plotting.backend = 'bokeh'

Or use a context manager

with pd.option_context('plotting.backend', 'bokeh'):
    df.plot(...)
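pandas' own helper for this is `pd.option_context`. The semantics the proposal relies on (set the backend for the duration of the block, restore the previous value afterwards, even if plotting raises) can be sketched standalone. Here `_options`, `get_option`, `set_option` and `option_context` are simplified hypothetical stand-ins reduced to the single `plotting.backend` key.

```python
from contextlib import contextmanager

# Hypothetical option store reduced to the one key this proposal needs.
_options = {"plotting.backend": "matplotlib"}


def get_option(key):
    return _options[key]


def set_option(key, value):
    if key not in _options:
        raise KeyError(f"No such option: {key!r}")
    _options[key] = value


@contextmanager
def option_context(key, value):
    """Temporarily set an option, restoring the old value even on error."""
    old = get_option(key)
    set_option(key, value)
    try:
        yield
    finally:
        set_option(key, old)
```

Because the restore happens in `finally`, a failing bokeh plot inside the block cannot leave the global backend switched.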

Backend API

Now for the tough part.

Changes to Pandas

We'll refactor the current FramePlotMethods to MatplotlibFramePlotMethods.
We'll make the actual FramePlotMethods a simple shell that

  1. looks up the currently active backend
  2. calls the appropriate method on the active backend

So

class FramePlotMethods:
    def line(self, x=None, y=None, **kwds):
        backend = self.get_backend()  # look up the currently active backend
        # _data is the DataFrame calling .plot.line
        return backend.line(self._data, x=x, y=y, **kwds)

At that point, things are entirely up to the backend. The various backends would
implement their own FramePlotMethods (probably inherit from a base class in
pandas that raises NotImplementedError with a nice error message saying that
this method isn't available with this backend).
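Putting the shell and the base class together, a runnable sketch might look like this. `BaseFramePlotMethods`, `RecordingBackend`, `BACKENDS` and `ACTIVE_BACKEND` are hypothetical names; the toy backend returns a dict describing the call instead of drawing, and `hexbin` stands in for any chart type a backend has not implemented.

```python
class BaseFramePlotMethods:
    """Hypothetical base class that backends would subclass.

    Every chart method raises NotImplementedError with a readable message
    until a backend overrides it.
    """

    name = "base"

    def _not_available(self, kind):
        raise NotImplementedError(
            f"{kind!r} plots are not available with the {self.name!r} backend"
        )

    def line(self, data, x=None, y=None, **kwds):
        self._not_available("line")

    def hexbin(self, data, x=None, y=None, **kwds):
        self._not_available("hexbin")


class RecordingBackend(BaseFramePlotMethods):
    """Toy backend that implements line but inherits the hexbin error."""

    name = "recording"

    def line(self, data, x=None, y=None, **kwds):
        return {"kind": "line", "n_rows": len(data), "x": x, "y": y, **kwds}


# Hypothetical registry and currently active backend.
BACKENDS = {"recording": RecordingBackend()}
ACTIVE_BACKEND = "recording"


class FramePlotMethods:
    """The thin shell pandas would keep: look up the backend, then forward."""

    def __init__(self, data):
        self._data = data  # the object .plot was accessed on

    def get_backend(self):
        return BACKENDS[ACTIVE_BACKEND]

    def line(self, x=None, y=None, **kwds):
        return self.get_backend().line(self._data, x=x, y=y, **kwds)

    def hexbin(self, x=None, y=None, **kwds):
        return self.get_backend().hexbin(self._data, x=x, y=y, **kwds)
```

Unsupported chart types fail with the backend's name in the message rather than with an AttributeError.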

Challenges

API consistency

How much should pandas care that backends accept similar keywords, behave
similarly, etc.? I'm not sure. For the most part, we've simply adopted
matplotlib's terminology for everything. That's probably not appropriate for
everyone. Certain methods do have "recommended" (somewhere between required
and optional) keyword arguments. For example .line takes an x and y. It'd
be nice if backends could agree on those.
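If backends did agree on "recommended" keywords, pandas (or a backend's own test suite) could check conformance mechanically with `inspect.signature`. A sketch, assuming a hypothetical `RECOMMENDED_KEYWORDS` table; only explicitly named parameters count, so a bare `**kwargs` does not satisfy the check.

```python
import inspect

# Hypothetical table of recommended keywords per chart type.
RECOMMENDED_KEYWORDS = {"line": ("x", "y"), "scatter": ("x", "y")}

_NAMED = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def missing_keywords(method, kind):
    """Return the recommended keywords that `method` does not name explicitly."""
    params = inspect.signature(method).parameters
    accepted = {name for name, p in params.items() if p.kind in _NAMED}
    return [kw for kw in RECOMMENDED_KEYWORDS.get(kind, ()) if kw not in accepted]


def good_line(data, x=None, y=None, **kwds):
    """Example backend method that follows the convention."""


def odd_line(data, xs=None, ys=None):
    """Example backend method that uses its own names."""
```

A check like this only enforces names, not behaviour; whether `x` means the same thing in every backend is still the social question raised above.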

Global State

Matplotlib has the notion of a "currently active figure", and some plotting
methods will add to that. Is there any difference between

with pd.option_context('plotting.backend', 'bokeh'):
    df.plot()
with pd.option_context('plotting.backend', 'matplotlib'):
    df.plot()

# Any difference here?
with pd.option_context('plotting.backend', 'bokeh'):
    df.plot()

I don't think so (aside from the extra matplotlib plot; the bokeh plots would be
identical). It's completely up to the backend how to handle global state between
calls.

Fortunately for us, pandas messed this up terribly at some point, so that
Series.plot goes onto the currently active axes, while DataFrame.plot
creates a new one. Users are used to diverging behavior in this area I guess :)

registration

I've been trying to improve pandas import time recently. Part of that involved
removing a

try:
    import matplotlib
except ImportError:
    pass

Pandas doesn't want to try / except each of the backends known to have an
implementation. Do we require users to import bokeh.pandas, which calls a
register_backend? That seems not great from the user's standpoint, but maybe
necessary?
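The explicit-registration variant can be sketched with a plain dictionary that pandas owns and plugins write into when they are imported. `register_backend` matches the name floated above; `_REGISTRY` and `get_backend` are hypothetical, and no real package calls these functions.

```python
# Hypothetical registry owned by pandas; plugin modules add themselves to it.
_REGISTRY = {}


def register_backend(name, backend):
    """Called by a plugin module (e.g. a hypothetical bokeh.pandas) at import time."""
    _REGISTRY[name] = backend


def get_backend(name):
    """Look up a registered backend without speculatively importing anything."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"No plotting backend named {name!r} is registered; "
            "import the plugin module that registers it first."
        ) from None
```

pandas itself never tries to import bokeh; the cost is that users must import the plugin module before setting the option, which is the user-facing downside mentioned above.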

TomAugspurger (Contributor) commented on Sep 28, 2017

Agreed with @tacaswell here that the current implementation should be moved to the plugin system I outlined above. That would be a good test case for what other backends would need.

mwaskom (Contributor) commented on Sep 28, 2017

"Missing from here is when to re-direct to seaborn"

Personally I don't think it really makes sense to consider seaborn a "backend" for pandas plotting. Seaborn seems higher in the stack than pandas, relative to the other backends. Are there particular plotting functions you had in mind for delegating to?

33 remaining items


Metadata

Assignees: no one assigned
Type, projects, relationships: none
Participants: @ellisonbg, @trxcllnt, @tacaswell, @mwaskom, @wesm
Repository: pandas-dev/pandas