Closed
Description
We have had some conversations in the past regarding a .plot(....., engine=) keyword in #8018.
This would allow pandas to redirect the plotting to a user-selectable backend, keeping matplotlib as the default.
See chartpy for a way to selectively enable matplotlib, bokeh and plotly, and generically via altair.
Missing from this is when to redirect to seaborn.
So this issue is for discussion:
- should we do this
- implementation method and minimalist dependencies to actually do this (ideally pandas would add NO dependencies itself, just import for a particular engine, raising if it's not available).
- maybe we should spin off much of the current pandas plotting code into a separate repo (pandas-plot)? And this should actually be the default (rather than matplotlib), which is of course the dependency. This might allow simply removing the vast majority of the custom plotting code.
Activity
jreback commented on Aug 31, 2016
@jorisvandenbossche
@sinhrks
@wesm
@TomAugspurger
@shoyer
cc @ellisonbg
cc @tacaswell
cc @mdboom
cc @mwaskom
cc @pzwang
cc @bryevdv
jorisvandenbossche commented on Sep 1, 2016
The way that chartpy does this is by having multiple implementations of their plotting method for each engine (like pandas now has for matplotlib, but less extensive). So, AFAIK, to have something like this work, we would either have to implement the other engines in pandas as well (which means more plotting-related code, not something we want?), or expect each engine to implement some kind of plot_dataframe handling the different chart types that pandas can delegate to. And I am not sure this is something that the different engines would like to do.
tacaswell commented on Sep 2, 2016
With mpl we have been working to better support pandas input natively in all of our plotting routines (the data kwarg, automatic index extraction, automatic label extraction). It is not too hard now to write DataFrame-aware functions that do mostly sensible things (ex) with matplotlib. I have a suspicion that if you started from scratch with mpl 1.5+, the mpl version of the pandas plotting code would be much shorter and clearer.
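As a rough illustration of the data kwarg mentioned above, here is a hypothetical DataFrame-aware helper (line_from_frame is not a matplotlib or pandas function) that lets matplotlib resolve column names itself. It assumes matplotlib 1.5+ and selects the headless Agg backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def line_from_frame(df, x, y, ax=None, **kwargs):
    """Hypothetical helper: plot column `y` against column `x` of `df`."""
    if ax is None:
        _, ax = plt.subplots()
    # With data=df, matplotlib looks up the strings x and y as df[x] and df[y].
    ax.plot(x, y, data=df, **kwargs)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax


df = pd.DataFrame({"time": [0, 1, 2], "temp": [20.0, 21.5, 23.0]})
ax = line_from_frame(df, "time", "temp")
```

The helper only chooses labels and an Axes; all column handling is left to matplotlib.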
My suggestion would be to pull the current pandas plotting code out into its own project and refactor it into functions that look like
and use that as a reference implementation of the plotting API that backends need to expose to pandas for use in the plot accessor.
dhirschfeld commented on Sep 26, 2017
This may also be of interest to @santosjorge
TomAugspurger commented on Sep 28, 2017
Some quick thoughts follow. Curious to hear others'.
Pandas Plotting
Goal: define a system for multiple backends (matplotlib, Bokeh, Plotly, Altair, etc.) to take over DataFrame.plot.
Note that libraries can already achieve this, to an extent, with DataFrame.pipe(func, **kwargs). func gets the DataFrame as the first argument, plus all the kwargs, and it's completely up to func what happens then. This proposal is about the main .plot method, which is implemented around charts.
Overview of the implementation
DataFrame implements .plot as an AccessorProperty. This makes .plot into a namespace with various plot methods. Currently, we define
(scatter and hexbin are DataFrame-only; the rest are also defined on Series.plot).
For backwards compatibility, plot is also callable, and is equivalent to .plot.line. These methods call matplotlib axes plotting methods.
User API
A user-configurable backend option would be the main point for users. Users would set this globally, or use a context manager.
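The two usage patterns above (a global setting and a temporary override) can be sketched in plain Python. Everything here is hypothetical: set_backend, get_backend and backend_context stand in for what pandas would more likely route through its existing options machinery (pd.set_option / pd.option_context).

```python
from contextlib import contextmanager

# Hypothetical global option store; "matplotlib" stays the default backend.
_options = {"plotting.backend": "matplotlib"}


def set_backend(name):
    """Select the plotting backend globally."""
    _options["plotting.backend"] = name


def get_backend():
    """Return the name of the currently selected backend."""
    return _options["plotting.backend"]


@contextmanager
def backend_context(name):
    """Use backend `name` inside a with-block, then restore the previous one."""
    previous = get_backend()
    set_backend(name)
    try:
        yield
    finally:
        # Restore the previous backend even if plotting inside the block raised.
        set_backend(previous)
```

For example, set_backend("bokeh") switches every later plot, while a with backend_context("altair"): block affects only the plots made inside it.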
Backend API
Now for the tough part.
Changes to Pandas
We'll refactor the current FramePlotMethods to MatplotlibFramePlotMethods. We'll make the actual FramePlotMethods a simple shell that dispatches to the plot methods of the selected backend.
So at that point, things are entirely up to the backend. The various backends would implement their own FramePlotMethods (probably inheriting from a base class in pandas that raises NotImplementedError with a nice error message saying that this method isn't available with this backend).
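A minimal sketch of that split, assuming a hypothetical module-level _active_backend and a toy RecordingFramePlotMethods backend (neither exists in pandas). The shell forwards attribute lookups to the active backend class, and the base class turns any unimplemented chart type into a readable NotImplementedError:

```python
class BaseFramePlotMethods:
    """Base class a backend would subclass; unimplemented charts raise."""

    def __init__(self, data):
        self._data = data

    def _unavailable(self, kind):
        raise NotImplementedError(
            f"{kind!r} plots are not available with the "
            f"{type(self).__name__} backend"
        )

    def line(self, x=None, y=None, **kwargs):
        self._unavailable("line")

    def scatter(self, x, y, **kwargs):
        self._unavailable("scatter")


class RecordingFramePlotMethods(BaseFramePlotMethods):
    """Toy backend: implements only .line and returns what it was asked to draw."""

    def line(self, x=None, y=None, **kwargs):
        return ("line", x, y)


# Hypothetical: whichever backend class the user selected.
_active_backend = RecordingFramePlotMethods


class FramePlotMethods:
    """The shell pandas would keep: it holds the data and defers everything else."""

    def __init__(self, data):
        self._data = data

    def __call__(self, *args, **kwargs):
        # Backwards compatibility: df.plot(...) behaves like df.plot.line(...).
        return self.line(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not defined here, e.g. .line or .scatter.
        return getattr(_active_backend(self._data), name)
```

Because the shell delegates through __getattr__, it never has to enumerate chart types; the trade-off is that tab completion and docstrings on the accessor would no longer come from pandas.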
Challenges
API consistency
How much should pandas care that backends accept similar keywords, behave similarly, etc.? I'm not sure. For the most part, we've simply adopted matplotlib's terminology for everything. That's probably not appropriate for everyone. Certain methods do have "recommended" (somewhere between required and optional) keyword arguments. For example, .line takes an x and y. It'd be nice if backends could agree on those.
Global State
Matplotlib has the notion of a "currently active figure", and some plotting
methods will add to that. Is there any difference between
I don't think so (aside from the extra matplotlib plot; the bokeh plots would be
identical). It's completely up to the backend how to handle global state between
calls.
Fortunately for us, pandas messed this up terribly at some point, so that Series.plot goes onto the currently active axes, while DataFrame.plot creates a new one. Users are used to diverging behavior in this area, I guess :)
registration
I've been trying to improve pandas import time recently. Part of that involved
removing a
Pandas doesn't want to try / except each of the backends known to have an implementation. Do we require users to import bokeh.pandas, which calls a register_backend? That seems not great from the user's standpoint, but maybe necessary?
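The explicit-registration option could look roughly like this. register_backend, get_plot_methods and the registry are hypothetical; the register_backend call at the bottom stands in for what a module such as bokeh.pandas would run when imported.

```python
# Hypothetical registry: pandas imports none of the plotting libraries itself.
_registry = {}


def register_backend(name, plot_methods_cls):
    """Called by a plotting library when its pandas integration is imported."""
    _registry[name] = plot_methods_cls


def get_plot_methods(name):
    """Look up a backend, explaining how to make it available if it's missing."""
    try:
        return _registry[name]
    except KeyError:
        raise ValueError(
            f"No plotting backend named {name!r} is registered; import the "
            f"library's pandas integration module first"
        ) from None


# What a library's integration module would execute at import time:
class ToyPlotMethods:
    def __init__(self, data):
        self.data = data


register_backend("toy", ToyPlotMethods)
```

The cost is the one raised above: users must remember the extra import. Packaging entry points are one alternative that would let pandas discover installed backends lazily, at the price of scanning installed package metadata on the first lookup.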
TomAugspurger commented on Sep 28, 2017
Agreed with @tacaswell here that the current implementation should be moved to the plugin system I outlined above. That would be a good test case for what other backends would need.
mwaskom commented on Sep 28, 2017
Personally I don't think it really makes sense to consider seaborn a "backend" for pandas plotting. Seaborn seems higher in the stack than pandas, relative to the other backends. Are there particular plotting functions you had in mind for delegating to?