-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Closed
Labels
Compatpandas objects compatability with Numpy or Python functionspandas objects compatability with Numpy or Python functionsIO Excelread_excel, to_excelread_excel, to_excel
Milestone
Description
OpenPyXL v2.0 was released on May 13th, breaking backwards compatibility in some parts of its API, including those upon which pandas depends. Notably, v2 changes the style API and makes Style
objects immutable. This incompatibility is evident at 0.14.0-rc1 in pandas.io.excel._OpenpyxlWriter._convert_to_style
and the associated test case, pandas.io.tests.test_excel.OpenyxlTests.test_to_excel_styleconverter
.
In addition to teaching pandas
how to use the new API, the project may wish to retain support for the OpenPyXL v1 API, necessitating the design and development of a compatibility shim.
Metadata
Metadata
Assignees
Labels
Compatpandas objects compatability with Numpy or Python functionspandas objects compatability with Numpy or Python functionsIO Excelread_excel, to_excelread_excel, to_excel
Type
Projects
Relationships
Development
Select code repository
Activity
jreback commentedon May 19, 2014
dupe of #7169
would love to have u work on it!
py3.4 build already has 2.02 (and so fails)
neirbowj commentedon May 24, 2014
Could you re-open this issue and reassign it to the 0.14.1 milestone?
jreback commentedon Jun 10, 2014
any thoughts on this? (getting it to work with >= 2.0)
neirbowj commentedon Jun 10, 2014
I've been spending my pandas time allocation on this.
My thoughts, in case anybody else is driven to work on this before I can get to it, are as follows (caveat: I haven't thought about this for a couple weeks, so my recollection about some of these details may betray me).
There appear to be two main pieces to this feature. One is simply the conversion of a style dict to a native openpyxl
Style
object. This should be fairly rote, and by implementing to the new-styleStyle
protocol, which is the only option in v2, backwards compatibility with v1 will be quite tractable, because it offers both old- and new-style protocols for forwards compatibility.There are some differences in the structure of the
Style
class from v1 to v2, which pandas will have to handle. My preference would be to try, but not very hard, to map whatever style information the user passes onto the underlying representation, and to attach commentary about the openpyxl limitations we know of to whatever exceptions bubble up. However, I don't know how people tend to use this, and I haven't yet found a clear spec in the pandas docs for an engine-agnostic, user-facing style API (maybe there isn't and shouldn't be one).The other main piece is the merge behavior. I've had more trouble wrapping my head around how this should work. The new-style
Style
class has acopy
method that accepts coarse substitutions. For example, conceptually if not correct syntactically, consider this snippet.s2
will get the redColor
object froms1
but itsfont
attribute will have been overridden with a completely newFont
object that has the default face,'Calibri'
, at a non-default size, 12 pt. Make sense?I think this is a fine set of semantics to adopt for the internal style merging behavior of pandas, because it keeps the padding between the user and the engine implementation thin. I'm not sure what that implies for the openpyxl v1 case. At the least, it may be convenient to model a compatibility shim as a backport of the
copy
method applied to v1Style
objects.neirbowj commentedon Jun 21, 2014
It occurred to me belatedly to wonder, since I have never used
pandas.io.excel
or its main consumer, theto_excel
method, whether the styling capability ofExcelWriter
is really intended to be user facing? The documentation does not mention it, though the docstring for theExcelWriter.write_cells
method clearly indicates acceptance of formatted (i.e. styled) cells. In what follows, I will be assuming that this is supposed to be a user-facing capability that is simply not yet documented as such. If it isn't, please say so because it might provide some useful wiggle room to address the other areas where I'm getting stuck.The other thing I finally realized I didn't really know for sure, and need to, is whether the style dict those formatted cells carry around is required to be engine-agnostic, or if it can vary from engine to engine. I guessed that the intention was the former, and the uniformity of the usage examples --- the only internal user in
pandas.core.format
and the applicable test cases for the openpyxl and xlwt engines --- seem to substantiate my guess. However, the style dicts in the test cases are insufficient for me to tell what I should write if I were to write a style dict spec/doc in the spirit of earlier authors. For example, I can tell that the spec has to allow the font to be set to bold or not, but I cannot tell whether font colors are to be supported, and, if so, how they are to be encoded.If it were up to me, I think I would choose a very limited subset of all possible styling available in the underlying Excel, and call that the engine-agnostic,
pandas
dialect. By making the subset small and simple, the task gets easier of mapping it onto the supported versions of the existing engine libraries and any future versions or new engines. As I said, I don't know how users tend to usepandas.io.excel
so I'm basing this judgement on the notion that somebody who doesn't wish to specify a particular engine, probably won't wish to specify a great deal of formatting information either, if any.To accommodate users who do want to exert a great deal of control over cell styling, an additional, separable feature could be to teach
ExcelWriter.write_cells
to accept cells that come with attached style objects that are native to the selected engine. That is, if you want access to the full styling capability of the underlying library, then you have to build the style objects yourself instead of describing them topandas
as dicts.A related question. Does pandas have some sort of policy, roadmap, or procedure for determining version support? That is, how will we know when we can drop support for the old version of openpyxl?
I welcome discussion or even just direction.
jreback commentedon Jun 21, 2014
to your last points
we try maintain support for as long as possible with dep libraries until it becomes too painful :)
if it's easy enough then no biggie
openpyxl 1.6 I think is the min version but only. 1 year old I think
it's becoming pretty easy IMHO to upgrade individual deps easily
so if their is a recommendation to drop support for something we can discuss
note that pandas does still supports numpy 1.6.1 (and this is painful!)
neirbowj commentedon Jun 21, 2014
@jreback I'm not suggesting that we drop support for anything at the moment. I'm just trying to get a sense of how long the life of an openpyxl v1/v2 compatibility shim might be.
jreback commentedon Jun 21, 2014
one way to do this might be to make a openpyxl_v2 engine (and then depending on the installation choose one or the other)
that way u can rebuild the API as needed
neirbowj commentedon Jun 21, 2014
Yes, that's a good point. Splitting the implementation at the engine level would probably be better. I still think that we need a spec and documentation for the style dict, both of which would be informed by whether it is intended for internal use only, or is an external feature that we also use internally.
neirbowj commentedon Jun 25, 2014
The PR I just posted still has some rough edges (see, for example,
pandas.io.excel._Openpyxl2Writer._convert_to_fill
), but should give us a leg up on this issue.I find that openpyxl is maddeningly full of corner cases and reasons why I can't do something simple and uniform. If anybody sees a missed opportunity to refactor and consolidate this code, I welcome suggestions or even alternative PRs.