Closed
Labels
Internals (related to non-user accessible pandas implementation), Refactor (internal refactoring of code)
Description
What do people think about inheriting pandas classes from collections.abc classes?
I think the main ones are DataFrame & Series inheriting from Mapping, and Index from Sequence.
This would let other libraries infer a class's behavior more precisely. For example, to work out whether an object is list-like but not dict-like, they could check its type rather than testing hasattr(obj, '__iter__') and not hasattr(obj, 'values'). And it doesn't cost anything.
But as far as I'm aware, other libraries in the pydata ecosystem don't do this, so this would be out of the norm.
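The type-based check described above can be sketched with built-in containers, which already register with the collections.abc classes. The helper name is_list_like_not_dict_like is hypothetical and not part of pandas, and excluding strings is an assumption of this sketch:

```python
from collections.abc import Mapping, Sequence


def is_list_like_not_dict_like(obj):
    """Hypothetical helper: True for sequences that are not mappings.

    Strings and bytes are sequences too, so they are excluded explicitly;
    that exclusion is an assumption of this sketch.
    """
    if isinstance(obj, (str, bytes)):
        return False
    return isinstance(obj, Sequence) and not isinstance(obj, Mapping)


print(is_list_like_not_dict_like([1, 2, 3]))  # True
print(is_list_like_not_dict_like((1, 2)))     # True
print(is_list_like_not_dict_like({"a": 1}))   # False
print(is_list_like_not_dict_like("abc"))      # False
```

The check only works for classes that inherit from or register with these ABCs, which is the point of the proposal.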
kawochen commented on Jan 15, 2016
I think you have misunderstood abc. These classes have a special __subclasshook__ so that you don't have to inherit from them to be a subclass of them, e.g.
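A minimal sketch of the behavior being described, using only the standard library. The class Bag is hypothetical and is not taken from the thread:

```python
from collections.abc import Iterable, Sized


class Bag:
    """Hypothetical class that inherits from no ABC."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


# Sized and Iterable define __subclasshook__, so providing __len__ and
# __iter__ is enough for Bag to be recognized as a subclass of each.
print(issubclass(Bag, Sized))             # True
print(isinstance(Bag([1, 2]), Iterable))  # True
```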
So perhaps we can think about whether we want to make pandas objects satisfy the necessary contracts.
max-sixty commented on Jan 15, 2016
Cheers @kawochen. It looks like Mapping doesn't implement that (its __subclasshook__ falls back to Sized).
I'm not sure why that is. It could be because it has some mixin methods on top of its abstract methods? Whatever the reason, I'm fairly confident DataFrame satisfies the contracts.
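The difference can be checked with a small sketch. DuckMapping and RealMapping are hypothetical classes, not pandas objects: the first implements Mapping's three abstract methods without inheriting, the second adds Mapping as a base class.

```python
from collections.abc import Mapping, Sized


class DuckMapping:
    """Hypothetical class with Mapping's abstract methods but no ABC base."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class RealMapping(DuckMapping, Mapping):
    """Same methods, but inheriting from Mapping explicitly."""


# Sized is detected structurally; Mapping is not.
print(issubclass(DuckMapping, Sized))    # True
print(issubclass(DuckMapping, Mapping))  # False

# Inheriting makes the class a Mapping and supplies the mixin methods.
m = RealMapping({"a": 1})
print(isinstance(m, Mapping))  # True
print(list(m.items()))         # [('a', 1)]
print(m.get("b", 0))           # 0
```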
kawochen commented on Jan 15, 2016
@MaximilianR nope it doesn't have items
edit: OK actually I don't know why.
kawochen commented on Jan 15, 2016
I agree hasattr isn't pretty. We can think about defining a few custom abstract base classes for pandas using __subclasshook__ to replace those and functions like is_dict_like. I don't know about performance, but it's probably just an extra indirection.
max-sixty commented on Jan 15, 2016
@kawochen have a look at Mapping.__subclasshook__ - it can never return True:
So I think for these, you would need to inherit from Mapping. Still open to being wrong though
kawochen commented on Jan 15, 2016
oh you are right. It looks like you'd need to call collections.Mapping.register(DataFrame)
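A sketch of what registration does, using a hypothetical stand-in class Frame rather than a real DataFrame. Note that on current Python the ABCs live in collections.abc; the collections.Mapping alias was removed in Python 3.10.

```python
from collections.abc import Mapping


class Frame:
    """Hypothetical stand-in for a DataFrame-like class (not pandas)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


frame = Frame({"a": [1, 2]})
print(isinstance(frame, Mapping))  # False

# Register Frame as a virtual subclass of Mapping.
Mapping.register(Frame)
print(isinstance(frame, Mapping))  # True

# Registration does not verify the contract or add the mixin methods.
print(hasattr(frame, "items"))  # False
```

Registration only affects isinstance and issubclass results, so the registered class still has to provide keys, items, values and get itself.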
max-sixty commented on Jan 15, 2016
Or add that as one of the classes DataFrame inherits from. That's fairly canonical; there's an example on the collections page: https://docs.python.org/3/library/collections.abc.html (although the benefits described are slightly orthogonal)
kawochen commented on Jan 15, 2016
Hmm, DataFrame.values is not callable though.
max-sixty commented on Jan 15, 2016
That's true. It does break the contract a bit. I still think it's better than what exists now, albeit not perfect.
And .values -> .data in pandas 1.0??
mitar commented on Sep 20, 2017
I also think that pandas.DataFrame should register with abstract classes.
I think the following would work:
To my understanding it exposes everything necessary.
keijak commented on Apr 16, 2019
Making DataFrame a subclass of Mapping would be problematic due to the semantics of DataFrame.__len__.
len(df) returns the number of rows (= len(df.index)). It would need to return the number of columns to match the dict behavior.
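The mismatch can be illustrated without pandas. RowLenFrame is a hypothetical stand-in that mimics the behavior described here: iterating yields column labels, while len() counts rows.

```python
class RowLenFrame:
    """Hypothetical stand-in: iterates over column labels, len() counts rows."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        # Number of rows, taken from the first column (0 if there are none).
        first = next(iter(self._columns.values()), [])
        return len(first)


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
frame = RowLenFrame(data)

# For a dict, len() equals the number of keys it iterates over.
print(len(data), len(list(data)))    # 2 2

# For the stand-in, the two disagree, which breaks what Mapping users expect.
print(len(frame), len(list(frame)))  # 3 2
```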
jreback commented on Jan 1, 2020
I don't think this is actually possible, as we have some named properties, e.g. index, whose meaning conflicts with the .index() method on sequences.
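A sketch of that conflict, assuming a hypothetical class LabeledSeq (not a pandas class) that inherits from Sequence while storing its labels in an attribute named index:

```python
from collections.abc import Sequence


class LabeledSeq(Sequence):
    """Hypothetical stand-in with an `index` attribute holding labels."""

    def __init__(self, values, labels):
        self._values = list(values)
        # This attribute shadows the Sequence.index() mixin method.
        self.index = list(labels)

    def __getitem__(self, position):
        return self._values[position]

    def __len__(self):
        return len(self._values)


plain = [10, 20, 30]
print(plain.index(20))  # 1

s = LabeledSeq([10, 20, 30], ["x", "y", "z"])
print(isinstance(s, Sequence))  # True
try:
    s.index(20)  # Sequence callers expect a method here
except TypeError as exc:
    print("TypeError:", exc)
```

Code that receives s and relies on the Sequence contract would fail at the s.index(20) call, even though the isinstance check passes.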