Skip to content

Inherit pandas classes from abc.collections classes? #12056

@max-sixty

Description

@max-sixty
Contributor

What do people think about inheriting pandas classes from abc.collections classes?

I think the main ones are DataFrame & Series inheriting from Mapping and Index from Sequence.

This would enable other libraries to infer classes' behavior in a tighter way, for example to work out whether an object is list-like but not dict-like, they can check the types, rather than hasattr(obj, '__iter__') and not hasattr(obj, 'values'). And it doesn't cost anything.

But as far as I'm aware, other libraries in the pydata ecosystem don't do this, so this would be out of the norm.

https://docs.python.org/3/library/collections.abc.html

Activity

kawochen

kawochen commented on Jan 15, 2016

@kawochen
Contributor

I think you have misunderstood abc. These classes have special __subclasshook__ so that you don't have to inherit from them to be a subclass of them.
e.g.

In [172]: issubclass(pandas.Series, collections.Container)
Out[172]: True

So perhaps we can think about whether we want to make pandas objects satisfy the necessary contracts.

max-sixty

max-sixty commented on Jan 15, 2016

@max-sixty
ContributorAuthor

Cheers @kawochen. It looks like Mapping doesn't implement that (its __subclasshook__ falls back to Sized).
I'm not sure why that is. It could be because it has some Mixin methods on top of its abstract methods? Whatever the reason, I'm fairly confident DataFrame satisfies the contracts

In [21]: df = pd.DataFrame()

In [22]: isinstance(df, collections.Mapping)
Out[22]: False

In [23]: isinstance(df, collections.Sized)
Out[23]: True
kawochen

kawochen commented on Jan 15, 2016

@kawochen
Contributor

@MaximilianR nope it doesn't have items
edit: OK actually I don't know why.

kawochen

kawochen commented on Jan 15, 2016

@kawochen
Contributor

I agree hasattr isn't pretty. We can think about defining a few custom abstract base classes for pandas using __subclasshook__ to replace those and functions like is_dict_like. I don't know about performance, but it's probably just an extra indirection.

max-sixty

max-sixty commented on Jan 15, 2016

@max-sixty
ContributorAuthor

@kawochen have a look at the Mapping.__subclasshook__ - it can never return True:

In [27]: collections.Mapping.__subclasshook__??
Signature: collections.Mapping.__subclasshook__(C)
Source:
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented
File:      /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py
Type:      method

So I think for these, you would need to inherit from Mapping. Still open to being wrong though

kawochen

kawochen commented on Jan 15, 2016

@kawochen
Contributor

oh you are right. It looks like you'd need to call collections.Mapping.register(DataFrame)

max-sixty

max-sixty commented on Jan 15, 2016

@max-sixty
ContributorAuthor

Or add that as one of the classes DataFrame inherits from. That's fairly canonical, there's an example on the collections page: https://docs.python.org/3/library/collections.abc.html (although the benefits described are slightly orthogonal)

kawochen

kawochen commented on Jan 15, 2016

@kawochen
Contributor

Hmm DataFrame.values is not callable though.

max-sixty

max-sixty commented on Jan 15, 2016

@max-sixty
ContributorAuthor

That's true. It does break the contract a bit. I still think it's better than what exists now, albeit not perfect.

And .values -> .data in pandas 1.0??

mitar

mitar commented on Sep 20, 2017

@mitar
Contributor

I also think that pandas.DataFrame should register with abstract classes.

I think the following would work:

Sequence.register(pandas.DataFrame)

To my understanding it exposes everything necessary.

keijak

keijak commented on Apr 16, 2019

@keijak

Making DataFrame a subclass of Mapping would be problematic due to the semantics of DataFrame__len__.
len(df) returns the number of rows (=len(df.index)). It needs to be the number of columns to match the dict behavior.

added
InternalsRelated to non-user accessible pandas implementation
RefactorInternal refactoring of code
on May 27, 2019
added this to the No action milestone on Jan 1, 2020
jreback

jreback commented on Jan 1, 2020

@jreback
Contributor

I don't think this is actually possible as we have some named properties e.g. index which have confliciing meaning with .index() iterables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    InternalsRelated to non-user accessible pandas implementationRefactorInternal refactoring of code

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @mitar@jreback@max-sixty@keijak@mroeschke

        Issue actions

          Inherit pandas classes from abc.collections classes? · Issue #12056 · pandas-dev/pandas