Feature request: df that only return dfs when indexing. #3237

@twiecki
Contributor

With more complex dataframes, I often stumble over this:

In [1]: x = pandas.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
In [2]: x.ix[0].a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-8358f19a1e70> in <module>()
----> 1 x.ix[0].a

AttributeError: 'Series' object has no attribute 'a'

In [3]: x.a.ix[0]
Out[3]: 1

This is just one example. At other times I end up converting the Series back to a DataFrame because that's what the rest of the code expects.

I know that there have been attempts to solve this issue by adding attribute lookup to Series (e.g. #1904) but they seem to come with a performance penalty.

Often, however, I care more about expressiveness than performance. I thus propose the addition of an option like:

x = DataFrame(data, slicing_returns_df=True)

This would cause x.ix[0] or x.a to return a DataFrame rather than a Series, which would make the example above work. The default would be False, so there would be no backward-compatibility issues. Alternatively, there could be a new class that inherits from DataFrame and has the desired behavior.

I'm happy to give this a crack. However, I first wanted to make sure that I'm not the only one who thinks this would be a good idea, and that it isn't unworkable for some obvious reason X.
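
To make the proposal concrete, here is a minimal sketch of the behavior written as a plain helper function rather than a constructor option. The name `rows_as_frame` is hypothetical and not part of pandas; the sketch assumes a pandas version that provides `.loc`, and simply wraps a scalar row label in a list so that the selection stays two-dimensional.

```python
import pandas as pd


def rows_as_frame(df, labels):
    """Select rows by label and always return a DataFrame.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # A scalar label would make .loc return a Series, so wrap it in a list.
    if pd.api.types.is_scalar(labels):
        labels = [labels]
    return df.loc[list(labels)]


x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

row = rows_as_frame(x, 0)        # one row, still a DataFrame
rows = rows_as_frame(x, [0, 1])  # two rows

print(type(row).__name__)
print(row.a.iloc[0])             # attribute access works on the result
```

Because the result is always a DataFrame, code that expects column attributes such as `.a` keeps working whether one row or several were selected.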

jreback commented on Apr 2, 2013

@jreback
Contributor

Are you looking for something like this? (This is 0.11-dev; the equivalent in 0.10.1 is x.get_value(0,'a').)

In [6]: x.at[0,'a']
Out[6]: 1

jreback commented on Apr 2, 2013

And if you ALWAYS want a DataFrame back (the above ALWAYS returns a scalar), wrap the labels in lists:

In [12]: x.loc[[0],['a']]
Out[12]: 
   a
0  1
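
The difference between the accessors shown so far can be summarized in a short sketch. It reuses the `x` frame from the original post and assumes a pandas version that provides `.at` and `.loc`; the variable names are illustrative only.

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

scalar = x.at[0, 'a']      # fast scalar access: a single value
series = x.loc[0]          # scalar row label: the row comes back as a Series
frame = x.loc[[0], ['a']]  # list-wrapped labels: always a DataFrame

print(type(series).__name__, type(frame).__name__)
print(frame.shape)
```

The rule of thumb this illustrates: a scalar label drops a dimension, while a list of labels (even a one-element list) preserves it.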

twiecki commented on Apr 2, 2013

@twiecki
Contributor, Author

.get_value() does not seem to support multiple indices, e.g. x.get_value([0,1], 'a').

However, x.loc seems to do exactly what I need. Thanks!

jreback commented on Apr 2, 2013

@twiecki great!

Yes, by definition at (and get_value) are for fast scalar access; see: http://pandas.pydata.org/pandas-docs/dev/indexing.html#fast-scalar-value-getting-and-setting

loc (and ix), by contrast, are more flexible and label-based, and carry a small performance penalty because they have to figure out what you are after.
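
For anyone who wants to measure that penalty on their own machine, a rough sketch using the standard-library `timeit` module follows. The iteration count `n` is an arbitrary choice, and the timings will vary by pandas version and hardware, so no numbers are quoted here.

```python
import timeit

import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Both accessors return the same scalar; only the lookup path differs.
same_value = x.at[0, 'a'] == x.loc[0, 'a']

n = 10_000  # arbitrary number of repetitions
t_at = timeit.timeit(lambda: x.at[0, 'a'], number=n)
t_loc = timeit.timeit(lambda: x.loc[0, 'a'], number=n)

print(f"at:  {t_at / n * 1e6:.2f} us per call")
print(f"loc: {t_loc / n * 1e6:.2f} us per call")
```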

twiecki commented on May 13, 2013

I tried this again just now, but df.loc[0,:] returns a Series again. Was this changed by any chance? This is with current master.

jreback commented on May 13, 2013

You asked for a Series, so this will always return a Series. Enclose the row labels in a list ([]) to always get a frame.

Also, FYI: the 0 here is a label, not a location:

In [1]: df = DataFrame(randn(5,2),columns=list('AB'))

In [2]: df.loc[0,:]
Out[2]: 
A   -1.080064
B   -0.499382
Name: 0, dtype: float64

In [3]: df.loc[[0],:]
Out[3]: 
          A         B
0 -1.080064 -0.499382
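
The label-versus-location point is easier to see when the index is not the default 0..n-1. The sketch below uses a small hand-made frame with fixed values (an assumption for illustration, not the random data above) and contrasts `.loc` with the position-based `.iloc`.

```python
import pandas as pd

# A shuffled integer index makes labels and positions disagree.
df = pd.DataFrame(
    {'A': [10.0, 20.0, 30.0], 'B': [1.0, 2.0, 3.0]},
    index=[2, 0, 1],
)

by_label = df.loc[[0], :]      # the row whose label is 0 (second row here)
by_position = df.iloc[[0], :]  # the first row, whatever its label is

print(by_label)
print(by_position)
```

Both selections use a one-element list, so both come back as DataFrames; they differ only in which row is picked.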

twiecki commented on May 13, 2013

Perfect. Thanks!
