By-keyword: inconsistencies between the new df.plot.hist and its df counterpart #11483

Open
@Twizzledrizzle

Description

Found this while working on #11441.

import pandas as pd
d = {'one' : ['A', 'A', 'B', 'B', 'C'],
     'two' : [4., 3., 2., 2, 1],
     'three' : [10., 8., 3, 5, 7.]}     
df = pd.DataFrame(d)

# this works
df.hist('two', by='one', bins=range(0, 10))

# this does not work (everything ends up in one plot), and there is no way to specify the column
df.plot.hist(by='one', bins=range(0, 10))

My idea was to make df.plot.hist behave like df.hist, but its code is much more complex. Would it not be best to have df.plot.hist delegate to df.hist, instead of maintaining two separate implementations?

Oh, and the by keyword does not seem to work for df.plot.box either; I have not found any df.plot method it works for. At least not the way I expected it to work :)
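For readers who want one histogram per group through the df.plot interface today, a minimal workaround sketch follows. It assumes looping over groupby groups and passing each subplot's ax explicitly is acceptable; this is not how df.hist implements by internally, only an illustration of the output being asked for.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "one": ["A", "A", "B", "B", "C"],
    "two": [4.0, 3.0, 2.0, 2.0, 1.0],
    "three": [10.0, 8.0, 3.0, 5.0, 7.0],
})

bins = list(range(0, 10))
groups = list(df.groupby("one"))  # [(group key, sub-frame), ...], sorted by key

# One subplot per group; squeeze=False keeps `axes` an array even for one group.
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True, squeeze=False)
axes = axes.ravel()

for ax, (name, group) in zip(axes, groups):
    # Series.plot.hist draws onto the axes we hand it instead of a new figure.
    group["two"].plot.hist(ax=ax, bins=bins)
    ax.set_title(str(name))
```

Each group lands in its own subplot, which is the layout df.hist('two', by='one') produces.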

Activity

sinhrks (Member) commented on Oct 31, 2015

Related to #8018 (internally it splits data to groups).

by behaves differently in df.hist (subplots) and df.boxplot (grouping in a single ax). Thus, I don't think porting this behavior to plot as-is is a good idea. We should first decide how by should work.
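The difference can be seen by counting the axes each call produces. The snippet below is a small sketch using the example frame from the issue description; the variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this in an interactive session
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one": ["A", "A", "B", "B", "C"],
    "two": [4.0, 3.0, 2.0, 2.0, 1.0],
    "three": [10.0, 8.0, 3.0, 5.0, 7.0],
})

# df.hist with by: one subplot per group. The returned grid may contain
# hidden filler axes, so count only the visible ones.
hist_axes = df.hist(column="two", by="one", bins=list(range(0, 10)))
visible_hist_axes = [ax for ax in np.ravel(hist_axes) if ax.get_visible()]

# df.boxplot with by: the groups are drawn side by side inside a single
# axes for the requested column.
box_axes = np.ravel(df.boxplot(column="two", by="one"))

print(len(visible_hist_axes), "histogram axes,", len(box_axes), "boxplot axes")
```

So the same keyword means "split into subplots" in one method and "group within one plot" in the other, which is the ambiguity to settle before adding by to df.plot.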

Twizzledrizzle (Author) commented on Oct 31, 2015

Oh! I missed your work completely when looking through the pull requests. It looks really nice.

I did not know about groupby().hist or groupby().plot.hist. I guess I would expect the by keyword to give the same results as those.

Also, can you take a look at my pull request: #11441

I am trying to get a better implementation of the weights keyword, one that also works when the data and the weights have NaNs in different positions. But if this could be integrated into your solution I would be sooo happy.

For example

df.plot.hist(column='two', by='one', weights='three')
# or
df.groupby('one').plot.hist(column='two', weights='three')

and when plotting multiple columns, perhaps like below:

df.plot.hist(column=['data1', 'data2'], by='one', weights=['weights1', 'weights2'])
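
The calls above are a proposed API, not something pandas is confirmed to support. As a rough sketch of the numbers such a call would plot, the weighted counts per group can be computed with np.histogram. The helper name weighted_hist_by and its signature are hypothetical, and dropping a row when either its value or its weight is NaN is an assumption about the desired NaN handling.

```python
import numpy as np
import pandas as pd


def weighted_hist_by(df, column, by, weights, bins):
    """Hypothetical helper: weighted histogram counts of `column` per `by` group."""
    edges = np.asarray(list(bins), dtype=float)
    counts = {}
    for name, group in df.groupby(by):
        # Drop rows where either the value or its weight is NaN, so the two
        # arrays handed to np.histogram stay aligned row for row.
        valid = group[[column, weights]].dropna()
        counts[name], _ = np.histogram(
            valid[column], bins=edges, weights=valid[weights]
        )
    # One row per bin (labelled by its left edge), one column per group.
    return pd.DataFrame(counts, index=pd.Index(edges[:-1], name="bin_left"))


df = pd.DataFrame({
    "one": ["A", "A", "B", "B", "C"],
    "two": [4.0, 3.0, 2.0, 2.0, 1.0],
    "three": [10.0, 8.0, 3.0, 5.0, 7.0],
})

result = weighted_hist_by(df, column="two", by="one", weights="three", bins=range(0, 10))
print(result)
```

The resulting frame could then be drawn with result.plot.bar(subplots=True) for one panel per group, or without subplots to overlay the groups in one axes.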

Twizzledrizzle (Author) commented on Oct 31, 2015

And should they be plotted in the same graph unless your new keyword subplots=True is used?

Twizzledrizzle (Author) commented on Nov 1, 2015

@sinhrks I tried to pull your changes into my own dev environment to test out various things with weights, but alas I failed :(

I did not find your groupby repository. Can you publish it again? It would be really fun to try out your great-looking additions, in the hope that I can contribute a little back.
