Using 'by' and 'weights' together with DataFrame.hist() results in ValueError: weights should have the same shape as x #9540

Open
@awhan

I wanted to produce a grouped histogram such that the heights of the bars add up to 1. The following code raises ValueError: weights should have the same shape as x

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

n = 100
df = pd.DataFrame(np.random.randn(n), columns=['a'])
by = np.random.randint(1,5,n)
df.hist(by=by) # works
plt.show()
weights = np.repeat(1/len(df), len(df))
df.hist(weights = weights) # works
plt.show()
df.hist(by = by, weights = weights) # does not work: raises ValueError
plt.show()

In [15]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.18.6-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: None
numpy: 1.9.1
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 2.4.1
sphinx: None
patsy: 0.3.0
dateutil: 2.4.0
pytz: 2014.10
bottleneck: 1.0.0
tables: None
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: 2.5.6
sqlalchemy: None
pymysql: None
psycopg2: None

Activity

mgdadv commented on Feb 27, 2015

Could you clarify a bit more what you are trying to achieve?

The by argument splits the original data into groups. df.hist() then calls the matplotlib histogram function once per group, passing the original, full-length weights each time. In your case the group sizes are random and differ from one another, whereas the weights array always has length 100.
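
The mismatch can be seen without plotting anything. The following is a minimal sketch, not the pandas internals: it reuses the data from the report, prints nothing, and uses np.histogram as a stand-in for the per-group matplotlib call (an assumption made for illustration; both reject weights whose shape differs from the data). The names group_sizes, first_group and mismatch_error are introduced here and do not appear in the original report.

```python
import numpy as np
import pandas as pd

np.random.seed(123)

n = 100
df = pd.DataFrame(np.random.randn(n), columns=["a"])
by = np.random.randint(1, 5, n)
weights = np.repeat(1 / len(df), len(df))

# Each group holds only part of the 100 rows, so its length differs from len(weights).
group_sizes = df.groupby(by).size()

# Take the data of one group and pass it together with the full-length weights.
# np.histogram is used here only as a stand-in for the per-group plotting call.
first_group = df["a"][by == group_sizes.index[0]].to_numpy()
try:
    np.histogram(first_group, weights=weights)
    mismatch_error = None
except ValueError as exc:
    # The shapes differ, so the shape check on weights fails.
    mismatch_error = exc
```

Inspecting group_sizes shows the per-group lengths, and mismatch_error holds the exception raised by the shape check.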

The by and weights combination seems to work if all groups have the same size and the weights array has that same length, as in this example:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

n = 100
df = pd.DataFrame(np.random.randn(n), columns=['a'])

by = np.repeat([1,2,3,4,5], 20)
weights = np.repeat(1/20., 20)
df.hist(by = by, weights = weights)
plt.show()
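
For groups of unequal size, one possible workaround is to bypass df.hist(by=...) and slice the weights with the same mask as the data. This is only a sketch under the assumption that the bar heights should sum to 1 within each group, as in the equal-size example above. per_group_weights and hist_by_normalized are hypothetical helper names, not part of pandas or matplotlib.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def per_group_weights(by):
    """Return one weight per row so that the weights within each group sum to 1."""
    by = pd.Series(np.asarray(by))
    # Map every row to the size of the group it belongs to.
    counts = by.map(by.value_counts())
    return (1.0 / counts).to_numpy()


def hist_by_normalized(df, column, by, bins=10):
    """Hypothetical helper: draw one histogram per group with heights summing to 1."""
    by = np.asarray(by)
    weights = per_group_weights(by)
    labels = np.unique(by)
    fig, axes = plt.subplots(1, len(labels), figsize=(3 * len(labels), 3), squeeze=False)
    for ax, label in zip(axes.ravel(), labels):
        mask = by == label
        # Slice data and weights with the same mask so their shapes always match.
        ax.hist(df.loc[mask, column].to_numpy(), bins=bins, weights=weights[mask])
        ax.set_title(str(label))
    return fig, axes
```

With the data from the original report, calling hist_by_normalized(df, 'a', by) followed by plt.show() should display one panel per group.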

awhan (Author) commented on Mar 1, 2015

Thanks @mgdadv for the reply. Yes, you understood exactly what I wanted to achieve, and I did guess that the weights and the data size within the groups probably did not match. If this is not a bug (as I thought it was), could it be a feature request then?

Twizzledrizzle commented on Sep 8, 2015

I think #11028 will fix this

MaxGhenis commented on Jan 27, 2020

#11028 became #11441, which was closed as stale. It'd be great to have.
