
Unexpected results for the mean of a DataFrame of ufloat from the uncertainties package. #14162

@bgatessucks

Description

Related to #6898.

I find it very convenient to use a DataFrame of ufloat from the uncertainties package. Each entry consists of (value, error) and could represent the result of Monte Carlo simulations or an experiment.

At present taking sums along both axes gives the expected result, but taking the mean does not.

import pandas as pd
import numpy as np
from uncertainties import unumpy

value = np.arange(12).reshape(3,4)
err = 0.01 * np.arange(12).reshape(3,4) + 0.005

data = unumpy.uarray(value, err)

df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['c1', 'c2', 'c3', 'c4'])

Examples:

print (df)
               c1             c2             c3             c4
r1  0.000+/-0.005  1.000+/-0.015  2.000+/-0.025  3.000+/-0.035
r2    4.00+/-0.04    5.00+/-0.06    6.00+/-0.07    7.00+/-0.08
r3    8.00+/-0.09    9.00+/-0.10   10.00+/-0.11   11.00+/-0.12

df.sum(axis=0) # This works

c1    12.00+/-0.10
c2    15.00+/-0.11
c3    18.00+/-0.13
c4    21.00+/-0.14
dtype: object

df.sum(axis=1) # This works

r1     6.00+/-0.05
r2    22.00+/-0.12
r3    38.00+/-0.20
dtype: object

df.mean(axis=0) # This does not work

Series([], dtype: float64)

Expected (`df.apply(lambda x: x.sum() / x.size)`)

c1    4.000+/-0.032
c2      5.00+/-0.04
c3      6.00+/-0.04
c4      7.00+/-0.05
dtype: object

df.mean(axis=1) # This does not work

r1   NaN
r2   NaN
r3   NaN
dtype: float64

Expected (`df.T.apply(lambda x: x.sum() / x.size)`)

r1    1.500+/-0.011
r2    5.500+/-0.031
r3      9.50+/-0.05
dtype: object
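
The expected uncertainties above can be checked by hand. This is only a sketch, assuming the entries are independent (uncorrelated), which is how `unumpy.uarray` builds them: the error of a sum is then the square root of the summed squared errors, and the error of the mean is that value divided by the number of entries. The helper name `mean_with_error` is made up for this illustration and uses only the standard library.

```python
import math

# Same nominal values and standard deviations as the DataFrame above.
values = [[r * 4 + c for c in range(4)] for r in range(3)]
errors = [[0.01 * v + 0.005 for v in row] for row in values]


def mean_with_error(vals, errs):
    """Mean of independent values; its error is sqrt(sum(err**2)) / n."""
    n = len(vals)
    return sum(vals) / n, math.sqrt(sum(e * e for e in errs)) / n


# axis=0: one (mean, error) pair per column c1..c4
col_means = [
    mean_with_error([row[c] for row in values], [row[c] for row in errors])
    for c in range(4)
]

# axis=1: one (mean, error) pair per row r1..r3
row_means = [mean_with_error(v, e) for v, e in zip(values, errors)]

for name, (m, e) in zip(["c1", "c2", "c3", "c4"], col_means):
    print(f"{name}  {m:.3f} +/- {e:.3f}")
for name, (m, e) in zip(["r1", "r2", "r3"], row_means):
    print(f"{name}  {m:.3f} +/- {e:.3f}")
```

For `c1` this gives 4.000 with an error of about 0.032, and for `r1` it gives 1.500 with an error of about 0.011, matching the expected output listed above.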

Activity

jreback

jreback commented on Sep 6, 2016

@jreback
Contributor

This is very much like #13446. Since pandas doesn't know that an uncertainty is numeric, it cannot deal with it, similar to Decimal.

Without a custom dtype, or special support baked into object dtypes, this is not supported.

If someone wanted to contribute this functionality then that would be great. Conceptually this is very easy, but there are lots of implementation details.

added this to the Someday milestone on Sep 6, 2016
lebigot

lebigot commented on Sep 6, 2016

@lebigot
Contributor

@jreback Do I understand correctly that there is nothing that the uncertainties module can do to solve this issue?

jreback

jreback commented on Sep 6, 2016

@jreback
Contributor

I have no idea. If you want to dig in and see, that would be great.

shoyer

shoyer commented on Sep 6, 2016

@shoyer
Member

A useful first step would be to see if you can reproduce the issue with numpy alone (not using pandas).

bgatessucks

bgatessucks commented on Sep 6, 2016

@bgatessucks
Author

@shoyer No issue with numpy alone:

import pandas as pd
import numpy as np
from uncertainties import unumpy

value = np.arange(12).reshape(3,4)
err = 0.01 * np.arange(12).reshape(3,4) + 0.005

data = unumpy.uarray(value, err)

df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['c1', 'c2', 'c3', 'c4'])

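# pandas workaround vs. plain NumPy, reducing down the columns (axis=0)
print(df.apply(lambda x: x.sum() / x.size).values, "\n")

print(data.mean(axis=0), "\n")

# pandas workaround vs. plain NumPy, reducing across the rows (axis=1)
print(df.T.apply(lambda x: x.sum() / x.size).values, "\n")

print(data.mean(axis=1))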
shoyer

shoyer commented on Sep 6, 2016

@shoyer
Member

@bgatessucks what is the type/dtype of unumpy.uarray? Is it a numpy array with dtype=object?

bgatessucks

bgatessucks commented on Sep 6, 2016

@bgatessucks
Author

@shoyer

type(data) is <type 'numpy.ndarray'>.

shoyer

shoyer commented on Sep 6, 2016

@shoyer
Member

And data.dtype?

shoyer

shoyer commented on Sep 6, 2016

@shoyer
Member

I just wanted to be sure that you're not using subclassing or something else like that.

In any case, I think this is probably a pandas bug (but would need someone to work through/figure out). We should have a fallback implementation of mean (like NumPy's mean) that works on object arrays.
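
A minimal sketch of what such a fallback could look like, written in plain Python so it does not depend on pandas internals. The idea is to rely only on `+` and `/` being defined for the elements, which is all that object-like numerics such as `Fraction`, `Decimal`, or `ufloat` need to provide. The function name `object_mean` and its list-of-rows input are assumptions made for this illustration; this is not the pandas or NumPy implementation.

```python
from decimal import Decimal
from fractions import Fraction
from functools import reduce
import operator


def object_mean(rows, axis=0):
    """Mean of a rectangular list of rows using only `+` and `/` on elements.

    Hypothetical helper, not a pandas or NumPy function.
    """
    # axis=0 reduces down each column, axis=1 reduces across each row.
    groups = zip(*rows) if axis == 0 else rows
    means = []
    for group in groups:
        group = list(group)
        if not group:
            raise ValueError("cannot take the mean of an empty group")
        # Sum with the elements' own __add__, then divide by the count.
        means.append(reduce(operator.add, group) / len(group))
    return means


table = [[Fraction(1, 2), Fraction(3, 2)],
         [Fraction(5, 2), Fraction(7, 2)]]

col_means = object_mean(table, axis=0)
row_means = object_mean(table, axis=1)
decimal_mean = object_mean([[Decimal("0.1"), Decimal("0.2")]], axis=1)

print(col_means)
print(row_means)
print(decimal_mean)
```

Because nothing is cast to float, the results keep the element type: the `Fraction` inputs yield `Fraction` means and the `Decimal` inputs yield a `Decimal` mean.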

7 remaining items

added the labels Numeric Operations (Arithmetic, Comparison, and Logical operations) and Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) on Sep 20, 2020
added the label ExtensionArray (Extending pandas with custom dtypes or arrays) and removed the labels Dtype Conversions (Unexpected or buggy dtype conversions) and Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) on May 1, 2021
removed this from the Someday milestone on Oct 13, 2022
MichaelTiemannOSC

MichaelTiemannOSC commented on Oct 17, 2022

@MichaelTiemannOSC
Contributor

Was this removed from the Someday milestone because it's more definitive than that now? I've just done a bunch of work to make uncertainties work with Pint and Pint-Pandas, and am seeing that some work needs to be done in Pandas as well. Just taking the temperature on how open that door might be.

hgrecco/pint#1615
hgrecco/pint-pandas#140

jbrockmendel

jbrockmendel commented on Apr 22, 2023

@jbrockmendel
Member

> Was this removed from the Someday milestone because it's more definitive than that now?

We stopped using the "Someday" label entirely.

I'm getting the same behavior on main as in the OP. Looks like the data is an object-type np.ndarray. As jreback said in 2016, this would need some special handling (probably in core.nanops). A PR would be welcome.

Something like pint-pandas would probably be a better user experience than an object-dtype.

jbrockmendel

jbrockmendel commented on Jun 13, 2023

@jbrockmendel
Member

@topper-123 this might be closed by your reduce_wrap PR?

topper-123

topper-123 commented on Jun 19, 2023

@topper-123
Contributor

Sorry for the slow reply, I had a big project before going on a family vacation (which will last until the end of this week). But yes, #52788 will allow extension arrays like pint-pandas to use _reduce_wrap to control the dtype of reduction results.

mroeschke

mroeschke commented on Jul 13, 2023

@mroeschke
Member

Closed by #52788


Participants: @lebigot, @rth, @marcus-r-kelly, @jreback, @shoyer
