Closed
Labels: Enhancement, ExtensionArray (Extending pandas with custom dtypes or arrays), Numeric Operations (Arithmetic, Comparison, and Logical operations), Reduction Operations (sum, mean, min, max, etc.)
Description
Related to #6898.
I find it very convenient to use a DataFrame of `ufloat` from the uncertainties package. Each entry consists of (value, error) and could represent the result of Monte Carlo simulations or an experiment.
At present, taking sums along both axes gives the expected result, but taking the mean does not.
```python
import pandas as pd
import numpy as np
from uncertainties import unumpy

value = np.arange(12).reshape(3, 4)
err = 0.01 * np.arange(12).reshape(3, 4) + 0.005
data = unumpy.uarray(value, err)
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['c1', 'c2', 'c3', 'c4'])
```
Examples:

```python
>>> print(df)
               c1             c2             c3             c4
r1  0.000+/-0.005  1.000+/-0.015  2.000+/-0.025  3.000+/-0.035
r2    4.00+/-0.04    5.00+/-0.06    6.00+/-0.07    7.00+/-0.08
r3    8.00+/-0.09    9.00+/-0.10   10.00+/-0.11   11.00+/-0.12

>>> df.sum(axis=0)  # This works
c1    12.00+/-0.10
c2    15.00+/-0.11
c3    18.00+/-0.13
c4    21.00+/-0.14
dtype: object

>>> df.sum(axis=1)  # This works
r1     6.00+/-0.05
r2    22.00+/-0.12
r3    38.00+/-0.20
dtype: object

>>> df.mean(axis=0)  # This does not work
Series([], dtype: float64)
```

Expected (`df.apply(lambda x: x.sum() / x.size)`):

```python
c1    4.000+/-0.032
c2      5.00+/-0.04
c3      6.00+/-0.04
c4      7.00+/-0.05
dtype: object
```

```python
>>> df.mean(axis=1)  # This does not work
r1   NaN
r2   NaN
r3   NaN
dtype: float64
```

Expected (`df.T.apply(lambda x: x.sum() / x.size)`):

```python
r1    1.500+/-0.011
r2    5.500+/-0.031
r3      9.50+/-0.05
dtype: object
```
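The apply-based workaround generalizes to any object dtype whose elements support `+` and `/`. A minimal sketch, using `fractions.Fraction` as a hypothetical stand-in for `ufloat` so it runs without the uncertainties package:

```python
import numpy as np
import pandas as pd
from fractions import Fraction

# Object-dtype frame; Fraction is a stand-in for ufloat -- both are
# plain Python objects that support + and /.
data = np.array([[Fraction(r * 4 + c, 2) for c in range(4)]
                 for r in range(3)], dtype=object)
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'],
                  columns=['c1', 'c2', 'c3', 'c4'])

# Column-wise and row-wise means via the apply workaround.
col_mean = df.apply(lambda x: x.sum() / x.size)    # mean of each column
row_mean = df.T.apply(lambda x: x.sum() / x.size)  # mean of each row
print(col_mean['c1'])  # 2 (i.e. Fraction(2, 1))
print(row_mean['r1'])  # 3/4
```

Because the division happens at the Python level, the element type (here `Fraction`, in the issue `ufloat`) is preserved in the result.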
jreback commented on Sep 6, 2016
This is very much like #13446. Since pandas doesn't know that an uncertainty is numeric, it cannot deal with it, similar to `Decimal`. Without a custom dtype, or special support baked into object dtypes, this is not supported. If someone wanted to contribute this functionality, that would be great. Conceptually this is very easy, but there are lots of implementation details.
lebigot commented on Sep 6, 2016
@jreback Do I understand correctly that there is nothing that the uncertainties module can do to solve this issue?
jreback commented on Sep 6, 2016
I have no idea. If you want to dig in and see, that would be great.
shoyer commented on Sep 6, 2016
A useful first step would be to see if you can reproduce the issue with numpy alone (not using pandas).
bgatessucks commented on Sep 6, 2016
@shoyer No issue with numpy alone:
shoyer commented on Sep 6, 2016
@bgatessucks what is the type/dtype of `unumpy.uarray`? Is it a numpy array with `dtype=object`?
bgatessucks commented on Sep 6, 2016
@shoyer `type(data)` is `<type 'numpy.ndarray'>`.
shoyer commented on Sep 6, 2016
And `data.dtype`?
shoyer commented on Sep 6, 2016
I just wanted to be sure that you're not using subclassing or something else like that.
In any case, I think this is probably a pandas bug (but someone would need to work through and figure it out). We should have a fallback implementation of `mean` (like NumPy's mean) that works on object arrays.

7 remaining items
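For illustration, NumPy's `mean` on an object-dtype array falls back to Python-level arithmetic, so any element type supporting `+` and `/` works. A minimal sketch using `fractions.Fraction` as a hypothetical stand-in for `ufloat` (so it runs without the uncertainties package):

```python
import numpy as np
from fractions import Fraction

# Object-dtype array; Fraction stands in for ufloat here.
data = np.array([[Fraction(i, 2) for i in range(4)]
                 for _ in range(3)], dtype=object)

# NumPy reduces object arrays with Python-level __add__ and then
# divides by the element count, so mean works and keeps the type.
col_means = data.mean(axis=0)
print(list(col_means))  # [Fraction(0, 1), Fraction(1, 2), Fraction(1, 1), Fraction(3, 2)]
```

This is the behavior the thread suggests pandas could mirror as a fallback for object-dtype reductions.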
MichaelTiemannOSC commented on Oct 17, 2022
Was this removed from the Someday milestone because it's more definitive than that now? I've just done a bunch of work to make uncertainties work with Pint and Pint-Pandas, and am seeing that some work needs to be done in pandas as well. Just taking the temperature on how open that door might be.
hgrecco/pint#1615
hgrecco/pint-pandas#140
jbrockmendel commented on Apr 22, 2023
We stopped using the "Someday" label entirely.
I'm getting the same behavior on main as in the OP. Looks like the data is an object-type np.ndarray. As jreback said in 2016, this would need some special handling (probably in core.nanops). A PR would be welcome.
Something like pint-pandas would probably be a better user experience than an object-dtype DataFrame.
jbrockmendel commented on Jun 13, 2023
@topper-123 this might be closed by your reduce_wrap PR?
topper-123 commented on Jun 19, 2023
Sorry for the slow reply, I had a big project before going on a family vacation (which will last until the end of this week). But yes, #52788 will allow extension arrays like pint-pandas to use `_reduce_wrap` to control the dtype of reduction results.
mroeschke commented on Jul 13, 2023
Closed by #52788