BUG: DataFrame[Sparse] quantile fails because SparseArray has no reshape  #24600

Description

@jbrockmendel
Member

I tried to simplify Block.quantile so that it only has to handle the 2D case, by having Series.quantile dispatch to the DataFrame implementation. That produced failures in test_quantile_sparse in pandas/tests/series/test_quantile.py:
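For readers unfamiliar with the idea, here is a minimal sketch of what "dispatching to the DataFrame implementation" could look like from the outside. The helper name `series_quantile_via_frame` is hypothetical and not part of pandas; it only uses the public `Series.to_frame` and `DataFrame.quantile` methods, and it is shown on a plain float Series rather than on the failing sparse case.

```python
import pandas as pd


def series_quantile_via_frame(ser, q=0.5):
    """Hypothetical helper: compute a Series quantile through DataFrame.quantile."""
    # Wrap the Series in a one-column frame so only the 2D code path runs.
    df = ser.to_frame()
    # numeric_only=False so the single column is never silently dropped.
    result = df.quantile(q, numeric_only=False)
    if isinstance(q, (list, tuple)):
        # List-like q gives a DataFrame indexed by q; take the only column.
        return result.iloc[:, 0]
    # Scalar q gives a Series indexed by column label; take the only entry.
    return result.iloc[0]


ser = pd.Series([0.0, None, 1.0, 2.0])
print(series_quantile_via_frame(ser, 0.5))
print(series_quantile_via_frame(ser, [0.5]))
```

The point of the sketch is only that any dtype which fails in the 2D path will then fail for Series as well, which is how the sparse test failure surfaced.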

import pandas as pd

ser = pd.Series([0., None, 1., 2.], dtype='Sparse[float]')
df = pd.DataFrame(ser)

>>> ser.quantile(0.5)
1.0
>>> ser.quantile([0.5])
0.5    1.0
dtype: float64
>>> df.quantile(0.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 432, in reduction
    axe, block = getattr(b, f)(axis=axis, axes=self.axes, **kwargs)
  File "pandas/core/internals/blocks.py", line 1530, in quantile
    result = _nanpercentile(values, qs * 100, axis=axis, **kw)
  File "pandas/core/internals/blocks.py", line 1484, in _nanpercentile
    mask = mask.reshape(values.shape)
AttributeError: 'SparseArray' object has no attribute 'reshape'
>>> df.quantile([0.5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 432, in reduction
    axe, block = getattr(b, f)(axis=axis, axes=self.axes, **kwargs)
  File "pandas/core/internals/blocks.py", line 1511, in quantile
    axis=axis, **kw)
  File "pandas/core/internals/blocks.py", line 1484, in _nanpercentile
    mask = mask.reshape(values.shape)
AttributeError: 'SparseArray' object has no attribute 'reshape'

datetime64[ns, tz] breaks in a slightly different way (presumably all ExtensionBlocks will fail):

dti = pd.date_range('2016-01-01', periods=3, tz='US/Pacific')

ser = pd.Series(dti)
df = pd.DataFrame(ser)

>>> df.quantile(0.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 473, in reduction
    values = _concat._concat_compat([b.values for b in blocks])
  File "pandas/core/dtypes/concat.py", line 174, in _concat_compat
    return np.concatenate(to_concat, axis=axis)
ValueError: need at least one array to concatenate
>>> df.quantile([0.5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 473, in reduction
    values = _concat._concat_compat([b.values for b in blocks])
  File "pandas/core/dtypes/concat.py", line 174, in _concat_compat
    return np.concatenate(to_concat, axis=axis)
ValueError: need at least one array to concatenate

xref #24583

Activity

jbrockmendel (Member, Author) commented on Jan 4, 2019

IntNA is also a catastrophe for quantile

TomAugspurger (Contributor) commented on Jan 4, 2019

Do you think this will need to be pushed down to the array for ExtensionArrays?

jbrockmendel (Member, Author) commented on Jan 4, 2019

> Do you think this will need to be pushed down to the array for ExtensionArrays?

quantile itself? Probably not. For SparseArray a patch is now in place that avoids the immediate problem. For IntNA it looks like the problem is in _try_coerce_result not handling things correctly. For DatetimeTZBlock the problem is in _concat._concat_compat. It's eclectic.

I think we'll want to define _try_coerce_result (and possibly _try_coerce_args, not sure) in terms of _holder._from_sequence (and possibly _holder._unbox_scalar or something resembling _scalar_from_string).
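As a rough illustration of the "compute on plain values, then rebuild through the array type" idea, the sketch below reduces a nullable integer array on a float ndarray and passes the result back through `_from_sequence`. The helper name `quantile_then_rebox` is hypothetical, and this is not the actual `_try_coerce_result` code; it assumes a masked array that supports `to_numpy(dtype=..., na_value=...)`.

```python
import numpy as np
import pandas as pd


def quantile_then_rebox(arr, q):
    """Hypothetical sketch: reduce on a float ndarray, then rebuild the extension array."""
    # Unbox to float64, representing missing values as NaN.
    values = arr.to_numpy(dtype="float64", na_value=np.nan)
    # NaN-aware percentile on the raw values; q is a fraction in [0, 1].
    raw = np.nanpercentile(values, np.atleast_1d(q) * 100)
    # Coerce the raw result back to the original extension dtype.
    return type(arr)._from_sequence(raw, dtype=arr.dtype)


arr = pd.array([1, None, 3], dtype="Int64")
print(quantile_then_rebox(arr, 0.5))
```

The example is chosen so that the median is integral. A non-integral result such as 1.5 cannot be stored in an Int64 array, which is one reason the coercion step needs care for integer-backed types.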

pglopezamaya commented on Apr 23, 2019

Any news on the patch for the "'SparseArray' object has no attribute 'reshape'" error?

mroeschke (Member) commented on Apr 5, 2020

These cases appear to work on master. Could use a test.

In [57]: ser = pd.Series([0., None, 1., 2.], dtype='Sparse[float]')
    ...: df = pd.DataFrame(ser)

In [58]: df.quantile(0.5)
Out[58]:
0    1.0
Name: 0.5, dtype: float64

In [59]: dti = pd.date_range('2016-01-01', periods=3, tz='US/Pacific')
    ...:
    ...: ser = pd.Series(dti)
    ...: df = pd.DataFrame(ser)

In [60]: df.quantile(0.5)
Out[60]: Series([], Name: 0.5, dtype: float64)

In [61]: pd.__version__
Out[61]: '1.1.0.dev0+1108.gcad602e16'
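A regression test for the sparse case might look roughly like the sketch below. The function name is hypothetical and the assertions are deliberately dtype-agnostic, since the exact result dtype may differ between pandas versions. The tz-aware case is left out because the output shown above is an empty Series, and what the expected value should be is a separate decision.

```python
import pandas as pd


def test_frame_quantile_sparse():
    # Reproducer from this issue (GH 24600).
    ser = pd.Series([0.0, None, 1.0, 2.0], dtype="Sparse[float]")
    df = pd.DataFrame(ser)

    # Scalar q: one entry per column, labelled by the column name (0).
    result = df.quantile(0.5)
    assert list(result.index) == [0]
    assert float(result.iloc[0]) == 1.0
    # The frame path should agree with the Series path.
    assert float(result.iloc[0]) == float(ser.quantile(0.5))

    # List-like q: one row per quantile, one column per input column.
    result_list = df.quantile([0.5])
    assert result_list.shape == (1, 1)
    assert float(result_list.iloc[0, 0]) == 1.0


test_frame_quantile_sparse()
```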
Labels changed on Apr 5, 2020: added Needs Tests (unit test(s) needed to prevent regressions); removed Numeric Operations (arithmetic, comparison, and logical operations) and Sparse (sparse data type).
Added to the 1.1 milestone on Jul 16, 2020.

Participants: @jreback, @TomAugspurger, @jbrockmendel, @mroeschke, @pglopezamaya
