MultiIndex.dropna() does not always drop NANs #19387

@jzwinck

Description

A MultiIndex "label" of -1 means NaN (though I have found no documentation of this). For example:

>>> pd.MultiIndex.from_arrays([['a', 'a'], ['x', np.nan]])
MultiIndex(levels=[['a'], ['x']],
           labels=[[0, 0], [0, -1]])

A MultiIndex can also be constructed with NaN values in the levels themselves:

>>> pd.MultiIndex(levels=[['a'], ['x', np.nan]], labels=[[0, 0], [0, 1]])
MultiIndex(levels=[['a'], ['x', nan]],
           labels=[[0, 0], [0, 1]])
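
To make the two encodings concrete, here is a minimal pure-Python sketch (not pandas code) that expands a levels/labels pair into row tuples. The helper name `decode_rows` is hypothetical, and the rule that -1 decodes to NaN is taken from the observation above rather than from any documentation. Both constructions decode to the same two rows, the second of which contains NaN.

```python
import math

nan = float("nan")

def decode_rows(levels, labels):
    """Expand (levels, labels) into one tuple per row.

    Assumption: a label of -1 stands for a missing value (NaN);
    any other label is a position in the corresponding level.
    """
    n_rows = len(labels[0])
    rows = []
    for i in range(n_rows):
        row = tuple(
            nan if lab[i] == -1 else lev[lab[i]]
            for lev, lab in zip(levels, labels)
        )
        rows.append(row)
    return rows

# Case 1: NaN encoded as label -1.
first = decode_rows([["a"], ["x"]], [[0, 0], [0, -1]])
# Case 2: NaN stored in the level and referenced by label 1.
second = decode_rows([["a"], ["x", nan]], [[0, 0], [0, 1]])
```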

MultiIndex.dropna() works for the first case, but does nothing for the second:

>>> pd.MultiIndex(levels=[['a'], ['x', np.nan]], labels=[[0,0], [0,1]]).dropna()
MultiIndex(levels=[['a'], ['x', nan]],
           labels=[[0, 0], [0, 1]])

It appears that MultiIndex.dropna() drops only rows whose label is -1, not rows whose label points to a level value that is itself NaN. It should drop both kinds of rows, so the result should be:

MultiIndex(levels=[['a'], ['x']],
           labels=[[0], [0]])
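
The requested behavior can be sketched in plain Python, independent of any pandas version. This is only an illustration of the rule proposed above, not the pandas implementation: `dropna_sketch` and `is_missing` are hypothetical names, and pruning unused level values afterwards is an assumption made to match the expected output shown.

```python
import math

def is_missing(value):
    """True for float NaN (np.nan is a plain Python float NaN)."""
    return isinstance(value, float) and math.isnan(value)

def dropna_sketch(levels, labels):
    """Drop rows that are missing under either encoding.

    A row is dropped if any of its labels is -1, or if any label
    points at a level value that is NaN.
    """
    n_rows = len(labels[0])
    keep = [
        i for i in range(n_rows)
        if not any(
            lab[i] == -1 or is_missing(lev[lab[i]])
            for lev, lab in zip(levels, labels)
        )
    ]
    new_levels, new_labels = [], []
    for lev, lab in zip(levels, labels):
        # Assumption: prune level values no surviving row refers to.
        used = sorted({lab[i] for i in keep})
        remap = {old: new for new, old in enumerate(used)}
        new_levels.append([lev[old] for old in used])
        new_labels.append([remap[lab[i]] for i in keep])
    return new_levels, new_labels

nan = float("nan")
by_label = dropna_sketch([["a"], ["x"]], [[0, 0], [0, -1]])
by_level = dropna_sketch([["a"], ["x", nan]], [[0, 0], [0, 1]])
```

Under these assumptions both calls return levels `[['a'], ['x']]` and labels `[[0], [0]]`, which is the result the report asks for.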

I am using Pandas 0.20.3, NumPy 1.13.1, and Python 3.5.

Activity

jreback commented on Jan 25, 2018

I believe this is fixed in 0.22 or master. Please give it a try.

howsiwei commented on May 15, 2019

@jreback I just checked and it's fixed in neither 0.22 nor master.

added this to the 0.25.0 milestone on May 26, 2019
Participants: @jreback, @jzwinck, @howsiwei, @jbrockmendel
