Skip to content

Inconsistent behavior of .map() #20495

@readyready15728

Description

@readyready15728
Contributor

Code Sample, a copy-pastable example if possible

pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})

Result:

0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

Compare:

pd.Series(list('abc') + [np.nan]).map({'a': 'foo', np.nan: 'bar'})

Result:

0    foo
1    NaN
2    NaN
3    bar
dtype: object

Problem description

If only np.nan is present in the mapping, then the associated mapping is not carried out. If another value is present, then it is carried out. Forgive me if this is intended behavior for some reason.

Expected Output

I'm not sure what is appropriate here.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 32 OS: Linux OS-release: 4.4.0-116-generic machine: i686 processor: i686 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

Activity

WillAyd

WillAyd commented on Mar 27, 2018

@WillAyd
Member

Strange but it looks like this works fine on master:

In [3]: pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})
Out[3]: 
0    NaN
1    NaN
2    NaN
3    bar
dtype: object

INSTALLED VERSIONS

commit: c36e9f7
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+685.gc36e9f705
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

jreback

jreback commented on Mar 27, 2018

@jreback
Contributor

yeah this lgtm. can one of you check if we have a test covering this case, if not pls add (advise either way).

added
Testingpandas testing functions or related to the test suite
Missing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
ReshapingConcat, Merge/Join, Stack/Unstack, Explode
on Mar 27, 2018
added this to the 0.23.0 milestone on Mar 27, 2018
WillAyd

WillAyd commented on Mar 29, 2018

@WillAyd
Member

I looked through our tests but couldn't find anything to really cover this. From what I can tell there are a few similar tests in pandas/tests/series/test_apply.py but none of them mix NA and non-NA values in the caller.

@readyready15728 any interest in trying your hand at a PR to add this scenario? I think the module above would be a good location unless @jreback disagrees

readyready15728

readyready15728 commented on Mar 29, 2018

@readyready15728
ContributorAuthor

@WillAyd

I barely know anything about pull requests. Could you be a little more specific about how I should proceed?

WillAyd

WillAyd commented on Mar 29, 2018

@WillAyd
Member

There is a pretty detailed guide for contributing to pandas - give that a look and let me know if you have any questions.

https://pandas.pydata.org/pandas-docs/stable/contributing.html

readyready15728

readyready15728 commented on Mar 29, 2018

@readyready15728
ContributorAuthor

OK so do you intend that I submit a pull request to the code that implements .map() itself or the test code? And what is the intended behavior of .map()? I'm not sure which is right.

WillAyd

WillAyd commented on Mar 29, 2018

@WillAyd
Member

You should be adding a test to pandas/tests/series/test_apply.py. You can probably place it right below functions like test_empty and call it something like test_missing_mixed.

The expected value would be your second example from your original post

readyready15728

readyready15728 commented on Mar 30, 2018

@readyready15728
ContributorAuthor

Alright, I'm virtually certain that my addition will work but I want to test it first. To do so, I need to get the tests to run. pytest pandas said that I need to do python setup.py build_ext --inplace --force. I did that but then I got more errors:

Exception: Cython-generated file 'pandas/_libs/reduction.c' not found.
                Cython is required to compile pandas from a development branch.
                Please install Cython or download a release package of pandas.

But I have Cython installed. :?

(Am running Ubuntu 16.04)

EDIT: I did some Googling and someone suggested using pip to get the latest version and now the Cython extensions are compiling.

readyready15728

readyready15728 commented on Mar 30, 2018

@readyready15728
ContributorAuthor

There is something else about the behavior of .map() on my previous version (0.22) that may interest you:

pd.Series([42] + [np.nan]).map({42: 'the answer', np.nan: 'not NaN'})

Has the result:

0    the answer
1           NaN
dtype: object

In other words, the intended effect doesn't happen when there's just an integer alongside np.nan in the map. With strings the intended effect obviously happens and also with tuples but I didn't test floats and now I have the development version installed so going back will just be a pain. Might want to review 0.22 to see if any other bugs are lurking in there.

I am almost ready to do a pull request but I'd like a little guidance about creating a whatsnew entry, if it is needed at all.

WillAyd

WillAyd commented on Apr 1, 2018

@WillAyd
Member

Sorry for the delay in response. Since you aren't changing any of the core code (just adding tests) I don't think you need a whatsnew entry, since there's nothing that you can link to here that would change behavior.

That said, there's already a map change in the v0.23 whatsnew (see #15081) . If you are feeling ambitious, you could take a look at that and see if it is responsible for the fix here. If so, we can update that existing note to mention the bug being fixed and link to this issue as well

readyready15728

readyready15728 commented on Apr 1, 2018

@readyready15728
ContributorAuthor

Frankly I wouldn't know how to do that. Additionally git diff upstream/master -u -- "*.py" | flake8 --diff gives fatal: bad revision 'upstream/master'.

WillAyd

WillAyd commented on Apr 1, 2018

@WillAyd
Member

Did you configure upstream to point to this repository? git remote show -v should yield at least the following:

git remote -v show
upstream	https://github.com/pandas-dev/pandas.git (fetch)
upstream	https://github.com/pandas-dev/pandas.git (push)
readyready15728

readyready15728 commented on Apr 2, 2018

@readyready15728
ContributorAuthor

No, it has only my forked repository. [dumb comment by me] I'm accustomed to working only on my own repos.

EDIT: OK never mind I see what is going on here.

WillAyd

WillAyd commented on Apr 2, 2018

@WillAyd
Member

Make sure you check out the contributing guide as it gives instructions on how to properly set your upstream repo. You will need this for comparisons and pulling updates to master

https://pandas.pydata.org/pandas-docs/stable/contributing.html

readyready15728

readyready15728 commented on Apr 2, 2018

@readyready15728
ContributorAuthor

I did git remote add upstream https://github.com/pandas-dev/pandas.git and I'm still getting the same error message. What the hell?

WillAyd

WillAyd commented on Apr 2, 2018

@WillAyd
Member

Did you git fetch upstream? I've never gotten that error before but it would make sense that the comp fails if a copy of the upstream repo doesn't exist locally

readyready15728

readyready15728 commented on Apr 2, 2018

@readyready15728
ContributorAuthor

Alright, it succeeded. While my test works I still might not meet all your expectations. A pull request has been submitted anyway. We can go back and forth until it's acceptable.

added a commit that references this issue on Apr 3, 2018
85c7900
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Missing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolateReshapingConcat, Merge/Join, Stack/Unstack, ExplodeTestingpandas testing functions or related to the test suite

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Participants

      @WillAyd@jreback@readyready15728

      Issue actions

        Inconsistent behavior of .map() · Issue #20495 · pandas-dev/pandas