-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
Description
Code Sample, a copy-pastable example if possible
pd.Series(list('abc') + [np.nan]).map({np.nan: 'bar'})
Result:
0 NaN
1 NaN
2 NaN
3 NaN
dtype: float64
Compare:
pd.Series(list('abc') + [np.nan]).map({'a': 'foo', np.nan: 'bar'})
Result:
0 foo
1 NaN
2 NaN
3 bar
dtype: object
Problem description
If only np.nan
is present in the mapping, then the associated mapping is not carried out. If another value is present, then it is carried out. Forgive me if this is intended behavior for some reason.
Expected Output
I'm not sure what is appropriate here.
Output of pd.show_versions()
pandas: 0.22.0
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0
Activity
WillAyd commentedon Mar 27, 2018
Strange but it looks like this works fine on master:
INSTALLED VERSIONS
commit: c36e9f7
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+685.gc36e9f705
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None
jreback commentedon Mar 27, 2018
yeah this lgtm. can one of you check if we have a test covering this case, if not pls add (advise either way).
WillAyd commentedon Mar 29, 2018
I looked through our tests but couldn't find anything to really cover this. From what I can tell there are a few similar tests in
pandas/tests/series/test_apply.py
but none of them mix NA and non-NA values in the caller.@readyready15728 any interest in trying your hand at a PR to add this scenario? I think the module above would be a good location unless @jreback disagrees
readyready15728 commentedon Mar 29, 2018
@WillAyd
I barely know anything about pull requests. Could you be a little more specific about how I should proceed?
WillAyd commentedon Mar 29, 2018
There is a pretty detailed guide for contributing to pandas - give that a look and let me know if you have any questions.
https://pandas.pydata.org/pandas-docs/stable/contributing.html
readyready15728 commentedon Mar 29, 2018
OK so do you intend that I submit a pull request to the code that implements .map() itself or the test code? And what is the intended behavior of .map()? I'm not sure which is right.
WillAyd commentedon Mar 29, 2018
You should be adding a test to
pandas/tests/series/test_apply.py
. You can probably place it right below functions liketest_empty
and call it something liketest_missing_mixed
.The expected value would be your second example from your original post
readyready15728 commentedon Mar 30, 2018
Alright, I'm virtually certain that my addition will work but I want to test it first. To do so, I need to get the tests to run.
pytest pandas
said that I need to dopython setup.py build_ext --inplace --force
. I did that but then I got more errors:But I have Cython installed. :?
(Am running Ubuntu 16.04)
EDIT: I did some Googling and someone suggested using pip to get the latest version and now the Cython extensions are compiling.
readyready15728 commentedon Mar 30, 2018
There is something else about the behavior of
.map()
on my previous version (0.22) that may interest you:Has the result:
In other words, the intended effect doesn't happen when there's just an integer alongside
np.nan
in the map. With strings the intended effect obviously happens and also with tuples but I didn't test floats and now I have the development version installed so going back will just be a pain. Might want to review 0.22 to see if any other bugs are lurking in there.I am almost ready to do a pull request but I'd like a little guidance about creating a whatsnew entry, if it is needed at all.
WillAyd commentedon Apr 1, 2018
Sorry for the delay in response. Since you aren't changing any of the core code (just adding tests) I don't think you need a whatsnew entry, since there's nothing that you can link to here that would change behavior.
That said, there's already a
map
change in the v0.23 whatsnew (see #15081) . If you are feeling ambitious, you could take a look at that and see if it is responsible for the fix here. If so, we can update that existing note to mention the bug being fixed and link to this issue as wellreadyready15728 commentedon Apr 1, 2018
Frankly I wouldn't know how to do that. Additionally
git diff upstream/master -u -- "*.py" | flake8 --diff
givesfatal: bad revision 'upstream/master'
.WillAyd commentedon Apr 1, 2018
Did you configure
upstream
to point to this repository?git remote show -v
should yield at least the following:readyready15728 commentedon Apr 2, 2018
No, it has only my forked repository. [dumb comment by me] I'm accustomed to working only on my own repos.
EDIT: OK never mind I see what is going on here.
WillAyd commentedon Apr 2, 2018
Make sure you check out the contributing guide as it gives instructions on how to properly set your upstream repo. You will need this for comparisons and pulling updates to master
https://pandas.pydata.org/pandas-docs/stable/contributing.html
readyready15728 commentedon Apr 2, 2018
I did
git remote add upstream https://github.com/pandas-dev/pandas.git
and I'm still getting the same error message. What the hell?WillAyd commentedon Apr 2, 2018
Did you
git fetch upstream
? I've never gotten that error before but it would make sense that the comp fails if a copy of the upstream repo doesn't exist locallyreadyready15728 commentedon Apr 2, 2018
Alright, it succeeded. While my test works I still might not meet all your expectations. A pull request has been submitted anyway. We can go back and forth until it's acceptable.
TST: Added test for mapping on mixed NA (#20574)