
ENH: Series Mapping na_action=ignore result is misleading. #47262

@chapmanderek

Is your feature request related to a problem?

More a misleading flag... When mapping a dictionary over a pandas Series with na_action='ignore', I would expect values that are not in the dictionary to be left alone. Currently they are replaced with NaN. For example:

pd.Series(['calf', 'foal', 'bunny']).map({'calf':'cow','bunny':'rabbit'}, na_action='ignore')

returns:

0       cow
1       NaN
2    rabbit

Describe the solution you'd like

I would expect it to ignore items that aren't in the dictionary instead of replacing them. I would expect this to be returned:

0       cow
1      foal
2    rabbit

The current behavior is problematic for large dataframes with many different elements in a series where perhaps you only want to rename some of them. In this case you would have to either do a replace for each one... or make a mapping for every one and hope you didn't miss one.

Describe alternatives you've considered

Differentiating the current behavior of ignore from ignoring-but-not-replacing will be difficult. Perhaps add a new na_action value, dont_replace, so that it is very apparent what you are asking to happen with NaNs, while keeping the current (albeit confusing) behavior of ignore.
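For reference, a keep-unmapped-values result is already reachable without a new na_action value. The sketch below relies on the documented Series.map behavior that a dict subclass defining __missing__ supplies its own default instead of NaN. The class name KeepUnmapped is hypothetical and introduced only for illustration; it is not part of pandas.

```python
import pandas as pd


class KeepUnmapped(dict):
    """Hypothetical helper (not a pandas API): unknown keys map to themselves."""

    def __missing__(self, key):
        # Called by dict lookup when `key` is absent; return the key unchanged.
        return key


ser = pd.Series(["calf", "foal", "bunny"])

# Series.map uses __missing__ as the default for values not in the mapping,
# so "foal" is kept rather than converted to NaN.
renamed = ser.map(KeepUnmapped({"calf": "cow", "bunny": "rabbit"}))
print(renamed.tolist())  # ['cow', 'foal', 'rabbit']
```

This keeps the mapping itself small, at the cost of a tiny wrapper class around the dictionary.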

Additional context

Similar-ish to #14210

Activity

rhshadrach commented on Jun 7, 2022

@rhshadrach
Member

With ser being the pandas Series:

ser = ser.map(mapper).fillna(ser)

will map only values that exist in mapper, leaving other values untouched.
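Applied to the example from the issue description, the one-liner above might look like the following minimal sketch. The variable names mapped and result are illustrative only.

```python
import pandas as pd

ser = pd.Series(["calf", "foal", "bunny"])
mapper = {"calf": "cow", "bunny": "rabbit"}

# Step 1: map() converts every value that is not a key in `mapper` to NaN.
mapped = ser.map(mapper)

# Step 2: fillna() fills those NaN positions, aligned on the index,
# with the original values from `ser`.
result = mapped.fillna(ser)
print(result.tolist())  # ['cow', 'foal', 'rabbit']
```

One caveat with this approach: a key that is deliberately mapped to NaN in mapper would also be filled back with its original value, since fillna cannot tell the two cases apart.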

"i would expect it to ignore unknown values"

I could see that as a reasonable interpretation when viewing the name alone, but do you have that same expectation from the documentation?

added the Missing-data and Series labels on Jun 7, 2022
changed the title from "Series Mapping `na_action=ignore` result is misleading. ENH:" to "ENH: Series Mapping `na_action=ignore` result is misleading." on Jun 9, 2022
chapmanderek commented on Jun 9, 2022

@chapmanderek
Author

@rhshadrach
No, I think the documentation spells it out. It specifically says:

If ‘ignore’, propagate NaN values

Also, in one of the examples a value that is missing from the mapping ("rabbit") is filled in with NaN. I just think that the name of the flag is misleading. The label isn't the contents of the can.

The ignore flag only really makes sense in terms of the last example, where the value is passed on to a function. When you are trying to replace items (possibly the more common scenario), it doesn't work as I would expect given the name.
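To make the distinction concrete, here is a small sketch of the function case, modeled on the kind of example the Series.map documentation uses. With a callable, na_action='ignore' decides whether NaN is handed to the function at all; it has no effect on values that are merely absent from a dictionary. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series(["cat", "dog", np.nan, "rabbit"])

# Default (na_action=None): the NaN is passed to the function like any
# other value, so it gets formatted into a string.
formatted_all = ser.map("I am a {}".format)

# na_action="ignore": the NaN is not passed to the function and is
# propagated to the result unchanged.
formatted_skip = ser.map("I am a {}".format, na_action="ignore")

print(formatted_skip.isna().tolist())  # [False, False, True, False]
```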

topper-123 commented on Mar 25, 2023

@topper-123
Contributor

For what it's worth, keeping the existing value for unmapped entries would be more logical to me than replacing it with NaN. But I agree the current behavior is as intended.

topper-123 commented on Mar 25, 2023

@topper-123
Contributor

Thinking a bit further, I think it's probably always better to use the Series.replace method for this:

>>> import pandas as pd
>>> ser = pd.Series(['calf', 'foal', 'bunny'])
>>> ser.map({'calf':'cow','bunny':'rabbit'}, na_action='ignore')
0       cow
1       NaN
2    rabbit
dtype: object
>>> ser.replace({'calf':'cow','bunny':'rabbit'})
0       cow
1      foal
2    rabbit
dtype: object

IMO we should add replace to the See also section of the Series.map docs.

Thinking even further, should we not just deprecate passing dict-likes to Series.map and direct users to Series.replace for dict-likes instead? That seems very logical to me, especially if we see Series.map as an element-level version of Series.apply after #52140.

@rhshadrach

rhshadrach commented on Mar 29, 2023

@rhshadrach
Member

It seems natural for Series.map to take a mapping. While there is an overlap in functionality here, I lean toward not deprecating unless there is an issue with map taking a mapping.

added the Needs Discussion label and removed the Needs Triage label on Jul 16, 2024