
BUG: possible inconsistency between inplace=True and inplace=False in DataFrame.where/mask #57083

Open
@yuanx749

Description

@yuanx749
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# examples from docstrings, inplace=False
s = pd.Series(range(5))
t = pd.Series([True, False])
print(s.where(t, 99))
# 0     0
# 1    99
# 2    99
# 3    99
# 4    99
# dtype: int64
print(s.mask(t, 99))
# 0    99
# 1     1
# 2    99
# 3    99
# 4    99
# dtype: int64

# inplace=True
s = pd.Series(range(5))
s.where(t, 99, inplace=True)
print(s)
# 0     0
# 1    99
# 2     2
# 3     3
# 4     4
# dtype: int64
s = pd.Series(range(5))
s.mask(t, 99, inplace=True)
print(s)
# 0    99
# 1     1
# 2     2
# 3     3
# 4     4
# dtype: int64

Issue Description

The first two examples are from the docstrings of DataFrame.where and DataFrame.mask. They agree with the documentation on how the values of cond are filled at misaligned index positions.
However, when inplace=True, the results differ from those with inplace=False, for both where and mask.

Expected Behavior

I would expect the inplace parameter not to affect the results. However, I noticed the first line of code below in the source of where, so I wonder: is this behaviour expected?
Thank you in advance.

pandas/pandas/core/generic.py

Lines 10665 to 10674 in d928a5c

# make sure we are boolean
fill_value = bool(inplace)
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        "Downcasting object dtype arrays",
        category=FutureWarning,
    )
    cond = cond.fillna(fill_value)
    cond = cond.infer_objects(copy=False)
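Reading the quoted line: after cond is aligned to the caller, positions missing from cond are filled with False when inplace=False and with True when inplace=True. The sketch below mimics only that fill step with public APIs, assuming it is the sole source of the difference. `align_cond` is a hypothetical helper, not part of pandas. It reproduces all four outputs from the reproducible example, and it suggests a possible workaround: align cond explicitly so nothing is left to fill, after which inplace no longer changes the result.

```python
import pandas as pd


def align_cond(cond: pd.Series, target: pd.Series, fill: bool) -> pd.Series:
    # Hypothetical helper: reindex `cond` onto `target`'s index and fill
    # the positions missing from `cond` with `fill` (mimicking bool(inplace)).
    return cond.reindex(target.index, fill_value=fill).astype(bool)


s = pd.Series(range(5))
t = pd.Series([True, False])

# where: fill=False mirrors inplace=False, fill=True mirrors inplace=True.
where_copy = s.where(align_cond(t, s, fill=False), 99)
where_inplace = s.where(align_cond(t, s, fill=True), 99)

# mask inverts cond *before* the fill, so filled positions are not inverted.
mask_copy = s.where(align_cond(~t, s, fill=False), 99)
mask_inplace = s.where(align_cond(~t, s, fill=True), 99)

# With a fully aligned cond there is nothing to fill, so inplace should not matter.
aligned = align_cond(t, s, fill=False)
s2 = s.copy()
s2.where(aligned, 99, inplace=True)
print(s2.equals(s.where(aligned, 99)))
```

Pre-aligning with an explicit fill value makes the choice visible at the call site instead of depending on the inplace flag.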

Installed Versions

INSTALLED VERSIONS

commit : 4c520e3
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0dev0+743.g4c520e35f9
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.5
pytest : 7.4.3
hypothesis : 6.91.0
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Activity

added the Needs Triage label (Issue that has not been reviewed by a pandas team member) on Jan 26, 2024

rhshadrach commented on Jan 26, 2024

@rhshadrach
Member

Thanks for the report. Agreed, this looks suspect, but the code seems quite deliberate. I haven't been able to track down where this behavior was introduced; I think the origin should be better understood.

Note that these methods will retain inplace under PDEP-8.

added the labels inplace (Relating to inplace parameter or equivalent), Conditionals (E.g. where, mask, case_when) and Needs Discussion (Requires discussion from core team before further action), and removed the Needs Triage label (Issue that has not been reviewed by a pandas team member) on Jan 26, 2024

mitlabence commented on Apr 10, 2024

@mitlabence
Contributor

The relevant commit seems to be this one, with the corresponding comment. I believe the corresponding Python version is 3.1-3.2; how would one go about testing with such an old release?


rhshadrach commented on Apr 10, 2024

@rhshadrach
Member

Thanks for finding this! I don't think we need to test; understanding comes from the discussion around the changes made.

It does seem to me the comment you found has things backwards, even according to the docstring at the time:

Return a DataFrame with the same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

I think this is easy to mix up (especially since the semantics are somewhat different from np.where).
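A small illustration of where the mix-up comes from, on fully aligned data so the fill rule discussed in this issue does not come into play. In np.where(cond, x, y) the value taken where cond is True is an explicit argument; in Series.where(cond, other) the caller itself plays that role and `other` is what you get where cond is False. Series.mask is the complement. The data below is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40])
cond = pd.Series([True, False, True, False])

# Series.where: keep `s` where cond is True, take `other` (-1) elsewhere.
kept = s.where(cond, -1)

# np.where(cond, x, y): take x where cond is True, y elsewhere.
# Passing `s` as x gives the same values as Series.where (as a NumPy array).
picked = np.where(cond, s, -1)

# Series.mask is the complement: replace `s` where cond is True.
masked = s.mask(cond, -1)

print(kept.tolist(), picked.tolist(), masked.tolist())
```

Unlike the pandas methods, np.where performs no index alignment, so the question of how to fill misaligned positions never arises there.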

removed the Needs Discussion label (Requires discussion from core team before further action) on Apr 10, 2024

mitlabence commented on May 5, 2024

@mitlabence
Contributor

As I understand it, there is now an inconsistency between what the documentation of mask and where says about misaligned indices (replace them with other, as in the inplace=False examples above) and what boolean bracket indexing is expected to do:

import pandas as pd
df = pd.DataFrame({"a" : [0, 1, 2, -3], "b": [0, -1, 2, 3]})
#    a  b
# 0  0  0
# 1  1 -1
# 2  2  2
# 3 -3  3
df[df[:-1] < 0] = 4
#    a  b
# 0  0  0
# 1  1  4
# 2  2  2
# 3 -3  3

This latter behavior is what the tests here, here and here expect.
It is also, obviously, syntactically similar to an in-place mask.
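The comparison can be checked directly. The sketch below, assuming a pandas 2.x build in which the behavior reported in this issue is still present, applies the same misaligned condition three ways. `make_df` is a hypothetical helper that only rebuilds the example frame so each variant starts from the same data.

```python
import pandas as pd


def make_df() -> pd.DataFrame:
    # Hypothetical helper: a fresh copy of the example frame for each variant.
    return pd.DataFrame({"a": [0, 1, 2, -3], "b": [0, -1, 2, 3]})


# Boolean frame covering rows 0-2 only; row 3 is missing from the condition.
key = make_df()[:-1] < 0

# 1) Boolean bracket assignment, as in the comment above.
bracket = make_df()
bracket[key] = 4

# 2) In-place mask with the same misaligned condition.
inplace_mask = make_df()
inplace_mask.mask(key, 4, inplace=True)

# 3) Default (inplace=False) mask, which follows the documented fill rule.
copy_mask = make_df().mask(key, 4)

print(bracket.equals(inplace_mask))  # do the two in-place paths agree?
print(copy_mask)  # row 3 is replaced here, unlike in the in-place variants
```

Under that assumption, the bracket assignment and the in-place mask agree and leave row 3 untouched, while the default mask replaces row 3 with 4. This illustrates why changing the in-place fill rule would also affect boolean bracket assignment.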


Metadata

Assignees: No one assigned
Labels: Bug; Conditionals (E.g. where, mask, case_when); inplace (Relating to inplace parameter or equivalent)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Participants: @mitlabence, @rhshadrach, @yuanx749
Issue #57083 · pandas-dev/pandas