Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# examples from docstrings, inplace=False
s = pd.Series(range(5))
t = pd.Series([True, False])
print(s.where(t, 99))
# 0 0
# 1 99
# 2 99
# 3 99
# 4 99
# dtype: int64
print(s.mask(t, 99))
# 0 99
# 1 1
# 2 99
# 3 99
# 4 99
# dtype: int64
# inplace=True
s = pd.Series(range(5))
s.where(t, 99, inplace=True)
print(s)
# 0 0
# 1 99
# 2 2
# 3 3
# 4 4
# dtype: int64
s = pd.Series(range(5))
s.mask(t, 99, inplace=True)
print(s)
# 0 99
# 1 1
# 2 2
# 3 3
# 4 4
# dtype: int64
Issue Description
The first two examples are from the docstrings of DataFrame.where and DataFrame.mask. They agree with the documentation regarding how values are filled at index positions where cond is misaligned.
However, when inplace=True, the results differ from those with inplace=False for both where and mask.
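The discrepancy can also be checked programmatically. The following is a minimal sketch that reuses the Series from the example above; the variable names `expected_where` and `expected_mask` are introduced here for illustration only. No output is shown for the comparison because it depends on whether the installed pandas version still exhibits the behaviour.

```python
import pandas as pd

t = pd.Series([True, False])  # deliberately shorter than the data

# Reference results from the documented inplace=False behaviour
expected_where = pd.Series(range(5)).where(t, 99)
expected_mask = pd.Series(range(5)).mask(t, 99)

# Same calls with inplace=True, compared against the reference
s = pd.Series(range(5))
s.where(t, 99, inplace=True)
print("where consistent:", s.equals(expected_where))

s = pd.Series(range(5))
s.mask(t, 99, inplace=True)
print("mask consistent:", s.equals(expected_mask))
```

If both lines print True, inplace no longer changes the result on that version.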
Expected Behavior
I would expect the inplace parameter not to affect the results. However, I noticed the first line of the code below in the source of where, so I wonder: is this behaviour expected?
Thank you in advance.
Lines 10665 to 10674 in d928a5c
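Until the intended behaviour is settled, one possible workaround is to align cond explicitly before the call, so that no missing positions are left for pandas to fill. This is only a sketch under the assumption that the inplace=False results are the desired ones; the names `s_where` and `s_mask` are illustrative.

```python
import pandas as pd

t = pd.Series([True, False])

# where: treat positions missing from t as False, so they are replaced
s_where = pd.Series(range(5))
s_where.where(t.reindex(s_where.index, fill_value=False), 99, inplace=True)
print(s_where.tolist())  # [0, 99, 99, 99, 99]

# mask: treat positions missing from t as True, so they are replaced
s_mask = pd.Series(range(5))
s_mask.mask(t.reindex(s_mask.index, fill_value=True), 99, inplace=True)
print(s_mask.tolist())  # [99, 1, 99, 99, 99]
```

Because cond already covers every index label, the result does not depend on how where and mask fill misaligned positions internally.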
Installed Versions
INSTALLED VERSIONS
commit : 4c520e3
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.0dev0+743.g4c520e35f9
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.5
pytest : 7.4.3
hypothesis : 6.91.0
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
Activity
rhshadrach commented on Jan 26, 2024
Thanks for the report. Agreed this looks suspect, but the code seems quite deliberate. I haven't been able to track down where this behavior was introduced; I think the origin should be better understood first.
Note that these methods will retain inplace under PDEP-8.
mitlabence commented on Apr 10, 2024
The relevant commit seems to be this with the corresponding comment. I believe the corresponding Python version is 3.1-3.2; how would one go about testing with such an old release?
rhshadrach commented on Apr 10, 2024
Thanks for finding this! I don't think we need to test - understanding comes from the discussion around the changes made.
It does seem to me the comment you found has things backwards, even according to the docstring at the time:
I think this is easy to mix up (especially since the semantics are somewhat different from np.where).
mitlabence commented on May 5, 2024
To my understanding, there is now an inconsistency between what the documentation of mask and where says about misaligned indices (replace by other, as in the inplace=False examples above) and what bracket indexing is expected to do:
This latter behavior is expected in the tests here, here and here.
It is also (obviously) syntactically similar to inplace mask.
simple_profile from IndustrialLoadProfile behaves differently in version 0.2.0 compared to 0.1.9 due to pandas masking functions oemof/demandlib#64