BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations #60049
Comments
Thanks for the report. A possible resolution would be to infer When you have
print(~0)
# -1
I do not understand this comment, can you not use
Well, for a start because string would be undefined (you mean str), but the point is that if you have a Series with anything null-like, they have to be explicitly handled every time, otherwise errors are thrown. In SQL, if I query a table and use something like And there's no easy way to deal with this - or, well, there is:
Ah, sorry, I mean
xref #32931
This is a NumPy object array, so if each value is treated separately instead of through the vectorized operation, you get a consistent result:
~True, ~False, ~pd.NA
# (-2, -1, <NA>)
This is a NumPy boolean array, so if you apply the operation to each scalar as above, you get a consistent result:
~np.bool_(True), ~np.bool_(False)
# (False, True)
In fact, if you have other values instead of pd.NA, you will get an object array and it will give a similar result:
~pd.Series([True, False, 3])
# 0 -2
# 1 -1
# 2 -4
# dtype: object
So I appreciate that it would perhaps be reasonable to expect a nullable boolean from the constructor. That could perhaps be a future default. For now this is probably a duplicate of #33662, i.e. assuming a pandas nullable type when pd.NA is included in the constructor?
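To make the contrast in this comment concrete, here is a minimal sketch comparing the inferred object dtype with an explicitly requested nullable boolean dtype. It is not the reporter's original example (which did not survive on this page); the variable names are illustrative, and the object-dtype inference shown is the behaviour described in this thread for pandas 2.2.

```python
import pandas as pd

# Without an explicit dtype, mixing Python bools with pd.NA is stored as
# object dtype (the behaviour reported in this thread for pandas 2.2).
obj = pd.Series([True, False, pd.NA])

# On object dtype, ~ is applied to each Python object in turn, so the bools
# are inverted as integers: ~True == -2 and ~False == -1.
inverted_obj = ~obj

# Requesting the nullable boolean dtype gives logical negation instead,
# with missing values left as <NA>.
nullable = pd.Series([True, False, pd.NA], dtype="boolean")
inverted_nullable = ~nullable

print(inverted_obj.tolist())
print(inverted_nullable.tolist())
```

Note that recent Python versions emit a DeprecationWarning for ~ on a plain bool, which is one more reason to avoid relying on the object-dtype path.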
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
In the example it is of course possible to work around the issue by forcing the dtype, but in the context of actual dataframes coming from real databases, NULLs are still best handled by simply replacing them with empty strings and hoping for the best, because no NULL-like object seems to work well with ordinary string and filter operations.
As a side note, if the pd.NA is instead a None, a TypeError is thrown.
#59831 is a related issue.
Expected Behavior
I'd expect the T/F values to flip and the NAs to remain by default, as they do when the dtype is forced.
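As a rough illustration of the workaround mentioned above, the sketch below converts an existing object column to the nullable boolean dtype before negating it. The column contents and variable names are hypothetical stand-ins for data read from a database, and filling missing values with False for the filter is an assumption about what the caller wants, not a recommendation from the pandas maintainers.

```python
import pandas as pd

# Hypothetical column as it might arrive from a database, with NULL as None.
raw = pd.Series([True, False, None])

# On object dtype, ~ is applied per element, and None does not support it.
try:
    ~raw
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Converting to the nullable boolean dtype turns None into <NA>, after
# which ~ flips True/False and leaves the missing value in place.
flags = raw.astype("boolean")
negated = ~flags

# For filtering, decide explicitly how missing values should count.
# Here they are treated as "do not select" (an assumption for this sketch).
mask = negated.fillna(False)
names = pd.Series(["a", "b", "c"])
selected = names[mask].tolist()

print(negated.tolist())
print(selected)
```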
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : English_United Kingdom.1252
pandas : 2.2.2
numpy : 1.26.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.2.2
pip : 24.2
Cython : 3.0.5
pytest : 7.4.2
hypothesis : 6.87.1
sphinx : 6.1.3
blosc : None
feather : 0.4.1
xlsxwriter : 3.1.2
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : 2023.8.0
fsspec : 2023.9.2
gcsfs : None
matplotlib : 3.9.0
numba : 0.60.0
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.14.0
sqlalchemy : 2.0.22
tables : 3.9.2
tabulate : 0.9.0
xarray : 2023.11.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2022.7
qtpy : 2.4.1
pyqt5 : None