BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations #60049

RedAero · 2024-10-15T12:51:41Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

~pd.Series([True, False]) => [False, True]
~pd.Series([True, False, pd.NA]) => [-2, -1, <NA>]

~pd.Series([True, False, pd.NA], dtype=pd.BooleanDtype()) => [False, True, <NA>]

Issue Description

In the example it is of course possible to work around the issue with forcing the dtype, but in the context of actual dataframes coming from real databases, NULLs are still best handled by simply replacing them with empty strings and hoping for the best, because no NULL-like object seems to work well with ordinary string and filter operations.

As a sidenote, if the pd.NA is instead a None, a TypeError is thrown.

#59831 is a related issue.

Expected Behavior

I'd expect the T/F values to flip and the NAs to remain by default, as they do when the dtype is forced.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : English_United Kingdom.1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.2.2
pip : 24.2
Cython : 3.0.5
pytest : 7.4.2
hypothesis : 6.87.1
sphinx : 6.1.3
blosc : None
feather : 0.4.1
xlsxwriter : 3.1.2
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : 2023.8.0
fsspec : 2023.9.2
gcsfs : None
matplotlib : 3.9.0
numba : 0.60.0
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.14.0
sqlalchemy : 2.0.22
tables : 3.9.2
tabulate : 0.9.0
xarray : 2023.11.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2022.7
qtpy : 2.4.1
pyqt5 : None

The text was updated successfully, but these errors were encountered:

rhshadrach · 2024-10-21T20:29:18Z

Thanks for the report. A possible resolution would be to infer Int64 dtype from the list [1, 2, pd.NA], but this would be a bit too fragile: you'd get int64 if there was no pd.NA, but Int64 when there is. That seems highly undesirable to me. When pandas switches to nullable arrays to be the default, this issue will go away. Until then, I think the only resolution is to specify the dtype or use Series.convert_dtypes

When you have object dtype, it seems correct to me to apply the negation (~) operation corresponding to the underlying Python object.

print(~0)
# -1

because no NULL-like object seems to work well with ordinary string and filter operations.

I do not understand this comment, can you not use dtype=string?

RedAero · 2024-10-31T14:25:55Z

I do not understand this comment, can you not use dtype=string?

Well, for a start because string would be undefined (you mean str), but the point is that if you have a Series with anything null-like, they have to be explicitly handled every time, otherwise errors are thrown. In SQL, if I query a table and use something like WHERE LEFT(customer_name, 1) = 'E", it's not going to error out if the ID field has NULLs.
Pandas does:
df.loc[df["customer_name"].str.startswith("E")]
ValueError: Cannot mask with non-boolean array containing NA / NaN values

And there's no easy way to deal with this - or, well, there is: fillna('some-string-that-won't-appear-in-the-data') Which should give everyone the ick. You can also use True/False instead, but that has it's own, obvious problems.

rhshadrach · 2024-10-31T20:12:34Z

Well, for a start because string would be undefined (you mean str)

Ah, sorry, I mean dtype="string"

simonjayhawkins · 2025-06-10T19:05:10Z

xref #32931

simonjayhawkins · 2025-06-11T12:04:27Z

~pd.Series([True, False, pd.NA]) => [-2, -1, ]

This is an numpy object array, so if each value is treated separately instead of the vectorized operation you get a consistent result.

~True, ~False, ~pd.NA
# (-2, -1, <NA>)

~pd.Series([True, False]) => [False, True]

This is a numpy boolean array, so if you apply the operation to each scalar as above you get a consistent result

~np.bool_(True), ~np.bool_(False)
# (False, True)

in fact if you have other values instead from pd.NA you will get an object array and it will give a similar result

~pd.Series([True, False, 3])
# 0    -2
# 1    -1
# 2    -4
# dtype: object

so pd.NA is propagating correctly in the example and the issue is related to the expectations of an object array.

I appreciate that it would perhaps be reasonable to expect a nullable boolean from the constructor.

That would perhaps be a future default.

for now this is probably a duplicate of #33662. i.e. assuming a pandas nullable type when including pd.NA in the constructor?

RedAero added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 15, 2024

rhshadrach added Closing Candidate May be closeable, needs more eyeballs Constructors Series/DataFrame/Index/pd.array Constructors NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 21, 2024

simonjayhawkins mentioned this issue Jun 10, 2025

PDEP-18: Nullable Object Dtype #61599

Open

simonjayhawkins added the Duplicate Report Duplicate issue or pull request label Jun 11, 2025

simonjayhawkins closed this as completed Jun 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations #60049

BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations #60049

RedAero commented Oct 15, 2024 •

edited

Loading

INSTALLED VERSIONS

rhshadrach commented Oct 21, 2024

Uh oh!

RedAero commented Oct 31, 2024 •

edited

Loading

Uh oh!

rhshadrach commented Oct 31, 2024 •

edited

Loading

Uh oh!

simonjayhawkins commented Jun 10, 2025

Uh oh!

simonjayhawkins commented Jun 11, 2025

Uh oh!

Uh oh!

BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations #60049

BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations #60049

Comments

RedAero commented Oct 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

rhshadrach commented Oct 21, 2024

Uh oh!

RedAero commented Oct 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhshadrach commented Oct 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonjayhawkins commented Jun 10, 2025

Uh oh!

simonjayhawkins commented Jun 11, 2025

Uh oh!

RedAero commented Oct 15, 2024 •

edited

Loading

RedAero commented Oct 31, 2024 •

edited

Loading

rhshadrach commented Oct 31, 2024 •

edited

Loading