
BUG (string dtype): comparison of string column to mixed object column fails #60228

@jorisvandenbossche
Member

At the moment you can freely compare a string column with a mixed object dtype column:

>>> ser_string = pd.Series(["a", "b"])
>>> ser_mixed = pd.Series([1, "b"])
>>> ser_string == ser_mixed
0    False
1     True
dtype: bool

But with the string dtype enabled (using pyarrow), this now raises an error:

>>> pd.options.future.infer_string = True
>>> ser_string = pd.Series(["a", "b"])
>>> ser_mixed = pd.Series([1, "b"])
>>> ser_string == ser_mixed
...
File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:510, in ArrowExtensionArray._box_pa_array(cls, value, pa_type, copy)
...
--> 510     pa_array = pa.array(value, from_pandas=True)
...
ArrowInvalid: Could not convert 'b' with type str: tried to convert to int64

This happens because the ArrowEA tries to convert the other operand to Arrow as well, which fails for mixed types.

In general, I think our rule is that an == comparison never fails, but instead just gives False wherever the values are not comparable.
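A minimal plain-Python sketch of that rule, for illustration only. `pointwise_eq` is a hypothetical helper, not a pandas function, and it ignores missing-value semantics:

```python
def pointwise_eq(left, right):
    """Element-wise == that never raises; non-comparable pairs give False."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    result = []
    for a, b in zip(left, right):
        try:
            result.append(bool(a == b))
        except Exception:
            # Assumption: any failure to compare is treated as "not equal".
            result.append(False)
    return result


print(pointwise_eq(["a", "b"], [1, "b"]))  # [False, True]
```

This reproduces the result of the object-dtype comparison shown at the top of the issue.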

Activity

added this to the 2.3 milestone on Nov 7, 2024

jorisvandenbossche commented on Nov 7, 2024


It seems we actually have a comment in the code about this issue for the case of object dtype:

try:
    result = pc_func(self._pa_array, self._box_pa(other))
except pa.ArrowNotImplementedError:
    # TODO: could this be wrong if other is object dtype?
    # in which case we need to operate pointwise?
    result = ops.invalid_comparison(self, other, op)
    result = pa.array(result, type=pa.bool_())

removed their assignment on Nov 14, 2024

TEARFEAR commented on Nov 14, 2024


take

modified the milestones: 2.3, 3.0 on Jun 2, 2025

Labels: Bug, Strings (String extension data type and string data)


Participants: @jorisvandenbossche, @mroeschke, @TEARFEAR
