BUG: comparing multicolumn dataframe with datetime64 values to series gives TypeError #9006

Closed
Description

@jorisvandenbossche (Member)

When trying to compare a dataframe to a column/series (I know, in the following case this is not useful because the series is aligned with the columns of the dataframe and not the rows, but it is something typical users will try), I get the correct results if there are strings in the dataframe and series, but a TypeError when the dataframe contains datetime values:

In [1]: from io import StringIO
   ...: import pandas as pd

In [2]: s = """id       date birth_date_1 birth_date_2
   ...: 1 2000-01-01   2000-01-03   2000-01-05
   ...: 1 2000-01-07   2000-01-03   2000-01-05
   ...: 2 2000-01-02   2000-01-10   2000-01-01
   ...: 2 2000-01-05   2000-01-10   2000-01-01"""

In [3]: df = pd.read_csv(StringIO(s), sep=r'\s+')

In [5]: df[['birth_date_1','birth_date_2']] > df['date']
Out[5]:
       0      1      2      3 birth_date_1 birth_date_2
0  False  False  False  False         True         True
1  False  False  False  False         True         True
2  False  False  False  False         True         True
3  False  False  False  False         True         True

In [7]: df = pd.read_csv(StringIO(s), sep=r'\s+', parse_dates=[1,2,3])

In [8]: df[['birth_date_1','birth_date_2']] > df['date']
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in handle_error()

    954             if raise_on_error:
    955                 raise TypeError('Could not operate %s with block values %s'
--> 956                                 % (repr(other), str(detail)))
    957             else:
    958                 # return the values

TypeError: Could not operate array(['2000-01-01T01:00:00.000000000+0100',
       '2000-01-07T01:00:00.000000000+0100',
       '2000-01-02T01:00:00.000000000+0100',
       '2000-01-05T01:00:00.000000000+0100'], dtype='datetime64[ns]') with block values invalid type promotion
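For reference, the row-wise comparison that the report has in mind can be sketched with the flex method `DataFrame.gt` and `axis=0`, which matches the series against the row index instead of the column labels. This is only a minimal sketch using the data from the report; the variable names `raw` and `rowwise` are mine, and I have not checked it against the 2014 pandas version in the traceback.

```python
from io import StringIO

import pandas as pd

raw = """id date birth_date_1 birth_date_2
1 2000-01-01 2000-01-03 2000-01-05
1 2000-01-07 2000-01-03 2000-01-05
2 2000-01-02 2000-01-10 2000-01-01
2 2000-01-05 2000-01-10 2000-01-01"""

# Parse the three date columns by name so they become datetime64 columns.
df = pd.read_csv(
    StringIO(raw),
    sep=r"\s+",
    parse_dates=["date", "birth_date_1", "birth_date_2"],
)

# axis=0 aligns the Series index with the DataFrame's row index,
# so each row's birth dates are compared with that row's `date`.
rowwise = df[["birth_date_1", "birth_date_2"]].gt(df["date"], axis=0)
print(rowwise)
```

The plain `>` operator has no `axis` argument and always matches a series against the column labels, which is why the flex method is used here.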

Activity

changed the title from "BUG:" to "BUG: comparing dataframe with datetime64 values to series gives TypeError" on Dec 4, 2014
@jorisvandenbossche (Member, Author) commented on Dec 4, 2014

Although I am not sure this is the correct result:

In [5]: df[['birth_date_1','birth_date_2']] > df['date']
Out[5]:
       0      1      2      3 birth_date_1 birth_date_2
0  False  False  False  False         True         True
1  False  False  False  False         True         True
2  False  False  False  False         True         True
3  False  False  False  False         True         True

There are no overlapping elements between the dataframe and series, but why then sometimes True and sometimes False?

@jreback (Contributor) commented on Dec 4, 2014

This is quite tricky; datetimes are not handled correctly in a multi-column vectorized way.

xref #8554. I think I can fix this, but it's a bit tricky.

added this to the 0.16.0 milestone on Dec 4, 2014
modified the milestones: 0.16.0, Next Major Release on Mar 6, 2015
@jbrockmendel (Member) commented on Oct 24, 2018

@jorisvandenbossche I'm not entirely clear on what the issue is here. Is it about broadcasting? Maybe it has been resolved in the interim?

@mroeschke (Member) commented on Mar 31, 2020

I think the first case (without date parsing) raises a sensible error now:

TypeError: '>' not supported between instances of 'numpy.ndarray' and 'str'

The second case (with parsed dates) doesn't seem to raise a sensible error, as there is no float column being compared:

TypeError: '<' not supported between instances of 'Timestamp' and 'float'

In [60]: pd.__version__
Out[60]: '1.1.0.dev0+1027.g767335719'

changed the title from "BUG: comparing dataframe with datetime64 values to series gives TypeError" to "BUG: comparing multicolumn dataframe with datetime64 values to series gives TypeError" on Mar 31, 2020
added the label Error Reporting (Incorrect or improved errors from pandas) and removed the labels Dtype Conversions (Unexpected or buggy dtype conversions) and Datetime (Datetime data dtype) on Apr 11, 2021
@jbrockmendel (Member) commented on Dec 19, 2021

IIUC reindexing is introducing float (all-NaN) columns, which then raise on comparison. That automatic reindexing was deprecated in #36795. We could try to get something in for 1.4 to give a better exception message, but I don't think it's worth the trouble.
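The effect described above can be made visible with an explicit `DataFrame.align` call on the column axis. This is a rough sketch with a cut-down two-row frame; the names `frame`, `dates` and `added` are mine, and it assumes that explicit alignment behaves like the automatic reindexing the old comparison performed.

```python
import warnings

import pandas as pd

frame = pd.DataFrame(
    {
        "birth_date_1": pd.to_datetime(["2000-01-03", "2000-01-10"]),
        "birth_date_2": pd.to_datetime(["2000-01-05", "2000-01-01"]),
    }
)
dates = pd.Series(pd.to_datetime(["2000-01-01", "2000-01-02"]), name="date")

# Align the frame's *columns* with the series' *index* (labels 0 and 1).
# Mixing int and str labels may trigger a sort-order warning, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    left, right = frame.align(dates, axis=1)

# Columns that exist only because of the alignment.
added = [c for c in left.columns if c not in frame.columns]
print(added)
print(left.loc[:, added].isna().all().all())
print(left.dtypes)
```

The columns listed in `added` hold only missing values, so a comparison on the aligned frame ends up comparing those filler columns against timestamps rather than the original datetime columns.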

removed this from the Contributions Welcome milestone on Oct 13, 2022
@jbrockmendel (Member) commented on Mar 28, 2023

This now correctly raises because the automatic alignment deprecation has been enforced. Is there another bug that surfaces after that if we manually align before the comparison?
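A quick way to check the current behaviour is to attempt the comparison and capture whatever exception comes back. This sketch assumes a recent pandas where the deprecation is enforced; the names `frame`, `dates` and `error` are mine, and both exception types are caught because older versions raised a TypeError here instead.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "birth_date_1": pd.to_datetime(["2000-01-03", "2000-01-10"]),
        "birth_date_2": pd.to_datetime(["2000-01-05", "2000-01-01"]),
    }
)
dates = pd.Series(pd.to_datetime(["2000-01-01", "2000-01-02"]), name="date")

# The series index (0, 1) does not match the frame's column labels,
# so the operator form is expected to refuse the comparison.
try:
    frame > dates
except (TypeError, ValueError) as exc:
    error = exc
else:
    error = None

print(type(error).__name__, error)
```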

@mroeschke (Member) commented on May 10, 2025

Yeah, it appears this raises consistently due to alignment. Closing.


Labels: Bug; Closing Candidate (May be closeable, needs more eyeballs); Error Reporting (Incorrect or improved errors from pandas); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Participants: @jreback, @jorisvandenbossche, @jbrockmendel, @mroeschke
