Skip to content

MultiIndex.from_product converts datetime.date to pd.Timestamp #28152

@MaximeWeyl

Description

@MaximeWeyl

Code Sample, a copy-pastable example if possible

import pandas as pd

a = pd.Timestamp("2019-2-2").date()
i = pd.MultiIndex.from_product([
    [a, a],
    [2, 3]
])

print("a={} ({}".format(a, type(a)))
print(i[0])

Output is :

a=2019-02-02 (<class 'datetime.date'>
(Timestamp('2019-02-02 00:00:00'), 2)

Problem description

When using from_product with python datetimes, the resulting MultiIndex level is converted to pandas datetimes (Timestamps). There is no way to keep the original python datetime which I want. I got the same behavior with from_tuples. But it was not the case with from_arrays.

Expected Output

I expect the output to respect the type I gave to from_product :

a=2019-02-02 (<class 'datetime.date'>
(datetime.date(2019, 2, 2), 2)

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : 4.6.5
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

Activity

MaximeWeyl

MaximeWeyl commented on Aug 26, 2019

@MaximeWeyl
Author

One workaround I found is to pass Index to from_product instead of lists :

import pandas as pd

a = pd.Timestamp("2019-2-2").date()
i = pd.MultiIndex.from_product([
    pd.Index([a, a]),
    [2, 3]
])

print("a={} ({}".format(a, type(a)))
print(i[0])

Output :


a=2019-02-02 (<class 'datetime.date'>
(datetime.date(2019, 2, 2), 2)

TomAugspurger

TomAugspurger commented on Aug 26, 2019

@TomAugspurger
Contributor

I expect the output to respect the type I gave to from_product :

You don't specify a dtype, right? So this is a bug in inference.

I would expect the MultiIndex constructors to follow the behavior of Index (and Series), which preserve the datetime.

In [24]: pd.Index([a, a])
Out[24]: Index([2019-02-02, 2019-02-02], dtype='object')

In [25]: pd.Index([a, a])[0]
Out[25]: datetime.date(2019, 2, 2)
added this to the Contributions Welcome milestone on Aug 26, 2019
TomAugspurger

TomAugspurger commented on Aug 26, 2019

@TomAugspurger
Contributor

Hmm the bug seems to be in Categorical (used by MI internally)

In [31]: pd.Categorical([a]).categories
Out[31]: DatetimeIndex(['2019-02-02'], dtype='datetime64[ns]', freq=None)
changed the title [-]MultiIndex.from_product converts datetime.datetime to pd.Timestamp[/-] [+]MultiIndex.from_product converts datetime.date to pd.Timestamp[/+] on Aug 26, 2019
mroeschke

mroeschke commented on Aug 26, 2019

@mroeschke
Member

Just noting the distinction that this is an issue with datetime.date objects, which are not first class in pandas.

kurtosis

kurtosis commented on Jun 29, 2020

@kurtosis

I got tripped up on this as well when using groupby on a datetime.date column and moving it in and out of the index. The different behavior between Index and MultiIndex is especially tricky.

One suggestion: maybe add documentation/warning on the preferred way to round timestamps to date? This is a common operation and dt.date was all I came across in the docs.
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.date.html

(See also #15906)

In [4]: df = pd.DataFrame({'date' : pd.date_range(start='2020-01-01', periods=10), 'label' : 1})                                                                              
In [5]: df['date'] = df['date'].dt.date                                                                                                                                       
In [6]: display(df.date[0])                                                                                                                                                   
datetime.date(2020, 1, 1)
In [7]: df_1 = df.set_index('date').reset_index()                                                                                                                             
In [8]: display(df_1.date[0])                                                                                                                                                 
datetime.date(2020, 1, 1)
In [9]: df_1 = df.set_index(['date', 'label']).reset_index()                                                                                                                  
In [10]: display(df_1.date[0])                                                                                                                                                
Timestamp('2020-01-01 00:00:00')
removed this from the Contributions Welcome milestone on Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @TomAugspurger@jbrockmendel@mroeschke@kurtosis@MaximeWeyl

        Issue actions

          MultiIndex.from_product converts datetime.date to pd.Timestamp · Issue #28152 · pandas-dev/pandas