diff of datetime column with NaT does not result into TimeDelta  #17837

@michalsustr

Description

Code Sample, a copy-pastable example if possible

In[40]: df = pd.DataFrame([[0]], dtype="datetime64[ns]")

In[46]: df[0] - pd.to_datetime("nat")
Out[47]: 
0   NaT
Name: 0, dtype: datetime64[ns]

In[47]: df[0] - pd.to_datetime("2017")
Out[48]: 
0   -17167 days
Name: 0, dtype: timedelta64[ns]

Problem description

Subtracting datetimes (the - operator) should always return a timedelta, but subtracting NaT from a datetime column returns datetime64[ns].

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.3 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Activity

michalsustr commented on Oct 10, 2017

@michalsustr
Author

This seems like an easy issue to fix; I can create a PR if it is confirmed.

chris-b1 commented on Oct 10, 2017

@chris-b1
Contributor

Can you try on master? There's a chance this is fixed by some of the recent inference changes. If it isn't, a PR would be welcome.

michalsustr commented on Oct 10, 2017

@michalsustr
Author

I'm building it now, but python setup.py develop is super slow; it runs only one process.

michalsustr commented on Oct 10, 2017

@michalsustr
Author

Confirmed, the issue is still present on master.

>>> import pandas as pd
>>> df = pd.DataFrame([[0]], dtype="datetime64[ns]")
>>> df
           0
0 1970-01-01
>>> df[0] - pd.to_datetime("nat")
0   NaT
Name: 0, dtype: datetime64[ns]
>>> df[0] - pd.to_datetime("2017")
0   -17167 days
Name: 0, dtype: timedelta64[ns]

Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: d12a7a0
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0.dev+613.gd12a7a018
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.27.1
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

michalsustr commented on Oct 11, 2017

@michalsustr
Author

Strangely, the following works correctly:

In [15]: s = pd.Series(pd.date_range('2012-1-1', periods=1, freq='D'))

In [17]: s.shift()
Out[17]: 
0   NaT
dtype: datetime64[ns]

In [16]: s - s.shift()
    ...: 
Out[16]: 
0   NaT
dtype: timedelta64[ns]
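
For reference, a minimal sketch of the same comparison on a slightly longer Series (three periods instead of one, an assumption made here so that non-missing differences are visible). It also calls Series.diff(), which for datetime data computes the same element-wise difference as s - s.shift(). The exact dtype resolution may vary by pandas version, so the sketch inspects only the dtype kind ("m" is NumPy's kind code for timedelta64).

```python
import pandas as pd

# Three consecutive days; the thread's example uses periods=1.
s = pd.Series(pd.date_range("2012-01-01", periods=3, freq="D"))

# Series minus shifted Series: the first element has no predecessor.
manual = s - s.shift()

# Built-in first difference of the same Series.
builtin = s.diff()

print(manual.dtype.kind, builtin.dtype.kind)
print(manual.isna().tolist())
```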

michalsustr commented on Oct 11, 2017

@michalsustr
Author

So it seems that when the datetime is wrapped in a Series the behaviour is correct, but when it is a scalar NaT the result is cast to the wrong dtype (datetime64[ns] instead of timedelta64[ns]):

In [24]:  df = pd.DataFrame([[0], [1]], dtype="datetime64[ns]")

In [26]: df[0] - pd.Series(pd.to_datetime("nat"))
Out[26]: 
0   NaT
1   NaT
dtype: timedelta64[ns]

In [27]: df[0] - pd.to_datetime("nat")
Out[27]: 
0   NaT
1   NaT
Name: 0, dtype: datetime64[ns]
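
Based on the observation above, one possible workaround is to wrap the scalar in a datetime Series before subtracting. The sketch below is only an illustration under that assumption: datetime_diff is a hypothetical helper name, not part of pandas, and it sidesteps the scalar code path rather than fixing it.

```python
import pandas as pd


def datetime_diff(series, other):
    """Hypothetical helper: subtract `other` from a datetime Series.

    A scalar (including NaT) is first expanded into a Series with the
    same index and dtype, so the subtraction is Series minus Series.
    """
    if not isinstance(other, pd.Series):
        other = pd.Series(
            [other] * len(series), index=series.index, dtype=series.dtype
        )
    return series - other


s = pd.Series(pd.to_datetime([0, 1]))  # nanoseconds since the epoch
nat_diff = datetime_diff(s, pd.NaT)
ts_diff = datetime_diff(s, pd.Timestamp("2017"))
print(nat_diff.dtype, ts_diff.dtype)
```

Expanding the scalar costs an extra allocation of the same length as the input, which is usually negligible next to the subtraction itself.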

jreback commented on Oct 13, 2017

@jreback
Contributor
In [22]: df = pd.DataFrame([[0]], dtype="datetime64[ns]")

In [23]: df
Out[23]: 
           0
0 1970-01-01

In [24]: df[0] - pd.to_datetime("nat")
Out[24]: 
0   NaT
Name: 0, dtype: datetime64[ns]

In [25]: df[0] - pd.to_datetime("2017")
Out[25]: 
0   -17167 days
Name: 0, dtype: timedelta64[ns]

In [26]: pd.to_datetime("2017")
Out[26]: Timestamp('2017-01-01 00:00:00')


@michalsustr what do you think is wrong here?

added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Datetime (Datetime data dtype) labels on Oct 13, 2017
jaksmid commented on Dec 7, 2017

@jaksmid

It seems the issue is now fixed, at least for pandas 0.20.3.

jreback commented on Dec 7, 2017

@jreback
Contributor

Can you do a PR with some validation tests?
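
A rough sketch of what such a validation test might look like, written pytest-style. The function names are hypothetical and the test is not taken from the pandas test suite. It only checks the dtype kind of the result, and whether it passes depends on the pandas version installed.

```python
import pandas as pd


def sub_result_kind(other):
    """Return the NumPy dtype kind of (datetime Series - other)."""
    ser = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02"]))
    return (ser - other).dtype.kind


def test_sub_scalar_returns_timedelta():
    # "m" is NumPy's kind code for timedelta64, "M" for datetime64.
    assert sub_result_kind(pd.Timestamp("2017-01-01")) == "m"
    # Scalar NaT is the case reported in this issue.
    assert sub_result_kind(pd.NaT) == "m"
```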

modified the milestone: Next Major Release on Jan 1, 2018

        diff of datetime column with NaT does not result into TimeDelta · Issue #17837 · pandas-dev/pandas