Skip to content

BUG: timedelta64[s] series constructor isn't equal with alternative constructor using to_timedelta unit='s' #48312

@ntachukwu

Description

@ntachukwu
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

    I have confirmed this bug exists on the latest version of pandas.

    I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas import Series
import pandas._testing as tm

result = Series([1000000, 200000, 3000000], dtype="timedelta64[s]")
expected = Series(pd.to_timedelta([1000000, 200000, 3000000], unit="s"))
tm.assert_series_equal(result, expected)

Issue Description

This code passes

 result = Series([1000000, 200000, 3000000], dtype="timedelta64[ns]")
 expected = Series(pd.to_timedelta([1000000, 200000, 3000000], unit="ns"))
 tm.assert_series_equal(result, expected)

But when dtype="timedelta64[s]" and unit="s" it returns

AssertionError: numpy array are different

numpy array values are different (100.0 %)
[index]: [0, 1, 2]
[left]:  [1000000, 200000, 3000000]
[right]: [1000000000000000, 200000000000000, 3000000000000000]

Expected Behavior

Both series should be equal.

Installed Versions

INSTALLED VERSIONS

commit : 201cbf6
python : 3.9.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.0.dev0+1364.g201cbf6bc1.dirty
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 60.9.3
pip : 22.1.1
Cython : 0.29.32
pytest : 7.1.2
hypothesis : 6.52.3
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.3.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Activity

added
Needs TriageIssue that has not been reviewed by a pandas team member
on Aug 30, 2022
phofl

phofl commented on Sep 1, 2022

@phofl
Member

Simple reproducer:

result = np.array([1000000, 200000, 3000000], dtype="timedelta64[s]")
result_pandas = pd.Series([1000000, 200000, 3000000], dtype="timedelta64[s]")
tm.assert_numpy_array_equal(result, result_pandas.values)

This should pass, but we seem to ignore the seconds and interpret it as nanoseconds

phofl

phofl commented on Sep 1, 2022

@phofl
Member

This is currently not supported and should raise imo rather than returning buggy conversions

cc @jbrockmendel

added
Timestamppd.Timestamp and associated methods
and removed
Needs TriageIssue that has not been reviewed by a pandas team member
on Sep 2, 2022
jbrockmendel

jbrockmendel commented on Sep 2, 2022

@jbrockmendel
Member

This is currently not supported and should raise imo rather than returning buggy conversions

Agreed.

Also will be supported in 2.0, so just need a temporary patch for 1.4.x/1.5.x

jbrockmendel

jbrockmendel commented on Oct 12, 2022

@jbrockmendel
Member

cc @mroeschke @jreback this becomes more salient with non-nano support. pd.Series([1, 2, 3], dtype="m8[s]") i think ideally should interpret those integers as seconds, but without an API change it will interpret them as nanoseconds, then cast the result to m8[s]. Interpreting them as seconds would also be consistent with pd.Series([1, 2 , 3]).astype("m8[s]")

added
Non-Nanodatetime64/timedelta64 with non-nanosecond resolution
on Oct 12, 2022
mroeschke

mroeschke commented on Oct 12, 2022

@mroeschke
Member

That sounds reasonable; it also make it effectively similar to to_timedelta([ints], unit="s") which in spirit mangles "unit" and "reso" but may not matter.

jbrockmendel

jbrockmendel commented on Oct 14, 2022

@jbrockmendel
Member
jbrockmendel

jbrockmendel commented on Oct 14, 2022

@jbrockmendel
Member

possible deprecation cycles notwithstanding, my preferred behavior would be for pd.Series(some_ints, dtype="m8[unit]").to_numpy() to match np.array(some_ints, dtype="m8[unit]"). i'd do the same for dt64 dtypes.

jreback

jreback commented on Oct 14, 2022

@jreback
Contributor

proposal sounds good

4 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugNon-Nanodatetime64/timedelta64 with non-nanosecond resolutionTimedeltaTimedelta data type

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @jreback@jbrockmendel@gfyoung@mroeschke@ntachukwu

      Issue actions

        BUG: timedelta64[s] series constructor isn't equal with alternative constructor using to_timedelta unit='s' · Issue #48312 · pandas-dev/pandas