Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
from pandas import Series
import pandas._testing as tm
result = Series([1000000, 200000, 3000000], dtype="timedelta64[s]")
expected = Series(pd.to_timedelta([1000000, 200000, 3000000], unit="s"))
tm.assert_series_equal(result, expected)
Issue Description
This code passes
result = Series([1000000, 200000, 3000000], dtype="timedelta64[ns]")
expected = Series(pd.to_timedelta([1000000, 200000, 3000000], unit="ns"))
tm.assert_series_equal(result, expected)
But with dtype="timedelta64[s]" and unit="s", the assertion fails with:
AssertionError: numpy array are different
numpy array values are different (100.0 %)
[index]: [0, 1, 2]
[left]: [1000000, 200000, 3000000]
[right]: [1000000000000000, 200000000000000, 3000000000000000]
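For reference, the numbers in the [right] line are the same durations stored as nanosecond counts (1 s = 10^9 ns). A minimal NumPy-only sketch that reproduces them; it does not call pandas, it only shows what a seconds interpretation looks like at nanosecond resolution:

```python
import numpy as np

# Integers meant as seconds, as in the reproducible example.
values = [1000000, 200000, 3000000]

# Interpret the integers as seconds, then convert to nanosecond resolution.
as_seconds = np.array(values, dtype="timedelta64[s]")
as_nanos = as_seconds.astype("timedelta64[ns]")

# The underlying int64 counts match the [right] line of the assertion error.
print(as_nanos.view("i8").tolist())
# [1000000000000000, 200000000000000, 3000000000000000]
```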
Expected Behavior
Both series should be equal.
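Until the constructor handles this, one possible workaround (our suggestion, not something proposed in this thread) is to build the NumPy array with the intended unit first and pass that to Series, so the integers are unambiguously seconds. A sketch, comparing durations rather than dtypes because the stored resolution may differ between pandas versions:

```python
import numpy as np
import pandas as pd

values = [1000000, 200000, 3000000]

# Build the array with an explicit seconds unit first, so NumPy (not the
# Series constructor) decides how the integers are interpreted.
result = pd.Series(np.array(values, dtype="timedelta64[s]"))
expected = pd.Series(pd.to_timedelta(values, unit="s"))

# Compare durations rather than dtypes: depending on the pandas version,
# `result` may keep timedelta64[s] while `expected` is timedelta64[ns].
same = np.array_equal(
    result.to_numpy().astype("timedelta64[ns]"),
    expected.to_numpy().astype("timedelta64[ns]"),
)
print(same)  # True
```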
Installed Versions
INSTALLED VERSIONS
commit : 201cbf6
python : 3.9.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.5.0.dev0+1364.g201cbf6bc1.dirty
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 60.9.3
pip : 22.1.1
Cython : 0.29.32
pytest : 7.1.2
hypothesis : 6.52.3
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.3.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
Activity
phofl commented on Sep 1, 2022
Simple reproducer:
This should pass, but we seem to ignore the seconds unit and interpret the integers as nanoseconds.
phofl commented on Sep 1, 2022
This is currently not supported, and imo it should raise rather than return buggy conversions.
cc @jbrockmendel
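A minimal sketch of such a guard, assuming a hypothetical helper `reject_non_nano_timedelta` that would run before the constructor converts integer data. The helper name, the choice of `NotImplementedError`, and the message are our own assumptions, not pandas code:

```python
import numpy as np


def reject_non_nano_timedelta(data, dtype):
    """Hypothetical guard: raise instead of silently reinterpreting integers.

    Sketch only; this is not pandas' actual implementation.
    """
    dtype = np.dtype(dtype)
    if dtype.kind != "m":
        return  # not a timedelta dtype, nothing to check
    unit, _count = np.datetime_data(dtype)  # e.g. ("s", 1) for m8[s]
    # Only integer input is ambiguous; timedelta input already carries a unit.
    is_integer_data = np.asarray(data).dtype.kind in "iu"
    if is_integer_data and unit != "ns":
        raise NotImplementedError(
            f"integer data with dtype={dtype} is not supported; "
            "use pd.to_timedelta(data, unit=...) instead"
        )
```

With this guard, `reject_non_nano_timedelta([1, 2, 3], "timedelta64[ns]")` returns quietly, while the same call with `"timedelta64[s]"` raises instead of producing the buggy values shown in the issue.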
jbrockmendel commented on Sep 2, 2022
Agreed.
This will also be supported in 2.0, so we just need a temporary patch for 1.4.x/1.5.x.
jbrockmendel commented on Oct 12, 2022
cc @mroeschke @jreback this becomes more salient with non-nano support.
pd.Series([1, 2, 3], dtype="m8[s]")
I think this should ideally interpret those integers as seconds, but without an API change it will interpret them as nanoseconds and then cast the result to m8[s]. Interpreting them as seconds would also be consistent with pd.Series([1, 2, 3]).astype("m8[s]").
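The two readings differ sharply for small integers. A NumPy-only sketch of what each interpretation would produce for [1, 2, 3]; it models the two behaviours described in the comment and does not call the pandas constructor:

```python
import numpy as np

values = [1, 2, 3]

# Reading 1: the integers are seconds (what NumPy does with dtype="m8[s]").
as_seconds = np.array(values, dtype="m8[s]")

# Reading 2: the integers are nanoseconds, then cast to seconds. Casting to
# the coarser unit truncates 1, 2 and 3 ns down to 0 s.
ns_then_cast = np.array(values, dtype="m8[ns]").astype("m8[s]")

print(as_seconds.view("i8").tolist())    # [1, 2, 3]
print(ns_then_cast.view("i8").tolist())  # [0, 0, 0]
```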
mroeschke commented on Oct 12, 2022
That sounds reasonable; it also makes it effectively similar to
to_timedelta([ints], unit="s"),
which in spirit mangles "unit" and "reso", but that may not matter.
jbrockmendel commented on Oct 14, 2022
cc @jreback
jbrockmendel commented on Oct 14, 2022
Possible deprecation cycles notwithstanding, my preferred behavior would be for
pd.Series(some_ints, dtype="m8[unit]").to_numpy()
to match np.array(some_ints, dtype="m8[unit]").
I'd do the same for dt64 dtypes.
jreback commented on Oct 14, 2022
proposal sounds good
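The proposal above can be phrased as a check. A sketch, assuming a hypothetical helper `matches_numpy` (our own name) that compares the Series constructor against NumPy for the same integers and unit. It compares durations at nanosecond resolution because pandas versions differ in the resolution they store, and it treats a constructor that refuses the dtype as "does not match":

```python
import numpy as np
import pandas as pd


def matches_numpy(values, unit):
    """Hypothetical check: does pd.Series(values, dtype=f"m8[{unit}]")
    hold the same durations as np.array(values, dtype=f"m8[{unit}]")?"""
    dtype = f"m8[{unit}]"
    reference = np.array(values, dtype=dtype)
    try:
        from_pandas = pd.Series(values, dtype=dtype).to_numpy()
    except (TypeError, ValueError, NotImplementedError):
        return False  # the constructor refuses this dtype
    # Normalize both sides to nanoseconds before comparing durations.
    return bool(
        np.array_equal(
            from_pandas.astype("m8[ns]"), reference.astype("m8[ns]")
        )
    )
```

Calling `matches_numpy([1, 2, 3], "s")` on a given pandas install shows whether that version follows the proposed behavior; the result is version dependent, while the `"ns"` case should hold everywhere.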