Closed
Description
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample:
import pandas as pd
df = pd.DataFrame({"date": ["2019-02-10", "2019-02-10", "2019-02-11"]})
df["date"] = pd.to_datetime(df["date"])
print("Type before the for cycle:")
print(type(df["date"][0])) # pandas._libs.tslibs.timestamps.Timestamp
for day in df["date"].unique():
    print("Type in the loop:")
    print(type(day))  # here is a numpy.datetime64
which returns:
Type before the for cycle:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Type in the loop:
<class 'numpy.datetime64'>
Type in the loop:
<class 'numpy.datetime64'>
Problem description
The function unique() should not cast the data type of the values it returns.
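As a rough illustration of where the two types come from (a sketch, not taken from the pandas source): the column is stored as datetime64 values, scalar access boxes them into Timestamp objects, and a raw NumPy array of the same data yields numpy.datetime64 scalars. The variable names `dates`, `first` and `raw` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "date" column from the report.
dates = pd.to_datetime(pd.Series(["2019-02-10", "2019-02-10", "2019-02-11"]))

# The column is stored with a datetime64 dtype, not as Timestamp objects.
print(dates.dtype)

# Scalar access boxes the stored value into a pandas Timestamp.
first = dates.iloc[0]
print(type(first))

# A raw NumPy array of the same data yields numpy.datetime64 scalars.
raw = dates.to_numpy()
print(type(raw[0]))

# Boxing a NumPy scalar by hand gives back an equal Timestamp.
print(pd.Timestamp(raw[0]) == first)
```

Under this reading, the type change reported here would come from unique() handing back the unboxed NumPy values rather than from a conversion of the stored data.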
Expected Output
The types of the values in df["date"].unique() should be the same as those in set(df["date"].to_list()). E.g.
import pandas as pd
df = pd.DataFrame({"date": ["2019-02-10", "2019-02-10", "2019-02-11"]})
df["date"] = pd.to_datetime(df["date"])
print("Type before the for cycle:")
print(type(df["date"][0]))
for day in set(df["date"].to_list()):
    print("Type in the loop:")
    print(type(day))
Returning:
Type before the for cycle:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Type in the loop:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Type in the loop:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
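Until the behaviour changes, one possible workaround is to avoid iterating over the bare result of unique(). The sketch below shows two options that should yield Timestamp objects and, unlike set(), keep the order of first appearance. The names `unique_days` and `unique_days_via_index` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-02-10", "2019-02-10", "2019-02-11"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: drop_duplicates() returns a Series, and iterating a datetime
# Series boxes each element as a Timestamp.
unique_days = list(df["date"].drop_duplicates())

# Option 2: wrap whatever unique() returns in a DatetimeIndex, whose
# elements are also boxed as Timestamp on iteration.
unique_days_via_index = list(pd.to_datetime(df["date"].unique()))

for day in unique_days:
    print(type(day))
```

Option 1 avoids the round trip through a NumPy array altogether, which may be preferable when the column carries extra information such as a time zone.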
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Darwin
OS-release : 19.5.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.0.5
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
Activity
jreback commented on Jul 29, 2020
there have been a number of discussions about this - pls look for duplicate issues before opening a new one
SebastianoX commented on Jul 29, 2020
Thanks for your answer @jreback.
Before posting I looked for duplicate issues / stackoverflow questions / google in general and I could not see any.
Please do link the discussions/issues here, so that I and other interested developers can find them.
If it is a duplicate feel free to close it.
simonjayhawkins commented on Jul 29, 2020
I'll close this since I think it is covered by #22824
jreback commented on Jul 29, 2020
though having a dedicated issue for this might be ok (as that catch all unique issue brings up many topics)
we cannot change this to return a DatetimeArray till 2.0 in any event (nor can we deprecate anything)
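For readers unfamiliar with DatetimeArray, here is a small sketch of what a DatetimeArray result would mean for iteration. It assumes that pd.array infers a DatetimeArray from datetime64 input; the `raw` and `boxed` names are made up for the example, and this is not a statement about how unique() is or will be implemented.

```python
import numpy as np
import pandas as pd

# Raw NumPy values, similar to what unique() returns in the report above.
raw = np.array(["2019-02-10", "2019-02-11"], dtype="datetime64[ns]")

# pd.array wraps the datetime64 data in a pandas extension array.
boxed = pd.array(raw)
print(type(boxed).__name__)

# Iterating the extension array boxes each element as a Timestamp.
for day in boxed:
    print(type(day))
```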
SebastianoX commented on Jul 29, 2020
@simonjayhawkins as far as I can tell, #22824 is a different issue.
The problem in the current issue is not that unique() returns an array. The problem is that the objects of type Timestamp in a column are cast to objects of type np.datetime64 in the NumPy array returned when unique() is invoked on this column.
SebastianoX commented on Jul 29, 2020
Let me add a clearer example:
The code returns:
simonjayhawkins commented on Jul 29, 2020
OK but I don't think that's clear from the OP. Feel free to open a new issue.
SebastianoX commented on Jul 29, 2020
You do not think it is clear as in "I think it is covered by #22824"?
Anyway, new issue is on its way.