Open
Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

s1 = pd.Series({"a": np.int64(64), "b": 10})
for v in s1.to_dict().values():
    print(type(v))  # prints <class 'int'> 2x

s2 = pd.Series({"a": np.int64(64), "b": 10, "c": "ABC"})
for v in s2.to_dict().values():
    print(type(v))  # prints <class 'numpy.int64'> for first variable "a"

for k, v in s1.items():
    print(k, type(v))  # prints <class 'int'> 2x

for k, v in s2.items():
    print(k, type(v))  # prints <class 'numpy.int64'> again for the first variable "a"
Problem description
pd.Series.to_dict can return different types for objects depending on the composition of the series. This also affects iteration, e.g., for k, v in series.items(): .... This is inconsistent and, critically, leads to weird and hard-to-debug type issues downstream, especially around JSON conversion (the built-in json module and many others will blow up when they encounter numpy scalar types such as numpy.int64).
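As a stopgap on the caller's side, the values can be coerced to built-in Python types before they reach the JSON encoder. The following is only a minimal sketch: `to_native_dict` is a hypothetical helper name, not a pandas API, and it assumes that calling `.item()` on any numpy scalar (`np.generic`) gives an acceptable Python equivalent.

```python
import json

import numpy as np
import pandas as pd


def to_native_dict(series):
    """Return a dict whose numpy scalar values are converted to Python scalars.

    Hypothetical helper for illustration; not part of pandas.
    """
    return {
        key: value.item() if isinstance(value, np.generic) else value
        for key, value in series.items()
    }


s2 = pd.Series({"a": np.int64(64), "b": 10, "c": "ABC"})
native = to_native_dict(s2)

# Every value is now a built-in type, so the standard encoder accepts it.
encoded = json.dumps(native)
print(encoded)
```

The helper leaves values that are already built-in types untouched, so it should be safe to apply regardless of whether a given pandas version returns numpy or Python scalars.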
I cannot find this exact issue open in the issue tracker, though there are a number of related issues including:
- An issue related to DataFrame.to_dict and inconsistent types (closed in 0.24): to_dict doesn't convert np.int64 to python integers (second try) #24908
- This issue also related to scalar coercion on DataFrame.to_dict calls (also closed recently): DataFrame.to_dict returning numpy scalars in certain cases #23753
- This PR fixes the issue deriving from iteration, but it looks like the above case is either an untested edge case or a regression: COMPAT: Iteration should always yield a python scalar #17491
Expected Output
Expected output is for type coercion to Python ints to occur regardless of the exact column composition in the Series. #24908 is a related issue for DataFrame coercions with irregular behavior happening as a result.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
Activity
mroeschke commented on Apr 4, 2019
This probably occurs because s2 is object dtype and it's trying to preserve the dtype of each input argument while the arguments in s1 can both be coerced to int64.
Investigation and PR's welcome~
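The dtype difference described here can be checked directly. This is a small sketch using the two Series from the report; the `ints_only` name and the explicit cast are illustrative additions, not something proposed in the thread.

```python
import numpy as np
import pandas as pd

s1 = pd.Series({"a": np.int64(64), "b": 10})
s2 = pd.Series({"a": np.int64(64), "b": 10, "c": "ABC"})

# All-integer input is stored in a single homogeneous integer array.
print(s1.dtype)

# Mixing in a string forces object dtype: each element is kept as its own object.
print(s2.dtype)

# Selecting only the integer entries of s2 keeps object dtype;
# an explicit cast is needed to get back to a homogeneous integer array.
ints_only = s2[["a", "b"]].astype("int64")
print(ints_only.dtype)
```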
drew-heenan commented on Apr 7, 2019
I'm having a go at this issue - quick note @boydgreenfield, it looks like iterating over a Series object as in the last two loops in your example results in an iteration only over the int values in the Series. Did you mean to iterate over s1.items() or similar?
boydgreenfield commented on Apr 8, 2019
@drew-heenan Yes, you're right, I meant .items(). Have updated the above code snippet. Thanks for taking a look at the issue!
Patch Potion's JSONEncoder to handle numpy int and float types. Closes …
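For readers who hit the same JSON error, a common encoder-side workaround looks roughly like the sketch below. It is not the code from the linked commit; `NumpyJSONEncoder` is a hypothetical class name, and the sketch assumes that only numpy integer and floating scalars need handling.

```python
import json

import numpy as np


class NumpyJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts numpy int and float scalars."""

    def default(self, o):
        # default() is only called for objects the base encoder cannot handle.
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        return super().default(o)


payload = {"a": np.int64(64), "b": np.float32(1.5), "c": "ABC"}
encoded = json.dumps(payload, cls=NumpyJSONEncoder)
print(encoded)
```

Anything the encoder does not recognize still falls through to the base class, which raises the usual TypeError.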