COMPAT: .map iterates over python types rather than storage type #13236

Closed
@glaucouri

Description

Code Sample, a copy-pastable example if possible

import pandas as P
S=P.Series([0.6,0.2,15])

pandas 0.18 + numpy 1.10:

In [1]: print S.dtype
float64

In [2]: print S.values.dtype
float64

In [3]: print S.map(type)
0    <type 'numpy.float64'>
1    <type 'numpy.float64'>
2    <type 'numpy.float64'>
dtype: object

pandas 0.18.1 + numpy 1.11.0:

In [5]: print S.dtype
float64

In [6]: print S.values.dtype
float64

In [7]: print S.map(type)
0    <type 'float'>
1    <type 'float'>
2    <type 'float'>
dtype: object

I expected all three prints to report the same dtype. Why did this change in the latest version?
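For readers on Python 3, a minimal sketch of the same check follows. It is not part of the original report: the names `s` and `element_types` are illustrative, and which element type `map(type)` reports depends on the installed pandas version, as the two sessions above show.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.6, 0.2, 15])

# Both the Series and its underlying ndarray report the storage dtype.
print(s.dtype)
print(s.values.dtype)

# The element type handed to the mapped function is what differs by version:
# numpy.float64 in the first session above, Python float in the second.
element_types = set(s.map(type))
print(element_types)
```

Collecting the types into a set makes it easy to see that every element is passed to the function as one and the same type.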

output of pd.show_versions()

pandas: 0.18.1
nose: 1.3.7
pip: 1.5.4
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2016.4
blosc: 1.2.7
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.5
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.4.0
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Thank you
Gla

Activity

jreback (Contributor) commented on May 20, 2016

#12473 fixed map, which was failing on non-numpy-aware dtypes. It now iterates over Python float values, so this is correct (the old behavior was essentially a bug).

cc @sinhrks

added the Dtype Conversions (unexpected or buggy dtype conversions) and Compat (pandas objects compatibility with Numpy or Python functions) labels on May 20, 2016
added this to the No action milestone on May 20, 2016
glaucouri (Author) commented on May 20, 2016

But the underlying data type IS a numpy dtype.

In this example it is reasonable to expect float32 in both cases, i.e. the same type.

In [4]: S.astype(P.np.float32).map(type)
Out[4]: 
0    <type 'float'>
1    <type 'float'>
2    <type 'float'>
dtype: object

In [5]: map(type, S.astype(P.np.float32))
Out[5]: [numpy.float32, numpy.float32, numpy.float32]

thank you
Gla

jreback (Contributor) commented on May 20, 2016

So before 0.18.1 we were NOT using .asobject and were instead iterating over the actual numpy types. Now the values are converted to Python types and then iterated.

In [1]: s = Series([1],dtype='float32')

In [2]: pd.lib.map_infer(s.values, type)
Out[2]: array([<type 'numpy.float32'>], dtype=object)

In [4]: pd.lib.map_infer(s.asobject, type)
Out[4]: array([<type 'float'>], dtype=object)

I think this is actually more correct (I don't think this was tested before), and we never guaranteed (either way) which types would result.
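The difference between the two calls above can be reproduced with numpy alone. This is only a sketch: it assumes that `.asobject` amounts to an object-dtype conversion of the underlying array, and the variable names are illustrative.

```python
import numpy as np

values = np.array([0.6, 0.2, 15], dtype=np.float32)

# Iterating the raw ndarray yields numpy scalars, i.e. the storage type.
storage_types = [type(v) for v in values]

# Converting to object dtype first boxes each element as a Python float,
# which is assumed here to mirror what .asobject does.
python_types = [type(v) for v in values.astype(object)]

print(storage_types)
print(python_types)
```

Whichever array is handed to the mapping routine therefore decides which type the mapped function sees.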

cc @sinhrks

jreback (Contributor) commented on May 20, 2016

I will reopen for discussion.

removed this from the No action milestone on May 20, 2016
changed the title from "strange float type promotion" to "COMPAT: .map iterates over python types rather than storage type" on May 20, 2016
sinhrks (Member) commented on May 21, 2016

Consider map on a datetime-like Series. Because np.datetime64 lacks the convenient properties of pd.Timestamp, map coerces from the numpy dtype to the pandas/Python type (Timestamp inherits from datetime.datetime).

s = pd.Series([pd.Timestamp('2011-01-01')])
s.dtype
# dtype('<M8[ns]')

s.map(type)
# 0    <class 'pandas.tslib.Timestamp'>
# dtype: object

I feel it is natural for numeric values to be coerced to their Python representation as well. Can you provide a use case in which np.float is preferable?
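A small sketch of why the coercion is convenient for datetime-likes. The variable names are illustrative, and the exact class path printed for Timestamp varies between pandas versions, so it is not shown here.

```python
import datetime

import numpy as np
import pandas as pd

raw = np.datetime64("2011-01-01")
ts = pd.Timestamp("2011-01-01")

# The numpy scalar has no calendar attributes; the pandas scalar does.
raw_has_year = hasattr(raw, "year")
ts_year = ts.year

# Timestamp is a datetime.datetime subclass.
ts_is_datetime = isinstance(ts, datetime.datetime)

# Because map hands the function Timestamp objects, attribute access works.
s = pd.Series([ts])
years = s.map(lambda t: t.year).tolist()

print(raw_has_year, ts_year, ts_is_datetime, years)
```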

jreback (Contributor) commented on May 21, 2016

Yeah, I think this only matters for types that are neither extension types nor i8-backed (e.g. int/float); strings/bools, extension types, and i8 are better off with Python (and pandas) types for sure.

I think it does make sense to return native types (float rather than np.float), as that has a lot more utility.
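One possible illustration of that utility, which is not taken from the thread: the standard-library json encoder accepts a Python float but rejects a numpy float32 scalar. The helper `can_serialize` is hypothetical and exists only for this sketch.

```python
import json

import numpy as np


def can_serialize(value):
    """Return True if json.dumps accepts the value, False on TypeError."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# A native Python float serializes without any custom encoder.
native_ok = can_serialize(0.5)

# np.float32 is not a subclass of Python float, so the encoder rejects it.
# (np.float64 does subclass float, which is why float32 is used here.)
numpy_ok = can_serialize(np.float32(0.5))

print(native_ok, numpy_ok)
```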

Let's update the doc-string to indicate this?

So I am repurposing this issue.

modified the milestones: 0.21.0, Next Major Release on Sep 10, 2017
added a commit that references this issue on Sep 10, 2017
86231bf
added 6 commits that reference this issue on Sep 11, 2017
2ebcbfc
805d6ef
18126ea
c0fd989
6a02e4f
05f8a6f
added a commit that references this issue on Sep 12, 2017
83436af
added a commit that references this issue on Nov 10, 2017
c7e4654
added a commit that references this issue on Nov 28, 2017
2414864
Metadata

Assignees

No one assigned

    Labels

    API Design; Compat (pandas objects compatibility with Numpy or Python functions); Docs; Dtype Conversions (unexpected or buggy dtype conversions)


      Participants

      @jreback, @jorisvandenbossche, @sinhrks, @glaucouri
