BUG/API: Index constructor does not enforce specified dtype  #21311

@jorisvandenbossche

Description

Code Sample, a copy-pastable example if possible

Manually specifying a dtype does not guarantee that the output has that dtype. For example, Series raises an error when incompatible data is passed, whereas Index silently returns a different dtype:

In [11]: pd.Series(['a', 'b', 'c'], dtype='int64')
...
ValueError: invalid literal for int() with base 10: 'a'

In [12]: pd.Index(['a', 'b', 'c'], dtype='int64')
Out[12]: Index(['a', 'b', 'c'], dtype='object')
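
Until the constructor enforces this itself, a caller can check the result explicitly. The following is only a minimal sketch: `strict_index` is a hypothetical helper name, not part of pandas, and it assumes that comparing the resulting `dtype` against the normalised requested dtype is a sufficient test. On pandas versions where the constructor already raises for incompatible data, the extra check is redundant but harmless.

```python
import pandas as pd
from pandas.api.types import pandas_dtype


def strict_index(data, dtype, name=None):
    """Hypothetical helper: build an Index and fail if the dtype was not honoured."""
    idx = pd.Index(data, dtype=dtype, name=name)
    # Normalise the request (e.g. the string 'int64') to a dtype object.
    requested = pandas_dtype(dtype)
    if idx.dtype != requested:
        raise TypeError(
            f"requested dtype {requested!r}, but Index has dtype {idx.dtype!r}"
        )
    return idx


# Compatible data passes through unchanged.
ok = strict_index([1, 2, 3], "int64")

# Incompatible data raises instead of silently falling back to another dtype.
try:
    strict_index(["a", "b", "c"], "int64")
except (TypeError, ValueError) as exc:
    print("rejected:", exc)
```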

Activity

jschendel commented on Jun 4, 2018

xref #21254 (comment)

I meant to create a similar issue for the Categorical --> Interval case:

In [2]: cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])

In [3]: pd.Index(cat, dtype='interval')
Out[3]: CategoricalIndex([(0, 1], (1, 2], (0, 1]], categories=[(0, 1], (1, 2]], ordered=False, dtype='category')

This happens because the Index code is structured so that categorical takes precedence over interval:

    # categorical
    if is_categorical_dtype(data) or is_categorical_dtype(dtype):
        from .category import CategoricalIndex
        return CategoricalIndex(data, dtype=dtype, copy=copy, name=name,
                                **kwargs)
    # interval
    if is_interval_dtype(data) or is_interval_dtype(dtype):
        from .interval import IntervalIndex
        closed = kwargs.get('closed', None)
        return IntervalIndex(data, dtype=dtype, name=name, copy=copy,
                             closed=closed)

The code above could be restructured so that the dtype argument, if present, takes precedence over the type of data.
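
As a rough illustration of that ordering, here is a pure-Python sketch that models only the dispatch decision. Nothing in it is pandas code: `pick_index_class`, the `data_kind` strings, and the returned class names are hypothetical stand-ins for the `is_*_dtype` checks and the real index classes.

```python
from typing import Optional


def pick_index_class(data_kind: str, dtype: Optional[str] = None) -> str:
    """Return the name of the index class a dtype-first dispatch would choose.

    data_kind stands in for the type of the data ('categorical', 'interval',
    or anything else); dtype stands in for the user-supplied dtype argument.
    """
    # 1. An explicit dtype, if given, decides the outcome first.
    if dtype is not None:
        if dtype == "category":
            return "CategoricalIndex"
        if dtype == "interval":
            return "IntervalIndex"
        return "Index"  # other dtypes are not modelled in this sketch

    # 2. Only without an explicit dtype does the type of the data decide.
    if data_kind == "categorical":
        return "CategoricalIndex"
    if data_kind == "interval":
        return "IntervalIndex"
    return "Index"


# Categorical data with dtype='interval' now resolves to the interval branch.
print(pick_index_class("categorical", dtype="interval"))
# Without a dtype, categorical data still yields a categorical index.
print(pick_index_class("categorical"))
```

The sketch covers only which branch wins; whether the chosen class can actually convert the data is a separate question.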

Not immediately sure what the fix for other scenarios entails, as Categorical and Interval are a bit of a special case.

added the Indexing label (Related to indexing on series/frames, not to indexes themselves) on Jun 15, 2018
added the Index label (Related to the Index class or subclasses) and removed the Indexing label (Related to indexing on series/frames, not to indexes themselves) on Jun 29, 2019
added this to the 1.3 milestone on Dec 21, 2020
Metadata

Assignees: No one assigned
Labels: Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Dtype Conversions (Unexpected or buggy dtype conversions); Index (Related to the Index class or subclasses)
Type: No type
Projects: No projects
Relationships: None yet
Participants: @jreback, @jorisvandenbossche, @toobaz, @jschendel, @jbrockmendel
