Concat coerces ints to floats if empty DataFrame is present #8902

Closed
@EVaisman

Description

It seems that concat coerces a Series of type int to type float if one of the DataFrames being concatenated is empty:

>>> import pandas as pd
>>> df1 = pd.DataFrame([1], columns=['a'])
>>> df2 = pd.DataFrame(columns=['a'])
>>> df = pd.concat([df1, df2])
>>> df['a'].dtype
dtype('float64')
>>> df1['a'].dtype
dtype('int64')

If both columns contain ints, however, no coercion happens:

>>> df1 = pd.DataFrame([1],columns=['a'])
>>> df2 = pd.DataFrame([1],columns=['a'])
>>> df = pd.concat([df1, df2])
>>> df['a'].dtype
dtype('int64')

Activity

jreback (Contributor) commented on Nov 26, 2014

cc @immerrr

This is actually somewhat of a grey area. If you had passed a None, it would have been excluded, so no coercion would happen. The issue is that you passed a completely empty frame, which has object dtype. Technically this should coerce, though I can see the case for 'ignoring' it, since it doesn't have any valid shape.

In [21]: np.prod(df2.shape)
Out[21]: 0

In [22]: np.prod(df1.shape)
Out[22]: 1

In other words, I could see ignoring these in the pre-computation.
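To make the "ignore zero-size frames" idea concrete, here is a minimal user-side sketch. The helper name `concat_ignoring_empty` is hypothetical and is not part of pandas; it reuses the `np.prod(shape)` check shown above and drops `None` entries the same way. It only illustrates the proposal and does not describe how `pd.concat` is implemented.

```python
import numpy as np
import pandas as pd


def concat_ignoring_empty(frames):
    """Concatenate frames, skipping None and zero-size frames (hypothetical helper)."""
    frames = [f for f in frames if f is not None]
    # A frame with no rows or no columns has np.prod(shape) == 0.
    non_empty = [f for f in frames if np.prod(f.shape) > 0]
    if non_empty:
        return pd.concat(non_empty)
    # Nothing carries data: fall back to the first frame so its column labels survive.
    return frames[0].copy() if frames else pd.DataFrame()


df1 = pd.DataFrame([1], columns=["a"])
df2 = pd.DataFrame(columns=["a"])  # zero rows

result = concat_ignoring_empty([df1, df2, None])
print(result["a"].dtype)  # int64, because only df1 takes part in the concat
```

Note that this check also skips frames that have rows but no columns, and it discards any columns that exist only in the empty frame, which may or may not be what a caller wants.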

Labels added on Nov 26, 2014: Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (concat, merge/join, stack/unstack, explode)
Added to the 0.16.0 milestone on Nov 26, 2014

immerrr (Contributor) commented on Nov 27, 2014

I agree, it should coerce technically, but not practically.

immerrr (Contributor) commented on Nov 27, 2014

I have a lot on my plate right now, but if it waits till I'm done with that, I'll fix this.

Milestone changed from 0.16.0 to Next Major Release on Mar 6, 2015

a-y-khan (Contributor) commented on Apr 5, 2020

In a recent pandas version (v1.0.3), concat coerces ints to object instead of float:

In [1]: import pandas as pd                    

In [2]: pd.__version__                         
Out[2]: '1.0.3'

In [3]: df1 = pd.DataFrame([1],columns=['a'])  

In [4]: df1['a'].dtype                         
Out[4]: dtype('int64')

In [5]: df2 = pd.DataFrame(columns=['a'])      

In [6]: df2['a'].dtype                         
Out[6]: dtype('O')

In [7]: df = pd.concat([df1, df2])             

In [8]: df['a'].dtype                          
Out[8]: dtype('O')
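
One possible way to sidestep the object result, sketched here as an assumption rather than a recommendation from this thread, is to give the empty frame an explicit integer dtype so that both inputs already agree before the concat:

```python
import pandas as pd

df1 = pd.DataFrame([1], columns=["a"])

# Declare the empty column as int64 instead of letting it default to object.
df2 = pd.DataFrame({"a": pd.Series(dtype="int64")})

df = pd.concat([df1, df2])
print(df["a"].dtype)  # int64: both inputs share the same dtype
```

This only helps when the intended dtype of the empty frame is known in advance.
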

mroeschke (Member) commented on Apr 11, 2021

I think the above behavior was an intentional change a while back (an empty DataFrame has object dtype); therefore, the common dtype between object and int is object.

Closing as the new expected behavior, but happy to reopen if I am misunderstanding.
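
The "common dtype" reasoning can be illustrated with NumPy's own promotion rules. This is only an analogy: pandas resolves the common dtype with its own internal logic, and `np.result_type` is used here as a stand-in for it.

```python
import numpy as np
import pandas as pd

# An empty frame built from column labels alone; in the versions discussed
# in this thread its column has object dtype.
empty = pd.DataFrame(columns=["a"])
print(empty["a"].dtype)

# NumPy promotes int64 together with object to object.
common = np.result_type(np.dtype("int64"), np.dtype("object"))
print(common)  # object
```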

Labels: Bug, Dtype Conversions, Reshaping
