BUG: df.to_dict(orient="records") significantly slower in Pandas 1.3.0 #42352

@kyri-petrou

Using df.to_dict(orient="records") with large dataframes is significantly slower in pandas 1.3.0 vs 1.2.5.

Could you please advise on what might be causing this?

Test dataframe

<class 'pandas.core.frame.DataFrame'>
Int64Index: 100823 entries, 0 to 262141
Data columns (total 13 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   Class            100823 non-null  object 
 1   x                100823 non-null  float64
 2   y                100823 non-null  float64
 3   z                100823 non-null  float64
 4   rgb              100823 non-null  object 
 5   distance         100823 non-null  float64
 6   treecluster      100820 non-null  float64
 7   normal           100823 non-null  object 
 8   color            100823 non-null  object 
 9   rgb_distance     100823 non-null  object 
 10  responsibility   100820 non-null  object 
 11  vp_codes         100820 non-null  float64
 12  rgb_treecluster  100823 non-null  object 
dtypes: float64(6), object(7)
memory usage: 10.8+ MB

Simple timing test
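
A minimal sketch of such a timing test, assuming a small synthetic frame in place of the reporter's data. The column names, the row count, and the helpers `make_frame` and `time_to_dict` are illustrative and do not come from the original report; absolute timings will depend on the machine and the installed pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows: int = 10_000) -> pd.DataFrame:
    """Build a synthetic frame mixing float64 and object columns (hypothetical data)."""
    rng = np.random.default_rng(0)
    return pd.DataFrame(
        {
            "x": rng.random(n_rows),
            "y": rng.random(n_rows),
            # Object column of strings, similar in spirit to the object columns in the report.
            "label": rng.integers(0, 10, size=n_rows).astype(str).astype(object),
        }
    )


def time_to_dict(df: pd.DataFrame, repeat: int = 3) -> float:
    """Return the best wall-clock time (seconds) of a single to_dict call."""
    timings = timeit.repeat(
        lambda: df.to_dict(orient="records"), number=1, repeat=repeat
    )
    return min(timings)


df = make_frame()
elapsed = time_to_dict(df)
print(f"pandas {pd.__version__}: {elapsed:.3f} s for {len(df)} rows")
```

Running the same script in two environments (one with pandas 1.2.5, one with 1.3.0) would give a like-for-like comparison.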

Profiling

Pandas 1.2.5

         5547864 function calls (5547672 primitive calls) in 1.791 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1310699    0.568    0.000    0.831    0.000 cast.py:137(maybe_box_datetimelike)
  1411522    0.351    0.000    1.181    0.000 frame.py:1601(<genexpr>)
   100824    0.288    0.000    0.288    0.000 frame.py:1596(<genexpr>)
        1    0.280    0.280    1.759    1.759 frame.py:1600(<listcomp>)
  2621687    0.263    0.000    0.263    0.000 {built-in method builtins.isinstance}
        1    0.028    0.028    1.791    1.791 <string>:2(<module>)
   100823    0.010    0.000    0.010    0.000 {method 'items' of 'dict' objects}
     83/3    0.002    0.000    0.002    0.001 {built-in method _abc._abc_subclasscheck}
       13    0.000    0.000    0.001    0.000 indexing.py:782(_getitem_lowerdim)
       26    0.000    0.000    0.000    0.000 generic.py:5467(__setattr__)

Pandas 1.3.0

         35794233 function calls (35794206 primitive calls) in 15.844 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1310699    2.127    0.000   11.738    0.000 common.py:1578(_is_dtype_type)
   806581    2.013    0.000    5.492    0.000 base.py:425(find)
  1310696    1.942    0.000    8.020    0.000 common.py:1744(pandas_dtype)
  2218073    1.415    0.000    1.740    0.000 base.py:208(construct_from_string)
 12703947    1.410    0.000    1.410    0.000 {built-in method builtins.isinstance}
  1310699    1.066    0.000   14.387    0.000 cast.py:173(maybe_box_native)
  1310699    0.962    0.000   12.927    0.000 common.py:996(is_datetime_or_timedelta_dtype)
  1411522    0.598    0.000   14.985    0.000 frame.py:1823(<genexpr>)
        1    0.498    0.498   15.815   15.815 frame.py:1822(<listcomp>)
  1310699    0.344    0.000    0.535    0.000 common.py:146(<lambda>)
   100824    0.313    0.000    0.313    0.000 frame.py:1818(<genexpr>)
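
Profiles like the two above can be collected with the standard-library `cProfile` and `pstats` modules. The sketch below is one way to do it, not necessarily how the reporter produced the listings; the synthetic integer frame is an assumption standing in for the real data.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption, not the reporter's frame).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 1000, size=(10_000, 10)))

# Profile only the to_dict call.
profiler = cProfile.Profile()
profiler.enable()
records = df.to_dict(orient="records")
profiler.disable()

# Sort by time spent inside each function itself ("internal time")
# and keep the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by `tottime` is what produces the "Ordered by: internal time" header, which makes it easy to compare the hottest functions between pandas versions.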

Activity

Needs Triage label added on Jul 3, 2021

mzeitlin11 (Member) commented on Jul 11, 2021

Thanks for reporting and taking the time to profile this @kyri-petrou! Confirmed on current master with a simple reproducer like

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

data = rng.integers(0, 1000, size=(10000, 10))
df = pd.DataFrame(data)
df.to_dict(orient="records")  # 0.73s on master, 0.12s on 1.2.5 (under profiling conditions)

Investigations welcome!

Labels added on Jul 11, 2021: Dtype Conversions, Performance, Regression. Label removed: Needs Triage.
Added to the 1.3.1 milestone on Jul 11, 2021.

mzeitlin11 (Member) commented on Jul 11, 2021

This looks to be due to #37648; will look further soon.

self-assigned this on Jul 11, 2021