Avoiding DataFrame.apply unintended side effect when result_type is not specified.

According to the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html):

"In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects..."

Well, it is definitely there in the docs, but it took me several hours to trace the bug down to this "feature".
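The behaviour described in the quoted passage can be observed by giving `func` a visible side effect, here a call counter (the frame and the `add_cols` function are made up for illustration). On the pandas versions this issue was filed against, the counter is expected to exceed the number of rows by one. The pandas 1.1.0 release notes describe a change so that the first row/column is evaluated only once, so on newer versions the count should equal the number of rows.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

counter = {"calls": 0}  # mutable counter, so func can update it without `global`

def add_cols(row):
    counter["calls"] += 1  # side effect visible outside of apply
    return row["a"] + row["b"]

result = df.apply(add_cols, axis=1)

print(result.tolist())  # [11, 22, 33]
print(counter["calls"])  # 4 if the first row is evaluated twice, 3 otherwise
```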

So I think it would be cleaner either to fully support side effects in apply (e.g. by calling func on a copy of the first column/row during the testing phase) or to ban them completely, if that is technically possible.
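A minimal sketch of the first suggestion, using plain Python lists of dicts in place of a DataFrame so the idea is visible without pandas internals. The helper `apply_with_safe_probe` is hypothetical (not a pandas API): it runs `func` once on a deep copy of the first row to choose a code path, then calls `func` exactly once on each real row, so in-place edits made during the probe are thrown away.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

def apply_with_safe_probe(
    rows: List[Row], func: Callable[[Row], Any]
) -> Tuple[str, List[Any]]:
    """Hypothetical helper: probe on a copy, then apply once per row."""
    if not rows:
        return "empty", []
    # Probe phase: mutations func makes to this copy never reach rows[0].
    probe = func(copy.deepcopy(rows[0]))
    # Stand-in for pandas choosing between code paths based on the probe.
    kind = "expand" if isinstance(probe, dict) else "reduce"
    # Real pass: every original row is handed to func exactly once.
    results = [func(row) for row in rows]
    return kind, results

def bump(row: Row) -> int:
    row["visits"] += 1  # in-place mutation of the row, the side effect in question
    return row["x"] * 2

rows = [{"x": 1, "visits": 0}, {"x": 2, "visits": 0}]
kind, out = apply_with_safe_probe(rows, bump)

print(kind, out)  # reduce [2, 4]
print([r["visits"] for r in rows])  # [1, 1]
```

Copying only protects the row object itself. A side effect outside the row, such as incrementing a global counter or writing a log line, would still happen during the probe, which is one reason banning side effects is the other option on the table.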

I know there are plans to ban mutation inside `groupby.apply` (#12653).
I don't see any problem with mutation inside a (non-groupby) `apply` per se, but I may be wrong.

I should also note that the passage from the docs quoted above is not entirely accurate: if `result_type` is specified, the first row/column is not necessarily processed twice.
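The same counter trick can be used to check the `result_type` remark. This sketch passes `result_type="expand"` with a made-up `pair` function that returns a list, so each row becomes two columns. The expectation, assuming the double evaluation comes from the reduction fast path that an explicit non-reducing `result_type` skips, is that `func` runs exactly once per row; this has not been verified against every release.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

counter = {"calls": 0}

def pair(row):
    counter["calls"] += 1  # count every invocation of func
    return [row["a"], row["a"] * 2]  # list-like result, expanded into columns

expanded = df.apply(pair, axis=1, result_type="expand")

print(expanded.shape)  # (3, 2)
print(counter["calls"])  # expected to equal len(df)
```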

Avoiding DataFrame.apply unintended side effect when result_type is not specified. #24614

Activity

dsaxton commented on Jan 4, 2019

kefirbandi commented on Jan 5, 2019

fxjung commented on Apr 1, 2020

fxjung commented on Apr 1, 2020

rhshadrach commented on Apr 16, 2022
