
Copy-on-Write (PDEP-7) follow-up overview issue #48998 · pandas-dev/pandas

Open
@jorisvandenbossche

Description


PDEP-7: https://pandas.pydata.org/pdeps/0007-copy-on-write.html

An initial implementation was merged in #46958. The proposal is described in more detail in https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit and was discussed in #36195.

In a comment on #36195 I listed some next steps that are still needed; I am moving them to this new issue.
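For readers new to the proposal: the core rule of PDEP-7 is that any DataFrame or Series derived from another one behaves as a copy, so modifying one never modifies the other. Below is a minimal sketch of that contract. It assumes pandas 1.5 or later, where CoW is opt-in through the `mode.copy_on_write` option in 1.5/2.x. If that option does not exist, the snippet assumes CoW is already the default.

```python
import pandas as pd

# Opt in to Copy-on-Write. Assumption: pandas 1.5-2.x expose it as the
# "mode.copy_on_write" option; if the option does not exist, we assume CoW
# is already the default behaviour.
try:
    pd.set_option("mode.copy_on_write", True)
except KeyError:  # pandas raises OptionError, a KeyError subclass
    pass

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# 1) Modifying a derived object never modifies its parent.
col = df["a"]          # behaves as a copy (memory may be shared lazily)
col.iloc[0] = 100      # this write triggers the actual copy
print(df.loc[0, "a"])  # 1

# 2) Modifying the parent never modifies a derived object.
col_b = df["b"]
df.iloc[0, 1] = -4     # the parent copies its data before writing
print(col_b.iloc[0])   # 4
```

Without CoW, the first write would typically propagate back into `df`. That inconsistency is what the upgrade path below is meant to warn about.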

Implementation

Complete the API surface:

Improve the performance:

  • Optimize setitem operations to avoid copying whole blocks (e.g. splitting the block could keep a view for all other columns, so that we only take a copy of the columns that are modified)
  • Check the overall performance impact (e.g. run asv with and without CoW enabled by default and compare the results)
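To make the block-splitting idea concrete, here is a toy model in plain NumPy. `setitem_with_split` is a hypothetical helper and not pandas internals. It assumes a 2D block laid out as (n_columns, n_rows). Instead of copying the whole block before a write, it returns the untouched columns as views and allocates new memory only for the modified column.

```python
import numpy as np


def setitem_with_split(block: np.ndarray, col: int, values) -> list:
    """Toy model of a copy-avoiding setitem on a 2D block.

    Hypothetical helper, not a pandas API. The block is assumed to be laid
    out as (n_columns, n_rows). Untouched columns stay views on the original
    memory; only the modified column gets new memory.
    """
    new_col = np.array(values, dtype=block.dtype).reshape(1, -1)  # always a fresh copy
    if new_col.shape[1] != block.shape[1]:
        raise ValueError("values must have one entry per row")
    before = block[:col]      # view on the columns left of `col`
    after = block[col + 1:]   # view on the columns right of `col`
    # Drop empty pieces (when the first or last column is modified).
    return [part for part in (before, new_col, after) if part.shape[0] > 0]


block = np.arange(12).reshape(3, 4)  # 3 columns x 4 rows
parts = setitem_with_split(block, 1, [0, 0, 0, 0])

print([p.shape for p in parts])  # [(1, 4), (1, 4), (1, 4)]
print(np.shares_memory(parts[0], block), np.shares_memory(parts[1], block))  # True False
print(block[1])  # the original block is untouched: [4 5 6 7]
```

The trade-off is more, smaller blocks: a write copies one column instead of the whole block. Later operations may then need to consolidate the pieces again.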

Provide an upgrade path:

  • Add a warning mode that gives deprecation warnings for all cases where the current behaviour would change (initially also behind an option): CoW warning mode for cases that will change behaviour #56019
    • We can also update the message of the existing SettingWithCopyWarnings to point users towards enabling CoW as a way to get rid of the warnings
    • Add a general FutureWarning, raised only a single time, on the first use whose behaviour would change
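The "warn only once" idea can be sketched with a module-level flag. The helper name, flag, and message below are hypothetical illustrations, not the actual pandas implementation or warning text. The `"always"` filter in the usage part makes sure it is the flag, not Python's default warning deduplication, that limits the output to a single warning.

```python
import warnings

_cow_warning_emitted = False  # hypothetical module-level flag


def warn_if_behaviour_changes() -> None:
    """Hypothetical helper: emit a general FutureWarning only on first use."""
    global _cow_warning_emitted
    if _cow_warning_emitted:
        return
    _cow_warning_emitted = True
    warnings.warn(
        "This operation will behave differently under Copy-on-Write; "
        "enable Copy-on-Write to opt in to the new behaviour now.",
        FutureWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # disable Python's own deduplication
    for _ in range(3):
        warn_if_behaviour_changes()

print(len(caught))  # 1
```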

Documentation / feedback

Aside from finalizing the implementation, we also need to start documenting this. It will also be very useful to have people try it out (run their code or test suites with it, etc.), so that we can iron out bugs and missing warnings, and discover unexpected consequences that need to be addressed or discussed.

  • Document this new feature (how it works, how you can test it)
    • We can still add a note to the 1.5 whatsnew linking to those docs
  • Write a set of blog posts on the topic
  • Gather feedback from users / downstream packages
  • Update existing documentation:
  • Write an upgrade guide

Some remaining aspects of the API to figure out:

  • What to do with the Series.view() method -> is deprecated
  • Let head()/tail() return eager copies? (so that using those methods for exploration does not later trigger a CoW copy when the parent is modified) -> API/CoW: Return copies for head and tail #54011
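Whichever way #54011 is resolved, the user-visible contract under CoW stays the same: a `head()` result is never affected by later writes to its parent. What differs is the cost. If `head()` shared memory lazily, the parent's write below would first have to copy the parent's data. An eager copy of the few preview rows avoids that. A minimal sketch, assuming pandas 1.5 or later (CoW opt-in via `mode.copy_on_write` in 1.5/2.x, otherwise assumed to be the default):

```python
import pandas as pd

try:
    pd.set_option("mode.copy_on_write", True)  # opt-in on pandas 1.5-2.x
except KeyError:
    pass  # option missing: assume CoW is already the default

df = pd.DataFrame({"a": range(1_000), "b": range(1_000)})

# Exploring the data: under CoW, head() must behave like a copy.
preview = df.head(3)

# Modifying the parent afterwards must not change the preview. With a lazy
# (memory-sharing) head(), this write would first copy the parent's data;
# with an eager copy of the 3 preview rows, no such copy is needed.
df.iloc[0, 0] = -1

print(preview.iloc[0, 0])  # 0
print(df.iloc[0, 0])       # -1
```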



Participants: @jorisvandenbossche, @bashtage
