ENH: support pd.NA in "category" dtype #47982

Open
@devmcp

Description

Feature Type

  [x] Adding new functionality to pandas
  [ ] Changing existing functionality in pandas
  [ ] Removing existing functionality in pandas

Problem Description

I would like to be able to use pd.NA for missing data in a column of dtype "category".

Currently this:

pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")

converts the pd.NA to np.nan.
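
A minimal sketch of the behaviour described above, for anyone wanting to reproduce it. The variable names are illustrative only, and the exact dtype of the categories may differ between pandas versions; what the snippet checks is that the missing entry is not stored as a category but as the sentinel code -1.

```python
import pandas as pd

# Column built without an explicit dtype, then cast to "category" as in the report.
df = pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")
col = df["A"]

# Only the real values become categories; the missing entry gets code -1.
categories = list(col.cat.categories)
codes = col.cat.codes.tolist()
missing = col.isna().tolist()

print(categories, codes, missing)
```

Because the missing value lives only in the codes, how it is displayed depends on the dtype of the categories rather than on what was originally passed in.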

Feature Description

I think there should be a "category" dtype that supports pd.NA.

Alternative Solutions

I don't think there is a current workaround.

Additional Context

No response

Activity

phofl (Member) commented on Aug 5, 2022

Hi, thanks for your report. We already have this, through string-dtype:

pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
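
As a rough check of the suggestion above (a sketch, not an authoritative description of pandas internals), the snippet below inspects what the string-dtype route produces. The variable names are illustrative; the point is that the categories keep the nullable string dtype, while the missing entry is still tracked through the code -1.

```python
import pandas as pd

# Build the column with the nullable string dtype first, then cast to "category".
df = pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
col = df["A"]

# The categories Index carries the string dtype of the source column.
cat_dtype = col.cat.categories.dtype
codes = col.cat.codes.tolist()
n_missing = int(col.isna().sum())

print(cat_dtype, codes, n_missing)
```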

mzeitlin11 (Member) commented on Aug 5, 2022

Related to #29962

Labels added: Missing-data, Closing Candidate; label removed: Needs Triage (Aug 5, 2022)

devmcp (Author) commented on Aug 8, 2022

Thanks for your reply @phofl - I didn't quite appreciate how the category "type" sat on top of the actual type of the data. Thanks for linking @mzeitlin11 - good to see I'm not the only one who didn't find this entirely intuitive at first pass.

WillAyd (Member) commented on Jun 25, 2024

This is also a rather interesting problem for discussion under the scope of #58988.

Going to reopen to track this - having to go the .astype("category") route is pretty inefficient, but the alternative I think brings up some pitfalls:

>>> pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="category")
     A
0  one
1  two
2  NaN

Labels: Categorical, Enhancement, Missing-data

Participants: @WillAyd, @mroeschke, @mzeitlin11, @devmcp, @phofl