Description
With the next release of Pandas, it will be possible to define custom column types that back a pandas.Series. As a result, the to_pandas conversion cannot cover every possible column type by default, because we won't be aware of all extension arrays.
To enable users to create ExtensionArray instances from Arrow columns in the to_pandas conversion, we should provide a hook in the to_pandas call where they can override the default conversion routines with ones that produce their ExtensionArray instances.
This should avoid the additional copies incurred today, where we first convert the Arrow column into a default Pandas column (probably of object type) and the user afterwards converts it to a more efficient ExtensionArray. The hook will be especially useful when you build ExtensionArrays whose storage is backed by Arrow.
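To make the proposed hook concrete, here is a minimal, dependency-free sketch of the dispatch idea. It is not pyarrow's API: the names to_pandas_sketch, converters, MaskedIntArray and to_masked_int are all hypothetical, and a "column" is modelled as a plain (type name, values) pair rather than a real Arrow column. The point is only that a user-supplied converter is looked up per column type, and the default conversion is used when none is registered.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins; none of these names are pyarrow or pandas API.
Column = Tuple[str, List[Any]]          # (type name, values)
Converter = Callable[[List[Any]], Any]  # values -> converted column


class MaskedIntArray:
    """Toy stand-in for a user-defined ExtensionArray with a validity mask."""

    def __init__(self, values: List[int], mask: List[bool]) -> None:
        self.values = values
        self.mask = mask  # True where the original value was missing


def default_convert(values: List[Any]) -> List[Any]:
    # Default path: a plain list, analogous to falling back to object dtype.
    return list(values)


def to_masked_int(values: List[Optional[int]]) -> MaskedIntArray:
    # User-supplied converter: builds the custom array directly from the
    # column values, skipping the intermediate default representation.
    mask = [v is None for v in values]
    data = [0 if v is None else v for v in values]
    return MaskedIntArray(data, mask)


def to_pandas_sketch(
    columns: Dict[str, Column],
    converters: Optional[Dict[str, Converter]] = None,
) -> Dict[str, Any]:
    """Convert each column, preferring a user converter for its type."""
    converters = converters or {}
    out: Dict[str, Any] = {}
    for name, (type_name, values) in columns.items():
        convert = converters.get(type_name, default_convert)
        out[name] = convert(values)
    return out


table = {"a": ("int64", [1, None, 3]), "b": ("string", ["x", "y", None])}
result = to_pandas_sketch(table, converters={"int64": to_masked_int})
```

Here column a goes through the user converter and comes back as a MaskedIntArray, while column b has no registered converter and takes the default path. Keying the lookup on the column type is one possible design; a real hook could equally key on column name or schema metadata.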
The meta-issue that tracks the implementation inside of Pandas is: pandas-dev/pandas#19696
Reporter: Uwe Korn / @xhochy
Assignee: Joris Van den Bossche / @jorisvandenbossche
Related issues:
- [Python] support pandas' nullable Integer type in from_pandas (blocks)
- [Python] Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions (relates to)
- [Python] Interface for converting pandas ExtensionArray / other custom array objects to pyarrow Array (relates to)
- [Python] Provide Python API for creating user-defined data types that can survive Arrow IPC (depends upon)
Note: This issue was originally created as ARROW-2428. Please see the migration documentation for further details.