
[Question] pyiceberg 0.6.0 #350

Open
@gui-elastic

Description


Hello,

Recently, pyiceberg 0.6.0 was released, which allows writing Iceberg tables without needing tools like Spark or Trino.

I was about to write a custom plugin to implement the write feature. However, I see that when using the external materialization with a custom plugin, the output data is first stored locally and then read back and ingested into the final destination. For Iceberg and Delta this does not seem to be a good solution. It would be better to skip the local disk entirely: simply load an Arrow table in memory and write it to the final destination (e.g., S3 in Iceberg format), along the lines of the sketch below.
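A rough sketch of what I mean with the pyiceberg 0.6.0 write API (catalog name, endpoints, and table identifier are just placeholders, not the actual plugin config):

```python
# Sketch: write an in-memory Arrow table straight to an Iceberg table,
# without staging the data on local disk first.
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Data produced by the model, already in memory as an Arrow table.
arrow_table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Catalog configuration would come from the plugin settings (placeholders here).
catalog = load_catalog(
    "default",
    **{
        "uri": "http://localhost:8181",          # REST catalog endpoint (placeholder)
        "s3.endpoint": "http://localhost:9000",  # object store endpoint (placeholder)
    },
)

# Create the table on first run (pyiceberg can derive the Iceberg schema from
# the Arrow schema), then append on subsequent runs -- no local Parquet staging.
table = catalog.create_table("lakehouse.my_model", schema=arrow_table.schema)
table.append(arrow_table)
```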

I saw this thread: #332 (comment), so I would like to ask whether there is any ETA for implementing this feature. It would be an amazing feature even for production workloads with a Data Lakehouse architecture.

This comment explains well what needs to be fixed to use the Iceberg writer in the best way possible: #284 (comment)
