[Iceberg] Daft Dataframe Reader #170

Open
pdames opened this issue Aug 2, 2023 · 0 comments
Labels
iceberg This issue is related to Apache Iceberg catalog support

pdames (Member) commented Aug 2, 2023

To help improve the end-to-end efficiency of compaction and similar operations (e.g. hash-equality joins, dataset filters, and hash partitioning), and based on current benchmarks showing Daft I/O to be more performant than PyArrow and S3FS for S3 Parquet file reads, we would like to add a Daft-native Dataframe reader for Iceberg. Here, "Daft-native" refers to the desired end state, in which the implementation relies on no intermediate conversion through a Ray Dataset or any other intermediate format.

pdames self-assigned this on Aug 2, 2023
pdames added this to the "Compaction for Iceberg" milestone on Aug 2, 2023
pdames changed the title from "Daft Dataframe Reader for Iceberg" to "[Iceberg] Daft Dataframe Reader" on Aug 16, 2023
pdames added the iceberg label ("This issue is related to Apache Iceberg catalog support") on Aug 16, 2023