[Iceberg] Daft Dataframe Reader #170

pdames · 2023-08-02T17:00:25Z

To help improve the end-to-end efficiency of compaction and similar operations (e.g., hash-equality joins, dataset filters, and hash partitioning), and based on current benchmarks showing Daft I/O to be more performant than PyArrow and S3FS for S3 Parquet file reads, we would like to add a Daft-native Dataframe reader for Iceberg. Here, "Daft-native" refers to the desired end state of the implementation, which relies on no intermediate conversion through a Ray Dataset or any other intermediate format.

pdames self-assigned this Aug 2, 2023

pdames added this to the Compaction for Iceberg milestone Aug 2, 2023

pdames changed the title ~~Daft Dataframe Reader for Iceberg~~ [Iceberg] Daft Dataframe Reader Aug 16, 2023

pdames mentioned this issue Aug 16, 2023

[Iceberg] AWS Glue Job Runner #190 (Open)

pdames added the iceberg label ("This issue is related to Apache Iceberg catalog support") Aug 16, 2023

raghumdani mentioned this issue Feb 22, 2024

Daft reader for DeltaCAT catalogs in general #265 (Open, 5 tasks)

