Support complex datatypes in Comet Scan #434

mattwparas · 2024-05-15T17:29:33Z

What is the problem the feature request solves?

As of right now, only primitive types are supported for Parquet scan. If any non-primitive types are detected in a sink node, Comet bails out of performing any transformations.

It would be great if Comet were able to handle relatively simple complex data types, like those supported for shuffle found here. Nested structs or maps from primitives to structs would also be helpful, but I'm not sure how much more complexity is involved beyond flat complex types.

Support for even more complex data types would also be helpful, but at a minimum, supporting these would enable Comet to perform optimizations on the current set of Spark jobs that I'm working with.

Describe the potential solution

Comet would be able to lower Spark operations to native operations when the schema contains complex data types. As a start, relatively simple complex data types such as those supported for shuffle would be great. This includes arrays of primitives, maps with primitives, and structs with primitives.
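
To make the requested scope concrete, here is a rough sketch in plain Python of how the tiers could be told apart. Everything in it is hypothetical: `Primitive`, `Array`, `Map`, `Struct`, `tier`, and `scan_can_run_natively` are illustrative stand-ins, not Comet or Spark APIs, and the all-or-nothing check is only my reading of the bail-out behavior described above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-ins for Spark SQL data types (not Comet or Spark classes).
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int", "string", "double"


@dataclass(frozen=True)
class Array:
    element: "DataType"


@dataclass(frozen=True)
class Map:
    key: "DataType"
    value: "DataType"


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "DataType"], ...]


DataType = Union[Primitive, Array, Map, Struct]


def children(dt):
    """Return the direct child types of a data type (empty for primitives)."""
    if isinstance(dt, Array):
        return [dt.element]
    if isinstance(dt, Map):
        return [dt.key, dt.value]
    if isinstance(dt, Struct):
        return [field_type for _, field_type in dt.fields]
    return []


def tier(dt):
    """Classify a type as 'primitive', 'flat' (complex with only primitive children), or 'nested'."""
    if isinstance(dt, Primitive):
        return "primitive"
    if all(isinstance(child, Primitive) for child in children(dt)):
        return "flat"
    return "nested"


def scan_can_run_natively(schema, allowed=("primitive",)):
    """Assumed all-or-nothing check: one column outside the allowed tiers forces a fallback."""
    return all(tier(field_type) in allowed for _, field_type in schema.fields)


INT, STR = Primitive("int"), Primitive("string")

# A schema with the three flat complex shapes named in this request.
flat_schema = Struct((
    ("id", INT),
    ("tags", Array(STR)),
    ("attrs", Map(STR, INT)),
    ("point", Struct((("x", INT), ("y", INT)))),
))

# A schema that goes one level deeper (map from primitive to struct).
nested_schema = Struct((
    ("id", INT),
    ("by_name", Map(STR, Struct((("x", INT),)))),
))
```

With only primitives allowed (the default), `scan_can_run_natively(flat_schema)` is false. Passing `allowed=("primitive", "flat")` models the minimum being asked for here: `flat_schema` then passes, while `nested_schema` still falls back.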

Additional context

To help guide the implementation, it would be helpful to know the difference between a type being supported in Parquet scan and being supported in shuffle, or at least to understand at a high level why certain types can be used in some operations but not others.

viirya · 2024-05-15T17:48:30Z

Actually, Comet columnar shuffle already supports some complex data types. You can find some tests using complex types in the Comet shuffle test suites.

But the Comet scan operator doesn't support complex types yet, so you cannot read complex-typed data from Parquet and run native operations on it. I think we also don't currently have any native expression that can produce output of complex types.

mattwparas added the enhancement New feature or request label May 15, 2024

viirya changed the title ~~Support complex datatypes~~ Support complex datatypes in Comet Scan May 15, 2024
