Support complex datatypes in Comet Scan #434

Open
mattwparas opened this issue May 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@mattwparas

What is the problem the feature request solves?

Right now, Comet's Parquet scan supports only primitive types. If any non-primitive type is detected in a sink node, Comet bails out of performing any transformations.
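To make the all-or-nothing behavior concrete, here is a minimal sketch of the kind of check being described. It is not Comet's actual code: the `PRIMITIVES` set, the string type names (borrowed from Spark SQL's simple type strings), and the function `scan_can_run_natively` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical list of primitive type names; NOT taken from Comet's source.
PRIMITIVES = {
    "boolean", "byte", "short", "int", "long",
    "float", "double", "string", "binary", "date", "timestamp",
}


def scan_can_run_natively(schema: Dict[str, str]) -> bool:
    """Return True only if every column in the schema is a primitive.

    A single non-primitive column (array, map, struct) makes the whole
    scan fall back, which mirrors the bail-out described above.
    """
    return all(type_name in PRIMITIVES for type_name in schema.values())


# One complex column is enough to disable the native path in this model.
print(scan_can_run_natively({"id": "long", "name": "string"}))
print(scan_can_run_natively({"id": "long", "tags": "array<string>"}))
```

In this model the check is per schema rather than per column, which is why a single complex column affects the whole scan.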

It would be great if Comet could handle relatively simple complex data types, like those supported for shuffle found here. Nested structs, or maps from primitives to structs, would also be helpful, but I'm not sure how much complexity they add beyond flat complex types.

Support for even more complex data types would also be helpful, but at a minimum, supporting these would enable Comet to optimize the current set of Spark jobs that I'm working with.

Describe the potential solution

Comet would be able to lower Spark operations to native operations when the schema contains complex data types. As a start, relatively simple complex data types, such as those supported for shuffle, would be great. This includes arrays of primitives, maps with primitives, and structs with primitives.
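The distinction between "flat" complex types (arrays, maps, and structs built directly from primitives) and nested ones can be sketched as follows. This is an illustrative model only: the classes `Primitive`, `Array`, `Map`, `Struct` and the function `is_flat_complex` are hypothetical names, not Comet or Spark APIs.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Primitive:
    name: str


@dataclass(frozen=True)
class Array:
    element: "DataType"


@dataclass(frozen=True)
class Map:
    key: "DataType"
    value: "DataType"


@dataclass(frozen=True)
class Struct:
    fields: Tuple["DataType", ...]


DataType = Union[Primitive, Array, Map, Struct]


def is_flat_complex(t: DataType) -> bool:
    """True for a complex type whose direct children are all primitives."""
    if isinstance(t, Array):
        return isinstance(t.element, Primitive)
    if isinstance(t, Map):
        return isinstance(t.key, Primitive) and isinstance(t.value, Primitive)
    if isinstance(t, Struct):
        return all(isinstance(f, Primitive) for f in t.fields)
    return False  # a bare primitive is not a complex type at all


# Array of primitives: flat. Map from a primitive to a struct: nested.
flat = Array(Primitive("int"))
nested = Map(Primitive("string"), Struct((Primitive("int"),)))
print(is_flat_complex(flat), is_flat_complex(nested))
```

Under this model, the initial request covers the types for which `is_flat_complex` returns true, while maps from primitives to structs and nested structs fall into the "would also be helpful" category.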

Additional context

To help guide the implementation, it would be helpful to know the difference between a type being supported in Parquet scan and being supported in shuffle, or at least to understand at a high level why certain types can be used in some operations but not in others.

@mattwparas mattwparas added the enhancement New feature or request label May 15, 2024
@viirya
Member

viirya commented May 15, 2024

Actually, Comet columnar shuffle already supports some complex data types. You can find tests that use complex types in the Comet shuffle test suites.

But the Comet scan operator doesn't support complex types yet, so you cannot read complex-typed data from Parquet and run native operations on it. I think we also don't currently have any native expression that can produce output of complex types.

@viirya viirya changed the title Support complex datatypes Support complex datatypes in Comet Scan May 15, 2024