-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support complex datatypes in Comet Scan #434
Labels
enhancement
New feature or request
Comments
Actually Comet columnar shuffle already supports some complex data types. You can find some tests using complex types in Comet shuffle test suites. But Comet scan operator doesn't support complex types now. So you cannot read data of complex types from Parquet and do native operations on it. I think currently we also don't add any native expression which can produce output of complex types. |
viirya
changed the title
Support complex datatypes
Support complex datatypes in Comet Scan
May 15, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
What is the problem the feature request solves?
As of right now, only primitives are supported for parquet scan, and if any non primitives are detected in a sink node, comet will bail out of performing any transformations.
It would be great if Comet were able to handle relatively simple complex data types, like those supported for shuffle found here. Nested structs or maps from primitives to structs would also be helpful, but I'm not sure on the relative complexity past flat complex types.
Even more complex data types past this would also be helpful, but at a minimum supporting these would enable comet to perform optimizations on the current set of spark jobs that I'm working with.
Describe the potential solution
Comet is able to lower spark operations to native operations when the schema contains complex data types. As a start, relatively complex data types such as those supported for shuffle would be great. This includes arrays of primitives, maps with primitives, and structs with primitives.
Additional context
To help guide the implementation, knowing what the difference is between a type being supported in parquet scan versus within shuffle would be helpful - at least understanding why certain types can be used in different operations at a high level.
The text was updated successfully, but these errors were encountered: