Reading large Parquet files with Parquet modular encryption fails #22703
Comments
I tested different file sizes, and the threshold seems to be 128 MB (my HDFS block size): files <= 128 MB read fine, while files > 128 MB fail.
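To illustrate why a 128 MB threshold would point at split creation, here is a minimal sketch of how a file might be cut into byte-range splits. It assumes the maximum split size equals the HDFS block size; `split_ranges` and `max_split_size` are hypothetical names, not Presto's actual split logic.

```python
MB = 1024 * 1024

def split_ranges(file_size, max_split_size=128 * MB):
    """Return (start, length) byte ranges that cover a file of file_size bytes.

    Assumption: splits are cut at fixed max_split_size boundaries,
    mirroring an HDFS block size of 128 MB.
    """
    ranges = []
    start = 0
    while start < file_size:
        length = min(max_split_size, file_size - start)
        ranges.append((start, length))
        start += length
    return ranges

# A file at or below the block size is read as a single split;
# a larger file is read as several independent byte ranges.
for size_mb in (100, 128, 200):
    print(size_mb, "MB ->", len(split_ranges(size_mb * MB)), "split(s)")
```

Under this assumption, only files larger than 128 MB are ever read as more than one byte range, which matches the sizes that fail.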
@shangxinli, please take a look.
Hi there, any update on this, @tdcmeehan?
Presto reads a file in HDFS by creating multiple splits, which divides the Parquet file into multiple parts. With PME enabled, each page becomes an indivisible unit, because decryption needs all of the page's bytes. So I think something is wrong in the split creation process. This config makes "hive files become non-splittable", which bypasses the splitting process and makes everything work fine.
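As a rough illustration of why a partially read page would fail authentication, the sketch below uses HMAC-SHA256 from the Python standard library as a stand-in for the AES-GCM tag. It does not encrypt anything and is not what Parquet or Presto does internally; `seal`, `open_sealed`, and the key are hypothetical. It only shows that a tag computed over a whole payload cannot be verified from part of it.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of a SHA-256 digest in bytes

def seal(key, data):
    """Append an authentication tag computed over the whole payload."""
    return data + hmac.new(key, data, hashlib.sha256).digest()

def open_sealed(key, sealed):
    """Verify the trailing tag and return the payload, or raise ValueError."""
    data, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tag check failed")
    return data

key = b"hypothetical-demo-key"
page = seal(key, b"x" * 1000)

# The complete page verifies.
assert open_sealed(key, page) == b"x" * 1000

# A reader that starts 100 bytes into the page cannot verify the tag.
try:
    open_sealed(key, page[100:])
    partial_ok = True
except ValueError:
    partial_ok = False
print("partial read verified:", partial_ok)
```

If the hypothesis above is right, a reader that starts or stops in the middle of an encrypted module would hit the same kind of failure, which would surface as "GCM tag check failed".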
I use Presto to read Parquet files in HDFS. The Parquet files have Parquet modular encryption enabled.
Reading a small file works, but reading a large file fails in the decrypt function.
Presto shows this error:
Query 20240509_030132_00001_r659k failed: GCM tag check failed
Your Environment
Expected Behavior
The data should be returned to the client.
Current Behavior
The query fails in the decrypt function.
Possible Solution
TBD
Steps to Reproduce
.
Context