Wrong version of aws-java-sdk-bundle in sagemaker-spark 1.4.5 #149
System Information
Describe the problem
I just spent 3 days trying to fix this, but to no avail. My setup on an AWS notebook instance:
jars:
aws-java-sdk-bundle-1.11.901.jar
aws-java-sdk-core-1.12.262.jar
aws-java-sdk-kms-1.12.262.jar
aws-java-sdk-s3-1.12.262.jar
aws-java-sdk-sagemaker-1.12.262.jar
aws-java-sdk-sagemakerruntime-1.12.262.jar
aws-java-sdk-sts-1.12.262.jar
hadoop-aws-3.3.1.jar
sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar
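For context, this is roughly how a notebook session can be pointed at these jars; the jar directory and the use of spark.jars here are assumptions, since the actual setup may rely on spark-defaults.conf or a preconfigured kernel instead:

from pyspark.sql import SparkSession

# Hypothetical location of the jars on the notebook instance; adjust to the real path.
jar_dir = "/home/ec2-user/SageMaker/jars"
jars = ",".join([
    f"{jar_dir}/aws-java-sdk-bundle-1.11.901.jar",
    f"{jar_dir}/hadoop-aws-3.3.1.jar",
    f"{jar_dir}/sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar",
    # ... plus the remaining aws-java-sdk-* jars listed above
])

spark = (
    SparkSession.builder
    .appName("s3a-debug")
    .config("spark.jars", jars)  # make the listed jars visible to the driver and executors
    .getOrCreate()
)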
Problem:
This is caused by a bug in the httpclient jar that pyspark depends on: with virtual-hosted-style addressing, a bucket name containing dots produces a hostname like comp.data.sci.data.tst.s3.amazonaws.com, which the wildcard certificate *.s3.amazonaws.com cannot match (a wildcard only covers a single DNS label). The bug is reported here: https://issues.apache.org/jira/browse/HADOOP-18159?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17554677#comment-17554677
Based on the suggested workarounds in the article above, I tried 4 things:
upgrading aws-java-sdk-bundle to version 1.12.262 like the other jars → didn't work
changing httpclient to version 4.5.10 → didn't work
configuring aws-java-sdk to disable SSL certificate checking (SSLPeerUnverifiedException on S3 actions aws-sdk-java-v2#1786) → didn't work with "-Dcom.amazonaws.sdk.disableCertChecking=true" (a sketch of passing this flag follows below)
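For reference, a sketch of how the cert-checking flag can be passed to the driver and executor JVMs from a notebook session; whether this matches the exact mechanism used on the instance is an assumption:

from pyspark.sql import SparkSession

# Workaround attempt: pass the SDK system property to driver and executors.
# (Did not resolve the error in this case.)
spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions", "-Dcom.amazonaws.sdk.disableCertChecking=true")
    .config("spark.executor.extraJavaOptions", "-Dcom.amazonaws.sdk.disableCertChecking=true")
    .getOrCreate()
)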
Minimal repro / logs
22/08/30 11:00:22 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://comp.data.sci.data.tst/some/folder/export_date=20220822. org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://comp.data.sci.data.tst/some/folder/export_date=20220822: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
Works:
df = spark.read.parquet("s3a://aws-bucket-with-dashes/file_0_1_0.snappy.parquet")
Doesn't work:
df = spark.read.parquet("s3a://aws.bucket.with.dots/file_0_1_0.snappy.parquet")
It's not possible to rename the bucket because many data consumers depend on it.
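One more knob that is sometimes suggested for bucket names with dots is S3A path-style access, which keeps the bucket name out of the TLS hostname; this is only a sketch and has not been verified against this setup:

from pyspark.sql import SparkSession

# Untested sketch: force path-style requests (s3.amazonaws.com/<bucket>/...)
# so the dotted bucket name is not used as a virtual-hosted hostname.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

df = spark.read.parquet("s3a://aws.bucket.with.dots/file_0_1_0.snappy.parquet")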