Encryption/Decryption mechanism for awsSecretKey and awsAccessKeyId #31
Thanks @ANUJABANTHIYA for raising this. This is a critical issue. But instead of encrypting/decrypting the credentials, shall we drop them altogether? There are other ways to provide AWS keys - https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html We can use DefaultAWSCredentialsProviderChain, which can pick up credentials from environment variables or from the default credentials profile file. Will you be interested in making this change?
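For reference, the two credential sources mentioned above can be set up as follows. The key values are placeholders; the profile path `~/.aws/credentials` is the SDK's default location:

```shell
# Option 1: environment variables, picked up automatically by the AWS SDK
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="exampleSecret"

# Option 2: default credentials profile file
mkdir -p ~/.aws
cat > ~/.aws/credentials <<'EOF'
[default]
aws_access_key_id = AKIAEXAMPLEKEY
aws_secret_access_key = exampleSecret
EOF
```

With either in place, the application no longer needs to pass keys as connector options, so they never appear in the query plan.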
Thanks @itsvikramagr for the reply. I understand we can use DefaultAWSCredentialsProviderChain for credentials, but my current use case requires BasicAWSCredentials. Thanks,
Hi @itsvikramagr
I'm using the Qubole connector for reading from Amazon Kinesis streams in Spark Structured Streaming mode.
If I change the log level from INFO to DEBUG in log4j.properties, I see the physical plan being dumped to target/unit-tests.log. The physical plan contains the sensitive values awsSecretKey and awsAccessKeyId in plain text, which is a security concern.
Snippet of Physical plan :
```
18/10/25 20:04:05.023 main TRACE BaseSessionStateBuilder$$anon$1:
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences ===
!'Project [unresolvedalias(cast('approximateArrivalTimestamp as timestamp), None)] 'Project [unresolvedalias(cast(approximateArrivalTimestamp#4 as timestamp), None)]
+- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@4d0b0fd4,kinesis,List(),None,List(),None,Map(awsAccessKeyId -> AKIAF6LDVCA3FMAD5FCV, endpointUrl -> kinesis.us-west-1.amazonaws.com, awsSecretKey -> 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, startingposition -> LATEST, streamName -> bdstest),None), kinesis, [data#0, streamName#1, partitionKey#2, sequenceNumber#3, approximateArrivalTimestamp#4] +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@4d0b0fd4,kinesis,List(),None,List(),None,Map(awsAccessKeyId -> AKIAF6LDVCA3FMAD5FCV, endpointUrl -> kinesis.us-west-1.amazonaws.com, awsSecretKey -> 80vlwE5oxcvar9XPlFr1hooYuG9At47nB/ujbgKi8, startingposition -> LATEST, streamName -> bdstest),None), kinesis, [data#0, streamName#1, partitionKey#2, sequenceNumber#3, approximateArrivalTimestamp#4]
18/10/25 20:04:05.031 main TRACE BaseSessionStateBuilder$$anon$1:
```
Drivers pass awsSecretKey and awsAccessKeyId in plain text to Spark, so they get dumped in the physical plan in Spark's logs. Ideally the driver should take care of encrypting and decrypting these credentials, which is not done.
Can an encryption/decryption mechanism be added for awsSecretKey and awsAccessKeyId before they are passed to Spark?
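One possible mitigation, sketched below as a hypothetical standalone helper (not part of the connector): mask sensitive option values before the options map reaches anything that logs it. Spark has a similar built-in redaction facility (`spark.redaction.regex`); this sketch only illustrates the idea, and the helper name and mask string are assumptions:

```python
import re

# Hypothetical pattern for option keys whose values must never be logged.
SENSITIVE = re.compile(r"(?i)awsSecretKey|awsAccessKeyId")

def redact_options(options):
    """Return a copy of the options map with sensitive values masked."""
    return {k: ("*********(redacted)" if SENSITIVE.search(k) else v)
            for k, v in options.items()}

opts = {"awsAccessKeyId": "AKIAEXAMPLEKEY", "streamName": "bdstest"}
print(redact_options(opts))
# -> {'awsAccessKeyId': '*********(redacted)', 'streamName': 'bdstest'}
```

Redaction avoids the key-management burden of encrypting the values, since an encrypted option would still have to be decrypted somewhere in the driver.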
Thanks,
Anuja