
Encryption/Decryption mechanism for awsSecretKey and awsAccessKeyId #31

ANUJABANTHIYA opened this issue Oct 25, 2018 · 2 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

ANUJABANTHIYA commented Oct 25, 2018

Hi @itsvikramagr

I'm using the Qubole connector for reading from Amazon Kinesis streams in Spark Structured Streaming mode.
If I change the log level from INFO to DEBUG in log4j.properties, I see the physical plan getting dumped into target/unit-tests.log. The physical plan contains the sensitive values awsSecretKey and awsAccessKeyId. This is a security concern, since they appear as plain text.

Snippet of the physical plan:

```
18/10/25 20:04:05.023 main TRACE BaseSessionStateBuilder$$anon$1:
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences ===
!'Project [unresolvedalias(cast('approximateArrivalTimestamp as timestamp), None)] 'Project [unresolvedalias(cast(approximateArrivalTimestamp#4 as timestamp), None)]
+- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@4d0b0fd4,kinesis,List(),None,List(),None,Map(awsAccessKeyId -> AKIAF6LDVCA3FMAD5FCV, endpointUrl -> kinesis.us-west-1.amazonaws.com, awsSecretKey -> 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, startingposition -> LATEST, streamName -> bdstest),None), kinesis, [data#0, streamName#1, partitionKey#2, sequenceNumber#3, approximateArrivalTimestamp#4] +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@4d0b0fd4,kinesis,List(),None,List(),None,Map(awsAccessKeyId -> AKIAF6LDVCA3FMAD5FCV, endpointUrl -> kinesis.us-west-1.amazonaws.com, awsSecretKey -> 80vlwE5oxcvar9XPlFr1hooYuG9At47nB/ujbgKi8, startingposition -> LATEST, streamName -> bdstest),None), kinesis, [data#0, streamName#1, partitionKey#2, sequenceNumber#3, approximateArrivalTimestamp#4]

18/10/25 20:04:05.031 main TRACE BaseSessionStateBuilder$$anon$1:
```

The driver passes awsSecretKey and awsAccessKeyId to Spark in plain text, so they get dumped in the physical plan in the Spark logs. Ideally the driver should take care of encrypting and decrypting these credentials, which is not done today.
Can an encryption/decryption mechanism be added for awsSecretKey and awsAccessKeyId before they are passed to Spark?
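For reference, a minimal sketch of how the keys are currently being passed as source options (the option names mirror what is visible in the plan above; key values are placeholders, and `spark` is the active SparkSession):

```scala
// Sketch of the current usage: credentials go in as plain-text options,
// so they surface verbatim in the analyzed/physical plan.
val df = spark.readStream
  .format("kinesis")
  .option("streamName", "bdstest")
  .option("endpointUrl", "kinesis.us-west-1.amazonaws.com")
  .option("awsAccessKeyId", "<access-key-id>") // plain text, ends up in the plan
  .option("awsSecretKey", "<secret-key>")      // plain text, ends up in the plan
  .option("startingposition", "LATEST")
  .load()
```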

Thanks,
Anuja

itsvikramagr (Contributor) commented:

Thanks @ANUJABANTHIYA for raising this. This is a critical issue.

But instead of encrypting/decrypting the credentials, shall we drop them altogether? There are other ways to provide AWS keys: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

We can use DefaultAWSCredentialsProviderChain, which can pick up credentials from environment variables or from the default credentials profile file.
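For illustration, a rough sketch of what that could look like inside the connector, assuming the AWS SDK for Java v1 is on the classpath (not the actual implementation):

```scala
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain

// Resolves credentials from environment variables, JVM system properties,
// the ~/.aws/credentials profile file, or the instance/container role,
// so no key needs to be passed through the source options (and none ends up in the plan).
val credentials = DefaultAWSCredentialsProviderChain.getInstance().getCredentials
```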

Will you be interested in making this change?

@itsvikramagr added the good first issue (Good for newcomers) label on Oct 26, 2018

ANUJABANTHIYA commented Oct 26, 2018

Thanks @itsvikramagr for the reply.

I understand we can use DefaultAWSCredentialsProviderChain for credentials, but my current use case requires BasicAWSCredentials.
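For context, a minimal sketch of the BasicAWSCredentials path (key values are placeholders), which is why the keys have to be supplied explicitly in my case:

```scala
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}

// BasicAWSCredentials wraps an explicit key pair, so the application still has to
// hand the plain-text values to the connector somewhere; hence the request above
// for an encryption/decryption mechanism around them.
val credentials = new BasicAWSCredentials("<access-key-id>", "<secret-key>")
val provider    = new AWSStaticCredentialsProvider(credentials)
```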

Thanks,
Anuja
