Data Loss when extracting data from Kinesis #116

Open
success-m opened this issue Apr 3, 2023 · 3 comments


@success-m

We were having issues with data loss. We logged the data sent to Kinesis from our producers and compared it with the data in our sinks. We are 100% sure that the data was pushed to Kinesis, but about one minute of data was lost. Is there any possible reason why this happened?

PS: I know the active repo is https://github.com/roncemer/spark-sql-kinesis, but I could not post issues in that repo.

Please help
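For anyone trying to reproduce this kind of check, here is a minimal sketch of the producer-versus-sink comparison described above. It assumes each record carries a unique `id` and a `sent_at` timestamp in the producer log; those field names and the helpers `find_missing` and `loss_window` are hypothetical and not part of the connector.

```python
from datetime import datetime, timedelta


def find_missing(producer_log, sink_ids):
    """Return producer records whose id never reached the sink, ordered by send time."""
    seen = set(sink_ids)
    missing = [rec for rec in producer_log if rec["id"] not in seen]
    return sorted(missing, key=lambda rec: rec["sent_at"])


def loss_window(missing):
    """Return (first, last) send time of the missing records, or None if nothing is missing."""
    if not missing:
        return None
    return missing[0]["sent_at"], missing[-1]["sent_at"]


if __name__ == "__main__":
    # Hypothetical sample data: one record every 20 seconds, three of which never arrive.
    start = datetime(2023, 4, 3, 12, 0, 0)
    producer_log = [
        {"id": i, "sent_at": start + timedelta(seconds=20 * i)} for i in range(10)
    ]
    sink_ids = [0, 1, 2, 6, 7, 8, 9]

    missing = find_missing(producer_log, sink_ids)
    print("missing ids:", [rec["id"] for rec in missing])
    print("loss window:", loss_window(missing))
```

Narrowing the loss to a time window like this makes it easier to line it up with stream events such as a re-shard or a job restart.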

@success-m
Author

@roncemer - Any idea why this is happening? My initial guess is Kinesis re-sharding, so I have added the option .option("kinesis.client.describeShardInterval", "500ms"), but I don't know whether this will fix it.
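As a rough sketch of where that option sits in a streaming read, the snippet below collects the source options in one place. Only `kinesis.client.describeShardInterval` and its `500ms` value come from this thread. The other option names (`streamName`, `endpointUrl`, `startingposition`) are written from memory of the connector's README and should be checked against the version in use. The helper `kinesis_source_options` is hypothetical. Nothing in this thread confirms that a shorter describe interval prevents the loss.

```python
def kinesis_source_options(stream_name, endpoint_url,
                           describe_shard_interval="500ms",
                           starting_position="TRIM_HORIZON"):
    """Build the option map for the Kinesis source (option names are assumptions, see above)."""
    return {
        "streamName": stream_name,
        "endpointUrl": endpoint_url,
        "startingposition": starting_position,
        # How often the source re-describes the stream to pick up shard changes.
        "kinesis.client.describeShardInterval": describe_shard_interval,
    }


# Usage, assuming a SparkSession named `spark` with the connector on the classpath:
# opts = kinesis_source_options("my-stream", "https://kinesis.us-east-1.amazonaws.com")
# df = spark.readStream.format("kinesis").options(**opts).load()
```

Keeping the options in a plain dict makes it easy to log the exact settings a job ran with when investigating a gap afterwards.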

@roncemer
Contributor

roncemer commented Apr 4, 2023

@success-m I accidentally had issues disabled on my repo. I enabled that feature. If you have a change you'd like to submit, feel free to issue a pull request against https://github.com/roncemer/spark-sql-kinesis and I will merge it and drop a new release as soon as I can get to it.

I am currently not using this project for anything (I switched to using Kinesis Firehose Delivery Streams with AWS Lambda functions, as it's cheaper and doesn't require any explicit checkpointing mechanism), so if you're interested in taking over the project, I would be happy to add you as a maintainer and provide instructions for packaging and publishing updated versions.

@success-m
Author

@roncemer - I don't have any changes that need to be pushed yet.

But ya, please do add me in. I would like to contribute to the library.
