Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot restart the streaming job from where it stopped previously in the kinesis stream #101

Open
prashanthvg89 opened this issue May 3, 2021 · 1 comment

Comments

@prashanthvg89
Copy link

Hi,

I could not find a way to start the streaming job from where it left off previously in the kinesis stream even with using Spark streaming's "checkpointLocation" option.

TRIM_HORIZON starts all the way from the beginning and LATEST starts from the current timestamp. Let's say I stop (or restarts due to an error) the streaming job, how to restart from the offset where it left off? I believe this is possible with KCL 2.0 but "kinesis-sql" library only supports KCL 1.0. Is there a way to achieve this or not possible at all until we move to KCL 2.0?

Thanks,
Prashanth

@roncemer
Copy link
Contributor

I sent an email to the maintainer of this repo, and didn't get a response. For now, I've forked the project here https://github.com/roncemer/kinesis-spark-connector and updated it to build for Spark 3.2.1. Under Spark 3.2.1, it appears to be working correctly. I allowed it to go overnight with no new records being posted to the Kinesis data stream. When I started posting records again, the records arrived in Spark, and were processed.

I issued pull request https://github.com/qubole/kinesis-sql/pull/113/files from my forked repo back to the original qubole/kinesis-sql repo. Be sure to read the full description under the Comments tab as well.

I could use the help of anyone who might be interested in this project. Apparently, qubole/kinesis-sql is abandoned for about two years, and the main guy who was maintaining it doesn't respond to emails. If anyone can get in touch with someone who has control of this repo, please ask them to make me a committer. Barring that, I will have to publish the jar file from my forked repo as a maven resource. In any case, if I end up maintaining this project, either the original or forked repo, I will need volunteers to help handle the builds each time a new version of Spark is released.

In the mean time, feel free to build your own jar from my forked repo and run it under Spark 3.2.1. Also, let me know if it works under 3.3.x, or if you run into any other problems.

Thanks!
Ron Cemer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants