
Consumer requiring re-sharding of stream before starts consuming data #95

Open
marciodebarros opened this issue Nov 18, 2020 · 1 comment

Comments

@marciodebarros

Hello, I am implementing a consumer application with the kinesis-sql library on Spark 3.0 and am running into the following issue:

  • I start my consumer while there is no data available in the stream.
  • I start the producer to push data to Kinesis and get a 200 response code from Kinesis. The producer application completes normally without any errors.
  • I wait a few minutes, but no data shows up in the consumer.
  • I log into the AWS console and change the number of shards in the stream (from 1 to 2, or from 2 to 1).
  • The consumer is still running at this point.
  • I start a new session of the producer and push more data, again with a 200 response code.
  • Only now, after a few seconds, do I start receiving the data in my consumer.
  • My Spark stream is configured with TRIM_HORIZON.
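
For reference, a minimal sketch of the kind of consumer setup described above, using the option names documented in the qubole/kinesis-sql README; the stream name, endpoint, and credential handling are placeholder assumptions, not the reporter's actual configuration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kinesis-consumer-example")
  .getOrCreate()

// Option names follow the qubole/kinesis-sql README; values are placeholders.
val kinesisDF = spark.readStream
  .format("kinesis")
  .option("streamName", "my-stream")                                // placeholder
  .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com") // placeholder region
  .option("awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
  .option("awsSecretKey", sys.env("AWS_SECRET_KEY"))
  .option("startingposition", "TRIM_HORIZON") // read from the oldest available record
  .load()

// Kinesis records arrive as binary in the `data` column; cast to string to inspect.
val query = kinesisDF
  .selectExpr("CAST(data AS STRING) AS payload")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()
```

If a configuration like this still shows no data until a re-shard, comparing it line by line against the reporter's job may help narrow down the difference.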

I checked my configuration against the available examples and could not find anything different. Has anyone run into a similar issue, or does anyone have suggestions on what else I could try to troubleshoot this?

Thank you in advance.

—MD.

@roncemer
Contributor

I sent an email to the maintainer of this repo and didn't get a response. For now, I've forked the project at https://github.com/roncemer/kinesis-spark-connector and updated it to build for Spark 3.2.1. Under Spark 3.2.1, it appears to work correctly. I let it run overnight with no new records being posted to the Kinesis data stream; when I started posting records again, they arrived in Spark and were processed.

I opened pull request https://github.com/qubole/kinesis-sql/pull/113/files from my forked repo back to the original qubole/kinesis-sql repo. Be sure to read the full description under the Comments tab as well.

I could use the help of anyone who might be interested in this project. qubole/kinesis-sql appears to have been abandoned for about two years, and the main maintainer doesn't respond to emails. If anyone can get in touch with someone who has control of the repo, please ask them to make me a committer. Barring that, I will have to publish the jar from my forked repo as a Maven artifact. In any case, if I end up maintaining this project, whether the original or the forked repo, I will need volunteers to help handle the builds each time a new version of Spark is released.

In the meantime, feel free to build your own jar from my forked repo and run it under Spark 3.2.1. Also, let me know whether it works under 3.3.x, or if you run into any other problems.
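
For anyone who wants to try the fork, the build might look roughly like the following; this is a sketch that assumes the fork keeps the upstream Maven layout (check the repo for the actual build files), and the jar name in the last line is illustrative:

```shell
# Clone the fork and build the connector jar.
# The exact build command depends on the repo's build files (pom.xml vs build.sbt);
# mvn is shown on the assumption the fork keeps the upstream Maven build.
git clone https://github.com/roncemer/kinesis-spark-connector.git
cd kinesis-spark-connector
mvn install -DskipTests   # produces the jar under target/

# Then put the built jar on your application's classpath
# (the jar name below is illustrative -- use the actual artifact from target/):
spark-submit --jars target/spark-sql-kinesis_2.12-*.jar my_app.jar
```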

Thanks!
Ron Cemer
