[BUG] JedisCheckpointStore - Invalid partitionOwnership data from CheckpointStore #40176
Closed
Labels: Client, customer-reported, Event Hubs, needs-team-attention, question
Describe the bug
Currently I am working on a proof of concept for the Beta version of the Azure JedisCheckpointStore. I am following instructions as described in documents:
After following the instructions in the links above, the first run works as expected: partition ownership and the corresponding checkpoints are recorded in the Azure Redis Cache. When the prototype is stopped and then restarted, an error occurs. The same error occurs when two instances of the client code (EventProcessorClient) are started to demonstrate load balancing. If checkpointing is disabled (i.e., no calls are made to eventContext.updateCheckpoint()), the error does not occur and load balancing works correctly.
The root cause appears to be in the JedisCheckpointStore claimOwnership() code, but I am happy to be proved wrong. The error surfaces on startup, after JedisCheckpointStore listOwnership() returns partitionOwnership objects (retrieved from the Redis cache) that are missing two attributes the caller expects: lastModifiedTime and eTag. The com.azure.messaging.eventhubs.PartitionBasedLoadBalancer code validates the partitionOwnership objects with its isValid() method and throws the error when it finds the attributes missing.
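To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not reproduce the SDK's code: OwnershipRecord and OwnershipValidator are hypothetical stand-ins for PartitionOwnership and the load balancer's isValid() check, and the exact set of attributes the real check inspects is an assumption. Only the exception message is taken from the stack trace below.

```java
// Hypothetical, simplified stand-in for the SDK's PartitionOwnership model.
final class OwnershipRecord {
    final String partitionId;
    final String ownerId;
    final Long lastModifiedTime; // null if the store never persisted it
    final String eTag;           // null if the store never persisted it

    OwnershipRecord(String partitionId, String ownerId, Long lastModifiedTime, String eTag) {
        this.partitionId = partitionId;
        this.ownerId = ownerId;
        this.lastModifiedTime = lastModifiedTime;
        this.eTag = eTag;
    }
}

final class OwnershipValidator {
    private OwnershipValidator() { }

    // Assumed approximation of the validity check: every attribute must be present.
    static boolean isValid(OwnershipRecord record) {
        return record.partitionId != null
            && record.ownerId != null
            && record.lastModifiedTime != null
            && record.eTag != null;
    }

    // Rejects the whole listing if any record is incomplete,
    // using the message seen in the reported stack trace.
    static void requireValid(OwnershipRecord... records) {
        for (OwnershipRecord record : records) {
            if (!isValid(record)) {
                throw new IllegalStateException("Invalid partitionOwnership data from CheckpointStore");
            }
        }
    }
}
```

Under these assumptions, a record read back without lastModifiedTime and eTag fails the check even though its partition and owner are intact, which matches the behaviour seen on the second run.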
I would be interested to know whether others see this as a bug. Thank you in advance.
Exception or Stack Trace
Error occurred in partition processor for partition NONE, java.lang.IllegalStateException: Invalid partitionOwnership data from CheckpointStore.
java.lang.IllegalStateException: Invalid partitionOwnership data from CheckpointStore
at com.azure.messaging.eventhubs.PartitionBasedLoadBalancer.lambda$loadBalance$6(PartitionBasedLoadBalancer.java:186)
at reactor.core.publisher.MonoRunnable.call(MonoRunnable.java:73)
at reactor.core.publisher.MonoRunnable.call(MonoRunnable.java:32)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:139)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1839)
at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:258)
at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:347)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1839)
at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:129)
at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onComplete(FluxTimeout.java:234)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onComplete(MonoFlatMapMany.java:260)
at reactor.core.publisher.FluxIterable$IterableSubscription.fastPath(FluxIterable.java:424)
at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:291)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onSubscribeInner(MonoFlatMapMany.java:150)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onSubscribe(MonoFlatMapMany.java:245)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:201)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:83)
at reactor.core.publisher.Flux.subscribe(Flux.java:8642)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onNext(MonoFlatMapMany.java:195)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1839)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
at reactor.core.publisher.MonoPublishOn$PublishOnSubscriber.run(MonoPublishOn.java:181)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
To Reproduce
Steps to reproduce the behavior:
Code Snippet
Suspected error in code below. I have commented with 'Matt W' where I believe a change could be made.
Expected behavior
When the client code (EventProcessorClient) is started for a second time, after being stopped, the above error ("Invalid partitionOwnership data from CheckpointStore") should not be seen. The code snippet provided above shows (in comments) where I believe the code should be changed to incorporate the missing attributes lastModifiedTime and eTag when being serialized.
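The kind of change proposed here could look roughly like the sketch below: populate both attributes on the ownership object before it is serialized and written to Redis. This is an illustration only, not the JedisCheckpointStore implementation. ClaimedOwnership and OwnershipStamper are hypothetical names, and using the current epoch milliseconds and a random UUID as the eTag value are assumptions about what suitable values might be.

```java
import java.util.UUID;

// Hypothetical mutable stand-in for the ownership object being claimed.
final class ClaimedOwnership {
    final String partitionId;
    final String ownerId;
    Long lastModifiedTime; // unset until stamped
    String eTag;           // unset until stamped

    ClaimedOwnership(String partitionId, String ownerId) {
        this.partitionId = partitionId;
        this.ownerId = ownerId;
    }
}

final class OwnershipStamper {
    private OwnershipStamper() { }

    // Fills in the two attributes the load balancer later expects.
    // Call this before the object is serialized and stored.
    static ClaimedOwnership stampBeforeSave(ClaimedOwnership ownership, long nowEpochMillis) {
        ownership.lastModifiedTime = nowEpochMillis;
        // Assumption: any fresh, non-empty token is acceptable as the eTag.
        ownership.eTag = UUID.randomUUID().toString();
        return ownership;
    }
}
```

A caller would invoke something like OwnershipStamper.stampBeforeSave(ownership, System.currentTimeMillis()) immediately before serialization, so that the stored value carries both attributes when it is read back on the next start.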
PartitionOwnership object missing expected attributes (lastModifiedTime, eTag):
key: nsclienteu81fxxxxxx.servicebus.windows.net/eh178d50xxxx/consumergroup-amp/5
value:
After fix, PartitionOwnership object containing expected attributes (lastModifiedTime, eTag):
key: nsclienteu81fxxxxxx.servicebus.windows.net/eh178d50xxxx/consumergroup-amp/5
value:
Screenshots
N/A
Setup:
Additional context
N/A
Information Checklist
Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.