The KafkaRoller detects stuck pods while rolling the Kafka cluster and does not seem to wait for them to become ready. This results in the following messages in the log:
2023-12-04 09:09:33 INFO ClusterOperator:142 - Triggering periodic reconciliation for namespace myproject
2023-12-04 09:09:33 INFO AbstractOperator:265 - Reconciliation #19(timer) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification
2023-12-04 09:09:33 INFO KafkaRoller:382 - Reconciliation #19(timer) Kafka(myproject/my-cluster): Could not verify pod my-cluster-controllers-2/2 is up-to-date, giving up after 10 attempts. Total delay between attempts 127750ms
io.strimzi.operator.cluster.operator.resource.KafkaRoller$FatalProblem: Pod is unschedulable or is not starting
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.checkIfRestartOrReconfigureRequired(KafkaRoller.java:598) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:462) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$7(KafkaRoller.java:376) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
2023-12-04 09:09:33 ERROR AbstractOperator:284 - Reconciliation #19(timer) Kafka(myproject/my-cluster): createOrUpdate failed
io.strimzi.operator.cluster.operator.resource.KafkaRoller$FatalProblem: Pod is unschedulable or is not starting
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.checkIfRestartOrReconfigureRequired(KafkaRoller.java:598) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:462) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$7(KafkaRoller.java:376) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
2023-12-04 09:09:33 WARN AbstractOperator:557 - Reconciliation #19(timer) Kafka(myproject/my-cluster): Failed to reconcile
io.strimzi.operator.cluster.operator.resource.KafkaRoller$FatalProblem: Pod is unschedulable or is not starting
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.checkIfRestartOrReconfigureRequired(KafkaRoller.java:598) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.restartIfNecessary(KafkaRoller.java:462) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.resource.KafkaRoller.lambda$schedule$7(KafkaRoller.java:376) ~[io.strimzi.cluster-operator-0.39.0-SNAPSHOT.jar:0.39.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
I'm not sure whether not waiting for readiness is intentional => it might have its reasons (and in any case the next periodic reconciliation will check again within a few minutes at the latest, so it is not a problem per se). But either way, if you check the timestamps, it is clear that this message is misleading:
Could not verify pod my-cluster-controllers-2/2 is up-to-date, giving up after 10 attempts. Total delay between attempts 127750ms
It may well have made 10 attempts. But it did not wait 127750ms, because the whole reconciliation ran from start to finish within 1 second. So we should fix the message to avoid misleading people analyzing the logs.
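As an aside on where 127750ms likely comes from: an initial 250ms delay doubled across the 9 gaps between 10 attempts sums to exactly 250 × 511 = 127750ms, which suggests the message reports the backoff schedule's theoretical total delay rather than the time that actually elapsed. The self-contained sketch below (hypothetical class and method names, not actual Strimzi code) reproduces the mismatch: a fatal problem on the first check short-circuits the retry loop, yet the log line still quotes the full schedule.

```java
import java.time.Duration;
import java.time.Instant;

public class BackoffMessageDemo {

    /** Hypothetical stand-in for KafkaRoller's FatalProblem: aborts retries immediately. */
    static class FatalProblem extends Exception {
        FatalProblem(String message) { super(message); }
    }

    /** Sum of the delays the schedule *would* insert between maxAttempts tries. */
    static long theoreticalTotalDelayMs(long firstDelayMs, int maxAttempts) {
        long total = 0;
        long delay = firstDelayMs;
        for (int i = 1; i < maxAttempts; i++) { // no delay after the last attempt
            total += delay;
            delay *= 2;                          // simple exponential backoff
        }
        return total;                            // 250ms, 10 attempts -> 127750ms
    }

    public static void main(String[] args) {
        int maxAttempts = 10;
        long firstDelayMs = 250;
        Instant start = Instant.now();
        try {
            // An unschedulable pod is detected on the very first check, so the
            // retry loop is short-circuited before any backoff delay is applied.
            throw new FatalProblem("Pod is unschedulable or is not starting");
        } catch (FatalProblem e) {
            long elapsedMs = Duration.between(start, Instant.now()).toMillis();
            // Misleading: reports the schedule's theoretical delay, not real time.
            System.out.printf("giving up after %d attempts. Total delay between attempts %dms%n",
                    maxAttempts, theoreticalTotalDelayMs(firstDelayMs, maxAttempts));
            // Clearer: report what actually happened.
            System.out.printf("gave up after 1 of %d attempts (fatal problem) after %dms elapsed%n",
                    maxAttempts, elapsedMs);
        }
    }
}
```

A fix along these lines would mean logging the measured elapsed time and the number of attempts actually made, instead of the precomputed totals from the backoff schedule.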
Note: This seems to be a general issue that applies to controllers, brokers, and mixed nodes, and occurs even in ZooKeeper-based clusters.