After canary upgrade, the datacenter will refuse to update StatefulSets #656

Closed
burmanm opened this issue May 16, 2024 · 0 comments · Fixed by #654

burmanm commented May 16, 2024

What happened?

When trying to bring all nodes to the same state after a canary upgrade, the check that the StatefulSet update has completed always fails, and the operator keeps logging:

2024-05-16T05:41:28.555Z        INFO    waiting for upgrade to finish on statefulset    {"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "CassandraDatacenter": {"name":"dc1","namespace":"test-canary-upgrade"}, "namespace": "test-canary-upgrade", "name": "dc1", "reconcileID": "56346c1d-4130-4020-b659-2f18f8ca4a0c", "namespace": "test-canary-upgrade", "datacenterName": "dc1", "clusterName": "cluster1", "statefulset": "cluster1-dc1-r1-sts", "replicas": 3, "readyReplicas": 3, "currentReplicas": 2, "updatedReplicas": 1}

There's no recovery from this, because cass-operator keeps waiting for as long as any part of the following condition is true:

			if statefulSet.Generation != status.ObservedGeneration ||
				status.Replicas != status.ReadyReplicas ||
				status.Replicas != status.CurrentReplicas ||
				status.Replicas != status.UpdatedReplicas {

However, the StatefulSet itself reports the following status (on Kubernetes 1.28):

status:
  availableReplicas: 3
  collisionCount: 0
  currentReplicas: 2
  currentRevision: cluster1-dc1-r1-sts-59d575d967
  observedGeneration: 2
  readyReplicas: 3
  replicas: 3
  updateRevision: cluster1-dc1-r1-sts-5fc6f6758f
  updatedReplicas: 1

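This is the final and correct state after a canary upgrade: all three replicas are ready, but only one is on the update revision, so currentReplicas and updatedReplicas never equal replicas. Sadly, there's no way to tell from the status alone whether the canary upgrade was actually completed correctly.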
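
To make the mismatch concrete, here is a minimal, dependency-free sketch that evaluates the condition quoted above against the reported status. The `stsStatus` struct and the `stillUpgrading` function are hypothetical stand-ins written for this illustration; they are not cass-operator code. The StatefulSet generation is assumed to equal the reported `observedGeneration` (2), which the issue does not state explicitly.

```go
package main

import "fmt"

// stsStatus is a hypothetical stand-in that mirrors only the StatefulSet
// status fields used by the check, so the example needs no Kubernetes imports.
type stsStatus struct {
	ObservedGeneration int64
	Replicas           int32
	ReadyReplicas      int32
	CurrentReplicas    int32
	UpdatedReplicas    int32
}

// stillUpgrading repeats the condition quoted in the issue: it returns true
// while the operator would keep waiting for the StatefulSet to finish.
func stillUpgrading(generation int64, s stsStatus) bool {
	return generation != s.ObservedGeneration ||
		s.Replicas != s.ReadyReplicas ||
		s.Replicas != s.CurrentReplicas ||
		s.Replicas != s.UpdatedReplicas
}

func main() {
	// Values copied from the status in the issue; generation 2 is an assumption.
	afterCanary := stsStatus{
		ObservedGeneration: 2,
		Replicas:           3,
		ReadyReplicas:      3,
		CurrentReplicas:    2,
		UpdatedReplicas:    1,
	}
	fmt.Println("still waiting:", stillUpgrading(2, afterCanary))
}
```

With these values the function returns true on every call, because `currentReplicas` (2) and `updatedReplicas` (1) both differ from `replicas` (3) even though every pod is ready.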

What did you expect to happen?

We should be able to continue the upgrade by simply removing the canary upgrade requirements.

How can we reproduce it (as minimally and precisely as possible)?

See the test in #654

cass-operator version

1.20

Kubernetes version

1.28

Method of installation

No response

Anything else we need to know?

No response

burmanm added the bug label on May 16, 2024
burmanm self-assigned this on May 16, 2024
adejanovski added the in-progress label on May 16, 2024
adejanovski added the ready-for-review label and removed the in-progress label on May 17, 2024
adejanovski added the done label and removed the ready-for-review label on May 30, 2024