
K8SSAND-559 ⁃ PVC can be deleted mistakenly when reading stale deletionTimestamp information #118

Open · jdonenine opened this issue Jun 8, 2021 · 1 comment · May be fixed by #122
Labels: bug (Something isn't working) · help-wanted (Extra attention is needed) · zh:Icebox (Issues in the ZenHub pipeline 'Icebox')

jdonenine (Contributor) commented Jun 8, 2021

This issue was originally reported in datastax/cass-operator #412 by srteam2020

Description

We found that in an HA Kubernetes cluster, cass-operator can mistakenly delete the PVCs of the Cassandra pods after cass-operator experiences a restart. After diagnosis and inspection, we found the cause is potential staleness in one of the apiservers.
More concretely, if cass-operator receives a stale update event from an apiserver saying "CassandraDatacenter has a non-nil deletion timestamp", the controller deletes the PVCs returned by listPVCs, and listPVCs lists PVCs by name and namespace instead of by UID. The stale event actually comes from the deletion of a previous CassandraDatacenter that shared the same name (but had a different UID) as the currently running one, so the PVCs of the existing CassandraDatacenter are listed and deleted by mistake.
One potential fix is to label each PVC with the UID of its CassandraDatacenter at creation time, and to list PVCs by that UID in listPVCs, ensuring that we always delete the right PVCs.

Reproduction

We list concrete reproduction steps in an HA cluster below:

  1. Create a CassandraDatacenter cdc. A PVC will be created in the cluster.
  2. Delete cdc. Apiserver1 sends the update events with a non-nil deletion timestamp to the controller, and the controller triggers deletePVCs() to delete the related PVCs. Meanwhile, apiserver2 is partitioned, so its watch cache stops at the moment cdc is tagged with a deletion timestamp.
  3. Create a CassandraDatacenter with the same name, cdc, again. The CassandraDatacenter comes back with a different UID, as does its PVC. However, apiserver2 still holds the stale view that cdc has a non-nil deletion timestamp and is about to be deleted.
  4. The controller restarts after a node failure and talks to the stale apiserver2. Reading the stale update event from apiserver2 saying cdc has a deletion timestamp, the controller lists all the PVCs belonging to the currently running cdc (as mentioned above) and deletes them all.

Fix

We are willing to help fix this bug by issuing a PR.
As mentioned above, the bug can be avoided by tagging each PVC with the UID of its CassandraDatacenter and listing PVCs by UID. Each CassandraDatacenter always has a different UID, even when it reuses a name, so the PVCs belonging to the newly created CassandraDatacenter will not be deleted in response to stale events for the old one.
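To make the difference concrete, here is a minimal, dependency-free Go sketch of the two listing strategies. The types, label keys, and values are hypothetical illustrations (not the actual cass-operator or Kubernetes API): it models a PVC as a name plus labels and shows that name-based selection matches the recreated datacenter's PVC, while UID-based selection does not match the old UID.

```go
package main

import "fmt"

// PVC models only the fields relevant to this bug (hypothetical, not the k8s type).
type PVC struct {
	Name   string
	Labels map[string]string
}

// listPVCsByName mimics the current behavior: select PVCs by datacenter name only.
func listPVCsByName(pvcs []PVC, dcName string) []PVC {
	var out []PVC
	for _, p := range pvcs {
		if p.Labels["example.com/datacenter-name"] == dcName {
			out = append(out, p)
		}
	}
	return out
}

// listPVCsByUID mimics the proposed fix: select PVCs by the owning datacenter's UID.
func listPVCsByUID(pvcs []PVC, dcUID string) []PVC {
	var out []PVC
	for _, p := range pvcs {
		if p.Labels["example.com/datacenter-uid"] == dcUID {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// A PVC created for the *recreated* cdc (UID "uid-2"); the deleted cdc had "uid-1".
	pvcs := []PVC{{
		Name: "server-data-cdc-0",
		Labels: map[string]string{
			"example.com/datacenter-name": "cdc",
			"example.com/datacenter-uid":  "uid-2",
		},
	}}

	// A stale delete event for the old cdc carries the same name,
	// so name-based listing matches the new PVC and would delete it.
	fmt.Println(len(listPVCsByName(pvcs, "cdc"))) // 1
	// The old UID no longer matches anything, so UID-based listing is safe.
	fmt.Println(len(listPVCsByUID(pvcs, "uid-1"))) // 0
}
```

In a real controller the same effect would be achieved by attaching the UID as a label when creating the PVC and using a label selector on that key when listing, so a stale event carrying the old UID selects nothing.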

┆Issue is synchronized with this Jira Bug by Unito
┆friendlyId: K8SSAND-559
┆priority: Medium

@jdonenine jdonenine added the bug Something isn't working label Jun 8, 2021
@srteam2020 srteam2020 linked a pull request Jun 12, 2021 that will close this issue

srteam2020 (Contributor) commented:
Hi @jdonenine, I have issued PR #122 (but it seems there is a transient error in the build?).

Besides the fix, I was asked in the original issue (in the old repo) how to reproduce the bug reliably. We have recently been building a tool, https://github.com/sieve-project/sieve, to reliably detect and reproduce bugs like this one in various Kubernetes controllers. This bug can be reproduced automatically with just a few commands using our tool, and you can also use the tool to verify the patch.
To use the tool, first run the environment check:

python3 check-env.py

If the checker passes, build the Kubernetes and controller images using the commands below, as our tool needs to set up a kind cluster to reproduce the bug:

python3 build.py -p kubernetes -m time-travel -d DOCKER_REPO_NAME
python3 build.py -p cass-operator -m time-travel -d DOCKER_REPO_NAME -s dbd4f7a10533bb2298aed0d40ea20bfd8c133da2

-d DOCKER_REPO_NAME should be a Docker repo that you have write access to; -s dbd4f7a10533bb2298aed0d40ea20bfd8c133da2 is the commit where the bug was found.

Finally, run

python3 sieve.py -p cass-operator -t recreate -d DOCKER_REPO_NAME

and the bug is successfully reproduced if you see the message below:

[ERROR] persistentvolumeclaim TERMINATING inconsistency: 0 seen after learning run, but 1 seen after testing run

For more information, please refer to https://github.com/sieve-project/sieve#sieve-testing-datacenter-infrastructures-using-partial-histories and https://github.com/sieve-project/sieve/blob/main/docs/reprod.md#k8ssandra-cass-operator-originally-datastax-cass-operator-412

Please let me know if you are interested in using or deploying our tool, or if you encounter any problems reproducing the bug with it.

@sync-by-unito sync-by-unito bot changed the title PVC can be deleted mistakenly when reading stale deletionTimestamp information K8SSAND-559 ⁃ PVC can be deleted mistakenly when reading stale deletionTimestamp information Apr 4, 2022
@adejanovski adejanovski added the zh:Icebox Issues in the ZenHub pipeline 'Icebox' label Jul 26, 2022
@adejanovski adejanovski added the help-wanted Extra attention is needed label Jan 24, 2024