Enabling Calico API Server on Kubespray 2.24 causes Kubernetes cascade delete to fail #10949

bjetal · 2024-02-23T15:02:35Z

### What happened?

Installed Kubernetes, including Calico and the Calico API Server, using Kubespray. Once the install completed, I created a simple deployment and then deleted it: `kubectl create deployment nginx --image=nginx`, followed by `kubectl delete deployments.apps nginx`.

The ReplicaSet and pods for the deployment were not deleted:

```
[tpx-admin@sjc-rayo-1 ~]$ kubectl get all
NAME                         READY   STATUS              RESTARTS   AGE
pod/nginx-7854ff8877-rg4x7   0/1     ContainerCreating   0          18s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.233.0.1   <none>        443/TCP   6m27s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-7854ff8877   1         1         0       18s
```

### What did you expect to happen?

Kubernetes cascade delete would remove the ReplicaSet and its pods.

### How can we reproduce it (as minimally and precisely as possible)?

Install Kubernetes using Kubespray 2.24 with the variable `calico_apiserver_enabled` set to true.

### OS

```
Linux 4.18.0-513.9.1.el8_9.x86_64 x86_64
NAME="Red Hat Enterprise Linux"
VERSION="8.9 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.9 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"
VOCERA_PLATFORM_VERSION="7.1.0.14"
```

### Version of Ansible

```
ansible [core 2.15.9]
  config file = /Users/robert.mitchell/workspaces/vocera-new/kubernetes-deploy/ansible.cfg
  configured module search path = ['/Users/robert.mitchell/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/robert.mitchell/.pyenv/versions/3.10.9/lib/python3.10/site-packages/ansible
  ansible collection location = /Users/robert.mitchell/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/robert.mitchell/.pyenv/versions/3.10.9/bin/ansible
  python version = 3.10.9 (main, Jan 12 2023, 14:23:36) [Clang 14.0.0 (clang-1400.0.29.202)] (/Users/robert.mitchell/.pyenv/versions/3.10.9/bin/python3.10)
  jinja version = 3.1.2
  libyaml = True
```

### Version of Python

Python 3.10.9

### Version of Kubespray (commit)

64447e745

### Network plugin used

calico

### Full inventory with variables

```yaml
all:
  hosts:
    sjc-rayo-1:
      ansible_host: sjc-rayo-1.vcraeng.com
  vars:
    ansible_user: tpx-admin
  children:
    kube_control_plane:
      hosts:
        sjc-rayo-1:
    etcd:
      children:
        kube_control_plane:
    kube_node:
      children:
        kube_control_plane:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
      vars:
        calico_apiserver_enabled: true
```

### Command used to invoke ansible

```
ansible-playbook cluster.yml -i inventory-simple -b
```

### Output of ansible run

https://gist.github.com/bjetal/bbfcc8f5fd2dd301aea9d2a34c99db6c

### Anything else we need to know

This can be fixed by updating the `calico-crds` ClusterRole to add `bgpfilters` to its list of resources in the `crd.projectcalico.org` API group. The change belongs in `roles/network_plugin/calico/templates/calico-apiserver.yml.j2`. I found this by comparing the Kubespray manifest with Calico's equivalent manifest: https://raw.githubusercontent.com/projectcalico/calico/v3.26.4/manifests/apiserver.yaml.

A workaround is to edit the ClusterRole and add the resource manually. Doing so also lets any garbage collection that was held up by the issue complete within a few seconds, with no further intervention.
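Both the template change and the manual workaround come down to the same edit: adding `bgpfilters` to the `crd.projectcalico.org` rule of the `calico-crds` ClusterRole. The excerpt below is a sketch of that rule only. The existing resources and verbs are deliberately elided rather than reproduced, so compare against the upstream v3.26.4 `apiserver.yaml` linked above for the complete lists. On a running cluster, the same edit can be made with `kubectl edit clusterrole calico-crds`.

```yaml
# Excerpt of the calico-crds ClusterRole: only the rule for the
# crd.projectcalico.org API group is shown. Other rules, and the existing
# entries in this rule's resources and verbs lists, are elided and must be kept.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: calico-crds
rules:
  - apiGroups:
      - crd.projectcalico.org
    resources:
      # ...existing Calico CRD resources, unchanged...
      - bgpfilters  # added: the entry missing from the Kubespray 2.24 template
    # verbs: ...existing verbs, unchanged...
```

To check the result, `kubectl auth can-i list bgpfilters.crd.projectcalico.org --as=system:serviceaccount:calico-apiserver:calico-apiserver` should report `yes`. The `calico-apiserver` namespace and service account name are assumptions taken from the upstream manifest; adjust them if your install binds `calico-crds` to a different subject.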

My best guess is that this was triggered by a change in Calico v3.26.0. It is probably closely related to Calico bug projectcalico/calico#7598 (which was caused by a similar bug in the Tigera Operator).

This kind of problem appears to be a good argument for moving to the Tigera Operator to install Calico, since the operator takes care of many of these details for you.


bjetal added the `kind/bug` label (Categorizes issue or PR as related to a bug.) on Feb 23, 2024.

RaSerge mentioned this issue on Apr 15, 2024 in #11089, "fix: updating the calico-crds" (Merged).

k8s-ci-robot closed this as completed in #11089 on Apr 30, 2024.

hadi2f244 mentioned this issue on May 15, 2024 in projectcalico/calico#8824, "calico-apiserver ServiceAccount is used by other services unexpectedly" (Open).
