Enabling Calico API Server on Kubespray 2.24 causes Kubernetes cascade delete to fail #10949

Closed
bjetal opened this issue Feb 23, 2024 · 0 comments · Fixed by #11089
Labels
kind/bug Categorizes issue or PR as related to a bug.

bjetal commented Feb 23, 2024

### What happened?

I installed Kubernetes, including Calico and the Calico API Server, using Kubespray. When the installation completed, I created a simple deployment and then deleted it:

```
kubectl create deployment nginx --image=nginx
kubectl delete deployments.apps nginx
```

The ReplicaSet and Pods for the deployment were not deleted:

```
[tpx-admin@sjc-rayo-1 ~]$ kubectl get all
NAME                         READY   STATUS              RESTARTS   AGE
pod/nginx-7854ff8877-rg4x7   0/1     ContainerCreating   0          18s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.233.0.1   <none>        443/TCP   6m27s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-7854ff8877   1         1         0       18s
```

### What did you expect to happen?

Kubernetes cascading deletion would remove the ReplicaSet and its Pods.

### How can we reproduce it (as minimally and precisely as possible)?

Install Kubernetes using Kubespray 2.24 with the variable `calico_apiserver_enabled` set to true.
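
To turn the reproduction into a pass/fail check instead of eyeballing `kubectl get all`, the delete could be followed by a bounded poll. This is a minimal sketch assuming `bash` and a working `kubectl` context; `wait_until_empty` is a hypothetical helper, not part of Kubespray or kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kubespray or kubectl).
# wait_until_empty TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second. Returns 0 as soon as COMMAND succeeds
# AND prints nothing on stdout; returns 1 if that has not happened by the
# timeout. A failing COMMAND (e.g. an API error) never counts as "empty".
wait_until_empty() {
  local timeout_s="$1" waited=0 out
  shift
  while :; do
    # Command must both succeed and produce no output.
    if out=$("$@" 2>/dev/null) && [ -z "$out" ]; then
      return 0
    fi
    # Give up once the timeout has been reached.
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, after `kubectl delete deployments.apps nginx`, running `wait_until_empty 60 kubectl get replicasets,pods -l app=nginx -o name` should return 0 on a healthy cluster and 1 on an affected one. The `app=nginx` selector relies on the label that `kubectl create deployment nginx` puts on the Pod template.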

### OS

```
Linux 4.18.0-513.9.1.el8_9.x86_64 x86_64
NAME="Red Hat Enterprise Linux"
VERSION="8.9 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.9 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"
VOCERA_PLATFORM_VERSION="7.1.0.14"
```

### Version of Ansible

```
ansible [core 2.15.9]
  config file = /Users/robert.mitchell/workspaces/vocera-new/kubernetes-deploy/ansible.cfg
  configured module search path = ['/Users/robert.mitchell/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/robert.mitchell/.pyenv/versions/3.10.9/lib/python3.10/site-packages/ansible
  ansible collection location = /Users/robert.mitchell/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/robert.mitchell/.pyenv/versions/3.10.9/bin/ansible
  python version = 3.10.9 (main, Jan 12 2023, 14:23:36) [Clang 14.0.0 (clang-1400.0.29.202)] (/Users/robert.mitchell/.pyenv/versions/3.10.9/bin/python3.10)
  jinja version = 3.1.2
  libyaml = True
```

### Version of Python

Python 3.10.9

### Version of Kubespray (commit)

64447e745

### Network plugin used

calico

### Full inventory with variables

```yaml
all:
  hosts:
    sjc-rayo-1:
      ansible_host: sjc-rayo-1.vcraeng.com
  vars:
    ansible_user: tpx-admin
  children:
    kube_control_plane:
      hosts:
        sjc-rayo-1:
    etcd:
      children:
        kube_control_plane:
    kube_node:
      children:
        kube_control_plane:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
      vars:
        calico_apiserver_enabled: true
```

### Command used to invoke ansible

```
ansible-playbook cluster.yml -i inventory-simple -b
```

### Output of ansible run

https://gist.github.com/bjetal/bbfcc8f5fd2dd301aea9d2a34c99db6c

### Anything else we need to know

This can be fixed by adding the `bgpfilters` resource to the list of `crd.projectcalico.org` resources in the `calico-crds` ClusterRole. The change would go in `roles/network_plugin/calico/templates/calico-apiserver.yml.j2`. I found this by comparing Kubespray's manifest with Calico's equivalent manifest: https://raw.githubusercontent.com/projectcalico/calico/v3.26.4/manifests/apiserver.yaml.
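
One way to check whether a running cluster is affected is to compare the resources granted by the installed ClusterRole with the ones expected from the upstream manifest. This is a minimal sketch assuming `bash` and read access to the cluster via `kubectl`; `clusterrole_resources` and `missing_resources` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting resources missing from a ClusterRole.

# Print every resource name granted by ClusterRole $1 (across ALL rules, not
# per API group), space-separated, using kubectl's JSONPath output.
clusterrole_resources() {
  kubectl get clusterrole "$1" -o jsonpath='{.rules[*].resources[*]}'
}

# missing_resources WANTED HAVE
# Print, one per line, each name in the whitespace-separated list WANTED that
# does not appear as a whole word in the whitespace-separated list HAVE.
missing_resources() {
  local have r
  # Turn newlines/tabs into spaces and pad both ends so every name in HAVE
  # is surrounded by spaces and can be matched as a whole word.
  have=" $(printf '%s' "$2" | tr '\n\t' '  ') "
  for r in $1; do
    case "$have" in
      *" $r "*) ;;               # granted: nothing to report
      *) printf '%s\n' "$r" ;;   # not granted: report it
    esac
  done
}
```

For example, `missing_resources bgpfilters "$(clusterrole_resources calico-crds)"` should print `bgpfilters` on an affected Kubespray 2.24 cluster and nothing once the ClusterRole includes it. Because `clusterrole_resources` flattens all rules, it only tells you whether the name is granted somewhere in the role, not under which API group.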

A workaround is to edit the ClusterRole and add the resource manually. Doing so also lets any garbage collection that was held up by the issue complete within a few seconds, with no further intervention.
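
The manual edit could also be applied non-interactively with a JSON Patch instead of `kubectl edit`. This is a minimal sketch assuming the `crd.projectcalico.org` rule sits at index 0 of the `calico-crds` ClusterRole's `rules` list; that index is an assumption, so check it with `kubectl get clusterrole calico-crds -o yaml` first, because a wrong index would grant the resource under a different rule. `resource_patch` and `patch_clusterrole` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual workaround.

# resource_patch RULE_INDEX RESOURCE
# Print a JSON Patch (RFC 6902) that appends RESOURCE to
# .rules[RULE_INDEX].resources; the "-" array index means "append at the end".
resource_patch() {
  printf '[{"op":"add","path":"/rules/%d/resources/-","value":"%s"}]' "$1" "$2"
}

# patch_clusterrole ROLE RULE_INDEX RESOURCE
# Apply that patch to the live ClusterRole (needs permission to update RBAC).
# ASSUMPTION: the caller has already confirmed which rule index to target.
patch_clusterrole() {
  kubectl patch clusterrole "$1" --type=json -p "$(resource_patch "$2" "$3")"
}
```

Under the index-0 assumption, `patch_clusterrole calico-crds 0 bgpfilters` is equivalent to adding `bgpfilters` by hand. It only changes the live object, so the lasting fix is still the template change described above.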

My best guess is that this was triggered by a change in Calico v3.26.0. It is probably closely related to Calico bug projectcalico/calico#7598, which was caused by a similar bug in the Tigera Operator.

Issues like this seem to be a good argument for moving to the Tigera Operator to install Calico, since it takes care of many of these details for you.
