vsphere-csi-node-xxxxx are in CrashLoopBackOff #2519

Open
dattebayo6716 opened this issue Nov 27, 2023 · 8 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dattebayo6716

dattebayo6716 commented Nov 27, 2023

/kind bug

What steps did you take and what happened:

What I see on the provisioned cluster

  1. Some calico pods are in pending state
  2. Some coredns pods are in pending state
  3. vsphere-csi-controller-manager pod is in pending state
  4. vsphere-csi-node-xxxxx are in CrashLoopBackOff without much information
  5. There is NO log of what error occurred. I checked the logs of the CAPI and CAPV pods in the bootstrap cluster, and there are no errors in the provisioned cluster's pods either (a sketch of log-collection commands is just below this list).
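
A minimal sketch of the commands that could be used to try to surface the crash reason (pod, container, and kubeconfig names are taken from the output further below; --previous prints the logs of the last terminated attempt):

kubectl logs vsphere-csi-node-dtvrg -n kube-system -c vsphere-csi-node --previous --kubeconfig=mcluster.kubeconfig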

What did you expect to happen:
I expected to see a cluster with all pods running.

Anything else you would like to add:
Below is some of the kubectl output for reference.

Here are some of the env variables I have

# VSPHERE_TEMPLATE: "ubuntu-2204-kube-v1.27.3"
# CONTROL_PLANE_ENDPOINT_IP: "10.63.32.100"
# VIP_NETWORK_INTERFACE: "ens192"
# VSPHERE_TLS_THUMBPRINT: ""
# EXP_CLUSTER_RESOURCE_SET: true  
# VSPHERE_SSH_AUTHORIZED_KEY: ""

# VSPHERE_STORAGE_POLICY: ""
# CPI_IMAGE_K8S_VERSION: "v1.27.3"

All bootstrap pods are running without errors.

ubuntu@frun10926:~/k8s$ kubectl get po -A -o wide
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-557b778d6b-qpxn7       1/1     Running   1 (24h ago)   2d22h   10.244.0.9    kind-control-plane   <none>           <none>
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-55d8f6b576-8hl5r   1/1     Running   1 (24h ago)   2d22h   10.244.0.10   kind-control-plane   <none>           <none>
capi-system                         capi-controller-manager-685454967c-tnmcj                         1/1     Running   3 (24h ago)   2d22h   10.244.0.8    kind-control-plane   <none>           <none>
capv-system                         capv-controller-manager-84d85cdcbd-cb2wp                         1/1     Running   3 (24h ago)   2d22h   10.244.0.11   kind-control-plane   <none>           <none>
cert-manager                        cert-manager-75d57c8d4b-7j4tk                                    1/1     Running   1 (24h ago)   2d22h   10.244.0.6    kind-control-plane   <none>           <none>
cert-manager                        cert-manager-cainjector-69d6f4d488-rvp67                         1/1     Running   2 (24h ago)   2d22h   10.244.0.5    kind-control-plane   <none>           <none>
cert-manager                        cert-manager-webhook-869b6c65c4-h6xdt                            1/1     Running   0             2d22h   10.244.0.7    kind-control-plane   <none>           <none>
kube-system                         coredns-5d78c9869d-djj9s                                         1/1     Running   0             2d22h   10.244.0.4    kind-control-plane   <none>           <none>
kube-system                         coredns-5d78c9869d-vltjl                                         1/1     Running   0             2d22h   10.244.0.3    kind-control-plane   <none>           <none>
kube-system                         etcd-kind-control-plane                                          1/1     Running   0             2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kindnet-zp6c5                                                    1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-apiserver-kind-control-plane                                1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-controller-manager-kind-control-plane                       1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-proxy-t2g5b                                                 1/1     Running   0             2d22h   172.18.0.2    kind-control-plane   <none>           <none>
kube-system                         kube-scheduler-kind-control-plane                                1/1     Running   1 (24h ago)   2d22h   172.18.0.2    kind-control-plane   <none>           <none>
local-path-storage                  local-path-provisioner-6bc4bddd6b-rkwwm                          1/1     Running   0             2d22h   10.244.0.2    kind-control-plane   <none>           <none>

Here are the pods on the vSphere cluster that was provisioned using CAPI

ubuntu@frun10926:~/k8s$ kubectl get po -A --kubeconfig=mcluster.kubeconfig -o wide
NAMESPACE         NAME                                       READY   STATUS             RESTARTS          AGE     IP                NODE                        NOMINATED NODE   READINESS GATES
calico-system     calico-kube-controllers-5f9d445bb4-hp7rt   0/1     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
calico-system     calico-node-6mrpv                          1/1     Running            0                 2d20h   10.63.32.83       mcluster-md-0-4kxmk-zplmd   <none>           <none>
calico-system     calico-node-dg42m                          1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
calico-system     calico-node-f6n9r                          1/1     Running            0                 2d20h   10.63.32.81       mcluster-md-0-4kxmk-wfscb   <none>           <none>
calico-system     calico-node-gtxcg                          1/1     Running            0                 2d20h   10.63.32.82       mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     calico-typha-5b866db66c-sdnpv              1/1     Running            0                 2d20h   10.63.32.81       mcluster-md-0-4kxmk-wfscb   <none>           <none>
calico-system     calico-typha-5b866db66c-trwlj              1/1     Running            0                 2d20h   10.63.32.82       mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     csi-node-driver-drblt                      2/2     Running            0                 2d20h   192.168.232.193   mcluster-klljm              <none>           <none>
calico-system     csi-node-driver-pbhvm                      2/2     Running            0                 2d20h   192.168.68.65     mcluster-md-0-4kxmk-zplmd   <none>           <none>
calico-system     csi-node-driver-vflj4                      2/2     Running            0                 2d20h   192.168.141.66    mcluster-md-0-4kxmk-gbcjj   <none>           <none>
calico-system     csi-node-driver-wzmtr                      2/2     Running            0                 2d20h   192.168.83.65     mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       coredns-5d78c9869d-ckdjb                   0/1     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
kube-system       coredns-5d78c9869d-vlpkw                   0/1     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
kube-system       etcd-mcluster-klljm                        1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-apiserver-mcluster-klljm              1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-controller-manager-mcluster-klljm     1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-proxy-7dxb2                           1/1     Running            0                 2d20h   10.63.32.82       mcluster-md-0-4kxmk-gbcjj   <none>           <none>
kube-system       kube-proxy-gsgzz                           1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-proxy-mp98t                           1/1     Running            0                 2d20h   10.63.32.83       mcluster-md-0-4kxmk-zplmd   <none>           <none>
kube-system       kube-proxy-x97w4                           1/1     Running            0                 2d20h   10.63.32.81       mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       kube-scheduler-mcluster-klljm              1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       kube-vip-mcluster-klljm                    1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       vsphere-cloud-controller-manager-hzvzj     1/1     Running            0                 2d20h   10.63.32.84       mcluster-klljm              <none>           <none>
kube-system       vsphere-csi-controller-664c45f69b-6ddz4    0/5     Pending            0                 2d20h   <none>            <none>                      <none>           <none>
kube-system       vsphere-csi-node-dtvrg                     2/3     CrashLoopBackOff   809 (3m57s ago)   2d20h   192.168.141.65    mcluster-md-0-4kxmk-gbcjj   <none>           <none>
kube-system       vsphere-csi-node-jcpxj                     2/3     CrashLoopBackOff   810 (73s ago)     2d20h   192.168.232.194   mcluster-klljm              <none>           <none>
kube-system       vsphere-csi-node-lpjxj                     2/3     CrashLoopBackOff   809 (2m22s ago)   2d20h   192.168.83.66     mcluster-md-0-4kxmk-wfscb   <none>           <none>
kube-system       vsphere-csi-node-nkh6m                     2/3     CrashLoopBackOff   809 (3m35s ago)   2d20h   192.168.68.66     mcluster-md-0-4kxmk-zplmd   <none>           <none>
tigera-operator   tigera-operator-84cf9b6dbb-w6lkf           1/1     Running            0                 2d20h   10.63.32.83       mcluster-md-0-4kxmk-zplmd   <none>           <none>

Here is a sample kubectl describe for one of the vsphere-csi-node-xxxxx pods

ubuntu@frun10926:~/k8s$ kubectl describe pod  vsphere-csi-node-dtvrg -n kube-system --kubeconfig=mcluster.kubeconfig
Name:             vsphere-csi-node-dtvrg
Namespace:        kube-system
Priority:         0
Service Account:  default
Node:             mcluster-md-0-4kxmk-gbcjj/10.63.32.82
Start Time:       Fri, 24 Nov 2023 19:14:52 +0000
Labels:           app=vsphere-csi-node
                  controller-revision-hash=69967bd89d
                  pod-template-generation=1
                  role=vsphere-csi
Annotations:      cni.projectcalico.org/containerID: 0e30215c3f275ce821e98584c24cd139273c8c061af590ef5ddeb915b421e6ec
                  cni.projectcalico.org/podIP: 192.168.141.65/32
                  cni.projectcalico.org/podIPs: 192.168.141.65/32
Status:           Running
IP:               192.168.141.65
IPs:
  IP:           192.168.141.65
Controlled By:  DaemonSet/vsphere-csi-node
Containers:
  node-driver-registrar:
    Container ID:  containerd://075a9e6aa183294562e6edfbd55577f8eeca891c19cb43603973a1057d2f8125
    Image:         quay.io/k8scsi/csi-node-driver-registrar:v2.0.1
    Image ID:      quay.io/k8scsi/csi-node-driver-registrar@sha256:a104f0f0ec5fdd007a4a85ffad95a93cfb73dd7e86296d3cc7846fde505248d3
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Fri, 24 Nov 2023 19:31:30 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
  vsphere-csi-node:
    Container ID:   containerd://b8ec60cc34ad576e31564f0d993b2b50440f8de2753f744c545cb772407ee654
    Image:          gcr.io/cloud-provider-vsphere/csi/release/driver:v3.1.2
    Image ID:       gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:471db9143b6daf2abdb656383f9d7ad34123a22c163c3f0e62dc8921048566bb
    Port:           9808/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 27 Nov 2023 15:56:46 +0000
      Finished:     Mon, 27 Nov 2023 15:56:46 +0000
    Ready:          False
    Restart Count:  807
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
    Environment:
      CSI_ENDPOINT:               unix:///csi/csi.sock
      X_CSI_MODE:                 node
      X_CSI_SPEC_REQ_VALIDATION:  false
      VSPHERE_CSI_CONFIG:         /etc/cloud/csi-vsphere.conf
      LOGGER_LEVEL:               PRODUCTION
      X_CSI_LOG_LEVEL:            INFO
      NODE_NAME:                   (v1:spec.nodeName)
    Mounts:
      /csi from plugin-dir (rw)
      /dev from device-dir (rw)
      /etc/cloud from vsphere-config-volume (rw)
      /var/lib/kubelet from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
  liveness-probe:
    Container ID:  containerd://3ccf0d77472d57ac853a20305fd7862c97163b2509e40977cdc735e26b21665a
    Image:         quay.io/k8scsi/livenessprobe:v2.1.0
    Image ID:      quay.io/k8scsi/livenessprobe@sha256:04a9c4a49de1bd83d21e962122da2ac768f356119fb384660aa33d93183996c3
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
    State:          Running
      Started:      Fri, 24 Nov 2023 19:31:54 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-glb6m (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  vsphere-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  csi-vsphere-config
    Optional:    false
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.vsphere.vmware.com/
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  device-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  kube-api-access-glb6m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                      From     Message
  ----     ------            ----                     ----     -------
  Warning  DNSConfigForming  28s (x20490 over 2d20h)  kubelet  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.242.46.35 10.242.46.36 10.250.46.36

Environment:

  • Cluster-api-provider-vsphere version: 1.5.3
  • Kubernetes version: (use kubectl version): 1.27.3
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 OVA image that vSphere recommends (with no changes to the OVA).
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 27, 2023
@chrischdi
Member

Could you take a look at why vsphere-csi-controller-664c45f69b-6ddz4 is Pending (via kubectl describe pod)?
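
For example (pod name and kubeconfig file taken from the output above):

kubectl describe pod vsphere-csi-controller-664c45f69b-6ddz4 -n kube-system --kubeconfig=mcluster.kubeconfig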

If I understand correctly, this pod needs to be up first so that the DaemonSet pods can succeed.

Did you use the default templates provided by CAPV or did you manually deploy CSI?

@dattebayo6716
Author

dattebayo6716 commented Dec 14, 2023

I posted sample output from kubectl describe <pod> above.

I used the default template and followed the instructions on the quick-start page to generate the cluster YAML file.
I am not using the YAML files from the templates folder.

@chrischdi
Member

chrischdi commented Dec 14, 2023

So something prevents the vsphere-csi-controller from getting scheduled. There may be taints or something else causing this.

You need to figure out why that is; then the DaemonSet pods should also become ready.
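
A quick sketch of how to check for taints and for scheduler events (the kubeconfig file is the one from the output above; FailedScheduling is the reason the scheduler normally reports when a pod cannot be placed):

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints --kubeconfig=mcluster.kubeconfig
kubectl get events -n kube-system --field-selector reason=FailedScheduling --kubeconfig=mcluster.kubeconfig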

@rvanderp3
Contributor

Can you get the events from that namespace?
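
For example, something like:

kubectl get events -n kube-system --sort-by=.lastTimestamp --kubeconfig=mcluster.kubeconfig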

@habibullinrsh

habibullinrsh commented Feb 1, 2024

The csi-node-driver that the tigera-operator installs conflicts with vsphere-csi-node. I couldn't disable the installation of csi-node-driver, so I install Calico with kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml instead.

@chrischdi
Member

It would be interesting to figure out, together with https://github.com/kubernetes/cloud-provider-vsphere, where the gaps are so that both can run at the same time (for CSI we simply consume the above).

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 1, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 31, 2024