
The load balancer IP addresses are not being released after the LB-type Services are deleted #835

Open
starbops opened this issue Apr 29, 2024 · 1 comment

Comments

@starbops
Contributor

Describe the bug

In ARP mode, kube-vip does not remove the load balancer IP address from the node's network interface when the LB-type Service object is deleted.

To Reproduce

  1. Follow the document to prepare the RBAC and kube-vip DaemonSet manifests
  2. Create a K3s cluster consisting of three nodes using k3sup
  3. Create a test Deployment called blog with the nginx image
    kubectl create deployment blog --image=nginx --replicas=3
  4. Create an LB-type Service exposing port 80 of the blog Deployment
    kubectl expose deployment blog --port 80 --type=LoadBalancer --overrides='{"metadata": {"annotations": {"kube-vip.io/loadbalancerIPs": "192.168.100.201"}}}'
    The logs generated by the kube-vip leader Pod:
    $ kubectl -n kube-system logs kube-vip-ds-68ldv
    time="2024-04-29T08:28:59Z" level=info msg="Starting kube-vip.io [v0.8.0]"
    time="2024-04-29T08:28:59Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
    time="2024-04-29T08:28:59Z" level=info msg="prometheus HTTP server started"
    time="2024-04-29T08:28:59Z" level=info msg="Using node name [k3s-03]"
    time="2024-04-29T08:28:59Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
    time="2024-04-29T08:28:59Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [k3s-03]"
    I0429 08:28:59.926436       1 leaderelection.go:250] attempting to acquire leader lease kube-system/plndr-svcs-lock...
    time="2024-04-29T08:28:59Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [k3s-03]"
    I0429 08:28:59.926733       1 leaderelection.go:250] attempting to acquire leader lease kube-system/plndr-cp-lock...
    time="2024-04-29T08:28:59Z" level=info msg="Node [k3s-02] is assuming leadership of the cluster"
    time="2024-04-29T08:28:59Z" level=info msg="new leader elected: k3s-02"
    I0429 08:29:01.419796       1 leaderelection.go:260] successfully acquired lease kube-system/plndr-cp-lock
    time="2024-04-29T08:29:01Z" level=info msg="Node [k3s-03] is assuming leadership of the cluster"
    time="2024-04-29T08:29:01Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [192.168.100.100/enp1s0]"
    I0429 08:29:05.139854       1 leaderelection.go:260] successfully acquired lease kube-system/plndr-svcs-lock
    time="2024-04-29T08:29:05Z" level=info msg="(svcs) starting services watcher for all namespaces"
    time="2024-04-29T08:31:25Z" level=info msg="(svcs) adding VIP [192.168.100.201] via enp1s0 for [default/blog]"
    time="2024-04-29T08:31:25Z" level=info msg="[service] synchronised in 15ms"
    time="2024-04-29T08:31:25Z" level=warning msg="(svcs) already found existing address [192.168.100.201] on adapter [enp1s0]"
    time="2024-04-29T08:31:28Z" level=warning msg="Re-applying the VIP configuration [192.168.100.201] to the interface [enp1s0]"
  5. Find the node hosting the leader kube-vip Pod and SSH into it to check whether the load balancer IP address is actually assigned to the network interface kube-vip is working on
    $ ip addr show enp1s0
    2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 96:27:05:1e:11:d0 brd ff:ff:ff:ff:ff:ff
        inet 192.168.100.140/24 brd 192.168.100.255 scope global dynamic enp1s0
           valid_lft 31355738sec preferred_lft 31355738sec
        inet 192.168.100.100/32 scope global enp1s0
           valid_lft forever preferred_lft forever
        inet 192.168.100.201/32 scope global enp1s0
           valid_lft forever preferred_lft forever
        inet6 fe80::9427:5ff:fe1e:11d0/64 scope link 
           valid_lft forever preferred_lft forever
  6. Delete the LB-type Service
    kubectl delete svc blog
    Only one log line is generated by the kube-vip leader Pod:
    time="2024-04-29T08:34:46Z" level=info msg="(svcs) [default/blog] has been deleted"
    
  7. Go back to the node and observe that the IP address is still there (a remote verification sketch follows this list)
    $ ip addr show enp1s0
    2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 96:27:05:1e:11:d0 brd ff:ff:ff:ff:ff:ff
        inet 192.168.100.140/24 brd 192.168.100.255 scope global dynamic enp1s0
           valid_lft 31355589sec preferred_lft 31355589sec
        inet 192.168.100.100/32 scope global enp1s0
           valid_lft forever preferred_lft forever
        inet 192.168.100.201/32 scope global enp1s0
           valid_lft forever preferred_lft forever
        inet6 fe80::9427:5ff:fe1e:11d0/64 scope link 
           valid_lft forever preferred_lft forever
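
Even after the Service object is gone, the leaked address keeps answering ARP because it is still configured on enp1s0. From another machine on the same 192.168.100.0/24 segment this can be confirmed with arping (a verification sketch; eth0 stands for that machine's interface and is an assumption, not part of the original report):

$ arping -I eth0 -c 3 192.168.100.201

A reply carrying k3s-03's MAC (96:27:05:1e:11:d0 above) shows the stale VIP is still live on the network, so reassigning that address to another Service or host risks a conflict.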

Expected behavior

The load balancer IP address should be removed from the network interface of the node after the corresponding LB-type Service object is deleted.
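
Until that happens, a leaked address can be cleared by hand on the node that still holds it (a workaround sketch, reusing the interface and address from the reproduction above):

# run on the node that held the VIP for default/blog (k3s-03 in the reproduction)
$ sudo ip addr del 192.168.100.201/32 dev enp1s0
$ ip addr show enp1s0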


Environment (please complete the following information):

  • OS/Distro: Ubuntu 20.04
  • Kubernetes Version: K3s v1.29.3+k3s1
    $ kubectl get nodes -o wide
    NAME     STATUS   ROLES                       AGE     VERSION        INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
    k3s-01   Ready    control-plane,etcd,master   5h48m   v1.29.3+k3s1   192.168.100.117   <none>        Ubuntu 20.04.4 LTS   5.4.0-121-generic   containerd://1.7.11-k3s2
    k3s-02   Ready    control-plane,etcd,master   5h46m   v1.29.3+k3s1   192.168.100.105   <none>        Ubuntu 20.04.4 LTS   5.4.0-121-generic   containerd://1.7.11-k3s2
    k3s-03   Ready    control-plane,etcd,master   5h46m   v1.29.3+k3s1   192.168.100.140   <none>        Ubuntu 20.04.4 LTS   5.4.0-121-generic   containerd://1.7.11-k3s2
  • Kube-vip Version: v0.8.0

Kube-vip.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/7xVUW8iRwz+K5WfF44lCaEr9QE1qRS1RdElah8ihMyMgSmzM1OPdxMU7X+vZuHIQo5c24d7QPKMP39re+yPV8Bg/iCOxjsoAEOIn+ocMtgYp6GAG6TSuwcSyKAkQY2CULwCOucFxXgX09Ev/iIlkaTPxvcViljqG//JJA7Izvr9syPureoNFLC5iB1PnWc//Gqc/mmitXffpHBYUuKoFtSrTejxAtW/CooB1SEybqNQCU0Giqmt7tGUFAXLAIWrrM3A4oJsWzOG0E9R7EgoJtbTJHSE7Cuw+tDuetAf9wcf5LnGuIYCFmPE0ZjoajQe5otrxCt9OSTC4ej66hoXo7HOry+0VinzrybxQaUxkErlRLKkxHOySxS1/u2/Vdo0GQiVwaJQy9GZlu/azaZTFC6XxhnZJtt5TZPOmenvyjDpm4qNWz2oNenKGre6Wzl/uL59IVVJy79jeNi36ZG4jFA87Zt1+xKYYtztw9MrbGgLRRvQY2/pJOcSoxCndw/E2DYdbl9MlAjNrMn+F6fyTtjbXrDo6Bz1rEndSVA0jnjHi7xKBpTocEUMswzI1a1r/wa1CXPkABnUaKt0I1wRpEz3iOBZOu7R5eVF150IUuLtcQ/7hX2Zuro0ZPVnWh7se5Q09OkN+ylomoJS4kd0xgnxMo3022fJhTwOTj+sjOYO6GLYBWgX56XXXZal4ShdjApzcriw9EEDVJi/Ldkb7HjZDuhYq29TJpAljEddS722TnMv1ir2rFeb03ItoSZux9S0ynmOfw8+w6/COfZIumI8Ib86BTI5etaE2hrXZX83F0zC20BsvO7A8i4MtU6b0HX/OOzno3E/HwzS72gU2Zcka6riPBLX1H37YpjnQ2hmGZgSV+lmtVac1ueLzByM4iDOLfS+svbeW6PSDt4tp17umSI5gXeSCxkwRV+xoiRtSY5IVWxk+7N3Qi/SSiIGXBhrxNBO/7ROOzi9fZxPbn6/m0LW2p8nf0Ja2lkGax9lSvLseQNFesvEy7VRNFHKV06m79IQb4m//Es/vQItl6QECpj6vd6dUYrsCLsTwbOi0mRQBY1CD8IotErq2jTNPwEAAP//uInBLlsIAAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: kube-vip-rbac
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2024-04-29T02:49:32Z"
  generation: 1
  labels:
    app.kubernetes.io/name: kube-vip-ds
    app.kubernetes.io/version: v0.8.0
    objectset.rio.cattle.io/hash: b8aa68ee56821b7aa5d42eea26757ab68d173ddc
  name: kube-vip-ds
  namespace: kube-system
  resourceVersion: "8708"
  uid: 47ef7952-1d6a-49e7-a9d7-24ba3654246b
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-vip-ds
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: kube-vip-ds
        app.kubernetes.io/version: v0.8.0
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      containers:
      - args:
        - manager
        env:
        - name: vip_arp
          value: "true"
        - name: port
          value: "6443"
        - name: vip_nodename
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: vip_interface
          value: enp1s0
        - name: vip_cidr
          value: "32"
        - name: dns_mode
          value: first
        - name: cp_enable
          value: "true"
        - name: cp_namespace
          value: kube-system
        - name: svc_enable
          value: "true"
        - name: svc_leasename
          value: plndr-svcs-lock
        - name: vip_leaderelection
          value: "true"
        - name: vip_leasename
          value: plndr-cp-lock
        - name: vip_leaseduration
          value: "5"
        - name: vip_renewdeadline
          value: "3"
        - name: vip_retryperiod
          value: "1"
        - name: address
          value: 192.168.100.100
        - name: prometheus_server
          value: :2112
        image: ghcr.io/kube-vip/kube-vip-iptables:v0.8.0
        imagePullPolicy: IfNotPresent
        name: kube-vip
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-vip
      serviceAccountName: kube-vip
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 1
  updatedNumberScheduled: 3

Additional context

This was first found in a Harvester deployment (an RKE2-based cluster) while validating the load balancer functionality during an upgrade of kube-vip from v0.6.0 to v0.8.0 (details are recorded in harvester/harvester#5682), but the same issue was also reproduced on a freshly created K3s cluster, as described in the reproduction section above. I hope this helps reduce the noise.

@starbops
Contributor Author

I enabled debug-level logging and saw the same Service object being reconciled again and again.

time="2024-04-30T06:43:26Z" level=debug msg="(svcs) [blog] has been added/modified with addresses [[192.168.100.201]]"
time="2024-04-30T06:43:26Z" level=debug msg="[STARTING] Service Sync"
time="2024-04-30T06:43:26Z" level=debug msg="init enable service security: false"
time="2024-04-30T06:43:26Z" level=info msg="(svcs) adding VIP [192.168.100.201] via enp1s0 for [default/blog]"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) will update [default/blog]"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) broadcasting ARP update for 192.168.100.201 via enp1s0, every 3000ms"
time="2024-04-30T06:43:26Z" level=info msg="[service] synchronised in 15ms"
time="2024-04-30T06:43:26Z" level=warning msg="(svcs) already found existing address [192.168.100.201] on adapter [enp1s0]"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) [blog] has been added/modified with addresses [[192.168.100.201]]"
time="2024-04-30T06:43:26Z" level=debug msg="[STARTING] Service Sync"
time="2024-04-30T06:43:26Z" level=debug msg="isDHCP: false, newServiceAddress: 192.168.100.201"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) [blog] has been added/modified with addresses [[192.168.100.201]]"
time="2024-04-30T06:43:26Z" level=debug msg="[STARTING] Service Sync"
time="2024-04-30T06:43:26Z" level=debug msg="isDHCP: false, newServiceAddress: 192.168.100.201"
time="2024-04-30T06:43:29Z" level=warning msg="Re-applying the VIP configuration [192.168.100.201] to the interface [enp1s0]"

It seems that the Service's UID was not put into the activeService map, which is why the peculiar behavior occurs.
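
For anyone else digging into this, it may help to capture the Service's UID before deleting it and then pull the matching log lines from every kube-vip Pod (a diagnostic sketch; the label selector comes from the DaemonSet manifest above):

$ kubectl get svc blog -o jsonpath='{.metadata.uid}{"\n"}'
$ kubectl -n kube-system logs -l app.kubernetes.io/name=kube-vip-ds --prefix --tail=100 | grep 'default/blog'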
