Kube-VIP ARPs conflicting VIP information #788

Open · dacjames opened this issue Mar 11, 2024 · 4 comments

@dacjames

Describe the bug
We are using kube-vip to advertise LoadBalancer VIPs using ARP. Normally, this works fine, but in at least two cases we have observed conflicting ARP advertisements for the same VIP being sent simultaneously.

To Reproduce
The issue has been difficult to reproduce in isolation.

The cluster will run happily for a while and then, seemingly at random, we'll see issues resulting from the conflicting ARPs:

20:28:45.815587 ARP, Reply 10.240.17.183 is-at fa:c3:09:08:be:14, length 28
20:28:47.130643 ARP, Reply 10.240.17.183 is-at 42:92:ad:23:12:3z, length 46

When this occurs, the instance sending out the ARPs appears not to know that it is a leader for those IPs, because when it is shut down (by killing the pod), it does not report cleaning up the IP in question.

Shutting down the bad node will cause ARPs to stop but will not clear out the local IP address on the interface.

It appears that the ARP broadcasting goroutine is somehow still running in the background while kube-vip is in this state. An investigation of the code made me suspicious of this line not calling arpCancel(): https://github.com/kube-vip/kube-vip/blob/main/pkg/cluster/clusterLeaderElection.go#L116. But, of course, that is just a wild guess.
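For reference, here is a minimal sketch of the cancellation pattern I would expect around such a broadcaster, assuming a ticker loop stopped via context cancellation. The names (sendGratuitousARP, the 3-second interval, the example VIP) are purely illustrative and not kube-vip's actual code:

package main

import (
	"context"
	"fmt"
	"time"
)

// sendGratuitousARP stands in for kube-vip's real broadcast call; purely illustrative.
func sendGratuitousARP(vip string) { fmt.Println("gratuitous ARP for", vip) }

func main() {
	ctx, arpCancel := context.WithCancel(context.Background())

	go func() {
		ticker := time.NewTicker(3 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // leadership lost: the broadcaster must stop here
			case <-ticker.C:
				sendGratuitousARP("10.240.17.183")
			}
		}
	}()

	// In the real code this would happen when the lease is lost; if that path
	// never calls arpCancel(), the goroutine above keeps advertising the VIP.
	time.Sleep(10 * time.Second)
	arpCancel()
}

My suspicion is that the non-leader path never triggers the equivalent of arpCancel() above.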

Expected behavior
Each VIP should only ever be advertised by one leader at a time. If an instance is not the leader for a service, it should not send ARPs for that service.

Environment:

  • OS/Distro: SLE Micro (based on SLES 15.5)
  • Kubernetes Version: v1.26 (rke2, rancher)
  • Kube-vip Version: 0.6.2

Kube-vip.yaml:
I think this is the relevant config. The only non-obvious variable here is the vip_interface, which is set to eth1 in this instance.

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: kube-vip-ds
    app.kubernetes.io/version: ${daemonset_version}
  name: kube-vip-ds
  namespace: ${app_namespace}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-vip-ds
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-vip-ds
        app.kubernetes.io/version: ${daemonset_version}
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: Exists
      containers:
        - args:
            - manager
          env:
            - name: vip_arp
              value: 'true'
            - name: port
              value: '6443'
            - name: vip_interface
              value: '${vip_interface}'
            - name: vip_servicesinterface
              value: '${vip_interface}'
            - name: vip_cidr
              value: '32'
            - name: svc_enable
              value: 'true'
            - name: svc_election
              value: 'true'
            - name: enable_service_security
              value: 'false'
            - name: prometheus_server
              value: ':2112'
          image: ghcr.io/kube-vip/kube-vip:${daemonset_version}
          imagePullPolicy: IfNotPresent
          name: kube-vip
          resources: {}
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
                - NET_RAW
      hostNetwork: true
      serviceAccountName: kube-vip
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists
  updateStrategy: {}
status:
  currentNumberScheduled: 0
  desiredNumberScheduled: 0
  numberMisscheduled: 0
  numberReady: 0


@lubronzhan (Contributor)

You enabled svc_election, which is only for advertising Services of type LoadBalancer, so it's expected that each service has its own leader. Each service acquires leadership on each kube-vip instance independently, so there could be two services acquiring leadership on the same node or on different nodes.
Relevant code is here https://github.com/kube-vip/kube-vip/blob/main/pkg/manager/servicesLeader.go#L18
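Conceptually, a per-service election looks roughly like the following sketch using client-go's leaderelection package. The lease name, timings, and callback bodies here are assumptions for illustration only, not the actual kube-vip implementation:

package election

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runServiceElection holds a separate Lease per Service, so every Service ends
// up with its own, independently elected leader node.
func runServiceElection(ctx context.Context, cs kubernetes.Interface, namespace, serviceName, nodeName string) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "kubevip-svc-" + serviceName, // one lease per Service (name is hypothetical)
			Namespace: namespace,
		},
		Client:     cs.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: nodeName},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   5 * time.Second,
		RenewDeadline:   3 * time.Second,
		RetryPeriod:     1 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// this node now owns the Service's VIP: add the address and start ARPing
			},
			OnStoppedLeading: func() {
				// leadership lost: remove the VIP and stop ARPing for it
			},
		},
	})
}

Because each Service runs its own election, two different Services legitimately ending up with leaders on different nodes is normal; only two leaders for the same Service (or the same IP) is a problem.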

It's possible that the previous leader didn't exit cleanly, so two leaders are ARPing for the same service. But it's also possible that two Services of type LoadBalancer got assigned the same IP, so two nodes are reporting the same IP; this is worth checking as well.
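To check the second possibility, something along these lines could list Services of type LoadBalancer and report any IP claimed by more than one of them. This is just a hedged client-go sketch, not a kube-vip feature:

package check

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// findDuplicateLBIPs returns, for every LoadBalancer IP claimed by more than
// one Service, the list of namespace/name pairs that claim it.
func findDuplicateLBIPs(ctx context.Context, cs kubernetes.Interface) (map[string][]string, error) {
	svcs, err := cs.CoreV1().Services("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	byIP := map[string][]string{}
	for _, svc := range svcs.Items {
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		for _, ing := range svc.Status.LoadBalancer.Ingress {
			if ing.IP != "" {
				byIP[ing.IP] = append(byIP[ing.IP], svc.Namespace+"/"+svc.Name)
			}
		}
	}
	for ip, owners := range byIP {
		if len(owners) < 2 {
			delete(byIP, ip) // keep only IPs shared by two or more Services
		}
	}
	return byIP, nil
}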

I would recommend using a newer kube-vip, since a lot of refactoring has been done. Please also upload the kube-vip logs from each of the leaders when this issue happens; at a minimum we could check whether they are ARPing for the same service or not.

@dacjames (Author)

@lubronzhan Thanks for the help!

there could be two services acquiring leadership on the same node or on different nodes.

How do I check for this condition and prevent its occurrence?

Will do on the update.

@lubronzhan (Contributor)

there could be two services acquiring leadership on the same node or on different nodes.
How do I check for this condition and prevent its occurrence?

This is expected: since you enabled svc_election, each service has its own leader, and it should work as long as no two services have the same loadbalancerIP. So I don't think you want to prevent its occurrence, unless you have two services sharing the same loadbalancerIP.

I would need more logs to understand the issue. Please update once you run into it again.

Thanks

@dacjames (Author)

dacjames commented Mar 12, 2024

Ack on the per-service load balancer. I was getting confused about node vs. service leaders, but I think I get it now. My understanding is that each service (differentiated by loadbalancerIP) should have a unique leader, which is what appears not to have happened, as two instances were simultaneously ARPing the same loadbalancerIP from different interfaces.

Unless you have two services sharing the same loadbalancerIP.

That is not the case now but might have occurred transiently and been subsequently corrected.

Working on an update and will provide additional logs.
