Kube-VIP ARPs conflicting VIP information #788
You enabled svc_election. It's possible that the previous leader didn't exit correctly, so two leaders are ARPing for the same service. But it's also possible that two services of type LoadBalancer got assigned the same IP, so two nodes are reporting the same IP; this is also worth checking. I would recommend using a newer kube-vip, since a lot of refactoring has been done. Please also upload the kube-vip log for each of the leaders when this issue happens; at the very least we could check whether they are ARPing for the same service or not.
@lubronzhan Thanks for the help!
How do I check for this condition and prevent its occurrence? Will do on the update.
This is expected: since you enabled svc_election, each service will have its own leader, and it should work as long as two services have different loadbalancerIPs. So I don't think you want to prevent its occurrence, unless you have two services sharing the same loadbalancerIP. I would need more logs to help understand the issue. Please update once you run into it again. Thanks.
Ack on the per-service load balancer. I was getting confused about nodes vs. service leaders, but I think I get it now. My understanding is that each service (differentiated by loadbalancerIP) should have a unique leader. That appears not to have happened here, as two instances were ARPing the same loadbalancerIP from different interfaces simultaneously.
That is not the case now, but it might have occurred transiently and been subsequently corrected. Working on an update and will provide additional logs.
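For reference, the duplicate-loadbalancerIP condition mentioned above can be checked mechanically once the service-to-VIP assignments are in hand. A minimal Go sketch (hypothetical helper, not kube-vip code) that flags any VIP claimed by more than one service:

```go
package main

import "fmt"

// findDuplicateVIPs returns the loadbalancerIPs claimed by more than one
// service. Input maps service name -> assigned VIP.
func findDuplicateVIPs(svcIPs map[string]string) []string {
	byIP := map[string][]string{}
	for svc, ip := range svcIPs {
		byIP[ip] = append(byIP[ip], svc)
	}
	var dups []string
	for ip, svcs := range byIP {
		if len(svcs) > 1 {
			dups = append(dups, ip)
		}
	}
	return dups
}

func main() {
	// Hypothetical assignments; a conflict means two leaders will ARP
	// for the same address.
	svcs := map[string]string{
		"svc-a": "10.240.17.183",
		"svc-b": "10.240.17.183", // conflict: same VIP as svc-a
		"svc-c": "10.240.17.184",
	}
	fmt.Println(findDuplicateVIPs(svcs)) // prints [10.240.17.183]
}
```

In practice the input map would be built from `kubectl get svc` output or a client-go List call; the pure function above is just the conflict check itself.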
Describe the bug
We are using kube-vip to advertise LoadBalancer VIPs using ARP. Normally, this works fine. In at least two cases, we have observed duplicate conflicting ARP advertisements happening simultaneously.
To Reproduce
The issue has been difficult to reproduce in isolation.
The cluster will run happily for a while and then seemingly at random we'll see issues resulting from the conflicting ARPs.
20:28:45.815587 ARP, Reply 10.240.17.183 is-at fa:c3:09:08:be:14, length 28
20:28:47.130643 ARP, Reply 10.240.17.183 is-at 42:92:ad:23:12:3z, length 46
When this occurs, the instance sending out the ARPs appears not to know that it is the leader for the other IPs, because when it is shut down (by killing the pod), it does not report cleaning up the IP in question.
Shutting down the bad node stops the ARPs but does not clear the local IP address from the interface.
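Conflicting replies like the capture above can be spotted mechanically in a tcpdump dump. A quick sketch (illustrative, not part of kube-vip) that reports any IP answered by more than one MAC:

```go
package main

import (
	"fmt"
	"strings"
)

// conflictingIPs scans tcpdump ARP-reply lines and returns, for each IP
// answered by more than one MAC address, the MACs seen for it.
func conflictingIPs(lines []string) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, l := range lines {
		f := strings.Fields(l)
		// expect: <ts> ARP, Reply <ip> is-at <mac>, length <n>
		if len(f) < 6 || f[2] != "Reply" {
			continue
		}
		ip, mac := f[3], strings.TrimSuffix(f[5], ",")
		if seen[ip] == nil {
			seen[ip] = map[string]bool{}
		}
		seen[ip][mac] = true
	}
	out := map[string][]string{}
	for ip, macs := range seen {
		if len(macs) > 1 {
			for m := range macs {
				out[ip] = append(out[ip], m)
			}
		}
	}
	return out
}

func main() {
	// Hypothetical capture lines in the same shape as the report above.
	capture := []string{
		"20:28:45.815587 ARP, Reply 10.240.17.183 is-at fa:c3:09:08:be:14, length 28",
		"20:28:47.130643 ARP, Reply 10.240.17.183 is-at 42:92:ad:23:12:3a, length 46",
	}
	fmt.Println(conflictingIPs(capture))
}
```

Feeding it `tcpdump -n arp` output over a window would surface exactly the dual-leader symptom described here.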
It appears that the ARP-broadcasting goroutine is somehow still running in the background while kube-vip is in this state. An investigation of the code made me suspicious of this line: https://github.com/kube-vip/kube-vip/blob/main/pkg/cluster/clusterLeaderElection.go#L116 not calling arpCancel(). But, of course, that is just a wild guess.
Expected behavior
All VIPs should only ever be advertised from one leader at a time. If an instance is not leader for a service, it should not send ARPs for that service.
Environment:
Kube-vip.yaml
I think this is the relevant config. The only non-obvious variable here is vip_interface, which is set to eth1 in this instance.