
[0.7.2] Failing to assign VIPs to services with externalTrafficPolicy: Local and servicesElection: true #790

Open
Poldovico opened this issue Mar 15, 2024 · 4 comments



Describe the bug
I'm trying to provide LoadBalancer services on a bare metal cluster, using kube-vip in ARP mode with servicesElection=true and the kube-vip-cloud-controller providing IPs.
When I create a service with externalTrafficPolicy: Local, the kube-vip.io/loadbalancerIPs annotation and loadBalancerIP field are set, but the external IP remains pending forever and the VIP does not get assigned to a Node.
If I create the service with externalTrafficPolicy: Cluster, everything works as expected, and provided the VIP happens to land on the same node as a Pod backing the Service, I can even edit the policy to Local afterwards and it will still work. However, if that Pod terminates and its replacement is scheduled on a different node, the VIP stays on the previous Node, leaving the Service unable to reach its endpoints.
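
For reference, the stuck state is visible directly with kubectl; a quick check, using the service name and namespace from the reproduction steps below:

kubectl -n ingress-nginx get svc ingress-nginx-controller        # EXTERNAL-IP stays <pending>
kubectl -n ingress-nginx describe svc ingress-nginx-controller   # no "LoadBalancer Ingress" line appears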

To Reproduce
Steps to reproduce the behavior:

  1. Create a bare metal cluster with kubeadm
  2. Install kube-vip-cloud-controller
  3. kubectl apply -f to install the kube-vip manifest provided below
  4. Install ingress-nginx from the official Helm chart with default options (this can presumably be any workload; ingress-nginx is just what I happen to be trying to expose)
  5. Expose ingress-nginx on a LoadBalancer service with the following manifest:
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  externalTrafficPolicy: Local
  ports:
  - appProtocol: http
    name: http
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

Expected behavior
When I create a new Service with externalTrafficPolicy: Local, its VIP should be assigned to a Node that has Pods matching its selector. If an existing Service with externalTrafficPolicy: Local is left with no Pods on its Node, its VIP should be reassigned to a different Node that does have matching Pods.

Environment:

  • OS/Distro: Debian 12
  • Kubernetes Version: v1.29
  • Kube-vip Version: 0.7.2

Kube-vip.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  creationTimestamp: null
  labels:
    app.kubernetes.io/name: kube-vip-ds
    app.kubernetes.io/version: v0.7.2
  name: kube-vip-ds
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-vip-ds
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: kube-vip-ds
        app.kubernetes.io/version: v0.7.2
    spec:
      containers:
      - args:
        - manager
        env:
        - name: vip_arp
          value: "true"
        - name: port
          value: "6443"
        - name: vip_cidr
          value: "32"
        - name: dns_mode
          value: first
        - name: svc_enable
          value: "true"
        - name: svc_leasename
          value: plndr-svcs-lock
        - name: svc_election
          value: "true"
        - name: vip_leaderelection
          value: "true"
        - name: vip_leasename
          value: plndr-cp-lock
        - name: vip_leaseduration
          value: "5"
        - name: vip_renewdeadline
          value: "3"
        - name: vip_retryperiod
          value: "1"
        - name: vip_address
        - name: prometheus_server
          value: :2112
        image: ghcr.io/kube-vip/kube-vip:v0.7.2
        imagePullPolicy: Always
        name: kube-vip
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
      hostNetwork: true
      serviceAccountName: kube-vip
  updateStrategy: {}
status:
  currentNumberScheduled: 0
  desiredNumberScheduled: 0
  numberMisscheduled: 0
  numberReady: 0

This was generated by

docker run --network host --rm ghcr.io/kube-vip/kube-vip:0.7.2 manifest daemonset \
    --inCluster \
    --services \
    --arp \
    --leaderElection \
    --servicesElection 
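
A quick sanity check that the resulting DaemonSet is actually running on every node (labels taken from the manifest above):

kubectl -n kube-system get ds kube-vip-ds
kubectl -n kube-system get pods -l app.kubernetes.io/name=kube-vip-ds -o wide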

Additional context
This is a new cluster created with kubeadm.


Poldovico commented Mar 15, 2024

Upon creating the service, the logs for kube-vip-cloud-controller read

I0315 10:41:41.096370       1 event.go:294] "Event occurred" object="ingress-nginx/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0315 10:41:41.106515       1 loadBalancer.go:91] syncing service 'ingress-nginx-controller' (a079d243-f71f-422c-83e9-260daa25dbe7)
I0315 10:41:41.114123       1 loadBalancer.go:247] no cidr config for namespace [ingress-nginx] exists in key [cidr-ingress-nginx] configmap [kubevip]
I0315 10:41:41.114746       1 loadBalancer.go:250] no global cidr config exists [cidr-global]
I0315 10:41:41.114829       1 loadBalancer.go:264] no range config for namespace [ingress-nginx] exists in key [range-ingress-nginx] configmap [kubevip]
I0315 10:41:41.114870       1 loadBalancer.go:269] Taking address from [range-global] pool
I0315 10:41:41.122886       1 loadBalancer.go:209] Updating service [ingress-nginx-controller], with load balancer IPAM address(es) [192.168.62.40]
I0315 10:41:41.147143       1 event.go:294] "Event occurred" object="ingress-nginx/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
I0315 10:41:41.154436       1 loadBalancer.go:91] syncing service 'ingress-nginx-controller' (a079d243-f71f-422c-83e9-260daa25dbe7)
I0315 10:41:41.157139       1 event.go:294] "Event occurred" object="ingress-nginx/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="LoadbalancerIP" message=" -> 192.168.62.40"
I0315 10:41:41.157263       1 event.go:294] "Event occurred" object="ingress-nginx/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0315 10:41:41.157322       1 event.go:294] "Event occurred" object="ingress-nginx/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"

while all kube-vip pods in the DaemonSet log only this single line:

time="2024-03-15T10:41:41Z" level=info msg="[endpoints] watching for service [ingress-nginx-controller] in namespace [ingress-nginx]"

Could this issue be related to #666?

Poldovico changed the title from "Failing to assign VIPs to services with externalTrafficPolicy: Local and servicesElection: true" to "[0.7.2] Failing to assign VIPs to services with externalTrafficPolicy: Local and servicesElection: true" on Mar 15, 2024

Cellebyte (Collaborator) commented Mar 15, 2024

Does the service have endpoints?
Does the service have the loadBalancerIP set in the status field?
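
For anyone checking the same thing, both can be read back with kubectl (names taken from the report above):

kubectl -n ingress-nginx get endpoints ingress-nginx-controller
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[*].ip}'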


Poldovico commented Mar 18, 2024

Does the service have endpoints? Does the service have the loadBalancerIP set in the status field?

The service does have endpoints, but the status field is completely empty.

In fact, since all the IPs are private, I can show the whole thing as kubectl reports it:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kube-vip.io/loadbalancerIPs: 192.168.62.65
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/name":"ingress-nginx"},"name":"ingress-nginx-controller","namespace":"ingress-nginx"},"spec":{"externalTrafficPolicy":"Local","ports":[{"appProtocol":"http","name":"http","port":80,"protocol":"TCP","targetPort":"http"},{"appProtocol":"https","name":"https","port":443,"protocol":"TCP","targetPort":"https"}],"selector":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/name":"ingress-nginx"},"type":"LoadBalancer"}}
  creationTimestamp: "2024-03-15T13:21:10Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    implementation: kube-vip
  name: ingress-nginx-controller
  namespace: ingress-nginx
  resourceVersion: "4456635"
  uid: 80dcf0aa-dc8d-4203-9149-4d53ed153cba
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.98.171.239
  clusterIPs:
  - 10.98.171.239
  externalTrafficPolicy: Local
  healthCheckNodePort: 31335
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerIP: 192.168.62.65
  ports:
  - appProtocol: http
    name: http
    nodePort: 31603
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 31239
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

The same Service as shown by kubectl describe:

Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/name=ingress-nginx
                          implementation=kube-vip
Annotations:              kube-vip.io/loadbalancerIPs: 192.168.62.65
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.98.171.239
IPs:                      10.98.171.239
IP:                       192.168.62.65
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31603/TCP
Endpoints:                172.20.74.39:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31239/TCP
Endpoints:                172.20.74.39:443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     31335
Events:                   <none>


ChristianCiach commented Mar 21, 2024

Same issue here. This is my first time evaluating kube-vip, so I have never used a previous version.

I think I am seeing the same issue, but it doesn't seem to have anything to do with externalTrafficPolicy: Local. It just "sometimes" (or rather "most often") doesn't work: very often status.loadBalancer stays empty ({}) after deploying the Service, but sometimes it randomly does work.

In the rare cases where assigning a VIP to a service does work (and status.loadBalancer gets populated), I can easily trigger the issue again by deleting the service pod. Kube-vip will then remove status.loadBalancer and the service stays in the "pending" state forever, even though the pod is recreated by Kubernetes immediately.
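
As a sketch of that trigger, using the namespace and selector from the manifest and output below:

kubectl -n traefik-system delete pod -l app.kubernetes.io/name=traefik   # pods are recreated immediately
kubectl -n traefik-system get svc traefik-vip -w                          # EXTERNAL-IP drops back to <pending> and stays there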

This is my Service definition:

apiVersion: v1
kind: Service
metadata:
  name: traefik-vip
  annotations:
    kube-vip.io/loadbalancerIPs: '172.28.180.134'
spec:
  type: LoadBalancer
  loadBalancerClass: kube-vip.io/kube-vip-class
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - port: 443
      targetPort: 8443
  externalTrafficPolicy: Local

There is no CCM involved. I deliberately set loadBalancerClass: kube-vip.io/kube-vip-class because I want to assign the VIP manually.

I am testing this with K3s v1.29.2+k3s1, and kube-vip is running on all 12 nodes. The manifest was generated by:

kube-vip manifest daemonset --inCluster --arp --services --servicesElection --lbClassOnly

Does the service have endpoints?

Yes, quite a few of them, because we're deploying the service with multiple replicas:

$ kubectl -n traefik-system get endpoints
NAME          ENDPOINTS                                                      AGE
traefik-vip   10.42.0.12:8443,10.42.1.3:8443,10.42.12.199:8443 + 6 more...
