Improve Service/LoadBalancer reconciliation performance #5909

Open
desek opened this issue Apr 12, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@desek

desek commented Apr 12, 2024

What would you like to be added:

I'd like a Service of type=LoadBalancer (a Service with a public IP) to reconcile faster.
The current implementation reconciles only 1 Service at a time, and --concurrent-service-syncs accepts only 1 as a value.
As a result, the reconcile loop, which processes all Services, handles them sequentially, one Service at a time.
In a cluster with 500+ Services, each Service takes 5-10 seconds to process, so a full reconciliation loop takes approx. 1 hour. In effect, a Service created just after the current loop has started can take at least twice as long to reconcile (~2 hours).
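For scale, the 5-10 second per-Service range above brackets one sequential pass over 500 Services as follows; the midpoint is consistent with the approx. 1 hour estimate:

```math
\begin{aligned}
500 \times 5\,\text{s} &= 2500\,\text{s} \approx 42\,\text{min} \\
500 \times 7.5\,\text{s} &= 3750\,\text{s} \approx 62.5\,\text{min} \\
500 \times 10\,\text{s} &= 5000\,\text{s} \approx 83\,\text{min}
\end{aligned}
```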

I'm assuming Services are processed sequentially one-by-one due to the nature of Azure Load Balancers.

So my suggestion for improving Service/LoadBalancer reconciliation performance is either (or both) of the following:

  1. Reconcile one Azure Load Balancer at a time instead of one Service at a time
  2. Make the cloud controller manager configurable to reconcile only Services matching a label selector
  • This would enable deploying multiple cloud controller managers, each dedicated to one Azure LB

Why is this needed:

  • Large clusters with Services of type=LoadBalancer won't scale without this

Duplicate issue in the AKS repo: Azure/AKS#4281

desek added the kind/feature label Apr 12, 2024
@zioproto

zioproto commented May 6, 2024

@desek can you open this very same issue also at https://github.com/Azure/AKS/issues

The AKS Product Group monitors that repo and might consider your issue for their roadmap

Thanks

@bridgetkromhout

As recently as February, @feiskyer stated this limit is still needed: #249 (comment) - I will ask for a re-evaluation. Thanks for the issue, @desek!

@feiskyer
Member

feiskyer commented May 6, 2024

Thanks for the feedback. This can't be supported with the current LoadBalancer SKU because many resources are shared, but it is planned for the container-native LoadBalancer (which is still WIP).

For the reconciliation latency, have you tried NodeIP-based SLB (i.e. setting loadBalancerBackendPoolConfigurationType to nodeIP in the cloud configuration file)? VM NIC operations are skipped in nodeIP mode, so provisioning is faster than in the default mode.

@desek
Author

desek commented May 13, 2024

> Thanks for the feedback. This can't be supported with the current LoadBalancer SKU because many resources are shared, but it is planned for the container-native LoadBalancer (which is still WIP).
>
> For the reconciliation latency, have you tried NodeIP-based SLB (i.e. setting loadBalancerBackendPoolConfigurationType to nodeIP in the cloud configuration file)? VM NIC operations are skipped in nodeIP mode, so provisioning is faster than in the default mode.

Yes, we're using nodeIP. It's not fast enough for clusters running 500+ Services, because the bottleneck is that cloud-provider-azure processes Kubernetes Services sequentially.

@desek
Author

desek commented May 13, 2024

> @desek can you open this very same issue also at https://github.com/Azure/AKS/issues
>
> The AKS Product Group monitors that repo and might consider your issue for their roadmap
>
> Thanks

I've added it here Azure/AKS#4281
