Too many applications causing webhook to timeout #14269
Comments
We had similar problems in the past. On clusters with ~500 apps, a webhook request took about 10 seconds to process them all. With the latest version that time has decreased to about 5s. If I am not mistaken, during a webhook request Argo CD processes all applications sequentially instead of in parallel. Is there any reason not to do it in parallel?
I'm fairly certain it can and should be parallelized. I'm surprised it takes more than a second. It feels like the processing loop is doing some network-bound work as part of each iteration; that would be worth investigating.
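To illustrate the suggestion above, here is a minimal sketch of fanning the per-app work out to a bounded pool of goroutines instead of a sequential loop. `refreshApp` and `refreshAll` are hypothetical names standing in for the per-application work the handler does today; this is not the actual ArgoCD code.

```go
package main

import (
	"fmt"
	"sync"
)

// refreshApp stands in for the per-application, network-bound work
// the webhook handler currently performs in its sequential loop.
func refreshApp(name string) string {
	return "refreshed:" + name
}

// refreshAll runs the per-app work on a bounded pool of workers.
func refreshAll(apps []string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for app := range jobs {
				results <- refreshApp(app)
			}
		}()
	}
	go func() {
		// Feed the jobs, then close results once all workers drain.
		for _, a := range apps {
			jobs <- a
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()
	out := make([]string, 0, len(apps))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	got := refreshAll([]string{"a", "b", "c"}, 2)
	fmt.Println(len(got))
}
```

With ~500 apps, bounding the pool (rather than spawning one goroutine per app) keeps the handler from overwhelming the Kubernetes API with concurrent patch requests.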
Faced the same issue; webhook calls are taking >10s to finish. It is surely doing quite some network-bound work here, likely in argo-cd/util/webhook/webhook.go lines 291 to 311 (commit eb526ff).
While doing the refreshes concurrently should solve the issue since it would be much faster, I was looking at the code and it seems like the result of the operation is not used in any way: argo-cd/util/webhook/webhook.go line 512 (commit f33005b).
I'm wondering if we should instead return a response as soon as the request is validated and do the actual processing in the background. I don't see why we need to make the webhook sender wait for the operation to complete if we don't need its result in the response.
I also imagined a similar thing and thought we'd ideally need some kind of queue for the webhook requests, but this would make the argocd-server somewhat stateful. We could optionally use the existing Redis as the queue and run background workers inside argocd-server, which avoids the statefulness, but I'm not sure we want to do this (and Redis would then be more than just a cache).
Thinking about it again, maybe we could just move all processing to the background and return 200 to webhook clients immediately. The webhook handler anyway just tells argocd-application-controller to refresh specific apps by adding the refresh annotation; the actual "queue" is the set of apps with their refresh annotation set. That said, we'd still want the would-be-background app refreshes to finish quickly (with concurrent processing).
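For context on the "refresh annotation as queue" point: `argocd.argoproj.io/refresh` is the annotation ArgoCD uses to request a refresh, so the handler's effect per matching app reduces to a small metadata patch like the one this sketch builds (the `refreshPatch` helper is hypothetical; the annotation key is real).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// refreshPatch builds the JSON merge patch that marks an Application
// for refresh; argocd-application-controller then picks it up
// asynchronously and removes the annotation when done.
func refreshPatch(kind string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				"argocd.argoproj.io/refresh": kind, // "normal" or "hard"
			},
		},
	})
}

func main() {
	p, _ := refreshPatch("normal")
	fmt.Println(string(p))
}
```

Since the patch is tiny and idempotent, losing one (e.g. on a crash mid-batch) is recoverable: the periodic reconciliation loop refreshes every app eventually anyway.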
I'm running into this too. I don't know much about ArgoCD internals, but I think refreshing the apps in a background thread and returning a response immediately would be a decent solution. Even if the processing fails to complete, it's not the end of the world: apps get refreshed periodically anyway, so "best effort" seems alright (to me, anyway).
We are running into this too; our webhook requests are timing out at ArgoCD. Has there been any update on the fix? I see PR #15326 has had no updates for 4 months.
Any update on this bug? Or any workaround, like increasing the timeout?
…rgoproj#14269) Signed-off-by: dhruvang1 <[email protected]>
I have raised a PR to do the processing in the background: #18173
Hello everyone,
Checklist:
argocd version
Describe the bug
We have over 1500 applications. In GitLab, a configured webhook has a maximum timeout of 10s. Running the webhook locally, it takes around 15s, meaning GitLab times out and disables the webhook.
Is there any way of speeding this up?
To Reproduce
Create over 1500 applications in ArgoCD, call the webhook URL from Postman with a valid push event, and time how long it takes.
Expected behavior
Returns a response within 10s.
Version
v2.6.11+697fd7c