Autoscaling with multiple metrics does not work #3638
Comments
@shazinahmed Scaling based on multiple metrics is not supported. Can you elaborate on how you want to scale with both of these metrics?
@yuzisun Sorry, I missed this one. I want to have an HPA created with two triggers, one for CPU and one for memory, as we can with a normal Kubernetes deployment. This would let us scale on either the CPU or the memory trigger, depending on which resource is overutilized.
Based on options 1 and 2, it seems that the annotations are not used when the metrics are created; it defaults to CPU and 80% if ... I didn't find a doc link for HPA either, so we might be missing this part. On the other hand, it seems the k8s API supports it. Maybe we could evaluate bringing this functionality to KServe when HPA is enabled. wdyt?
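For illustration, a plain Kubernetes HPA with both a CPU and a memory trigger could look roughly like the sketch below. It uses the standard autoscaling/v2 API. The resource names, replica bounds, and the 70% memory target are hypothetical values chosen for the example, not taken from this issue.

```yaml
# Sketch of a two-metric HPA; names and thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-predictor-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-predictor          # hypothetical target deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    # CPU trigger
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    # Memory trigger
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70     # assumed value
```

When several metrics are listed, the HPA controller computes a desired replica count for each one and scales to the largest, so whichever resource is under more pressure drives the scale-up.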
/kind bug
What steps did you take and what happened:
Tried the following:
Set predictor.scaleMetric to memory, and a memory-based HPA was created. Yaay!
In scenarios 1 and 2, HPAs are created as expected if the metric is set as cpu.
Now, I want my HPA to be controlled by both memory and CPU. I tried setting predictor.scaleMetric to memory with a corresponding scaleTarget. I also set CPU thresholds using serving.kserve.io/metric: cpu. But only predictor.scaleMetric is respected.
What did you expect to happen:
I want HPA to have both memory and CPU based triggers.
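The combination described in the steps above might look roughly like the following sketch. It is not the manifest from this report. Only scaleMetric, scaleTarget, and the serving.kserve.io/metric annotation come from the description; the resource name, model format, storageUri, threshold values, and the other two annotations are assumptions added to make the example complete.

```yaml
# Hypothetical InferenceService mixing the annotation and the spec fields.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-isvc                                    # hypothetical name
  annotations:
    serving.kserve.io/autoscalerClass: hpa              # assumed: HPA autoscaler in use
    serving.kserve.io/metric: cpu                       # CPU trigger via annotation
    serving.kserve.io/targetUtilizationPercentage: "80" # assumed threshold
spec:
  predictor:
    scaleMetric: memory                                 # memory trigger via spec field
    scaleTarget: 70                                     # assumed threshold
    model:
      modelFormat:
        name: sklearn                                   # placeholder model format
      storageUri: gs://example-bucket/model             # placeholder URI
```

With a configuration of this shape, the reported behavior is that the generated HPA contains only the memory metric from predictor.scaleMetric.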
What's the InferenceService yaml:
Anything else you would like to add:
The HPA YAML inferenceservice generates
Environment:
OS (e.g. from /etc/os-release): Amazon Linux 2