How to process predictor error response? #684

yinsenyan · 2023-05-11T12:13:44Z

What would you like to be added:

Currently, if a predictor HTTP request fails, the scheduler returns an error and cancels scheduling, as here:
https://github.com/clusternet/clusternet/blob/main/pkg/scheduler/framework/plugins/predictor/predictor.go#L128
A predictor failure in a single cluster therefore makes scheduling fail for the whole subscription, which is inappropriate.

Why is this needed:

We need a better way to handle this. Two options:

1. If the predict request fails, return 0 replicas.
2. Drop the cluster from the available list when its predictor is unhealthy.

With method 1, a cluster whose replica count is 0 would still remain among the binding clusters and could not be removed. It would either have to be removed during the merge process, or handled in some other way.

With method 2, the cluster is dropped from the available cluster list as soon as the prediction for one feed fails, even if the subscription has many feeds. That is a drastic approach when there are only a few child clusters.
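To make method 1 concrete, here is a minimal sketch, not the actual clusternet code. `predictFunc`, `predictAll`, and `errUnreachable` are hypothetical names invented for illustration; the real predictor plugin has a different shape. The sketch only shows the control flow: a failed predict call is recorded as 0 replicas instead of aborting the whole loop.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a hypothetical error used to simulate a failed predictor.
var errUnreachable = errors.New("predictor unreachable")

// predictFunc is a hypothetical stand-in for one cluster's predictor call.
type predictFunc func(cluster string) (int32, error)

// predictAll queries every cluster. A failed predictor yields 0 replicas
// (method 1) instead of cancelling scheduling for all clusters.
func predictAll(clusters []string, predict predictFunc) map[string]int32 {
	replicas := make(map[string]int32, len(clusters))
	for _, c := range clusters {
		n, err := predict(c)
		if err != nil {
			// Method 1: treat the failure as "no capacity" and keep going.
			n = 0
		}
		replicas[c] = n
	}
	return replicas
}

func main() {
	// Simulated predictor: cluster-b is unreachable, the others report 5.
	predict := func(cluster string) (int32, error) {
		if cluster == "cluster-b" {
			return 0, errUnreachable
		}
		return 5, nil
	}
	fmt.Println(predictAll([]string{"cluster-a", "cluster-b"}, predict))
}
```

Note that in this sketch the failed cluster still appears in the result with 0 replicas, which is exactly the leftover-binding concern raised for method 1.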

yinsenyan · 2023-05-12T01:44:35Z

Add a post-predict extension point to handle clusters whose predictor is unhealthy.

yinsenyan · 2023-05-12T01:44:53Z

@dixudx @Garrybest

dixudx · 2023-05-15T09:09:59Z

> If the predict request fails, return 0 replicas.
>
> With method 1, a cluster whose replica count is 0 would still remain among the binding clusters and could not be removed. It would either have to be removed during the merge process, or handled in some other way.

I'd prefer method 1, returning 0 replicas, which fits the current scheduling framework and implementations.

By adding a new flag to the `ClusterScore` struct to indicate such unhealthy-predictor cases, all clusters with 0 replicas could easily be pruned in the function `RunPredictPlugins`.
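A rough sketch of the flag-and-prune idea, under stated assumptions: `clusterResult` is a simplified, hypothetical stand-in for `ClusterScore` (the real struct has different fields), and `PredictorUnhealthy` and `pruneUnhealthy` are names invented here. The sketch also assumes that only flagged clusters are pruned, so a healthy cluster that legitimately reports 0 replicas is kept; the comment above does not settle that detail.

```go
package main

import "fmt"

// clusterResult is a hypothetical, simplified stand-in for the scheduler's
// ClusterScore struct; it is not the real type.
type clusterResult struct {
	Name               string
	Replicas           int32
	PredictorUnhealthy bool // hypothetical flag marking a failed predictor
}

// pruneUnhealthy returns only the clusters whose predictor responded.
// Clusters flagged as unhealthy (and therefore carrying 0 replicas) are
// dropped; the input slice is left unmodified.
func pruneUnhealthy(results []clusterResult) []clusterResult {
	kept := make([]clusterResult, 0, len(results))
	for _, r := range results {
		if r.PredictorUnhealthy {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	results := []clusterResult{
		{Name: "cluster-a", Replicas: 5},
		{Name: "cluster-b", Replicas: 0, PredictorUnhealthy: true},
		{Name: "cluster-c", Replicas: 0}, // healthy, but no capacity
	}
	for _, r := range pruneUnhealthy(results) {
		fmt.Println(r.Name, r.Replicas)
	}
}
```

Carrying an explicit flag, rather than pruning on `Replicas == 0` alone, lets the scheduler tell "predictor failed" apart from "predictor answered with no capacity".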

> Add a post-predict extension point to handle clusters whose predictor is unhealthy.

If so, it would be better to taint the cluster.

yinsenyan added the kind/feature label (New feature or request) on May 11, 2023
