
Ability to suspend work #4688

Open
a7i opened this issue Mar 8, 2024 · 17 comments · May be fixed by #4838
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Comments

@a7i
Contributor

a7i commented Mar 8, 2024

What would you like to be added:
Ability to suspend work to ensure that changes are not being reconciled.

Why is this needed:

User Story 1: As a cluster admin, I may get stuck in a reconciliation loop where the karmada controller-manager updates a resource but, due to some unknown reason (perhaps a controller on the member cluster), the resource is reverted. The operator can decide to suspend the Work while debugging the issue.

User Story 2: We're using Karmada to migrate workloads from a blue cluster to a green cluster. Once we move a workload, we want to suspend any updates from blue to green until we cut over to green.

Persona

Cluster Admin who is on-call and has permission to modify karmada resources

Implementation Details

There are three ways that we can go about this:

1. Expose suspend on Work CRD

This is the simplest approach but requires the Cluster Admin to identify Work in the karmada-es-${cluster} namespace before patching them. Once the field is set, the controller will get out early. In the code here https://github.com/karmada-io/karmada/blob/master/pkg/controllers/status/work_status_controller.go#L93-L96, we would add the following:

	// Skip reconciliation while the Work is suspended.
	if work.Spec.Suspend {
		return controllerruntime.Result{}, nil
	}

Effort: Low
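
For illustration, a minimal sketch of what a suspended Work could look like once such a field exists (the field name and its placement under spec are assumptions, and the object name is illustrative):

apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: nginx-687f7fb96f
  namespace: karmada-es-member1
spec:
  suspend: true    # hypothetical field: the controller returns early while this is set
  workload:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        # ...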

2. Expose suspend on ResourceBinding CRD

This requires a suspend field on ResourceBinding, which will also be propagated to Work. The Cluster Admin will have to identify the ResourceBinding in the workload namespace before patching it. The name of the ResourceBinding is more predictable than that of the Work.

Effort: Medium
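
A similar sketch for the ResourceBinding approach (again, suspend is a hypothetical field; the name follows the predictable <resource-name>-<kind> pattern and lives in the workload namespace):

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  suspend: true    # hypothetical field: propagated down to the corresponding Work objects
  resource:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
    namespace: default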

3. Expose suspend on PropagationPolicy and ClusterPropagationPolicy CRD

The Cluster Admin will have to know the PP or CPP with the highest priority and decide to suspend them. Changes to suspend will have to be propagated to ResourceBinding and Work.

Effort: X-Large
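
And a sketch for the policy-level approach, reusing the PropagationPolicy shape from the examples further down (spec.suspend is hypothetical here and would need to be propagated to ResourceBinding and Work):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  suspend: true    # hypothetical field
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - member1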

@a7i a7i added the kind/feature label Mar 8, 2024
@a7i
Contributor Author

a7i commented Mar 8, 2024

Created a PR to get feedback: #4689

@chaunceyjiang
Member

Related to #1859

@RainbowMango
Member

User Story 1: As a cluster admin, I may get stuck in a reconciliation loop where the karmada controller-manager updates a resource but, due to some unknown reason (perhaps a controller on the member cluster), the resource is reverted. The operator can decide to suspend the Work while debugging the issue.

+1 on this user story.
This feature would be helpful for debugging.

User Story 2: We're using Karmada to migrate workloads from a blue cluster to a green cluster. Once we move a workload, we want to suspend any updates from blue to green until we cut over to green.

May I ask for more detailed info about how you do the migration? I don't understand how the workload can be synced from blue to green; my understanding is that both the blue and green clusters take the karmada-apiserver as the source.

Just a guess, the process of migrating a workload would be something like:

Step 1: Create a PropagationPolicy to take over an application from blue cluster, like

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - blue

Step 2: Add green cluster to the same PropagationPolicy, so that the application can be synced:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - blue
        - green

Step 3: Test against the green cluster.

Step 4: Remove the application from the blue cluster by removing blue from the PropagationPolicy:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - green

@RainbowMango
Member

  1. Expose suspend on Work CRD
    This is the simplest approach but requires the Cluster Admin to identify Work in the karmada-es-${cluster} namespace before patching them. Once the field is set, the controller will get out early. In the code here https://github.com/karmada-io/karmada/blob/master/pkg/controllers/status/work_status_controller.go#L93-L96, we would add the following:

I totally agree that Work deserves a field to indicate whether the propagation should be suspended.

Actually, we already have an annotation (propagation.karmada.io/instruction: suppressed) for the same purpose:

// PropagationInstruction is used to mark a resource(like Work) propagation instruction.
// Valid values includes:
// - suppressed: indicates that the resource should not be propagated.
//
// Note: This instruction is intended to set on Work objects to indicate the Work should be ignored by
// execution controller. The instruction maybe deprecated once we extend the Work API and no other scenario want this.
PropagationInstruction = "propagation.karmada.io/instruction"

But this annotation is used internally by Karmada itself, in the scenario where a Work is not managed by a ResourceBinding.
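
For reference, setting that annotation on a Work looks roughly like this (object name is illustrative):

apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: example-work
  namespace: karmada-es-member1
  annotations:
    propagation.karmada.io/instruction: suppressed    # execution controller ignores this Work
spec:
  workload:
    manifests:
      # ...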

But in your case, even after Work introduces .spec.suspend, your modification may be overridden by the ResourceBinding controller; see
https://github.com/karmada-io/karmada/blob/master/pkg/util/helper/work.go#L81.

But don't worry, there are also two approaches to eliminate the risk:

  1. Expose suspend on ResourceBinding CRD
  2. Mutate each field when dealing with the conflict, as we have done at the detector.

I tend toward introducing the functionality on ResourceBinding or PropagationPolicy; looking forward to hearing your use case.

@XiShanYongYe-Chang
Member

Hi @a7i, we are planning to push this feature forward soon. Can you help provide some feedback on @RainbowMango's comments above?

@a7i
Contributor Author

a7i commented Apr 16, 2024

May I ask for more detailed info about how you do the migration? I don't understand how can the workload be synced from blue to green, I understand both the blue and green clusters are taking karmada-apiserver as the source.

While we have long-term plans to use the karmada-apiserver for handling our multi-cluster / blue-green use cases, we currently reuse our kube-apiserver (on the blue cluster). We essentially bootstrap the karmada controller-manager, webhook, scheduler, and aggregated-api onto an existing cluster (which reuses the kube-apiserver). We then use policies to migrate workloads gradually to another cluster. I realize that the majority of karmada use cases do not follow this pattern.

Actually we already have an annotation(propagation.karmada.io/instruction:suppressed) for the same purpose:

Ideally we would deprecate this annotation in favor of the same .spec.suspend, or consider using this annotation for consistency.

Mutate each field when dealing with the conflict, as we have done at the detector.
I tend toward introducing the functionality on ResourceBinding or PropagationPolicy; looking forward to hearing your use case.

I agree with your recommendation. Let me pick this back up and have a PR ready this week.

@a7i
Contributor Author

a7i commented Apr 17, 2024

@XiShanYongYe-Chang I have updated the PR to reflect the requested changes. I look forward to your feedback

@a7i a7i linked a pull request Apr 17, 2024 that will close this issue
@XiShanYongYe-Chang
Member

We then use policies to migrate workloads gradually to another cluster.

Can you elaborate on this step a little bit?

@a7i
Contributor Author

a7i commented Apr 18, 2024

Can you elaborate on this step a little bit?

Given that we have two clusters:

  1. cluster-blue: this is where the original workloads are (and karmada controller-manager also runs here in this setup)
  2. cluster-green: this is the target cluster we're moving workloads to

Let's take a simple workload (Deployment with 3 replicas and Service) as an example.

  1. Create a MultiClusterService
  2. Create an OverridePolicy with replicas as 0 matching cluster-green (a sketch of this policy follows the list)
  3. Create a PropagationPolicy (with cluster-green in the placement.clusterAffinity.clusterNames[0])
    -- observe that it created a Deployment in cluster-green with zero replicas
    -- observe that the original Deployment in cluster-blue is still 3 replicas
  4. Update OverridePolicy plaintext to /spec/replicas 1
    -- we observe that cluster-blue still has a deployment with 3 replicas (no change)
    -- we observe that cluster-green now has a deployment with 1 replica
  5. Decrement replicas in cluster-blue to 2
    -- we observe that cluster-blue now has a deployment with 2 replicas
    -- we observe that cluster-green still has a deployment with 1 replica (no change)
  6. Update OverridePolicy to /spec/replicas 2
    -- we observe that cluster-blue still has a deployment with 2 replicas (no change)
    -- we observe that cluster-green now has a deployment with 2 replicas
  7. Decrement replicas in cluster-blue to 1
    -- we observe that cluster-blue now has a deployment with 1 replica
    -- we observe that cluster-green still has a deployment with 2 replicas (no change)
  8. Update OverridePolicy to /spec/replicas 3
    -- we observe that cluster-blue still has a deployment with 1 replica (no change)
    -- we observe that cluster-green now has a deployment with 3 replicas
  9. Decrement replicas in cluster-blue to 0
    -- we observe that cluster-blue now has a deployment with 0 replicas
    -- we observe that cluster-green still has a deployment with 3 replicas (no change)
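
A rough sketch of the OverridePolicy used in steps 2, 4, 6, and 8 (names are illustrative; the plaintext overrider with path/operator/value follows the existing OverridePolicy API):

apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-green-replicas
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster-green
      overriders:
        plaintext:
          - path: /spec/replicas
            operator: replace
            value: 1    # bumped step by step: 0 -> 1 -> 2 -> 3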

@XiShanYongYe-Chang
Member

Thanks for the detailed explanation.

Do you need to pause the Work at each interval between steps 3 and 9? If so, what condition is the paused Work waiting for?

@a7i
Contributor Author

a7i commented Apr 19, 2024

No, given that we still rely on karmada-controller-manager for handling propagation syncs. After step 9, however, it needs to be suspended.

I think this feature request is generic enough to not have to make our use-case the only use-case. I believe that "User Story 1" from the Issue description is a valid use-case and as a cluster-admin, I want to have this capability at my disposal.

@XiShanYongYe-Chang
Member

Thanks @a7i.

I wonder if Karmada can do more to provide users with out-of-the-box capabilities, such as canary release, in addition to providing APIs for suspending work propagation (which can be considered an atomic capability).

I may be wrong, please correct me if so.

@XiShanYongYe-Chang
Member

Hi @a7i, I would like to know: do you use argo-cd for canary releases? I was wondering if we could build some combination with argo-cd by suspending Work.

@Monokaix

Hi, we also have the need to suspend the rb, mainly to do a global sorting based on priority and to perform capacity admission on the queue associated with the rb.
As we discussed in issue #4937 (comment), I think our requirements are different: I need to pause during the scheduling stage, while your requirement is to suspend during dispatch. Therefore, to avoid conflicts, separate fields should be set for scheduling suspend and dispatch suspend. In our case, we only need to change one field of the rb. If you want to suspend the Work when dispatching, setting suspend on the PP may be a better way; or if you have other ideas, we can communicate together :)
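
To illustrate the distinction, one possible shape (the field names below are purely hypothetical, not an agreed API) would be separate knobs on the ResourceBinding spec:

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  suspension:
    scheduling: true     # hypothetical: hold the binding before the scheduler assigns clusters
    dispatching: false   # hypothetical: hold syncing of already-scheduled Work to member clusters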

@tpiperatgod

Hi, we also have the need to suspend the rb, mainly to do a global sorting based on priority and to perform capacity admission on the queue associated with the rb. As we discussed in issue #4937 (comment), I think our requirements are different: I need to pause during the scheduling stage, while your requirement is to suspend during dispatch. Therefore, to avoid conflicts, separate fields should be set for scheduling suspend and dispatch suspend. In our case, we only need to change one field of the rb. If you want to suspend the Work when dispatching, setting suspend on the PP may be a better way; or if you have other ideas, we can communicate together :)

I raised a similar need in issue #4559; getting the scheduling results of the RB helps increase flexibility in scheduling.

@XiShanYongYe-Chang
Member

I raised a similar need in issue #4559; getting the scheduling results of the RB helps increase flexibility in scheduling.

Hi @tpiperatgod, from issue #4559, I think what you want to pause is the propagation of the Work, which is the stage where you can get the rb scheduling results. But for the rb scheduling pause, there is no scheduling result yet. What would you do with this?

@tpiperatgod

Yes, that's what I meant; there may have been some ambiguity in how I presented it.
The process from PP to RB to Work is continuous, and I was hoping to provide a way to suspend the process in between to increase the granularity of control.
Currently, I prefer to use a spec.suspend field on the PP and pass it through to RB and Work.
