Custom drain command? #737

Open
harrytang opened this issue Feb 28, 2023 · 12 comments
Labels
keep This won't be closed by the stale bot.

Comments

@harrytang

Hi there,

I have been using Kured and I find it to be a great tool for automating node reboots in a Kubernetes cluster.

I was wondering if there are any plans to add support for custom drain commands in Kured? It would be really helpful if we could specify our own custom drain command that Kured would execute before rebooting a node.

If this is not currently on the roadmap, I would love to know if it's something that the Kured development team would consider adding in the future.

Thank you for your time and for your work on Kured. I look forward to hearing back from you.

Best regards,
Harry

@jackfrancis
Collaborator

Hi Harry, we could be open to that. Could you provide an example of what you'd like to do in addition to (or instead of) the normal k8s "drain node" behavior?

@harrytang
Author

Hi,

We are currently using Longhorn Storage in our cluster, and while a node is being drained, we still need some components functioning so that the volumes can be properly detached. (see https://longhorn.io/docs/1.4.0/volumes-and-nodes/maintenance/#updating-the-node-os-or-container-runtime)

We normally use this drain command:

kubectl drain NODEX --delete-emptydir-data --ignore-daemonsets --pod-selector='app!=csi-attacher,app!=csi-provisioner,longhorn.io/component!=instance-manager'
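A minimal sketch of how the `--pod-selector` expression above decides which pods are evicted, assuming only the equality-based operators (`=`, `==`, `!=`) are needed. The pod names and the helpers `parse_selector` and `matches` are hypothetical and exist only for illustration; this is not how kubectl or Kured implement selectors.

```python
def parse_selector(selector: str) -> list[tuple[str, str, str]]:
    """Parse 'k!=v,k2=v2' into (key, operator, value) triples.

    Only '=', '==' and '!=' are handled; '==' is normalised to '='.
    """
    requirements = []
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        # Check '!=' and '==' before '=' so the longer operators win.
        for op in ("!=", "==", "="):
            if op in part:
                key, value = part.split(op, 1)
                normalised = "!=" if op == "!=" else "="
                requirements.append((key.strip(), normalised, value.strip()))
                break
        else:
            raise ValueError(f"unsupported selector requirement: {part!r}")
    return requirements


def matches(labels: dict[str, str], requirements: list[tuple[str, str, str]]) -> bool:
    """Return True if the labels satisfy every requirement.

    A pod that lacks the key entirely satisfies a '!=' requirement.
    """
    for key, op, value in requirements:
        if op == "=" and labels.get(key) != value:
            return False
        if op == "!=" and labels.get(key) == value:
            return False
    return True


SELECTOR = (
    "app!=csi-attacher,app!=csi-provisioner,"
    "longhorn.io/component!=instance-manager"
)
reqs = parse_selector(SELECTOR)

# Hypothetical pods on the node being drained.
pods = {
    "web-0": {"app": "web"},
    "csi-attacher-abc": {"app": "csi-attacher"},
    "instance-manager-xyz": {"longhorn.io/component": "instance-manager"},
}
to_evict = [name for name, labels in pods.items() if matches(labels, reqs)]
print(to_evict)
```

Only pods that match the selector are evicted, so the CSI and instance-manager pods stay up while the volumes detach.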

Hope you find this helpful.

Thanks!

@jackfrancis
Collaborator

That helps, thanks @harrytang!

I'll think about how we might put something like this together, stay tuned!

@github-actions

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@harrytang
Author

Github keep

@dholbach added the keep label ("This won't be closed by the stale bot.") and removed the no-issue-activity label on Apr 30, 2023
@ddsmith2-eprod

ddsmith2-eprod commented May 16, 2023

Are there any additional suggestions on how to deal with this issue? If you have a volume that is not replicated, the node fails to drain due to the pod disruption budget. This happens over time when a volume is not used much and has only one replica. I see that my predecessors used to stop dockerd and iscsid using Ansible to patch and reboot nodes.

I tried setting forceReboot=true, but it does not seem to help.

EDIT: I did find a setting in Longhorn for Allow Node Drain with the Last Healthy Replica. I'll test this and try to remember to report back here.

@docbobo
Contributor

docbobo commented Jun 27, 2023

I am in the same boat as everyone else regarding Longhorn draining. "Allow Node Drain with the Last Healthy Replica" is not solving this issue for me though.

@tylerauerbeck

Is anybody taking a look at this? I'm currently trying to find a way to create Alertmanager silences and then remove them when the node comes back up, so I'd want a pre-reboot command and a post-reboot command, just like there are for labels. I'd imagine it could tie into the same hooks that the labels use and take a similar approach to how users are able to specify their own reboot command.

I'd be happy to pull something together for this, just want to see if this approach is acceptable to folks.
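A rough sketch of what such a pre-reboot hook might send to Alertmanager, assuming alerts carry an `instance` label equal to the node name (your label may differ). The helper `build_silence`, the `createdBy` value, and the node name are hypothetical; only the payload field names follow the Alertmanager v2 silence format.

```python
from datetime import datetime, timedelta, timezone


def build_silence(node_name: str, duration_minutes: int, now: datetime) -> dict:
    """Build a silence payload covering the expected reboot window.

    Assumption: alerts for the node are labelled instance=<node_name>.
    """
    ends_at = now + timedelta(minutes=duration_minutes)
    return {
        "matchers": [
            {"name": "instance", "value": node_name, "isRegex": False},
        ],
        "startsAt": now.isoformat(),
        "endsAt": ends_at.isoformat(),
        "createdBy": "kured-pre-reboot-hook",  # hypothetical identifier
        "comment": f"Planned reboot of {node_name}",
    }


# Fixed timestamp so the example is reproducible.
now = datetime(2023, 7, 1, 12, 0, tzinfo=timezone.utc)
silence = build_silence("node-1", 30, now)
print(silence["startsAt"], "->", silence["endsAt"])
```

A pre-reboot hook could POST this payload to `/api/v2/silences` and store the returned silence ID, and a post-reboot hook could then delete that silence once the node is back.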

@ckotzbauer
Member

Hm, it depends a bit on how you want to implement/use pre- and post-reboot commands. Do you want to call a command on the host (with nsenter as for the sentinel- and reboot-commands) or should the command work inside the container?
We're currently working on restricting privileges of kured and finding a way to avoid commands on the host with nsenter. Otherwise, there are no plans to add commands/binaries to our own docker-image which can be used within the container.
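To illustrate the distinction, here is a small sketch of the two ways a hook command could be launched. It assumes a privileged pod that shares the host PID namespace, so that PID 1 is the host's init process. The helpers `on_host` and `in_container` and the script path are hypothetical, and this is not Kured's actual code.

```python
def on_host(command: list[str], host_pid: int = 1) -> list[str]:
    """Wrap a command so it runs in the host's mount namespace via nsenter.

    Assumption: the container can see the host's PID 1 and is privileged.
    """
    if not command:
        raise ValueError("command must not be empty")
    return ["/usr/bin/nsenter", f"--mount=/proc/{host_pid}/ns/mnt", "--", *command]


def in_container(command: list[str]) -> list[str]:
    """Run the command as-is; the binary must exist in the container image."""
    if not command:
        raise ValueError("command must not be empty")
    return list(command)


hook = ["/usr/local/bin/pre-reboot.sh", "--node", "node-1"]  # hypothetical script
print(on_host(hook))
print(in_container(hook))
```

The host variant needs elevated privileges but can use any binary installed on the node. The container variant needs no host access but depends on what the image ships.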

@ant31

ant31 commented Aug 15, 2023

We are looking for the same kind of feature (pre-reboot).

The use case is that we must sometimes switch over the leader database before rebooting a node. We could have this action triggered by Kured automatically before rebooting.

@ckotzbauer There are various ways to do that without changing the kured container image.
For example, it could be a pod template configuration: the pod or job is then executed (with no privileges), and the controller waits for it to terminate successfully.

--pre-boot='{containers: [{image: switchover-pg,
                           command: ["switch-db", "--node-name=$(NODE_ID)"]}]}'
--pre-boot='{containers: [{image: silence-alerts,
                           command: ["turn-off-alerts", "--node-name=$(NODE_ID)"]}]}'
--post-boot='{containers: [{image: silence-alerts,
                            command: ["turn-on-alerts", "--node-name=$(NODE_ID)"]}]}'

I'm sure there are other ways to define/execute those kinds of commands, it's just a quick example.
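As one possible shape for this, the sketch below turns a hook definition into an unprivileged Kubernetes Job manifest and checks whether the Job has finished. The `--pre-boot` flag does not exist in Kured today. The helpers `build_hook_job` and `job_succeeded`, the Job naming scheme, and the `$(NODE_ID)` substitution are assumptions made for illustration.

```python
def build_hook_job(phase: str, image: str, command: list[str], node_name: str) -> dict:
    """Build a batch/v1 Job manifest for a pre- or post-reboot hook.

    Assumption: '$(NODE_ID)' in the command is replaced with the node name.
    """
    args = [part.replace("$(NODE_ID)", node_name) for part in command]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"kured-{phase}-{node_name}"},  # hypothetical naming
        "spec": {
            "backoffLimit": 0,  # do not retry; a failed hook should block the reboot
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "hook",
                            "image": image,
                            "command": args,
                            "securityContext": {
                                "privileged": False,
                                "allowPrivilegeEscalation": False,
                            },
                        }
                    ],
                }
            },
        },
    }


def job_succeeded(status: dict) -> bool:
    """Return True once the Job status reports at least one succeeded pod."""
    return status.get("succeeded", 0) >= 1


job = build_hook_job(
    "pre-boot", "switchover-pg", ["switch-db", "--node-name=$(NODE_ID)"], "node-1"
)
print(job["metadata"]["name"])
print(job["spec"]["template"]["spec"]["containers"][0]["command"])
```

A controller could create this Job, poll its status until `job_succeeded` returns true, and only then proceed with the drain and reboot.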

IMO, the feature would be useful.
It could also somewhat reduce the need for you to implement too many integrations upstream.

@kingnarmer

Are there any plans to add this feature to the roadmap?

@ckotzbauer
Member

@kingnarmer
When there's a good concept and someone who needs this is able to contribute a PR, it can be implemented at any time.
