how to kill worker after the remote action execution? #815

Open
vors opened this issue Mar 29, 2024 · 3 comments · May be fixed by #825

Comments

@vors

vors commented Mar 29, 2024

Thank you for the awesome project!

Let's say that we have the following setup:

  • k8s deployment with fixed (for simplicity) number of pods
  • each pod is running nativelink worker

It would be very useful to allow the worker to exit the nativelink binary after a single execution. This way I can kill the pod and it would be re-created, providing a clean environment for the next action execution.

@allada
Collaborator

allada commented Mar 30, 2024

Currently this is not supported.

@zbirenbaum, could you make a PR that shuts down the worker after N jobs have been processed, with N configurable in the JSON config? This should allow @vors to set the value to 1, which solves this issue.
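A minimal sketch of how such a setting could behave, assuming a hypothetical JSON field named `max_action_executions` where `0` means "no limit". The field name and the helpers `load_limit` and `run_worker` are illustrative assumptions, not NativeLink's actual configuration or API.

```python
import json
from typing import Callable, Iterable


def load_limit(config_text: str) -> int:
    """Read the hypothetical `max_action_executions` field; 0 means no limit."""
    config = json.loads(config_text)
    limit = int(config.get("max_action_executions", 0))
    if limit < 0:
        raise ValueError("max_action_executions must be >= 0")
    return limit


def run_worker(jobs: Iterable[Callable[[], None]], limit: int) -> int:
    """Run jobs one at a time and stop once `limit` jobs have finished.

    Returns the number of jobs executed. A real worker would exit the
    process at this point so that the pod can be recreated.
    """
    executed = 0
    job_iter = iter(jobs)
    # Check the limit before pulling the next job, so no job is taken and dropped.
    while limit == 0 or executed < limit:
        job = next(job_iter, None)
        if job is None:
            break
        job()
        executed += 1
    return executed
```

With `{"max_action_executions": 1}`, this loop runs exactly one job and then returns, which is the "set this value to 1" case described above.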

In the long run we are likely going to split workers into two parts (worker & executor). The executor would be very lightweight, and its job would be just to do bookkeeping. The worker would be a single process running on the same machine (required), and its job would be to prepare the environment for the executor and then instruct the executor to do the actual work. With this split, we could write a worker implementation that talks to k8s/docker/containerd directly and launches nativelink inside a pod on the same machine.
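A rough sketch of that split, under stated assumptions: the thread does not define any interfaces, so the `Worker` and `Executor` classes below are hypothetical, and a temporary directory stands in for "prepare a clean environment" (a real implementation might launch a pod or container instead).

```python
import subprocess
import sys
import tempfile
from typing import List


class Executor:
    """Lightweight part: runs one command in the directory it is given."""

    def run(self, argv: List[str], workdir: str) -> subprocess.CompletedProcess:
        return subprocess.run(argv, cwd=workdir, capture_output=True, text=True)


class Worker:
    """Prepares a fresh environment, then instructs the executor to do the work."""

    def __init__(self, executor: Executor) -> None:
        self.executor = executor

    def execute(self, argv: List[str]) -> subprocess.CompletedProcess:
        # Each action gets its own throwaway directory, removed afterwards.
        with tempfile.TemporaryDirectory() as workdir:
            return self.executor.run(argv, workdir)


# Example: run a trivial command in a clean directory.
result = Worker(Executor()).execute([sys.executable, "-c", "print('hello')"])
```

Keeping the environment setup in `Worker` means the cleanup policy can change (directory, container, pod) without touching the code that runs the action.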

@vors
Author

vors commented Mar 31, 2024

I'd really appreciate it if you could implement this proposal. It is one thing that we need for our deployment.

@zbirenbaum
Contributor

> Currently this is not supported.
>
> @zbirenbaum, could you make a PR that shuts down the worker after N jobs have been processed, with N configurable in the JSON config? This should allow @vors to set the value to 1, which solves this issue.

Sure! I'll get started on this

zbirenbaum pushed a commit to zbirenbaum/nativelink that referenced this issue Apr 2, 2024
Allows users to set the maximum number of action executions a worker is
allowed to complete. Upon reaching this limit, the worker will no longer
accept new jobs, and will exit upon completing all assigned ones.

closes: TraceMachina#815
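The behaviour described in this commit message (stop accepting new jobs at the limit, exit once assigned jobs finish) can be sketched as a small counter. This is an illustration only: the `ExecutionLimiter` class is hypothetical, and it assumes the limit is counted when an action is accepted, which the commit message does not spell out.

```python
class ExecutionLimiter:
    """Tracks accepted and in-flight actions against a maximum (hypothetical)."""

    def __init__(self, max_executions: int) -> None:
        if max_executions < 1:
            raise ValueError("max_executions must be at least 1")
        self.max_executions = max_executions
        self.accepted = 0
        self.in_flight = 0

    def try_accept(self) -> bool:
        """Accept a new action unless the limit has already been reached."""
        if self.accepted >= self.max_executions:
            return False
        self.accepted += 1
        self.in_flight += 1
        return True

    def complete(self) -> None:
        """Record that one previously accepted action has finished."""
        if self.in_flight == 0:
            raise RuntimeError("no action in flight")
        self.in_flight -= 1

    def should_exit(self) -> bool:
        """True once the limit is reached and every accepted action is done."""
        return self.accepted >= self.max_executions and self.in_flight == 0
```

A worker loop would call `try_accept()` before taking a job, `complete()` when a job finishes, and exit the process when `should_exit()` returns true.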
@zbirenbaum zbirenbaum linked a pull request Apr 2, 2024 that will close this issue
zbirenbaum pushed a commit to zbirenbaum/nativelink that referenced this issue Apr 3, 2024
zbirenbaum pushed a commit to zbirenbaum/nativelink that referenced this issue Apr 11, 2024
3 participants