Pass SIGTERM to training script to stop training #125
bstriner added commits to bstriner/sagemaker-training-toolkit that referenced this issue (one on May 19, 2022 and two on May 20, 2022), each titled "feature: Pass SIGTERM to training subprocess fix: aws#125".
Describe the bug
SIGTERM from StopTrainingJob doesn't appear to be passed to the training subprocess.
To reproduce
Add a SIGTERM handler to a training script, start a training job, then click "Stop". The signal handler will not fire.
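For reference, a minimal training script of the kind described might look like this sketch. It is illustrative only: `handle_sigterm`, `train`, and `stop_requested` are hypothetical names, and the loop body stands in for real training work. The handler only sets a flag, and the loop checks that flag between steps so it can stop cleanly.

```python
import signal
import threading

# Set by the SIGTERM handler; the training loop polls it between steps.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler small: record the request and let the loop react.
    print(f"Received signal {signum}, stopping after the current step")
    stop_requested.set()


def train(num_steps):
    """Hypothetical training loop; returns the number of steps completed."""
    completed = 0
    for _ in range(num_steps):
        if stop_requested.is_set():
            break
        # ... one optimization step would run here ...
        completed += 1
    # ... a real script would typically save a checkpoint here ...
    return completed


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    train(num_steps=100)
```

With the bug described above, `handle_sigterm` is never called when the job is stopped, so `stop_requested` stays unset and no checkpoint logic runs.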
Expected behavior
The signal handler should fire when StopTrainingJob is called.
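One way a launcher process could meet this expectation is to install its own SIGTERM handler that relays the signal to the child it started. This is a minimal sketch assuming the launcher starts the training script with `subprocess.Popen`; `run_with_sigterm_forwarding` is a hypothetical name, and this is not the toolkit's actual implementation or the change made in the referenced commits.

```python
import signal
import subprocess


def run_with_sigterm_forwarding(cmd):
    """Hypothetical launcher: run `cmd` and relay SIGTERM to it.

    Returns the child's exit code (negative if it was killed by a signal).
    """
    proc = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Relay the signal so the child's own SIGTERM handler gets to run.
        proc.send_signal(signum)

    # Must be called from the main thread; returns the handler it replaces.
    previous = signal.signal(signal.SIGTERM, forward)
    try:
        # Blocks until the child exits; a SIGTERM arriving meanwhile runs
        # `forward`, after which the wait resumes.
        return proc.wait()
    finally:
        # Restore the earlier handler (None means it was not set from
        # Python and cannot be restored this way).
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
```

A call such as `run_with_sigterm_forwarding([sys.executable, "train.py"])` (with `train.py` a hypothetical script) would then let the script's handler run on stop. Because `Popen.wait()` is retried after a Python signal handler returns (PEP 475), the launcher still reports the child's real exit code once the child's handler finishes.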