Auto Sklearn never stop training model #1695
Comments
Is there any update on this problem? And what is the minimum configuration needed to run auto-sklearn?
Hello, I have used auto-sklearn in several projects now and never faced this issue... until today. I think the problem is that auto-sklearn doesn't actually stop ongoing training for certain algorithms; it just doesn't start a new one once the time limit has passed. My guess is that certain algorithms ignore some kill signals. I'm also on Linux.
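To illustrate the guess above, here is a minimal sketch (not auto-sklearn code) of a child process that ignores a catchable signal but cannot ignore SIGKILL. It assumes Linux and the `fork` start method. `stubborn_worker` and `stop_with` are hypothetical names, and SIGTERM is only a stand-in, since which signal auto-sklearn actually sends is not established in this thread.

```python
import multiprocessing as mp
import os
import signal
import time


def stubborn_worker(ready):
    # Hypothetical stand-in for a training process that ignores SIGTERM.
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()  # tell the parent that the handler is installed
    while True:
        time.sleep(0.1)


def stop_with(sig, grace=1.0):
    """Start a stubborn worker, send `sig`, and report whether it exited."""
    ctx = mp.get_context("fork")  # assumption: Linux, fork start method
    ready = ctx.Event()
    proc = ctx.Process(target=stubborn_worker, args=(ready,))
    proc.start()
    ready.wait(5)  # wait until the child ignores SIGTERM
    os.kill(proc.pid, sig)
    proc.join(grace)
    exited = not proc.is_alive()
    if not exited:
        proc.kill()  # clean up with SIGKILL, which cannot be ignored
        proc.join()
    return exited


if __name__ == "__main__":
    print("SIGTERM stopped it:", stop_with(signal.SIGTERM))
    print("SIGKILL stopped it:", stop_with(signal.SIGKILL))
```

If the real cause is similar, only an uncatchable signal such as SIGKILL would reliably end the stuck training process.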
I used this function as a work-around. Instead of using SIGSTOP, it uses SIGKILL, so any process that is still running is killed; the affected fit errors out, but the overall run continues. It needs the `psutil` package, which the code imports.

```python
import time
from multiprocessing import Process

import psutil


def _monitor_children_processes(min_time_limit, max_time_limit):
    """
    Monitor the children processes of this process and kill them if they take
    too long. This spawns a new process which does nothing until `min_time_limit`
    is reached, then it starts waiting for the children processes of this process
    (the parent, not the monitor). If the children processes are still running
    after `max_time_limit`, it kills them with -9.
    """

    def monitor_children_processes(parent):
        pid = psutil.Process().pid  # pid of the monitor process itself
        start_time = time.time()
        while True:
            if time.time() - start_time < min_time_limit:
                time.sleep(60)
                continue
            children = parent.children()
            # the monitor is itself a child, so more than one means work is left
            if len(children) > 1:
                for child in children:
                    # avoid killing this same process
                    if child.pid != pid:
                        try:
                            remaining_time = max_time_limit - (time.time() - start_time)
                            if remaining_time < 0:
                                # kill with -9
                                child.kill()
                            else:
                                child.wait(timeout=remaining_time)
                        except psutil.TimeoutExpired:
                            # kill with -9
                            child.kill()
                        except psutil.NoSuchProcess:
                            pass
            else:
                break

    # run the monitor in a new process (a nested target requires the fork start method)
    monitor = Process(target=monitor_children_processes, args=(psutil.Process(),))
    return monitor


monitor = _monitor_children_processes(3500, 3600)
monitor.start()  # starts the monitor process
model.fit(X, y)  # starts the fit
monitor.join(3600)  # waits for the monitor to finish, but it should end even without this call
```
Describe the bug
I have a pod in k8s with 56 CPUs. When I call fit() on a classification or regression model, the task never finishes, even though the training time is set to `time_left_for_this_task=60`. On my local machine with 8 CPUs, the same setting works fine. But if I increase the time on the local machine to `time_left_for_this_task=1500`, it also fails to stop training after 1500 seconds, just like the model on k8s. I don't know what causes this; it may be the machine configuration or something else. If an error occurs, I would hope to get some message back.
Expected behavior
The model stops training once `time_left_for_this_task` has elapsed.
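As a way to at least get a message when the budget is exceeded, the following sketch wraps any blocking call with a timer that fires once the budget has passed. It only reports the overrun and does not stop the call. `run_with_overrun_flag` and `on_overrun` are hypothetical names that are not part of auto-sklearn, and the `time.sleep` call stands in for `model.fit(X, y)`.

```python
import threading
import time


def run_with_overrun_flag(fn, budget_seconds, on_overrun=None):
    """Run fn() and report whether it was still running after the budget.

    Returns (elapsed_seconds, overran). This does not interrupt fn().
    """
    overran = threading.Event()

    def _fire():
        overran.set()
        if on_overrun is not None:
            on_overrun()

    timer = threading.Timer(budget_seconds, _fire)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    start = time.monotonic()
    try:
        fn()
    finally:
        timer.cancel()  # no effect if the timer has already fired
    return time.monotonic() - start, overran.is_set()


if __name__ == "__main__":
    # Hypothetical stand-in for model.fit(X, y); replace with the real call.
    elapsed, overran = run_with_overrun_flag(
        lambda: time.sleep(0.3),
        budget_seconds=0.1,
        on_overrun=lambda: print("fit is still running past its budget"),
    )
    print(f"elapsed={elapsed:.1f}s overran={overran}")
```

In a real run the budget would be set somewhat above `time_left_for_this_task`, and the callback could log the list of child processes that are still alive.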
Actual behavior, stacktrace or logfile
In AutoML(...).log, the last two lines show:
Environment and installation:
Please give details about your installation: