Auto Sklearn never stop training model #1695

whoisltd · 2023-09-19T07:33:58Z

Describe the bug

I have a pod in k8s with 56 CPUs. When I run fit() for classification or regression, the task never finishes, even though the training time is set to time_left_for_this_task=60.
On my local machine with 8 CPUs everything works fine. But if I increase the time on the local machine to time_left_for_this_task=1500, it also does not stop training after 1500 seconds, just like the model on k8s. I don't know what causes this; maybe it is related to the machine configuration or something else.
If there is an error, I would expect some message to be returned.

Expected behavior

The model stops training once time_left_for_this_task has elapsed.

Actual behavior, stacktrace or logfile

The last two lines of AutoML(...).log are:

[DEBUG] [2023-09-18 17:15:44,242:Client-pynisher] Redirecting output of the function to files. Access them via the stdout and stderr attributes of the wrapped function.
[DEBUG] [2023-09-18 17:15:44,243:Client-pynisher] call function

Environment and installation:

Please give details about your installation:

OS Ubuntu 20.04
virtual environment
Python 3.8
Auto-sklearn 0.15.0


whoisltd · 2023-09-28T19:46:39Z

Is there any update on this problem? And what is the minimum configuration for running auto-sklearn?

00sapo · 2024-05-16T07:41:51Z

Hello, I have used auto-sklearn in several projects now, but never faced this issue... until today. I think the problem is that auto-sklearn doesn't really stop ongoing training for certain algorithms; it just doesn't start a new one once the time limit has passed. I guess the reason is that certain algorithms ignore some kill signals. I'm also on Linux.
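To illustrate the general mechanism guessed at above (a worker that ignores a polite stop request keeps running until it is force-killed), here is a minimal standard-library sketch. It is not a reproduction of auto-sklearn or pynisher internals: the child program and the helper name `stop_with_escalation` are made up for the demonstration, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical child program: ignore SIGTERM, report readiness, then
# pretend to be a long-running training job.
CHILD_SOURCE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_escalation(grace_seconds=1.0):
    """Start the stubborn child, send SIGTERM, then fall back to SIGKILL.

    Returns (survived_sigterm, returncode).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # block until the child has installed its handler
    proc.terminate()  # sends SIGTERM, which the child ignores
    try:
        proc.wait(timeout=grace_seconds)
        survived = False
    except subprocess.TimeoutExpired:
        survived = True
        proc.kill()  # sends SIGKILL, which cannot be caught or ignored
        proc.wait()
    proc.stdout.close()
    return survived, proc.returncode


if __name__ == "__main__":
    print(stop_with_escalation())
```

If a training process behaves like this child, only a signal that cannot be ignored, such as SIGKILL, will end it before it finishes on its own.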

00sapo · 2024-05-17T07:15:08Z

I used this function as a workaround. Instead of SIGSTOP, it uses SIGKILL, so any running process is killed; the fit of that model errors, but the overall run continues. It needs psutil, though.

def _monitor_children_processes(min_time_limit, max_time_limit):
    """
    Monitor the children processes of this process and kill them if they take
    too long. This spawns a new process which does nothing until `min_time_limit`
    is reached, then it starts waiting for the children processes of this process
    (the parent, not the monitor). If the children processes are still running
    after `max_time_limit`, it kills them with -9.
    """
    import time
    from multiprocessing import Process

    import psutil

    def monitor_children_processes(parent):
        pid = psutil.Process().pid
        start_time = time.time()
        while True:
            if time.time() - start_time < min_time_limit:
                time.sleep(60)
                continue
            children = parent.children()
            if len(children) > 1:
                for child in children:
                    # avoid killing this same process
                    if child.pid != pid:
                        try:
                            remaining_time = max_time_limit - (time.time() - start_time)
                            if remaining_time < 0:
                                # kill with -9
                                child.kill()
                            else:
                                child.wait(timeout=remaining_time)
                        except psutil.TimeoutExpired:
                            # kill with -9
                            child.kill()
                        except psutil.NoSuchProcess:
                            pass
            else:
                break

    # run the monitor in a new process
    monitor = Process(target=monitor_children_processes, args=(psutil.Process(),))
    return monitor

monitor = _monitor_children_processes(3500, 3600)
monitor.start()  # starts the monitor process
model.fit(X, y)  # starts the fit
monitor.join(3600)  # waits for the monitor to finish, but it should end even without this call
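A variant of the same idea that avoids the psutil dependency is to run the whole fit in a forked child that leads its own process group, and to SIGKILL that group once a hard deadline passes. The sketch below is only an illustration under assumptions: the names `run_with_hard_deadline` and `_run_in_own_group` are hypothetical, it is Unix-only (fork, `os.setsid`, `os.killpg`), and nothing computed in the child is returned to the parent, so a real fit function would have to save its model to disk itself.

```python
import multiprocessing as mp
import os
import signal
import time


def _run_in_own_group(func, args):
    # Become leader of a new session and process group, so that every
    # process started by `func` can be signalled through one group id.
    os.setsid()
    func(*args)


def run_with_hard_deadline(func, args=(), timeout=60.0):
    """Run func(*args) in a forked child; SIGKILL its group on timeout.

    Returns True if func finished in time, False if it had to be killed.
    """
    ctx = mp.get_context("fork")  # Unix only
    child = ctx.Process(target=_run_in_own_group, args=(func, args))
    child.start()
    child.join(timeout)
    if not child.is_alive():
        return True
    try:
        # After setsid() the child's process group id equals its pid.
        os.killpg(child.pid, signal.SIGKILL)
    except ProcessLookupError:
        # The group does not exist (yet, or any more): kill the child only.
        child.kill()
    child.join()
    return False


if __name__ == "__main__":
    # time.sleep stands in for a call such as model.fit(X, y).
    print(run_with_hard_deadline(time.sleep, (0.2,), timeout=5))
    print(run_with_hard_deadline(time.sleep, (30,), timeout=1))
```

Compared with the monitor above, this kills everything the fit started in one call, at the cost of losing the in-memory result of the fit.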

whoisltd changed the title ~~Auto Sklearn never done training task~~ Auto Sklearn never stop training model Oct 5, 2023

whoisltd mentioned this issue Oct 5, 2023

What's in store for Auto-Sklearn? -- From the Developers #1677
