
update to run_sorter_jobs() and slurm #3105

Open · wants to merge 13 commits into main

Conversation

MarinManuel

I am trying to run spikeinterface sorters on an HPC cluster and needed to pass specific parameters to slurm/sbatch.

I added the option to pass arbitrary arguments to sbatch in the engine_kwargs.

Let me know if that causes any issues.
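For context, a hedged sketch of what such a call might look like. Only `tmp_script_folder` and `sbatch_executable_path` are named in this PR's discussion; the specific sbatch options shown ("partition", "time", etc.) are hypothetical examples.

```python
# Illustrative engine_kwargs for the slurm engine as described in this PR.
# "tmp_script_folder" and "sbatch_executable_path" are consumed by
# spikeinterface itself; every other key is forwarded to sbatch as a
# --<key> <value> pair. The sbatch options below are hypothetical.
engine_kwargs = {
    "tmp_script_folder": None,           # consumed internally, not sent to sbatch
    "sbatch_executable_path": "sbatch",  # consumed internally, not sent to sbatch
    "cpus-per-task": 4,                  # forwarded as --cpus-per-task 4
    "mem": "16G",                        # forwarded as --mem 16G
    "partition": "gpu",                  # forwarded as --partition gpu
    "time": "02:00:00",                  # forwarded as --time 02:00:00
}

# With spikeinterface installed and SLURM available, one would then call:
# run_sorter_jobs(job_list, engine="slurm", engine_kwargs=engine_kwargs)
```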

@alejoe91
Member

@MarinManuel thank you! This is great!

The SLURM deployment has been unexplored on our side, so we are very glad that it works for you.

Would you mind sharing the script that you are using? I'd love to add a How-To page on deploying multiple spike sorting jobs on SLURM.

Collaborator

@zm711 left a comment

@alejoe91 / @samuelgarcia know the launcher stuff better: is popping the best strategy, or could you just pull out the kwargs without changing the dict as you go?

Otherwise, I just added some fixes for our docstring style :)


return_output : bool, default False
In the case of engine="slurm", possible kwargs are:
- tmp_script_folder: str, default None
Collaborator

Suggested change
- tmp_script_folder: str, default None
- tmp_script_folder : str, default: None

return_output : bool, default False
In the case of engine="slurm", possible kwargs are:
- tmp_script_folder: str, default None
the folder in which the job scripts are created. Default: directory created by
Collaborator

Suggested change
the folder in which the job scripts are created. Default: directory created by
the folder in which the job scripts are created. If None, it will be the directory created by

- tmp_script_folder: str, default None
the folder in which the job scripts are created. Default: directory created by
the `tempfile` library
- sbatch_executable_path: str, default 'sbatch'
Collaborator

Suggested change
- sbatch_executable_path: str, default 'sbatch'
- sbatch_executable_path : str, default: 'sbatch'

- other kwargs are interpreted as arguments to sbatch and are translated into the corresponding --options passed to sbatch.
See the [documentation for `sbatch`](https://slurm.schedmd.com/sbatch.html) for the list of possible arguments.
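A minimal sketch of that translation (not the PR's exact code; it mirrors the loop quoted later in this review, where each key/value pair becomes a `--<key> <value>` argument):

```python
def build_sbatch_args(sbatch_kwargs):
    """Translate a dict of kwargs into sbatch command-line arguments."""
    args = []
    for key, value in sbatch_kwargs.items():
        args.append(f"--{key}")
        args.append(str(value))
    return args

# e.g. {"mem": "1G", "cpus-per-task": 1} -> ["--mem", "1G", "--cpus-per-task", "1"]
```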

return_output : bool, default False
Collaborator

Suggested change
return_output : bool, default False
return_output : bool, default: False

Author

Happy to follow whatever guidelines you recommend. The advantage of popping out the few arguments that we want to use for something else is that we don't have to specify which arguments to sbatch we accept. That way, if sbatch were to add a new argument, for instance, we would not need to update spikeinterface to match.
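The popping strategy under discussion can be sketched like this (the function name is hypothetical; copying first keeps the caller's dict intact):

```python
def split_engine_kwargs(engine_kwargs):
    """Separate spikeinterface-internal kwargs from sbatch pass-through kwargs.

    Hypothetical helper illustrating the pop-based strategy: anything not
    recognized stays in the dict and is forwarded to sbatch, so new sbatch
    options need no changes on the spikeinterface side.
    """
    kwargs = dict(engine_kwargs)  # copy so the caller's dict is untouched
    tmp_script_folder = kwargs.pop("tmp_script_folder", None)
    sbatch_executable_path = kwargs.pop("sbatch_executable_path", "sbatch")
    return tmp_script_folder, sbatch_executable_path, kwargs
```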

Collaborator

Since I don't use slurm I'm not sure. Sometimes there is a JSON dump of some params for provenance, so I would wait to see what Sam and Alessio say regarding that.

@alejoe91 alejoe91 added the sorters Related to sorters module label Jul 1, 2024
Collaborator

@JoeZiminski left a comment

This is very cool! It really opens up the SLURM options here. I have a couple of comments for now, and will try it out on our HPC ASAP!

)


_implemented_engine = list(_default_engine_kwargs.keys())


def run_sorter_jobs(job_list, engine="loop", engine_kwargs={}, return_output=False):
def run_sorter_jobs(job_list, engine="loop", engine_kwargs=None, return_output=False):
Collaborator

🙌

@@ -26,14 +26,14 @@
joblib=dict(n_jobs=-1, backend="loky"),
processpoolexecutor=dict(max_workers=2, mp_context=None),
dask=dict(client=None),
slurm=dict(tmp_script_folder=None, cpus_per_task=1, mem="1G"),
slurm={"tmp_script_folder": None, "sbatch_executable_path": "sbatch", "cpus-per-task": 1, "mem": "1G"},
Collaborator

I would lean towards having something like 'sbatch_kwargs' as a sub-dict, with all kwargs that are passed directly to sbatch held there. The reason is that later engine_kwargs is iterated over and all parameters are passed to sbatch. In the future, when editing, people may miss this and not pop new arguments before engine_kwargs is iterated over, accidentally passing unrelated arguments to sbatch. So it could make sense to have "sbatch_kwargs" as a sub-dict to make the separation very clear, then iterate only over engine_kwargs["sbatch_kwargs"].
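The proposed sub-dict layout could look roughly like this (a sketch of the suggestion, not the merged code):

```python
# Hypothetical engine_kwargs with sbatch options isolated in a sub-dict,
# so only "sbatch_kwargs" is ever iterated over when building the command
# line, and internal keys can never leak into the sbatch arguments.
engine_kwargs = {
    "tmp_script_folder": None,
    "sbatch_executable_path": "sbatch",
    "sbatch_kwargs": {"cpus-per-task": 1, "mem": "1G"},
}

sbatch_args = []
for key, value in engine_kwargs["sbatch_kwargs"].items():
    sbatch_args.extend([f"--{key}", str(value)])
# sbatch_args is now ["--cpus-per-task", "1", "--mem", "1G"]
```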

Author

This was what I had originally done, but then I thought it made the kwargs pretty cluttered and changed my mind. But I can see how that would help avoid mistakes in the future, I will change the code.

- tmp_script_folder: str, default None
the folder in which the job scripts are created. Default: directory created by
the `tempfile` library
- sbatch_executable_path: str, default 'sbatch'
Collaborator

This is cool; out of interest, what is the use case? Will sbatch alone not always work?

Author

Just in case sbatch is not in the PATH. Unlikely, but why not?

Collaborator

I can see why this would be useful, but in terms of codebase maintainability I think it would be better for the user to handle this on their end by adding sbatch to PATH, given that it is (I think?) a real edge case. Otherwise, if this is a general pattern to adopt, it will require adding such arguments everywhere (e.g. if a sorter uses MATLAB, it will have to expose a path to the MATLAB executable, and similarly for anything that calls a system executable).

Author

That's fair

Return a sortings or None.
This also overwrite kwargs in in run_sorter(with_sorting=True/False)
This also overwrite kwargs in run_sorter(with_sorting=True/False)
Collaborator

Suggested change
This also overwrite kwargs in run_sorter(with_sorting=True/False)
This also overwrites kwargs in run_sorter(with_sorting=True/False)

@@ -146,12 +155,18 @@ def run_sorter_jobs(job_list, engine="loop", engine_kwargs={}, return_output=Fal

elif engine == "slurm":
# generate python script for slurm
tmp_script_folder = engine_kwargs["tmp_script_folder"]
tmp_script_folder = engine_kwargs.pop("tmp_script_folder")
Collaborator

Here the popping is done so these arguments are not passed to sbatch. If using a sub-dict (comment above), I would revert this to just passing the variable, so the code-reader does not have to track the state of engine_kwargs while reading.


# for backward compatibility with previous version
if "cpus_per_task" in engine_kwargs:
warnings.warn("cpus_per_task is deprecated, use cpus-per-task instead", DeprecationWarning)
Collaborator

Nice, I guess this is to match the syntax of SLURM? 👍

if "cpus_per_task" in engine_kwargs:
warnings.warn("cpus_per_task is deprecated, use cpus-per-task instead", DeprecationWarning)
cpus_per_task = engine_kwargs.pop("cpus_per_task")
if "cpus-per-task" not in engine_kwargs:
Collaborator

I wonder if it would be better (though a bit more verbose) to raise an error here; if the user has accidentally passed two versions of the same kwarg, it would be better to let them know, I think.
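The stricter handling could look like this (a sketch only; the function name is hypothetical):

```python
import warnings

def normalize_cpus_kwarg(engine_kwargs):
    """Map the deprecated cpus_per_task spelling onto cpus-per-task,
    and refuse ambiguous input where both spellings are present."""
    kwargs = dict(engine_kwargs)
    if "cpus_per_task" in kwargs:
        if "cpus-per-task" in kwargs:
            raise ValueError(
                "Pass only one of 'cpus_per_task' (deprecated) and 'cpus-per-task'."
            )
        warnings.warn("cpus_per_task is deprecated, use cpus-per-task instead",
                      DeprecationWarning)
        kwargs["cpus-per-task"] = kwargs.pop("cpus_per_task")
    return kwargs
```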

progr.append(f"--{k}")
progr.append(f"{v}")
progr.append(str(script_name.absolute()))
p = subprocess.run(progr, capture_output=True, text=True)
Collaborator

will subprocess.run() mean this is no longer run asynchronously vs. subprocess.Popen()?

Author

Sbatch returns immediately after submitting the job to the queue, so it's still going to be asynchronous. The run() syntax is just a simpler way to interact with the subprocess module, and the one I have more experience with; that's all.
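The behaviour is easy to demonstrate with a stand-in for sbatch (here the Python interpreter printing a fake submission message, so the example is self-contained):

```python
import subprocess
import sys

# subprocess.run blocks only until the launched program exits. Since
# sbatch exits as soon as the job is queued, submission stays quick even
# though run() waits for the process. The interpreter below stands in
# for sbatch and prints a fake submission message.
p = subprocess.run(
    [sys.executable, "-c", "print('Submitted batch job 12345')"],
    capture_output=True,
    text=True,
)
job_line = p.stdout.strip()  # "Submitted batch job 12345"
```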

Collaborator

Okay nice, of course sbatch will just return. Agreed, run is the preferred way to use subprocess. Nice!

Labels
sorters Related to sorters module

4 participants