Add support for OAR Scheduler #1744
Hi, thanks for contributing this. The steps to follow are:
If all works well, we can add an entry in the README that points to your plugin.
Hello, thanks for your review and your proposal about the plugin. Here is the repository: https://github.com/ychiat35/submitit_oar. I will try to add some CI/CD actions for tests and package releases. About this point:
Have you thought about adding some CI tests for OAR (and Slurm), similar to what is done for Slurm and SGE clusters in the dask-jobqueue repository: https://github.com/dask/dask-jobqueue/blob/main/ci/slurm/docker-compose.yml ? It may be a good way to test real jobs launched on OAR/Slurm clusters.
Hello, we'd like to inform you that we have integrated the submitit_oar plugin into the Grid5000 repositories, at this link: Grid5000/submitit_oar. We have also released a new version of the plugin on PyPI, accessible here: submitit_oar 1.1.1. The integration went smoothly, and the plugin fits Submitit's plugin system well. To finalize the pull request, we'd like to confirm that you are still fine with us submitting a PR to update the README to mention our plugin. Thanks a lot for your feedback.
The OAR scheduler is widely used in France, including on mesocentre supercomputers (e.g., GRICAD), INRIA supercomputers, the Grid5000 testbed and other platforms.
This PR adds support for the OAR Scheduler as a plugin. Four main classes have been implemented in `oar.py` (following the previous implementation made for Slurm):
- … `oarstat` command (similar to the `sinfo` command on the Slurm scheduler).

Unit tests were created in `test_oar.py` and `test_auto.py` to ensure that the OAR plugin offers the same basic functionalities as the Slurm plugin.

A few notes about the implementation:
- (… `_equivalence_dict` dictionary). Additional OAR parameters can be set with the `additional_parameters` dictionary.
- The `_make_submission_command` method in the OarExecutor class is overridden from PicklingExecutor. The content of the file is read and the job is submitted using the OAR "inline command" instead of using the submission file.
- … `scontrol` (i.e., `oarsub`) is not available on nodes. To automatically requeue the job after preemption, the original job must be submitted with the `idempotent` type and must exit with code `99`.

Our OAR plugin covers most of submitit's features (e.g., job submission, checkpointing, job arrays). The only feature that we did not address is task submission. Indeed, unlike Slurm, OAR does not provide such a feature. We believe a workaround could be implemented in another iteration. Meanwhile, we raise a "NotImplemented" error if a user attempts to use such a feature.
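
To illustrate the notes on inline submission and requeueing, here is a minimal sketch, not the plugin's actual code. It assumes the job is passed to `oarsub` as a single inline command string and that `-t idempotent` is the flag used to set the job type. The helper names `build_oarsub_command` and `exit_code_after_run` are hypothetical and do not come from `oar.py`.

```python
import shlex

# Exit code that, per the PR description, triggers a requeue of an idempotent job.
REQUEUE_EXIT_CODE = 99


def build_oarsub_command(inline_command, idempotent=True):
    """Build an argv list that submits `inline_command` inline with oarsub.

    Hypothetical helper for illustration only; the real logic lives in the
    plugin's `_make_submission_command` and may differ.
    """
    argv = ["oarsub"]
    if idempotent:
        # Assumption: the job type is set with `-t idempotent`.
        argv += ["-t", "idempotent"]
    # The whole command is one argument, instead of a path to a submission file.
    argv.append(inline_command)
    return argv


def exit_code_after_run(preempted):
    """Return the exit code the job process should end with.

    Hypothetical helper: a preempted job exits with 99 so that it is requeued;
    otherwise it exits normally with 0.
    """
    return REQUEUE_EXIT_CODE if preempted else 0


if __name__ == "__main__":
    argv = build_oarsub_command("python my_job.py")
    # Print the shell-quoted command rather than running it, since oarsub
    # is only available on an OAR frontend.
    print(shlex.join(argv))
```

The sketch only builds and prints the command; actually running it would require an OAR frontend, for example via `subprocess.run(argv, check=True)`.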