Condor as scheduler #74

Open
fredbevia opened this issue Mar 13, 2018 · 4 comments

Comments

@fredbevia

Hello,
would it be possible to implement Condor (https://research.cs.wisc.edu/htcondor/) as a backend scheduler?
It is one of the rare schedulers that is free, powerful, well supported and documented, open source, and multi-platform (*nix/Windows). LSF, SGE, and SLURM are equally powerful but are *nix-only and/or non-free/commercial. Since the majority of desktop PCs run Windows, using this vast amount of unused CPU/RAM for clustered computation with clustermq would be very useful.
Thanks for giving it some thought.
F.Bevia

@mschubert
Owner

Hi Fred,

Thank you for your interest in the package. I have heard of Condor before, but I have never personally used it and have no means of testing it.

Are you using this scheduler yourself, or do you think it would be nice to support it generally?

In the work environments I have been in so far, we always had *nix compute clusters and never ran heavy computations on Windows machines. As for commercial vs. free/OSS schedulers, Slurm falls into the latter category as well.

@mschubert
Owner

Closing this due to inactivity; please reopen if this is still an active request.

@bomeara

bomeara commented Apr 11, 2019

I would actually like this, too. I have tried configuring SLURM on a cluster of computers without much success, but HTCondor works with few problems, and adding it to clustermq so I could use it with drake would be great (I've used drake to run jobs over ssh in parallel, but R has a limit on parallel connections, and it runs into trouble if any computer goes down). I'm happy to test it, too.

@mschubert mschubert reopened this Apr 11, 2019
@bomeara

bomeara commented Apr 11, 2019

Thanks for reopening! One potential issue with HTCondor is that it typically works with an executable and a batch script, so templating might be hard. However, here is a script that can be submitted via condor_submit and returns result files, which could serve as an example (it runs the command 17 times and doesn't require a separate executable, though I believe it basically copies Rscript to each worker node):

executable=/usr/bin/Rscript
arguments= -e unname(Sys.info()['nodename'])
universe=vanilla
log=results.log
output=results.output.$(Process)
error=results.error.$(Process)
notification=never
should_transfer_files=YES
when_to_transfer_output = ON_EXIT
queue 17

A sample output file (say, results.output.10) contains just the returned value, which in this case is: [1] "omearalab7"
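
On the templating concern: the submit description above is plain text, so one possible approach is to fill it in from a template before handing it to condor_submit. The following is only a minimal sketch in Python, not how clustermq does it; the function name `render_submit`, the placeholder names, and the default values are all hypothetical and chosen for illustration. Note that `$$` is how `string.Template` escapes a literal `$`, which is needed to keep HTCondor's own `$(Process)` macro intact.

```python
from string import Template

# The submit description from the comment above, with hard-coded values
# replaced by placeholders. Placeholder names are hypothetical.
# "$$(Process)" renders as the literal HTCondor macro "$(Process)".
SUBMIT_TEMPLATE = Template("""\
executable=$executable
arguments= -e $r_expr
universe=vanilla
log=$prefix.log
output=$prefix.output.$$(Process)
error=$prefix.error.$$(Process)
notification=never
should_transfer_files=YES
when_to_transfer_output = ON_EXIT
queue $n_jobs
""")


def render_submit(n_jobs,
                  r_expr="unname(Sys.info()['nodename'])",
                  executable="/usr/bin/Rscript",
                  prefix="results"):
    """Return the text of a submit description that queues n_jobs jobs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # substitute() raises KeyError if a placeholder is left unfilled.
    return SUBMIT_TEMPLATE.substitute(
        executable=executable,
        r_expr=r_expr,
        prefix=prefix,
        n_jobs=n_jobs,
    )


if __name__ == "__main__":
    # Print the rendered description; with the defaults and n_jobs=17 this
    # matches the hand-written example above.
    print(render_submit(17), end="")
```

The rendered text could then be written to a file and passed to condor_submit. The R expression is kept free of spaces, as in the original, so that it stays a single argument.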
