SGE: Add qrsh example of how to launch multi-host subprocesses #110
@HenrikBengtsson Do you know of an easy way to run these in parallel? I tried running the …
Oh... yes, you're right.
We can use standard shell tools for this, for example:
#!/usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
echo "Call: $0 ..."
echo "Script name: $(basename "${BASH_SOURCE[0]}")"
echo "Arguments: $*"
echo "PID: $$"
module load CBI r
Rscript demo_pe_mpi_qrsh.R
#' Reads PE_HOSTFILE and returns an array of hostnames, where each
#' hostname is repeated the number of times per second column.
#' For example,
#'
#' opt88 3 short.q@opt88 UNDEFINED
#' iq242 2 short.q@iq242 UNDEFINED
#' opt116 1 short.q@opt116 UNDEFINED
#'
#' returns array (opt88 opt88 opt88 iq242 iq242 opt116)
read_pe_hostfile_expanded() {
local -a hosts rows args
local row kk
[[ -n "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: Environment variable 'PE_HOSTFILE' is not set"; exit 1; }
[[ -f "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: No such file: ${PE_HOSTFILE}"; exit 1; }
## Parse PE_HOSTFILE file
mapfile -t rows < "$PE_HOSTFILE"
for row in "${rows[@]}"; do
read -r -a args <<< "${row}"
# shellcheck disable=SC2034
for kk in $(seq "${args[1]}"); do
hosts+=("${args[0]}")
done
done
echo "${hosts[@]}"
}
read -r -a hosts < <(read_pe_hostfile_expanded)
#echo "hosts=${hosts[*]}"
#echo "nhosts=${#hosts[@]}"
cmd='echo "begin"; hostname; date; echo "done"'
echo "Launching ${#hosts[@]} parallel tasks ..."
echo " - task: $cmd"
for host in "${hosts[@]}"; do
echo "- launch: qrsh -inherit -nostdin -V ${host} \"$cmd\" &"
qrsh -inherit -nostdin -V "${host}" "$cmd" &
done
echo "Launching ${#hosts[@]} parallel tasks ... done"
## Wait for all tasks to complete
echo "Waiting for ${#hosts[@]} parallel tasks to complete ..."
wait
echo "Waiting for ${#hosts[@]} parallel tasks to complete ... done"
## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID" # This is useful for debugging and usage purposes,
# e.g. "did my job exceed its memory request?"
echo "Call: $0 ... done"
It's probably useful to put all that into a new shell function.
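To try the PE_HOSTFILE parsing outside of an SGE job, here is a compact alternative to read_pe_hostfile_expanded() that reads the file with a `while read` loop instead of `mapfile` + `seq`, printing one hostname per line. This is only a sketch: `pe_hosts_expanded()` is a hypothetical name, and it assumes, as the doc comment above describes, that column 1 is the hostname and column 2 the number of slots on that host. The demo uses a mock file with the example rows rather than a real PE_HOSTFILE.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical alternative to read_pe_hostfile_expanded()):
# print each hostname once per allotted slot, one hostname per line.
# Assumes column 1 = hostname and column 2 = number of slots.
pe_hosts_expanded() {
  local file=${1:-$PE_HOSTFILE}
  local host nslots _rest
  local -i kk
  [[ -n "$file" ]] || { >&2 echo "ERROR: Environment variable 'PE_HOSTFILE' is not set"; return 1; }
  [[ -f "$file" ]] || { >&2 echo "ERROR: No such file: ${file}"; return 1; }
  # '|| [[ -n "$host" ]]' also processes a last line lacking a trailing newline
  while read -r host nslots _rest || [[ -n "$host" ]]; do
    [[ -n "$host" ]] || continue   # skip blank lines
    for (( kk = 0; kk < nslots; kk++ )); do
      echo "$host"
    done
  done < "$file"
}

# Local demo with a mock hostfile (rows copied from the doc comment example)
mock=$(mktemp)
printf '%s\n' \
  'opt88 3 short.q@opt88 UNDEFINED' \
  'iq242 2 short.q@iq242 UNDEFINED' \
  'opt116 1 short.q@opt116 UNDEFINED' > "$mock"
mapfile -t hosts < <(pe_hosts_expanded "$mock")
echo "${hosts[*]}"   # opt88 opt88 opt88 iq242 iq242 opt116
rm -f "$mock"
```

Inside a real job, `mapfile -t hosts < <(pe_hosts_expanded)` falls back to `$PE_HOSTFILE`. Emitting one host per line also means the result no longer relies on hostnames being free of whitespace.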
Here's the version with a new shell function, qrsh_run():
#!/usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#-----------------------------------------------------------------
# SGE utility functions
#-----------------------------------------------------------------
sge_debug() {
# Print to stderr only when SGE_DEBUG=true; always return 0 so callers don't fail
if ${SGE_DEBUG:-false}; then >&2 echo "$@"; fi
}
#' Reads PE_HOSTFILE and returns an array of hostnames, where each
#' hostname is repeated the number of times per second column.
#' For example,
#'
#' opt88 3 short.q@opt88 UNDEFINED
#' iq242 2 short.q@iq242 UNDEFINED
#' opt116 1 short.q@opt116 UNDEFINED
#'
#' returns array (opt88 opt88 opt88 iq242 iq242 opt116)
read_pe_hostfile_expanded() {
local -a hosts rows args
local row kk
[[ -n "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: Environment variable 'PE_HOSTFILE' is not set"; exit 1; }
[[ -f "$PE_HOSTFILE" ]] || { >&2 echo "ERROR: No such file: ${PE_HOSTFILE}"; exit 1; }
## Parse PE_HOSTFILE file
mapfile -t rows < "$PE_HOSTFILE"
for row in "${rows[@]}"; do
read -r -a args <<< "${row}"
# shellcheck disable=SC2034
for kk in $(seq "${args[1]}"); do
hosts+=("${args[0]}")
done
done
echo "${hosts[@]}"
}
#' Calls a command on parallel workers allotted by SGE
#'
#' This function identifies the parallel workers that SGE has
#' given to the current job by parsing the file given by the
#' 'PE_HOSTFILE' environment variable. It then uses:
#'
#' qrsh -inherit -nostdin -V <worker-hostname> <command>
#'
#' to launch the <command> on each parallel worker.
#'
#' Example:
#' qrsh_run 'echo "begin"; hostname; date; echo "done"'
qrsh_run() {
local -a hosts
local host
read -r -a hosts < <(read_pe_hostfile_expanded)
## Nothing to do?
[[ ${#hosts[@]} == 0 ]] && return 0
sge_debug "Launching ${#hosts[@]} parallel tasks ..."
sge_debug " - task: $*"
for host in "${hosts[@]}"; do
sge_debug "- launch: qrsh -inherit -nostdin -V ${host} \"$*\" &"
qrsh -inherit -nostdin -V "${host}" "$@" &
done
sge_debug "Launching ${#hosts[@]} parallel tasks ... done"
## Wait for all tasks to complete
sge_debug "Waiting for ${#hosts[@]} parallel tasks to complete ..."
wait
sge_debug "Waiting for ${#hosts[@]} parallel tasks to complete ... done"
}
#-----------------------------------------------------------------
# Main script
#-----------------------------------------------------------------
echo "Call: $0 ..."
echo "Script name: $(basename "${BASH_SOURCE[0]}")"
echo "Arguments: $*"
echo "PPID: ${PPID}"
## Launch command on all parallel workers allotted by SGE
qrsh_run 'echo "begin"; hostname; date; echo "done"'
## Launch another set of parallel tasks after the above have completed
qrsh_run 'echo "begin 2nd round"; hostname; date; echo "done"'
## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID" # This is useful for debugging and usage purposes,
# e.g. "did my job exceed its memory request?"
echo "Call: $0 ... done"
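One caveat with qrsh_run(): a bare `wait` waits for all background jobs but always returns exit status 0, so a task that fails on one of the workers goes unnoticed. A minimal sketch of one way around this, assuming qrsh passes the remote command's exit status through, is to record each task's PID via `$!` and wait on them one at a time. `wait_pids()` and `qrsh_run_checked()` are hypothetical names, and the latter assumes read_pe_hostfile_expanded() from the script above is defined. The demo at the end runs plain local background jobs, so it works without SGE.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for each given background PID and return 1 if
# any of them exited non-zero. Unlike a bare 'wait' (always status 0),
# 'wait <pid>' returns that process's exit status.
wait_pids() {
  local pid
  local -i failed=0
  for pid in "$@"; do
    wait "$pid" || failed=$(( failed + 1 ))
  done
  if (( failed > 0 )); then
    >&2 echo "ERROR: ${failed} of $# parallel tasks failed"
    return 1
  fi
  return 0
}

# Hypothetical variant of qrsh_run() that remembers each task's PID and
# propagates failures (assumes read_pe_hostfile_expanded() is defined).
qrsh_run_checked() {
  local -a hosts pids=()
  local host
  read -r -a hosts < <(read_pe_hostfile_expanded)
  (( ${#hosts[@]} == 0 )) && return 0
  for host in "${hosts[@]}"; do
    qrsh -inherit -nostdin -V "${host}" "$@" &
    pids+=("$!")
  done
  wait_pids "${pids[@]}"
}

# Local demo without SGE: one of three background tasks fails
pids=()
( sleep 0.1; exit 0 ) & pids+=("$!")
( exit 3 ) & pids+=("$!")
true & pids+=("$!")
wait_pids "${pids[@]}" || echo "at least one task failed"
```

Waiting on the PIDs in launch order is fine here, since every task has to finish before the next round anyway; the order only affects when the failure count is reported.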
A user said in an email:
Coincidentally, a few weeks ago, I figured out how to launch multi-host subprocesses using qrsh instead of mpirun. Here's an example; it would be nice to be able to simplify it more: