add autobuild on delta #3764

Open
wants to merge 33 commits into
base: main
Changes from 30 commits
1d596d2
add autobuild on delta
matthiasdiener Sep 4, 2023
42d18d9
add a test
matthiasdiener Sep 4, 2023
45e1409
cancel concurrent
matthiasdiener Sep 4, 2023
e9e6ee0
on push
matthiasdiener Sep 4, 2023
560764f
remove old ref
matthiasdiener Oct 11, 2023
11a4013
actually run a build, more comments
matthiasdiener Oct 11, 2023
bbc78d5
slurm script to submit autobuild run
ericjbohm Nov 3, 2023
6bd105a
switch to interactive
ericjbohm Nov 3, 2023
f0eb63f
switch to all-test
ericjbohm Nov 3, 2023
d914b09
slimmed down script
ericjbohm Nov 3, 2023
6fddd49
use srun directly
matthiasdiener Nov 3, 2023
b33e7b3
bump nnodes, ntasks, mem
matthiasdiener Nov 3, 2023
ea44037
remove ntasks
matthiasdiener Nov 3, 2023
c1db5dc
proper production and error checking flags in build
ericjbohm Nov 3, 2023
b55b653
more verbosity in the script
ericjbohm Nov 3, 2023
325221f
bump cpus-per-task
matthiasdiener Nov 3, 2023
05a93a9
temporarily remove Github CI files
matthiasdiener Nov 3, 2023
f9b6235
nnodes=1
matthiasdiener Nov 3, 2023
253c1f8
run the tests after compiling them
ericjbohm Nov 3, 2023
d93f40c
add delta sbatch as run dependency
matthiasdiener Nov 3, 2023
5ee524b
switch back to sbatch approach
ericjbohm Nov 3, 2023
0a62264
script to poll the jobqueue and then cat the results of the run
ericjbohm Nov 10, 2023
b982df5
batch version revised to use SLURM variables for filenames
ericjbohm Nov 10, 2023
3596981
revised to call the jobmonitor for resource montoring
ericjbohm Nov 10, 2023
88cc925
preserve output and status in latest.output and latest.status
ericjbohm Nov 21, 2023
e28a559
revise to check status file and report success/failure
ericjbohm Nov 21, 2023
07be593
remove tabs that yaml hates
ericjbohm Nov 21, 2023
92931ff
more whitespace syntax adherence
ericjbohm Nov 21, 2023
7e08549
!fixup typo in result file name
ericjbohm Nov 21, 2023
91297ae
Merge branch 'main' into autobuild-delta
matthiasdiener Nov 21, 2023
1eb8e7f
disable normal CI
matthiasdiener Nov 22, 2023
084cf33
cleanups, first attempt at pages upload
matthiasdiener Nov 22, 2023
075364d
Merge branch 'main' into autobuild-delta
matthiasdiener Feb 28, 2024
50 changes: 50 additions & 0 deletions .github/workflows/autobuild.yml
@@ -0,0 +1,50 @@
name: Autobuild

on:
  schedule:
    # Run the build as part of a fixed nightly schedule
    - cron: '15 06 * * *' # UTC 6:15am, corresponds to 00:15 CST or 01:15 CDT
  push:
    paths:
      # Also run the build when one of these files is modified by a push
      - '.github/workflows/autobuild.yml'
      - '.github/workflows/delta-sbatch-slurm.sh'


# Cancel in progress CI runs when a new run targeting the same PR or branch/tag is triggered.
# https://stackoverflow.com/questions/66335225/how-to-cancel-previous-runs-in-the-pr-when-you-push-new-commitsupdate-the-curre
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  Delta:
    timeout-minutes: 60

    runs-on: delta
    name: Delta mpi-linux-x86_64 # Could test various builds (e.g., MPI, UCX, ...) via a build matrix

    steps:
      - uses: actions/checkout@v3
      - name: Host info
        run: |
          set -x
          echo "Running autobuild on delta"
          hostname
          uname -a
          lsb_release -a
          pwd
      - name: build
        run: |
          set -ex
          export target="mpi-linux-x86_64"
          .github/workflows/jobmonitor.sh .github/workflows/delta-sbatch-slurm.sh
      - name: results
        run: |
          if grep -qx '0' result.latest
          then
            echo "Success"
          else
            echo "Failure"; exit 1
          fi
          # TODO: also send an email notification, e.g., via https://github.com/marketplace/actions/send-email
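The `results` step reads the job's exit status from `result.latest`. A minimal sketch of how a whole-line match differs from a substring match on such a file, assuming it holds a single exit status. `status_ok` is a hypothetical helper for illustration and is not part of this PR:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a line of the file is exactly "0".
status_ok() {
    # -q: no output, -x: the whole line must match, so "10" or "130" do not count
    grep -qx '0' "$1"
}

# Temporary files stand in for result.latest
tmpdir=$(mktemp -d)
echo 0   > "$tmpdir/pass"
echo 130 > "$tmpdir/interrupted"

status_ok "$tmpdir/pass"        && echo "pass: success"
status_ok "$tmpdir/interrupted" || echo "interrupted: failure"
# A plain substring search accepts 130 because it contains a 0
grep -q '0' "$tmpdir/interrupted" && echo "substring match reports success for 130"
```

An arithmetic test such as `[ "$(cat result.latest)" -eq 0 ]` would be an alternative, at the cost of an error message when the file is empty or missing.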
21 changes: 21 additions & 0 deletions .github/workflows/delta-sbatch-slurm.sh
@@ -0,0 +1,21 @@
#!/bin/bash -l
#SBATCH -N 2
#SBATCH -n 64
#SBATCH -o %j.output
#SBATCH -e %j.output
#SBATCH -t 1:00:00
#SBATCH -J autobuild
#SBATCH -p cpu
#SBATCH -A mzu-delta-cpu
#cd $indir
set -x
module load libfabric; module load cmake
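./build all-test "$target" --with-production --enable-error-checking -j64 -g
status=$? # exit status of the build; overwritten below if a test suite fails
cd "$target"
make -C tests test OPTS="$flags" TESTOPTS="$testopts" $maketestopts || status=$?
make -C examples test OPTS="$flags" TESTOPTS="$testopts" $maketestopts || status=$?
make -C benchmarks test OPTS="$flags" TESTOPTS="$testopts" $maketestopts || status=$?
# Save the combined exit status (nonzero if the build or any test suite failed)
# in the submit directory, where jobmonitor.sh looks for it
echo "$status" > "$SLURM_SUBMIT_DIR/$SLURM_JOBID.result"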
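The batch script runs a build followed by three test suites and reports a single status. A minimal sketch of one way to combine the exit statuses so that an early failure is not masked by a later success. `run_all` is a hypothetical helper, and the commands passed to it merely stand in for the `./build` and `make ... test` invocations:

```shell
#!/bin/sh
# Hypothetical helper: run every command string given as an argument and
# return the status of the last one that failed, or 0 if all succeeded.
run_all() {
    status=0
    for cmd in "$@"; do
        # Keep going after a failure so that the later suites still run
        sh -c "$cmd" || status=$?
    done
    return "$status"
}

# 'true' and 'exit 3' stand in for passing and failing test suites
run_all "true" "exit 3" "true" || echo "combined status: $?"
```

Continuing after a failure is a design choice: a nightly build then reports every failing suite in one run instead of stopping at the first.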
69 changes: 69 additions & 0 deletions .github/workflows/jobmonitor.sh
@@ -0,0 +1,69 @@
#!/bin/bash

export script="$1"
queue_qsub=sbatch
queue_kill=scancel
queue_stat="squeue -j"

End() {
echo "autobuild> $queue_kill $jobid ..."
$queue_kill $jobid
exit $1
}
echo "Submitting batch job for> $target OPTS=$flags"
echo " using the command> $queue_qsub $script"
chmod 755 $script
while [ -z "$jobid" ]
do
  $queue_qsub "$script" > .status.$$ 2>&1
  if grep -q 'have no accounts' .status.$$
  then
    echo "No account for submitting batch job!"
    rm -f .status.$$
    exit 1
  fi
  jobid=$(tail -1 .status.$$ | awk '{print $4}') # sbatch prints "Submitted batch job <id>"
  rm -f .status.$$
done
echo "Job enqueued under job ID $jobid"
export output=$jobid.output
export result=$jobid.result
# kill job if interrupted
trap 'End 1' INT QUIT TERM
retry=0
# Wait for the job to complete by polling its queue status
while true
do
  $queue_stat "$jobid" > tmp.$$
  exitstatus=$? # exit status of squeue itself
  linecount=$(wc -l tmp.$$ | awk '{print $1}')
  #if [[ $exitstatus != 0 || $linecount != 2 ]]
  if [[ $linecount != 2 ]]
  then
    # The job is done: print its output
    rm tmp.$$
    # When the job hangs, the result file does not exist
    test -f "$result" && status=$(cat "$result") || status=1
    echo "==================================== OUTPUT (STDOUT & STDERR) ========================================"
    cat "$output"
    echo "======================================================================================================"
    if [[ "$status" != 0 ]]
    then
      # Print the script
      echo "=============================================== SCRIPT ==============================================="
      cat "$script"
      echo "======================================================================================================"
      echo "=============================================== RESULT ==============================================="
      echo "$status"
      echo "======================================================================================================"
    fi
    # Record the status in result.latest (even if the job wrote no result file) and keep the output
    echo "$status" > result.latest && rm -f "$result"
    mv "$output" output.latest
    exit "$status"
  fi
  # The job is still queued or running: print status and wait
  tail -1 tmp.$$
  rm tmp.$$
  sleep 60
done
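`jobmonitor.sh` does two things: it extracts the job ID from the `sbatch` output and then polls the queue until the job is gone. A minimal sketch of both pieces with explicit failure paths, under stated assumptions: `parse_jobid` and `wait_until_gone` are hypothetical helpers, the cluster name `delta` in the sample string is illustrative, and the `<id>[;cluster]` form is what `sbatch --parsable` prints.

```shell
#!/bin/sh
# Hypothetical helper: extract the job ID from the last line of sbatch output.
# Accepts the default "Submitted batch job <id>" line as well as the
# "<id>[;cluster]" form of 'sbatch --parsable'; fails on anything else.
parse_jobid() {
    last=$(printf '%s\n' "$1" | tail -1)
    case "$last" in
        "Submitted batch job "*) echo "${last##* }" ;;  # keep the last word
        [0-9]*)                  echo "${last%%;*}" ;;  # drop ";cluster"
        *)                       return 1 ;;            # e.g., an error message
    esac
}

# Hypothetical helper: run a check up to max_polls times, sleeping 'interval'
# seconds between polls. Returns 0 as soon as the check fails (the job has
# left the queue) and 1 if the check still succeeds after the last poll.
wait_until_gone() {
    max_polls=$1
    interval=$2
    shift 2
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        "$@" || return 0
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}

parse_jobid "Submitted batch job 123456"    # prints 123456
parse_jobid "123456;delta"                  # prints 123456
parse_jobid "sbatch: error: Batch job submission failed" || echo "no job ID"
wait_until_gone 3 0 false && echo "gone"    # 'false' stands in for a finished job
```

With Slurm, the check passed to `wait_until_gone` could be a small wrapper that tests whether `squeue -h -j "$jobid"` prints anything. Bounding the number of polls keeps the monitor from waiting indefinitely if the scheduler never reports the job as finished.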