SciBob is a meta-builder for scientific software. It includes EasyBuild/EESSI, Spack and Conda/Mamba to build a single environment and integrates documentation from multiple sites.

dirkpetersen/scibob

Bob the Builder in Science Mode

SciBob

SciBob is a longer-term project to create a meta-builder and management tool for scientific software. It will drive other tools such as EasyBuild/EESSI, Spack and Conda/Mamba to build a single environment using Lmod modules, and it will integrate documentation from various sites.

The first element of SciBob is aws-eb, a tool to build recent EasyBuild software (EasyConfigs) in AWS.

AWS-EB (EasyBuild in AWS)

This tool does 2 things:

  • First, it gives you quick access to a large stack of performance-optimized HPC software compiled by the EasyBuild (EB) framework, which could otherwise take more than a week to deploy. You can install the software, including the Lmod environment modules, directly from a public S3 bucket (s3://easybuild-cache). Use the aws-eb download sub-command to load the software onto your machine.
  • Second, it allows you to run a fully automated build of all latest EB (and soon Bioconda) packages (newest version only). After a successful build, it will tar and upload these packages to AWS S3 for later use and sharing with others. Previously built packages will be automatically uploaded/downloaded which allows aws-eb to use unreliable instances in the low cost AWS spot market by default. You can also use the aws-eb launch sub-command on-premises but you need an S3 compatible bucket (e.g. Ceph). The EB root (EASYBUILD_PREFIX) is currently set to /opt/eb.

Note: https://www.eessi.io/ is a new approach for using software generated by EasyBuild that does not require you to download anything. As more software is added to EESSI over time, it will likely become the preferred approach for accessing EasyBuild scientific software.

Install

curl -Ls https://raw.githubusercontent.com/dirkpetersen/scibob/main/aws-eb.py?token=$(date +%s) -o ~/.local/bin/aws-eb
chmod +x ~/.local/bin/aws-eb
python3 -m pip install --user --upgrade boto3

Boto3 is the Python interface to AWS. Make sure ~/.local/bin is in your PATH.

Try it - Download and use compiled software

To download the software, the target folder (default: /opt/eb) must exist, be writable and have about 300 GB of free disk space. Each package is compressed as an archive ending in eb.tar.gz and is automatically untarred after downloading. Each operating system and CPU type combination requires about 100 GB of eb.tar.gz files in S3. By default, please use Amazon Linux 2023 with a modern --cpu-type such as graviton-3, epyc-gen-4 or xeon-gen-4. If you prefer the latest RHEL (aka Rocky) or the latest Ubuntu LTS, please select xeon-gen-1, which is also the best choice if you are using the tool on-premises. CPU type xeon-gen-1 works on all x86-64 Intel and AMD CPUs that are offered on AWS, but performance will not be optimized for newer CPUs. We will start with graviton-3, as this ARM CPU is the cheapest option, with performance similar to Xeon but about 1/3 slower than Epyc:

ec2-user@aws-eb:~$ aws-eb download --cpu-type graviton-3

Downloading packages from s3://easybuild-cache/aws/amzn-2023_graviton-3 to /opt/eb ...
  Downloading Modules ...
   Rclone copy: 4198 file(s) with 3.262 MiB transferred.
  Downloading Software ...
   Rclone copy: 2097 file(s) with 76.219 GiB transferred.
 Untarring packages ...
Unpacking /opt/eb/software/Anaconda3/Anaconda3-2023.09-0.eb.tar.gz into /opt/eb/software/Anaconda3...
Unpacking /opt/eb/software/ASAP/ASAP-2.1-foss-2022a.eb.tar.gz into /opt/eb/software/ASAP...
Unpacking /opt/eb/software/BEDTools/BEDTools-2.31.0-GCC-12.3.0.eb.tar.gz into /opt/eb/software/BEDTools...
Successfully unpacked: /opt/eb/software/BEDTools/BEDTools-2.31.0-GCC-12.3.0.eb.tar.gz
Unpacking /opt/eb/software/Automake/Automake-1.16.4-GCCcore-11.2.0.eb.tar.gz into /opt/eb/software/Automake...
Unpacking /opt/eb/software/BLAST+/BLAST+-2.2.31.eb.tar.gz into /opt/eb/software/BLAST+...
Successfully unpacked: /opt/eb/software/Automake/Automake-1.16.5-GCCcore-12.3.0.eb.tar.gz

All software was downloaded to: /opt/eb

To use these software modules, source .bashrc after adding MODULEPATH, e.g.:
echo "export MODULEPATH=${MODULEPATH}:/opt/eb/modules/all" >> ~/.bashrc
source ~/.bashrc

Now you can use the HPC module system (e.g. Lmod) to load and run software:

$ ml R
$ which R
/opt/eb/software/R/4.3.2-gfbf-2023a/bin/R
$ R
R version 4.3.2 (2023-10-31) -- "Eye Holes"
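
Before running a download on another machine, the target-folder requirements stated earlier (the folder exists, is writable and has roughly 300 GB free) can be checked up front. The following is a minimal sketch using only the Python standard library; the function name check_target is hypothetical and is not part of aws-eb.

```python
import os
import shutil

REQUIRED_FREE_GB = 300  # approximate figure quoted in this README


def check_target(path="/opt/eb", required_gb=REQUIRED_FREE_GB):
    """Return a list of problems with the download target (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_gb:
        problems.append(
            f"{path} has {free_gb:.0f} GB free, need about {required_gb} GB"
        )
    return problems


if __name__ == "__main__":
    for problem in check_target():
        print("WARNING:", problem)
```

An empty list means the folder looks usable; the check does not reserve space, so a concurrent job could still fill the disk during the download.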

The aws-eb download sub-command will try to detect your OS and download the binaries compiled for your OS. If that does not work, you can use the --prefix option instead of --cpu-type. You can also download to a different directory, for example:

aws-eb download --prefix ubuntu-22.04_xeon-gen-1 /your/folder

Note: If you select a different download folder, you need to create a symlink (ln -s /your/folder /opt/eb) so that calls to /opt/eb are redirected to your other folder.
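
The prefixes used with --prefix follow the pattern OS-version_cpu-type, and they map directly to folders in the bucket. The sketch below only illustrates that naming scheme; the helper names s3_prefix and s3_url are hypothetical, and the bucket and root path are the defaults mentioned in this README.

```python
BUCKET = "easybuild-cache"  # public bucket named in this README
ROOT = "aws"                # default root path inside the bucket


def s3_prefix(os_id, os_version, cpu_type):
    """Build the '<os>-<version>_<cpu-type>' folder name."""
    return f"{os_id}-{os_version}_{cpu_type}"


def s3_url(prefix, bucket=BUCKET, root=ROOT):
    """Return the S3 location that holds the binaries for one prefix."""
    return f"s3://{bucket}/{root}/{prefix}"


print(s3_url(s3_prefix("amzn", "2023", "graviton-3")))
# s3://easybuild-cache/aws/amzn-2023_graviton-3
```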

Building additional software

Before building your own EasyBuild stack you should first review how many packages are already built with success or error and which ones have been skipped, for example because a compiler toolchain is too old. Then we will run aws-eb config to implement a few settings, for example your own AWS bucket.

Review build status

Each OS/CPU combination has an eb-build-status.json file where the status of each package build is tracked. aws-eb buildstatus provides a summary.

$ aws-eb buildstatus amzn-2023_graviton-3

Summarizing s3://easybuild-cache/aws/amzn-2023_graviton-3/eb-build-status.json ...

Status: 'success'
  Total Occurrences: 1357
  Reasons:
    - easyconfig built successfully: 1357 occurrences

Status: 'error'
  Total Occurrences: 509
  Reasons:
    - n/a: 509 occurrences

Status: 'skipped'
  Total Occurrences: 1886
  Reasons:
    - toolchain not supported: intel: 398 occurrences
    - toolchain version too old: GCCcore-10.2.0: 140 occurrences
    - toolchain version too old: foss-2021b: 105 occurrences
    - toolchain version too old: foss-2021a: 104 occurrences
    - dependencies have errors: 94 occurrences
    - dependency requires too old toolchain: 10 occurrences
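
A summary of this kind can also be produced from a downloaded copy of eb-build-status.json. This is a sketch, not the aws-eb implementation; it assumes the file is a JSON object keyed by easyconfig file name, where each entry carries "status" and "reason" fields. The two sample entries are taken from the excerpt of the file shown in this README.

```python
from collections import Counter, defaultdict


def summarize(status_by_eb):
    """Group entries by status and count how often each reason occurs."""
    summary = defaultdict(Counter)
    for entry in status_by_eb.values():
        status = entry.get("status", "unknown")
        reason = entry.get("reason") or "n/a"
        summary[status][reason] += 1
    return summary


def print_summary(summary):
    for status, reasons in summary.items():
        print(f"Status: '{status}'")
        print(f"  Total Occurrences: {sum(reasons.values())}")
        print("  Reasons:")
        for reason, count in reasons.most_common():
            print(f"    - {reason}: {count} occurrences")


# In practice the dict would come from json.load() on the downloaded file.
sample = {
    "gfbf-2022a.eb": {
        "status": "success",
        "reason": "easyconfig built successfully",
    },
    "Ferret-7.5.0-foss-2019b.eb": {
        "status": "skipped",
        "reason": "toolchain version too old: foss-2019b",
    },
}
print_summary(summarize(sample))
```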

configure for building

AWS-EB requires only minimal configuration (aws-eb config) such as setting your own S3 storage bucket. When asked for the "root path" please leave the default (aws) as other options are untested.

$ aws-eb config

 Installing rclone ... please wait ... Done!

*** Asking a few questions ***
*** For most you can just hit <Enter> to accept the default. ***

*** Enter your email address: ***
  [Default: [email protected]] [email protected]

*** Please confirm/edit S3 bucket name to be created in all used profiles.: ***
  [Default: aws-eb-myuser-domain-edu] my-aws-bucket-name
*** Please confirm/edit the root path inside your S3 bucket: ***
  [Default: aws] 
*** Please confirm/edit the AWS S3 Storage class: ***
  [Default: INTELLIGENT_TIERING]
*** Please confirm/edit the AWS S3 region: ***
  [Default: us-west-2]

  Verify that bucket 'my-aws-bucket-name' is configured ...

Done!

To change the compiler toolchains and their minimum versions supported by aws-eb you can edit the json structure in min_toolchains under ~/.config/aws-eb/general:

cat ~/.config/aws-eb/general/min_toolchains
{
    "system": "system",
    "GCC": "11.0",
    "GCCcore": "11.0",
    "LLVM": "12.0",
    "foss": "2022a",
    "gfbf": "2022a"
}
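
To illustrate how such minimum versions lead to the "toolchain not supported" and "toolchain version too old" reasons in the build status, here is a minimal sketch of a version check. It is an assumption about the logic, not code copied from aws-eb; the helpers version_key and check_toolchain are hypothetical.

```python
import re

MIN_TOOLCHAINS = {
    "system": "system",
    "GCC": "11.0",
    "GCCcore": "11.0",
    "LLVM": "12.0",
    "foss": "2022a",
    "gfbf": "2022a",
}


def version_key(version):
    """Split '2022a' or '10.2.0' into chunks that compare in a sensible order."""
    parts = re.findall(r"\d+|[a-z]+", version.lower())
    # Tag each chunk so that numbers and letters are never compared directly.
    return tuple((0, int(p)) if p.isdigit() else (1, p) for p in parts)


def check_toolchain(name, version, minimums=MIN_TOOLCHAINS):
    """Return (allowed, reason) for a toolchain name/version pair."""
    if name not in minimums:
        return False, f"toolchain not supported: {name}"
    if name == "system":
        return True, "ok"
    if version_key(version) < version_key(minimums[name]):
        return False, f"toolchain version too old: {name}-{version}"
    return True, "ok"


for name, version in [("foss", "2021b"), ("GCCcore", "12.3.0"), ("intel", "2022a")]:
    print(name, version, check_toolchain(name, version))
```

Splitting versions into numeric and alphabetic chunks avoids the pitfall of plain string comparison, where "9.5" would sort after "11.0".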

Now you have 2 options. You can either start building from scratch, for example if you would like an OS/CPU combination that is currently not provided via easybuild-cache, or you can use the binaries that are already in the S3 bucket easybuild-cache as a template. As building from scratch will take a long time, building from a template is recommended.

Building from template

To build on top of existing binaries, use the aws-eb launch sub-command and pick the graviton-3 CPU as a template. If you do not choose otherwise, aws-eb will pick the latest Amazon Linux (currently 2023) by default. We will use the --skip-sources option because almost everything is already built and it is not worth preloading 200 GB of source files. You also need to copy all the content of the template bucket easybuild-cache into your own bucket. The option --first-bucket will do that for you:

$ aws-eb launch --cpu-type graviton-3 --skip-sources --first-bucket easybuild-cache

c7g.xlarge is the cheapest spot instance with at least 4 vcpus / 8 GB mem
Using amazon image id: ami-03bd21ae09xxxxxx
IAM Instance profile: None.
c7g.xlarge in us-west-2b costs $0.1450 as on-demand and $0.0504 as spot.
Launching spot instance i-0ec773e13xxxxxx ... please wait ...
|██████████████████████████████--------------------| 60.0%
Security Group "sg-0dc530211xxxxxxxxx" attached.
Instance IP: 35.167.xx.xxx
 Waiting for ssh host to become ready ...
 will execute 'aws-eb launch -c graviton-3 -f easybuild-cache --skip-sources --build' on 35.167.xx.xxx ...
 Executed bootstrap and build script ... you may have to wait a while ...
 but you can already login using "aws-eb ssh"
Sent email "AWS-EB build on EC2" to [email protected]!

We see that a spot instance has been launched that costs about 1.25 cents per core/hour. As a reference, mid-size on-premises HPC systems in the US typically cost between 1 and 2 cents per core/hour, depending on what you include when you calculate your TCO. In any case, 1.25 cents is not bad at all.
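
The per-core figure follows from the launch output: the hourly price divided by the number of physical cores. The sketch below redoes that arithmetic; the function name is hypothetical, and the vcpus-per-core values follow the launch help text (one vcpu per core on Graviton, two on x86-64).

```python
def cents_per_core_hour(price_per_hour, vcpus, vcpus_per_core=1):
    """Convert an hourly instance price in USD to cents per physical core-hour."""
    cores = vcpus / vcpus_per_core
    return price_per_hour / cores * 100


# c7g.xlarge (Graviton, 4 vcpus = 4 cores), prices from the launch output above
spot = cents_per_core_hour(0.0504, vcpus=4)
on_demand = cents_per_core_hour(0.1450, vcpus=4)
print(f"spot: {spot:.2f} cents per core-hour")
print(f"on-demand is {on_demand / spot:.1f}x the spot price")
```

For an x86-64 instance, pass vcpus_per_core=2 so that the result is comparable to on-premises per-core numbers.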

After waiting a while for the copy of easybuild-cache to your bucket to finish, you can launch a few more configurations with different CPU types such as epyc-gen-4 or xeon-gen-4. Finally, you probably want to monitor the build process. Use the aws-eb ssh sub-command for that (first use --list or -l to list all running instances):

$ aws-eb ssh -l 

Listing machines ... Running EC2 Instances:
35.88.195.44   | i-0c2123d527545469c | m7g.xlarge | al2023 | 00-00:15 | (OK)
34.219.173.164 | i-0703fce0a761d7690 | r7a.xlarge | al2023 | 00-00:02 | (OK)
54.191.181.2   | i-0b920c11fe09e1818 | c7i.xlarge | al2023 | 00-00:01 | (OK)

aws-eb chooses the lowest-cost instance for a specific CPU type. In the AWS spot market this may be an m7, r7 or c7 type instance at any given time (m7g = ARM graviton-3, r7a = AMD epyc-gen-4, c7i = Intel xeon-gen-4). Let's log in to our graviton-3 instance and enter the history command, which shows a number of prepared commands that we can run simply by selecting them via the arrow-up key.

$ aws-eb ssh 35.88.195.44
Last login: Tue Jan  2 9:33:09 2024

$ history

  1  touch ~/no-terminate && pkill -f aws-eb
  2  pkill -f easybuild.main # skip the currently building easyconfig
  3  grep -B1 -A1 'chars): Couldn.t find file' ~/out.easybuild.54.191.181.2.txt | grep FAILED:
  4  grep -A1 '^== FAILED:' ~/out.easybuild.54.191.181.2.txt
  5  grep -A1 '^== COMPLETED:' ~/out.easybuild.54.191.181.2.txt
  6  tail -n 100 -f ~/out.easybuild.54.191.181.2.txt
  7  tail -n 30 -f ~/out.bootstrap.54.191.181.2.txt

The build process may only take a few seconds if no new EasyConfigs are found, as the script loops through eb-build-status.json and skips all packages for which a build has been attempted previously. To review more details, let's evaluate a build from scratch:

Building from scratch

For a clean start, build from scratch without downloading binaries or sources. RHEL has not been built with the latest Epyc CPU yet, and aws-eb will not read anything from your bucket even if you have used the --first-bucket option before. It will, however, try to download about 200 GB of source files. To prevent this, use the --skip-sources option. Since a build from scratch will take a long time, you want to focus on your research area first. If you are in life sciences you might prefer --include bio,math to only build certain module classes, but there are also astro,geo,chem,phys and generic modules such as ai,cae,compiler,data,debugger,devel,lang,lib,mpi,numlib,perf,system,toolchain,tools,vis. aws-eb uses the development branch of EasyBuild by default, but you want to play it safe and use the released version by adding --eb-release to the command line. Finally, you decide to double the number of virtual CPUs (--vcpus) from the default 4 to 8, as a lot of horsepower is needed initially to build large compiler toolchains. (The more vcpus and memory your AWS instance has, the less likely it is to hang while running EasyBuild.)

$ aws-eb launch --cpu-type epyc-gen-4 --os rhel --skip-sources --include bio,math --eb-release --vcpus 8

c7a.2xlarge is the cheapest spot instance with at least 8 vcpus / 8 GB mem
Using rhel image id: ami-093bd987f8e53e1f2
IAM Instance profile: None.
c7a.2xlarge in us-west-2c costs $0.4106 as on-demand and $0.1956 as spot.
Launching spot instance i-0ceb79495xxxxxx ... please wait ...
|███████████████████████████████████---------------| 70.0%
Security Group "sg-0dc530211xxxxxxx" attached.
Instance IP: 52.35.44.xxx
 Waiting for ssh host to become ready ...
 will execute 'aws-eb.py launch -c epyc-gen-4 -o rhel -s -i bio,math -e -v 8 --build' on 52.35.44.xxx ...
 Executed bootstrap and build script ... you may have to wait a while ...
 but you can already login using "aws-eb ssh"
Sent email "AWS-EB build on EC2" to [email protected]!

Now run the aws-eb ssh sub-command. If you have multiple instances running, you also need to enter the IP address. Then press the arrow-up key and Enter to review the output of 'bash bootstrap.sh &', which sets up the basic system. At the end it will show you some details about the CPU.

$ aws-eb ssh 
Last login: Tue Jan  2 9:33:09 2024

$ tail -n 30 -f ~/out.bootstrap.52.35.44.xxx.txt

Now run tail -n 100 -f ~/out.easybuild.52.35.44.xxx.txt to track the output of EasyBuild. This will run for a while. Hit ctrl+c and use the arrow-up key again to review a few of the other grep commands. One of the first issues that you will notice with EasyBuild in production is that it cannot download many of the source files, so we must search for file-not-found errors:

$ grep -B1 -A1 'chars): Couldn.t find file' ~/out.easybuild.52.35.44.xxx.txt | grep FAILED:

== FAILED: Installation ended unsuccessfully (build directory: /opt/eb/build/Java/1.8.0_66/system-system): build failed (first 300 chars): Couldn't find file jdk-8u66-linux-x64.tar.gz anywhere, and downloading it didn't work either... Paths attempted (in order): /home/rocky/.local/easybuild/easyconfigs/j/Java/j/Java/jdk-8u66-linux-x64.tar.gz, /home/rocky/.local/easybuild/easyconfigs/j/Java/Java/jdk-8u66-linux-x64.tar.gz, /home/rocky/.l (took 0 secs)

== FAILED: Installation ended unsuccessfully (build directory: /opt/eb/build/Perseus/2.0.7.0/GCCcore-11.2.0): build failed (first 300 chars): Couldn't find file Perseus_v2.0.7.0.zip anywhere, please follow the download instructions above, and make the file available in the active source path (/opt/eb/sources) (took 0 secs)

In this example you see that jdk-8u66-linux-x64.tar.gz cannot be found. No surprise: this is Oracle Java, which is not available for download by automated processes. You need to log in to the Oracle website, download that file and then upload it to your S3 bucket into the sources/j/Java folder:

aws s3 cp jdk-8u66-linux-x64.tar.gz s3://your-bucket/aws/sources/j/Java/

The next time you build, the file can be pulled automatically from there, as long as you are not using --skip-sources with aws-eb.
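
When many source files are missing, the file names and their upload locations can be extracted from the FAILED lines instead of reading them by eye. The following sketch assumes the log format shown in the two examples above and the sources/<first letter>/<name>/ layout implied by the Java upload example; the helper names are hypothetical and not part of aws-eb.

```python
import re

MISSING_RE = re.compile(r"Couldn't find file (\S+) anywhere")
BUILD_DIR_RE = re.compile(r"build directory: \S*?/build/([^/]+)/")


def missing_source(log_line):
    """Return (software_name, filename) for a file-not-found line, else None."""
    file_match = MISSING_RE.search(log_line)
    dir_match = BUILD_DIR_RE.search(log_line)
    if not (file_match and dir_match):
        return None
    return dir_match.group(1), file_match.group(1)


def source_key(name, filename, root="aws"):
    """Key inside the bucket: <root>/sources/<first letter, lowercase>/<name>/<file>."""
    return f"{root}/sources/{name[0].lower()}/{name}/{filename}"


line = (
    "== FAILED: Installation ended unsuccessfully (build directory: "
    "/opt/eb/build/Perseus/2.0.7.0/GCCcore-11.2.0): build failed (first 300 "
    "chars): Couldn't find file Perseus_v2.0.7.0.zip anywhere, please follow "
    "the download instructions above"
)
name, filename = missing_source(line)
print(source_key(name, filename))
# aws/sources/p/Perseus/Perseus_v2.0.7.0.zip
```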

Additional details

Below is some additional background information that may lead to a better understanding of aws-eb.

file access/transfer

The aws-eb script uses rclone in the background to download data in parallel. For troubleshooting, and to see what is actually inside the easybuild-cache bucket, you can also use the aws CLI with the --request-payer requester option:

 aws s3 ls s3://easybuild-cache/aws/ --request-payer requester
                           PRE amzn-2023_epyc-gen-4/
                           PRE amzn-2023_graviton-3/
                           PRE amzn-2023_xeon-gen-4/
                           PRE rhel-9_xeon-gen-1/
                           PRE sources/
                           PRE ubuntu-22.04_xeon-gen-1/

Note: rclone is also configured to run with "request-payer=requester". The easybuild-cache bucket is installed in the us-west-2 region and if you are downloading the binaries for one OS/CPU (~100 GB) to another region, it will cost you around $5. If you are downloading to on-premises and do not have DirectConnect nor any Egress waivers, it may cost you up to $10.

Review individual package builds

Each OS/CPU combination has its own eb-build-status.json. This file keeps track of all EasyConfigs for which a build was attempted. By default, each easyconfig is only ever tried once. Why? This design was chosen to allow for quick execution of all new EasyConfigs. The goal is to run aws-eb once a week and compile as much new software as possible within one hour (aws charges by the hour). If we had to re-try many easyconfigs that were skipped or failed previously, this process would take many hours.

$ aws s3 cp --request-payer requester s3://easybuild-cache/aws/amzn-2023_graviton-3/eb-build-status.json .
download: s3://easybuild-cache/aws/amzn-2023_graviton-3/eb-build-status.json to ./eb-build-status.json

$ tail -n 20 eb-build-status.json
{
    },
    "gfbf-2022a.eb": {
        "status": "success",
        "reason": "easyconfig built successfully",
        "returncode": 0,
        "errorcount": 0,
        "trydate": "2023-12-29T11:50:15.631301-08:00",
        "buildtime": 3,
        "modules": null
    },
    "Ferret-7.5.0-foss-2019b.eb": {
        "status": "skipped",
        "reason": "toolchain version too old: foss-2019b",
        "returncode": -1,
        "errorcount": 0,
        "trydate": "2023-12-29T11:51:43.229852-08:00",
        "buildtime": 0,
        "modules": null
    }
}

Note: If you want to re-try EasyConfigs that were previously skipped you need to use the --check-skipped option with aws-eb launch. If you would like to retry an EasyConfig that previously failed with "status": "error" you need to remove the entire dictionary of that eb file from the json file.
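
The try-once rule and the two ways of retrying can be summarized in a few lines. This is a sketch of the behaviour described above, not code from aws-eb; the function names are hypothetical, and the entry Example-1.0-foss-2022a.eb is an invented placeholder for a failed build.

```python
def should_attempt(eb_file, status_by_eb, check_skipped=False):
    """Decide whether an easyconfig gets built in this run."""
    entry = status_by_eb.get(eb_file)
    if entry is None:
        return True  # never tried before
    # --check-skipped re-evaluates entries that were skipped earlier
    return check_skipped and entry.get("status") == "skipped"


def drop_errors(status_by_eb, only=None):
    """Return a copy without 'error' entries (optionally limited to `only`)."""
    return {
        name: entry
        for name, entry in status_by_eb.items()
        if not (
            entry.get("status") == "error"
            and (only is None or name in only)
        )
    }


status = {
    "gfbf-2022a.eb": {"status": "success"},
    "Ferret-7.5.0-foss-2019b.eb": {"status": "skipped"},
    "Example-1.0-foss-2022a.eb": {"status": "error"},  # hypothetical entry
}
print(should_attempt("Ferret-7.5.0-foss-2019b.eb", status))
print(should_attempt("Ferret-7.5.0-foss-2019b.eb", status, check_skipped=True))
print(sorted(drop_errors(status)))
```

After removing entries this way, the edited JSON file has to be written back to the same location in your own bucket before the next launch picks it up.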

Troubleshooting / logfiles

You can review individual STDOUT/STDERR logs, for example:

$ aws s3 ls --request-payer requester s3://easybuild-cache/aws/amzn-2023_graviton-3/logs/
                           PRE failed/
2023-12-29 12:26:48       8494 out.bootstrap.34.213.192.111.txt
2023-12-29 12:26:49   30148226 out.easybuild.34.213.192.111.txt

and individual output logs of failed builds:

$ aws s3 ls --request-payer requester s3://easybuild-cache/aws/amzn-2023_graviton-3/logs/failed/
                           
2023-12-29 12:26:48      42724 tensorflow-compression-2.11.0-foss-2022a-CUDA-11.7.0.eb-easybuild-NCCL-2.12.12-20231214.083639.kOQdV.log
2023-12-29 12:26:48   54373599 tensorflow-probability-0.19.0-foss-2022a.eb-easybuild-axqqwyso.log
2023-12-29 12:26:48    1312872 tidymodels-1.1.0-foss-2022b.eb-easybuild-rztecz4d.log
2023-12-29 12:26:48     129476 torchvision-0.13.1-foss-2022a-CUDA-11.7.0.eb-easybuild-magma-2.6.2-20231218.112341.arpdJ.log
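
To map failed logs back to their easyconfigs, the listing can be parsed with a short script. The sketch below assumes the `aws s3 ls` column layout shown above (date, time, size, key) and that log names contain the marker ".eb-easybuild-", as in these examples; both helper names are hypothetical.

```python
def parse_s3_ls(output):
    """Yield (size_bytes, key) for each object line of `aws s3 ls` output."""
    for line in output.splitlines():
        parts = line.split(None, 3)
        # Folder lines ("PRE failed/") and blank lines have fewer columns.
        if len(parts) == 4 and parts[2].isdigit():
            yield int(parts[2]), parts[3]


def easyconfig_from_log(log_name):
    """'foo-1.0-foss-2022a.eb-easybuild-abc.log' -> 'foo-1.0-foss-2022a.eb'."""
    head, marker, _ = log_name.partition(".eb-easybuild-")
    return head + ".eb" if marker else None


listing = """\
                           PRE failed/
2023-12-29 12:26:48   54373599 tensorflow-probability-0.19.0-foss-2022a.eb-easybuild-axqqwyso.log
2023-12-29 12:26:48    1312872 tidymodels-1.1.0-foss-2022b.eb-easybuild-rztecz4d.log
"""
# Largest logs first, since they often belong to the longest builds.
for size, key in sorted(parse_s3_ls(listing), reverse=True):
    print(f"{size:>10}  {easyconfig_from_log(key)}")
```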

Building Workflow

This is happening behind the scenes:

  1. run aws-eb launch on your machine
  2. Launch cheapest AWS instance in spot that meets criteria
  3. install system software and settings via cloud-init script (_ec2_cloud_init_script)
  4. Attach new 750 GB EBS volume and mount to /opt
  5. Upload and launch bootstrap.sh script and other configs
  6. Install basic software in /home of ec2-user
  7. Launch aws-eb script with same options and args but add --build option
  8. Download all modules and tarred binaries from S3 and unpack them and download sources optionally
  9. Loop through all *.eb files (except __archive__) and for each eb file:
    1. check for allowed toolchains and --include and --exclude
    2. install all osdependencies via dnf or apt
    3. download software from shared S3 bucket
    4. set all files under sources/generic to executable
    5. untar all .eb.tar.gz files to ./software
    6. check dependencies of each eb file with eb --missing-modules
    7. install each dependency with eb --umask 0002 dependency.eb
    8. run each easyconfig that contains -CUDA- with eb --ignore-test-failure
    9. true up install by running eb --robot --umask 0002 software.eb
    10. tar new software to .eb.tar.gz files
    11. upload software to shared S3 bucket
  10. automatically terminate instance once build is finished.
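
The --include and --exclude check in step 9.1 can be pictured as a small filter on the module class of each easyconfig. This is a sketch based on the option descriptions in the launch help text (notably that --exclude only takes effect when --include is not set); the function class_selected is hypothetical.

```python
def class_selected(moduleclass, include=None, exclude=None):
    """Apply --include / --exclude semantics to one EasyBuild module class."""
    include_set = {c for c in (include or "").split(",") if c}
    exclude_set = {c for c in (exclude or "").split(",") if c}
    if include_set:
        # --exclude only works if --include is not set
        return moduleclass in include_set
    return moduleclass not in exclude_set


for moduleclass in ("bio", "lib", "vis"):
    print(moduleclass, class_selected(moduleclass, include="bio,math"))
```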

S3 Folder Layout

Tarred binaries and modules are copied to a platform-specific folder (e.g. amzn-2023_epyc-gen-4/software) and sources are copied to a shared folder that all platforms use.

Instance Mapping

Each CPU family or GPU type is mapped to all AWS instance families that have this CPU family or GPU installed. This allows aws-eb to pick any spot instance that has a compatible hardware configuration.

  self.cpu_types = {
      "graviton-2": ('c6g', 'c6gd', 'c6gn', 'm6g', 'm6gd', 'r6g', 'r6gd', 't4g' ,'g5g'),
      "graviton-3": ('c7g', 'c7gd', 'c7gn', 'm7g', 'm7gd', 'r7g', 'r7gd'),
      "graviton-4": ('c8g', 'c8gd', 'c8gn', 'm8g', 'm8gd', 'r8g', 'r8gd'),
      "epyc-gen-1": ('t3a',),
      "epyc-gen-2": ('c5a', 'm5a', 'r5a', 'g4ad', 'p4', 'inf2', 'g5'),
      "epyc-gen-3": ('m6a', 'c6a', 'r6a', 'p5'),
      "epyc-gen-4": ('c7a', 'm7a', 'r7a'),
      "xeon-gen-1": ('c4', 'm4', 't2', 'r4', 'p3' ,'p2', 'f1', 'g3', 'i3en'),
      "xeon-gen-2": ('c5', 'c5n', 'm5', 'm5n', 'm5zn', 'r5', 't3', 't3n', 'dl1', 'inf1', 'g4dn', 'vt1'),
      "xeon-gen-3": ('c6i', 'c6in', 'm6i', 'm6in', 'r6i', 'r6id', 'r6idn', 'r6in', 'trn1'),
      "xeon-gen-4": ('c7i', 'm7i', 'm7i-flex', 'r7i', 'r7iz'),
      "core-i7-mac": ('mac1',)
  }
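
The mapping can be used in both directions: to find the CPU type of a running instance and to pick the cheapest offer for a CPU type. The sketch below uses a subset of the table above. Apart from the c7g.xlarge spot price quoted earlier in this README, the prices are hypothetical placeholders, and the real selection additionally filters by vcpus and memory, which is omitted here.

```python
CPU_TYPES = {
    "graviton-3": ("c7g", "c7gd", "c7gn", "m7g", "m7gd", "r7g", "r7gd"),
    "epyc-gen-4": ("c7a", "m7a", "r7a"),
    "xeon-gen-4": ("c7i", "m7i", "m7i-flex", "r7i", "r7iz"),
}


def family(instance_type):
    """'m7g.xlarge' -> 'm7g'"""
    return instance_type.split(".", 1)[0]


def cpu_type_of(instance_type, cpu_types=CPU_TYPES):
    """Reverse lookup: which CPU type does this instance type belong to?"""
    for cpu_type, families in cpu_types.items():
        if family(instance_type) in families:
            return cpu_type
    return None


def cheapest(cpu_type, spot_prices, cpu_types=CPU_TYPES):
    """Pick the lowest-priced instance type whose family matches cpu_type."""
    candidates = {
        itype: price
        for itype, price in spot_prices.items()
        if family(itype) in cpu_types[cpu_type]
    }
    return min(candidates, key=candidates.get) if candidates else None


spot_prices = {
    "c7g.xlarge": 0.0504,  # spot price from the launch output in this README
    "m7g.xlarge": 0.0600,  # hypothetical
    "r7g.xlarge": 0.0700,  # hypothetical
    "c7i.xlarge": 0.0650,  # hypothetical
}
print(cpu_type_of("r7a.xlarge"))
print(cheapest("graviton-3", spot_prices))
```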

build-machine

The most cost-efficient instance type is not clear yet. I started with c7a.xlarge with 4 vcpus and 8 GB RAM. It may not make sense to use a larger instance type, as there are long periods of time where only a single vcpu is busy. At the tail end, the build installs R packages for hours, which is limited to a single vcpu. Running more instances in parallel may be the better option.

The build machine has this environment:

ec2-user@aws-eb:$ cat ~/.easybuildrc

test -d /usr/share/lmod/lmod/init && source /usr/share/lmod/lmod/init/bash
export MODULEPATH=/opt/eb/modules/all:/opt/eb/modules/lib:/opt/eb/modules/lang:/opt/eb/modules/compiler:/opt/eb/modules/bio
export EASYBUILD_JOB_CORES=8
export EASYBUILD_CUDA_COMPUTE_CAPABILITIES=7.5,8.0,8.6,9.0
# export EASYBUILD_BUILDPATH=/dev/shm/$USER # could run out of space
export EASYBUILD_PREFIX=/opt/eb
export EASYBUILD_JOB_OUTPUT_DIR=$EASYBUILD_PREFIX/batch-output
export EASYBUILD_DEPRECATED=5.0
export EASYBUILD_JOB_BACKEND=Slurm
export EASYBUILD_PARALLEL=16
# export EASYBUILD_GITHUB_USER=$USER
export EASYBUILD_UPDATE_MODULES_TOOL_CACHE=True
export EASYBUILD_ROBOT_PATHS=/home/rocky/.local/easybuild/easyconfigsrocky

allowed toolchains

These are currently the only allowed toolchains, with their minimum versions:

min_toolchains = {'system': 'system', 'GCC': '11.0', 'GCCcore' : '11.0', 
                                   'LLVM' : '12.0', 'foss' : '2022a', 'gfbf': '2022a'}

You can change them by editing this file:

vi ~/.config/aws-eb/general/min_toolchains
{
    "system": "system",
    "GCC": "11.0",
    "GCCcore": "11.0",
    "LLVM": "12.0",
    "foss": "2022a",
    "gfbf": "2022a"
}
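
If you prefer to script the change rather than edit the file by hand, a few lines of Python are enough. This sketch assumes the file contains plain JSON exactly as shown above; the helper set_min_toolchain is hypothetical and not part of aws-eb.

```python
import json
from pathlib import Path


def set_min_toolchain(path, name, version):
    """Set the minimum version for one toolchain and write the file back."""
    path = Path(path)
    settings = json.loads(path.read_text())
    settings[name] = version
    path.write_text(json.dumps(settings, indent=4) + "\n")
    return settings


# Example call (not executed here):
# set_min_toolchain(Path.home() / ".config/aws-eb/general/min_toolchains",
#                   "foss", "2023a")
```

Raising a minimum version causes older easyconfigs to be skipped in future runs; entries that were already built stay in the bucket.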

CLI

$ aws-eb --help

usage: aws-eb  [-h] [--debug] [--profile <aws-profile>] [--no-checksums] [--version] {config,cnf,launch,lau,download,dld,buildstatus,sta,ssh,scp} ...

A (mostly) automated build tool for building Sci packages in AWS. The binary packages are stored in an S3 bucket and can be downloaded by anyone.

positional arguments:
  {config,cnf,launch,lau,download,dld,buildstatus,sta,ssh,scp}
                        sub-command help
    config (cnf)        You will need to answer just a few questions about your cloud setup.
    launch (lau)        Launch EC2 instance, build new Easybuild packages and upload them to S3
    download (dld)      Download built eb packages and lmod modules to /opt/eb
    buildstatus (sta)   Show stats on eb-build-status.json in this S3 folder (including prefix), e.g. 'amzn-2023_graviton-3', 'amzn-2023_epyc-gen-4',
                        'amzn-2023_xeon-gen-4' rhel-9_xeon-gen-1 or ubuntu-22.04_xeon-gen-1.
    ssh (scp)           Login to an AWS EC2 build instance

optional arguments:
  -h, --help            show this help message and exit
  --debug, -d           verbose output for all commands
  --profile <aws-profile>, -p <aws-profile>
                        which AWS profile in ~/.aws/ should be used. default="aws"
  --no-checksums, -u    Use --size-only instead of --checksum when using rclone with S3.
  --version, -v         print AWS-EB and Python version info

Basic configuration

$ aws-eb config --help

usage: aws-eb config [-h] [--list] [--software] [--monitor <[email protected]>]

optional arguments:
  -h, --help            show this help message and exit
  --list, -l            List available CPU/GPU types and supported prefixes (OS/CPU)
  --software, -s        List available Software (Names of Easyconfigs)
  --monitor <[email protected]>, -m <[email protected]>
                        setup aws-eb as a monitoring cronjob on an ec2 instance and notify an email address

Build software on AWS

./aws-eb launch --help
usage: aws-eb launch [-h] [--cpu-type <cpu-type>] [--os OS] [--vcpus <number-of-vcpus>] [--gpu-type <gpu-type>] [--mem <memory-size>]
                     [--instance-type <aws.instance>] [--az AZ] [--on-demand] [--monitor] [--build] [--first-bucket <your-s3-bucket>] [--skip-sources]
                     [--eb-release] [--check-skipped] [--include INCLUDE] [--exclude EXCLUDE] [--force-sshkey]

optional arguments:
  -h, --help            show this help message and exit
  --cpu-type <cpu-type>, -c <cpu-type>
                        run config --list to see available CPU types. (e.g graviton-3)
  --os OS, -o OS        build operating system, default=amazon (which is an optimized fedora) valid choices are: amazon, rhel, ubuntu and any AMI name including wilcards *
  --vcpus <number-of-vcpus>, -v <number-of-vcpus>
                        Number of vcpus to be allocated for compilations on the target machine. (default=4) On x86-64 there are 2 vcpus per core and on Graviton (Arm) there is one core per vcpu
  --gpu-type <gpu-type>, -g <gpu-type>
                        run --list to see available GPU types
  --mem <memory-size>, -m <memory-size>
                        GB Memory allocated to instance  (default=8)
  --instance-type <aws.instance>, -t <aws.instance>
                        The EC2 instance type is auto-selected, but you can pick any other type here
  --az AZ, -z AZ        Enforce the availability zone, e.g. us-west-2a
  --on-demand, -d       Enforce on-demand instance instead of using the default spot instance.
  --monitor, -n         Monitor EC2 server for cost and idle time.
  --build, -b           Execute the build on the current system instead of launching a new EC2 instance.
  --first-bucket <your-s3-bucket>, -f <your-s3-bucket>
                        use this bucket (e.g. easybuild-cache) to initially load the already built binaries and sources
  --skip-sources, -s    Do not pre-download sources from build cache, let EB download them.
  --eb-release, -e      Use official Easybuild release instead of dev repos from Github.
  --check-skipped, -k   Re-check all previously skipped software packages and build them if possible.
  --include INCLUDE, -i INCLUDE
                        limit builds to certain module classes, e.g "bio" or "bio,lib,tools"
  --exclude EXCLUDE, -x EXCLUDE
                        exclude certain module classes, e.g "lib" or "dev,lib", only works if --include is not set
  --force-sshkey, -r    This option will overwrite the ssh key pair in AWS with a new one and download it.

Download binaries

usage: aws-eb download [-h] [--cpu-type CPUTYPE] [--prefix <s3_prefix>] [--vcpus VCPUS] [--with-source] [<target_folder>]

positional arguments:
  <target_folder>       Download to other folder than default

optional arguments:
  -h, --help            show this help message and exit
  --cpu-type CPUTYPE, -c CPUTYPE
                        run --list to see available CPU types, use --prefix to select OS-version_cpu-type
  --prefix <s3_prefix>, -p <s3_prefix>
                        your prefix, e.g. amzn-2023_graviton-3, ubuntu-22.04_xeon-gen-1
  --vcpus VCPUS, -v VCPUS
                        Number of vcpus to be allocated for compilations on the target machine. (default=4) On x86-64 there are 2 vcpus per core and on Graviton (Arm) there is one core per vcpu
  --with-source, -s     Also download the source packages

Check the status summary of your builds

usage: aws-eb buildstatus [-h] <s3_prefix>

positional arguments:
  <s3_prefix>  your prefix, e.g. amzn-2023_graviton-3

optional arguments:
  -h, --help   show this help message and exit

Login via ssh or copy via scp

usage: aws-eb ssh [-h] [--list] [--terminate <hostname>] [--add-key <private-ssh-key.pem>] [sshargs ...]

positional arguments:
  sshargs               multiple arguments to ssh/scp such as hostname or user@hostname oder folder

optional arguments:
  -h, --help            show this help message and exit
  --list, -l            List running AWS-EB EC2 instances
  --terminate <hostname>, -t <hostname>
                        Terminate EC2 instance with this public IP Address.
  --add-key <private-ssh-key.pem>, -a <private-ssh-key.pem>
                        Generate a pub key and add it to a remote authorized_keys file.

modules created

After a few days of building, I see this in the life sciences section:

ec2-user@aws-eb:~$ ml ov

------------------------------------------ /opt/eb/modules/bio -------------------------------------------
ADMIXTURE             (1)   KrakenUniq            (1)   alleleIntegrator    (1)
AGAT                  (1)   KronaTools            (1)   angsd               (1)
ANIcalculator         (1)   LSD2                  (2)   anndata             (1)
ASCAT                 (1)   LTR_retriever         (1)   bam-readcount       (1)
AUGUSTUS              (1)   L_RNA_scaffolder      (1)   bamFilters          (1)
AdapterRemoval        (1)   Lighter               (1)   bases2fastq         (1)
Alfred                (1)   Longshot              (1)   bcbio-gff           (2)
AlphaFold             (1)   MACH                  (1)   bcl2fastq2          (1)
AptaSUITE             (1)   MACS2                 (1)   biobakery-workflows (1)
Arriba                (2)   MACS3                 (1)   biobambam2          (1)
Artemis               (1)   MAFFT                 (3)   biom-format         (2)
BA3-SNPS-autotune     (1)   MAGMA-gene-analysis   (1)   breseq              (1)
BAMM                  (1)   MAGeCK                (1)   bwa-meth            (1)
BAli-Phy              (1)   MCL                   (2)   bwakit              (1)
BBMap                 (2)   MDAnalysis            (2)   bx-python           (1)
BCFtools              (3)   MEGACC                (1)   canu                (1)
BEDOPS                (1)   MEGAN                 (1)   castor              (1)
BEDTools              (3)   MMSEQ                 (1)   cooler              (1)
BLAST+                (3)   MMseqs2               (1)   cromwell            (1)
BLAST                 (2)   MRPRESSO              (1)   cutadapt            (2)
BUSCO                 (1)   MSPC                  (1)   cuteSV              (1)
BUStools              (1)   MUMmer                (2)   dRep                (1)
BWA                   (2)   MUSCLE                (2)   dcm2niix            (1)
BXH_XCEDE_TOOLS       (1)   MView                 (1)   deepTools           (1)
BamTools              (4)   MaSuRCA               (1)   duplex-tools        (1)
Bandage               (1)   Maq                   (1)   dxpy                (1)
BayesAss3-SNPs        (1)   Mash                  (2)   easel               (1)
BayesTraits           (1)   Mashtree              (1)   ebGSEA              (1)
Beagle                (1)   MetaBAT               (1)   edlib               (2)
Beast                 (1)   MetaEuk               (2)   eggnog-mapper       (1)
BioPerl               (2)   MetaGeneAnnotator     (1)   elprep              (1)
Biopython             (2)   MetaPhlAn             (1)   epiScanpy           (1)
Bismark               (1)   MethylDackel          (1)   fastPHASE           (1)
Bowtie                (1)   Mikado                (1)   fastahack           (1)
Bowtie2               (2)   MinPath               (1)   fastml              (1)
Bracken               (1)   Minipolish            (1)   fastp               (1)
CD-HIT                (2)   MitoHiFi              (1)   flowFDA             (1)
CMSeq                 (1)   MixMHC2pred           (1)   genomepy            (1)
CSBDeep               (1)   Monocle3              (1)   genozip             (1)
Canvas                (1)   NGS                   (1)   gffutils            (1)
CapnProto             (3)   NanoCaller            (1)   goalign             (1)
CellChat              (1)   NextGenMap            (1)   gofasta             (1)
CellOracle            (1)   OMA                   (1)   gotree              (1)
ChIPseeker            (1)   Oases                 (1)   gubbins             (1)
CheckM                (1)   OpenMM                (1)   hic-straw           (1)
Clair3                (1)   PALEOMIX              (1)   hifiasm             (1)
Cluster-Buster        (1)   PAML                  (1)   humann              (1)
CmdStanR              (1)   PAUP                  (1)   iced                (1)
ColabFold             (1)   PHASE                 (1)   inferCNV            (1)
Coot                  (1)   PICRUSt2              (1)   intervaltree-python (1)
CopyKAT               (1)   PIPITS                (1)   kb-python           (1)
Crumble               (1)   PIRATE                (1)   king                (1)
Cytoscape             (1)   PLINK                 (1)   kma                 (1)
DALI                  (1)   PRANK                 (1)   kneaddata           (1)
DBG2OLC               (1)   PREQUAL               (1)   lDDT                (1)
DIA-NN                (1)   Phenoflow             (1)   leafcutter          (1)
DIAMOND               (2)   PhyloPhlAn            (1)   lifelines           (1)
DSRC                  (1)   PsiCLASS              (1)   loomR               (1)
Delly                 (1)   Pysam                 (2)   loompy              (1)
DendroPy              (2)   QIIME2                (1)   mandrake            (1)
DiffBind              (1)   QUAST                 (1)   mapDamage           (1)
DoubletFinder         (1)   QuickTree             (1)   meRanTK             (1)
EDirect               (1)   R-bundle-Bioconductor (2)   medaka              (1)
EUKulele              (1)   RAxML-NG              (1)   mgltools            (1)
Exonerate             (1)   RDP-Classifier        (1)   miniasm             (1)
FASTA                 (1)   RMBlast               (1)   minimap2            (3)
FASTX-Toolkit         (1)   RSEM                  (1)   mosdepth            (1)
FLASH                 (1)   RTG-Tools             (1)   mpath               (1)
FastANI               (2)   Racon                 (2)   mrcfile             (1)
FastME                (1)   RagTag                (1)   msprime             (1)
FastQC                (2)   Raven                 (1)   muMerge             (1)
FastQ_Screen          (1)   Reads2snp             (1)   multichoose         (1)
FastTree              (1)   RegTools              (1)   mygene              (1)
Flye                  (1)   RepeatMasker          (2)   nanoget             (1)
FragPipe              (1)   ResistanceGA          (1)   nanopolish          (1)
FreeSurfer            (1)   Restrander            (1)   ncbi-vdb            (1)
GATK                  (1)   RnBeads               (1)   nichenetr           (1)
GCTA                  (1)   Roary                 (1)   novaSTA             (1)
GD                    (1)   SAMtools              (5)   ntCard              (1)
GDGraph               (1)   SAP                   (1)   olego               (1)
GEM                   (1)   SEPP                  (2)   ont-fast5-api       (2)
GFF3-toolkit          (1)   SHAPEIT               (1)   ont-guppy           (1)
GOATOOLS              (1)   SMAP                  (1)   oxDNA               (1)
GTDB-Tk               (1)   SMC++                 (1)   parasail            (2)
GTOOL                 (1)   SNAP-HMM              (1)   pftoolsV3           (1)
GapFiller             (1)   SNAP                  (1)   phyx                (1)
GenMap                (1)   SPAdes                (2)   picard              (2)
GenomeThreader        (1)   SRA-Toolkit           (2)   plinkliftover       (1)
GetOrganelle          (1)   SSAHA2                (1)   plot1cell           (1)
GffCompare            (1)   STAR                  (3)   pod5-file-format    (1)
GimmeMotifs           (1)   SUPPA                 (1)   pplacer             (1)
Giotto-Suite          (1)   SURVIVOR              (1)   preseq              (1)
Godon                 (1)   SVIM                  (1)   prodigal            (2)
HAPGEN2               (1)   SVclone               (1)   pyBigWig            (1)
HH-suite              (1)   Sabre                 (1)   pyGenomeTracks      (1)
HISAT2                (1)   Salmon                (2)   pySCENIC            (1)
HMMER                 (2)   Satsuma2              (1)   pybedtools          (2)
HTSeq                 (1)   SeaView               (1)   pyfaidx             (2)
HTSlib                (4)   Seaborn               (2)   pyslim              (1)
HTSplotter            (1)   SeqAn                 (1)   python-parasail     (2)
Health-GPS            (1)   SeqKit                (1)   qnorm               (1)
HiC-Pro               (1)   Seurat                (2)   rapidNJ             (1)
HiCExplorer           (1)   SeuratDisk            (1)   scGSVA              (1)
HiCMatrix             (1)   SeuratWrappers        (1)   scanpy              (1)
Hybpiper              (1)   Sniffles              (1)   sceasy              (1)
IGV                   (1)   SoupX                 (1)   scikit-bio          (1)
IMPUTE2               (1)   SpatialDE             (1)   scrublet            (1)
IQ-TREE               (1)   Strainberry           (1)   seqtk               (2)
ITSx                  (1)   StringTie             (1)   silhouetteRank      (1)
IgBLAST               (1)   Structure             (1)   smfishHmrf          (1)
Inferelator           (1)   T-Coffee              (1)   splitRef            (1)
Infernal              (1)   TM-align              (1)   spoa                (1)
InterProScan          (1)   TRF                   (1)   sradownloader       (1)
Iris                  (1)   TRUST4                (1)   starparser          (1)
IsoQuant              (1)   TWL-NINJA             (1)   tabix               (1)
IsoSeq                (1)   TransDecoder          (1)   tabixpp             (1)
IsoformSwitchAnalyzeR (1)   Trimmomatic           (1)   trimAl              (1)
Jasmine               (1)   USEARCH               (1)   unimap              (1)
Jellyfish             (1)   UniFrac               (1)   vcflib              (1)
KMC                   (1)   VSEARCH               (1)   velocyto            (1)
KMCP                  (1)   WFA2                  (1)   wtdbg2              (1)
Kalign                (1)   WhatsHap              (1)
Kraken                (1)   alleleCount           (1)

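Supported OS

aws-eb reads /etc/os-release and combines the ID and VERSION_ID fields as {ID}-{VERSION_ID} to identify the operating system. The four systems shown below resolve to amzn-2023, ubuntu-18.04, centos-7 and debian-11: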

(base) ec2-user@froster:~$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
SUPPORT_END="2028-03-01"
dp@r03:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
[dp@node-08-1 ~]$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
dp@grammy:~/gh/dptests$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
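
The {ID}-{VERSION_ID} lookup can be sketched in a few lines of Python. This is only an illustration, not the actual aws-eb code: the function names parse_os_release and os_tag are hypothetical, and the sketch assumes both ID and VERSION_ID are present in the file (some rolling-release distributions omit VERSION_ID, in which case it raises KeyError).

```python
import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted (ID="amzn") or bare (ID=ubuntu);
        # shlex.split removes the quotes in either case
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def os_tag(text: str) -> str:
    """Build the {ID}-{VERSION_ID} string from os-release content."""
    fields = parse_os_release(text)
    return f"{fields['ID']}-{fields['VERSION_ID']}"


# Sample taken from the Debian output above
sample = '''PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
ID=debian
'''
print(os_tag(sample))  # prints: debian-11
```

On a real machine you would pass in the content of /etc/os-release instead of the sample string, for example os_tag(open("/etc/os-release").read()).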
