Job submission
Once you are logged in to one of the login nodes through SSH, there are several ways to request resources and run jobs of varying complexity through SLURM, but here are the most essential ways for interactive (foreground) and non-interactive (background) use. Generally you need to be acquainted with 3 SLURM job submission commands depending on your needs: `srun`, `salloc`, and `sbatch`. They all share the exact same options to define trackable resource constraints ("TRES" in SLURM parlance, e.g. number of CPUs, memory, GPUs, etc.), time limits, email notifications of job status, and many other things, but they are made for different use-cases, which are described below.
It's important to submit your job to the most appropriate hardware partition to keep the cluster running efficiently for everyone, so after reading this page please also read the next page carefully.
Interactive jobs
An interactive shell is useful for testing and development purposes where you need resources only for a short time, or to experiment with scripts and workflows on minimal example data before submitting larger jobs with `sbatch` that will run for much longer in the background.
Important
When using an interactive shell it's important to keep in mind that the allocated resources remain reserved only for you until you `exit` the shell session. So don't leave it hanging idle for too long if you know you are not going to actively use it; other users might need the resources in the meantime. For the same reason, it's not allowed to use `salloc` or `srun` within a terminal multiplexer like `screen` or `tmux`, because the resources will remain reserved even though nothing is running after your commands/scripts have finished. It's much better to use `sbatch` instead. As a last resort, if you really insist on an interactive session, you can append for example `; exit` to the last command you execute to ensure that the job allocation is terminated automatically when the command exits (regardless of exit status). Or just use `sbatch`! :)
Using the `salloc` command

To immediately request and allocate resources (once available) and start an interactive shell session directly on the allocated compute node(s) through SLURM, just type `salloc`. SLURM will then find a compute node with the default amount of resources available (1 CPU, 512MB memory, and a 1-hour time limit at the time of writing) and start the session on the allocated compute node(s) within the requested resource constraints. If you need more resources you need to ask for them explicitly, for example:
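A minimal sketch of both invocations (the exact resource values below are only examples, adjust them to your needs):

```bash
# start an interactive shell with the default resources
salloc

# or explicitly request more, e.g. 4 CPUs, 16GB memory, and a 4-hour time limit
salloc --cpus-per-task=4 --mem=16G --time=0-04:00:00
```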
Resources will then remain allocated until the shell is exited by pressing `CTRL+d`, typing `exit`, or closing the window. If it takes more than a few seconds to allocate resources, your job might be queued for a variety of reasons. If so, check the `REASON` codes for the job with `squeue` from another session.
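For example (checking only your own jobs from another terminal):

```bash
# list your own jobs, including pending ones and their REASON codes
squeue -u "$USER" -l
```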
Using the `srun` command

If you just need to run a single command/script in the foreground, it's better to use `srun`, which runs things directly on a compute node instead of first starting an interactive shell. As opposed to `salloc`, the job is terminated immediately once the command/script finishes. Any required software modules or conda environments must be loaded first before issuing the command, for example:
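A minimal sketch (the conda environment and resource values are just examples):

```bash
# load the required environment first, then run the command inside a job allocation
conda activate minimap2
srun --cpus-per-task=10 --mem=10G --time=0-02:00:00 minimap2 -t 10 database.fastq input.fastq > out.file
```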
The terminal will be blocked for the entire duration, hence for larger jobs it's ideal to submit a job through `sbatch` instead, which will run in the background.
`srun` is also used if multiple tasks (separate processes) must be run within the same resource allocation (job) already obtained through `salloc` or `sbatch`, see example below. SLURM tasks can then span multiple compute nodes at once to distribute highly parallel work at any scale.
Graphical apps (GUI)

To run graphical programs, simply append the `--x11` option to `salloc` or `srun` and run the program. The graphical app will then show up in a window on your own computer, while running inside a SLURM job on the cluster:
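A minimal sketch (here `xclock` is just a stand-in for your graphical application):

```bash
# forward the app's window to your own computer through X11
srun --cpus-per-task=1 --mem=1G --x11 xclock
```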
For this to work properly, you must first ensure that you have connected to the particular login node using either the `ssh -X` option or with `ForwardX11 yes` set in your SSH config file, see example here.
You can also just use the interactive web portal to start a virtual desktop to run any graphical software, which is especially handy if it needs to run for a long time, because you can log off while it's running and come back later to check the status.
Connectivity and interactive jobs

Keep in mind that with interactive jobs, briefly losing connection to the login node can result in the job being killed. This is to avoid resources remaining blocked by unresponsive shell sessions. If you still see the job in the `squeue` overview, however, use `sattach` to reattach to the running interactive job; just remember to append `.interactive` to the job ID, e.g. `38.interactive`.
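For example, to reattach to a job with ID 38:

```bash
sattach 38.interactive
```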
Non-interactive jobs

The most convenient and highly recommended way to run most things is by submitting jobs to the job queue for execution in the background in the form of SLURM batch scripts through the `sbatch` command. Resource requirements are instead defined by `#SBATCH` comment-style directives at the top of a shell script, and the script is then submitted to SLURM using a simple `sbatch script.sh` command. This is ideal for submitting large jobs that will run for many hours or days, but of course also for testing/development work. A SLURM batch script should always contain (in order):
- Any number of `#SBATCH` lines with options defining resource constraints and other options for the subsequent SLURM task(s) to be run.
- A list of commands to load the software modules or conda environments required for all tasks. This can also be done separately within any external scripts being run.
- The main body of the script/workflow, or a call to an external script or program to run within the resource allocation (= job ID).
Submit the batch script to the SLURM job queue using `sbatch script.sh`, and it will then start once the requested amount of resources is available (also taking into account your past usage, the priorities of other jobs, etc; all 3 job submission commands do that). If you set the `--mail-user` and `--mail-type` options, you should get a notification email once the job starts and finishes, with additional details such as how many resources you actually used compared to what you requested. This is essential information for future jobs to avoid overbooking and maximize resource utilization of the cluster.
You can also simply add `#SBATCH` lines to any shell script you already have, and also run the script with arguments, so for example instead of `bash script.sh -i input -o output ...` you can simply run `sbatch script.sh -i input -o output ...`.
Non-interactive job output (`stdout`/`stderr` streams)

As the job is handled in the background by the SLURM daemons on the individual compute nodes, you won't see any output in the terminal. It will instead be written to the file(s) defined by `--output` and/or `--error`. To follow along in real time, use for example `tail -f job_123.out`.
Single-node, single-task example

A simple example SLURM `sbatch` script for a single task could look like this:
```bash
#!/usr/bin/bash -l
#SBATCH --job-name=minimap2test
#SBATCH --output=job_%j_%x.out
#SBATCH --partition=default-op
#SBATCH --cpus-per-task=10
#SBATCH --mem=10G
#SBATCH --time=2-00:00:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=abc@bio.aau.dk

# Exit on the first error, on undefined variables, and on errors in pipelines
set -euo pipefail

# Load conda environment or software modules
conda activate minimap2

# Get the number of CPUs from the SLURM allocation (SLURM will sometimes
# give you more than what you have requested to optimize efficiency).
# This only works with single-node jobs.
max_threads="$(nproc)"

# Run one or more commands as part of a full pipeline script, or call scripts from elsewhere
minimap2 -t "$max_threads" database.fastq input.fastq > out.file
```
Multi-node, multi-task example

An example SLURM `sbatch` script for parallel (independent) execution across multiple nodes could look like this:
```bash
#!/usr/bin/bash -l
#SBATCH --job-name=minimap2test
#SBATCH --output=job_%j_%x.out
#SBATCH --nodes=5
#SBATCH --ntasks=5
#SBATCH --ntasks-per-node=1
#SBATCH --partition=default-op
#SBATCH --cpus-per-task=60
#SBATCH --mem-per-cpu=3G
#SBATCH --time=2-00:00:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=abc@bio.aau.dk

# Exit on the first error, on undefined variables, and on errors in pipelines
set -euo pipefail

# Load conda environment or software modules
conda activate minimap2

# Must use srun when doing distributed work across multiple nodes.
# Each task is started in the background with "&" so they run in parallel,
# one per node, and "wait" then blocks until they have all finished.
srun --ntasks 1 minimap2 -t 60 database.fastq input1.fastq > out.file1 &
srun --ntasks 1 minimap2 -t 60 database.fastq input2.fastq > out.file2 &
srun --ntasks 1 minimap2 -t 60 database.fastq input3.fastq > out.file3 &
srun --ntasks 1 minimap2 -t 60 database.fastq input4.fastq > out.file4 &
srun --ntasks 1 minimap2 -t 60 database.fastq input5.fastq > out.file5 &
wait
```
For more examples of parallel jobs and array jobs, see for example this page.
Important
The `bash -l` in the top "shebang" line is required for the compute nodes to be able to load software modules and conda environments correctly.
Jobs that span multiple compute nodes

If needed, the BioCloud is properly set up with the `OpenMPI` and `PMIx` interfaces for distributed work across multiple compute nodes, but this requires you to tailor your scripts and commands specifically for distributed work and is a topic for another time. You can, however, run "brute-force parallel" jobs using for example GNU parallel and distribute them across nodes, but this is only for experienced users, who must figure that out for themselves for now.
Requesting one or more GPUs

If you need to use one or more GPUs, you need to specify `--partition=gpu` and set `--gres=gpu:x`, where `x` refers to the number of GPUs you need. Please don't do CPU-only work on the `gpu` partition unless you also need a GPU. It's also worth considering using `--cpus-per-gpu` and `--mem-per-gpu`. Additional details here.
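A sketch of the corresponding `sbatch` directives (the values are only examples, adjust them to your needs):

```bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-gpu=8
#SBATCH --mem-per-gpu=16G
```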
Most essential options

There are plenty of options for the SLURM job submission commands, but below are the most important ones for our current setup and common use-cases. If you need anything else, start with the SLURM cheatsheet, or refer to the SLURM documentation for the individual commands `srun`, `salloc`, and `sbatch`.
| Option | Default value(s) | Description |
|---|---|---|
| `--job-name` | The name of the script | A user-defined name for the job or task. This name helps identify the job in logs and accounting records. |
| `--begin` | Now | Specifies a start time for the job to begin execution. Jobs won't start before this time. Details here. |
| `--output`, `--error` | `slurm-<jobid>.out` | Redirect the job's standard output/error (`stdout`/`stderr`) to a file, ideally on network storage. All directories in the path must exist before the job can start. By default `stderr` and `stdout` are merged into a file `slurm-%j.out` in the current workdir, where `%j` is the job allocation number. See filename patterns here. |
| `--ntasks-per-node` | `1` | Specifies the number of tasks to be launched per allocated compute node. |
| `--ntasks` | `1` | Indicates the total number of tasks or processes that the job should execute. |
| `--cpus-per-task` | `1` | Sets the number of CPU cores allocated per task. Required for parallel and multithreaded applications. |
| `--mem`, `--mem-per-cpu`, or `--mem-per-gpu` | `512MB` (per node) | Specifies the memory limit per node, or per allocated CPU/GPU. These are mutually exclusive. |
| `--nodes` | `1` | Indicates the total number of compute nodes to be allocated for the job. |
| `--nodelist` | | Specifies a comma-separated list of specific compute nodes to be allocated for the job. |
| `--exclusive` | | Flag. If set, requests exclusive access to a full compute node, meaning no other jobs will be allowed to run on the node. In this case you might as well also use all available memory by setting `--mem=0`, unless there are suspended jobs on the particular node. Details here. |
| `--gres` | | List of "generic consumable resources" to use, for example a GPU. Details here. |
| `--partition` | `general` | The SLURM partition to which the job is submitted. Default is to use the `general` partition. |
| `--reservation` | | Allocate resources for the job from the named reservation(s). Can be a comma-separated list of more than one. Details here. |
| `--chdir` | | Set the working directory of the batch script before it's executed. Setting this using environment variables is not supported. |
| `--time` | `0-01:00:00` | Defines the maximum time limit for job execution before it will be killed automatically. Format `DD-HH:MM:SS`. Maximum allowed value is that of the partition used. Details here. |
| `--mail-type` | `NONE` | Configures email notifications for certain job events. One or more comma-separated values of: `NONE`, `ALL`, `BEGIN`, `END`, `FAIL`, `REQUEUE`, `ARRAY_TASKS`. Details here. |
| `--mail-user` | Local user | Specifies the email address where job notifications are sent. |
| `--x11` | `all` | Enable forwarding of graphical applications from the job to your computer. Requires that you have either connected using the `ssh -X` option or set the `ForwardX11 yes` option in your SSH config file. For `salloc` or `srun` only. Details here. |
Most options are self-explanatory, but for our setup and common use-cases you almost always want to set `--nodes` to 1, meaning your job will only run on a single compute node at a time. For multithreaded applications (most are nowadays) you mostly only need to set `--ntasks` to `1`, because threads are spawned from a single process (= task in SLURM parlance), and instead increase `--cpus-per-task`.