Job submission
Once you are logged in to one of the login nodes through SSH, there are several ways to request resources and run jobs at different complexity levels through SLURM. Here are the most essential ways for interactive (foreground) and non-interactive (background) use. Usually, you only need to be acquainted with 3 SLURM job submission commands depending on your needs: srun, salloc, and sbatch. They all share the exact same options to define trackable resource constraints ("TRES" in SLURM parlance, e.g. number of CPUs, memory, GPUs, etc.), time limits, email for job status notifications, and many other things, but are made for different use-cases, which are described below.
Interactive jobs
An interactive job is useful for quick testing and development purposes, where you only need resources for a short period of time to experiment with scripts or workflows on minimal test data, before submitting larger batch jobs using sbatch that are expected to run for much longer in the background.
To immediately request and allocate resources (once available) and start an interactive shell session directly on the allocated compute node(s) through SLURM, just type salloc:
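A minimal example, with no options at all:
# Request an interactive shell with the default resources
salloc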
Here SLURM will find a compute node with the default amount of resources available (currently 1 CPU, 512MB memory, and a 1-hour time limit) and start the session on the allocated compute node(s) within the requested resource constraints. If you need more resources, you must ask for them explicitly, for example:
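A sketch with explicitly requested resources (the numbers are arbitrary examples):
# Request 4 CPUs, 16GB of memory, and a 6-hour time limit
salloc --cpus-per-task=4 --mem=16G --time=0-06:00:00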
Resources will then remain allocated until the shell is terminated with CTRL+d, typing exit, or closing the window. If it takes more than a few seconds to allocate resources, your job might be queued for a variety of reasons. If so, check the REASON codes for the job with squeue from another session.
Interactive jobs and CPU efficiency
When using an interactive shell through salloc, it's important to keep in mind that the allocated resources remain reserved entirely for you until you exit the shell session. So please don't leave it hanging idle for too long if you know you are not going to actively use it; other users might need the resources in the meantime. Furthermore, interactive jobs are usually very inefficient, because the allocated CPUs do absolutely nothing while you are just typing or clicking around. Therefore, interactive jobs will run on the dedicated interactive partition, which is optimized for inefficient jobs.
If you just need to run a single command/script in the foreground, it's much better to use srun instead of salloc, which will run it immediately on a compute node instead of first starting an interactive shell. As opposed to salloc, the job is terminated immediately once the command/script finishes, which increases CPU utilization:
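For example (myscript.sh and the resource values are placeholders):
# Run a single script in the foreground within the requested resources
srun --cpus-per-task=2 --mem=4G --time=0-01:00:00 bash myscript.sh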
The terminal will be blocked for the entire duration, hence for longer-running jobs it's much more convenient to instead write the commands in a script and submit a non-interactive batch job using sbatch, which will run in the background. For shorter commands, instead of srun <options> <command>, you can also use the --wrap option to sbatch to submit it as a non-interactive batch job, for example:
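A sketch with a placeholder command:
# Submit a single command as a batch job without writing a separate batch script
sbatch --cpus-per-task=2 --mem=4G --time=0-01:00:00 --wrap "bash myscript.sh"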
srun is sometimes also used to run multiple tasks/steps (parallel processes) from within batch scripts, which can then span multiple compute nodes and run concurrently.
Connectivity and interactive jobs
Keep in mind that with interactive jobs, briefly losing connection to the login node can result in the job being killed. This is to avoid resources remaining blocked by unresponsive shell sessions hanging around until the time limit runs out. If you still see the job in the squeue overview, however, use sattach to reattach to a running interactive job; just remember to append .interactive to the job ID, e.g. 38.interactive.
Graphical (GUI) apps
From the command line
In order to run graphical applications, simply append the --x11 option to salloc or srun and run the program. The graphical app will then show up in a window on your own computer, while running inside a SLURM job on the cluster:
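For example (using xterm purely as a stand-in for any graphical program):
# Start a graphical application inside a SLURM job and display it on your own computer
srun --x11 xterm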
It's important to mention that in order for this to work properly, you must first ensure that you have connected to the particular login node using either the ssh -X option, or that you have set the ForwardX11 yes option in your SSH config file, see example here.
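As an illustration (the hostname is hypothetical; see the linked example for the actual login nodes):
# Either enable X11 forwarding for a single SSH session...
ssh -X user@login01.example.org
# ...or enable it permanently in ~/.ssh/config:
#   Host login01.example.org
#       ForwardX11 yes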
From the interactive web portal (Open OnDemand)
To run graphical/interactive software, you can also just use the interactive web portal to start a virtual desktop and run it in there. This is especially handy if it needs to run for a long time, because you can log off while it's running and come back later to check the status. Just remember to stop/cancel the job as soon as possible to free up resources for other users. If the particular software is started from the command line, you can simply append && scancel <jobid> to the command to automatically stop the job serving the virtual desktop once the command has finished running.
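For example, from a terminal inside the virtual desktop (mycommand is a placeholder; SLURM_JOB_ID is set automatically inside the job, so you don't have to look up the job ID manually):
# Run the program, then cancel the job serving the virtual desktop once it finishes
mycommand && scancel "$SLURM_JOB_ID"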
Batch jobs (non-interactive jobs)
The most convenient and highly recommended way to run most things is by submitting jobs to the job queue for execution in the background in the form of SLURM batch scripts through the sbatch command. Resource requirements are instead defined by #SBATCH comment-style directives at the top of a shell script, and the script is then submitted to SLURM using a simple sbatch script.sh command. This is ideal for submitting large jobs that will run for many hours or days, but of course also for testing/development work. A SLURM batch script should always contain (in order):
- Any number of #SBATCH lines with options defining resource constraints and other options for the subsequent SLURM task(s) to be run.
- A list of commands to load any required conda environments needed for all tasks. This can also be done separately within any external scripts being executed from the job.
- The main body of the script/workflow, or a call to an external script or program to run within the job.
Submit the batch script to the SLURM job queue using sbatch script.sh, and it will then start once the requested amount of resources is available (also taking into account your past usage, the priorities of other jobs, etc.; all 3 job submission commands do that). If you set the --mail-user and --mail-type arguments, you should get a notification email once the job starts and finishes, with additional details like how many resources you have actually used compared to what you requested. This is essential information for future jobs to avoid overbooking and maximize resource utilization of the cluster.
You can also simply add #SBATCH lines to any shell script you already have, and also run the script with arguments; so for example, instead of bash script.sh -i input -o output ..., you can simply run sbatch script.sh -i input -o output ...
ALWAYS check up on running and recently completed jobs
The queue time and efficiency of the whole cluster are directly dependent on the average CPU efficiency of all jobs. It is therefore extremely important to ensure that your job uses all the CPUs that you've requested for most of the duration of the job. If not, please cancel the job and submit a new one with fewer CPUs, or adjust the job to use all the CPUs more efficiently. Furthermore, please ALWAYS also check up on the CPU efficiency of recently finished jobs, either by checking the stats in the job notification emails or by using seff <jobid>, and adjust your next submissions accordingly to avoid idle CPUs.
Non-interactive job output (stdout/stderr streams)
The job is handled in the background by the SLURM daemons on the individual compute nodes, so you won't see any output in the terminal. It will instead be written to the file(s) defined by --output and/or --error. To follow along in real time, run for example tail -f job_123.out from a login node.
Single-node, single-task example
A simple example SLURM sbatch script for a single task could look like this:
#!/usr/bin/bash -l
#SBATCH --job-name=myjob
#SBATCH --output=job_%j_%x.out
#SBATCH --cpus-per-task=10
#SBATCH --mem=10G
#SBATCH --time=0-04:00:00
#SBATCH --mail-type=END,FAIL,TIME_LIMIT_90
#SBATCH --mail-user=abc@bio.aau.dk
# Exit the script on the first error, on use of unset variables, and on failures within pipelines
set -euo pipefail
# Load conda environment to make required software available
mamba activate minimap2
# Obtain the number of CPUs available from the SLURM allocation itself
# (It's best practice to use this variable everywhere from here onwards; if you e.g. change the resource requirements above, it's easy to forget to update a hardcoded thread count everywhere)
max_threads="$(nproc)"
# Run any number of commands as part of a full pipeline script, or call scripts from elsewhere
minimap2 -t "$max_threads" database.fastq input.fastq > out.file
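Assuming the script above is saved as, say, minimap2_job.sh (an arbitrary filename), it can then be submitted and checked like this:
# Submit the batch script to the queue
sbatch minimap2_job.sh
# List your own pending and running jobs
squeue -u "$USER"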
Array jobs
If you need to run the same command or script multiple times with different input files, parameters, or arguments, you can use SLURM job arrays. This allows you to submit any number of jobs at once, which can then run simultaneously in parallel across the cluster. The unique array task ID of each job within the array is available through the environment variable SLURM_ARRAY_TASK_ID, so you can ensure that each job runs with a different input file, parameter, or argument. Each job within the array will get its respective array task ID appended to the parent job ID, for example <jobid>_[0,1,2,3], and can be controlled individually. You can also set --mail-type=ARRAY_TASKS to receive notification emails for each task in the array if necessary.
#!/bin/bash -l
#SBATCH --job-name=myarray_job
#SBATCH --output=job_%j_%x_%a.out
#SBATCH --cpus-per-task=10
#SBATCH --mem=10G
#SBATCH --time=0-04:00:00
#SBATCH --mail-type=END,FAIL,TIME_LIMIT_90,ARRAY_TASKS
#SBATCH --mail-user=abc@bio.aau.dk
#SBATCH --array=0-8
# Run a script for all files in a folder
FILES=(/path/to/data/*)
./my_script "${FILES[$SLURM_ARRAY_TASK_ID]}"
# Or run a script with a predefined list of arguments, files, folders, or anything else
ARGS=(0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9)
./my_script "${ARGS[$SLURM_ARRAY_TASK_ID]}"
Important
The bash -l in the top "shebang" line is required for the compute nodes to be able to load conda environments correctly.
Requesting one or more GPUs
If you need to use one or more GPUs, you need to add --gres=gpu:x (where x refers to the number of GPUs you need) to your sbatch, salloc, or srun commands. Please don't do CPU work on the gpu partition unless you also need a GPU. Additional details here.
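A sketch of requesting a single GPU for a batch job (the CPU and memory values are placeholders, and whether you also need to select the gpu partition explicitly depends on the cluster configuration):
# Request 1 GPU along with some CPUs and memory for the job
sbatch --gres=gpu:1 --cpus-per-task=4 --mem=16G script.sh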
Requesting compute nodes with special features
If you need to run your job on compute nodes with special node features, for example if you need local scratch space or a specific CPU model/generation, you can use the --prefer or --constraint options. A list of features for each compute node can be found under Compute node partitions. The --prefer option will only prefer nodes with the specified list of features, but if no such nodes are available it will still run the job on any other available node. The --constraint option will instead require the specified features to be available on the compute node(s), and if no nodes with the required features are available, the job will be queued until such nodes become available. For example, to ensure that the job will run on a compute node with the zen5 CPU generation and local scratch space, you can use:
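A sketch; zen5 is taken from the feature list mentioned above, while scratch is a hypothetical name for the local scratch space feature, so check the actual feature names under Compute node partitions:
# Require both the zen5 CPU generation and local scratch space (features joined with &)
sbatch --constraint="zen5&scratch" script.sh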
If you simply want to prefer nodes with the later zen5 CPU generation, but don't require it, you can use:
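For example (script.sh being a placeholder):
# Prefer zen5 nodes, but fall back to any other available node
sbatch --prefer=zen5 script.sh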
Of course these options can be written in the #SBATCH section of batch scripts as well.
How to check up on the resource utilization of running jobs
As the srun command can be used to run commands within job allocations you have already been granted, it can also be used to monitor the resource usage of running jobs from the inside in real time. This is possible by obtaining an interactive shell within the job allocation using srun --jobid <jobid> --pty bash, and then running htop or top to inspect the CPU and memory usage of processes run by your user on a particular compute node. You can hit u to filter processes by your user, and then t for tree view to see all processes running within the process tree started from your SLURM job. To inspect GPU usage, use nvidia-smi or nvtop. Note that only a single interactive shell can be active within the same job allocation at any one time.
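A sketch of the workflow described above (the job ID is a placeholder):
# From a login node, open an interactive shell inside an existing job allocation
srun --jobid 12345 --pty bash
# Then, on the compute node, inspect CPU and memory usage of your processes
htop
# For GPU jobs, inspect GPU utilization instead
nvidia-smi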
Most essential options
There are plenty of options for the SLURM job submission commands. Below are the most important ones for our current setup and common use-cases. If you need anything else, you can start with the SLURM cheatsheet, or refer to the SLURM documentation for the individual commands srun, salloc, and sbatch.
Option | Default value(s) | Description |
---|---|---|
--job-name | The name of the script | A user-defined name for the job or task. This name helps identify the job in logs and accounting records. |
--begin | now | Specifies a start time for the job to begin execution. Jobs won't start before this time. Details here. |
--output, --error | slurm-<jobid>.out | Redirect the job's standard output/error (stdout/stderr) to a file, ideally on network storage. All directories in the path must exist before the job can start. By default stderr and stdout are merged into a file slurm-%j.out in the current workdir, where %j is the job allocation number. See filename patterns here. |
--ntasks-per-node | 1 | Specifies the number of tasks to be launched per allocated compute node. |
--ntasks | 1 | Indicates the total number of tasks or processes that the job should execute. |
--cpus-per-task | 1 | Sets the number of CPU cores allocated per task. Required for parallel and multithreaded applications. |
--mem, --mem-per-cpu, or --mem-per-gpu | 512MB (per node) | Specifies the memory limit per node, or per allocated CPU/GPU. These are mutually exclusive. |
--nodes | 1 | Indicates the total number of compute nodes to be allocated for the job. |
--nodelist | | Specifies a comma-separated list of specific compute nodes to be allocated for the job. |
--exclusive | | Flag. If set, requests exclusive access to a full compute node, meaning no other jobs will be allowed to run on the node. In this case you might as well also use all available memory by setting --mem=0, unless there are suspended jobs on the particular node. Details here. |
--gres | | List of "generic consumable resources" to use, for example a GPU. Details here. |
--prefer or --constraint | | Prefer or require specific node features (see Compute node partitions), respectively. Details here. |
--reservation | | Allocate resources for the job from the named reservation(s). Can be a comma-separated list of more than one. Details here. |
--chdir | | Set the working directory of the batch script before it's executed. Setting this using environment variables is not supported. |
--time | 0-01:00:00 | Defines the maximum time limit for job execution before it will be killed automatically. Format DD-HH:MM:SS. Maximum allowed value is that of the partition used. Details here. |
--mail-type | NONE | Configures email notifications for certain job events. One or more comma-separated values of: NONE, ALL, BEGIN, END, FAIL, REQUEUE, ARRAY_TASKS. TIME_LIMIT_90, TIME_LIMIT_80, and TIME_LIMIT_50 can also come in handy to be able to extend the time limit before the job is killed. Details here. |
--mail-user | Local user | Specifies the email address where job notifications are sent. |
--x11 | all | Enable forwarding of graphical applications from the job to your computer. It's required that you have either connected using the ssh -X option or have set the ForwardX11 yes option in your SSH config file. For salloc or srun only. Details here. |
--wrap | | For sbatch only. Submit a command string as a batch script. Details here. |