Compute node partitions

The compute nodes are divided into separate partitions based on their hardware configuration. This makes it possible, for example, to assign different billing factors to CPUs from different manufacturer generations to ensure fair usage accounting (newer CPUs are faster), to reserve nodes with more memory (per CPU) for jobs that actually require more memory, and to reserve GPU nodes for jobs that require GPUs.

Automatic partition selection

BioCloud is quite a heterogeneous cluster because nodes are purchased at different times and therefore differ in hardware configuration. Furthermore, the number of partitions will only increase as more nodes are added over time, which increases complexity and makes it difficult or confusing to submit jobs to the most appropriate partition(s). This can result in an inefficient cluster with longer queue times and wasted computing resources. Therefore, the most appropriate partition for batch jobs is automatically assigned by the SLURM scheduler according to custom logic defined for our specific setup. Manually specifying a partition using the --partition option will have no effect, as it will be overridden. Interactive jobs are always assigned to the interactive partition.

The most appropriate partition is determined by several factors, the most significant of which are the requested memory per CPU ratio and any specified node features. Partitions are allocated according to the priority tiers shown in the table below, ensuring that newer (and faster) compute nodes are always selected before older ones. Secondly, if all general compute nodes happen to be fully allocated, jobs with only modest memory requirements can still run on compute nodes with extra memory, if any are available. Conversely, jobs that require extra memory per CPU are not allowed to run on a general compute node, as this could leave CPUs unusable because all of the node's memory is already allocated. In certain situations the automatically assigned partition may not be optimal, in which case an administrator can intervene and make exceptions where it makes sense for the situation.
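
As a minimal sketch of what this means in practice, a batch script only needs to request CPUs and memory per CPU and can leave out --partition entirely, since the scheduler assigns the partition automatically. The job name, resource numbers, and workload below are placeholders, not recommendations:

```bash
#!/usr/bin/env bash
#SBATCH --job-name=example-job      # placeholder job name
#SBATCH --cpus-per-task=32          # example CPU request
#SBATCH --mem-per-cpu=2G            # the memory per CPU ratio helps determine the partition
#SBATCH --time=0-12:00:00           # example walltime
# Note: no --partition option; the scheduler assigns the most appropriate partition

# placeholder workload
my_analysis --threads "${SLURM_CPUS_PER_TASK}"
```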

CPU partitions

Below is a brief overview of all CPU partitions. Details about the exact CPU model, scratch space and node features for each compute node are listed further down.

Overview

| Partition   | Nodes | Total CPUs | Total memory | Billing factor | Priority tier |
|-------------|-------|------------|--------------|----------------|---------------|
| interactive | 2     | 512T       | 3.0 TB       | 0.5x           | -             |
| zen5        | 3     | 864T       | 4.5 TB       | 1.0x           | 1st           |
| zen3        | 8     | 1408T      | 6.5 TB       | 0.5x           | 2nd           |
| zen5x       | 2     | 576T       | 4.6 TB       | 1.5x           | 3rd           |
| zen3x       | 2     | 448T       | 4.0 TB       | 1.0x           | 4th           |
| TOTAL       | 17    | 3808       | 19.6 TB      |                |               |
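
As an illustration of how the billing factor is intended to work (assuming accounted usage scales linearly with the factor): a job allocating 32 CPUs for 10 hours on a zen5 node (factor 1.0x) would be accounted as 32 x 1.0 x 10 = 320 CPU-hours, whereas the same job on a zen3 node (factor 0.5x) would be accounted as 32 x 0.5 x 10 = 160 CPU-hours, reflecting that the older CPUs deliver less work per core-hour.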

The interactive partition

This partition is reserved for short and small interactive jobs, where users can do data analysis, quick testing, and day-to-day work without having to wait for hours or even days in the queue. No batch jobs can run here, and the amount of available resources is limited to ensure high availability; ideally, the interactive partition should never be fully utilized. Furthermore, it is optimized for interactive jobs, which are usually very inefficient (i.e. the allocated CPUs do absolutely nothing while you are just typing or clicking around). A sketch of how to start an interactive session is shown after the table below.

| Hostname        | CPU model        | CPUs        | Memory | Scratch space | Features       |
|-----------------|------------------|-------------|--------|---------------|----------------|
| bio-node[16-17] | 2x AMD EPYC 9535 | 128C / 256T | 1.5 TB |               | zen5, epyc9535 |
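
As a rough sketch (the authoritative instructions and any resource limits are described on the job submission page), an interactive session could be started with salloc or srun; the resource numbers below are only placeholders:

```bash
# Request a small interactive allocation; the job is automatically placed in the interactive partition
salloc --cpus-per-task=4 --mem-per-cpu=2G --time=0-02:00:00

# ...or start an interactive shell directly
srun --cpus-per-task=4 --mem-per-cpu=2G --time=0-02:00:00 --pty bash -i
```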

Batch job partitions

These partitions are dedicated to non-interactive and efficient batch jobs that can potentially run for a long time. Some nodes have a higher memory per CPU ratio than others, hence they are separated into different partitions, where those with more memory are suffixed with x. The partitions are otherwise named according to the generation of AMD EPYC CPUs installed in the nodes.
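
If a job genuinely depends on a particular hardware trait, the node features listed in the tables below (for example scratch or a specific CPU generation) can presumably be requested with SLURM's --constraint option. This is only a sketch and assumes the Features column maps directly to SLURM node features:

```bash
# Require a node with fast local NVMe scratch space (assumes the "scratch" feature listed below)
sbatch --constraint=scratch job.sh

# Require a specific CPU generation, e.g. Zen 5 nodes
sbatch --constraint=zen5 job.sh
```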

zen3

| Hostname              | CPU model        | CPUs        | Memory | Scratch space | Features                 |
|-----------------------|------------------|-------------|--------|---------------|--------------------------|
| bio-node01            | 2x AMD EPYC 7713 | 128C / 256T | 1.0 TB | 3.5 TB NVMe   | zen3, epyc7713, scratch  |
| bio-node02            | 2x AMD EPYC 7552 | 96C / 192T  | 0.5 TB |               | zen3, epyc7552           |
| bio-node[03,04,06,07] | 2x AMD EPYC 7643 | 96C / 192T  | 1.0 TB |               | zen3, epyc7643           |
| bio-node05            | 2x AMD EPYC 7643 | 96C / 192T  | 1.0 TB | 18 TB NVMe    | zen3, epyc7643, scratch  |

zen3x

| Hostname   | CPU model        | CPUs        | Memory | Scratch space | Features        |
|------------|------------------|-------------|--------|---------------|-----------------|
| bio-node08 | 2x AMD EPYC 7643 | 96C / 192T  | 2.0 TB |               | zen3, epyc7643  |
| bio-node09 | 2x AMD EPYC 7713 | 128C / 256T | 2.0 TB | 12.8 TB NVMe  | zen3, epyc7713  |

zen5

| Hostname        | CPU model        | CPUs        | Memory | Scratch space | Features       |
|-----------------|------------------|-------------|--------|---------------|----------------|
| bio-node[11-13] | 2x AMD EPYC 9565 | 144C / 288T | 1.5 TB |               | zen5, epyc9565 |

zen5x

| Hostname        | CPU model        | CPUs        | Memory | Scratch space | Features                |
|-----------------|------------------|-------------|--------|---------------|-------------------------|
| bio-node[14-15] | 2x AMD EPYC 9565 | 144C / 288T | 2.3 TB | 12.8 TB NVMe  | zen5, epyc9565, scratch |

GPU partitions

Nodes in these partitions have GPUs installed and should ONLY be used when a GPU is needed for the job. The partition is chosen depending on the GPU model requested with the --gres option to the salloc, srun, and sbatch job submission commands. Instructions on how to request a GPU node can be found on the job submission page; a rough example is also sketched after the table below.

gpu-a10

| Hostname   | CPU model        | CPUs       | Memory | Scratch space | GPU        | Features                     |
|------------|------------------|------------|--------|---------------|------------|------------------------------|
| bio-node10 | 2x AMD EPYC 7313 | 32C / 64T  | 256 GB | 3.0 TB NVMe   | NVIDIA A10 | zen3, epyc7313, scratch, a10 |
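
As a hedged sketch only (the authoritative instructions are on the job submission page, and the exact GRES name is an assumption based on the a10 feature above), requesting a single A10 GPU for a batch job might look like this:

```bash
#!/usr/bin/env bash
#SBATCH --job-name=gpu-example      # placeholder job name
#SBATCH --gres=gpu:a10:1            # assumed GRES name; request one NVIDIA A10
#SBATCH --cpus-per-task=8           # example CPU request
#SBATCH --mem-per-cpu=4G            # example memory per CPU
#SBATCH --time=0-04:00:00           # example walltime

nvidia-smi                          # confirm the GPU is visible inside the job
```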