Local scratch and temporary space
Each job gets a separate and entirely private mount point for temporary data (the default location is usually /tmp or /var/tmp), which is automatically deleted when the job finishes. On compute nodes with extra local scratch space, this mount point is automatically placed on the local scratch drive instead, so considerably more space is available for temporary files on those nodes (refer to compute node partitions).
For jobs with heavy I/O that write large amounts of temporary data, it is important to avoid the network storage and write to a local hard drive instead. This avoids overburdening the network (which other jobs also need) and the storage cluster itself, and it can also be much faster in some cases. When deciding whether you need local scratch space, both the size of the data and the number of files matter, but the latter is by far the most important: a very large number of files has the biggest impact on the overall performance of the storage cluster, which can affect every other user.
To use local scratch space, first submit your job(s) with the --constraint scratch option to the srun, sbatch, or salloc commands, which ensures that your job runs on a compute node with local scratch space available. Then simply make your job write to the /tmp folder, which is automatically mounted on the local scratch drive of the allocated compute node. The /tmp folder is managed by the SLURM scheduler and is private to each user and their jobs, so you can write whatever you want there without worrying about other jobs overwriting your files.
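As a minimal sketch, a batch script using local scratch could look like the following; my_program and its options are placeholders for your own application:

```bash
#!/usr/bin/env bash
#SBATCH --job-name=scratch-example
#SBATCH --constraint=scratch   # run only on nodes with extra local scratch space
#SBATCH --time=02:00:00

# /tmp is private to this job and, on nodes requested with --constraint scratch,
# it is backed by the local scratch drive.
mkdir -p /tmp/work
my_program --input "${HOME}/data/input.dat" --workdir /tmp/work   # hypothetical program and options
```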
If you need more space for temporary data on compute nodes without extra local scratch space, or you need more temporary space than the local scratch drive provides, it is also possible to place it on the Ceph network storage. If you choose to do so, please see the best practices below. This can be done by, for example, setting the environment variable TMPDIR early in the batch script, e.g. by adding the line export TMPDIR=${HOME}/tmp. Ensure that no conflicts can occur within the folder(s) if you run multiple jobs on multiple different compute nodes at once; see the sketch below.
Warning
All data owned by your user anywhere on local scratch space on a particular compute node is automatically deleted after the last SLURM job run by your user has finished. Therefore, ensure that output files generated by your jobs are moved to one of the network mount points as the last step of each job; otherwise they will be lost.
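In practice this means the batch script should end by copying results out of /tmp, for example as sketched below; /tmp/work/output and the results directory under your home folder are placeholder paths:

```bash
# Last step of the batch script: copy results from local scratch
# to network storage before the job ends and /tmp is wiped.
mkdir -p "${HOME}/results/${SLURM_JOB_ID}"
cp -r /tmp/work/output "${HOME}/results/${SLURM_JOB_ID}/"   # placeholder paths
```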