Local scratch and temporary space
Each job gets a separate and entirely private mount point for temporary data (the default location is usually /tmp or /var/tmp), which is automatically deleted when the job finishes. On compute nodes with extra local scratch space, this mount point is automatically placed on the local scratch drive instead, so considerably more space is available for temporary files on those nodes (refer to compute node partitions).
For jobs with heavy I/O that write large amounts of temporary data, it is important to avoid the network storage and write to a local hard drive instead. This avoids overburdening the network (which other jobs also need) and the storage cluster itself, and it can also be much faster in some cases. When deciding whether you need local scratch space, both the size of the data and the number of files matter, but the latter is by far the most important: a very large number of files has the biggest impact on the overall performance of the storage cluster, which can affect every other user.
To use local scratch space, first submit your job(s) with the --constraint scratch option to the srun, sbatch, or salloc commands, which ensures that your job runs on a compute node with local scratch space available. Then simply make your job write to the /tmp folder, which is automatically mounted on the local scratch drive of the allocated compute node. The /tmp folder is managed by the SLURM scheduler and is private to each user and their jobs, so you can write whatever you want there without worrying about other jobs overwriting your files.
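As a minimal sketch, a batch script using local scratch could look like the following; my_program and its options are placeholders for your own application:

```bash
#!/usr/bin/env bash
#SBATCH --job-name=scratch-example
#SBATCH --constraint=scratch   # run only on nodes with extra local scratch space
#SBATCH --time=02:00:00

# /tmp is private to this job and, on nodes requested with --constraint scratch,
# it is backed by the local scratch drive.
mkdir -p /tmp/work
my_program --input "${HOME}/data/input.dat" --workdir /tmp/work   # hypothetical program and options
```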
If you need more space for temporary data on compute nodes without extra local scratch space, or you need more temporary space than the local scratch drive provides, it is also possible to place it on the Ceph network storage. If you choose to do so, please see the best practices below. This can be done by, for example, setting the environment variable TMPDIR early in the batch script, e.g. by adding the line export TMPDIR=${HOME}/tmp. Ensure that no conflicts can occur within the folder(s) if you run multiple jobs on multiple different compute nodes at once; see the sketch below.
Warning
All data owned by your user anywhere on local scratch space on a particular compute node is automatically deleted after the last SLURM job run by your user has finished. Therefore, ensure that output files generated by your jobs are moved to one of the network mount points as the last step of each job; otherwise they will be lost.
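In practice this means the batch script should end by copying results out of /tmp, for example as sketched below; /tmp/work/output and the results directory under your home folder are placeholder paths:

```bash
# Last step of the batch script: copy results from local scratch
# to network storage before the job ends and /tmp is wiped.
mkdir -p "${HOME}/results/${SLURM_JOB_ID}"
cp -r /tmp/work/output "${HOME}/results/${SLURM_JOB_ID}/"   # placeholder paths
```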