Usage reporting

Every month, users receive an automated usage report by email. The report contains only basic but important information, such as total CPU allocation hours and an efficiency summary of all jobs run in the past month. The efficiency summary is essential for minimizing wasted resources, and thereby the average queue time for all users. For more detailed data, use the commands below.

Running jobs

Use sstat to show the status and live usage accounting information of running jobs (sstat only works while a job is running). For jobs submitted with a batch script, append .batch to the job ID, for example:

$ sstat <job_id>.batch

This prints every available metric, so it is usually more practical to select only the most relevant ones, for example:

$ sstat --jobs <job_id>.batch --format=jobid,avecpu,maxrss,ntasks
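Because sstat needs explicit job IDs, checking several running jobs at once means looking the IDs up first. A minimal sketch, assuming all your running jobs were submitted with a batch script (so the .batch step exists; interactive jobs have no .batch step and sstat will report an error for them) and that your Slurm version supports squeue --me. The function name sstat_my_jobs is hypothetical:

```shell
# Hypothetical helper: print live usage for the batch step of each of
# your running jobs. Assumes Slurm's squeue and sstat are on PATH.
sstat_my_jobs() {
  local id
  # -h: no header, -t R: running jobs only, -o %i: print only the job ID
  for id in $(squeue --me -h -t R -o %i); do
    # -n: no header, -P: pipe-separated output that is easy to parse
    sstat -n -P --jobs "${id}.batch" --format=jobid,avecpu,maxrss,ntasks
  done
}
```

Run it as sstat_my_jobs after defining it in your shell (for example in your ~/.bashrc).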

Useful format variables

Variable	Description
avecpu	Average CPU time of all tasks in the job.
averss	Average resident set size (RSS) of all tasks in the job.
avevmsize	Average virtual memory size of all tasks in the job.
jobid	The ID of the job.
maxrss	Maximum resident set size (RSS) of all tasks in the job.
maxvmsize	Maximum virtual memory size of all tasks in the job.
ntasks	Number of tasks in the job.

For all available variables, see the SLURM sstat documentation or run sstat --helpformat.
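When you process sstat or sacct output in a script, memory fields such as MaxRSS usually carry a unit suffix, for example 2048K. A minimal sketch of converting such a value to GiB, assuming 1024-based K/M/G/T suffixes and treating a value without a suffix as bytes (both are assumptions, not guarantees about every Slurm version). The function name rss_to_gib is hypothetical:

```shell
# Hypothetical helper: convert a memory value such as 2048K, 512M or 1.5G
# into GiB. Assumes 1024-based suffixes; a bare number is treated as bytes.
rss_to_gib() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                          # numeric prefix, e.g. 2048 from "2048K"
    s = toupper(substr(v, length(v)))  # last character: the unit suffix
    f = 1 / 1024^3                     # default: plain bytes
    if (s == "K") f = 1 / 1024^2
    else if (s == "M") f = 1 / 1024
    else if (s == "G") f = 1
    else if (s == "T") f = 1024
    printf "%.2f\n", n * f
  }'
}
```

For example, rss_to_gib 1536M prints 1.50.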

Past jobs

To view the status and usage accounting information of past jobs, use sacct. The default output shows only a few columns, while the full output is far too wide to read comfortably in a terminal window, so the command below selects the most essential fields:

$ sacct -o jobid,jobname,start,end,NNodes,NCPUS,ReqMem,CPUTime,AveRSS,MaxRSS --user=$USER --units=G -j 138
JobID           JobName               Start                 End   NNodes      NCPUS     ReqMem    CPUTime     AveRSS     MaxRSS 
------------ ---------- ------------------- ------------------- -------- ---------- ---------- ---------- ---------- ---------- 
138          interacti+ 2023-11-21T10:43:48 2023-11-21T10:43:59        1         16        20G   00:02:56                       
138.interac+ interacti+ 2023-11-21T10:43:48 2023-11-21T10:43:59        1         16              00:02:56          0          0 
138.extern       extern 2023-11-21T10:43:48 2023-11-21T10:43:59        1         16              00:02:56      0.00G      0.00G

Many other fields are available; see the SLURM sacct documentation or run sacct --helpformat. If you really want to see everything, use for example sacct --long | less -S.
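The CPUTime column is elapsed time multiplied by the number of allocated CPUs, and sacct also provides it in seconds as CPUTimeRAW, which makes it easy to total your CPU allocation hours over a period. A minimal sketch, assuming Slurm's sacct is on PATH; the function name cpu_hours_since is hypothetical, and the result is only a rough total because jobs overlapping the start date are counted in full:

```shell
# Hypothetical helper: rough sum of the CPU allocation hours of your jobs
# since a given date (YYYY-MM-DD). Assumes Slurm's sacct is on PATH.
cpu_hours_since() {
  local start="$1"
  # -X: one line per job allocation (skip job steps to avoid double counting)
  # -n -P: no header, pipe-separated; CPUTimeRAW is CPU time in seconds
  sacct -X -n -P --user="$USER" --starttime="$start" --endtime=now \
        --format=CPUTimeRAW |
    awk '{ s += $1 } END { printf "%.1f\n", s / 3600 }'
}
```

For example, cpu_hours_since 2024-10-01 prints the total as CPU hours with one decimal.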

Reservations

The commands below show current reservations in the system, their details, and how much of each reservation's total capacity has been used.

# show current reservations in the system
$ sinfo -T
RESV_NAME       STATE           START_TIME             END_TIME     DURATION  NODELIST
maintenance  INACTIVE  2023-12-18T23:00:00  2023-12-20T01:00:00   1-02:00:00  bio-oscloud[02-09]

# show details about all current reservations
$ scontrol show reservations
ReservationName=amplicon StartTime=2024-11-04T08:00:00 EndTime=2024-11-18T08:00:00 Duration=14-00:00:00
   Nodes=bio-oscloud03 NodeCnt=1 CoreCnt=192 Features=(null) PartitionName=general Flags=
     NodeName=bio-oscloud03 CoreIDs=0-191
   TRES=cpu=192
   Users=abc@bio.aau.dk Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

# show reservation utilization in CPU hours and percent of the reservation total used
$ sreport reservation utilization -t hourper
--------------------------------------------------------------------------------
Reservation Utilization 2024-11-04T00:00:00 - 2024-11-04T23:59:59
Usage reported in TRES Hours/Percentage of Total
--------------------------------------------------------------------------------
  Cluster      Name               Start                 End      TRES Name                     Allocated                          Idle 
--------- --------- ------------------- ------------------- -------------- ----------------------------- ----------------------------- 
 biocloud  amplicon 2024-11-04T08:00:00 2024-11-05T15:18:55            cpu                  1154(19.20%)                  4858(80.80%)
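The Allocated and Idle columns add up to the reservation's capacity in the reported period, that is, reserved cores times elapsed hours. For the amplicon reservation above, 192 cores from 2024-11-04T08:00:00 to 2024-11-05T15:18:55 (31 h 18 min 55 s) gives about 6012 CPU hours, which matches 1154 + 4858. A minimal sketch of that arithmetic, assuming GNU date (for -d) and treating both timestamps as UTC; resv_core_hours is a hypothetical name:

```shell
# Hypothetical helper: total core-hours a reservation provides between two
# timestamps (cores x duration). Uses GNU date; timestamps are read as UTC
# so that daylight-saving changes cannot skew the duration.
resv_core_hours() {
  local cores="$1" start="$2" end="$3" s e
  s=$(date -u -d "$start" +%s)
  e=$(date -u -d "$end" +%s)
  echo $(( cores * (e - s) / 3600 ))   # integer division truncates
}
```

For example, resv_core_hours 192 2024-11-04T08:00:00 2024-11-18T08:00:00 gives the full 14-day capacity of 64512 core-hours.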

Job efficiency summary

Individual jobs

To view the efficiencies of individual jobs use seff, for example:

$ seff 2357
Job ID: 2357
Cluster: biocloud
User/Group: <username>
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 96
CPU Utilized: 60-11:06:29
CPU Efficiency: 45.76% of 132-02:48:00 core-walltime
Job Wall-clock time: 1-09:01:45
Memory Utilized: 383.42 GB
Memory Efficiency: 45.11% of 850.00 GB

This information will also be shown in notification emails when jobs finish.
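The CPU efficiency that seff reports is the CPU time actually used divided by the core-walltime, i.e. allocated cores times wall-clock time. For the job above, 96 cores × 1-09:01:45 gives 132-02:48:00 of core-walltime, of which 60-11:06:29 is 45.76%; memory efficiency is likewise 383.42 GB / 850.00 GB = 45.11%. A minimal sketch reproducing the CPU calculation, assuming durations in Slurm's [D-]HH:MM:SS format (shorter MM:SS values are not handled); the helper names are hypothetical:

```shell
# Hypothetical helper: convert a [D-]HH:MM:SS duration into seconds.
to_seconds() {
  # Split on "-" and ":"; four fields means a leading day count is present.
  echo "$1" | awk -F'[-:]' '{
    if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
    else         print $1 * 3600 + $2 * 60 + $3
  }'
}

# Hypothetical helper: CPU efficiency (%) = CPU time used / (cores x wall-clock) x 100
# Arguments: CPU utilized, cores, wall-clock time.
cpu_efficiency() {
  local used wall
  used=$(to_seconds "$1")
  wall=$(to_seconds "$3")
  awk -v u="$used" -v c="$2" -v w="$wall" \
    'BEGIN { printf "%.2f\n", 100 * u / (c * w) }'
}
```

With the seff values above, cpu_efficiency 60-11:06:29 96 1-09:01:45 prints 45.76.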

Multiple jobs

Perhaps a more useful way to use sacct is through the
