workflow on jobs
Note: Since users' computation tasks usually depend on other packages and software, it is better to read the module section first. That section explains how to load the software you want to use or link against. If you do not load anything, the system provides only a very limited set of default software, such as gcc and openmpi, which in general cannot meet your research needs.
In the following, you will be introduced to the general workflow of using the cluster. We assume that users have adequate knowledge of compiling source code, linking shared libraries, running executables, and MPI programming on a single Linux node. If not, please learn the basics of Linux-based programming first.
connecting
This part includes SSH access to the HPC's CLI as well as file transfers to upload source code and download data files. Please see the connecting section for these tasks.
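For example, a rough sketch of these steps from your local machine (the user name and hostname are placeholders; the real ones are given in the connecting section):

```bash
ssh <user>@<hpc-login-node>                    # open a shell on the master (login) node
scp mycode.tar.gz <user>@<hpc-login-node>:~/   # upload source code
scp <user>@<hpc-login-node>:~/results.dat .    # download a result file
```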
compile jobs
Use the master node to edit source code and compile your binaries. You may want to spack load appropriate packages in this step. For example, if your code depends on MKL, you must first spack load intel-parallel-studio%intel.
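For instance, a minimal environment-setup sketch on the master node (the package spec is the one above; the exact binaries it provides are assumptions):

```bash
# Load the Intel package before compiling; the spec may differ on your system.
spack load intel-parallel-studio%intel
# Check that the Intel compilers and MPI wrappers are now on PATH (names assumed).
which icc ifort mpiicc
```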
Some tips on compiling source code:
Since you may need to recompile projects again and again, we strongly recommend writing your own Makefile to speed up builds and reduce errors.
When Intel MKL is used in the code, explicitly or implicitly via dependencies, we strongly recommend the Intel compilers, which avoid the tedious and fragile linking flags required by gcc (see the sketch below).
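As a hedged illustration of the second tip, compiling an MKL-dependent program with the classic Intel compiler from intel-parallel-studio can be as simple as the line below (the file name is hypothetical; newer oneAPI compilers spell the flag -qmkl):

```bash
# One flag pulls in MKL; with gcc you would have to hand-write the full,
# fragile MKL link line instead.
icc -O2 -mkl=sequential -o solver solver.c
```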
submit jobs
Warning: Login nodes [master] are in general not for running computation-intensive jobs. Since the cluster has very limited computation nodes, it is acceptable to run some tasks on the master node as long as they do not affect others, but always be careful when you decide to do so; treat it as a last resort and use the computation nodes first whenever they are available.
Before arranging your jobs on the cluster, you should first learn the hardware specs (CPU and memory in particular) of each node to better utilize the computation resources. See hardware specs here. As a quick summary, every node in the HPC available to normal users has 56 CPU threads and at least 128G of memory. You can also use sinfo -Nel to check node info and availability.
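For example (c3 is one of the compute nodes referred to later on this page):

```bash
sinfo -Nel              # one line per node: partition, CPUs, memory, state
scontrol show node c3   # detailed specs and current load of a single node
```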
On our cluster, SLURM is used as the resource and job manager. Ideally, all jobs should be submitted via SLURM.
sbatch basics
The most common way to submit jobs is by sbatch. First you need to create a bash script, for example run.sh.
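Since the explanation below walks through the script line by line, here is a minimal sketch of such a run.sh, assuming the intel-parallel-studio spec from the compile section and a placeholder program name:

```bash
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 56
source /etc/spack-load
spack load intel-parallel-studio%intel
mpiexec -n 56 ./your_mpi_program
```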
Let's read the sbatch script line by line.
The first line is just the shebang, indicating the interpreter of the script. You don't need to change it most of the time.
The second and third lines begin with #SBATCH, which marks parameters of sbatch; there are more parameters, and they can be written down line by line. -N is the must-have one: it indicates how many nodes are required for this task. Besides, -n also needs to be set to run MPI without errors (such as not having enough slots). Please man sbatch for other parameters. Some more parameters on compute resource requests are recommended, such as --cpus-per-task, --ntasks, and --mem-per-cpu.
The fourth line activates spack; it is required whenever you want to load modules. Since it is so common, it is better to always keep this line near the beginning of the script: source /etc/spack-load.
The fifth line loads the intel module, which includes MKL, ICC, IFORT, MPI, and so on.
Finally, the last line launches the MPI task with 56 processes via mpiexec, matching the 56 CPU threads of a node.
Submit the task with sbatch run.sh.
Warning:
Don't use srun inside sbatch scripts for MPI tasks; use mpiexec directly instead. Since SLURM on our cluster has no support for the pmix plugin, srun does not work as an alias of mpiexec as expected.
It is better not to submit jobs with srun directly, since such jobs are killed as soon as the SSH connection drops.
view jobs
Check the status of the task with squeue, or squeue -o %all if you really enjoy lots of info. It is better to use squeue -u <user> to view only your own jobs. If the job's ST is PD (pending) and the reason is QOSResourceLimit, the total amount of resources requested by your jobs exceeds the limit for normal users. If the reason is Resources instead, the cluster has no more compute resources for your job, i.e. the cluster is more or less fully loaded.
You can check the stdout of the job in slurm-<jobid>.out. But important output is better written to a well-formatted output data file.
You can cancel the task with scancel; the task id can be obtained from squeue.
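Put together, a typical monitoring session looks like this (the job id is whatever squeue reports):

```bash
squeue -u $USER    # list only your jobs: JOBID, partition, state (ST), reason
scancel <jobid>    # cancel a job using the JOBID shown by squeue
```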
job arrays
Use the --array option of sbatch together with the environment variable SLURM_ARRAY_TASK_ID. Note that in such a script, both -N and -n are specified per task of the array, not for the array as a whole.
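A minimal array-script sketch, assuming each task runs the same placeholder program on its own input file:

```bash
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --array=0-9            # ten independent tasks, indices 0..9
source /etc/spack-load
# Each array task gets its own copy of the -N/-n request above and sees its
# index in SLURM_ARRAY_TASK_ID.
./your_program input_${SLURM_ARRAY_TASK_ID}.dat
```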
more options for sbatch and srun
specify partition: -p gpu/bigmem (default is general)
specify account: -A <account>
specify qos: -q <qos>
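For example, combining these on the command line (account and qos names are placeholders):

```bash
sbatch -p bigmem -A <account> -q <qos> run.sh   # submit to the bigmem partition
sbatch -p gpu run.sh                            # or to the gpu partition; default is general
```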
interactive sessions on compute nodes
1. srun -N 1 -n 1 -w c3 --pty bash -i: run a bash shell on node c3 with one node and one CPU core.
2. salloc -n2 -N1 -t 1:00:00, and then ssh to the assigned node: ssh [-X] cn.
Warning: If you try to ssh to a compute node without an active job allocation, the connection will be denied with a connection-closed prompt. So the correct way to access a compute node is method 2 above: first salloc the resources and then ssh to the assigned node.
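A complete interactive workflow following method 2 might look like this sketch (c3 stands for whichever node is actually assigned):

```bash
salloc -N1 -n2 -t 1:00:00   # request 2 CPU cores on one node for one hour
squeue -u $USER             # see which node (e.g. c3) was allocated
ssh -X c3                   # ssh to the assigned node; -X forwards X11 if needed
# ... work interactively ...
exit                        # leave the compute node
exit                        # leave the salloc shell to release the allocation
```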