Some of your work in the cluster that isn't computation might take a long time or require a lot of CPU and memory resources.
For example: copying, moving or deleting large amounts of data, downloading datasets, or installing software.
This work should be done as a Slurm job and not on the login node.
The login nodes have very few resources, and there are limits on the CPU time and memory that each user can use.
As with every job that you run in the cluster, you have a choice of working interactively with the srun command or in batch mode with the sbatch command.
Before starting an interactive job you can create a detachable session with one of the terminal multiplexers that are installed in the cluster: tmux and screen.
We don't have a wiki on how to use them, so you can Google to find a tutorial and go over these cheat sheets for tmux and screen.
Start a new named session with tmux:
tmux new -s <session-name>
The <session-name> can be any string you select.
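For example (the session name here is arbitrary, pick any name that describes your work):
tmux new -s data-copy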
Start a new named session with screen:
screen -S <session-name>
Once the session is created, you can start your interactive job.
If you want, you can also detach from the session by pressing these key combinations:
For tmux: Ctrl+b then d
For screen: Ctrl+a then d
You can then continue working in the original terminal session.
To return to the session to check on the progress, you can first list running sessions and then connect to one of them.
For tmux:
tmux ls
tmux a -t <session-name>
And for screen:
screen -ls
screen -r <session-number>
Once you finish using the session for your work, please make sure to kill it:
In tmux: tmux kill-session -t <session-name>
And in screen: screen -X -S <session-name> kill
The simplest way to work in an interactive session on a compute node is to open a terminal session as a Slurm job. This is quick and easy for simple processes, but for optimal resource management, allocating resources in advance using salloc and submitting specific tasks using srun (just as in batch mode) is the preferred method. A sketch of this workflow is shown below; see our full article for more details.
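A minimal sketch of the allocate-first workflow (the resource values and paths are placeholders, adjust them to your needs):
salloc -n 1 --mem=2G --time=1-0
# Inside the allocation, run individual commands as job steps with srun, for example:
srun cp -rv /path/to/source /path/to/destination
# Or open an interactive shell on the allocated node:
srun --pty $SHELL
# When you are done, release the allocation:
exit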
For now, simply opening a terminal session with predefined resources can be easily done with the following command:
srun -n 1 --mem=2G --time=1-0 --pty $SHELL
The srun command identifies that no resources are allocated yet, and will thus automatically request allocation for the resources specified by the arguments:
-n 1 means 1 task, and since no cpus-per-task argument was provided, this also implies allocating a single CPU core.
--mem=2G means that the job has a global limit of 2G of memory. This memory will be shared between all job steps/tasks performed within the job.
--time=1-0 sets a time limit of 1 day.
--pty means that the first task is performed in a pseudo-terminal. Usually the first task is then a shell environment, meaning that the actual job will be an interactive terminal session!
In this case, $SHELL can be any shell environment available on HURCS machines (bash, csh, zsh, etc.). Apart from the above mentioned arguments, any other valid srun argument may be used.
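For example, a session with more resources might look like this (the values and the job name are illustrative only):
srun -c 4 --mem=8G --time=3:00:00 --job-name=data-copy --pty $SHELL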
Now that you have a shell on one of the compute nodes, you can run your data management, download, or software installation commands.
Tip 1:
Whenever you can, use the verbose option in the command you run (usually -v) to see progress and to verify that the command is actually running and doing something.
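For example, copying with the verbose flag so you can follow the progress (the paths here are placeholders):
cp -rv /path/to/source /path/to/destination
rsync -av /path/to/source/ /path/to/destination/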
Tip 2:
You can also use rsync to delete files and folders. It can be faster than the rm command when you have to delete a directory with a large number of files.
First create an empty folder in the cluster:
mkdir /path/to/empty-dir
Then run this command (note the trailing slashes, so that the contents of the target directory are replaced by the contents of the empty one):
rsync -av --delete /path/to/empty-dir/ /path/of/dir/to/delete/
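After the sync, both directories are empty and can be removed quickly (the paths are the same placeholders as above):
rmdir /path/to/empty-dir /path/of/dir/to/delete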
In batch mode you can create a script with the commands you need to run on your data.
#!/bin/bash
#SBATCH -n 1
#SBATCH --mem=2G
#SBATCH --time=1-0
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>
## Copy/Move/Delete command under this line
If you use the mail-type and mail-user options, you will be notified by email when your job starts and ends.
See the full article for more details on running in batch mode!
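Assuming the script above is saved as, say, copy_data.sh (the filename is arbitrary), you can submit it and follow its progress with:
sbatch copy_data.sh
squeue -u $USER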