Interactive sessions offer real-time feedback and interaction, allowing you to use interactive tools (e.g. the Python REPL, debuggers), directly access the command line and its output, and modify parameters on the fly. They are useful for iterative development and debugging, experimenting with parameters, real-time monitoring of processes, interactive data analysis, and software installation.
Example Interactive Commands:
# Basic interactive session (2 hours, 1 CPU)
srun --pty bash
# Interactive session with custom resources
srun --time=4:00:00 --mem=8G --cpus-per-task=4 --pty bash
# Interactive Python session
srun --time=2:00:00 --mem=4G --pty python
# Interactive session with GPU
srun --time=8:00:00 --mem=16G --gres=gpu:1 --pty bash
srun will block until resources are available, and will redirect the program's input/output to the executing shell. On some of the clusters, interactive jobs have some limitations compared to normal batch jobs.
If the input/output isn't working correctly (e.g. with shell jobs), adding the --pty flag usually solves the issue.
More info is available in "man srun" or here.
Batch jobs offer well-defined workflows: they run in the background, can be scheduled to run later, and continue even if you disconnect. They automatically log output and errors and offer better resource management. They are best for long-running and/or resource-intensive jobs that don't require user interaction, for production runs, and for rerunning identical jobs.
Example Script (job.sh):
#!/bin/bash
#SBATCH --time=4:00:00
#SBATCH --mem=8G
#SBATCH --cpus-per-task=4
#SBATCH --output=output_%j.log # %j will be replaced with the job ID
#SBATCH --error=error_%j.log
# Your commands here
python my_script.py
Submit with:
sbatch job.sh
sbatch and srun have many of the same parameters. You can read (much) more about the sbatch command here.
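For example (a minimal sketch using only the options shown above), the same resource flags you pass to srun can also be passed to sbatch on the command line, where they take precedence over the corresponding #SBATCH directives in the script:
# Command-line options override the #SBATCH directives inside job.sh
sbatch --time=8:00:00 --mem=16G --cpus-per-task=8 job.sh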
⚠️ Never use a loop in your script to submit multiple sbatch jobs. ⚠️
⚠️ This will overload the Slurm scheduler, impacting submitted jobs for all users in the cluster, and can even cause it to fail. ⚠️
The correct way to submit multiple jobs is to use array jobs, as follows:
Array jobs offer an efficient way to submit many similar jobs: they maximize concurrency of tasks while giving a single job ID for all tasks. They are easy to schedule and monitor, and offer simpler logging and output management, better scheduling efficiency, and better resource management than individual submissions. They are best for parameter sweeps, processing multiple datasets, running the same code with different inputs, cross-validation in machine learning, batch processing of files, Monte Carlo simulations, and more.
Array jobs are the correct way to submit multiple jobs, as explained here.
⚠️ Never use a loop in your script to submit multiple sbatch jobs. This will overload the Slurm scheduler, impacting submitted jobs for all users in the cluster, and can even cause it to fail. ⚠️
Example Array Job:
#!/bin/bash
#SBATCH --time=4:00:00
#SBATCH --mem=4G
#SBATCH --array=1-5 # Run 5 tasks, with IDs 1-5
#SBATCH --output=array_%A_%a.log # %A: array job ID, %a: task ID
# Example of different behaviors based on array task ID
case $SLURM_ARRAY_TASK_ID in
1) python process.py --dataset "data1.csv";;
2) python process.py --dataset "data2.csv";;
3) python process.py --dataset "data3.csv";;
4) python process.py --dataset "data4.csv";;
5) python process.py --dataset "data5.csv";;
esac
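If the inputs follow a regular naming scheme, the case statement isn't needed: the task ID can be mapped to an input directly. A minimal sketch, assuming a (hypothetical) plain-text file inputs.txt with one dataset path per line:
#!/bin/bash
#SBATCH --time=4:00:00
#SBATCH --mem=4G
#SBATCH --array=1-5%2 # 5 tasks, at most 2 running at once
#SBATCH --output=array_%A_%a.log
# Pick the line of inputs.txt that matches this task's ID
DATASET=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)
python process.py --dataset "$DATASET"
The %2 throttle in --array caps how many array tasks run simultaneously, which helps keep a large array from monopolizing the cluster.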