
Giang Nguyen
09 Jan 2026
In large-scale data analysis, a job queue on a distributed system is essential. Users need a mechanism to access cluster resources and request the CPUs, RAM, and GPUs their tasks require. If a job uses more memory than it requested, the simplest solution is to kill it; this prevents out-of-memory situations in which the whole machine freezes. Slurm is one of the most widely used workload managers for such clusters, but learning how to build a Slurm cluster is not easy. That is why I created this blog series: to guide you through setting up Slurm on a single machine first, then scaling it out and using it effectively and efficiently.
This is Part 1 of a 3-part series where we'll build a complete Slurm cluster from scratch. In this first post, we'll cover the fundamentals by setting up a single-node Slurm cluster and understanding the core concepts.
When it comes to job scheduling in HPC environments, several options exist, including PBS, Grid Engine, and IBM's LSF. However, Slurm (Simple Linux Utility for Resource Management) stands out for several compelling reasons:
Figure 1: The standard architecture of a multi-node Slurm cluster
The main function of Slurm, or of any cluster manager, is to connect computing machines that have large numbers of CPUs, large amounts of memory, and GPUs. It provides a management system where users request computing resources (for example, 16 cores and 32GB of RAM); it then finds machines with those resources available and allocates them to the user. How can we design such a system at a high level?
We can divide the machines into three types: controller nodes, login nodes, and compute nodes:
Controller nodes: They receive requests from users, allocate resources, and manage the cluster's resources. It is also good practice to connect them to an SQL database that stores accounting information, which helps track who ran which jobs and how those jobs used the computing resources.
Login nodes: They act as the gateway and are usually reachable from the public network. Users SSH into a login node and request compute resources from there. The request goes to the controller, which decides whether enough computing resources are available and then either allocates them or asks the user to wait. Normally, users cannot log in to or run anything on the compute nodes, where the large resources actually reside, without the controller's permission.
Compute nodes: They hold most of the CPUs, memory, and GPUs, and connect to the controller to wait for resources to be allocated to jobs.
From the previous section and Figure 1, we can now identify the services (software) that connect the cluster together and make it work properly:
slurmctld (Controller Daemon): The brain of the cluster, running on the controller node. It handles job scheduling, resource tracking, and communicates with compute nodes.
slurmd (Node Daemon): Runs on compute nodes to execute jobs and report status back to the controller.
slurmdbd (Database Daemon): Optional but recommended for storing job accounting data, resource usage tracking, and fair-share scheduling.
| Node Type | Services | Purpose |
|---|---|---|
| Login | Slurm clients | User access point for job submission |
| Controller | slurmctld | Manages job scheduling and resources |
| Compute | slurmd | Executes submitted jobs |
| Database | slurmdbd, MySQL/MariaDB | Stores accounting data |
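On a multi-node cluster each machine runs only the daemons for its role, while the single-node setup below runs all of them on one machine. As a rough way to see which roles a machine currently plays, here is a minimal bash sketch that looks for the daemons from the table above by scanning /proc, so it works even where pgrep is not installed. The function names daemon_running and slurm_roles are hypothetical helpers, not part of Slurm.

```bash
# Hypothetical helper: succeed if a process with exactly this command name
# is running. On Linux, /proc/<pid>/comm holds each process's command name.
daemon_running() {
  local name=$1 comm f
  for f in /proc/[0-9]*/comm; do
    # A process may exit while we loop; skip files that can no longer be read.
    read -r comm 2>/dev/null < "$f" || continue
    [[ "$comm" == "$name" ]] && return 0
  done
  return 1
}

# Hypothetical helper: report which Slurm daemons run on this machine.
slurm_roles() {
  local d
  for d in slurmctld slurmd slurmdbd; do
    if daemon_running "$d"; then
      echo "$d: running"
    else
      echo "$d: not running"
    fi
  done
}

slurm_roles
```

Before installation all three lines report "not running"; after the steps below, slurmctld and slurmd (and later slurmdbd) should show up on the same machine.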
Figure 2: The Single Node Slurm Cluster Architecture
Starting with a single-node setup helps you understand how Slurm works before scaling up. This approach is ideal for learning and local development, and it is also useful on a personal workstation, where Slurm can handle resource allocation for your own jobs. As shown in Figure 2, we will now install every component on a single machine.
To set up the single-node Slurm cluster manually, it is better to use a virtual machine (for example, with Vagrant) or a Docker container instead of your own computer.
The steps below assume Ubuntu 20.04 (for Docker, the ubuntu:20.04 image) and that you can use sudo to run commands.
First, install the required Slurm components:
sudo apt-get update -y && sudo apt-get install -y slurmd slurmctld
Verify the installation:
# Locate slurmd and slurmctld
which slurmd
# Output: /usr/sbin/slurmd
which slurmctld
# Output: /usr/sbin/slurmctld
The slurm.conf file is the heart of your Slurm configuration. This file must be identical across all nodes in a cluster (but for now, we just have one node).
Create your slurm.conf:
cat <<EOF > slurm.conf
# slurm.conf for a single-node Slurm cluster
ClusterName=localcluster
SlurmctldHost=localhost
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
# ACCOUNTING (not enabled yet)
AccountingStorageType=accounting_storage/none
JobAcctGatherType=jobacct_gather/none
JobAcctGatherFrequency=30
# LOGGING
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
# COMPUTE NODES (adjust CPUs and RealMemory to match your system)
NodeName=localhost CPUs=2 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1024 State=UNKNOWN
# PARTITION CONFIGURATION
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
sudo mv slurm.conf /etc/slurm-llnl/slurm.conf
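The NodeName line must not claim more CPUs or memory than the machine actually has; if RealMemory is set too high, for example, the node typically ends up drained instead of accepting jobs. Once slurmd is installed, the simplest check is slurmd -C, which prints the hardware configuration slurmd detects in slurm.conf syntax. As a rough cross-check without Slurm, the bash sketch below derives the two values the comment asks you to adjust. The helper name node_resources and the 5% memory headroom are assumptions for this sketch, not Slurm requirements.

```bash
# Hypothetical helper: suggest CPUs and RealMemory values for slurm.conf.
# RealMemory is in megabytes; /proc/meminfo reports MemTotal in kB.
node_resources() {
  local cpus mem_kb mem_mb
  cpus=$(nproc)
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  mem_mb=$(( mem_kb / 1024 ))
  # Assumption: keep ~5% of RAM back for the OS and the Slurm daemons.
  echo "CPUs=${cpus} RealMemory=$(( mem_mb * 95 / 100 ))"
}

node_resources
```

Keep Sockets × CoresPerSocket × ThreadsPerCore equal to CPUs when you edit the line; lscpu shows the socket, core, and thread counts.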
Start the Slurm daemons:
# Start slurmd (compute daemon)
sudo service slurmd start
sudo service slurmd status
# Start slurmctld (controller daemon)
sudo service slurmctld start
sudo service slurmctld status
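Once both daemons are running, sinfo should list the localhost node in the LocalQ partition in the idle state. A node reported as down or drained usually means slurmd could not register properly, often because the NodeName line does not match the hardware; sinfo -R shows the reason, and the log files configured in slurm.conf show the details. The bash sketch below (check_nodes is a hypothetical helper) flags any node that is not in a usable state, reading the output of sinfo -h -N -o "%N %T", which prints one "NODE STATE" line per node.

```bash
# Hypothetical helper: read "NODE STATE" lines (as printed by
# sinfo -h -N -o "%N %T") on stdin and warn about unusable nodes.
# Returns 1 if any node is not idle, mixed, allocated, or completing.
check_nodes() {
  local node state bad=0
  while read -r node state; do
    [[ -n "$node" ]] || continue
    case "$state" in
      *\*)   # a trailing * marks a node that is not responding
        echo "WARN  $node ($state): not responding"
        bad=1 ;;
      idle*|mixed*|allocated*|completing*)
        echo "OK    $node ($state)" ;;
      *)
        echo "WARN  $node ($state): check sinfo -R and the slurmd/slurmctld logs"
        bad=1 ;;
    esac
  done
  return "$bad"
}

# Usage on the Slurm machine:
#   sinfo -h -N -o "%N %T" | check_nodes
```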


Test your setup by submitting a simple interactive job:
srun --mem 500MB -c 1 --pty bash
# Check job details
squeue -o "%i %P %u %T %M %l %D %C %m %R %Z %N" | column -t
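The same request can also be submitted non-interactively with a batch script. Below is a minimal sketch, assuming you save it as hello.sh (a hypothetical name) and submit it with sbatch hello.sh. Lines starting with #SBATCH are read by sbatch as options but are ordinary comments to bash, so the script also runs as plain bash outside Slurm, where the SLURM_* variables fall back to placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=hello-slurm
#SBATCH --cpus-per-task=1
#SBATCH --mem=500M
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

# Print where and with what resources the job runs. Slurm sets the SLURM_*
# variables inside a job; outside Slurm the fallbacks after ":-" are used.
report() {
  echo "Job ID:        ${SLURM_JOB_ID:-none}"
  echo "Node:          $(uname -n)"
  echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-unset}"
  echo "Memory (MB):   ${SLURM_MEM_PER_NODE:-unset}"
}

report
```

While the job runs, it appears in squeue; its output is written to hello-<jobid>.out in the directory you submitted from (%j expands to the job ID).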
Without proper cgroup configuration, jobs can exceed their allocated resources, potentially causing system instability or crashes. The job scheduler will accept resource limits, but won't actually enforce them.
Let's test this problem first. Submit a job requesting 500MB and try to allocate much more:
srun --mem 500MB -c 1 --pty bash
# Allocate memory in 100MB chunks, well past the 500MB limit (press Ctrl+C to stop if nothing kills the job)
declare -a mem
i=0
while :; do
mem[$i]=$(head -c 100M </dev/zero | tr '\000' 'x')
((i++))
echo "Allocated: $((i * 100)) MB"
done
Before the job is submitted, the machine's memory usage is below 200MB.
After the loop has allocated 1GB, the job is still not killed, because control groups (cgroups) are not configured yet.
The job keeps running well past its 500MB request, and that is exactly the problem.
Now let's fix it with cgroups:
cat <<EOF > cgroup.conf
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
ConstrainSwapSpace=yes
MaxSwapPercent=5
MemorySwappiness=0
EOF
sudo mv cgroup.conf /etc/slurm-llnl/cgroup.conf
Update slurm.conf to use cgroup plugins:
sudo sed -i -e "s|ProctrackType=proctrack/linuxproc|ProctrackType=proctrack/cgroup|" \
-e "s|TaskPlugin=task/none|TaskPlugin=task/cgroup|" /etc/slurm-llnl/slurm.conf
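Before rebooting, it is worth confirming that the sed commands really changed both settings, because sed exits successfully even when its pattern does not match. The bash sketch below (conf_value is a hypothetical helper) prints the value of one KEY=VALUE parameter from a slurm.conf-style file, skipping comment lines and matching the key case-insensitively, since Slurm treats parameter names as case-insensitive.

```bash
# Hypothetical helper: print the value of KEY from a KEY=VALUE config file
# such as slurm.conf. Comment lines are skipped and key matching ignores case.
conf_value() {
  local key=$1 file=$2
  awk -v k="$key" '
    /^[ \t]*#/ { next }                     # skip comment lines
    index($0, "=") {
      name = $0
      sub(/=.*/, "", name)                  # keep the text before the first =
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
      if (tolower(name) == tolower(k)) {
        sub(/^[^=]*=/, "")                  # strip "KEY=" from the line
        print
        exit
      }
    }
  ' "$file"
}

# Usage:
#   conf_value ProctrackType /etc/slurm-llnl/slurm.conf   # expect proctrack/cgroup
#   conf_value TaskPlugin    /etc/slurm-llnl/slurm.conf   # expect task/cgroup
```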
Enable cgroup in GRUB and reboot:
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1 /' /etc/default/grub
sudo update-grub
sudo reboot
After reboot, restart Slurm services:
sudo service slurmctld restart
sudo service slurmd restart
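After the reboot, you can check that the kernel options took effect. The bash sketch below (check_cgroup_kernel is a hypothetical helper) looks for the two GRUB options on the running kernel's command line in /proc/cmdline and reads /proc/cgroups, whose fourth column ("enabled") is 1 when the memory controller is active.

```bash
# Hypothetical helper: report whether the GRUB options are on the running
# kernel's command line and whether the memory cgroup controller is enabled.
check_cgroup_kernel() {
  local opt
  for opt in cgroup_enable=memory swapaccount=1; do
    if grep -qw -- "$opt" /proc/cmdline 2>/dev/null; then
      echo "$opt: present"
    else
      echo "$opt: missing"
    fi
  done
  # /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled
  if awk '$1 == "memory" && $4 == 1 { found = 1 } END { exit !found }' /proc/cgroups 2>/dev/null; then
    echo "memory controller: enabled"
  else
    echo "memory controller: disabled or unavailable"
  fi
}

check_cgroup_kernel
```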
Now test again with the same memory allocation script - this time, the job will be killed when it exceeds the limit!
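To see the limit Slurm applied, you can inspect the memory cgroup of a shell running inside the job (for example, one started with srun --mem 500MB -c 1 --pty bash). The bash sketch below (cgroup_mem_limit is a hypothetical helper) finds the current process's memory cgroup in /proc/self/cgroup and prints its limit in bytes, for both cgroup v1 (memory.limit_in_bytes, the layout Ubuntu 20.04 uses by default) and cgroup v2 (memory.max). Assuming the default AllowedRAMSpace of 100%, a 500MB job should show about 524288000 bytes (500 * 1024 * 1024); outside a job it prints whatever limit the enclosing cgroup has, or "unknown".

```bash
# Hypothetical helper: print the memory limit (in bytes) of the cgroup that
# the current process belongs to, or "unknown" if it cannot be determined.
cgroup_mem_limit() {
  local line path file
  # cgroup v1: a line such as "4:memory:/slurm/uid_1000/job_5/step_0"
  line=$(grep -E '^[0-9]+:([^:]*,)?memory(,[^:]*)?:' /proc/self/cgroup 2>/dev/null)
  if [[ -n "$line" ]]; then
    path=${line#*:*:}
    file="/sys/fs/cgroup/memory${path}/memory.limit_in_bytes"
  else
    # cgroup v2 (unified hierarchy): a single line such as "0::/user.slice/..."
    line=$(grep '^0::' /proc/self/cgroup 2>/dev/null)
    path=${line#0::}
    file="/sys/fs/cgroup${path}/memory.max"
  fi
  if [[ -n "$line" && -r "$file" ]]; then
    cat "$file"
  else
    echo "unknown"
  fi
}

cgroup_mem_limit
```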

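Job accounting is essential for tracking who ran which jobs, how much of the cluster's resources each job used, and fair-share scheduling.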

Install the required packages:
sudo apt-get install slurmdbd mariadb-server -y
Create the database and user:
sudo service mysql start
sudo mysql -e "CREATE DATABASE slurm_acct_db;"
sudo mysql -e "CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'slurm';"
sudo mysql -e "GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost';"
sudo mysql -e "FLUSH PRIVILEGES;"
Verify the database was created:
sudo mysql -e "SHOW DATABASES;"
sudo mysql -e "SELECT User, Host FROM mysql.user;"

Configure slurmdbd:
cat <<EOF > slurmdbd.conf
PidFile=/run/slurmdbd.pid
LogFile=/var/log/slurm-llnl/slurmdbd.log
DebugLevel=error
DbdHost=localhost
DbdPort=6819
# DB connection data
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=slurm
StorageLoc=slurm_acct_db
SlurmUser=slurm
EOF
sudo mv slurmdbd.conf /etc/slurm-llnl/slurmdbd.conf
sudo service slurmdbd start
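slurmdbd.conf stores the database password (StoragePass) in plain text, so it should be readable only by the slurm user; newer Slurm releases may refuse to start slurmdbd when the file is accessible to other users. The bash sketch below (check_private is a hypothetical helper; stat -c is the GNU coreutils syntax) warns when a file grants any permission to group or others.

```bash
# Hypothetical helper: warn if a file is accessible to group or others.
# Returns 0 when the mode has no group/other bits (e.g. 600 or 400), else 1.
check_private() {
  local file=$1 mode
  mode=$(stat -c '%a' "$file") || return 2
  # Mask the octal mode with 077: any remaining bit is a group/other permission.
  if (( (8#$mode & 8#077) == 0 )); then
    echo "ok: $file has mode $mode"
  else
    echo "warning: $file has mode $mode (group/other can access it)"
    return 1
  fi
}

# Usage, and the fix if it warns:
#   check_private /etc/slurm-llnl/slurmdbd.conf
#   sudo chown slurm:slurm /etc/slurm-llnl/slurmdbd.conf
#   sudo chmod 600 /etc/slurm-llnl/slurmdbd.conf
#   sudo service slurmdbd restart
```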
Update slurm.conf to enable accounting:
sudo sed -i -e "s|AccountingStorageType=accounting_storage/none|AccountingStorageType=accounting_storage/slurmdbd\nAccountingStorageEnforce=associations,limits,qos\nAccountingStorageHost=localhost\nAccountingStoragePort=6819|" /etc/slurm-llnl/slurm.conf
sudo sed -i -e "s|JobAcctGatherType=jobacct_gather/none|JobAcctGatherType=jobacct_gather/cgroup|" /etc/slurm-llnl/slurm.conf
sudo systemctl restart slurmctld slurmd
Add your cluster and user to accounting:
# Add cluster
sudo sacctmgr -i add cluster localcluster
# Add account for your user
sudo sacctmgr -i add account $USER Cluster=localcluster
# Add your user to the account
sudo sacctmgr -i add user $USER account=$USER DefaultAccount=$USER
sudo systemctl restart slurmctld slurmd
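Because slurm.conf now sets AccountingStorageEnforce=associations, jobs from a user without an association are rejected. You can list associations with sacctmgr -n -P show assoc format=Cluster,Account,User, which prints one pipe-separated line per association without a header. The bash sketch below (has_association is a hypothetical helper) reads that output on stdin and checks that a given user has an association on a given cluster.

```bash
# Hypothetical helper: read "Cluster|Account|User" lines (from
# sacctmgr -n -P show assoc format=Cluster,Account,User) on stdin and
# succeed if USER has an association on CLUSTER.
has_association() {
  local want_user=$1 want_cluster=$2
  awk -F'|' -v u="$want_user" -v c="$want_cluster" '
    $1 == c && $3 == u { found = 1 }
    END { exit !found }
  '
}

# Usage on the Slurm machine:
#   sacctmgr -n -P show assoc format=Cluster,Account,User \
#     | has_association "$USER" localcluster && echo "association OK"
```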

Now test accounting by submitting a job and viewing its details:
# Submit a test job
srun --mem 500MB -c 1 hostname
# View accounting information
sacct
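By default, sacct shows only a few columns. To compare what a job requested with what it actually used, you can choose fields with --format, for example sacct --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State. MaxRSS is usually reported with a unit suffix such as K or M. The bash sketch below (rss_to_mb is a hypothetical helper) converts such values to MB so they can be compared with the --mem request; it assumes the suffixes are powers of 1024 and treats a bare number as MB, which is how Slurm interprets a --mem value without a suffix.

```bash
# Hypothetical helper: convert a Slurm-style size ("2048K", "1.5G", "500M",
# or a bare number taken as MB) to whole megabytes, rounded to nearest.
rss_to_mb() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                                # numeric prefix, e.g. 1.5 from "1.5G"
    u = toupper(substr(v, length(v), 1))     # last character = unit suffix
    if (u == "K")      n = n / 1024
    else if (u == "G") n = n * 1024
    else if (u == "T") n = n * 1024 * 1024
    printf "%.0f\n", n
  }'
}

# Usage on the Slurm machine (MaxRSS is the 2nd field):
#   sacct -n -P --format=JobID,MaxRSS | while IFS='|' read -r id rss; do
#     [[ -n "$rss" ]] && echo "$id: $(rss_to_mb "$rss") MB"
#   done
```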

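In this first part of our series, we've covered the core Slurm architecture (login, controller, and compute nodes, plus the accounting database), how to install slurmctld and slurmd on a single node, how to enforce resource limits with cgroups, and how to enable job accounting with slurmdbd and MariaDB.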
What's Next? In Part 2, we'll take this knowledge and scale to a multi-node production cluster using Ansible automation. We'll add monitoring with Grafana, alerting via Slack, and shared storage with NFS.
This is Part 1 of the RiverXData series on building Slurm HPC clusters. Continue to Part 2 to learn about production deployment with Ansible.