Removed GPUs from CPUs partition

2019-07-01 18:05:08 +02:00
parent 5c2ea17076
commit 80be5a4b78


@ -8,24 +8,7 @@ sidebar: merlin6_sidebar
permalink: /merlin6/slurm-configuration.html
---
## Using the Slurm batch system
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm. In the same way, **Merlin6** has also been configured with this batch system.
Slurm is installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated in the same batch system.
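As a minimal sketch of what the multi-cluster setup means in practice (assuming the standard ``sinfo`` client is available on the login nodes), the clusters and their partitions can be listed as follows:

```bash
# List partitions of the default cluster (assumed here to be merlin6):
sinfo

# List partitions of both clusters explicitly; -M/--clusters accepts a
# comma-separated list of cluster names known to the multi-cluster setup.
sinfo --clusters=merlin5,merlin6
```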
To understand the Slurm configuration of the cluster, it can sometimes be useful to check the following files (an example of how to inspect them is shown below):
* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
The configuration files listed above, as found on the login nodes, correspond exclusively to the **merlin6** cluster.
Configuration files for the old **merlin5** cluster must be checked directly on any of the **merlin5** computing nodes: they are not propagated
to the **merlin6** login nodes.
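A short sketch of inspecting this configuration from a login node; the file paths are the ones listed above, and ``scontrol show config`` is a standard Slurm command that prints the configuration of the cluster currently in use:

```bash
# Read the cluster-wide configuration propagated to the login nodes:
less /etc/slurm/slurm.conf

# cgroup and GRES configuration (read-only copies on login/computing nodes):
less /etc/slurm/cgroup.conf
less /etc/slurm/gres.conf

# Alternatively, query the running configuration without opening the files:
scontrol show config | less
```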
### About Merlin5 & Merlin6
## About Merlin5 & Merlin6
The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* cluster will be kept for some time and has been renamed to **merlin5**.
This allows jobs to keep running on the old computing nodes until users have fully migrated their code to the new cluster.
@ -35,11 +18,11 @@ the old *merlin5* computing nodes by using the option ``--cluster=merlin5``.
This documentation explains only the usage of the **merlin6** Slurm cluster.
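As a hedged illustration of selecting a cluster at submission time (the job script name ``myjob.batch`` is hypothetical), the ``--clusters`` option of ``sbatch`` and ``squeue`` can be used:

```bash
# Submit to the default (merlin6) cluster:
sbatch myjob.batch

# Keep running jobs on the old merlin5 computing nodes during the migration:
sbatch --clusters=merlin5 myjob.batch

# Check the queue of a specific cluster:
squeue --clusters=merlin5 -u $USER
```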
### Using Slurm 'merlin6' cluster
## Using Slurm 'merlin6' cluster
Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please refer to the following document: [LINK TO SLURM ADVANCED CONFIG]()
#### Merlin6 Node definition
### Merlin6 Node definition
The following table shows the default and maximum resources that can be used per node:
@ -56,7 +39,7 @@ and maximum memory allowed is ``Max.Mem/Node``.
In *Merlin6*, memory is treated as a consumable resource, as is the CPU.
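Since both CPU and memory are consumable resources, a job script can request them explicitly. The following is a sketch only; the script name, task counts, memory and time values are illustrative and not site defaults:

```bash
#!/bin/bash
#SBATCH --job-name=test-resources
#SBATCH --ntasks=4             # number of tasks (one CPU allocated per task here)
#SBATCH --cpus-per-task=1      # CPUs per task
#SBATCH --mem-per-cpu=4000     # memory per allocated CPU, in MB
#SBATCH --time=01:00:00        # walltime limit

srun hostname
```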
#### Merlin6 Slurm partitions
### Merlin6 Slurm partitions
A partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:
@ -71,7 +54,7 @@ General is the *default*, so when nothing is specified the job will be assigned
running jobs. For **daily** this limitation is extended to 60 nodes, while for **hourly** there are no limits. Shorter jobs have higher priority than
longer jobs, and hence will in general be scheduled earlier (however, other factors such as the user's fair share value can affect this decision).
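A hedged example of selecting one of the partitions described above at submission time (the job script name is hypothetical):

```bash
# Default: no partition given, the job goes to 'general'
sbatch myjob.batch

# Shorter jobs can target the less restricted, higher-priority partitions:
sbatch --partition=daily  myjob.batch
sbatch --partition=hourly myjob.batch

# The same can be set inside the job script with:
#   #SBATCH --partition=hourly
```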
#### Merlin6 User limits
### Merlin6 User limits
By default, users cannot use more than 704 cores at the same time (Max CPU per user). This is equivalent to 8 exclusive nodes.
This limit applies to the **general** and **daily** partitions. For the **hourly** partition there is no such restriction, and user limits are removed.
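As a sketch for checking such limits (assuming they are enforced through QOS or association settings, which is not stated explicitly on this page), the standard Slurm accounting commands can be queried from a login node:

```bash
# Show per-user TRES limits defined at the QOS level (e.g. cpu=704):
sacctmgr show qos format=Name%20,MaxTRESPerUser%30

# Show the associations (and any limits) that apply to your user:
sacctmgr show assoc where user=$USER format=Cluster,Account,User,Partition,QOS%30
```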