Removed GPUs from CPUs partition
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-configuration.html
---

## Using the Slurm batch system

Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system for managing and scheduling jobs. Historically, *Merlin4* and *Merlin5* also used Slurm, and **Merlin6** has been configured with the same batch system. Slurm is installed in a **multi-cluster** configuration, which allows multiple clusters to be integrated in the same batch system.

To understand the Slurm configuration of the cluster, it can be useful to check the following files:

* ``/etc/slurm/slurm.conf`` - found on the login nodes and computing nodes.
* ``/etc/slurm/cgroup.conf`` - found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
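
For a quick look at the live configuration, the files above can simply be read from a login node, or Slurm itself can be queried. The commands below are only a small sketch; ``scontrol`` reports the configuration of the cluster your session currently points to (i.e. **merlin6** on the login nodes):

```bash
# Browse the main Slurm configuration file (read-only copy on the login nodes)
less /etc/slurm/slurm.conf

# Ask the Slurm controller for the running configuration and the partition setup
scontrol show config | less
scontrol show partition
```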

The configuration files found on the login nodes correspond exclusively to the **merlin6** cluster. Configuration files for the old **merlin5** cluster must be checked directly on any of the **merlin5** computing nodes: they are not propagated to the **merlin6** login nodes.

### About Merlin5 & Merlin6

The new Slurm cluster is called **merlin6**. However, the old Slurm *merlin* cluster will be kept for some time, and it has been renamed to **merlin5**. This allows jobs to keep running on the old computing nodes until users have fully migrated their codes to the new cluster.

Jobs can still be submitted to the old *merlin5* computing nodes by using the option ``--cluster=merlin5``. This documentation only covers the usage of the **merlin6** Slurm cluster.
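
A minimal sketch of how a job could be sent to the old cluster; ``myjob.sh`` is only a placeholder for an existing batch script, and the sketch uses the ``--clusters`` spelling from the Slurm man pages:

```bash
# Submit an existing batch script to the old merlin5 cluster
sbatch --clusters=merlin5 myjob.sh

# Check your jobs in the merlin5 queue
squeue --clusters=merlin5 --user=$USER
```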

### Using Slurm 'merlin6' cluster

Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please refer to the following document: [LINK TO SLURM ADVANCED CONFIG]()
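
As a starting point, a minimal batch script could look like the sketch below; the script name, job name, and resource values are purely illustrative, and the partitions and limits it should respect are described in the following sections:

```bash
#!/bin/bash
#SBATCH --job-name=test_job        # short name for the job (illustrative)
#SBATCH --output=test_job-%j.out   # output file; %j expands to the job ID
#SBATCH --ntasks=1                 # number of tasks/cores requested
#SBATCH --time=00:30:00            # walltime limit (HH:MM:SS)

# Replace the line below with the real workload
hostname
```

Such a script would be submitted with ``sbatch myjob.sh`` and monitored with ``squeue --user=$USER``.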

#### Merlin6 Node definition

The following table shows the default and maximum resources that can be used per node:

The maximum memory that can be allocated on a node is given by ``Max.Mem/Node``. In *Merlin6*, memory is considered a Consumable Resource, as is the CPU.
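
Since memory is a consumable resource, it should be requested explicitly whenever the default per-core allocation is not sufficient. The directives below are a sketch with placeholder values; the real per-node limits are those of the node table above, and ``--mem`` and ``--mem-per-cpu`` are mutually exclusive, so only one of them should be used:

```bash
#SBATCH --ntasks=8            # number of cores requested (placeholder value)
#SBATCH --mem-per-cpu=4000    # memory per core, in MB (placeholder value)
# Alternatively, request the total memory per node instead of per core:
##SBATCH --mem=32000          # total memory per node, in MB (placeholder value)
```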

#### Merlin6 Slurm partitions

The partition can be specified when submitting a job with the ``--partition=<partitionname>`` option. The following *partitions* (also known as *queues*) are configured in Slurm:

**general** is the *default* partition, so jobs that do not specify a partition are assigned to it; it limits the number of nodes a user can occupy with running jobs. For **daily** this limitation is extended to 60 nodes, while for **hourly** there are no limits. Shorter jobs have higher priority than longer ones, so in general terms they are scheduled earlier (although other factors, such as the user's fair-share value, can affect this decision).
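
For example, a job expected to finish within a day could be sent to the **daily** partition; ``myjob.sh`` is again only a placeholder:

```bash
# Submit to the daily partition instead of the default (general)
sbatch --partition=daily myjob.sh

# List the configured partitions with their time limits, node counts and availability
sinfo --format="%P %l %D %a"
```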

#### Merlin6 User limits

By default, users cannot use more than 704 cores at the same time (Max CPU per user), which is equivalent to 8 exclusive nodes. This limit applies to the **general** and **daily** partitions. For the **hourly** partition, there is no such restriction and user limits are removed.
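
To check how close you currently are to this limit, the allocated CPUs of your running jobs can be summed; a small sketch using only standard ``squeue`` format fields:

```bash
# Show your running jobs together with the number of allocated CPUs (%C)
squeue --user=$USER --states=RUNNING --format="%.12i %.10P %.6C"

# Sum the allocated CPUs to compare against the 704-core limit
squeue --user=$USER --states=RUNNING --noheader --format="%C" \
    | awk '{ total += $1 } END { print total, "cores currently in use" }'
```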