Migrating merlin6 user guide from jekyll-example1
From lsm-hpce/jekyll-example1 1eada07

@@ -0,0 +1,57 @@
---
title: Accessing Interactive Nodes
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/interactive.html
---

## Login nodes description

The Merlin6 login nodes are the official machines for accessing the Merlin6 cluster.
From these machines, users can submit jobs to the Slurm batch system as well as visualize or compile their software.

The Merlin6 login nodes are the following:

| Hostname            | SSH | NoMachine | #cores      | CPU                                | Memory | Scratch    | Scratch Mountpoint |
| ------------------- | --- | --------- | ----------- |:---------------------------------- | ------ | ---------- |:------------------ |
| merlin-l-01.psi.ch  | yes | -         | 32 (2 x 16) | 2 x Intel Xeon E5-2697A v4 2.60GHz | 512GB  | 100GB SAS  | ``/scratch``       |
| merlin-l-02.psi.ch  | yes | yes       | 32 (2 x 16) | 2 x Intel Xeon E5-2697A v4 2.60GHz | 512GB  | 100GB SAS  | ``/scratch``       |
| merlin-l-001.psi.ch | -   | -         | 44 (2 x 22) | 2 x Intel Xeon Gold 6152 2.10GHz   | 512GB  | 2.0TB NVMe | ``/scratch``       |
| merlin-l-002.psi.ch | -   | -         | 44 (2 x 22) | 2 x Intel Xeon Gold 6142 2.10GHz   | 512GB  | 2.0TB NVMe | ``/scratch``       |

* ``merlin-l-001`` and ``merlin-l-002`` are not in production yet, hence SSH access is not possible.

---

## Remote Access

### SSH Access

For interactive command-line access, use an SSH client. We recommend enabling X11 forwarding, even though it is not the officially supported access method; it may help when opening X applications.

For Linux:

```bash
ssh -XY $username@merlin-l-01.psi.ch
```
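
Once logged in with X11 forwarding enabled, a quick way to verify that it works is to start a simple X client; ``xclock`` is used here only as an example and is assumed to be installed on the login node:

```bash
# If X11 forwarding is working, a small clock window should appear on the local desktop
xclock &
```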

X applications are supported on the login nodes and X11 forwarding can be used by those users who have properly configured X11 support on their desktops:
* Merlin6 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
  * Hence, Merlin6 administrators **do not offer official support** for X11 client setup.
  * However, a generic guide for X11 client setup (Windows, Linux and MacOS) will be provided.
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
  * The ticket will be redirected to the corresponding Desktop support group (Windows, Linux).

### NoMachine Access

X applications are supported on the login nodes and can be run through NoMachine. This service is officially supported in the Merlin6 cluster and is the official X service.
* NoMachine *client installation* support has to be requested through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
  * The ticket will be redirected to the corresponding support group (Windows or Linux).
* NoMachine *client configuration* and *connectivity* for Merlin6 is fully supported by the Merlin6 administrators.
  * Please contact us through the official channels for any configuration issue with NoMachine.

---

pages/merlin6/accessing-merlin6/accessing-merlin6.md (new file, 11 lines)
@@ -0,0 +1,11 @@
---
title: Accessing Merlin6
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/accessing-merlin6.html
---

This chapter describes how to access the Merlin6 cluster.

pages/merlin6/accessing-merlin6/accessing-slurm.md (new file, 98 lines)
@@ -0,0 +1,98 @@
---
title: Accessing Slurm Cluster
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-access.html
---

## The Merlin6 Slurm batch system

Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm. In the same way, **Merlin6** has also been configured with this batch system.

Slurm has been installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated in the same batch system (see the example after this list):
* Two different Slurm clusters exist: **merlin5** and **merlin6**.
  * **merlin5** is a cluster with very old hardware (out of warranty).
  * **merlin5** will exist as long as hardware incidents remain minor and easy to repair/fix (e.g. hard disk replacements).
  * **merlin6** is the default cluster when submitting jobs.
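
Since this is a multi-cluster setup, the standard Slurm client commands accept the ``--clusters`` option. As a quick, non-authoritative illustration, the partitions of both clusters can be listed from a login node as follows:

```bash
# Show the partitions of every cluster known to this Slurm installation
sinfo --clusters=all
```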

This document is mostly focused on the **merlin6** cluster. Details for **merlin5** are not shown here; only basic access and recent
changes will be explained (the **[Official Merlin5 User Guide](https://intranet.psi.ch/PSI_HPC/Merlin5)** is still valid).

### Merlin6 Slurm Configuration Details

To understand the Slurm configuration of the cluster, it can sometimes be useful to check the following files (a quick way of inspecting them is shown after the list):

* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes; it is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes; it is also propagated to the login nodes and computing nodes for user read access.
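
For example, a specific setting can be looked up directly in the configuration file from a login node (``SelectTypeParameters`` is used here only as an illustration):

```bash
# Check how Slurm tracks consumable resources (CPU, memory) on this cluster
grep -i SelectTypeParameters /etc/slurm/slurm.conf
```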

The configuration files found on the *login nodes* correspond exclusively to the **merlin6** cluster. These
configuration files are also present on the **merlin6** *computing nodes*.

Slurm configuration files for the old **merlin5** cluster have to be checked directly on any of the **merlin5** *computing nodes*: those files *do
not* exist on the **merlin6** *login nodes*.

### Merlin5 Access

Keeping the **merlin5** cluster allows running jobs on the old computing nodes until users have fully migrated their codes to the new cluster.

From July 2019, **merlin6** becomes the **default cluster**, and any job submitted to Slurm will be submitted to that cluster.
However, users can keep submitting to the old **merlin5** computing nodes by using the option ``--clusters=merlin5`` together with the corresponding
Slurm partition, ``--partition=merlin``. For example:

```bash
srun --clusters=merlin5 --partition=merlin hostname
sbatch --clusters=merlin5 --partition=merlin myScript.batch
```

---

## Using Slurm 'merlin6' cluster

Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please refer to the following document: [LINK TO SLURM ADVANCED CONFIG]()

### Merlin6 Node definition

The following table shows the default and maximum resources that can be used per node:

| Nodes                              | Def.#CPUs | Max.#CPUs | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | Def.#GPUs | Max.#GPUs |
|:---------------------------------- | ---------:| ---------:| ----------------:| ----------------:| -----------------:| -------------:| --------- | --------- |
| merlin-c-[001-022,101-122,201-222] | 1 core    | 44 cores  | 8000             | 352000           | 352000             | 10000         | N/A       | N/A       |
| merlin-g-[001]                     | 1 core    | 8 cores   | 8000             | 102498           | 102498             | 10000         | 1         | 2         |
| merlin-g-[002-009]                 | 1 core    | 10 cores  | 8000             | 102498           | 102498             | 10000         | 1         | 4         |

If nothing is specified, by default each core will use up to 8GB of memory. More memory can be requested with the ``--mem=<memory>`` option,
and the maximum memory allowed is ``Max.Mem/Node``.

In *Merlin6*, memory, like the CPU, is considered a Consumable Resource.
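
As an illustration only (the script name and the values are placeholders, not recommendations), a job asking for more than the default memory could be submitted like this:

```bash
# Request 4 tasks and 16000 MB of memory for the job
sbatch --ntasks=4 --mem=16000 myScript.batch
```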

### Merlin6 Slurm partitions

The partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:

| Partition   | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true              | 1 day        | 1 week   | 50        | low      |
| **daily**   | false             | 1 day        | 1 day    | 60        | medium   |
| **hourly**  | false             | 1 hour       | 1 hour   | unlimited | highest  |

**general** is the *default* partition, so when nothing is specified a job will be assigned to it. The **general** partition cannot have more than 50 nodes
running jobs. For **daily** this limit is extended to 60 nodes, while for **hourly** there are no limits. Shorter jobs have higher priority than
longer jobs and will in general be scheduled earlier (however, other factors, such as the user's fair-share value, can affect this decision).
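
For instance (the script name is a placeholder), a short job can be sent to the **hourly** partition to benefit from its higher priority:

```bash
# Submit to the high-priority hourly partition (job must finish within 1 hour)
sbatch --partition=hourly myScript.batch
```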

### Merlin6 User limits

By default, users cannot use more than 528 cores at the same time (maximum number of CPUs per user). This limit applies to the **general** and **daily** partitions. For the **hourly** partition, there is no restriction.
These limits are relaxed for the **daily** partition during non-working hours and during the weekend, as follows:

| Partition   | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | --------------- | -------------- | ---------------------- | ---------------------- |
| **general** | 528             | 528            | 528                    | 528                    |
| **daily**   | 528             | 792            | Unlimited              | 792                    |
| **hourly**  | Unlimited       | Unlimited      | Unlimited              | Unlimited              |
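
These per-user limits are usually enforced through Slurm accounting (QOS/associations); this is an assumption about the setup, but if so they can be inspected from a login node with the standard accounting client:

```bash
# List the configured QOS entries and their limits, if any
sacctmgr show qos
```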

pages/merlin6/accessing-merlin6/merlin6-directories.md (new file, 156 lines)
@@ -0,0 +1,156 @@
---
title: Merlin6 Data Directories
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/data-directories.html
---

## Merlin6 directory structure

Merlin6 provides the following directories for users:

* ``/psi/home/<username>``: private user **home** directory
* ``/data/user/<username>``: private user **data** directory
* ``/data/project/general/<projectname>``: Shared **Project** directory
  * For BIO experiments, a dedicated ``/data/project/bio/$projectname`` exists.
* ``/scratch``: Local *scratch* disk.
* ``/shared-scratch``: Shared *scratch* disk.

A summary for each directory would be:

| Directory                          | Block Quota [Soft:Hard] | Files Quota [Soft:Hard] | Quota Change Policy: Block        | Quota Change Policy: Files       | Backup | Backup Policy                  |
| ---------------------------------- | ----------------------- | ----------------------- |:--------------------------------- |:-------------------------------- | ------ |:------------------------------ |
| /psi/home/$username                | USR [10GB:11GB]         | *Undef*                 | Up to x2 when strictly justified. | N/A                              | yes    | Daily snapshots for 1 week     |
| /data/user/$username               | USR [1TB:1.074TB]       | USR [1M:1.1M]           | Immutable. Need a project.        | Changeable when justified.       | no     | Users responsible for backup   |
| /data/project/bio/$projectname     | GRP [1TB:1.074TB]       | GRP [1M:1.1M]           | Subject to project requirements.  | Subject to project requirements. | no     | Project responsible for backup |
| /data/project/general/$projectname | GRP [1TB:1.074TB]       | GRP [1M:1.1M]           | Subject to project requirements.  | Subject to project requirements. | no     | Project responsible for backup |
| /scratch                           | *Undef*                 | *Undef*                 | N/A                               | N/A                              | no     | N/A                            |
| /shared-scratch                    | *Undef*                 | *Undef*                 | N/A                               | N/A                              | no     | N/A                            |

---

## User home directory

Home directories are part of the PSI NFS Central Home storage provided by AIT.
However, the administration of the Merlin6 NFS homes is delegated to the Merlin6 administrators.

This is the default directory users will land in when logging in to any Merlin6 machine.
This directory is mounted on the login and computing nodes under the directory:

```bash
/psi/home/$username
```

Users can check their quota by running the following command:

```bash
quota -s
```

### Home directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the home directories for IO-intensive tasks.
  * Use ``/scratch``, ``/shared-scratch``, ``/data/user`` or ``/data/project`` for this purpose.
* Users can recover lost data from up to 1 week back thanks to the automatic **daily snapshots kept for 1 week**.
  Snapshots are found in the following directory:

```bash
/psi/home/.snapshot/$username
```

---

## User data directory

User data directories are part of the Merlin6 storage cluster, whose technology is based on GPFS.

The user data directory is intended for *fast IO access* and for keeping large amounts of private data.
This directory is mounted on the login and computing nodes under the directory:

```bash
/data/user/$username
```

Users can check their quota by running the following command:

```bash
mmlsquota -u <username> --block-size auto merlin-user
```

### User Directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime.
  * Use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backup policy is applied to user data directories: users are responsible for backing up their data.

---

## Project data directory

Project data directories are part of the Merlin6 storage cluster, whose technology is based on GPFS.

This storage is intended for *fast IO access* and for keeping large amounts of private data, but also for sharing data amongst
different users working on the same project.
Creating a project is the way users can expand their storage space, and it also optimizes the usage of the storage
(for instance, by avoiding data duplicated across different users).

Using a project is **highly** recommended when multiple people are involved in the same project and manage similar or common data.
Quotas are defined on a *group* and *fileset* basis: a Unix group name must exist for a specific project, or must be created for
any new project. Contact the Merlin6 administrators for more information about that.

The project data directory is mounted on the login and computing nodes under the directory:

```bash
/data/project/$projectname
```

Users can check the project quota by running the following command:

```bash
mmrepquota merlin-proj:$projectname
```

### Project Directory policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during a job's runtime.
  * Use ``/scratch`` or ``/shared-scratch`` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.

---

## Scratch directories

There are two different types of scratch disk: **local** (``/scratch``) and **shared** (``/shared-scratch``).
Specific details of each type are described below.

Usually, **shared** scratch will be used by jobs running on multiple nodes which need access to a common shared space
for creating temporary files, while **local** scratch should be used by jobs needing a node-local space for creating temporary files.

**Local** scratch on the Merlin6 computing nodes provides a huge number of IOPS thanks to NVMe technology,
while **shared** scratch, despite also being very fast, is an external GPFS storage with more latency.

``/shared-scratch`` is only mounted on the *Merlin6* computing nodes, and its current size is 50TB. It can be increased in the future whenever necessary.

A summary for the scratch directories is the following:

| Cluster | Service        | Scratch      | Scratch Mountpoint | Shared Scratch | Shared Scratch Mountpoint | Comments                               |
| ------- | -------------- | ------------ | ------------------ | -------------- | ------------------------- | -------------------------------------- |
| merlin5 | computing node | 50GB / SAS   | ``/scratch``       | ``N/A``        | ``N/A``                   | ``merlin-c-[01-64]``                   |
| merlin6 | login node     | 100GB / SAS  | ``/scratch``       | ``N/A``        | ``N/A``                   | ``merlin-l-0[1,2]``                    |
| merlin6 | computing node | 1.3TB / NVMe | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-c-[001-022,101-122,201-222]`` |
| merlin6 | login node     | 2.0TB / NVMe | ``/scratch``       | ``N/A``        | ``N/A``                   | ``merlin-l-00[1,2]``                   |

### Scratch directories policy

* Read **[Important: Code of Conduct](## Important: Code of Conduct)** for more information about Merlin6 policies.
* By default, *always* use **local** scratch first, and only use **shared** scratch if your specific use case needs a shared scratch area.
* Temporary files *must be deleted at the end of the job by the user* (see the sketch below).
  * Remaining files will be deleted by the system if detected.
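
A minimal sketch of how a job could honor this policy, assuming a Slurm batch script and using ``mktemp`` plus a ``trap`` for the cleanup (paths and program names are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=hourly

# Create a private temporary directory on the node-local scratch disk
TMPDIR=$(mktemp -d /scratch/$USER.XXXXXX)

# Make sure the temporary directory is removed when the job ends
trap 'rm -rf "$TMPDIR"' EXIT

# ... run the actual work, writing temporary files into $TMPDIR ...
./my_program --tmpdir "$TMPDIR"
```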

---

@@ -0,0 +1,76 @@
---
title: Requesting Merlin6 Accounts
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/request-account.html
---

## Requesting Access to Merlin6

PSI users whose Linux account belongs to the **svc-cluster_merlin6** group are allowed to use Merlin6.
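
To check whether your account already belongs to this group, you can list your Unix groups from any PSI Linux machine (a quick check only, not an official procedure):

```bash
# Print one group per line and look for the Merlin6 (or Merlin5) access groups
id -nG | tr ' ' '\n' | grep -i merlin
```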

Registration for **Merlin6** access *must be done* through **[PSI Service Now](https://psi.service-now.com/psisp)**:

* Please open a ticket as an *Incident Request*, with the subject:

```bash
Subject: [Merlin6] Access Request for user '$username'
```

* Text content (please always use this template):

```bash
Dear HelpDesk,

my name is $Name $Surname with PSI username $username and I would like to request access to the Merlin6 cluster.

Please add me to the following Unix groups:
* 'svc-cluster_merlin6'

Thanks a lot,
$Name $Surname
```

---

## Requesting Access to Merlin5

The Merlin5 computing nodes will be available for some time as a **best effort** service.
To access the old Merlin5 resources, users should belong to the **svc-cluster_merlin5** Unix group.

Registration for **Merlin5** access *must be done* through **[PSI Service Now](https://psi.service-now.com/psisp)**:

* Please open a ticket as an *Incident Request*, with the subject:

```bash
Subject: [Merlin5] Access Request for user '$username'
```

* Text content (please always use this template):

```bash
Dear HelpDesk,

my name is $Name $Surname with PSI username $username and I would like to request access to the Merlin5 cluster.

Please add me to the following Unix groups:
* 'svc-cluster_merlin5'

Thanks a lot,
$Name $Surname
```

---

## Requesting extra Unix groups

* Some users may need to be added to extra specific Unix groups.
  * This will grant access to specific resources.
  * For example, some BIO users may need to belong to a specific BIO group to have access to that group's project area.
  * Supervisors should inform new users which extra groups are needed.
* When requesting access to **[Merlin6](##Requesting-Access-to-Merlin6)** or **[Merlin5](##Requesting-Access-to-Merlin5)**, extra groups can be added in the same *Incident Request*.
  * Alternatively, this step can be done later in a separate **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
* If you want to request access to both Merlin5 and Merlin6:
  * Use the template from **[Requesting Access to Merlin6](##Requesting-Access-to-Merlin6)** and also add the **``'svc-cluster_merlin5'``** Unix group to the request.