Migrating merlin6 user guide from jekyll-example1

From lsm-hpce/jekyll-example1 1eada07
Spencer Bliven
2019-06-14 15:38:22 +02:00
parent 7c6f7b177d
commit ebff53c62c
19 changed files with 598 additions and 763 deletions

---
title: Accessing Interactive Nodes
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/interactive.html
---
## Login nodes description
The Merlin6 login nodes are the official machines for accessing the Merlin6 cluster.
From these machines, users can submit jobs to the Slurm batch system as well as visualize data or compile their software.
The Merlin6 login nodes are the following:
| Hostname | SSH | NoMachine | #cores | CPU | Memory | Scratch | Scratch Mountpoint |
| ------------------- | --- | --------- | ----------- |:---------------------------------- | ------ | ---------- |:------------------ |
| merlin-l-01.psi.ch | yes | - | 32 (2 x 16) | 2 x Intel Xeon E5-2697A v4 2.60GHz | 512GB | 100GB SAS | ``/scratch`` |
| merlin-l-02.psi.ch | yes | yes | 32 (2 x 16) | 2 x Intel Xeon E5-2697A v4 2.60GHz | 512GB | 100GB SAS | ``/scratch`` |
| merlin-l-001.psi.ch | - | - | 44 (2 x 22) | 2 x Intel Xeon Gold 6152 2.10GHz | 512GB | 2.0TB NVMe | ``/scratch`` |
| merlin-l-002.psi.ch | - | - | 44 (2 x 22) | 2 x Intel Xeon Gold 6142 2.10GHz | 512GB | 2.0TB NVMe | ``/scratch`` |
* ``merlin-l-001`` and ``merlin-l-002`` are not in production yet, hence SSH access is not possible.
---
## Remote Access
### SSH Access
For interactive command-line access, use an SSH client. We recommend using X11 forwarding, although it is not the officially supported method; it can help when running X applications.
For Linux:
```bash
ssh -XY $username@merlin-l-01.psi.ch
```
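Once connected, X11 forwarding can be quickly verified as sketched below (this assumes a simple X utility such as ``xclock`` is available on the login node; any other X application works as well):
```bash
# If X11 forwarding is active, DISPLAY should be set to something like "localhost:10.0"
echo $DISPLAY
# Launch a simple X application to test the forwarding (xclock is just an example)
xclock &
```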
X applications are supported on the login nodes, and X11 forwarding can be used by users who have properly configured X11 support on their desktops:
* Merlin6 administrators **do not offer support** for user desktop configuration (Windows, MacOS, Linux).
* Hence, Merlin6 administrators **do not offer official support** for X11 client setup.
* However, a generic guide for X11 client setup (Windows, Linux and MacOS) will be provided.
* PSI desktop configuration issues must be addressed through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
* Ticket will be redirected to the corresponding Desktop support group (Windows, Linux).
### NoMachine Access
X applications are supported on the login nodes and can be run through NoMachine. This service is officially supported in the Merlin6 cluster and is the official way to run X applications.
* NoMachine *client installation* support has to be requested through **[PSI Service Now](https://psi.service-now.com/psisp)** as an *Incident Request*.
* Ticket will be redirected to the corresponding support group (Windows or Linux)
* NoMachine *client configuration* and *connectivity* for Merlin6 is fully supported by Merlin6 administrators.
* Please contact us through the official channels on any configuration issue with NoMachine.
---

---
title: Accessing Merlin6
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/accessing-merlin6.html
---
This chapter describes how to access the Merlin6 cluster.

---
title: Accessing Slurm Cluster
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/slurm-access.html
---
## The Merlin6 Slurm batch system
Clusters at PSI use the [Slurm Workload Manager](http://slurm.schedmd.com/) as the batch system technology for managing and scheduling jobs.
Historically, *Merlin4* and *Merlin5* also used Slurm; in the same way, **Merlin6** has been configured with this batch system.
Slurm has been installed in a **multi-clustered** configuration, allowing multiple clusters to be integrated into the same batch system.
* Two different Slurm clusters exist: **merlin5** and **merlin6**.
* **merlin5** is a cluster with very old hardware (out-of-warranty).
* **merlin5** will exist as long as hardware incidents remain minor and easy to repair (e.g. hard disk replacement)
* **merlin6** is the default cluster when submitting jobs.
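Both clusters listed above can be queried directly from the login nodes; a minimal check could look like this:
```bash
# Show partitions and node states for both Slurm clusters
sinfo --clusters=merlin5,merlin6
```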
This document focuses mostly on the **merlin6** cluster. Details for **merlin5** are not covered here; only basic access and recent
changes are explained (the **[Official Merlin5 User Guide](https://intranet.psi.ch/PSI_HPC/Merlin5)** is still valid).
### Merlin6 Slurm Configuration Details
To understand the Slurm configuration of the cluster, it can sometimes be useful to check the following files:
* ``/etc/slurm/slurm.conf`` - can be found on the login nodes and computing nodes.
* ``/etc/slurm/cgroup.conf`` - can be found on the computing nodes and is also propagated to the login nodes for user read access.
* ``/etc/slurm/gres.conf`` - can be found on the GPU nodes and is also propagated to the login nodes and computing nodes for user read access.
The configuration files found on the *login nodes* correspond exclusively to the **merlin6** cluster; the same files are also present
on the **merlin6** *computing nodes*.
Slurm configuration files for the old **merlin5** cluster have to be checked directly on any of the **merlin5** *computing nodes*: those files do
*not* exist on the **merlin6** *login nodes*.
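For a quick look, these files can be inspected directly from a login node; for example:
```bash
# Inspect the merlin6 Slurm configuration file from a login node (read-only)
less /etc/slurm/slurm.conf
# For example, show only the partition definitions
grep -i '^PartitionName' /etc/slurm/slurm.conf
```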
### Merlin5 Access
Keeping the **merlin5** cluster allows jobs to keep running on the old computing nodes until users have fully migrated their code to the new cluster.
From July 2019, **merlin6** becomes the **default cluster**, and any job submitted to Slurm will be submitted to that cluster.
However, users can keep submitting to the old **merlin5** computing nodes by using the option ``--clusters=merlin5`` and the corresponding
Slurm partition with ``--partition=merlin``. For example:
```bash
srun --clusters=merlin5 --partition=merlin hostname
sbatch --clusters=merlin5 --partition=merlin myScript.batch
```
---
## Using Slurm 'merlin6' cluster
Basic usage of the **merlin6** cluster is detailed here. For advanced usage, please refer to the following document: [LINK TO SLURM ADVANCED CONFIG]()
### Merlin6 Node definition
The following table shows the default and maximum resources that can be used per node:
| Nodes                              | Def.#CPUs | Max.#CPUs | Def.Mem/CPU (MB) | Max.Mem/CPU (MB) | Max.Mem/Node (MB) | Max.Swap (MB) | Def.#GPUs | Max.#GPUs |
|:---------------------------------- | ---------:| ---------:| ----------------:| ----------------:| -----------------:| -------------:| --------- | --------- |
| merlin-c-[001-022,101-122,201-222] | 1 core | 44 cores | 8000 | 352000 | 352000 | 10000 | N/A | N/A |
| merlin-g-[001] | 1 core | 8 cores | 8000 | 102498 | 102498 | 10000 | 1 | 2 |
| merlin-g-[002-009] | 1 core | 10 cores | 8000 | 102498 | 102498 | 10000 | 1 | 4 |
If nothing is specified, each core will by default use up to 8GB of memory. More memory can be requested with the ``--mem=<memory>`` option
(memory per node), up to the ``Max.Mem/Node`` limit.
In *Merlin6*, memory, like the CPU, is treated as a Consumable Resource.
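For instance, a job needing more than the default memory could be submitted as sketched below (the values are illustrative, and ``myScript.batch`` is a placeholder for a real batch script):
```bash
# Request 4 tasks and 64000MB of memory on the node (instead of the default 8000MB per core)
sbatch --ntasks=4 --mem=64000 myScript.batch
```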
### Merlin6 Slurm partitions
The partition can be specified when submitting a job with the ``--partition=<partitionname>`` option.
The following *partitions* (also known as *queues*) are configured in Slurm:
| Partition | Default Partition | Default Time | Max Time | Max Nodes | Priority |
|:----------- | ----------------- | ------------ | -------- | --------- | -------- |
| **general** | true | 1 day | 1 week | 50 | low |
| **daily** | false | 1 day | 1 day | 60 | medium |
| **hourly** | false | 1 hour | 1 hour | unlimited | highest |
**general** is the *default* partition: when nothing is specified, jobs are assigned to it. The **general** partition can not have more than 50 nodes
running jobs; for **daily** this limit is raised to 60 nodes, while **hourly** has no limit. Shorter jobs have higher priority than
longer jobs and will in general be scheduled earlier (however, other factors, such as the user's fair share value, can affect this decision).
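A minimal batch script selecting a partition and a matching time limit is sketched below (partition, walltime and resource values are purely illustrative):
```bash
#!/bin/bash
#SBATCH --clusters=merlin6   # submit to the default merlin6 cluster
#SBATCH --partition=daily    # partition for jobs of up to 1 day
#SBATCH --time=04:00:00      # requested walltime, within the partition's Max Time
#SBATCH --ntasks=44          # number of tasks (here, one full node worth of cores)

srun hostname
```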
### Merlin6 User limits
By default, users can not use more than 528 cores at the same time (Max CPU per user). This limit applies to the **general** and **daily** partitions; for the **hourly** partition there is no such restriction.
These limits are relaxed for the **daily** partition during non-working hours and over the weekend, as follows:
| Partition | Mon-Fri 08h-18h | Sun-Thu 18h-0h | From Fri 18h to Sun 8h | From Sun 8h to Mon 18h |
|:----------- | --------------- | -------------- | ----------------------- | ---------------------- |
| **general** | 528 | 528 | 528 | 528 |
| **daily** | 528 | 792 | Unlimited | 792 |
| **hourly** | Unlimited | Unlimited | Unlimited | Unlimited |
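To see how close your running jobs are to these limits, the allocated cores per job can be listed from a login node; a simple sketch:
```bash
# List your running jobs with the number of allocated cores (%C) per job
squeue --user=$USER --states=RUNNING --format="%.10i %.12P %.8T %.6C %j"
```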

---
title: Merlin6 Data Directories
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/data-directories.html
---
## Merlin6 directory structure
Merlin6 provides the following directories for users:
* ``/psi/home/<username>``: private user **home** directory
* ``/data/user/<username>``: private user **data** directory
* ``/data/project/general/<projectname>``: Shared **Project** directory
* For BIO experiments, a dedicated ``/data/project/bio/$projectname`` directory exists.
* ``/scratch``: Local *scratch* disk.
* ``/shared-scratch``: Shared *scratch* disk.
A summary of each directory is given below:
| Directory                          | Block Quota [Soft:Hard] | File Quota [Soft:Hard]  | Quota Change Policy: Block         | Quota Change Policy: Files        | Backup | Backup Policy                  |
| ---------------------------------- | ----------------------- | ----------------------- |:--------------------------------- |:-------------------------------- | ------ | :----------------------------- |
| /psi/home/$username | USR [10GB:11GB] | *Undef* | Up to x2 when strictly justified. | N/A | yes | Daily snapshots for 1 week |
| /data/user/$username               | USR [1TB:1.074TB]       | USR [1M:1.1M]           | Immutable. Need a project.         | Changeable when justified.        | no     | Users responsible for backup   |
| /data/project/bio/$projectname | GRP [1TB:1.074TB] | GRP [1M:1.1M] | Subject to project requirements. | Subject to project requirements. | no | Project responsible for backup |
| /data/project/general/$projectname | GRP [1TB:1.074TB] | GRP [1M:1.1M] | Subject to project requirements. | Subject to project requirements. | no | Project responsible for backup |
| /scratch | *Undef* | *Undef* | N/A | N/A | no | N/A |
| /shared-scratch | *Undef* | *Undef* | N/A | N/A | no | N/A |
---
## User home directory
Home directories are part of the PSI NFS Central Home storage provided by AIT.
However, administration for the Merlin6 NFS homes is delegated to Merlin6 administrators.
This is the default directory users will land in when logging in to any Merlin6 machine.
This directory is mounted in the login and computing nodes under the directory:
```bash
/psi/home/$username
```
Users can check their quota by running the following command:
```bash
quota -s
```
### Home directory policy
* Read **[Important: Code of Conduct](#important-code-of-conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the home directories for IO-intensive tasks.
* Use ``/scratch``, ``/shared-scratch``, ``/data/user`` or ``/data/project`` for this purpose.
* Users can recover up to 1 week of lost data thanks to the automatic **daily snapshots kept for 1 week**.
Snapshots are found in the following directory:
```bash
/psi/home/.snapshot/$username
```
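A lost file can then be copied back from one of the snapshots; a sketch is shown below (the exact layout below the snapshot directory and the file names are illustrative):
```bash
# List the available snapshots for your home directory
ls /psi/home/.snapshot/$USER
# Copy a lost file back from a chosen snapshot (names are illustrative)
cp /psi/home/.snapshot/$USER/<snapshot-name>/myfile ~/myfile
```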
---
## User data directory
User data directories are part of the Merlin6 storage cluster, whose technology is based on GPFS.
The user data directory is intended for *fast IO access* and for keeping large amounts of private data.
This directory is mounted in the login and computing nodes under the directory:
```bash
/data/user/$username
```
Users can check their quota by running the following command:
```bash
mmlsquota -u <username> --block-size auto merlin-user
```
### User Directory policy
* Read **[Important: Code of Conduct](#important-code-of-conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during job runtime.
* Use ``/scratch``, ``/shared-scratch`` for this purpose.
* No backup policy is applied for user data directories: users are responsible for backing up their data.
---
## Project data directory
Project data directories are part of the Merlin6 storage cluster, whose technology is based on GPFS.
This storage is intended for *fast IO access* and for keeping large amounts of private data, but also for sharing data amongst
the different users of a project.
Creating a project is the way for users to expand their storage space, and it optimizes the usage of the storage
(for instance, by avoiding duplicated data across different users).
Using a project is **highly** recommended when multiple people are involved in the same project and manage similar/common data.
Quotas are defined on a *group* and *fileset* basis: a Unix group must exist for a specific project, or must be created for
any new project. Contact the Merlin6 administrators for more information about this.
The project data directory is mounted in the login and computing nodes under the directory:
```bash
/data/project/general/$projectname    # for BIO projects: /data/project/bio/$projectname
```
Users can check the project quota by running the following command:
```bash
mmrepquota merlin-proj:$projectname
```
### Project Directory policy
* Read **[Important: Code of Conduct](#important-code-of-conduct)** for more information about Merlin6 policies.
* It is **forbidden** to use the data directories as a ``scratch`` area during job runtime.
* Use ``/scratch``, ``/shared-scratch`` for this purpose.
* No backups: users are responsible for managing the backups of their data directories.
---
## Scratch directories
There are two different types of scratch disk: **local** (``/scratch``) and **shared** (``/shared-scratch``).
Specific details of each type are described below.
Usually, **shared** scratch should be used by jobs running on multiple nodes which need access to a common shared space
for creating temporary files, while **local** scratch should be used by jobs that only need a node-local space for temporary files.
**Local** scratch on the Merlin6 computing nodes provides a very high number of IOPS thanks to NVMe technology,
while **shared** scratch, despite also being very fast, is an external GPFS storage with higher latency.
``/shared-scratch`` is only mounted in the *Merlin6* computing nodes, and its current size is 50TB. Whenever necessary, it can be increased in the future.
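A sketch of a batch script that uses the **local** scratch area and cleans up after itself, in line with the policy below (``my_application`` is a placeholder for the real program):
```bash
#!/bin/bash
#SBATCH --partition=hourly
#SBATCH --time=00:30:00

# Create a private temporary directory on the local scratch disk
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"

# Run the actual work, writing temporary files into the scratch directory
my_application --tmpdir "$SCRATCHDIR"

# Delete the temporary files at the end of the job, as required by the policy
rm -rf "$SCRATCHDIR"
```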
A summary for the scratch directories is the following:
| Cluster | Service | Scratch | Scratch Mountpoint | Shared Scratch | Shared Scratch Mountpoint | Comments |
| ------- | -------------- | ------------ | ------------------ | -------------- | ------------------------- | ------------------------------------- |
| merlin5 | computing node | 50GB / SAS | ``/scratch`` | ``N/A`` | ``N/A`` | ``merlin-c-[01-64]`` |
| merlin6 | login node | 100GB / SAS | ``/scratch`` | ``N/A`` | ``N/A`` | ``merlin-l-0[1,2]`` |
| merlin6 | computing node | 1.3TB / NVMe | ``/scratch``       | 50TB / GPFS    | ``/shared-scratch``       | ``merlin-c-[001-022,101-122,201-222]`` |
| merlin6 | login node | 2.0TB / NVMe | ``/scratch`` | ``N/A`` | ``N/A`` | ``merlin-l-00[1,2]`` |
### Scratch directories policy
* Read **[Important: Code of Conduct](#important-code-of-conduct)** for more information about Merlin6 policies.
* By default, *always* use **local** scratch first, and only use **shared** scratch if your specific use case needs a shared scratch area.
* Temporary files *must be deleted at the end of the job by the user*.
* Remaining files will be deleted by the system if detected.
---

---
title: Requesting Merlin6 Accounts
#tags:
#keywords:
last_updated: 13 June 2019
#summary: ""
sidebar: merlin6_sidebar
permalink: /merlin6/request-account.html
---
## Requesting Access to Merlin6
PSI users with their Linux account belonging to the **svc-cluster_merlin6** group are allowed to use Merlin6.
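Before opening a ticket, you can check whether your account already belongs to the required group (a simple check; the group name is taken from above):
```bash
# List the Unix groups of your account and look for the Merlin6 group
id -Gn $USER | tr ' ' '\n' | grep -x 'svc-cluster_merlin6'
```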
Registration for **Merlin6** access *must be done* through **[PSI Service Now](https://psi.service-now.com/psisp)**:
* Please open a ticket as *Incident Request*, with subject:
```bash
Subject: [Merlin6] Access Request for user '$username'
```
* Text content (please always use this template):
```bash
Dear HelpDesk,
my name is $Name $Surname with PSI username $username and I would like to request access to the Merlin6 cluster.
Please add me to the following Unix groups:
* 'svc-cluster_merlin6'
Thanks a lot,
$Name $Surname
```
---
## Requesting Access to Merlin5
Merlin5 computing nodes will be available for some time as a **best effort** service.
For accessing the old Merlin5 resources, users should belong to the **svc-cluster_merlin5** Unix Group.
Registration for **Merlin5** access *must be done* through **[PSI Service Now](https://psi.service-now.com/psisp)**:
* Please open a ticket as *Incident Request*, with subject:
```bash
Subject: [Merlin5] Access Request for user '$username'
```
* Text content (please always use this template):
```bash
Dear HelpDesk,
my name is $Name $Surname with PSI username $username and I would like to request access to the Merlin5 cluster.
Please add me to the following Unix groups:
* 'svc-cluster_merlin5'
Thanks a lot,
$Name $Surname
```
---
## Requesting extra Unix groups
* Some users may need to be added to extra specific Unix groups.
* This will grant access to specific resources.
* For example, some BIO users may need to belong to a specific BIO group in order to have access to that group's project area.
* Supervisors should inform new users which extra groups are needed.
* When requesting access to **[Merlin6](#requesting-access-to-merlin6)** or **[Merlin5](#requesting-access-to-merlin5)**, extra groups can be added in the same *Incident Request*.
* Alternatively, this step can be done later in a separate **[PSI Service Now](https://psi.service-now.com/psisp)** ticket.
* If you want to request access to both Merlin5 and Merlin6:
* Use the template from **[Requesting Access to Merlin6](#requesting-access-to-merlin6)** and also add the **``'svc-cluster_merlin5'``** Unix group to the request.