diff --git a/_data/sidebars/home_sidebar.yml b/_data/sidebars/home_sidebar.yml
index 75b1213..53fac26 100644
--- a/_data/sidebars/home_sidebar.yml
+++ b/_data/sidebars/home_sidebar.yml
@@ -13,7 +13,7 @@ entries:
     - title: News
       url: /news.html
       output: web
-    - title: Merlin7 HPC Cluster (W.I.P.)
+    - title: Merlin7 HPC Cluster
       url: /merlin7/introduction.html
       output: web
     - title: Merlin6 HPC Cluster
diff --git a/_data/sidebars/merlin7_sidebar.yml b/_data/sidebars/merlin7_sidebar.yml
index 4c9eb70..98be0de 100644
--- a/_data/sidebars/merlin7_sidebar.yml
+++ b/_data/sidebars/merlin7_sidebar.yml
@@ -54,6 +54,10 @@ entries:
       url: /merlin7/interactive-jobs.html
     - title: Slurm Batch Script Examples
       url: /merlin7/slurm-examples.html
+  - title: Jupyterhub
+    folderitems:
+    - title: Jupyterhub service
+      url: /merlin7/jupyterhub.html
   - title: Software Support
     folderitems:
     - title: PSI Modules
diff --git a/pages/merlin7/03-Slurm-General-Documentation/merlin7-configuration.md b/pages/merlin7/03-Slurm-General-Documentation/merlin7-configuration.md
index 69f22ad..80d6bdd 100644
--- a/pages/merlin7/03-Slurm-General-Documentation/merlin7-configuration.md
+++ b/pages/merlin7/03-Slurm-General-Documentation/merlin7-configuration.md
@@ -25,8 +25,8 @@ The specification of the node types is:
 | ----: | ------ | --- | --- | ---- |
 | Login Nodes | 2 | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) | 512GB DDR4 3200Mhz | |
 | CPU Nodes | 77 | _2x_ AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) | 512GB DDR4 3200Mhz | |
-| A100 GPU Nodes | 8 | _2x_ AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) | 512GB DDR4 3200Mhz | 4 x NV_A100 (80GB) |
-| GH GPU Nodes | 5 | _2x_ NVidia Grace Neoverse-V2 (SBSA ARM 64bit, 144 Cores, 3.1GHz) | _2x_ 480GB DDR5X (CPU+GPU) | 4 x NV_GH200 (120GB) |
+| A100 GPU Nodes | 5 | _2x_ AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) | 512GB DDR4 3200Mhz | 4 x NV_A100 (80GB) |
+| GH GPU Nodes | 3 | _2x_ NVidia Grace Neoverse-V2 (SBSA ARM 64bit, 144 Cores, 3.1GHz) | _2x_ 480GB DDR5X (CPU+GPU) | 4 x NV_GH200 (120GB) |

 ### Network

diff --git a/pages/merlin7/03-Slurm-General-Documentation/slurm-configuration.md b/pages/merlin7/03-Slurm-General-Documentation/slurm-configuration.md
index c827587..c6c1e4a 100644
--- a/pages/merlin7/03-Slurm-General-Documentation/slurm-configuration.md
+++ b/pages/merlin7/03-Slurm-General-Documentation/slurm-configuration.md
@@ -14,11 +14,12 @@ This documentation shows basic Slurm configuration and options needed to run job

 ### CPU public partitions

-| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
-| -----------------: | -----------: | ----------: | -------: | ---------------: | -----------------: | -----------------: |
-| **general** | 1-00:00:00 | 7-00:00:00 | Low | merlin | cpu=1024,mem=1920G | cpu=1024,mem=1920G |
-| **daily** | 0-01:00:00 | 1-00:00:00 | Medium | merlin | cpu=1024,mem=1920G | cpu=2048,mem=3840G |
-| **hourly** | 0-00:30:00 | 0-01:00:00 | High | merlin | cpu=2048,mem=3840G | cpu=8192,mem=15T |
+| PartitionName | DefaultTime | MaxTime | Priority | Account | Per Job Limits | Per User Limits |
+| -----------------: | -----------: | ----------: | -------: | ---------------: | --------------------: | --------------------: |
+| **general** | 1-00:00:00 | 7-00:00:00 | Low | merlin | cpu=1024,mem=1920G | cpu=1024,mem=1920G |
+| **daily** | 0-01:00:00 | 1-00:00:00 | Medium | merlin | cpu=1024,mem=1920G | cpu=2048,mem=3840G |
+| **hourly** | 0-00:30:00 | 0-01:00:00 | High | merlin | cpu=2048,mem=3840G | cpu=8192,mem=15T |
+| **interactive** | 0-04:00:00 | 0-12:00:00 | Highest | merlin | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 |

 ### GPU public partitions

@@ -110,12 +111,13 @@ various QoS definitions applicable to the merlin7 CPU-based cluster. Here:
 * `MaxTRES` specifies resource limits per job.
 * `MaxTRESPU` specifies resource limits per user.

-| Name | MaxTRES | MaxTRESPU | Scope |
-| --------------: | -----------------: | -----------------: | ---------------------: |
-| **normal** | | | partition |
-| **cpu_general** | cpu=1024,mem=1920G | cpu=1024,mem=1920G | user, partition |
-| **cpu_daily** | cpu=1024,mem=1920G | cpu=2048,mem=3840G | partition |
-| **cpu_hourly** | cpu=2048,mem=3840G | cpu=8192,mem=15T | partition |
+| Name | MaxTRES | MaxTRESPU | Scope |
+| -------------------: | --------------------: | --------------------: | ---------------------: |
+| **normal** | | | partition |
+| **cpu_general** | cpu=1024,mem=1920G | cpu=1024,mem=1920G | user, partition |
+| **cpu_daily** | cpu=1024,mem=1920G | cpu=2048,mem=3840G | partition |
+| **cpu_hourly** | cpu=2048,mem=3840G | cpu=8192,mem=15T | partition |
+| **cpu_interactive** | cpu=16,mem=30G,node=1 | cpu=32,mem=60G,node=1 | partition |

 Where:
 * **`normal` QoS:** This QoS has no limits and is typically applied to partitions that do not require user or job
@@ -127,6 +129,8 @@ Where:
 with higher resource needs.
 * **`cpu_hourly` QoS:** Offers the least constraints, allowing more resources to be used for the `hourly` partition,
 which caters to very short-duration jobs.
+* **`cpu_interactive` QoS:** Is restricted to one node and a few CPUs only, and is intended to be used when interactive
+allocations are necessary (`salloc`, `srun`).

 For additional details, refer to the [CPU partitions](/merlin7/slurm-configuration.html#CPU-partitions) section.

@@ -155,11 +159,12 @@ Always verify partition configurations for potential changes using the 'scon

 #### CPU public partitions

-| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
-| -----------------: | -----------: | ----------: | --------: | ----------------: | -----------: | ----------: | -------------: |
-| **general** | 1-00:00:00 | 7-00:00:00 | 50 | 1 | 1 | cpu_general | merlin |
-| **daily** | 0-01:00:00 | 1-00:00:00 | 62 | 500 | 1 | cpu_daily | merlin |
-| **hourly** | 0-00:30:00 | 0-01:00:00 | 77 | 1000 | 1 | cpu_hourly | merlin |
+| PartitionName | DefaultTime | MaxTime | TotalNodes | PriorityJobFactor | PriorityTier | QoS | AllowAccounts |
+| -----------------: | -----------: | ----------: | --------: | ----------------: | -----------: | --------------: | -------------: |
+| **general** | 1-00:00:00 | 7-00:00:00 | 46 | 1 | 1 | cpu_general | merlin |
+| **daily** | 0-01:00:00 | 1-00:00:00 | 58 | 500 | 1 | cpu_daily | merlin |
+| **hourly** | 0-00:30:00 | 0-01:00:00 | 77 | 1000 | 1 | cpu_hourly | merlin |
+| **interactive** | 0-04:00:00 | 0-12:00:00 | 58 | 1 | 2 | cpu_interactive | merlin |

 All Merlin users are part of the `merlin` account, which is used as the *default account* when submitting jobs.
 Similarly, if no partition is specified, jobs are automatically submitted to the `general` partition by default.
@@ -174,6 +179,20 @@ The **`hourly`** partition may include private nodes as an additional buffer. Ho
 by **`PriorityTier`**, ensures that jobs submitted to private partitions are prioritized and processed first. As a
 result, access to the **`hourly`** partition might experience delays in such scenarios.

+The **`interactive`** partition is designed specifically for real-time, interactive work. Here are the key characteristics:
+
+* **CPU Oversubscription:** This partition allows CPU oversubscription (configured as `FORCE:4`), meaning that up to four interactive
+jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.
+* **Highest Scheduling Priority:** Jobs submitted to the interactive partition are always prioritized. They will be scheduled
+before any jobs in other partitions.
+* **Intended Use:** This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where
+immediate access is important.
+
+{{site.data.alerts.warning}}
+Because of CPU sharing, the performance on the **'interactive'** partition may not be optimal for compute-intensive tasks.
+For long-running or production workloads, use a dedicated batch partition instead.
+{{site.data.alerts.end}}
+
 #### CPU private partitions

 ##### CAS / ASA
diff --git a/pages/merlin7/03-Slurm-General-Documentation/slurm-examples.md b/pages/merlin7/03-Slurm-General-Documentation/slurm-examples.md
index 7bcb7b4..1fcf414 100644
--- a/pages/merlin7/03-Slurm-General-Documentation/slurm-examples.md
+++ b/pages/merlin7/03-Slurm-General-Documentation/slurm-examples.md
@@ -8,13 +8,6 @@ sidebar: merlin7_sidebar
 permalink: /merlin7/slurm-examples.html
 ---

-![Work In Progress](/images/WIP/WIP1.webp){:style="display:block; margin-left:auto; margin-right:auto"}
-
-{{site.data.alerts.warning}}The Merlin7 documentation is Work In Progress.
-Please do not use or rely on this documentation until this becomes official.
-This applies to any page under https://hpce.pages.psi.ch/merlin7/
-{{site.data.alerts.end}}
-
 ## Single core based job examples

 ```bash
diff --git a/pages/merlin7/04-Jupyterhub/jupyterhub.md b/pages/merlin7/04-Jupyterhub/jupyterhub.md
new file mode 100644
index 0000000..ea39b44
--- /dev/null
+++ b/pages/merlin7/04-Jupyterhub/jupyterhub.md
@@ -0,0 +1,71 @@
+---
+title: Jupyterhub on Merlin7
+#tags:
+keywords: jupyterhub, jupyter, jupyterlab, notebook, notebooks
+last_updated: 24 July 2025
+summary: "Jupyterhub service description"
+sidebar: merlin7_sidebar
+permalink: /merlin7/jupyterhub.html
+---
+
+Jupyterhub provides [jupyter notebooks](https://jupyter.org/) that are launched on
+cluster nodes of merlin and can be accessed through a web portal.
+
+## Accessing Jupyterhub and launching a session
+
+The service is available inside of PSI (or through a VPN connection) at
+
+****
+
+
+ 1. **Login**: You will be presented with a **Login** web page for
+    authenticating with your PSI account.
+ 1. **Spawn job**: The **Spawner Options** page allows you to
+    specify the properties (Slurm partition, running time,...) of
+    the batch jobs that will be running your jupyter notebook. Once
+    you click on the `Spawn` button, your job will be sent to the
+    Slurm batch system. If the cluster is not currently overloaded
+    and the resources you requested are available, your job will
+    usually start within 30 seconds.
+
+### Recommended partitions
+
+Running on the `merlin7` cluster and using the `interactive` partition would
+in general guarantee fast access to resources. Keep in mind, that this partition
+has a limit of 12 hours.
+
+## Requesting additional resources
+
+The **Spawner Options** page covers the most common options. These are used to
+create a submission script for the jupyterhub job and submit it to the slurm
+queue. Additional customization can be implemented using the *'Optional user
+defined line to be added to the batch launcher script'* option. This line is
+added to the submission script at the end of other `#SBATCH` lines. Parameters can
+be passed to SLURM by starting the line with `#SBATCH`, like in [Running Slurm
+Scripts](/merlin7/running-jobs.html). Some ideas:
+
+**Request additional memory**
+
+```
+#SBATCH --mem=100G
+```
+
+**Request multiple GPUs** (gpu partition only)
+
+```
+#SBATCH --gpus=2
+```
+
+**Log additional information**
+
+```
+hostname; date; echo $USER
+```
+
+Output is found in `~/jupyterhub_batchspawner_.log`.
+
+## Contact
+In case of problems or requests, please either submit a **[PSI Service
+Now](https://psi.service-now.com/psisp)** incident containing *"Merlin
+Jupyterhub"* as part of the subject, or contact us by mail through
+.
diff --git a/pages/merlin7/99-support/contact.md b/pages/merlin7/99-support/contact.md
index 69f41ce..47af0af 100644
--- a/pages/merlin7/99-support/contact.md
+++ b/pages/merlin7/99-support/contact.md
@@ -33,12 +33,12 @@ Basic contact information is also displayed on every shell login to the system u

 ## Get updated through the Merlin User list!

-Is strictly recommended that users subscribe to the Merlin Users mailing list: ****
+Is strictly recommended that users subscribe to the Merlin Users mailing list: ****

 This mailing list is the official channel used by Merlin administrators to inform users about downtimes,
 interventions or problems. Users can be subscribed in two ways:

-* *(Preferred way)* Self-registration through **[Sympa](https://psilists.ethz.ch/sympa/info/merlin7-users)**
+* *(Preferred way)* Self-registration through **[Sympa](https://psilists.ethz.ch/sympa/info/merlin-users)**
 * If you need to subscribe many people (e.g. your whole group) by sending a request to the admin list ****
 and providing a list of email addresses.

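Review note: the per-job limits that the slurm-configuration.md hunks introduce for the `interactive` partition (`cpu=16,mem=30G,node=1` from the `cpu_interactive` QoS row, and the 12 hour `MaxTime`) could be sketched as a small pre-flight check. This is an illustration only and not part of the change set. The `InteractiveRequest` class and both helper functions are hypothetical names, the limits are copied by hand from the tables in this diff, and the generated command uses only the standard `salloc` options `--partition`, `--nodes`, `--cpus-per-task`, `--mem` and `--time`.

```python
from dataclasses import dataclass

# Per-job limits copied from the `cpu_interactive` QoS row (MaxTRES) and the
# MaxTime column of the `interactive` partition row in this diff.
MAX_CPUS = 16
MAX_MEM_GB = 30
MAX_NODES = 1
MAX_HOURS = 12


@dataclass(frozen=True)
class InteractiveRequest:
    """Hypothetical container describing one interactive allocation."""
    cpus: int
    mem_gb: int
    hours: int
    nodes: int = 1


def violations(req: InteractiveRequest) -> list[str]:
    """Return one message per per-job limit the request exceeds."""
    checks = [
        ("cpu", req.cpus, MAX_CPUS),
        ("mem", req.mem_gb, MAX_MEM_GB),
        ("node", req.nodes, MAX_NODES),
        ("time", req.hours, MAX_HOURS),
    ]
    return [
        f"{name}={value} exceeds {limit}"
        for name, value, limit in checks
        if value > limit
    ]


def salloc_command(req: InteractiveRequest) -> str:
    """Build an salloc command line, refusing requests over the limits."""
    problems = violations(req)
    if problems:
        raise ValueError("; ".join(problems))
    return (
        f"salloc --partition=interactive --nodes={req.nodes} "
        f"--cpus-per-task={req.cpus} --mem={req.mem_gb}G "
        f"--time={req.hours:02d}:00:00"
    )


if __name__ == "__main__":
    print(salloc_command(InteractiveRequest(cpus=4, mem_gb=8, hours=2)))
```

The sketch covers per-job limits only. The per-user limits (`cpu=32,mem=60G,node=1`) and the `FORCE:4` oversubscription depend on what else is running, so they are left to the scheduler. Limits may change, so the documented advice to verify the live partition configuration still applies.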