{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"],"fields":{"title":{"boost":1000.0},"text":{"boost":1.0},"tags":{"boost":1000000.0}}},"docs":[{"location":"","title":"Home","text":""},{"location":"#hpce-user-documentation","title":"HPCE User Documentation","text":"<p>The HPCE group is part of the PSI Center for Scientific Computing, Theory and Data at Paul Scherrer Institute. It provides a range of HPC services for PSI researchers, staff, and external collaborators, such as the Merlin series of HPC clusters. Furthermore the HPCE group engages in research activities on technologies (data analysis and machine learning technologies) used on these systems.</p>"},{"location":"cscs-userlab/introduction/","title":"PSI HPC@CSCS","text":""},{"location":"cscs-userlab/introduction/#psi-hpccscs","title":"PSI HPC@CSCS","text":"<p>PSI has a long standing collaboration with CSCS for offering high end HPC resources to PSI projects. PSI had co-invested in CSCS' initial Cray XT3 supercomputer Horizon in 2005 and we continue to procure a share on the CSCS flagship systems.</p> <p>The share is intended for projects that by their nature cannot profit from applying for regular CSCS user lab allocation schemes..</p> <p>We can also help PSI groups to procure additional resources based on the PSI conditions - please contact us in such a case.</p>"},{"location":"cscs-userlab/introduction/#yearly-survey-for-requesting-a-project-on-the-psi-share","title":"Yearly survey for requesting a project on the PSI share","text":"<p>At the end of each year we prepare a survey process and notify all subscribed users of the specialized PSI HPC@CSCS mailing list (see below) and the merlin cluster lists, to enter their next year resource requests. Projects receive resources in the form of allocations over the four quarters of the following year.</p> <p>The projects requests get reviewed and requests may get adapted to fit into the available capacity.</p> <p>The survey is done through ServiceNow, please navigate to Home > Service Catalog > Research Computing > Apply for computing resources at CSCS and submit the form.</p> <p>Applications will be reviewed and the final resource allocations, in case of oversubscription, will be arbitrated by a panel within CSD.</p>"},{"location":"cscs-userlab/introduction/#instructions-for-filling-out-the-2026-survey","title":"Instructions for filling out the 2026 survey","text":"<ul> <li>We have a budget of 100 kCHF for 2026, which translates to 435'000 multicore node hours or 35'600 node hours on the GPU Grace Hopper nodes.</li> <li>multicore projects: The minimum allocation is 10'000 node hours, an average project allocation amounts to 30'000 node hours</li> <li>GPU projects: The minimum allocation is 800 node hours, an average project allocation is 2000 node hours.</li> <li>You need to specify the total resource request for your project in node hours, and how you would like to split the resources over the 4 quarters. For the allocations per quarter year, please enter the number in percent (e.g. 25%, 25%, 25%, 25%). If you indicate nothing, a 25% per quarter will be assumed.</li> <li>We currently have a total of 65 TB of storage for all projects. 
Additional storage can be obtained, but large storage assignments are not in scope for these projects.</li> </ul>"},{"location":"cscs-userlab/introduction/#cscs-systems-reference-information","title":"CSCS Systems reference information","text":"<p>For 2025 we can offer access to CSCS Alps Eiger (CPU multicore) and Daint (GPU) systems.</p> <ul> <li>CSCS User Portal</li> <li>Documentation<ul> <li>CSCS Eiger CPU multicore cluster</li> <li>CSCS Daint GPU cluster</li> </ul> </li> </ul>"},{"location":"cscs-userlab/introduction/#contact-information","title":"Contact information","text":"<ul> <li>PSI Contacts:<ul> <li>Mailing list contact: psi-hpc-at-cscs-admin@lists.psi.ch<ul> <li>Marc Caubet Serrabou marc.caubet@psi.ch</li> <li>Derek Feichtinger derek.feichtinger@psi.ch</li> </ul> </li> <li>Mailing list for receiving user notifications and survey information: psi-hpc-at-cscs@lists.psi.ch (subscribe)</li> </ul> </li> </ul>"},{"location":"cscs-userlab/transfer-data/","title":"Transferring Data","text":""},{"location":"cscs-userlab/transfer-data/#transferring-data","title":"Transferring Data","text":"<p>This document shows how to transfer data between PSI and CSCS by using a Linux workstation.</p>"},{"location":"cscs-userlab/transfer-data/#preparing-ssh-configuration","title":"Preparing SSH configuration","text":"<p>If the directory <code>.ssh</code> does not exist in your home directory, create it with <code>0700</code> permissions:</p> Bash<pre><code>mkdir ~/.ssh\nchmod 0700 ~/.ssh\n</code></pre> <p>Then create a new file <code>.ssh/config</code> if it does not exist, or add the following lines to the already existing file, replacing <code>$cscs_accountname</code> with your CSCS <code>username</code>:</p> Bash<pre><code>Host daint.cscs.ch\n Compression yes\n ProxyJump ela.cscs.ch\nHost *.cscs.ch\n User $cscs_accountname\n</code></pre>"},{"location":"cscs-userlab/transfer-data/#advanced-ssh-configuration","title":"Advanced SSH configuration","text":"<p>There are many different SSH settings available which allow advanced configurations. Users may already have some configuration present and would therefore need to adapt it accordingly.</p>"},{"location":"cscs-userlab/transfer-data/#transferring-files","title":"Transferring files","text":"<p>Once the above configuration is set, you can rsync between Merlin and CSCS in either direction:</p> Bash<pre><code># CSCS -> PSI\nrsync -azv daint.cscs.ch:<source_path> <destination_path>\n\n# PSI -> CSCS\nrsync -azv <source_path> daint.cscs.ch:<destination_path>\n</code></pre>"},{"location":"gmerlin6/cluster-introduction/","title":"Introduction","text":""},{"location":"gmerlin6/cluster-introduction/#introduction","title":"Introduction","text":""},{"location":"gmerlin6/cluster-introduction/#about-merlin6-gpu-cluster","title":"About Merlin6 GPU cluster","text":""},{"location":"gmerlin6/cluster-introduction/#introduction_1","title":"Introduction","text":"<p>Merlin6 was the official PSI Local HPC cluster for development and mission-critical applications, built in 2019. 
It replaced the Merlin5 cluster.</p> <p>Merlin6 was designed to be extensible, so it was technically possible to add more compute nodes and cluster storage without a significant increase in manpower and operational costs.</p> <p>Merlin6 was mostly based on CPU resources, but also contained a small amount of GPU-based resources which are mostly used by the BIO experiments.</p>"},{"location":"gmerlin6/cluster-introduction/#slurm-gmerlin6","title":"Slurm 'gmerlin6'","text":"<p>The GPU nodes have a dedicated Slurm cluster, called <code>gmerlin6</code>.</p> <p>This cluster contains the same shared storage resources (<code>/data/user</code>, <code>/data/project</code>, <code>/shared-scratch</code>, <code>/afs</code>, <code>/psi/home</code>) which are present in the Merlin6 Slurm CPU cluster. The Slurm <code>gmerlin6</code> cluster is maintained independently to ease access for the users and to keep independent user accounting.</p>"},{"location":"gmerlin6/cluster-introduction/#merlin6-architecture","title":"Merlin6 Architecture","text":""},{"location":"gmerlin6/cluster-introduction/#merlin6-cluster-architecture-diagram","title":"Merlin6 Cluster Architecture Diagram","text":"<p>The following image shows the Merlin6 cluster architecture diagram:</p> <p></p>"},{"location":"gmerlin6/cluster-introduction/#merlin6-slurm-cluster-architecture-design","title":"Merlin6 Slurm Cluster Architecture Design","text":"<p>The following image shows the Slurm architecture design for the Merlin6 clusters:</p> <p></p>"},{"location":"gmerlin6/hardware-and-software-description/","title":"Hardware And Software Description","text":""},{"location":"gmerlin6/hardware-and-software-description/#hardware-and-software-description","title":"Hardware And Software Description","text":""},{"location":"gmerlin6/hardware-and-software-description/#hardware","title":"Hardware","text":""},{"location":"gmerlin6/hardware-and-software-description/#gpu-computing-nodes","title":"GPU Computing Nodes","text":"<p>The GPU Merlin6 cluster was initially built from recycled workstations from different groups in the BIO division. Since then, it was updated little by little with new nodes through sporadic investments from the same division, and a large central investment was never possible. 
As a result, the Merlin6 GPU computing cluster is a non-homogeneous solution, consisting of a wide variety of hardware types and components.</p> <p>In 2018, for the common good, BIO decided to open the cluster to the Merlin users and make it widely accessible to PSI scientists.</p> <p>The below table summarizes the hardware setup for the old Merlin6 GPU computing nodes:</p> Merlin6 GPU Computing Nodes Node Processor Sockets Cores Threads Scratch Memory GPUs GPU Model merlin-g-001 Intel Core i7-5960X 1 16 2 1.8TB 128GB 2 GTX1080 merlin-g-00[2-5] Intel Xeon E5-2640 2 20 1 1.8TB 128GB 4 GTX1080 merlin-g-006 Intel Xeon E5-2640 2 20 1 800GB 128GB 4 GTX1080Ti merlin-g-00[7-9] Intel Xeon E5-2640 2 20 1 3.5TB 128GB 4 GTX1080Ti merlin-g-01[0-3] Intel Xeon Silver 4210R 2 20 1 1.7TB 128GB 4 RTX2080Ti merlin-g-014 Intel Xeon Gold 6240R 2 48 1 2.9TB 384GB 8 RTX2080Ti merlin-g-015 Intel(R) Xeon Gold 5318S 2 48 1 2.9TB 384GB 8 RTX A5000"},{"location":"gmerlin6/hardware-and-software-description/#gwendolen","title":"Gwendolen","text":"<p>Currently only Gwendolen is available on <code>gmerlin6</code>.</p>"},{"location":"gmerlin6/hardware-and-software-description/#login-nodes","title":"Login Nodes","text":"<p>The login nodes are part of the Merlin6 HPC cluster, and are used to compile and to submit jobs to the different Merlin Slurm clusters (<code>merlin5</code>,<code>merlin6</code>,<code>gmerlin6</code>,etc.). Please refer to the Merlin6 Hardware Documentation for further information.</p>"},{"location":"gmerlin6/hardware-and-software-description/#storage","title":"Storage","text":"<p>The storage is part of the Merlin6 HPC cluster, and is mounted in all the Slurm clusters (<code>merlin5</code>,<code>merlin6</code>,<code>gmerlin6</code>,etc.). Please refer to the Merlin6 Hardware Documentation for further information.</p>"},{"location":"gmerlin6/hardware-and-software-description/#network","title":"Network","text":"<p>The Merlin6 cluster connectivity is based on the Infiniband FDR and EDR technologies. This allows fast access with very low latencies to the data as well as running extremely efficient MPI-based jobs. 
To check the network speed (56Gbps for FDR, 100Gbps for EDR) of the different machines, it can be checked by running on each node the following command:</p> Bash<pre><code>ibstat | grep Rate\n</code></pre>"},{"location":"gmerlin6/hardware-and-software-description/#software","title":"Software","text":"<p>In the Merlin6 GPU computing nodes, we try to keep software stack coherency with the main cluster Merlin6.</p> <p>Due to this, the Merlin6 GPU nodes run:</p> <ul> <li>RedHat Enterprise Linux 7</li> <li>Slurm, we usually try to keep it up to date with the most recent versions.</li> <li>GPFS v5</li> <li>MLNX_OFED LTS v.5.2-2.2.0.0 or newer for all ConnectX-4 or superior cards.</li> </ul>"},{"location":"gmerlin6/slurm-configuration/","title":"Slurm cluster 'gmerlin6'","text":""},{"location":"gmerlin6/slurm-configuration/#slurm-cluster-gmerlin6","title":"Slurm cluster 'gmerlin6'","text":"<p>This documentation shows basic Slurm configuration and options needed to run jobs in the GPU cluster.</p>"},{"location":"gmerlin6/slurm-configuration/#merlin6-gpu-nodes-definition","title":"Merlin6 GPU nodes definition","text":"<p>The table below shows a summary of the hardware setup for the different GPU nodes</p> Nodes Def.#CPUs Max.#CPUs #Threads Def.Mem/CPU Max.Mem/CPU Max.Mem/Node Max.Swap GPU Type Def.#GPUs Max.#GPUs merlin-g-[001] 1 core 8 cores 1 5120 102400 102400 10000 geforce_gtx_1080 1 2 merlin-g-[002-005] 1 core 20 cores 1 5120 102400 102400 10000 geforce_gtx_1080 1 4 merlin-g-[006-009] 1 core 20 cores 1 5120 102400 102400 10000 geforce_gtx_1080_ti 1 4 merlin-g-[010-013] 1 core 20 cores 1 5120 102400 102400 10000 geforce_rtx_2080_ti 1 4 merlin-g-014 1 core 48 cores 1 5120 360448 360448 10000 geforce_rtx_2080_ti 1 8 merlin-g-015 1 core 48 cores 1 5120 360448 360448 10000 A5000 1 8 merlin-g-100 1 core 128 cores 2 3900 998400 998400 10000 A100 1 8 <p>Tip</p> <p>Always check <code>/etc/slurm/gres.conf</code> and <code>/etc/slurm/slurm.conf</code> for changes in the GPU type and details of the hardware.</p>"},{"location":"gmerlin6/slurm-configuration/#running-jobs-in-the-gmerlin6-cluster","title":"Running jobs in the 'gmerlin6' cluster","text":"<p>In this chapter we will cover basic settings that users need to specify in order to run jobs in the GPU cluster.</p>"},{"location":"gmerlin6/slurm-configuration/#merlin6-gpu-cluster","title":"Merlin6 GPU cluster","text":"<p>To run jobs in the <code>gmerlin6</code> cluster users must specify the cluster name in Slurm:</p> Bash<pre><code>#SBATCH --cluster=gmerlin6\n</code></pre>"},{"location":"gmerlin6/slurm-configuration/#merlin6-gpu-partitions","title":"Merlin6 GPU partitions","text":"<p>Users might need to specify the Slurm partition. If no partition is specified, it will default to <code>gpu</code>:</p> Bash<pre><code>#SBATCH --partition=<partition_name> # Possible <partition_name> values: gpu, gpu-short, gwendolen\n</code></pre> <p>The table below resumes shows all possible partitions available to users:</p> GPU Partition Default Time Max Time PriorityJobFactor PriorityTier <code>gpu</code> 1 day 1 week 1 1 <code>gpu-short</code> 2 hours 2 hours 1000 500 <code>gwendolen</code> 30 minutes 2 hours 1000 1000 <code>gwendolen-long</code> 30 minutes 8 hours 1 1 <p>The PriorityJobFactor value will be added to the job priority (PARTITION column in <code>sprio -l</code> ). In other words, jobs sent to higher priority partitions will usually run first (however, other factors such like job age or mainly fair share might affect to that decision). 
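The per-partition contribution to the priority of pending jobs can be inspected, for example, with (a usage sketch for the <code>gmerlin6</code> cluster, assuming you already have jobs queued):</p> Bash<pre><code>sprio -M gmerlin6 -l\n</code></pre> <p>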
For the GPU partitions, Slurm will also first attempt to allocate jobs on partitions with higher priority before partitions with lower priority.</p> <p>Jobs submitted to a partition with a higher PriorityTier value will be dispatched before pending jobs in partitions with lower PriorityTier values and, if possible, they will preempt running jobs from partitions with lower PriorityTier values.</p> <p><code>gwendolen-long</code> is a special partition which is enabled during non-working hours only. As of Nov 2023, the current policy is to disable this partition from Mon to Fri, from 1am to 5pm. However, jobs can be submitted anytime, but can only be scheduled outside this time range.</p>"},{"location":"gmerlin6/slurm-configuration/#merlin6-gpu-accounts","title":"Merlin6 GPU Accounts","text":"<p>Users need to ensure that the public <code>merlin</code> account is specified. Not specifying any account option will default to this account.</p> <p>This is mostly relevant for users who have multiple Slurm accounts and might specify a different account by mistake.</p> Bash<pre><code>#SBATCH --account=merlin # Possible values: merlin, gwendolen\n</code></pre> <p>Not all the accounts can be used on all partitions. This is summarized in the table below:</p> Slurm Account Slurm Partitions <code>merlin</code> <code>gpu</code>,<code>gpu-short</code> <code>gwendolen</code> <code>gwendolen</code>,<code>gwendolen-long</code> <p>By default, all users belong to the <code>merlin</code> Slurm account, and jobs are submitted to the <code>gpu</code> partition when no partition is defined.</p> <p>Users only need to specify the <code>gwendolen</code> account when using the <code>gwendolen</code> or <code>gwendolen-long</code> partitions; otherwise specifying an account is not needed (it will always default to <code>merlin</code>).</p>"},{"location":"gmerlin6/slurm-configuration/#the-gwendolen-account","title":"The 'gwendolen' account","text":"<p>For running jobs in the <code>gwendolen</code>/<code>gwendolen-long</code> partitions, users must specify the <code>gwendolen</code> account. The <code>merlin</code> account is not allowed to use the Gwendolen partitions.</p> <p>Gwendolen is restricted to a set of users belonging to the <code>unx-gwendolen</code> Unix group. If you belong to a project allowed to use Gwendolen, or you would like to have access to it, please request access to the <code>unx-gwendolen</code> Unix group through PSI Service Now: the request will be redirected to the person responsible for the project (Andreas Adelmann).</p>"},{"location":"gmerlin6/slurm-configuration/#slurm-gpu-specific-options","title":"Slurm GPU specific options","text":"<p>Some options are available when using GPUs. These are detailed here.</p>"},{"location":"gmerlin6/slurm-configuration/#number-of-gpus-and-type","title":"Number of GPUs and type","text":"<p>When using the GPU cluster, users must specify the number of GPUs they need to use:</p> Bash<pre><code>#SBATCH --gpus=[<type>:]<number>\n</code></pre> <p>The GPU type is optional: if left empty, Slurm will try to allocate any type of GPU. The valid <code>[<type>:]</code> values and <code><number></code> of GPUs depend on the node. 
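For example, a minimal (illustrative) job header requesting two <code>geforce_gtx_1080</code> cards on the public <code>gpu</code> partition could look like this:</p> Bash<pre><code>#SBATCH --cluster=gmerlin6\n#SBATCH --partition=gpu\n#SBATCH --gpus=geforce_gtx_1080:2\n</code></pre> <p>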
The valid combinations of GPU type and number per node are detailed in the table below.</p> Nodes GPU Type #GPUs merlin-g-[001] <code>geforce_gtx_1080</code> 2 merlin-g-[002-005] <code>geforce_gtx_1080</code> 4 merlin-g-[006-009] <code>geforce_gtx_1080_ti</code> 4 merlin-g-[010-013] <code>geforce_rtx_2080_ti</code> 4 merlin-g-014 <code>geforce_rtx_2080_ti</code> 8 merlin-g-015 <code>A5000</code> 8 merlin-g-100 <code>A100</code> 8"},{"location":"gmerlin6/slurm-configuration/#constraint-features","title":"Constraint / Features","text":"<p>Instead of specifying the GPU type, users sometimes need to select a GPU by the amount of memory available on the GPU card itself.</p> <p>This is defined in Slurm with Features, a tag which specifies the GPU memory of the different GPU cards. Users can specify which GPU memory size needs to be used with the <code>--constraint</code> option. In that case, there is usually no need to specify <code>[<type>:]</code> in the <code>--gpus</code> option.</p> Bash<pre><code>#SBATCH --constraint=<Feature> # Possible values: gpumem_8gb, gpumem_11gb, gpumem_24gb, gpumem_40gb\n</code></pre> <p>The table below shows the available Features and which GPU card models and GPU nodes they belong to:</p> Merlin6 GPU Computing Nodes Nodes GPU Type Feature merlin-g-[001-005] `geforce_gtx_1080` `gpumem_8gb` merlin-g-[006-009] `geforce_gtx_1080_ti` `gpumem_11gb` merlin-g-[010-014] `geforce_rtx_2080_ti` `gpumem_11gb` merlin-g-015 `A5000` `gpumem_24gb` merlin-g-100 `A100` `gpumem_40gb`"},{"location":"gmerlin6/slurm-configuration/#other-gpu-options","title":"Other GPU options","text":"<p>Alternative Slurm options for GPU-based jobs are available. Please refer to the man pages of each Slurm command for further information (<code>man salloc</code>, <code>man sbatch</code>, <code>man srun</code>). The most common settings are listed below:</p> Bash<pre><code>#SBATCH --hint=[no]multithread\n#SBATCH --ntasks=<ntasks>\n#SBATCH --ntasks-per-gpu=<ntasks>\n#SBATCH --mem-per-gpu=<size[units]>\n#SBATCH --cpus-per-gpu=<ncpus>\n#SBATCH --gpus-per-node=[<type>:]<number>\n#SBATCH --gpus-per-socket=[<type>:]<number>\n#SBATCH --gpus-per-task=[<type>:]<number>\n#SBATCH --gpu-bind=[verbose,]<type>\n</code></pre> <p>Please notice that once <code>[<type>:]</code> is defined in one option, all other options must use it too!</p>"},{"location":"gmerlin6/slurm-configuration/#dealing-with-hyper-threading","title":"Dealing with Hyper-Threading","text":"<p>The <code>gmerlin6</code> cluster contains the partitions <code>gwendolen</code> and <code>gwendolen-long</code>, which have a node with Hyper-Threading enabled.</p> <p>In that case, one should always specify whether to use Hyper-Threading or not. If not defined, Slurm will generally use it (exceptions apply). For this machine, HT is generally recommended.</p> Bash<pre><code>#SBATCH --hint=multithread # Use extra threads with in-core multi-threading.\n#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.\n</code></pre>"},{"location":"gmerlin6/slurm-configuration/#user-and-job-limits","title":"User and job limits","text":"<p>The GPU cluster enforces some basic user and job limits to ensure that a single user can not overuse the resources and to guarantee a fair usage of the cluster. The limits are described below.</p>"},{"location":"gmerlin6/slurm-configuration/#per-job-limits","title":"Per job limits","text":"<p>These are limits that apply to a single job. 
In other words, there is a maximum amount of resources a single job can use. Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format: <code>SlurmQoS(limits)</code> (possible <code>SlurmQoS</code> values can be listed with the command <code>sacctmgr show qos</code>):</p> Partition Slurm Account Mon-Sun 0h-24h gpu <code>merlin</code> gpu_week(gres/gpu=8) gpu-short <code>merlin</code> gpu_week(gres/gpu=8) gwendolen <code>gwendolen</code> No limits gwendolen-long <code>gwendolen</code> No limits, active from 9pm to 5:30am <ul> <li> <p>With the limits in the public <code>gpu</code> and <code>gpu-short</code> partitions, a single job using the <code>merlin</code> account (the default account) can not use more than 40 CPUs, more than 8 GPUs or more than 200GB. Any job exceeding such limits will stay in the queue with the message <code>QOSMax[Cpu|GRES|Mem]PerJob</code>. Since there is no additional QoS that temporarily overrides these job limits during the week (as happens, for instance, in the CPU daily partition), the job needs to be cancelled and resubmitted with resource requests adapted to the above limits.</p> </li> <li> <p>The gwendolen and gwendolen-long partitions are two special partitions for an NVIDIA DGX A100 machine.</p> <p>Only users belonging to the <code>unx-gwendolen</code> Unix group can run in these partitions. No limits are applied (machine resources can be completely used).</p> </li> <li> <p>The <code>gwendolen-long</code> partition is available 24h. However,</p> <ul> <li>from 5:30am to 9pm the partition is <code>down</code> (jobs can be submitted, but can not run until the partition is set to <code>active</code>).</li> <li>from 9pm to 5:30am jobs are allowed to run (partition is set to <code>active</code>).</li> </ul> </li> </ul>"},{"location":"gmerlin6/slurm-configuration/#per-user-limits-for-gpu-partitions","title":"Per user limits for GPU partitions","text":"<p>These limits apply exclusively to users. In other words, there is a maximum amount of resources a single user can use. Limits are defined using QoS, and this is usually set at the partition level. Limits are described in the table below with the format: <code>SlurmQoS(limits)</code> (possible <code>SlurmQoS</code> values can be listed with the command <code>sacctmgr show qos</code>):</p> Partition Slurm Account Mon-Sun 0h-24h gpu <code>merlin</code> gpu_week(gres/gpu=16) gpu-short <code>merlin</code> gpu_week(gres/gpu=16) gwendolen <code>gwendolen</code> No limits gwendolen-long <code>gwendolen</code> No limits, active from 9pm to 5:30am <ul> <li> <p>With the limits in the public <code>gpu</code> and <code>gpu-short</code> partitions, a single user can not use more than 80 CPUs, more than 16 GPUs or more than 400GB.</p> <p>Jobs sent by any user already exceeding such limits will stay in the queue with the message <code>QOSMax[Cpu|GRES|Mem]PerUser</code>. In that case, the jobs will wait in the queue until some of the running resources are freed.</p> </li> <li> <p>Notice that user limits are wider than job limits. 
In that way, a user can run up to two 8 GPUs based jobs, or up to four 4 GPUs based jobs, etc.</p> </li> </ul> <p>Warning</p> <p>Please try to avoid occupying all GPUs of the same type for several hours or multiple days, otherwise it would block other users needing the same type of GPU.</p>"},{"location":"gmerlin6/slurm-configuration/#advanced-slurm-configuration","title":"Advanced Slurm configuration","text":"<p>Clusters at PSI use the Slurm Workload Manager as the batch system technology for managing and scheduling jobs. Slurm has been installed in a multi-clustered configuration, allowing to integrate multiple clusters in the same batch system.</p> <p>For understanding the Slurm configuration setup in the cluster, sometimes may be useful to check the following files:</p> <ul> <li><code>/etc/slurm/slurm.conf</code> - can be found in the login nodes and computing nodes.</li> <li><code>/etc/slurm/gres.conf</code> - can be found in the GPU nodes, is also propgated to login nodes and computing nodes for user read access.</li> <li><code>/etc/slurm/cgroup.conf</code> - can be found in the computing nodes, is also propagated to login nodes for user read access.</li> </ul> <p>The previous configuration files which can be found in the login nodes, correspond exclusively to the merlin6 cluster configuration files.</p> <p>Configuration files for the old merlin5 cluster or for the gmerlin6 cluster must be checked directly on any of the merlin5 or gmerlin6 computing nodes (in example, by login in to one of the nodes while a job or an active allocation is running).</p>"},{"location":"meg/contact/","title":"Support","text":""},{"location":"meg/contact/#support","title":"Support","text":"<p>Support can be asked through:</p> <ul> <li>PSI Service Now</li> <li>E-Mail: meg-admins@lists.psi.ch</li> </ul> <p>Basic contact information is also displayed on every shell login to the system using the Message of the Day mechanism.</p>"},{"location":"meg/contact/#psi-service-now","title":"PSI Service Now","text":"<p>PSI Service Now: is the official PSI tool for opening incident requests. However, contact via email (see below) is preferred.</p> <ul> <li>PSI HelpDesk will redirect the incident to the corresponding department, or</li> <li>you can always assign it directly by checking the box <code>I know which service is affected</code> and providing the service name <code>Local HPC Resources (e.g. MEG) [CF]</code> (just type in <code>Local</code> and you should get the valid completions).</li> </ul>"},{"location":"meg/contact/#contact-meg-administrators","title":"Contact Meg Administrators","text":"<p>E-Mail meg-admins@lists.psi.ch or merlin-admins@lists.psi.ch</p> <ul> <li>This is the preferred way to contact MEG Administrators. Do not hesitate to contact us for such cases.</li> </ul>"},{"location":"meg/contact/#get-updated-through-the-merlin-user-list","title":"Get updated through the Merlin User list","text":"<p>Is strongly recommended that users subscribe to the Merlin Users mailing list: merlin-users@lists.psi.ch</p> <p>This mailing list is the official channel used by Merlin administrators to inform users about downtimes, interventions or problems. Users can be subscribed in two ways:</p> <ul> <li>(Preferred way) Self-registration through Sympa</li> <li>If you need to subscribe many people (e.g. 
your whole group) by sending a request to the admin list merlin-admins@lists.psi.ch and providing a list of email addresses.</li> </ul>"},{"location":"meg/contact/#the-meg-cluster-team","title":"The MEG Cluster Team","text":"<p>The PSI Merlin and MEG clusters are managed by the High Performance Computing and Emerging technologies Group, which is part of the Science IT Infrastructure, and Services department (AWI) in PSI's Center for Scientific Computing, Theory and Data (SCD).</p>"},{"location":"meg/introduction/","title":"The MEG local HPC cluster","text":""},{"location":"meg/introduction/#the-meg-local-hpc-cluster","title":"The MEG local HPC cluster","text":"<p>The MEG II collaboration includes almost 70 physicists from research institutions from five countries. Researchers and technicians from PSI have played a leading role, particularly with providing the high-quality beam, technical support in the detector integration, and in the design, construction, and operation of the detector readout electronics.\"</p> <p>\u2014\u2014 Source</p> <p>The MEG data analysis cluster is a cluster tightly coupled to Merlin and dedicated to the analysis of data from the MEG experiment. Operated for the Muon Physics group.</p>"},{"location":"meg/migration-to-merlin7/","title":"Meg to Merlin7 Migration Guide","text":""},{"location":"meg/migration-to-merlin7/#meg-to-merlin7-migration-guide","title":"Meg to Merlin7 Migration Guide","text":"<p>Welcome to the official documentation for migrating experiment data from MEG to Merlin7. Please follow the instructions carefully to ensure a smooth and secure transition.</p>"},{"location":"meg/migration-to-merlin7/#directory-structure-changes","title":"Directory Structure Changes","text":""},{"location":"meg/migration-to-merlin7/#meg-vs-merlin6-vs-merlin7","title":"Meg vs Merlin6 vs Merlin7","text":"Cluster Home Directory User Data Directory Experiment data Additional notes merlin6 /psi/home/<code>$USER</code> /data/user/<code>$USER</code> /data/experiments/meg Symlink /meg meg /meg/home/<code>$USER</code> N/A /meg merlin7 /data/user/<code>$USER</code> /data/user/<code>$USER</code> /data/project/meg <ul> <li>The Merlin6 home and user data directores have been merged into the single new home directory <code>/data/user/$USER</code> on Merlin7.<ul> <li>This is the same for the home directory in the meg cluster, which has to be merged into <code>/data/user/$USER</code> on Merlin7.</li> <li>Users are responsible for moving the data.</li> </ul> </li> <li>The experiment directory has been integrated into <code>/data/project/meg</code>.</li> </ul>"},{"location":"meg/migration-to-merlin7/#recommended-cleanup-actions","title":"Recommended Cleanup Actions","text":"<ul> <li>Remove unused files and datasets.</li> <li>Archive large, inactive data sets.</li> </ul>"},{"location":"meg/migration-to-merlin7/#mandatory-actions","title":"Mandatory Actions","text":"<ul> <li>Stop activity on Meg and Merlin6 when performing the last rsync.</li> </ul>"},{"location":"meg/migration-to-merlin7/#migration-instructions","title":"Migration Instructions","text":""},{"location":"meg/migration-to-merlin7/#preparation","title":"Preparation","text":"<p>A <code>experiment_migration.setup</code> migration script must be executed from any MeG node using the account that will perform the migration.</p>"},{"location":"meg/migration-to-merlin7/#when-using-the-local-root-account","title":"When using the local <code>root</code> account","text":"<ul> <li>The script must be executed after every reboot of the 
destination nodes.</li> <li>Reason: On Merlin7, the home directory for the <code>root</code> user resides on ephemeral storage (no physical disk).</li> </ul> <p>After a reboot, this directory is cleaned, so SSH keys need to be redeployed before running the migration again.</p>"},{"location":"meg/migration-to-merlin7/#when-using-a-psi-active-directory-ad-account","title":"When using a PSI Active Directory (AD) account","text":"<ul> <li>Applicable accounts include, for example:<ul> <li><code>gac-meg2_data</code></li> <li><code>gac-meg2</code></li> </ul> </li> <li>The script only needs to be executed once, provided that:<ul> <li>The home directory for the AD account is located on a shared storage area.</li> <li>This shared storage is accessible from the node executing the transfer.</li> </ul> </li> <li>Reason: On Merlin7, these accounts have their home directories on persistent shared storage, so the SSH keys remain available across reboots.</li> </ul> <p>To run it:</p> Bash<pre><code>experiment_migration.setup\n</code></pre> <p>This script will:</p> <ul> <li>Check that you have an account on Merlin7.</li> <li>Configure and check that your environment is ready for transferring files via Slurm job.</li> </ul> <p>If there are issues, the script will:</p> <ul> <li>Print clear diagnostic output</li> <li>Give you some hints to resolve the issue</li> </ul> <p>If you are stuck, email: merlin-admins@lists.psi.ch/meg-admins@lists.psi.ch</p>"},{"location":"meg/migration-to-merlin7/#migration-procedure","title":"Migration Procedure","text":"<ol> <li> <p>Run an initial sync, ideally within a <code>tmux</code> session</p> <ul> <li>This copies the bulk of the data from MeG to Merlin7.</li> <li>IMPORTANT: Do not modify the destination directories</li> <li>Please, before starting the transfer ensure that:<ul> <li>The source and destination directories are correct.</li> <li>The destination directories exist.</li> </ul> </li> </ul> </li> <li> <p>Run additional syncs if needed</p> <ul> <li>Subsequent syncs can be executed to transfer changes.</li> <li>Ensure that only one sync for the same directory runs at a time.</li> <li>Multiple syncs are often required since the first one may take several hours or even days.</li> </ul> </li> <li> <p>Schedule a date for the final migration:</p> <ul> <li>Any activity must be stopped on the source directory.</li> <li>In the same way, no activity must be done on the destination until the migration is complete.</li> </ul> </li> <li> <p>Perform a final sync with the <code>-E</code> option (if it applies)</p> <ul> <li>Use <code>-E</code> only if you need to delete files on the destination that were removed from the source.</li> <li>This ensures the destination becomes an exact mirror of the source.</li> <li>Never use <code>-E</code> after the destination has gone into production, as it will delete new data created there.</li> </ul> </li> <li> <p>Disable access on the source folder.</p> </li> <li>Enable access on the destination folder.<ul> <li>At this point, no new syncs have to be performed.</li> </ul> </li> </ol> <p>Important</p> <p>The <code>-E</code> option is destructive; handle with care. Always verify that the destination is ready before triggering the final sync. 
For optimal performance, use up to 12 threads with the -t option.</p>"},{"location":"meg/migration-to-merlin7/#running-the-migration-script","title":"Running The Migration Script","text":"<p>The migration script is installed on the <code>meg-s-001</code> server at: <code>/usr/local/bin/experiment_migration.bash</code></p> <p>This script is primarily a wrapper around <code>fpsync</code>, providing additional logic for synchronizing MeG experiment data.</p> Bash<pre><code>[root@meg-s-001 ~]# experiment_migration.bash --help\nUsage: /usr/local/bin/experiment_migration.bash [options] -p <project_name>\n\nOptions:\n -t | --threads N Number of parallel threads (default: 10). Recommended 12 as max.\n -b | --experiment-src-basedir DIR Experiment base directory (default: /meg)\n -S | --space-source SPACE Source project space name (default: data1)\n -B | --experiment-dst-basedir DIR Experiment base directory (default: /data/project/meg)\n -D | --space-destination SPACE Destination project space name (default: data1)\n -p | --project-name PRJ_NAME Mantadory field. MeG project name. Examples:\n - 'online'\n - 'offline'\n - 'shared'\n -F | --force-destination-mkdir Create the destination parent directory (default: false)\n Example: mkdir -p $(dirname /data/project/meg/data1/PROJECT_NAME)\n Result: mkdir -p /data/project/meg/data1\n -s | --split N Number of files per split (default: 20000)\n -f | --filesize SIZE File size threshold (default: 100G)\n -r | --runid ID Reuse an existing runid session\n -l | --list-runids List available runid sessions and exit\n -x | --delete-runid Delete runid. Requires: -r | --runid ID\n -E | --rsync-delete-option [WARNING] Use this to delete files in the destination\n which are not present in the source any more.\n [WARNING] USE THIS OPTION CAREFULLY!\n Typically used in last rsync to have an exact\n mirror of the source directory.\n [WARNING] Some files in destination might be deleted!\n Use 'man fpsync' for more information.\n\n -h | --help Show this help message\n -v | --verbose Run fpsync with -v option\n</code></pre> <p>Tip</p> <p>Defaults can be updated if necessary.</p>"},{"location":"meg/migration-to-merlin7/#migration-examples","title":"Migration examples","text":""},{"location":"meg/migration-to-merlin7/#example-migrating-the-entire-online-directory","title":"Example: Migrating the Entire <code>online</code> Directory","text":"<p>The following example demonstrates how to migrate the entire <code>online</code> directory.</p> <p>Tip</p> <p>You may also choose to migrate only specific subdirectories if needed. However, migrating full directories is generally simpler and less error-prone compared to handling multiple subdirectory migrations.</p> Bash<pre><code>[root@meg-s-001 ~]# experiment_migration.bash -S data1 -D data1 -p \"online\"\n\ud83d\udd04 Transferring project:\n From: /meg/data1/online\n To: login001.merlin7.psi.ch:/data/project/meg/data1/online\n Threads: 10 | Split: 20000 files | Max size: 100G\n RunID:\n\nPlease confirm to start (y/N):\n\u274c Transfer cancelled by user.\n</code></pre>"},{"location":"meg/migration-to-merlin7/#example-migrating-a-specific-subdirectory","title":"Example: Migrating a Specific Subdirectory","text":"<p>The following example demonstrates how to migrate only a subdirectory. 
In this case, we use the option <code>-F</code> to create the parent directory at the destination, ensuring that it exists before transferring:</p> <p>\u26a0\ufe0f Important:</p> <ul> <li>When migrating a subdirectory, do not run concurrent migrations on its parent directories.</li> <li>For example, avoid running migrations with <code>-p \"shared\"</code> while simultaneously migrating <code>-p \"shared/subprojects\"</code>.</li> </ul> Bash<pre><code>[root@meg-s-001 ~]# experiment_migration.bash -p \"shared/subprojects/meg1\" -F\n\ud83d\udd04 Transferring project:\n From: /meg/data1/shared/subprojects/meg1\n To: login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1\n Threads: 10 | Split: 20000 files | Max size: 100G\n RunID:\n\nPlease confirm to start (y/N): N\n\u274c Transfer cancelled by user.\n</code></pre> <p>This command initiates the migration of the directory, creating the destination parent directory (<code>-F</code> option):</p> <ul> <li>Creates the destination directory as follows:</li> </ul> Bash<pre><code>ssh login002.merlin7.psi.ch mkdir -p /data/project/meg/data1/shared/subprojects\n</code></pre> <ul> <li>Runs <code>fpsync</code> with 10 threads and N parts of max 20000 files or 100G each:<ul> <li>Source: <code>/meg/data1/shared/subprojects/meg1</code></li> <li>Destination: <code>login002.merlin7.psi.ch:/data/project/meg/data1/shared/subprojects/meg1</code></li> </ul> </li> </ul>"},{"location":"merlin5/cluster-introduction/","title":"Merlin 5 Cluster","text":""},{"location":"merlin5/cluster-introduction/#merlin-5-cluster","title":"Merlin 5 Cluster","text":""},{"location":"merlin5/cluster-introduction/#slurm-cluster","title":"Slurm cluster","text":"<p>Merlin5 was the old official PSI Local HPC cluster for development and mission-critical applications which was built in 2016-2017. It was an extension of the Merlin4 cluster and built from existing hardware due to a lack of central investment in Local HPC resources. Merlin5 was then replaced by the Merlin6 cluster in 2019, with an important central investment of ~1.5M CHF. Merlin5 was mostly based on CPU resources, but also contained a small amount of GPU-based resources which were mostly used by the BIO experiments.</p> <p>Merlin5 has been kept as a Local HPC Slurm cluster, called <code>merlin5</code>. In that way, the old CPU computing nodes are still available as extra computation resources, and as an extension of the official production <code>merlin6</code> Slurm cluster.</p> <p>The old Merlin5 login nodes, GPU nodes and storage were fully migrated to the Merlin6 cluster, which became the main Local HPC cluster. Hence, Merlin6 contains the storage which is mounted on the different Merlin HPC Slurm clusters (<code>merlin5</code>, <code>merlin6</code>, <code>gmerlin6</code>).</p>"},{"location":"merlin5/cluster-introduction/#submitting-jobs-to-merlin5","title":"Submitting jobs to 'merlin5'","text":"<p>Jobs must be submitted to the <code>merlin5</code> Slurm cluster from the Merlin6 login nodes by using the option <code>--clusters=merlin5</code> with any of the Slurm commands (<code>sbatch</code>, <code>salloc</code>, <code>srun</code>, etc.).</p>
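 <p>For example (an illustrative sketch; <code>job.sh</code> is a placeholder for your own batch script):</p> Bash<pre><code>sbatch --clusters=merlin5 job.sh # 'job.sh' stands for your own batch script\n</code></pre>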
"},{"location":"merlin5/cluster-introduction/#the-merlin-architecture","title":"The Merlin Architecture","text":""},{"location":"merlin5/cluster-introduction/#multi-non-federated-cluster-architecture-design-the-merlin-cluster","title":"Multi Non-Federated Cluster Architecture Design: The Merlin cluster","text":"<p>The following image shows the Slurm architecture design for the Merlin cluster. It contains a multi non-federated cluster setup, with a central Slurm database and multiple independent clusters (<code>merlin5</code>, <code>merlin6</code>, <code>gmerlin6</code>):</p> <p></p>"},{"location":"merlin5/hardware-and-software-description/","title":"Hardware And Software Description","text":""},{"location":"merlin5/hardware-and-software-description/#hardware-and-software-description","title":"Hardware And Software Description","text":""},{"location":"merlin5/hardware-and-software-description/#hardware","title":"Hardware","text":""},{"location":"merlin5/hardware-and-software-description/#computing-nodes","title":"Computing Nodes","text":"<p>Merlin5 is built from recycled nodes, and hardware will be decommissioned as soon as it fails (due to expired warranty and age of the cluster).</p> <ul> <li>Merlin5 is based on the HPE c7000 Enclosure solution, with 16 x HPE ProLiant BL460c Gen8 nodes per chassis.</li> <li>Connectivity is based on Infiniband ConnectX-3 QDR-40Gbps<ul> <li>16 internal ports for intra-chassis communication</li> <li>2 connected external ports for inter-chassis communication and storage access.</li> </ul> </li> </ul> <p>The below table summarizes the hardware setup for the Merlin5 computing nodes:</p> Merlin5 CPU Computing Nodes Chassis Node Processor Sockets Cores Threads Scratch Memory #0 merlin-c-[18-30] Intel Xeon E5-2670 2 16 1 50GB 64GB merlin-c-[31,32] 128GB #1 merlin-c-[33-45] Intel Xeon E5-2670 2 16 1 50GB 64GB merlin-c-[46,47] 128GB"},{"location":"merlin5/hardware-and-software-description/#login-nodes","title":"Login Nodes","text":"<p>The login nodes are part of the Merlin6 HPC cluster, and are used to compile and to submit jobs to the different Merlin Slurm clusters (<code>merlin5</code>,<code>merlin6</code>,<code>gmerlin6</code>,etc.). Please refer to the Merlin6 Hardware Documentation for further information.</p>"},{"location":"merlin5/hardware-and-software-description/#storage","title":"Storage","text":"<p>The storage is part of the Merlin6 HPC cluster, and is mounted in all the Slurm clusters (<code>merlin5</code>,<code>merlin6</code>,<code>gmerlin6</code>,etc.). Please refer to the Merlin6 Hardware Documentation for further information.</p>"},{"location":"merlin5/hardware-and-software-description/#network","title":"Network","text":"<p>Merlin5 cluster connectivity is based on the Infiniband QDR technology. This allows fast access with very low latencies to the data as well as running extremely efficient MPI-based jobs. 
However, this is an old version of Infiniband which requires older drivers and software can not take advantage of the latest features.</p>"},{"location":"merlin5/hardware-and-software-description/#software","title":"Software","text":"<p>In Merlin5, we try to keep software stack coherency with the main cluster Merlin6.</p> <p>Due to this, Merlin5 runs:</p> <ul> <li>RedHat Enterprise Linux 7</li> <li>Slurm, we usually try to keep it up to date with the most recent versions.</li> <li>GPFS v5</li> <li>MLNX_OFED LTS v.4.9-2.2.4.0, which is an old version, but required because ConnectX-3 support has been dropped on newer OFED versions.</li> </ul>"},{"location":"merlin5/slurm-configuration/","title":"Slurm Configuration","text":""},{"location":"merlin5/slurm-configuration/#slurm-configuration","title":"Slurm Configuration","text":"<p>This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin5 cluster.</p> <p>The Merlin5 cluster is an old cluster with old hardware which is maintained in a best effort for increasing the CPU power of the Merlin cluster.</p>"},{"location":"merlin5/slurm-configuration/#merlin5-cpu-nodes-definition","title":"Merlin5 CPU nodes definition","text":"<p>The following table show default and maximum resources that can be used per node:</p> Nodes Def.#CPUs Max.#CPUs #Threads Max.Mem/Node Max.Swap merlin-c-[18-30] 1 core 16 cores 1 60000 10000 merlin-c-[31-32] 1 core 16 cores 1 124000 10000 merlin-c-[33-45] 1 core 16 cores 1 60000 10000 merlin-c-[46-47] 1 core 16 cores 1 124000 10000 <p>There is one main difference between the Merlin5 and Merlin6 clusters: Merlin5 is keeping an old configuration which does not consider the memory as a consumable resource. Hence, users can oversubscribe memory. This might trigger some side-effects, but this legacy configuration has been kept to ensure that old jobs can keep running in the same way they did a few years ago. If you know that this might be a problem for you, please, always use Merlin6 instead.</p>"},{"location":"merlin5/slurm-configuration/#running-jobs-in-the-merlin5-cluster","title":"Running jobs in the 'merlin5' cluster","text":"<p>In this chapter we will cover basic settings that users need to specify in order to run jobs in the Merlin5 CPU cluster.</p>"},{"location":"merlin5/slurm-configuration/#merlin5-cpu-cluster","title":"Merlin5 CPU cluster","text":"<p>To run jobs in the <code>merlin5</code> cluster users must specify the cluster name in Slurm:</p> Bash<pre><code>#SBATCH --cluster=merlin5\n</code></pre>"},{"location":"merlin5/slurm-configuration/#merlin5-cpu-partitions","title":"Merlin5 CPU partitions","text":"<p>Users might need to specify the Slurm partition. If no partition is specified, it will default to <code>merlin</code>:</p> Bash<pre><code>#SBATCH --partition=<partition_name> # Possible <partition_name> values: merlin, merlin-long:\n</code></pre> <p>The table below resumes shows all possible partitions available to users:</p> CPU Partition Default Time Max Time Max Nodes PriorityJobFactor* PriorityTier** merlin 5 days 1 week All nodes 500 1 merlin-long 5 days 21 days 4 1 1 <p>*The PriorityJobFactor value will be added to the job priority (PARTITION column in <code>sprio -l</code> ). In other words, jobs sent to higher priority partitions will usually run first (however, other factors such like job age or mainly fair share might affect to that decision). 
For the GPU partitions, Slurm will also attempt first to allocate jobs on partitions with higher priority over partitions with lesser priority.</p> <p>**Jobs submitted to a partition with a higher PriorityTier value will be dispatched before pending jobs in partition with lower PriorityTier value and, if possible, they will preempt running jobs from partitions with lower PriorityTier values.</p> <p>The <code>merlin-long</code> partition is limited to 4 nodes, as it might contain jobs running for up to 21 days.</p>"},{"location":"merlin5/slurm-configuration/#merlin5-cpu-accounts","title":"Merlin5 CPU Accounts","text":"<p>Users need to ensure that the public <code>merlin</code> account is specified. No specifying account options would default to this account. This is mostly needed by users which have multiple Slurm accounts, which may define by mistake a different account.</p> Bash<pre><code>#SBATCH --account=merlin # Possible values: merlin\n</code></pre>"},{"location":"merlin5/slurm-configuration/#slurm-cpu-specific-options","title":"Slurm CPU specific options","text":"<p>Some options are available when using CPUs. These are detailed here.</p> <p>Alternative Slurm options for CPU based jobs are available. Please refer to the man pages for each Slurm command for further information about it (<code>man salloc</code>, <code>man sbatch</code>, <code>man srun</code>). Below are listed the most common settings:</p> Bash<pre><code>#SBATCH --ntasks=<ntasks>\n#SBATCH --ntasks-per-core=<ntasks>\n#SBATCH --ntasks-per-socket=<ntasks>\n#SBATCH --ntasks-per-node=<ntasks>\n#SBATCH --mem=<size[units]>\n#SBATCH --mem-per-cpu=<size[units]>\n#SBATCH --cpus-per-task=<ncpus>\n#SBATCH --cpu-bind=[{quiet,verbose},]<type> # only for 'srun' command\n</code></pre> <p>Notice that in Merlin5 no hyper-threading is available (while in Merlin6 it is). Hence, in Merlin5 there is not need to specify <code>--hint</code> hyper-threading related options.</p>"},{"location":"merlin5/slurm-configuration/#user-and-job-limits","title":"User and job limits","text":"<p>In the CPU cluster we provide some limits which basically apply to jobs and users. The idea behind this is to ensure a fair usage of the resources and to avoid overabuse of the resources from a single user or job. However, applying limits might affect the overall usage efficiency of the cluster (in example, pending jobs from a single user while having many idle nodes due to low overall activity is something that can be seen when user limits are applied). In the same way, these limits can be also used to improve the efficiency of the cluster (in example, without any job size limits, a job requesting all resources from the batch system would drain the entire cluster for fitting the job, which is undesirable).</p> <p>Hence, there is a need of setting up wise limits and to ensure that there is a fair usage of the resources, by trying to optimize the overall efficiency of the cluster while allowing jobs of different nature and sizes (it is, single core based vs parallel jobs of different sizes) to run.</p> <p>In the <code>merlin5</code> cluster, as not many users are running on it, these limits are wider than the ones set in the <code>merlin6</code> and <code>gmerlin6</code> clusters.</p>"},{"location":"merlin5/slurm-configuration/#per-job-limits","title":"Per job limits","text":"<p>These are limits which apply to a single job. In other words, there is a maximum of resources a single job can use. 
These limits are described in the table below, with the format <code>SlurmQoS(limits)</code> (<code>SlurmQoS</code> can be listed from the <code>sacctmgr show qos</code> command):</p> Partition Mon-Sun 0h-24h Other limits merlin merlin5(cpu=384) None merlin-long merlin5(cpu=384) Max. 4 nodes <p>By default, by QoS limits, a job can not use more than 384 cores (max CPU per job). However, for <code>merlin-long</code> this is even more restricted: there is an extra limit of 4 dedicated nodes for this partition. This is defined at the partition level, and will override any QoS limit as long as it is more restrictive.</p>"},{"location":"merlin5/slurm-configuration/#per-user-limits-for-cpu-partitions","title":"Per user limits for CPU partitions","text":"<p>No user limits apply by QoS. For the <code>merlin</code> partition, a single user could fill the whole batch system with jobs (however, the restriction is at the job size, as explained above). For the <code>merlin-long</code> partition, the 4-node limitation still applies.</p>"},{"location":"merlin5/slurm-configuration/#advanced-slurm-configuration","title":"Advanced Slurm configuration","text":"<p>Clusters at PSI use the Slurm Workload Manager as the batch system technology for managing and scheduling jobs. Slurm has been installed in a multi-clustered configuration, allowing multiple clusters to be integrated in the same batch system.</p> <p>To understand the Slurm configuration of the cluster, it may sometimes be useful to check the following files:</p> <ul> <li><code>/etc/slurm/slurm.conf</code> - can be found in the login nodes and computing nodes.</li> <li><code>/etc/slurm/gres.conf</code> - can be found in the GPU nodes, is also propagated to login nodes and computing nodes for user read access.</li> <li><code>/etc/slurm/cgroup.conf</code> - can be found in the computing nodes, is also propagated to login nodes for user read access.</li> </ul> <p>The previous configuration files which can be found in the login nodes correspond exclusively to the merlin6 cluster configuration files. Configuration files for the old merlin5 cluster or for the gmerlin6 cluster must be checked directly on any of the merlin5 or gmerlin6 computing nodes (for example, by logging in to one of the nodes while a job or an active allocation is running).</p>"},{"location":"merlin6/hardware-and-software-description/","title":"Hardware And Software Description","text":""},{"location":"merlin6/hardware-and-software-description/#hardware-and-software-description","title":"Hardware And Software Description","text":""},{"location":"merlin6/hardware-and-software-description/#hardware","title":"Hardware","text":""},{"location":"merlin6/hardware-and-software-description/#computing-nodes","title":"Computing Nodes","text":"<p>The new Merlin6 cluster contains a solution based on four HPE Apollo k6000 chassis:</p> <ul> <li>Three of them contain 24 x HP Apollo XL230K Gen10 blades.</li> <li>A fourth chassis was purchased in 2021 with HP Apollo XL230K Gen10 blades dedicated to a few experiments. 
Blades have slightly different components depending on specific project requirements.</li> </ul> <p>The connectivity for the Merlin6 cluster is based on ConnectX-5 EDR-100Gbps, and each chassis contains:</p> <ul> <li>1 x HPE Apollo InfiniBand EDR 36-port Unmanaged Switch<ul> <li>24 internal EDR-100Gbps ports (1 port per blade for internal low latency connectivity)</li> <li>12 external EDR-100Gbps ports (for external low latency connectivity)</li> </ul> </li> </ul> Merlin6 CPU Computing Nodes Chassis Node Processor Sockets Cores Threads Scratch Memory #0 merlin-c-0[01-24] Intel Xeon Gold 6152 2 44 2 1.2TB 384GB #1 merlin-c-1[01-24] Intel Xeon Gold 6152 2 44 2 1.2TB 384GB #2 merlin-c-2[01-24] Intel Xeon Gold 6152 2 44 2 1.2TB 384GB #3 merlin-c-3[01-12] Intel Xeon Gold 6240R 2 48 2 1.2TB 768GB merlin-c-3[03-18] 1 merlin-c-3[19-24] 2 384GB <p>Each blade contains an NVMe disk, where up to 300GB are dedicated to the O.S., and ~1.2TB are reserved for local <code>/scratch</code>.</p>"},{"location":"merlin6/hardware-and-software-description/#login-nodes","title":"Login Nodes","text":"<p>One old login node (<code>merlin-l-01.psi.ch</code>) is inherited from the previous Merlin5 cluster. Its main use is running some BIO services (<code>cryosparc</code>) and submitting jobs. Two new login nodes (<code>merlin-l-001.psi.ch</code>,<code>merlin-l-002.psi.ch</code>) with a configuration similar to the Merlin6 computing nodes are available to the users. Their main use is compiling software and submitting jobs.</p> <p>The connectivity is based on ConnectX-5 EDR-100Gbps for the new login nodes, and ConnectIB FDR-56Gbps for the old one.</p> Merlin6 CPU Computing Nodes Hardware Node Processor Sockets Cores Threads Scratch Memory Old merlin-l-01 Intel Xeon E5-2697AV4 2 16 2 100GB 512GB New merlin-l-00[1,2] Intel Xeon Gold 6152 2 44 2 1.8TB 384GB"},{"location":"merlin6/hardware-and-software-description/#storage","title":"Storage","text":"<p>The storage is based on the Lenovo Distributed Storage Solution for IBM Spectrum Scale.</p> <ul> <li>2 x Lenovo DSS G240 systems, each one composed of 2 ThinkSystem SR650 IO nodes mounting 4 x Lenovo Storage D3284 High Density Expansion enclosures.</li> <li>Each IO node has a connectivity of 400Gbps (4 x EDR 100Gbps ports, 2 of them are ConnectX-5 and 2 are ConnectX-4).</li> </ul> <p>The storage solution is connected to the HPC clusters through 2 x Mellanox SB7800 InfiniBand 1U Switches for high availability and load balancing.</p>"},{"location":"merlin6/hardware-and-software-description/#network","title":"Network","text":"<p>Merlin6 cluster connectivity is based on the Infiniband technology. This allows fast access with very low latencies to the data as well as running extremely efficient MPI-based jobs:</p> <ul> <li>Connectivity amongst computing nodes on different chassis ensures up to 1200Gbps of aggregated bandwidth.</li> <li>Intra-chassis connectivity (communication amongst computing nodes in the same chassis) ensures up to 2400Gbps of aggregated bandwidth.</li> <li>Communication to the storage ensures up to 800Gbps of aggregated bandwidth.</li> </ul> <p>Merlin6 cluster currently contains 5 Infiniband Managed switches and 3 Infiniband Unmanaged switches (one per HP Apollo chassis):</p> <ul> <li>1 x MSX6710 (FDR) for connecting old GPU nodes, old login nodes and the MeG cluster to the Merlin6 cluster (and storage). 
No High Availability mode possible.</li> <li>2 x MSB7800 (EDR) for connecting Login Nodes, Storage and other nodes in High Availability mode.</li> <li>3 x HP EDR Unmanaged switches, each one embedded in an HP Apollo k6000 chassis.</li> <li>2 x MSB7700 (EDR) are the top switches, interconnecting the Apollo unmanaged switches and the managed switches (MSX6710, MSB7800).</li> </ul>"},{"location":"merlin6/hardware-and-software-description/#software","title":"Software","text":"<p>In Merlin6, we try to keep the latest software stack release to get the latest features and improvements. Due to this, Merlin6 runs:</p> <ul> <li>RedHat Enterprise Linux 7</li> <li>Slurm, which we usually try to keep up to date with the most recent versions.</li> <li>GPFS v5</li> <li>MLNX_OFED LTS v.5.2-2.2.0.0 or newer for all ConnectX-5 or superior cards.<ul> <li>MLNX_OFED LTS v.4.9-2.2.4.0 is installed for the remaining ConnectX-3 and ConnectIB cards.</li> </ul> </li> </ul>"},{"location":"merlin6/introduction/","title":"About Merlin6","text":""},{"location":"merlin6/introduction/#about-merlin6","title":"About Merlin6","text":"<p>Merlin6 availability</p> <p>Merlin6 is closed to new users.</p> <p>Only a reduced footprint of Merlin6 is still maintained, and exclusively for a small number of stakeholders who own dedicated nodes and cannot yet fully run their workloads on the new cluster.</p> <p>New users must request access to Merlin7 instead.</p>"},{"location":"merlin6/slurm-configuration/","title":"Slurm Configuration","text":""},{"location":"merlin6/slurm-configuration/#slurm-configuration","title":"Slurm Configuration","text":"<p>This documentation shows the basic Slurm configuration and options needed to run jobs in the Merlin6 CPU cluster.</p>"},{"location":"merlin6/slurm-configuration/#merlin6-cpu-nodes-definition","title":"Merlin6 CPU nodes definition","text":"<p>The following table shows the default and maximum resources that can be used per node:</p> Nodes Def.#CPUs Max.#CPUs #Threads Max.Mem/CPU Max.Mem/Node Max.Swap Def.#GPUs Max.#GPUs merlin-c-[301-312] 1 core 44 cores 2 748800 748800 10000 N/A N/A merlin-c-[313-318] 1 core 44 cores 1 748800 748800 10000 N/A N/A merlin-c-[319-324] 1 core 44 cores 2 748800 748800 10000 N/A N/A <p>If nothing is specified, by default each core will use up to 8GB of memory. Memory can be increased with the <code>--mem=<mem_in_MB></code> and <code>--mem-per-cpu=<mem_in_MB></code> options, and the maximum memory allowed is <code>Max.Mem/Node</code>.</p> <p>In <code>merlin6</code>, memory is considered a Consumable Resource, as well as the CPU. Hence, both resources are accounted for when submitting a job, and by default resources cannot be oversubscribed (in previous configurations, memory was by default oversubscribed).</p> <p>Check Configuration</p> <p>Always check <code>/etc/slurm/slurm.conf</code> for changes in the hardware.</p>"},{"location":"merlin6/slurm-configuration/#merlin6-cpu-cluster","title":"Merlin6 CPU cluster","text":"<p>To run jobs in the <code>merlin6</code> cluster users can optionally specify the cluster name in Slurm:</p> Bash<pre><code>#SBATCH --cluster=merlin6\n</code></pre> <p>If no cluster name is specified, by default any job will be submitted to this cluster (as this is the main cluster). 
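For example, the target cluster can also be selected directly on the command line at submission time (a minimal sketch, assuming a hypothetical job script named <code>job.sh</code>):</p> Bash<pre><code># Submit a job script explicitly to the merlin6 cluster\nsbatch --clusters=merlin6 job.sh\n\n# Show the queue of the merlin6 cluster only\nsqueue --clusters=merlin6\n</code></pre> <p>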
Specifying the cluster explicitly is therefore only necessary when dealing with multiple clusters, or when environment variables that modify the default cluster name have been defined.</p>"},{"location":"merlin6/slurm-configuration/#merlin6-cpu-partitions","title":"Merlin6 CPU partitions","text":"<p>Users might need to specify the Slurm partition. If no partition is specified, it will default to <code>general</code>:</p> Bash<pre><code>#SBATCH --partition=<partition_name> # Possible <partition_name> values: general, daily, hourly\n</code></pre> <p>The following partitions (also known as queues) are configured in Slurm:</p> CPU Partition Default Time Max Time Max Nodes PriorityJobFactor* PriorityTier** DefMemPerCPU general 1 day 1 week 50 1 1 4000 daily 1 day 1 day 67 500 1 4000 hourly 1 hour 1 hour unlimited 1000 1 4000 asa-general 1 hour 2 weeks unlimited 1 2 3712 asa-daily 1 hour 1 week unlimited 500 2 3712 asa-visas 1 hour 90 days unlimited 1000 4 3712 asa-ansys 1 hour 90 days unlimited 1000 4 15600 mu3e 1 day 7 days unlimited 1000 4 3712 <p>The PriorityJobFactor value will be added to the job priority (PARTITION column in <code>sprio -l</code> ). In other words, jobs sent to higher priority partitions will usually run first (however, other factors such as job age or, mainly, fair share may affect that decision). For the GPU partitions, Slurm will also attempt to allocate jobs on partitions with higher priority before partitions with lower priority.</p> <p>Jobs submitted to a partition with a higher PriorityTier value will be dispatched before pending jobs in partitions with lower PriorityTier values and, if possible, they will preempt running jobs from partitions with lower PriorityTier values.</p> <ul> <li>The <code>general</code> partition is the default. It cannot have more than 50 nodes running jobs.</li> <li>For <code>daily</code> this limitation is extended to 67 nodes.</li> <li>For <code>hourly</code> there are no limits.</li> <li><code>asa-general</code>,<code>asa-daily</code>,<code>asa-ansys</code>,<code>asa-visas</code> and <code>mu3e</code> are private partitions, belonging to different experiments owning the machines. Access is restricted in all cases. However, by agreement with the experiments, nodes are usually added to the <code>hourly</code> partition as extra resources for public use.</li> </ul> <p>Partition Selection</p> <p>Jobs which would run for less than one day should always be sent to daily, while jobs that would run for less than one hour should be sent to hourly. This gives them higher priority than jobs sent to partitions with lower priority, and it also avoids the node count limit of general. The idea behind this is that the cluster cannot be blocked by long jobs and resources always remain available for shorter jobs.</p>"},{"location":"merlin6/slurm-configuration/#merlin5-cpu-accounts","title":"Merlin5 CPU Accounts","text":"<p>Users should ensure that the public <code>merlin</code> account is specified. If no account option is specified, this account is used by default.</p> <p>This is mostly relevant for users with multiple Slurm accounts, who may select a different account by mistake.</p> Bash<pre><code>#SBATCH --account=merlin # Possible values: merlin, gfa-asa\n</code></pre> <p>Not all accounts can be used on all partitions. 
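To check which accounts, partitions and QoS your user is associated with, the Slurm accounting database can be queried directly (a usage sketch; the exact format field names may vary slightly between Slurm versions):</p> Bash<pre><code># List the cluster/account/partition/QoS associations of the current user\nsacctmgr show associations user=$USER format=Cluster,Account,Partition,QOS\n</code></pre> <p>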
The account to partition mapping is summarized in the table below:</p> Slurm Account Slurm Partitions merlin <code>hourly</code>,<code>daily</code>, <code>general</code> gfa-asa <code>asa-general</code>,<code>asa-daily</code>,<code>asa-visas</code>,<code>asa-ansys</code>,<code>hourly</code>,<code>daily</code>, <code>general</code> mu3e <code>mu3e</code>"},{"location":"merlin6/slurm-configuration/#private-accounts","title":"Private accounts","text":"<ul> <li>The <code>gfa-asa</code> and <code>mu3e</code> accounts are private accounts. These can be used for accessing dedicated partitions with nodes owned by different groups.</li> </ul>"},{"location":"merlin6/slurm-configuration/#slurm-cpu-specific-options","title":"Slurm CPU specific options","text":"<p>Several Slurm options are relevant for CPU-based jobs; the most common settings are listed below. Further options are available as well. Please refer to the man pages of each Slurm command for more information (<code>man salloc</code>, <code>man sbatch</code>, <code>man srun</code>):</p> Bash<pre><code>#SBATCH --hint=[no]multithread\n#SBATCH --ntasks=<ntasks>\n#SBATCH --ntasks-per-core=<ntasks>\n#SBATCH --ntasks-per-socket=<ntasks>\n#SBATCH --ntasks-per-node=<ntasks>\n#SBATCH --mem=<size[units]>\n#SBATCH --mem-per-cpu=<size[units]>\n#SBATCH --cpus-per-task=<ncpus>\n#SBATCH --cpu-bind=[{quiet,verbose},]<type> # only for 'srun' command\n</code></pre>"},{"location":"merlin6/slurm-configuration/#enablingdisabling-hyper-threading","title":"Enabling/Disabling Hyper-Threading","text":"<p>The <code>merlin6</code> cluster contains nodes with Hyper-Threading enabled. One should always specify whether to use Hyper-Threading or not. If not defined, Slurm will generally use it (exceptions apply).</p> Bash<pre><code>#SBATCH --hint=multithread # Use extra threads with in-core multi-threading.\n#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.\n</code></pre>"},{"location":"merlin6/slurm-configuration/#constraint-features","title":"Constraint / Features","text":"<p>Slurm allows defining a set of features in the node definition. This can be used to filter and select nodes according to one or more specific features. For the CPU nodes, we have the following features:</p> Text Only<pre><code>NodeName=merlin-c-[001-024,101-124,201-224] Features=mem_384gb,xeon-gold-6152\nNodeName=merlin-c-[301-312] Features=mem_768gb,xeon-gold-6240r\nNodeName=merlin-c-[313-318] Features=mem_768gb,xeon-gold-6240r\nNodeName=merlin-c-[319-324] Features=mem_384gb,xeon-gold-6240r\n</code></pre> <p>Therefore, users running on <code>hourly</code> can select which type of node they want to use (fat memory nodes vs regular memory nodes, CPU type). 
This is possible by using the option <code>--constraint=<feature_name></code> in Slurm.</p> <p>Examples:</p> <ol> <li> <p>Select nodes with 48 cores only (nodes with 2 x Xeon Gold 6240R):</p> Bash<pre><code>sbatch --constraint=xeon-gold-6240r ...\n</code></pre> </li> <li> <p>Select nodes with 44 cores only (nodes with 2 x Xeon Gold 6152):</p> Bash<pre><code>sbatch --constraint=xeon-gold-6152 ...\n</code></pre> </li> <li> <p>Select fat memory nodes only:</p> Bash<pre><code>sbatch --constraint=mem_768gb ...\n</code></pre> </li> <li> <p>Select regular memory nodes only:</p> Bash<pre><code>sbatch --constraint=mem_384gb ...\n</code></pre> </li> <li> <p>Select fat memory nodes with 48 cores only:</p> Bash<pre><code>sbatch --constraint=mem_768gb,xeon-gold-6240r ...\n</code></pre> </li> </ol> <p>Specifying exactly which type of nodes you want to use is important. Therefore, for groups with private accounts (<code>mu3e</code>,<code>gfa-asa</code>) and for public users running on the <code>hourly</code> partition, constraining nodes by features is recommended. This becomes even more important in heterogeneous clusters.</p>"},{"location":"merlin6/slurm-configuration/#running-jobs-in-the-merlin6-cluster","title":"Running jobs in the 'merlin6' cluster","text":"<p>In this chapter we will cover the basic settings that users need to specify in order to run jobs in the Merlin6 CPU cluster.</p>"},{"location":"merlin6/slurm-configuration/#user-and-job-limits","title":"User and job limits","text":"<p>In the CPU cluster we enforce some limits which apply to jobs and to users. The idea behind this is to ensure a fair usage of the resources and to prevent abuse of the resources by a single user or job. However, applying limits might affect the overall usage efficiency of the cluster (for example, when user limits are applied, jobs from a single user may stay pending even though many nodes are idle due to low overall activity). At the same time, these limits can also be used to improve the efficiency of the cluster (for example, without any job size limits, a job requesting all resources of the batch system would drain the entire cluster just to fit that job, which is undesirable).</p> <p>Hence, sensible limits are needed to ensure a fair usage of the resources, optimizing the overall efficiency of the cluster while allowing jobs of different natures and sizes (that is, single core based as well as parallel jobs of different sizes) to run.</p> <p>Resource Limits</p> <p>Wide limits are provided in the daily and hourly partitions, while for general those limits are more restrictive. However, we kindly ask users to inform the Merlin administrators when they plan to submit big jobs which would require a massive draining of nodes to be allocated. This would apply to jobs requiring the unlimited QoS (see below \"Per job limits\").</p> <p>Custom Requirements</p> <p>If you have different requirements, please let us know, and we will try to accommodate them or propose a solution for you.</p>"},{"location":"merlin6/slurm-configuration/#per-job-limits","title":"Per job limits","text":"<p>These are limits which apply to a single job. In other words, there is a maximum amount of resources a single job can use. Limits are described in the table below with the format: <code>SlurmQoS(limits)</code> (possible <code>SlurmQoS</code> values can be listed with the command <code>sacctmgr show qos</code>). 
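For instance, the QoS limits that are currently enforced can be inspected directly from the command line (a usage sketch; the exact format field names may differ slightly between Slurm versions):</p> Bash<pre><code># Show per-job (MaxTRES) and per-user (MaxTRESPU) limits for each QoS\nsacctmgr show qos format=Name,MaxTRES,MaxTRESPU\n</code></pre> <p>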
Some limits will vary depending on the day and time of the week.</p> Partition Mon-Fri 0h-18h Sun-Thu 18h-0h From Fri 18h to Mon 0h general normal(cpu=704,mem=2750G) normal(cpu=704,mem=2750G) normal(cpu=704,mem=2750G) daily daytime(cpu=704,mem=2750G) nighttime(cpu=1408,mem=5500G) unlimited(cpu=2200,mem=8593.75G) hourly unlimited(cpu=2200,mem=8593.75G) unlimited(cpu=2200,mem=8593.75G) unlimited(cpu=2200,mem=8593.75G) <p>By default, a job cannot use more than 704 cores (max CPU per job). In the same way, memory is also proportionally limited. This is equivalent to running a job using up to 8 nodes at once. This limit applies to the general partition (fixed limit) and to the daily partition (only during working hours).</p> <p>Limits are relaxed for the daily partition during non-working hours, and during the weekend they are even wider. For the hourly partition, wider limits are provided as well, even though running very large parallel jobs there is not desirable (allocating such jobs requires massive draining of nodes). Setting per-job limits is necessary to avoid massive draining of nodes in the cluster when allocating huge jobs. Hence, the unlimited QoS mostly refers to \"per user\" limits rather than to \"per job\" limits (in other words, users can run any number of hourly jobs, but the size of each such job is limited, albeit with wide values).</p>"},{"location":"merlin6/slurm-configuration/#per-user-limits-for-cpu-partitions","title":"Per user limits for CPU partitions","text":"<p>These are limits which apply exclusively to users. In other words, there is a maximum amount of resources a single user can use. Limits are described in the table below with the format: <code>SlurmQoS(limits)</code> (possible <code>SlurmQoS</code> values can be listed with the command <code>sacctmgr show qos</code>). Some limits will vary depending on the day and time of the week.</p> Partition Mon-Fri 0h-18h Sun-Thu 18h-0h From Fri 18h to Mon 0h general normal(cpu=704,mem=2750G) normal(cpu=704,mem=2750G) normal(cpu=704,mem=2750G) daily daytime(cpu=1408,mem=5500G) nighttime(cpu=2112,mem=8250G) unlimited(cpu=6336,mem=24750G) hourly unlimited(cpu=6336,mem=24750G) unlimited(cpu=6336,mem=24750G) unlimited(cpu=6336,mem=24750G) <p>By default, users cannot use more than 704 cores at the same time (max CPU per user). Memory is also proportionally limited in the same way. This is equivalent to 8 exclusive nodes. This limit applies to the general partition (fixed limit) and to the daily partition (only during working hours).</p> <p>For the hourly partition, user limits are removed. For the daily partition, limits are relaxed during non-working hours, and during the weekend they are removed.</p>"},{"location":"merlin6/slurm-configuration/#advanced-slurm-configuration","title":"Advanced Slurm configuration","text":"<p>Clusters at PSI use the Slurm Workload Manager as the batch system technology for managing and scheduling jobs. 
Slurm has been installed in a multi-clustered configuration, allowing the integration of multiple clusters in the same batch system.</p> <p>To understand the Slurm configuration of the cluster, it may sometimes be useful to check the following files:</p> <ul> <li><code>/etc/slurm/slurm.conf</code> - can be found in the login nodes and computing nodes.</li> <li><code>/etc/slurm/gres.conf</code> - can be found in the GPU nodes, and is also propagated to login nodes and computing nodes for user read access.</li> <li><code>/etc/slurm/cgroup.conf</code> - can be found in the computing nodes, and is also propagated to login nodes for user read access.</li> </ul> <p>The configuration files found on the login nodes correspond exclusively to the merlin6 cluster.</p> <p>Configuration files for the gmerlin6 cluster must be checked directly on the gmerlin6 computing nodes (for example, by logging in to one of the nodes while a job or an active allocation is running).</p>"},{"location":"merlin6/98-announcements/downtimes/","title":"Downtimes","text":""},{"location":"merlin6/98-announcements/downtimes/#downtimes","title":"Downtimes","text":"<p>On the first Monday of each month the Merlin6 cluster might be subject to interruption due to maintenance. Users will be informed at least one week in advance when a downtime is scheduled for the next month.</p> <p>Downtimes will be announced to users through the merlin-users@lists.psi.ch mailing list. Also, a detailed description of the next scheduled interventions will be available in Next Scheduled Downtimes.</p>"},{"location":"merlin6/98-announcements/downtimes/#scheduled-downtime-draining-policy","title":"Scheduled Downtime Draining Policy","text":"<p>Scheduled downtimes, mostly those affecting the storage and Slurm configurations, may require draining the nodes. When this is required, users will be informed accordingly. Two different types of draining are possible:</p> <ul> <li>soft drain: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition.</li> </ul> <p>Jobs already running on the partition continue to run. This will be the default drain method.</p> <ul> <li>hard drain: no new jobs may be queued on the partition (job submission requests will be denied with an error message), but jobs already queued on the partition may be allocated to nodes and run.</li> </ul> <p>Unless explicitly specified, the default draining policy for each partition will be the following:</p> <ul> <li>The daily and general partitions will be soft drained 12h before the downtime.</li> <li>The hourly partition will be soft drained 1 hour before the downtime.</li> <li>The gpu and gpu-short partitions will be soft drained 1 hour before the downtime.</li> </ul> <p>Finally, remaining running jobs will be killed by default when the downtime starts. 
In some specific rare cases jobs will be just paused and resumed back when the downtime finished.</p>"},{"location":"merlin6/98-announcements/downtimes/#draining-policy-summary","title":"Draining Policy Summary","text":"<p>The following table contains a summary of the draining policies during a Schedule Downtime:</p> Partition Drain Policy Default Drain Type Default Job Policy general 12h before the SD soft drain Kill running jobs when SD starts daily 12h before the SD soft drain Kill running jobs when SD starts hourly 1h before the SD soft drain Kill running jobs when SD starts gpu 1h before the SD soft drain Kill running jobs when SD starts gpu-short 1h before the SD soft drain Kill running jobs when SD starts gfa-asa 1h before the SD soft drain Kill running jobs when SD starts"},{"location":"merlin6/98-announcements/downtimes/#next-scheduled-downtimes","title":"Next Scheduled Downtimes","text":"<p>The table below shows a description for the next Scheduled Downtime:</p> From To Service Description 05.09.2020 8am 05.09.2020 6pm <ul> <li>Note: An e-mail will be sent when the services are fully available.</li> </ul>"},{"location":"merlin6/98-announcements/past-downtimes/","title":"Past Downtimes","text":""},{"location":"merlin6/98-announcements/past-downtimes/#past-downtimes","title":"Past Downtimes","text":""},{"location":"merlin6/98-announcements/past-downtimes/#past-downtimes-log-changes","title":"Past Downtimes: Log Changes","text":""},{"location":"merlin6/98-announcements/past-downtimes/#2020","title":"2020","text":"From To Service Clusters Description Exceptions 03.08.2020 8am 03.08.2020 6pm Archive merlin6 Replace old merlin-export-01 for merlin-export-02 03.08.2020 8am 03.08.2020 6pm RemoteAccess merlin6 ra-merlin-0[1,2] Remount merlin-export-02 06.07.2020 06.07.2020 All services merlin5,merlin6 GPFS v5.0.4-4,OFED v5.0,YFS v0.195,RHEL7.7,Slurm v19.05.7,f/w 04.05.2020 04.05.2020 Login nodes merlin6 Outage. YFS (AFS) update v0.194 and reboot 04.05.2020 04.05.2020 CN merlin5 Outage. O.S. update, OFED drivers update, YFS (AFS) update. 03.02.2020 9am 03.02.2020 10am Slurm merlin5,merlin6 Upgrading config HPCLOCAL-321 10.01.2020 9am 10.01.2020 6pm All Services merlin5,merlin6 Slurm v18->v19, IB Connected Mode, other. HPCLOCAL-300"},{"location":"merlin6/98-announcements/past-downtimes/#older-downtimes","title":"Older downtimes","text":"From To Service Clusters Description Exceptions 02.09.2019 02.09.2019 GPFS merlin5,merlin6 v5.0.2-3 -> v5.0.3-2 02.09.2019 02.09.2019 O.S. merlin5 RHEL7.4 (rhel-7.4) -> RHEL7.6 (prod-00048) merlin-g-40, still running RHEL7.4* 02.09.2019 02.09.2019 O.S. merlin6 RHEL7.6 (prod-00030) -> RHEL7.6 (prod-00048) 02.09.2019 02.09.2019 Infiniband merlin5 OFED v4.4 -> v4.6 merlin-g-40, still running OFED v4.4* 02.09.2019 02.09.2019 Infiniband merlin6 OFED v4.5 -> v4.6 02.09.2019 02.09.2019 PModules merlin5,merlin6 PModules v1.0.0rc4 -> v1.0.0rc5 02.09.2019 02.09.2019 AFS(YFS) merlin5 OpenAFS v1.6.22.2-236 -> YFS v188 merlin-g-40, still running OpenAFS* 02.09.2019 02.09.2019 AFS(YFS) merlin6 YFS v186 -> YFS v188 02.09.2019 02.09.2019 O.S. 
merlin5 RHEL7.4 -> RHEL7.6 (prod-00048) 02.09.2019 02.09.2019 Slurm merlin5,merlin6 Slurm v18.08.6 -> v18.08.8"},{"location":"merlin6/99-support/migration-from-merlin5/","title":"Migration From Merlin5","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#migration-from-merlin5","title":"Migration From Merlin5","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#directories","title":"Directories","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#merlin5-vs-merlin6","title":"Merlin5 vs Merlin6","text":"Cluster Home Directory User Home Directory Group Home Directory merlin5 /gpfs/home/$username /gpfs/data/$username /gpfs/group/$laboratory merlin6 /psi/home/$username /data/user/$username /data/project/[general|bio]/$projectname"},{"location":"merlin6/99-support/migration-from-merlin5/#quota-limits-in-merlin6","title":"Quota limits in Merlin6","text":"Directory Quota_Type [Soft:Hard] (Block) Quota_Type [Soft:Hard] (Files) Quota Change Policy: Block Quota Change Policy: Files /psi/home/$username USR [10GB:11GB] Undef Up to x2 when strictly justified. N/A /data/user/$username USR [1TB:1.074TB] USR [1M:1.1M] Inmutable. Need a project. Changeable when justified. /data/project/bio/$projectname GRP+Fileset [1TB:1.074TB] GRP+Fileset [1M:1.1M] Changeable according to project requirements. Changeable according to project requirements. /data/project/general/$projectname GRP+Fileset [1TB:1.074TB] GRP+Fileset [1M:1.1M] Changeable according to project requirements. Changeable according to project requirements. <p>where: * Block is capacity size in GB and TB * Files is number of files + directories in Millions (M) * Quota types are the following: * USR: Quota is setup individually per user name * GRP: Quota is setup individually per Unix Group name * Fileset: Quota is setup per project root directory.</p> <ul> <li>User data directory <code>/data/user</code> has a strict user block quota limit policy. If more disk space is required, 'project' must be created.</li> <li>Soft quotas can be exceeded for short periods of time. Hard quotas cannot be exceeded.</li> </ul>"},{"location":"merlin6/99-support/migration-from-merlin5/#project-directory","title":"Project directory","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#why-is-project-needed","title":"Why is 'project' needed?","text":"<p>Merlin6 introduces the concept of a project directory. These are the recommended location for all scientific data.</p> <ul> <li><code>/data/user</code> is not suitable for sharing data between users</li> <li>The Merlin5 group directories were a similar concept, but the association with a single organizational group made interdepartmental sharing difficult. Projects can be shared by any PSI user.</li> <li>Projects are shared by multiple users (at a minimum they should be shared with the supervisor/PI). This decreases the chance of data being orphaned by personnel changes.</li> <li>Shared projects are preferable to individual data for transparency and accountability in event of future questions regarding the data.</li> <li>One project member is designated as responsible. 
Responsibility can be transferred if needed.</li> </ul>"},{"location":"merlin6/99-support/migration-from-merlin5/#requesting-a-project","title":"Requesting a project","text":"<p>Refer to Requesting a project</p>"},{"location":"merlin6/99-support/migration-from-merlin5/#migration-schedule","title":"Migration Schedule","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#phase-1-june-pre-migration","title":"Phase 1 [June]: Pre-migration","text":"<ul> <li>Users keep working on Merlin5</li> <li>Merlin5 production directories: <code>'/gpfs/home/'</code>, <code>'/gpfs/data'</code>, <code>'/gpfs/group'</code></li> <li>Users may raise any problems (quota limits, unaccessible files, etc.) to merlin-admins@lists.psi.ch</li> <li>Users can start migrating data (see Migration steps)</li> <li>Users should copy their data from Merlin5 <code>/gpfs/data</code> to Merlin6 <code>/data/user</code></li> <li>Users should copy their home from Merlin5 <code>/gpfs/home</code> to Merlin6 <code>/psi/home</code></li> <li>Users should inform when migration is done, and which directories were migrated. Deletion for such directories can be requested by admins.</li> </ul>"},{"location":"merlin6/99-support/migration-from-merlin5/#phase-2-july-october-migration-to-merlin6","title":"Phase 2 [July-October]: Migration to Merlin6","text":"<ul> <li>Merlin6 becomes official cluster, and directories are switched to the new structure:</li> <li>Merlin6 production directories: <code>'/psi/home/'</code>, <code>'/data/user'</code>, <code>'/data/project'</code></li> <li>Merlin5 directories available in RW in login nodes: <code>'/gpfs/home/'</code>, <code>'/gpfs/data'</code>, <code>'/gpfs/group'</code><ul> <li>In Merlin5 computing nodes, Merlin5 directories are mounted in RW: <code>'/gpfs/home/'</code>, <code>'/gpfs/data'</code>, <code>'/gpfs/group'</code></li> <li>In Merlin5 computing nodes, Merlin6 directories are mounted in RW: <code>'/psi/home/'</code>, <code>'/data/user'</code>, <code>'/data/project'</code></li> </ul> </li> <li>Users must migrate their data (see Migration steps)</li> <li>ALL data must be migrated</li> <li>Job submissions by default to Merlin6. Submission to Merlin5 computing nodes possible.</li> <li>Users should inform when migration is done, and which directories were migrated. 
Deletion for such directories can be requested by admins.</li> </ul>"},{"location":"merlin6/99-support/migration-from-merlin5/#phase-3-november-merlin5-decomission","title":"Phase 3 [November]: Merlin5 Decomission","text":"<ul> <li>Old Merlin5 storage unmounted.</li> <li>Migrated directories reported by users will be deleted.</li> <li>Remaining Merlin5 data will be archived.</li> </ul>"},{"location":"merlin6/99-support/migration-from-merlin5/#migration-steps","title":"Migration steps","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#cleanup-archive-files","title":"Cleanup / Archive files","text":"<ul> <li>Users must clean up and/or archive files, according to the quota limits of the target storage.</li> <li>If extra space is needed, we advise users to request a project.</li> <li>If you need a larger quota with respect to the maximum allowed number of files, you can request an increase of your user quota.</li> </ul>"},{"location":"merlin6/99-support/migration-from-merlin5/#file-list","title":"File list","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#step-1-migrating","title":"Step 1: Migrating","text":"<p>First migration:</p> Bash<pre><code>rsync -avAHXS <source_merlin5> <destination_merlin6>\nrsync -avAHXS /gpfs/data/$username/* /data/user/$username\n</code></pre> <p>This can take several hours or days: * You can try to run multiple rsync commands in parallel on sub-directories to increase the transfer rate. * Please do not parallelize too many directories at once; as a rule of thumb, no more than 10 at a time. * Other users may be doing the same, and too many concurrent transfers could cause storage / UI performance problems in the Merlin5 cluster.</p>"},{"location":"merlin6/99-support/migration-from-merlin5/#step-2-mirroring","title":"Step 2: Mirroring","text":"<p>Once the first migration is done, a second <code>rsync</code> should be run, this time with <code>--delete</code>. With this option, <code>rsync</code> deletes from the destination all files that were removed from the source, and also propagates new files from the source to the destination.</p> Bash<pre><code>rsync -avAHXS --delete <source_merlin5> <destination_merlin6>\nrsync -avAHXS --delete /gpfs/data/$username/* /data/user/$username\n</code></pre>"},{"location":"merlin6/99-support/migration-from-merlin5/#step-3-removing-archiving-old-data","title":"Step 3: Removing / Archiving old data","text":""},{"location":"merlin6/99-support/migration-from-merlin5/#removing-migrated-data","title":"Removing migrated data","text":"<p>Once you ensure that everything is migrated to the new storage, the data is ready to be deleted from the old storage. 
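Before requesting deletion, a final dry-run comparison can help to verify that nothing was left behind (a sketch based on the example paths above; <code>--dry-run</code> only reports what would be transferred or deleted, without changing anything):</p> Bash<pre><code># Dry run: report differences between the old and the new location without copying or deleting\nrsync -avAHXS --delete --dry-run /gpfs/data/$username/ /data/user/$username/\n</code></pre> <p>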
Users must report when their migration is finished, indicating which directories are affected and ready to be removed.</p> <p>Merlin administrators will remove the directories, always asking for a last confirmation.</p>"},{"location":"merlin6/99-support/migration-from-merlin5/#archiving-data","title":"Archiving data","text":"<p>Once all migrated data has been removed from the old storage, the missing data will be archived.</p>"},{"location":"merlin6/how-to-use-merlin/archive/","title":"Archive & PSI Data Catalog","text":""},{"location":"merlin6/how-to-use-merlin/archive/#archive-psi-data-catalog","title":"Archive & PSI Data Catalog","text":""},{"location":"merlin6/how-to-use-merlin/archive/#psi-data-catalog-as-a-psi-central-service","title":"PSI Data Catalog as a PSI Central Service","text":"<p>PSI provides access to the Data Catalog for long-term data storage and retrieval. Data is stored on the PetaByte Archive at the Swiss National Supercomputing Centre (CSCS).</p> <p>The Data Catalog and Archive are suitable for:</p> <ul> <li>Raw data generated by PSI instruments</li> <li>Derived data produced by processing some inputs</li> <li>Data required to reproduce PSI research and publications</li> </ul> <p>The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management. In accordance with this policy, data will be publicly released under CC-BY-SA 4.0 after an embargo period expires.</p> <p>The Merlin cluster is connected to the Data Catalog. Hence, users can archive data stored in the Merlin storage under the <code>/data</code> directories (currently, <code>/data/user</code> and <code>/data/project</code>). Archiving from other directories is also possible; however, the process is much slower, as the data cannot be directly retrieved by the central PSI archive servers (central mode) and needs to be copied to them instead (decentral mode).</p> <p>Archiving can be done from any node accessible by the users (usually from the login nodes).</p> <p>Tip</p> <p>Archiving can be done in two different ways:</p> <p>'Central mode': Possible for the user and project data directories, this is the fastest way, as it does not require a remote copy (data is directly retrieved by the central AIT servers from Merlin through 'merlin-archive.psi.ch').</p> <p>'Decentral mode': Possible for any directory, this is the slowest way of archiving, as it requires copying ('rsync') the data from Merlin to the central AIT servers.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#procedure","title":"Procedure","text":""},{"location":"merlin6/how-to-use-merlin/archive/#overview","title":"Overview","text":"<p>Below are the main steps for using the Data Catalog.</p> <ul> <li>Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:<ul> <li>Prepare a metadata file describing the dataset</li> <li>Run the <code>datasetIngestor</code> script</li> <li>If necessary, the script will copy the data to the PSI archive servers</li> <li>Usually this is necessary when archiving from directories other than <code>/data/user</code> or <code>/data/project</code>. 
It would also be necessary when the Merlin export server (<code>merlin-archive.psi.ch</code>) is down for any reason.</li> </ul> </li> <li> <p>Archive the dataset:</p> <ul> <li>Visit https://discovery.psi.ch</li> <li>Click <code>Archive</code> for the dataset</li> <li>The system will now copy the data to the PetaByte Archive at CSCS</li> </ul> </li> <li> <p>Retrieve data from the catalog:</p> <ul> <li>Find the dataset on https://discovery.psi.ch and click <code>Retrieve</code></li> <li>Wait for the data to be copied to the PSI retrieval system</li> <li>Run the <code>datasetRetriever</code> script</li> </ul> </li> </ul> <p>Since large data sets may take a lot of time to transfer, some steps are designed to happen in the background. The discovery website can be used to track the progress of each step.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#account-registration","title":"Account Registration","text":"<p>Two types of account permit access to the Data Catalog. If your data was collected at a beamline, you may have been assigned a <code>p-group</code> (e.g. <code>p12345</code>) for the experiment. Other users are assigned an <code>a-group</code> (e.g. <code>a-12345</code>).</p> <p>Groups are usually assigned to a PI, and then individual user accounts are added to the group. This must be done upon user request through PSI Service Now. For existing a-groups and p-groups, you can follow the standard central procedures. Alternatively, if you do not know how to do that, follow the Merlin6 Requesting extra Unix groups procedure, or open a PSI Service Now ticket.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#documentation","title":"Documentation","text":"<p>Accessing the Data Catalog is done through the SciCat software. Documentation is here: ingestManual.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#loading-datacatalog-tools","title":"Loading datacatalog tools","text":"<p>The latest datacatalog software is maintained in the PSI module system. To access it from the Merlin systems, run the following command:</p> Bash<pre><code>module load datacatalog\n</code></pre> <p>This can be done from any host in the Merlin cluster accessible by users. Usually, the login nodes will be the nodes used for archiving.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#finding-your-token","title":"Finding your token","text":"<p>As of 2022-04-14 a secure token is required to interact with the data catalog. This is a long random string that replaces the previous user/password authentication (allowing access for non-PSI use cases). This string should be treated like a password and not shared.</p> <ol> <li>Go to discovery.psi.ch</li> <li>Click 'Sign in' in the top right corner. Click 'Login with PSI account' and log in on the PSI login page.</li> <li>You should be redirected to your user settings and see a 'User Information' section. If not, click on your username in the top right and choose 'Settings' from the menu.</li> <li>Look for the field 'Catamel Token'. This should be a 64-character string. Click the icon to copy the token.</li> </ol> <p></p> <p>You will need to save this token for later steps. 
To avoid including it in all the commands, I suggest saving it to an environmental variable (Linux):</p> Bash<pre><code>SCICAT_TOKEN=RqYMZcqpqMJqluplbNYXLeSyJISLXfnkwlfBKuvTSdnlpKkU\n</code></pre> <p>(Hint: prefix this line with a space to avoid saving the token to your bash history.)</p> <p>Tokens expire after 2 weeks and will need to be fetched from the website again.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#ingestion","title":"Ingestion","text":"<p>The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called <code>metadata.json</code>, and can be created with a text editor (e.g. <code>vim</code>). It can in principle be saved anywhere, but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata' section below. An example follows:</p> YAML<pre><code>{\n \"principalInvestigator\": \"albrecht.gessler@psi.ch\",\n \"creationLocation\": \"/PSI/EMF/JEOL2200FS\",\n \"dataFormat\": \"TIFF+LZW Image Stack\",\n \"sourceFolder\": \"/gpfs/group/LBR/pXXX/myimages\",\n \"owner\": \"Wilhelm Tell\",\n \"ownerEmail\": \"wilhelm.tell@psi.ch\",\n \"type\": \"raw\",\n \"description\": \"EM micrographs of amygdalin\",\n \"ownerGroup\": \"a-12345\",\n \"scientificMetadata\": {\n \"description\": \"EM micrographs of amygdalin\",\n \"sample\": {\n \"name\": \"Amygdalin beta-glucosidase 1\",\n \"uniprot\": \"P29259\",\n \"species\": \"Apple\"\n },\n \"dataCollection\": {\n \"date\": \"2018-08-01\"\n },\n \"microscopeParameters\": {\n \"pixel size\": {\n \"v\": 0.885,\n \"u\": \"A\"\n },\n \"voltage\": {\n \"v\": 200,\n \"u\": \"kV\"\n },\n \"dosePerFrame\": {\n \"v\": 1.277,\n \"u\": \"e/A2\"\n }\n }\n }\n}\n</code></pre> <p>It is recommended to use the ScicatEditor for creating metadata files. This is a browser-based tool specifically for ingesting PSI data. Using the tool avoids syntax errors and provides templates for common data sets and options. The finished JSON file can then be downloaded to merlin or copied into a text editor.</p> <p>Another option is to use the SciCat graphical interface from NoMachine. This provides a graphical interface for selecting data to archive. This is particularly useful for data associated with a DUO experiment and p-group. Type <code>SciCat</code> to get started after loading the <code>datacatalog</code> module. The GUI also replaces the the command-line ingestion described below.</p> <p>The following steps can be run from wherever you saved your <code>metadata.json</code>. First, perform a \"dry-run\" which will check the metadata for errors:</p> Bash<pre><code>datasetIngestor --token $SCICAT_TOKEN metadata.json\n</code></pre> <p>It will ask for your PSI credentials and then print some info about the data to be ingested. If there are no errors, proceed to the real ingestion:</p> Bash<pre><code>datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json\n</code></pre> <p>You will be asked whether you want to copy the data to the central system:</p> <ul> <li>If you are on the Merlin cluster and you are archiving data from <code>/data/user</code> or <code>/data/project</code>, answer 'no' since the data catalog can directly read the data.</li> <li>If you are on a directory other than <code>/data/user</code> and <code>/data/project</code>, or you are on a desktop computer, answer 'yes'. 
Copying large datasets to the PSI archive system may take quite a while (minutes to hours).</li> </ul> <p>If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data. This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so this process may take several days, and it will fail if any modifications are detected.</p> <p>If using the <code>--autoarchive</code> option as suggested above, your dataset should now be in the queue. Check the data catalog: https://discovery.psi.ch. Your job should have status 'WorkInProgress'. You will receive an email when the ingestion is complete.</p> <p>If you didn't use <code>--autoarchive</code>, you need to manually move the dataset into the archive queue. From discovery.psi.ch, navigate to the 'Archive' tab. You should see the newly ingested dataset. Check the dataset and click <code>Archive</code>. You should see the status change from <code>datasetCreated</code> to <code>scheduleArchiveJob</code>. This indicates that the data is in the process of being transferred to CSCS.</p> <p>After a few days the dataset's status will change to <code>datasetOnAchive</code> indicating the data is stored. At this point it is safe to delete the data.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#useful-commands","title":"Useful commands","text":"<p>Running the datasetIngestor in dry mode (without <code>--ingest</code>) finds most errors. However, it is sometimes convenient to find potential errors yourself with simple unix commands.</p> <p>Find problematic filenames</p> Bash<pre><code>find . -iregex '.*/[^/]*[^a-zA-Z0-9_ ./-][^/]*'=\n</code></pre> <p>Find broken links</p> Bash<pre><code>find -L . -type l\n</code></pre> <p>Find outside links</p> Bash<pre><code>find . -type l -exec bash -c 'realpath --relative-base \"`pwd`\" \"$0\" 2>/dev/null |egrep \"^[./]\" |sed \"s|^|$0 ->|\" ' '{}' ';'\n</code></pre> <p>Delete certain files (use with caution)</p> Bash<pre><code># Empty directories\nfind . -type d -empty -delete\n# Backup files\nfind . -name '*~' -delete\nfind . -name '*#autosave#' -delete\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/archive/#troubleshooting-known-bugs","title":"Troubleshooting & Known Bugs","text":"<ul> <li>The following message can be safely ignored:</li> </ul> Bash<pre><code>key_cert_check_authority: invalid certificate\nCertificate invalid: name is not a listed principal\n</code></pre> <p>It indicates that no kerberos token was provided for authentication. You can avoid the warning by first running kinit (PSI linux systems).</p> <ul> <li>For decentral ingestion cases, the copy step is indicated by a message <code>Running [/usr/bin/rsync -e ssh -avxz ...</code>. It is expected that this step will take a long time and may appear to have hung. You can check what files have been successfully transfered using rsync:</li> </ul> Bash<pre><code>rsync --list-only user_n@pb-archive.psi.ch:archive/UID/PATH/\n</code></pre> <p>where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.</p> <ul> <li> <p>There is currently a limit on the number of files per dataset (technically, the limit is from the total length of all file paths). 
It is recommended to break up datasets into 300'000 files or less.</p> <ul> <li>If it is not possible or desirable to split data between multiple datasets, an alternate work-around is to package files into a tarball. For datasets which are already compressed, omit the -z option for a considerable speedup:</li> </ul> Bash<pre><code>tar -f [output].tar [srcdir]\n</code></pre> <p>Uncompressed data can be compressed on the cluster using the following command:</p> Bash<pre><code>sbatch /data/software/Slurm/Utilities/Parallel_TarGz.batch -s [srcdir] -t [output].tar -n\n</code></pre> <p>Run /data/software/Slurm/Utilities/Parallel_TarGz.batch -h for more details and options.</p> </li> </ul>"},{"location":"merlin6/how-to-use-merlin/archive/#sample-ingestion-output-datasetingestor-1111","title":"Sample ingestion output (datasetIngestor 1.1.11)","text":"Text Only<pre><code>/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json\n2019/11/06 11:04:43 Latest version: 1.1.11\n\n2019/11/06 11:04:43 Your version of this program is up-to-date\n2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...\n2019/11/06 11:04:43 Your username:\nuser_n\n2019/11/06 11:04:48 Your password:\n2019/11/06 11:04:52 User authenticated: XXX\n2019/11/06 11:04:52 User is member in following a or p groups: XXX\n2019/11/06 11:04:52 OwnerGroup information a-XXX verified successfully.\n2019/11/06 11:04:52 contactEmail field added: XXX\n2019/11/06 11:04:52 Scanning files in dataset /data/project/bio/myproject/archive\n2019/11/06 11:04:52 No explicit filelistingPath defined - full folder /data/project/bio/myproject/archive is used.\n2019/11/06 11:04:52 Source Folder: /data/project/bio/myproject/archive at /data/project/bio/myproject/archive\n2019/11/06 11:04:57 The dataset contains 100000 files with a total size of 50000000000 bytes.\n2019/11/06 11:04:57 creationTime field added: 2019-07-29 18:47:08 +0200 CEST\n2019/11/06 11:04:57 endTime field added: 2019-11-06 10:52:17.256033 +0100 CET\n2019/11/06 11:04:57 license field added: CC BY-SA 4.0\n2019/11/06 11:04:57 isPublished field added: false\n2019/11/06 11:04:57 classification field added: IN=medium,AV=low,CO=low\n2019/11/06 11:04:57 Updated metadata object:\n{\n \"accessGroups\": [\n \"XXX\"\n ],\n \"classification\": \"IN=medium,AV=low,CO=low\",\n \"contactEmail\": \"XXX\",\n \"creationLocation\": \"XXX\",\n \"creationTime\": \"2019-07-29T18:47:08+02:00\",\n \"dataFormat\": \"XXX\",\n \"description\": \"XXX\",\n \"endTime\": \"2019-11-06T10:52:17.256033+01:00\",\n \"isPublished\": false,\n \"license\": \"CC BY-SA 4.0\",\n \"owner\": \"XXX\",\n \"ownerEmail\": \"XXX\",\n \"ownerGroup\": \"a-XXX\",\n \"principalInvestigator\": \"XXX\",\n \"scientificMetadata\": {\n...\n },\n \"sourceFolder\": \"/data/project/bio/myproject/archive\",\n \"type\": \"raw\"\n}\n2019/11/06 11:04:57 Running [/usr/bin/ssh -l user_n pb-archive.psi.ch test -d /data/project/bio/myproject/archive].\nkey_cert_check_authority: invalid certificate\nCertificate invalid: name is not a listed principal\nuser_n@pb-archive.psi.ch's password:\n2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).\nThe data must first be copied to a rsync cache server.\n\n2019/11/06 11:05:04 Do you want to continue (Y/n)?\nY\n2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012\n2019/11/06 11:05:09 The dataset contains 
108057 files.\n2019/11/06 11:05:10 Created file block 0 from file 0 to 1000 with total size of 413229990 bytes\n2019/11/06 11:05:10 Created file block 1 from file 1000 to 2000 with total size of 416024000 bytes\n2019/11/06 11:05:10 Created file block 2 from file 2000 to 3000 with total size of 416024000 bytes\n2019/11/06 11:05:10 Created file block 3 from file 3000 to 4000 with total size of 416024000 bytes\n...\n2019/11/06 11:05:26 Created file block 105 from file 105000 to 106000 with total size of 416024000 bytes\n2019/11/06 11:05:27 Created file block 106 from file 106000 to 107000 with total size of 416024000 bytes\n2019/11/06 11:05:27 Created file block 107 from file 107000 to 108000 with total size of 850195143 bytes\n2019/11/06 11:05:27 Created file block 108 from file 108000 to 108057 with total size of 151904903 bytes\n2019/11/06 11:05:27 short dataset id: 0a9fe316-c9e7-4cc5-8856-e1346dd31e31\n2019/11/06 11:05:27 Running [/usr/bin/rsync -e ssh -avxz /data/project/bio/myproject/archive/ user_n@pb-archive.psi.ch:archive\n/0a9fe316-c9e7-4cc5-8856-e1346dd31e31/data/project/bio/myproject/archive].\nkey_cert_check_authority: invalid certificate\nCertificate invalid: name is not a listed principal\nuser_n@pb-archive.psi.ch's password:\nPermission denied, please try again.\nuser_n@pb-archive.psi.ch's password:\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n...\n2019/11/06 12:05:08 Successfully updated {\"pid\":\"12.345.67890/12345678-1234-1234-1234-123456789012\",...}\n2019/11/06 12:05:08 Submitting Archive Job for the ingested datasets.\n2019/11/06 12:05:08 Job response Status: okay\n2019/11/06 12:05:08 A confirmation email will be sent to XXX\n12.345.67890/12345678-1234-1234-1234-123456789012\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/archive/#publishing","title":"Publishing","text":"<p>After datasets are are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets on http://doi.psi.ch.</p> <p>For instructions on this, please read the 'Publish' section in the ingest manual.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#retrieving-data","title":"Retrieving data","text":"<p>Retrieving data from the archive is also initiated through the Data Catalog. Please read the 'Retrieve' section in the ingest manual.</p>"},{"location":"merlin6/how-to-use-merlin/archive/#further-information","title":"Further Information","text":"<ul> <li>PSI Data Catalog</li> <li>Full Documentation</li> <li>Published Datasets (doi.psi.ch)</li> <li>Data Catalog PSI page</li> <li>Data catalog SciCat Software</li> <li>FAIR definition and SNF Research Policy</li> <li>Petabyte Archive at CSCS</li> </ul>"},{"location":"merlin6/how-to-use-merlin/connect-from-linux/","title":"Connecting from a Linux Client","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-linux/#connecting-from-a-linux-client","title":"Connecting from a Linux Client","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-linux/#ssh-without-x11-forwarding","title":"SSH without X11 Forwarding","text":"<p>This is the standard method. Official X11 support is provided through NoMachine. 
For normal SSH sessions, use your SSH client as follows:</p> Bash<pre><code>ssh $username@merlin-l-01.psi.ch\nssh $username@merlin-l-001.psi.ch\nssh $username@merlin-l-002.psi.ch\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/connect-from-linux/#ssh-with-x11-forwarding","title":"SSH with X11 Forwarding","text":"<p>Official X11 Forwarding support is through NoMachine. Please follow the document {Job Submission -> Interactive Jobs} and {Accessing Merlin -> NoMachine} for more details. However, we provide a small recipe for enabling X11 Forwarding in Linux.</p> <ul> <li>For enabling client X11 forwarding, add the following to the start of <code>~/.ssh/config</code> to implicitly add <code>-X</code> to all ssh connections:</li> </ul> Bash<pre><code>ForwardAgent yes\nForwardX11Trusted yes\n</code></pre> <ul> <li>Alternatively, you can add the option <code>-Y</code> to the <code>ssh</code> command. In example:</li> </ul> Bash<pre><code>ssh -X $username@merlin-l-01.psi.ch\nssh -X $username@merlin-l-001.psi.ch\nssh -X $username@merlin-l-002.psi.ch\n</code></pre> <ul> <li>For testing that X11 forwarding works, just run <code>xclock</code>. A X11 based clock should popup in your client session:</li> </ul> Bash<pre><code>xclock\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/connect-from-macos/","title":"Connecting from a MacOS Client","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-macos/#connecting-from-a-macos-client","title":"Connecting from a MacOS Client","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-macos/#ssh-without-x11-forwarding","title":"SSH without X11 Forwarding","text":"<p>This is the standard method. Official X11 support is provided through NoMachine. For normal SSH sessions, use your SSH client as follows:</p> Bash<pre><code>ssh $username@merlin-l-01.psi.ch\nssh $username@merlin-l-001.psi.ch\nssh $username@merlin-l-002.psi.ch\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/connect-from-macos/#ssh-with-x11-forwarding","title":"SSH with X11 Forwarding","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-macos/#requirements","title":"Requirements","text":"<p>For running SSH with X11 Forwarding in MacOS, one needs to have a X server running in MacOS. The official X Server for MacOS is XQuartz. Please ensure you have it running before starting a SSH connection with X11 forwarding.</p>"},{"location":"merlin6/how-to-use-merlin/connect-from-macos/#ssh-with-x11-forwarding-in-macos","title":"SSH with X11 Forwarding in MacOS","text":"<p>Official X11 support is through NoMachine. Please follow the document {Job Submission -> Interactive Jobs} and {Accessing Merlin -> NoMachine} for more details. However, we provide a small recipe for enabling X11 Forwarding in MacOS.</p> <ul> <li> <p>Ensure that XQuartz is installed and running in your MacOS.</p> </li> <li> <p>For enabling client X11 forwarding, add the following to the start of <code>~/.ssh/config</code> to implicitly add <code>-X</code> to all ssh connections:</p> </li> </ul> Bash<pre><code>ForwardAgent yes\nForwardX11Trusted yes\n</code></pre> <ul> <li>Alternatively, you can add the option <code>-Y</code> to the <code>ssh</code> command. In example:</li> </ul> Bash<pre><code>ssh -X $username@merlin-l-01.psi.ch\nssh -X $username@merlin-l-001.psi.ch\nssh -X $username@merlin-l-002.psi.ch\n</code></pre> <ul> <li>For testing that X11 forwarding works, just run <code>xclock</code>. 
A X11 based clock should popup in your client session.</li> </ul> Bash<pre><code>xclock\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/connect-from-windows/","title":"Connecting from a Windows Client","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-windows/#connecting-from-a-windows-client","title":"Connecting from a Windows Client","text":""},{"location":"merlin6/how-to-use-merlin/connect-from-windows/#ssh-with-putty-without-x11-forwarding","title":"SSH with PuTTY without X11 Forwarding","text":"<p>PuTTY is one of the most common tools for SSH.</p> <p>Check, if the following software packages are installed on the Windows workstation by inspecting the Start menu (hint: use the Search box to save time):</p> <ul> <li>PuTTY (should be already installed)</li> <li>[Optional] Xming (needed for SSH with X11 Forwarding)</li> </ul> <p>If they are missing, you can install them using the Software Kiosk icon on the Desktop.</p> <ol> <li> <p>Start PuTTY</p> </li> <li> <p>[Optional] Enable <code>xterm</code> to have similar mouse behavour as in Linux:</p> </li> </ol> <p></p> <ol> <li>Create session to a Merlin login node and Open:</li> </ol> <p></p>"},{"location":"merlin6/how-to-use-merlin/connect-from-windows/#ssh-with-putty-with-x11-forwarding","title":"SSH with PuTTY with X11 Forwarding","text":"<p>Official X11 Forwarding support is through NoMachine. Please follow the document {Job Submission -> Interactive Jobs} and {Accessing Merlin -> NoMachine} for more details. However, we provide a small recipe for enabling X11 Forwarding in Windows.</p> <p>Check, if the Xming is installed on the Windows workstation by inspecting the Start menu (hint: use the Search box to save time). If missing, you can install it by using the Software Kiosk icon (should be located on the Desktop). </p> <ol> <li> <p>Ensure that a X server (Xming) is running. Otherwise, start it.</p> </li> <li> <p>Enable X11 Forwarding in your SSH client. In example, for Putty:</p> </li> </ol> <p></p>"},{"location":"merlin6/how-to-use-merlin/kerberos/","title":"Kerberos and AFS authentication","text":""},{"location":"merlin6/how-to-use-merlin/kerberos/#kerberos-and-afs-authentication","title":"Kerberos and AFS authentication","text":"<p>Projects and users have their own areas in the central PSI AFS service. In order to access to these areas, valid Kerberos and AFS tickets must be granted.</p> <p>These tickets are automatically granted when accessing through SSH with username and password. Alternatively, one can get a granting ticket with the <code>kinit</code> (Kerberos) and <code>aklog</code> (AFS ticket, which needs to be run after <code>kinit</code>) commands.</p> <p>Due to PSI security policies, the maximum lifetime of the ticket is 7 days, and the default time is 10 hours. It means than one needs to constantly renew (<code>krenew</code> command) the existing granting tickets, and their validity can not be extended longer than 7 days. At this point, one needs to obtain new granting tickets.</p>"},{"location":"merlin6/how-to-use-merlin/kerberos/#obtaining-granting-tickets-with-username-and-password","title":"Obtaining granting tickets with username and password","text":"<p>As already described above, the most common use case is to obtain Kerberos and AFS granting tickets by introducing username and password: * When login to Merlin through SSH protocol, if this is done with username + password authentication, tickets for Kerberos and AFS will be automatically obtained. 
* When logging in to Merlin through NoMachine, no Kerberos and AFS tickets are granted. Therefore, users need to run <code>kinit</code> (to obtain a Kerberos ticket) followed by <code>aklog</code> (to obtain an AFS ticket). See further details below.</p> <p>To manually obtain granting tickets, one has to: 1. To obtain a granting Kerberos ticket, run <code>kinit $USER</code> and enter the PSI password. </p>Bash<pre><code>kinit $USER@D.PSI.CH\n</code></pre> 2. To obtain a granting ticket for AFS, run <code>aklog</code>. No password is necessary, but a valid Kerberos ticket is mandatory. Bash<pre><code>aklog\n</code></pre> 3. To list the status of your granted tickets, use the <code>klist</code> command. Bash<pre><code>klist\n</code></pre> 4. To extend the validity of existing granting tickets, use the <code>krenew</code> command. Bash<pre><code>krenew\n</code></pre> * Keep in mind that the maximum lifetime of granting tickets is 7 days, therefore <code>krenew</code> cannot be used beyond that limit; at that point, <code>kinit</code> must be used instead.<p></p>"},{"location":"merlin6/how-to-use-merlin/kerberos/#obtanining-granting-tickets-with-keytab","title":"Obtaining granting tickets with keytab","text":"<p>Sometimes, obtaining granting tickets by using password authentication is not possible. An example is user Slurm jobs requiring access to private areas in AFS. For such cases, it is possible to generate a keytab file.</p> <p>Be aware that the keytab file must be private, protected by correct permissions, and not shared with any other users.</p>"},{"location":"merlin6/how-to-use-merlin/kerberos/#creating-a-keytab-file","title":"Creating a keytab file","text":"<p>For generating a keytab, one has to:</p> <ol> <li>Load a newer Kerberos (<code>krb5/1.20</code> or higher) from Pmodules: Bash<pre><code>module load krb5/1.20\n</code></pre></li> <li>Create a private directory for storing the Kerberos keytab file</li> </ol> <p></p>Bash<pre><code>mkdir -p ~/.k5\n</code></pre> 3. Run the <code>ktutil</code> utility which comes with the loaded <code>krb5</code> Pmodule: Bash<pre><code>ktutil\n</code></pre> 4. In the <code>ktutil</code> console, generate a keytab file as follows:<p></p> <p></p>Bash<pre><code># Replace $USER by your username\nadd_entry -password -k 0 -f -p $USER\nwkt /psi/home/$USER/.k5/krb5.keytab\nexit\n</code></pre> Notice that you will need to enter your password once. This step is required for generating the keytab file.<p></p> <ol> <li>Once back in the main shell, ensure that the file has the proper permissions: Bash<pre><code>chmod 0600 ~/.k5/krb5.keytab\n</code></pre></li> </ol>"},{"location":"merlin6/how-to-use-merlin/kerberos/#obtaining-tickets-by-using-keytab-files","title":"Obtaining tickets by using keytab files","text":"<p>Once the keytab is created, one can obtain Kerberos tickets without being prompted for a password as follows:</p> Bash<pre><code>kinit -kt ~/.k5/krb5.keytab $USER\naklog\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/kerberos/#slurm-jobs-accessing-afs","title":"Slurm jobs accessing AFS","text":"<p>Some jobs may require access to private areas in AFS. For that, a valid keytab file is required. 
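 Assuming the keytab was created in <code>~/.k5/krb5.keytab</code> as described above, it can be quickly verified from a login node before using it in jobs. A minimal sketch (<code>klist -k</code> lists the principals stored in the keytab, which should match your account):</p> Bash<pre><code># Show the principal(s) stored in the keytab\nklist -k ~/.k5/krb5.keytab\n\n# Obtain Kerberos and AFS tickets non-interactively, then list them\nkinit -kt ~/.k5/krb5.keytab $USER@D.PSI.CH\naklog\nklist\n</code></pre> <p>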
Then, from inside the batch script one can obtain granting tickets for Kerberos and AFS, which can be used for accessing AFS private areas.</p> <p>The steps should be the following:</p> <ul> <li>Setup <code>KRB5CCNAME</code>, which can be used to specify the location of the Kerberos5 credentials (ticket) cache. In general it should point to a shared area (<code>$HOME/.k5</code> is a good location), and is strongly recommended to generate an independent Kerberos5 credential cache (it is, creating a new credential cache per Slurm job): Bash<pre><code>export KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\n</code></pre></li> <li>To obtain a Kerberos5 granting ticket, run <code>kinit</code> by using your keytab:</li> </ul> <p></p>Bash<pre><code>kinit -kt \"$HOME/.k5/krb5.keytab\" $USER@D.PSI.CH\n</code></pre> * To obtain a granting AFS ticket, run <code>aklog</code>:<p></p> <p></p>Bash<pre><code>aklog\n</code></pre> * At the end of the job, you can remove destroy existing Kerberos tickets.<p></p> Bash<pre><code>kdestroy\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/kerberos/#slurm-batch-script-example-obtaining-krbafs-granting-tickets","title":"Slurm batch script example: obtaining KRB+AFS granting tickets","text":""},{"location":"merlin6/how-to-use-merlin/kerberos/#example-1-independent-crendetial-cache-per-slurm-job","title":"Example 1: Independent crendetial cache per Slurm job","text":"<p>This is the recommended way. At the end of the job, is strongly recommended to remove / destroy the existing kerberos tickets.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Specify 'general' or 'daily' or 'hourly'\n#SBATCH --time=01:00:00 # Strictly recommended when using 'general' partition.\n#SBATCH --output=run.out # Generate custom output file\n#SBATCH --error=run.err # Generate custom error file\n#SBATCH --nodes=1 # Uncomment and specify #nodes to use\n#SBATCH --ntasks=1 # Uncomment and specify #nodes to use\n#SBATCH --cpus-per-task=1\n#SBATCH --constraint=xeon-gold-6152\n#SBATCH --hint=nomultithread\n#SBATCH --job-name=krb5\n\nexport KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\nkinit -kt \"$HOME/.k5/krb5.keytab\" $USER@D.PSI.CH\naklog\nklist\n\necho \"Here should go my batch script code.\"\n\n# Destroy Kerberos tickets created for this job only\nkdestroy\nklist\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/kerberos/#example-2-shared-credential-cache","title":"Example 2: Shared credential cache","text":"<p>Some users may need/prefer to run with a shared cache file. For doing that, one needs to setup <code>KRB5CCNAME</code> from the login node session, before submitting the job.</p> Bash<pre><code>export KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\n</code></pre> <p>Then, you can run one or multiple jobs scripts (or parallel job with <code>srun</code>). 
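 As a minimal sketch of this shared-cache workflow from the login node (the job script names <code>job1.batch</code> and <code>job2.batch</code> are hypothetical; a full script is shown below):</p> Bash<pre><code># On the login node: create one shared credential cache for this session\nexport KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\n\n# Submit one or more jobs; each job script runs kinit/aklog against this shared cache\nsbatch job1.batch\nsbatch job2.batch\n</code></pre> <p>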
<code>KRB5CCNAME</code> will be propagated to the job script or to the parallel job, therefore a single credential cache will be shared amongst different Slurm runs.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Specify 'general' or 'daily' or 'hourly'\n#SBATCH --time=01:00:00 # Strictly recommended when using 'general' partition.\n#SBATCH --output=run.out # Generate custom output file\n#SBATCH --error=run.err # Generate custom error file\n#SBATCH --nodes=1 # Uncomment and specify #nodes to use\n#SBATCH --ntasks=1 # Uncomment and specify #nodes to use \n#SBATCH --cpus-per-task=1\n#SBATCH --constraint=xeon-gold-6152\n#SBATCH --hint=nomultithread\n#SBATCH --job-name=krb5\n\n# KRB5CCNAME is inherit from the login node session\nkinit -kt \"$HOME/.k5/krb5.keytab\" $USER@D.PSI.CH\naklog\nklist\n\necho \"Here should go my batch script code.\"\n\necho \"No need to run 'kdestroy', as it may have to survive for running other jobs\"\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/nomachine/","title":"Remote Desktop Access","text":""},{"location":"merlin6/how-to-use-merlin/nomachine/#remote-desktop-access","title":"Remote Desktop Access","text":"<p>Users can login in Merlin through a Linux Remote Desktop Session. NoMachine is a desktop virtualization tool. It is similar to VNC, Remote Desktop, etc. It uses the NX protocol to enable a graphical login to remote servers.</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#installation","title":"Installation","text":"<p>NoMachine is available for PSI Windows computers in the Software Kiosk under the name NX Client. Please use the latest version (at least 6.0). For MacOS and Linux, the NoMachine client can be downloaded from https://www.nomachine.com/.</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#accessing-merlin6-nomachine-from-psi","title":"Accessing Merlin6 NoMachine from PSI","text":"<p>The Merlin6 NoMachine service is hosted in the following machine:</p> <ul> <li><code>merlin-nx.psi.ch</code></li> </ul> <p>This is the front-end (hence, the door) to the NoMachine back-end nodes, which contain the NoMachine desktop service. The back-end nodes are the following:</p> <ul> <li><code>merlin-l-001.psi.ch</code></li> <li><code>merlin-l-002.psi.ch</code></li> </ul> <p>Any access to the login node desktops must be done through <code>merlin-nx.psi.ch</code> (or from <code>rem-acc.psi.ch -> merlin-nx.psi.ch</code> when connecting from outside PSI).</p> <p>The front-end service running on <code>merlin-nx.psi.ch</code> will load balance the sessions and login to any of the available nodes in the back-end.</p> <p>Only 1 session per back-end is possible.</p> <p>Below are explained all the steps necessary for configuring the access to the NoMachine service running on a login node.</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#creating-a-merlin6-nomachine-connection","title":"Creating a Merlin6 NoMachine connection","text":""},{"location":"merlin6/how-to-use-merlin/nomachine/#adding-a-new-connection-to-the-front-end","title":"Adding a new connection to the front-end","text":"<p>Click the Add button to create a new connection to the <code>merlin-nx.psi.ch</code> front-end, and fill up the following fields:</p> <ul> <li>Name: Specify a custom name for the connection. 
Examples: <code>merlin-nx</code>, <code>merlin-nx.psi.ch</code>, <code>Merlin Desktop</code></li> <li>Host: Specify the hostname of the front-end service: <code>merlin-nx.psi.ch</code></li> <li>Protocol: Specify the protocol that will be used for the connection. Recommended protocol: <code>NX</code></li> <li>Port: Specify the listening port of the front-end. It must be <code>4000</code>.</li> </ul> <p></p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#configuring-nomachine-authentication-method","title":"Configuring NoMachine Authentication Method","text":"<p>Depending on the client version, it may ask for different authentication options. If required, choose your authentication method and click Continue (Password or Kerberos are the recommended ones).</p> <p>You will be asked for your credentials (username / password). Do not add <code>PSICH\\</code> as a prefix for the username.</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#opening-nomachine-desktop-sessions","title":"Opening NoMachine desktop sessions","text":"<p>By default, when connecting to the <code>merlin-nx.psi.ch</code> front-end it will automatically open a new session if none exists.</p> <p>If there are existing sessions, instead of opening a new desktop session, users can reconnect to an existing one by clicking on the proper icon (see image below).</p> <p></p> <p>Users can also create a second desktop session by selecting the <code>New Desktop</code> button (red rectangle in the image below). This will create a second session on the second login node, as long as this node is up and running.</p> <p></p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#nomachine-lightdm-session-example","title":"NoMachine LightDM Session Example","text":"<p>An example of a NoMachine session, based on a LightDM X Windows desktop:</p> <p></p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#accessing-merlin6-nomachine-from-outside-psi","title":"Accessing Merlin6 NoMachine from outside PSI","text":""},{"location":"merlin6/how-to-use-merlin/nomachine/#no-vpn-access","title":"No VPN access","text":"<p>Access to the Merlin6 NoMachine service is possible without VPN through 'rem-acc.psi.ch'. Please follow the steps described in PSI Remote Interactive Access for remote access to the Merlin6 NoMachine services. Once logged in to 'rem-acc.psi.ch', you must then log in to the <code>merlin-nx.psi.ch</code> front-end.</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#vpn-access","title":"VPN access","text":"<p>Remote access is also possible through VPN; however, you must not use 'rem-acc.psi.ch', and you have to connect directly to the Merlin6 NoMachine <code>merlin-nx.psi.ch</code> front-end as if you were inside PSI. For VPN access, you should request it from the IT department by opening a PSI Service Now ticket: VPN Access (PSI employees).</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#advanced-display-settings","title":"Advanced Display Settings","text":"<p>NoMachine Display Settings can be accessed and changed either when creating a new session or by clicking the very top right corner of a running session.</p>"},{"location":"merlin6/how-to-use-merlin/nomachine/#prevent-rescaling","title":"Prevent Rescaling","text":"<p>These settings prevent \"blurriness\" at the cost of some performance, 
so you may want to choose depending on your performance needs.</p> <ul> <li>Display > Resize remote display (forces 1:1 pixel sizes)</li> <li>Display > Change settings > Quality: Choose Medium-Best Quality</li> <li>Display > Change settings > Modify advanced settings<ul> <li>Check: Disable network-adaptive display quality (disables lossy compression)</li> <li>Check: Disable client side image post-processing</li> </ul> </li> </ul>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/","title":"Configuring SSH Keys in Merlin","text":""},{"location":"merlin6/how-to-use-merlin/ssh-keys/#configuring-ssh-keys-in-merlin","title":"Configuring SSH Keys in Merlin","text":"<p>Merlin users sometimes need to access the different Merlin services without being repeatedly prompted for a password. One can achieve that with Kerberos authentication; however, some software requires the setup of SSH keys. One example is ANSYS Fluent: when used interactively, the GUI communicates with the different nodes through the SSH protocol, and the use of SSH keys is enforced.</p>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#setting-up-ssh-keys-on-merlin","title":"Setting up SSH Keys on Merlin","text":"<p>For security reasons, users must always protect SSH keys with a passphrase.</p> <p>Users can check whether an SSH key already exists. Keys are placed in the <code>~/.ssh/</code> directory. <code>RSA</code> is usually the default key type, with the files <code>id_rsa</code> (private key) and <code>id_rsa.pub</code> (public key).</p> Bash<pre><code>ls ~/.ssh/id*\n</code></pre> <p>For creating SSH RSA keys, one should:</p> <ol> <li> <p>Run <code>ssh-keygen</code>; a passphrase will be requested twice. You must remember this passphrase for the future.</p> <ul> <li>For security reasons, always protect the key with a passphrase. The only exception is when running ANSYS software, which in general should use a key without a passphrase to simplify running the software in Slurm.</li> <li>This will generate a private key id_rsa, and a public key id_rsa.pub in your ~/.ssh directory.</li> </ul> </li> <li> <p>Add your public key to the <code>authorized_keys</code> file, and ensure proper permissions for that file, as follows: </p>Bash<pre><code>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys\nchmod 0600 ~/.ssh/authorized_keys\n</code></pre><p></p> </li> <li>Configure the SSH client in order to force the usage of the psi.ch domain for trusting keys: Bash<pre><code>echo \"CanonicalizeHostname yes\" >> ~/.ssh/config\n</code></pre></li> <li>Configure further SSH options as follows: Bash<pre><code>echo \"AddKeysToAgent yes\" >> ~/.ssh/config\necho \"ForwardAgent yes\" >> ~/.ssh/config\n</code></pre> Other options may be added.</li> <li>Check that your SSH config file contains at least the lines mentioned in steps 3 and 4: Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# cat ~/.ssh/config\nCanonicalizeHostname yes\nAddKeysToAgent yes\nForwardAgent yes\n</code></pre></li> </ol>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#using-the-ssh-keys","title":"Using the SSH Keys","text":""},{"location":"merlin6/how-to-use-merlin/ssh-keys/#using-authentication-agent-in-ssh-session","title":"Using Authentication Agent in SSH session","text":"<p>By default, when accessing the login node via SSH (with <code>ForwardAgent=yes</code>), your SSH keys will be automatically added to the authentication agent. Hence, no further action should be needed by the user. 
One can configure <code>ForwardAgent=yes</code> as follows:</p> <ul> <li>(Recommended) In your local Linux machine (workstation, laptop or desktop), add the line <code>ForwardAgent yes</code> to your <code>$HOME/.ssh/config</code> file (or alternatively to <code>/etc/ssh/ssh_config</code>).</li> <li>Alternatively, you can add the option <code>ForwardAgent=yes</code> to each SSH command. For example: Bash<pre><code>ssh -XY -o ForwardAgent=yes merlin-l-001.psi.ch\n</code></pre></li> </ul> <p>If <code>ForwardAgent</code> is not enabled as shown above, one needs to run the authentication agent and then add the key to the ssh-agent. This must be done once per SSH session, as follows:</p> <ul> <li>Run <code>eval $(ssh-agent -s)</code> to start the ssh-agent in that SSH session.</li> <li>Check whether the authentication agent already has your key added: Bash<pre><code>ssh-add -l | grep \"/psi/home/$(whoami)/.ssh\"\n</code></pre></li> <li>If no key is returned in the previous step, you have to add the private key identity to the authentication agent. You will be asked for the passphrase of your key. This can be done by running: Bash<pre><code>ssh-add\n</code></pre></li> </ul>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#using-authentication-agent-in-nomachine-session","title":"Using Authentication Agent in NoMachine Session","text":"<p>By default, when using a NoMachine session, the <code>ssh-agent</code> should be automatically started. Hence, there is no need to start the agent or forward it.</p> <p>However, for NoMachine one always needs to add the private key identity to the authentication agent. This can be done as follows:</p> <ol> <li>Check whether the authentication agent already has the key added: Bash<pre><code>ssh-add -l | grep \"/psi/home/$(whoami)/.ssh\"\n</code></pre></li> <li>If no key is returned in the previous step, you have to add the private key identity to the authentication agent. You will be asked for the passphrase of your key. This can be done by running: Bash<pre><code>ssh-add\n</code></pre></li> </ol> <p>You only need to do this once per NoMachine session, and it applies to all terminal windows within that NoMachine session.</p>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#troubleshooting","title":"Troubleshooting","text":""},{"location":"merlin6/how-to-use-merlin/ssh-keys/#errors-when-running-ssh-add","title":"Errors when running 'ssh-add'","text":"<p>If the error <code>Could not open a connection to your authentication agent.</code> appears when running <code>ssh-add</code>, it means that the authentication agent is not running.
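 A minimal recovery sketch for the affected shell session:</p> Bash<pre><code># Start an agent in the current shell, add your key, and verify it is loaded\neval $(ssh-agent -s)\nssh-add\nssh-add -l\n</code></pre> <p>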
See the previous sections for further details on starting and using the agent.</p>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#addupdate-ssh-rsa-key-password","title":"Add/Update SSH RSA Key password","text":"<p>If an existing SSH key does not have a passphrase, or you want to update an existing passphrase with a new one, you can do it as follows:</p> Bash<pre><code>ssh-keygen -p -f ~/.ssh/id_rsa\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#ssh-keys-deployed-but-not-working","title":"SSH Keys deployed but not working","text":"<p>Please ensure proper permissions of the involved files, and check for any typos in the file names:</p> Bash<pre><code>chmod u+rwx,go-rwx,g+s ~/.ssh\nchmod u+rw-x,go-rwx ~/.ssh/authorized_keys\nchmod u+rw-x,go-rwx ~/.ssh/id_rsa\nchmod u+rw-x,go+r-wx ~/.ssh/id_rsa.pub\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/ssh-keys/#testing-ssh-keys","title":"Testing SSH Keys","text":"<p>Once the SSH key is created, one can test that it is valid as follows:</p> <ol> <li>Create a new SSH session on one of the login nodes: Bash<pre><code>ssh merlin-l-001\n</code></pre></li> <li>In the login node session, destroy any existing Kerberos ticket or active SSH key: Bash<pre><code>kdestroy\nssh-add -D\n</code></pre></li> <li>Add the new private key identity to the authentication agent. You will be asked for the passphrase. Bash<pre><code>ssh-add\n</code></pre></li> <li>Check that your key is known to the SSH agent: Bash<pre><code>ssh-add -l\n</code></pre></li> <li>SSH to the second login node. No password should be requested: Bash<pre><code>ssh -vvv merlin-l-002\n</code></pre></li> </ol> <p>If the last step succeeds, it means that your SSH key is properly set up.</p>"},{"location":"merlin6/how-to-use-merlin/storage/","title":"Merlin6 Storage","text":""},{"location":"merlin6/how-to-use-merlin/storage/#merlin6-storage","title":"Merlin6 Storage","text":""},{"location":"merlin6/how-to-use-merlin/storage/#introduction","title":"Introduction","text":"<p>This document describes the different directories of the Merlin6 cluster.</p>"},{"location":"merlin6/how-to-use-merlin/storage/#user-and-project-data","title":"User and project data","text":"<ul> <li> <p>Users are responsible for backing up their own data. It is recommended to back up the data on independent third-party systems (i.e. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).</p> <ul> <li><code>/psi/home</code>, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. These can be found in the following directory <code>/psi/home/.snapshot/</code></li> </ul> </li> <li> <p>When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster: every few months, the storage space will be recycled for those former users who no longer have an existing and valid PSI account.</p> </li> </ul> <p>Warning</p> <p>When a user leaves PSI and their account has been removed, their storage space in Merlin may be recycled. Hence, when a user leaves PSI, they, their supervisor or their team must ensure that the data is backed up to an external storage system.</p>"},{"location":"merlin6/how-to-use-merlin/storage/#checking-user-quota","title":"Checking user quota","text":"<p>For each directory, we provide a way of checking quotas (when applicable). In addition, a single command, <code>merlin_quotas</code>, is provided. 
This is useful to show with a single command all quotas for your filesystems (including AFS, which is not mentioned here).</p> <p>To check your quotas, please run:</p> Bash<pre><code>merlin_quotas\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/storage/#merlin6-directories","title":"Merlin6 directories","text":"<p>Merlin6 offers the following directory classes for users:</p> <ul> <li> <p><code>/psi/home/<username></code>: Private user home directory</p> </li> <li> <p><code>/data/user/<username></code>: Private user data directory</p> </li> <li> <p><code>/data/project/general/<projectname></code>: Shared Project directory</p> <ul> <li>For BIO experiments, a dedicated <code>/data/project/bio/$projectname</code> exists.</li> </ul> </li> <li> <p><code>/scratch</code>: Local scratch disk (only visible by the node running a job).</p> </li> <li><code>/shared-scratch</code>: Shared scratch disk (visible from all nodes).</li> <li><code>/export</code>: Export directory for data transfer, visible from <code>ra-merlin-01.psi.ch</code>, <code>ra-merlin-02.psi.ch</code> and Merlin login nodes.<ul> <li>Refer to Transferring Data for more information about the export area and data transfer service.</li> </ul> </li> </ul> <p>Tip</p> <p>In GPFS there is a concept called GraceTime. Filesystems have a block (amount of data) and file (number of files) quota. This quota contains a soft and hard limits. Once the soft limit is reached, users can keep writing up to their hard limit quota during the grace period. Once GraceTime or hard limit are reached, users will be unable to write and will need remove data below the soft limit (or ask for a quota increase when this is possible, see below table).</p> <p>Properties of the directory classes:</p> Directory Block Quota [Soft:Hard] Block Quota [Soft:Hard] GraceTime Quota Change Policy: Block Quota Change Policy: Files Backup Backup Policy /psi/home/$username USR [10GB:11GB] Undef N/A Up to x2 when strongly justified. N/A yes Daily snapshots for 1 week /data/user/$username USR [1TB:1.074TB] USR [1M:1.1M] 7d Inmutable. Need a project. Changeable when justified. no Users responsible for backup /data/project/general/$projectname GRP [1TB:1.074TB] GRP [1M:1.1M] 7d Subject to project requirements. Subject to project requirements. no Project responsible for backup /scratch Undef Undef N/A N/A N/A no N/A /shared-scratch USR [512GB:2TB] USR [2M:2.5M] 7d Up to x2 when strongly justified. Changeable when justified. no N/A /export USR [10MB:20TB] USR [512K:5M] 10d Soft can be temporary increased. Changeable when justified. no N/A <p>Warning</p> <p>The use of scratch and export areas as an extension of the quota is forbidden. scratch and export areas must not contain final data.</p> <p>Auto cleanup policies in the scratch and export areas are applied.</p>"},{"location":"merlin6/how-to-use-merlin/storage/#user-home-directory","title":"User home directory","text":"<p>This is the default directory users will land when login in to any Merlin6 machine. It is intended for your scripts, documents, software development, and other files which you want to have backuped. 
Do not use it for data or HPC I/O-hungry tasks.</p> <p>This directory is mounted in the login and computing nodes under the path:</p> Bash<pre><code>/psi/home/$username\n</code></pre> <p>Home directories are part of the PSI NFS Central Home storage provided by AIT and are managed by the Merlin6 administrators.</p> <p>Users can check their quota by running the following command:</p> Bash<pre><code>quota -s\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/storage/#home-directory-policy","title":"Home directory policy","text":"<ul> <li>Read Important: Code of Conduct for more information about Merlin6 policies.</li> <li> <p>It is forbidden to use the home directories for I/O-intensive tasks</p> <ul> <li>Use <code>/scratch</code>, <code>/shared-scratch</code>, <code>/data/user</code> or <code>/data/project</code> for this purpose.</li> </ul> </li> <li> <p>Users can recover up to 1 week of lost data thanks to the automatic daily snapshots kept for 1 week. Snapshots can be accessed at this path:</p> </li> </ul> Bash<pre><code>/psi/home/.snapshot/$username\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/storage/#user-data-directory","title":"User data directory","text":"<p>The user data directory is intended for fast I/O access and for keeping large amounts of private data. This directory is mounted in the login and computing nodes under the path:</p> Bash<pre><code>/data/user/$username\n</code></pre> <p>Users can check their quota by running the following command:</p> Bash<pre><code>mmlsquota -u <username> --block-size auto merlin-user\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/storage/#user-data-directory-policy","title":"User data directory policy","text":"<ul> <li>Read Important: Code of Conduct for more information about Merlin6 policies.</li> <li> <p>It is forbidden to use the data directories as a <code>scratch</code> area during a job's runtime.</p> <ul> <li>Use <code>/scratch</code> or <code>/shared-scratch</code> for this purpose.</li> </ul> </li> <li> <p>No backup policy is applied for user data directories: users are responsible for backing up their data.</p> </li> </ul>"},{"location":"merlin6/how-to-use-merlin/storage/#project-data-directory","title":"Project data directory","text":"<p>This storage is intended for fast I/O access and for keeping large amounts of a project's data, where the data can also be shared by all members of the project (the project's corresponding unix group). We recommend keeping most data in project-related storage spaces, since this allows users to coordinate. Also, project spaces have more flexible policies regarding extending the available storage space.</p> <p>Experiments can request a project space as described in [Accessing Merlin -> Requesting a Project]</p> <p>Once created, the project data directory will be mounted in the login and computing nodes under the directory:</p> Bash<pre><code>/data/project/general/$projectname\n</code></pre> <p>Project quotas are defined on a per-group basis. Users can check the project quota by running the following command:</p> Bash<pre><code>mmlsquota -j $projectname --block-size auto -C merlin.psi.ch merlin-proj\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/storage/#project-directory-policy","title":"Project Directory policy","text":"<ul> <li>Read Important: Code of Conduct for more information about Merlin6 policies.</li> <li>It is forbidden to use the data directories as a <code>scratch</code> area during a job's runtime, i.e. for high-throughput I/O on a job's temporary files. 
Please use <code>/scratch</code> or <code>/shared-scratch</code> for this purpose.</li> <li>No backups: users are responsible for managing the backups of their data directories.</li> </ul>"},{"location":"merlin6/how-to-use-merlin/storage/#scratch-directories","title":"Scratch directories","text":"<p>There are two different types of scratch storage: local (<code>/scratch</code>) and shared (<code>/shared-scratch</code>).</p> <p>Local scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially true for all jobs running on a single node. Shared scratch is intended for files that need to be accessible by multiple nodes, e.g. by an MPI job where tasks are spread out over the cluster and all tasks need to do I/O on the same temporary files.</p> <p>Local scratch in the Merlin6 computing nodes provides a huge number of IOPS thanks to NVMe technology. Shared scratch is implemented using a distributed parallel filesystem (GPFS), resulting in higher latency, since it involves remote storage resources and more complex I/O coordination.</p> <p><code>/shared-scratch</code> is only mounted in the Merlin6 computing nodes (i.e. not on the login nodes), and its current size is 50TB. This can be increased in the future.</p> <p>The properties of the available scratch storage spaces are given in the following table:</p> Cluster Service Scratch Scratch Mountpoint Shared Scratch Shared Scratch Mountpoint Comments merlin6 login node 100GB / SAS <code>/scratch</code> 50TB / GPFS <code>/shared-scratch</code> <code>merlin-l-0[1,2]</code> merlin6 computing node 1.3TB / NVMe <code>/scratch</code> 50TB / GPFS <code>/shared-scratch</code> <code>merlin-c-[001-024,101-124,201-224]</code> merlin6 login node 2.0TB / NVMe <code>/scratch</code> 50TB / GPFS <code>/shared-scratch</code> <code>merlin-l-00[1,2]</code>"},{"location":"merlin6/how-to-use-merlin/storage/#scratch-directories-policy","title":"Scratch directories policy","text":"<ul> <li>Read Important: Code of Conduct for more information about Merlin6 policies.</li> <li>By default, always use local scratch first and only use shared scratch if your specific use case requires it.</li> <li>Temporary files must be deleted at the end of the job by the user.<ul> <li>Remaining files will be deleted by the system if detected.</li> <li>Files not accessed within 28 days will be automatically cleaned up by the system.</li> <li>If for some reason the scratch areas get full, admins have the right to clean up the oldest data.</li> </ul> </li> </ul>"},{"location":"merlin6/how-to-use-merlin/storage/#export-directory","title":"Export directory","text":"<p>The export directory is exclusively intended for transferring data from outside PSI to Merlin and vice versa. It is a temporary directory with an auto-cleanup policy. 
Please read Transferring Data for more information about it.</p>"},{"location":"merlin6/how-to-use-merlin/storage/#export-directory-policy","title":"Export directory policy","text":"<ul> <li>Temporary files must be deleted at the end of the job by the user.<ul> <li>Remaining files will be deleted by the system if detected.</li> <li>Files not accessed within 28 days will be automatically cleaned up by the system.</li> <li>If for some reason the export area gets full, admins have the rights to cleanup the oldest data</li> </ul> </li> </ul>"},{"location":"merlin6/how-to-use-merlin/transfer-data/","title":"Transferring Data","text":""},{"location":"merlin6/how-to-use-merlin/transfer-data/#transferring-data","title":"Transferring Data","text":""},{"location":"merlin6/how-to-use-merlin/transfer-data/#overview","title":"Overview","text":"<p>Most methods allow data to be either transmitted or received, so it may make sense to initiate the transfer from either merlin or the other system, depending on the network visibility.</p> <ul> <li>Merlin login nodes are visible from the PSI network, so direct data transfer (rsync/WinSCP) is generally preferable. This can be initiated from either endpoint.</li> <li>Merlin login nodes can access the internet using a limited set of protocols:<ul> <li>SSH-based protocols using port 22 (rsync-over-ssh, sftp, WinSCP, etc)</li> <li>HTTP-based protocols using ports 80 or 445 (https, WebDav, etc)</li> <li>Protocols using other ports require admin configuration and may only work with specific hosts (ftp, rsync daemons, etc)</li> </ul> </li> <li>Systems on the internet can access the PSI Data Transfer service <code>datatransfer.psi.ch</code>, using ssh-based protocols and Globus</li> </ul>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#direct-transfer-via-merlin6-login-nodes","title":"Direct transfer via Merlin6 login nodes","text":"<p>The following methods transfer data directly via the login nodes. They are suitable for use from within the PSI network.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#rsync","title":"Rsync","text":"<p>Rsync is the preferred method to transfer data from Linux/MacOS. It allows transfers to be easily resumed if they get interrupted. The general syntax is:</p> Bash<pre><code>rsync -avAHXS <src> <dst>\n</code></pre> <p>For example, to transfer files from your local computer to a merlin project directory:</p> Bash<pre><code>rsync -avAHXS ~/localdata user@merlin-l-01.psi.ch:/data/project/general/myproject/\n</code></pre> <p>You can resume interrupted transfers by simply rerunning the command. Previously transferred files will be skipped.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#winscp","title":"WinSCP","text":"<p>The WinSCP tool can be used for remote file transfer on Windows. It is available from the Software Kiosk on PSI machines. Add <code>merlin-l-01.psi.ch</code> as a host and connect with your PSI credentials. You can then drag-and-drop files between your local computer and merlin.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#switchfilesender","title":"SWITCHfilesender","text":"<p>SWITCHfilesender is an installation of the FileSender project (filesender.org) which is a web based application that allows authenticated users to securely and easily send arbitrarily large files to other users.</p> <p>Authentication of users is provided through SimpleSAMLphp, supporting SAML2, LDAP and RADIUS and more. Users without an account can be sent an upload voucher by an authenticated user. 
FileSender is developed to the requirements of the higher education and research community.</p> <p>The purpose of the software is to send a large file to someone, have that file available for download for a certain number of downloads and/or a certain amount of time, and after that automatically delete the file. The software is not intended as a permanent file publishing platform.</p> <p>SWITCHfilesender is fully integrated with PSI, therefore, PSI employees can log in by using their PSI account (through Authentication and Authorization Infrastructure / AAI, by selecting PSI as the institution to be used for log in).</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#psi-data-transfer","title":"PSI Data Transfer","text":"<p>From August 2024, Merlin is connected to the PSI Data Transfer service, <code>datatransfer.psi.ch</code>. This is a central service managed by the Linux team. However, any problems or questions related to it can be directly reported to the Merlin administrators, which will forward the request if necessary.</p> <p>The PSI Data Transfer servers supports the following protocols:</p> <ul> <li>Data Transfer - SSH (scp / rsync)</li> <li>Data Transfer - Globus</li> </ul> <p>Notice that <code>datatransfer.psi.ch</code> does not allow SSH login, only <code>rsync</code>, <code>scp</code> and Globus access is allowed.</p> <p>The following filesystems are mounted:</p> <ul> <li><code>/merlin/export</code> which points to the <code>/export</code> directory in Merlin.</li> <li><code>/merlin/data/experiment/mu3e</code> which points to the <code>/data/experiment/mu3e</code> directories in Merlin.<ul> <li>Mu3e sub-directories are mounted in RW (read-write), except for <code>data</code> (read-only mounted)</li> </ul> </li> <li><code>/merlin/data/project/general</code> which points to the <code>/data/project/general</code> directories in Merlin.<ul> <li>Owners of Merlin projects should request explicit access to it.</li> <li>Currently, only <code>CSCS</code> is available for transferring files between PizDaint/Alps and Merlin</li> </ul> </li> <li><code>/merlin/data/project/bio</code> which points to the <code>/data/project/bio</code> directories in Merlin.</li> <li><code>/merlin/data/user</code> which points to the <code>/data/user</code> directories in Merlin.</li> </ul> <p>Access to the PSI Data Transfer uses Multi factor authentication (MFA). Therefore, having the Microsoft Authenticator App is required as explained here.</p> <p>Official Documentation</p> <p>Please follow the Official PSI Data Transfer documentation for further instructions.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#directories","title":"Directories","text":""},{"location":"merlin6/how-to-use-merlin/transfer-data/#merlindatauser","title":"/merlin/data/user","text":"<p>User data directories are mounted in RW.</p> <p>Secure Permissions</p> <p>Please, ensure proper secured permissions in your <code>/data/user</code> directory. By default, when directory is created, the system applies the most restrictive permissions. However, this does not prevent users for changing permissions if they wish. At this point, users become responsible of those changes.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#merlinexport","title":"/merlin/export","text":"<p>Transferring big amounts of data from outside PSI to Merlin is always possible through <code>/export</code>.</p> <p>Export Directory Access</p> <p>The <code>/export</code> directory can be used by any Merlin user. 
This is configured in Read/Write mode. If you need access, please, contact the Merlin administrators.</p> <p>Export Usage Policy</p> <p>The use export as an extension of the quota is forbidden. Auto cleanup policies in the export area apply for files older than 28 days.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#exporting-data-from-merlin","title":"Exporting data from Merlin","text":"<p>For exporting data from Merlin to outside PSI by using <code>/export</code>, one has to:</p> <ul> <li>From a Merlin login node, copy your data from any directory (i.e. <code>/data/project</code>, <code>/data/user</code>, <code>/scratch</code>) to <code>/export</code>. Ensure to properly secure your directories and files with proper permissions.</li> <li>Once data is copied, from <code>datatransfer.psi.ch</code>, copy the data from <code>/merlin/export</code> to outside PSI</li> </ul>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#importing-data-to-merlin","title":"Importing data to Merlin","text":"<p>For importing data from outside PSI to Merlin by using <code>/export</code>, one has to:</p> <ul> <li>From <code>datatransfer.psi.ch</code>, copy the data from outside PSI to <code>/merlin/export</code>.</li> </ul> <p>Ensure to properly secure your directories and files with proper permissions.</p> <ul> <li>Once data is copied, from a Merlin login node, copy your data from <code>/export</code> to any directory (i.e. <code>/data/project</code>, <code>/data/user</code>, <code>/scratch</code>).</li> </ul>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#request-access-to-your-project-directory","title":"Request access to your project directory","text":"<p>Optionally, instead of using <code>/export</code>, Merlin project owners can request Read/Write or Read/Only access to their project directory.</p> <p>Project Access</p> <p>Merlin projects can request direct access. This can be configured in Read/Write or Read/Only modes. If your project needs access, please, contact the Merlin administrators.</p>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#connecting-to-merlin6-from-outside-psi","title":"Connecting to Merlin6 from outside PSI","text":"<p>Merlin6 is fully accessible from within the PSI network. To connect from outside you can use:</p> <ul> <li>VPN (alternate instructions)</li> <li>SSH hopx<ul> <li>Please avoid transferring big amount data through hopx</li> </ul> </li> <li>No Machine<ul> <li>Remote Interactive Access through 'rem-acc.psi.ch'</li> <li>Please avoid transferring big amount of data through NoMachine</li> </ul> </li> </ul>"},{"location":"merlin6/how-to-use-merlin/transfer-data/#connecting-from-merlin6-to-outside-file-shares","title":"Connecting from Merlin6 to outside file shares","text":""},{"location":"merlin6/how-to-use-merlin/transfer-data/#merlin_rmount-command","title":"<code>merlin_rmount</code> command","text":"<p>Merlin provides a command for mounting remote file systems, called <code>merlin_rmount</code>. 
This provides a helpful wrapper over the Gnome storage utilities and supports a wide range of remote file systems, including:</p> <ul> <li>SMB/CIFS (Windows shared folders)</li> <li>WebDAV</li> <li>AFP</li> <li>FTP, SFTP</li> <li>others</li> </ul> <p>More instructions on using <code>merlin_rmount</code></p>"},{"location":"merlin6/how-to-use-merlin/using-modules/","title":"Using PModules","text":""},{"location":"merlin6/how-to-use-merlin/using-modules/#using-pmodules","title":"Using PModules","text":""},{"location":"merlin6/how-to-use-merlin/using-modules/#environment-modules","title":"Environment Modules","text":"<p>On top of the operating system stack we provide additional software using the PSI-developed PModules system.</p> <p>PModules is the officially supported way of providing software, and each package is deployed by a specific expert. PModules typically contains software that is used by many people.</p> <p>If you miss a package or version, or need software with a specific missing feature, contact us. We will evaluate whether it is feasible to install it.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-release-stages","title":"Module release stages","text":"<p>Three different release stages are available in Pmodules, ensuring proper software life cycling. These are the following: <code>unstable</code>, <code>stable</code> and <code>deprecated</code>.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#unstable-release-stage","title":"Unstable release stage","text":"<p>The <code>unstable</code> release stage contains unstable releases of software. Software compilations here are usually under development or not fully production ready.</p> <p>This release stage is not directly visible to the end users, and needs to be explicitly invoked as follows:</p> Bash<pre><code>module use unstable\n</code></pre> <p>Once software is validated and considered production ready, it is moved to the <code>stable</code> release stage.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#stable-release-stage","title":"Stable release stage","text":"<p>The <code>stable</code> release stage contains stable releases of software, which have been thoroughly tested and are fully supported.</p> <p>This is the default release stage, and it is visible by default. Whenever possible, users are strongly advised to use packages from this release stage.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#deprecated-release-stage","title":"Deprecated release stage","text":"<p>The <code>deprecated</code> release stage contains deprecated releases of software. Software in this release stage is usually deprecated or discontinued by its developers. Also, minor versions or redundant compilations are moved here as long as there is a valid copy in the stable repository.</p> <p>This release stage is not directly visible to the users, and needs to be explicitly invoked as follows:</p> Bash<pre><code>module use deprecated\n</code></pre> <p>However, software moved to this release stage can still be loaded directly without the need of invoking the stage. This ensures proper life cycling of the software while keeping it transparent for the end users.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-overlays","title":"Module overlays","text":"<p>Recent Pmodules releases contain a feature called Pmodules overlays. In Merlin, overlays are used to source software from a different location. 
In that way, we can have custom private versions of software in the cluster installed on high performance storage accessed over a low latency network.</p> <p>Pmodules overlays are still under development, therefore consider that some features may not work or do not work as expected.</p> <p>Pmodule overlays can be used from Pmodules <code>v1.1.5</code>. However, Merlin is running Pmodules <code>v1.0.0rc10</code> as the default version. Therefore, one needs to load first a newer version of it: this is available in the repositories and can be loaded with <code>module load Pmodules/$version</code> command.</p> <p>Once running the proper Pmodules version, overlays are added (or invoked) with the <code>module use $overlay_name</code> command.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#overlay_merlin","title":"overlay_merlin","text":"<p>Some Merlin software is already provided through PModule overlays and has been validated for using and running it in that way. Therefore, Melin contains an overlay called <code>overlay_merlin</code>. In this overlay, the software is installed in the Merlin high performance storage, specifically in the <code>/data/software/pmodules</code> directory. In general, if another copy exists in the standard repository, we strongly recommend to use the replica in the <code>overlay_merlin</code> overlay instead, as it provides faster access and it may also provide some customizations for the Merlin6 cluster.</p> <p>For loading the <code>overlay_merlin</code>, please run: </p>Bash<pre><code>module load Pmodules/1.1.6 # Or newer version\nmodule use overlay_merlin\n</code></pre><p></p> <p>Then, once <code>overlay_merlin</code> is invoked, it will disable central software installations with the same version (if exist), and will be replaced by the local ones in Merlin. Releases from the central Pmodules repository which do not have a copy in the Merlin overlay will remain visible. In example, for each ANSYS release, one can identify where it is installed by searching ANSYS in PModules with the <code>--verbose</code> option. 
This will show the location of the different ANSYS releases as follows:</p> <ul> <li>For ANSYS releases installed in the central repositories, the path starts with <code>/opt/psi</code></li> <li>For ANSYS releases installed in the Merlin6 repository (and/or overriding the central ones), the path starts with <code>/data/software/pmodules</code></li> </ul> Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# module load Pmodules/1.1.6\nmodule load: unstable module has been loaded -- Pmodules/1.1.6\n\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# module use overlay_merlin\n\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# module search ANSYS --verbose\n\nModule Rel.stage Group Dependencies/Modulefile\n-------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nANSYS/2019R3 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2019R3\nANSYS/2020R1 stable Tools dependencies:\n modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1\nANSYS/2020R1-1 stable Tools dependencies:\n modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1-1\nANSYS/2020R2 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2020R2\nANSYS/2021R1 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R1\nANSYS/2021R2 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R2\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/using-modules/#pmodules-commands","title":"PModules commands","text":"<p>Below is a summary of the available commands:</p> Bash<pre><code>module use # show all available PModule Software Groups as well as Release Stages\nmodule avail # to see the list of available software packages provided via pmodules\nmodule use unstable # to get access to a set of packages not fully tested by the community\nmodule load <package>/<version> # to load specific software package with a specific version\nmodule search <string> # to search for a specific software package and its dependencies.\nmodule list # to list which software is loaded in your environment\nmodule purge # unload all loaded packages and cleanup the environment\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-useunuse","title":"module use/unuse","text":"<p>Without any parameter, <code>use</code> lists all available PModule Software Groups and Release Stages.</p> Bash<pre><code>module use\n</code></pre> <p>When followed by a parameter, <code>use</code>/<code>unuse</code> invokes/uninvokes a PModule Software Group or Release Stage.</p> Bash<pre><code>module use EM # Invokes the 'EM' software group\nmodule unuse EM # Uninvokes the 'EM' software group\nmodule use unstable # Invokes the 'unstable' Release stage\nmodule unuse unstable # Uninvokes the 'unstable' Release stage\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-avail","title":"module avail","text":"<p>This option lists all available PModule Software Groups and their packages.</p> <p>Please run <code>module avail --help</code> for further listing options.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-search","title":"module search","text":"<p>This is used to search for software packages. 
By default, if no Release Stage or Software Group is specified in the options of the <code>module search</code> command, it will search within the already invoked Software Groups and Release Stages. Direct package dependencies will also be shown.</p> Bash<pre><code>(base) [caubet_m@merlin-l-001 caubet_m]$ module search openmpi/4.0.5_slurm\n\nModule Release Group Requires\n---------------------------------------------------------------------------\nopenmpi/4.0.5_slurm stable Compiler gcc/8.4.0\nopenmpi/4.0.5_slurm stable Compiler gcc/9.2.0\nopenmpi/4.0.5_slurm stable Compiler gcc/9.3.0\nopenmpi/4.0.5_slurm stable Compiler intel/20.4\n\n(base) [caubet_m@merlin-l-001 caubet_m]$ module load intel/20.4 openmpi/4.0.5_slurm\n</code></pre> <p>Please run <code>module search --help</code> for further search options.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-loadunload","title":"module load/unload","text":"<p>This loads/unloads specific software packages. Packages might have direct dependencies that need to be loaded first. Other dependencies will be loaded automatically.</p> <p>In the example below, the <code>openmpi/4.0.5_slurm</code> package will be loaded; however, <code>gcc/9.3.0</code> must be loaded as well, as it is a strict dependency. Direct dependencies must be loaded in advance. Users can load multiple packages one by one or all at once, which can be useful when loading a package together with its direct dependencies.</p> Bash<pre><code># Single line\nmodule load gcc/9.3.0 openmpi/4.0.5_slurm\n\n# Multiple lines\nmodule load gcc/9.3.0\nmodule load openmpi/4.0.5_slurm\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/using-modules/#module-purge","title":"module purge","text":"<p>This command is an alternative to <code>module unload</code>, which can be used to unload all loaded module files.</p> Bash<pre><code>module purge\n</code></pre>"},{"location":"merlin6/how-to-use-merlin/using-modules/#when-to-request-for-new-pmodules-packages","title":"When to request for new PModules packages","text":""},{"location":"merlin6/how-to-use-merlin/using-modules/#missing-software","title":"Missing software","text":"<p>If you do not find a specific piece of software and you know that other people are interested in it, it can be installed in PModules. Please contact us and we will try to help with that. Deploying new software in PModules may take a few days.</p> <p>Usually, installation of new software is possible as long as at least a few users will use it. If you are interested in maintaining this software, please let us know.</p>"},{"location":"merlin6/how-to-use-merlin/using-modules/#missing-version","title":"Missing version","text":"<p>If the existing PModules versions of a specific package do not fit your needs, it is possible to ask for a new version.</p> <p>Usually, installation of newer versions will be supported, as long as a few users will use them. Installation of intermediate versions can be supported if strictly justified.</p>"},{"location":"merlin6/jupyterhub/jupyter-examples/","title":"Jupyter examples on merlin6","text":""},{"location":"merlin6/jupyterhub/jupyter-examples/#jupyter-examples-on-merlin6","title":"Jupyter examples on merlin6","text":"<p>These examples demonstrate the use of certain python libraries and modules in the merlin6 environment. They are provided to get you started quickly. 
You can check out a repository with the examples from</p> <p>https://git.psi.ch/lsm-hpce/merlin6-jupyterhub-examples</p> <p>A number of standard data sets for the tutorials of the libraries are hosted centrally on merlin6 under <code>/data/project/general/public</code>, so you do not need to store them in your user space.</p>"},{"location":"merlin6/jupyterhub/jupyter-examples/#dask","title":"Dask","text":"<p>Dask is a flexible library for parallel computing in Python. It provides the abstraction of a dask dataframe that can reside on multiple machines and can be manipulated by an API designed to be as close as possible to pandas. The example shows how to start up dask workers on merlin6 through slurm.</p> <ul> <li>Link to example</li> <li>The data sets for the dask tutorial are hosted at <code>/data/project/general/public/dask-tutorial</code>.</li> </ul>"},{"location":"merlin6/jupyterhub/jupyter-examples/#plotly","title":"Plotly","text":"<p>Plotly is an interactive open-source plotting library.</p> <ul> <li>Link to example</li> </ul>"},{"location":"merlin6/jupyterhub/jupyter-extensions/","title":"Jupyter Extensions","text":""},{"location":"merlin6/jupyterhub/jupyter-extensions/#jupyter-extensions","title":"Jupyter Extensions","text":""},{"location":"merlin6/jupyterhub/jupyter-extensions/#using-nbextensions-for-adding-features-to-your-notebook","title":"Using nbextensions for adding features to your notebook","text":"<p>A number of useful contributed but unofficial extensions add extra features to your notebooks.</p> <p>From the classic Notebook UI you can access the available extensions in a separate tab, as displayed in the screenshot below. You may have to untick the option that disables configuration for nbextensions without explicit compatibility. 
The extensions we tested still worked fine with JupyterHub version 1.0.0.</p> <p></p>"},{"location":"merlin6/jupyterhub/jupyter-extensions/#extensions-for-working-with-large-notebooks","title":"Extensions for working with large notebooks","text":"<p>The following extensions in particular make working with larger notebooks easier:</p> <ul> <li>Table of Contents: Displays a TOC on the left, and you can also configure it to add and update a TOC at the head of the document.</li> <li>Collapsible Headings: Allows you to fold all the cells below a heading.</li> </ul> <p>It may also be interesting for you to explore the Jupytext server extension.</p>"},{"location":"merlin6/jupyterhub/jupyter-extensions/#variable-inspector","title":"Variable Inspector","text":"<p>The <code>variable inspector</code> extension provides a constantly updated window in which you can see the value and type of your notebook's variables.</p>"},{"location":"merlin6/jupyterhub/jupyterhub-trouble/","title":"Jupyterhub Troubleshooting","text":""},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#jupyterhub-troubleshooting","title":"Jupyterhub Troubleshooting","text":"<p>In case of problems or requests, please either submit a PSI Service Now incident containing \"Merlin Jupyterhub\" as part of the subject, or contact us by mail through merlin-admins@lists.psi.ch.</p>"},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#general-steps-for-troubleshooting","title":"General steps for troubleshooting","text":""},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#investigate-the-slurm-output-file","title":"Investigate the Slurm output file","text":"<p>Your jupyterhub session runs as a normal batch job on the cluster, and each launch will create a slurm output file in your home directory named like <code>jupyterhub_batchspawner_{$JOBID}.log</code>, where the <code>$JOBID</code> part is the slurm job ID of your job. After a failed launch, investigate the contents of that file. An error message will usually be found towards the end of the file, often including a python backtrace.</p>"},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#investigate-python-environment-interferences","title":"Investigate python environment interferences","text":"<p>Jupyterhub just runs a jupyter notebook executable as your user inside the batch job. A frequent source of errors is a user's local python environment definitions getting mixed up with the environment that jupyter needs in order to launch. Typical causes are:</p> <ul> <li>setting PYTHONPATH inside of the ~/.bash_profile or any other startup script</li> <li>having installed packages to your local user area (e.g. using <code>pip install --user <some-package></code>). Such installations can interfere with the environment offered by the <code>module</code> system on our cluster (based on anaconda). You can list such packages by executing <code>pip list --user</code>. 
<p>You can investigate the launching of a notebook interactively by logging in to Merlin6 and running a jupyter command in the correct environment.</p> Bash<pre><code>module use unstable\nmodule load anaconda/2019.07\nconda activate jupyterhub-1.0.0_py36\njupyter --paths\n</code></pre>"},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#known-problems-and-workarounds","title":"Known Problems and workarounds","text":""},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#spawner-times-out","title":"Spawner times out","text":"<p>If the cluster is very full, it may be difficult to launch a session. We always reserve some slots for interactive Jupyterhub use, but it may be that these slots have been taken or that the resources you requested are currently not available.</p> <p>Inside of a Merlin6 terminal shell, you can run the standard commands like <code>sinfo</code> and <code>squeue</code> to get an overview of how full the cluster is.</p>"},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#your-user-environment-is-not-among-the-kernels-offered-for-choice","title":"Your user environment is not among the kernels offered for choice","text":"<p>Refer to our documentation about using your own custom-made environments with jupyterhub.</p>"},{"location":"merlin6/jupyterhub/jupyterhub-trouble/#cannot-save-notebook-xsrf-argument-missing","title":"Cannot save notebook - xsrf argument missing","text":"<p>You cannot save your notebook anymore and you get this error:</p> Text Only<pre><code>'_xsrf' argument missing from POST\n</code></pre> <p>This issue occurs very rarely. The following workaround exists:</p> <p>Go to the jupyterhub file browsing window and just open another notebook using the same kernel in another browser window. The issue should then go away. For more information, refer to this GitHub thread.</p>"},{"location":"merlin6/jupyterhub/jupyterhub/","title":"Jupyterhub on Merlin","text":""},{"location":"merlin6/jupyterhub/jupyterhub/#jupyterhub-on-merlin","title":"Jupyterhub on Merlin","text":"<p>Jupyterhub provides jupyter notebooks that are launched on cluster nodes of merlin and can be accessed through a web portal.</p>"},{"location":"merlin6/jupyterhub/jupyterhub/#accessing-jupyterhub-and-launching-a-session","title":"Accessing Jupyterhub and launching a session","text":"<p>The service is available inside of PSI (or through a VPN connection) at</p> <p>https://merlin-jupyter.psi.ch:8000</p> <ol> <li>Login: You will be presented with a Login web page for authenticating with your PSI account.</li> <li>Spawn job: The Spawner Options page allows you to specify the properties (Slurm partition, running time,...) of the batch jobs that will be running your jupyter notebook. Once you click on the <code>Spawn</code> button, your job will be sent to the Slurm batch system. If the cluster is not currently overloaded and the resources you requested are available, your job will usually start within 30 seconds.</li> </ol>"},{"location":"merlin6/jupyterhub/jupyterhub/#jupyter-software-environments-running-different-kernels","title":"Jupyter software environments - running different kernels","text":"<p>Your notebooks can run within different software environments which are offered by a number of available Jupyter kernels.</p> <p>E.g. 
in this test installation we provide two environments targeted at data science</p> <ul> <li>tensorflow-1.13.1_py37: contains Tensorflow, Keras, scikit-learn, Pandas, numpy, dask, and dependencies. Stable</li> <li>talos_py36: also contains the Talos package. This environment is experimental and subject to updates and changes.</li> </ul> <p>When you create a new notebook you will be asked to specify which kernel you want to use. It is also possible to switch the kernel of a running notebook, but you will lose the state of the current kernel, so you will have to recalculate the notebook cells with this new kernel.</p> <p>These environments are also available for standard work in a shell session. You can activate an environment in a normal merlin terminal session by using the <code>module</code> (q.v. using Pmodules) command to load anaconda python, and from there using the <code>conda</code> command to switch to the desired environment:</p> Bash<pre><code>module use unstable\nmodule load anaconda/2019.07\nconda activate tensorflow-1.13.1_py36\n</code></pre> <p>When the <code>anaconda</code> module has been loaded, you can list the available environments by executing:</p> Bash<pre><code>conda info -e\n</code></pre> <p>You can get more info on the use of the <code>conda</code> package management tool at its official documentation site.</p>"},{"location":"merlin6/jupyterhub/jupyterhub/#using-your-own-custom-made-environments-with-jupyterhub","title":"Using your own custom made environments with jupyterhub","text":"<p>Python environments can take up a lot of space due to the many dependencies that will be installed. You should always install your extra environments to the data area belonging to your account, e.g. <code>/data/user/${YOUR-USERNAME}/conda-envs</code></p> <p>In order for jupyterhub (and jupyter in general) to recognize the provided environment as a valid kernel, make sure that you include the <code>nb_conda_kernels</code> package in your environment. This package provides the necessary activation and the dependencies.</p> <p>Example:</p> Bash<pre><code>conda create -c conda-forge -p /data/user/${USER}/conda-envs/my-test-env python=3.7 nb_conda_kernels\n</code></pre> <p>After this, your new kernel will be visible as <code>my-test-env</code> inside of your jupyterhub session.</p>"},{"location":"merlin6/jupyterhub/jupyterhub/#requesting-additional-resources","title":"Requesting additional resources","text":"<p>The Spawner Options page covers the most common options. These are used to create a submission script for the jupyterhub job and submit it to the slurm queue. Additional customization can be implemented using the 'Optional user defined line to be added to the batch launcher script' option. This line is added to the submission script at the end of other <code>#SBATCH</code> lines. Parameters can be passed to SLURM by starting the line with <code>#SBATCH</code>, like in Running Slurm Scripts. 
Some ideas:</p>"},{"location":"merlin6/jupyterhub/jupyterhub/#request-additional-memory","title":"Request additional memory","text":"Bash<pre><code>#SBATCH --mem=100G\n</code></pre>"},{"location":"merlin6/jupyterhub/jupyterhub/#request-multiple-gpus-gpu-partition-only","title":"Request multiple GPUs (gpu partition only)","text":"Bash<pre><code>#SBATCH --gpus=2\n</code></pre>"},{"location":"merlin6/jupyterhub/jupyterhub/#log-additional-information","title":"Log additional information","text":"Bash<pre><code>hostname; date; echo $USER\n</code></pre> <p>Output is found in <code>~/jupyterhub_batchspawner_<jobid>.log</code>.</p>"},{"location":"merlin6/jupyterhub/jupyterhub/#contact","title":"Contact","text":"<p>In case of problems or requests, please either submit a PSI Service Now incident containing \"Merlin Jupyterhub\" as part of the subject, or contact us by mail through merlin-admins@lists.psi.ch.</p>"},{"location":"merlin6/jupyterhub/jupyterlab/","title":"Jupyterlab User interface","text":""},{"location":"merlin6/jupyterhub/jupyterlab/#jupyterlab-user-interface","title":"Jupyterlab User interface","text":""},{"location":"merlin6/jupyterhub/jupyterlab/#testing-out-jupyterlab","title":"Testing out Jupyterlab","text":"<p>Jupyterlab is a new interface to interact with your Jupyter notebooks. However, it is in very active development and undergoing constant changes. You can read about its features on the official website.</p> <p>You can test it out on our server by using the following kind of URL, where <code>$YOUR-USER</code> must be replaced by your PSI username. You must already have an active session on the jupyterhub.</p> Text Only<pre><code>https://merlin-jupyter.psi.ch:8000/user/$YOUR-USER/lab\n</code></pre>"},{"location":"merlin6/jupyterhub/jupyterlab/#switching-to-the-classic-notebook-user-interface","title":"Switching to the Classic Notebook user interface","text":"<p>You can switch to the classic notebook UI by using the \"Launch Classic Notebook\" command from the left sidebar of JupyterLab.</p> <p></p>"},{"location":"merlin6/jupyterhub/jupyterlab/#jupyterlab-does-not-support-the-older-nbextensions","title":"Jupyterlab does not support the older nbextensions","text":"<p>These regrettably are not yet supported from within the JupyterLab UI, but you can activate them through the Classic Notebook interface (see above).</p>"},{"location":"merlin6/jupyterhub/jupytext/","title":"Jupytext - efficient editing","text":""},{"location":"merlin6/jupyterhub/jupytext/#jupytext-efficient-editing","title":"Jupytext - efficient editing","text":"<p>Jupytext is a Jupyter server extension that creates a text file from a notebook and keeps the two in sync, with the aim of letting you use more efficient editors or IDEs on it. The file can be created in a number of formats, e.g. markdown, .py (light script), and others. <code>Jupytext</code> will keep both the notebook and this paired file in sync: if you save the paired file, changes will be carried over into the notebook, and vice versa. This pairing also persists in new sessions of your notebook until you explicitly remove it again.</p> <p>The paired file contains only the cell contents and not the output. 
Therefore it is also much better suited for revision control, since the differences between versions are limited to the cells, and these file formats yield more meaningful text differences than the default notebook storage format.</p>"},{"location":"merlin6/jupyterhub/jupytext/#creating-a-paired-file-in-python-format-for-efficient-refactoring","title":"Creating a paired file in python format for efficient refactoring","text":"<p>From your notebook, go to the <code>file</code> menu and navigate to the <code>jupytext</code> submenu. Select the light script pairing option. This will create a <code>*.py</code> file version with the same basename as your notebook file.</p> <p></p> <p>You can edit that file separately in your favourite python editor. The markdown text parts will be preserved in the file in the form of python comments.</p> <p>When you save the file and do a browser page reload of your jupyter notebook, you will see all the changes carried over into your jupyter notebook.</p>"},{"location":"merlin6/jupyterhub/jupytext/#creating-a-paired-file-in-mardown-format-for-efficient-text-authoring","title":"Creating a paired file in markdown format for efficient text authoring","text":"<p>If you want to efficiently work on the descriptive text base of your notebook, just pair it using the <code>Pair notebook with Markdown</code> menu item and edit the generated <code>*.md</code> file with your favourite Markdown editor.</p>"},{"location":"merlin6/jupyterhub/jupytext/#disable-autosaving-when-working-on-the-paired-file","title":"Disable autosaving when working on the paired file","text":"<p>Your notebooks usually auto-save every 2 minutes (the default). Turn this feature off when working with the paired file. Otherwise Jupyter will continue to save the state while you are editing the paired file, and the changes will be synced to the disk version of the paired file. You can disable the autosave by unchecking the <code>Autosave notebook</code> menu item in the Jupytext menu (see above image).</p>"},{"location":"merlin6/jupyterhub/jupytext/#further-information","title":"Further information","text":"<p>Please refer to:</p> <ul> <li>the Jupytext FAQ</li> <li>the Jupytext documentation</li> </ul>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/","title":"Accessing Interactive Nodes","text":""},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#accessing-interactive-nodes","title":"Accessing Interactive Nodes","text":""},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#ssh-access","title":"SSH Access","text":"<p>For interactive command shell access, use an SSH client. We recommend activating SSH's X11 forwarding to allow you to use graphical applications (e.g. a text editor; for more performant graphical access, refer to the sections below). 
X applications are supported on the login nodes, and X11 forwarding can be used by users who have properly configured X11 support on their desktops. However:</p> <ul> <li>Merlin6 administrators do not offer support for user desktop configuration (Windows, MacOS, Linux).<ul> <li>Hence, Merlin6 administrators do not offer official support for X11 client setup.</li> <li>Nevertheless, a generic guide for X11 client setup (Linux, Windows and MacOS) is provided below.</li> </ul> </li> <li>PSI desktop configuration issues must be addressed through PSI Service Now as an Incident Request.<ul> <li>The ticket will be redirected to the corresponding Desktop support group (Windows, Linux).</li> </ul> </li> </ul>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#accessing-from-a-linux-client","title":"Accessing from a Linux client","text":"<p>Refer to {How To Use Merlin -> Accessing from Linux Clients} for Linux SSH client and X11 configuration.</p>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#accessing-from-a-windows-client","title":"Accessing from a Windows client","text":"<p>Refer to {How To Use Merlin -> Accessing from Windows Clients} for Windows SSH client and X11 configuration.</p>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#accessing-from-a-macos-client","title":"Accessing from a MacOS client","text":"<p>Refer to {How To Use Merlin -> Accessing from MacOS Clients} for MacOS SSH client and X11 configuration.</p>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#nomachine-remote-desktop-access","title":"NoMachine Remote Desktop Access","text":"<p>X applications are supported in the login nodes and can run efficiently through a NoMachine client. This is the officially supported way to run more demanding X applications on Merlin6.</p> <ul> <li>For PSI Windows workstations, this can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing, please request support through PSI Service Now as an Incident Request.</li> <li>For other workstations, the client software can be downloaded from the NoMachine website.</li> </ul>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#configuring-nomachine","title":"Configuring NoMachine","text":"<p>Refer to {How To Use Merlin -> Remote Desktop Access} for further instructions on how to configure the NoMachine client and how to access it from PSI and from outside PSI.</p>"},{"location":"merlin6/quick-start-guide/accessing-interactive-nodes/#login-nodes-hardware-description","title":"Login nodes hardware description","text":"<p>The Merlin6 login nodes are the official machines for accessing the resources of Merlin6. 
From these machines, users can submit jobs to the Slurm batch system as well as visualize or compile their software.</p> <p>The Merlin6 login nodes are the following:</p> <ul> <li><code>merlin-l-001.psi.ch</code>: SSH and NoMachine access, 2 x 22 cores (2 threads per core), Intel Xeon Gold 6152, 384GB memory, 1.8TB NVMe scratch mounted on <code>/scratch</code></li> <li><code>merlin-l-002.psi.ch</code>: SSH and NoMachine access, 2 x 22 cores (2 threads per core), Intel Xeon Gold 6142, 384GB memory, 1.8TB NVMe scratch mounted on <code>/scratch</code></li> </ul>"},{"location":"merlin6/quick-start-guide/accessing-slurm/","title":"Accessing Slurm Cluster","text":""},{"location":"merlin6/quick-start-guide/accessing-slurm/#accessing-slurm-cluster","title":"Accessing Slurm Cluster","text":""},{"location":"merlin6/quick-start-guide/accessing-slurm/#the-merlin-slurm-clusters","title":"The Merlin Slurm clusters","text":"<p>Merlin contains a multi-cluster setup, where multiple Slurm clusters coexist under the same umbrella. It contains the following clusters:</p> <ul> <li>The Merlin6 Slurm CPU cluster, which is called <code>merlin6</code>.</li> <li>The Merlin6 Slurm GPU cluster, which is called <code>gmerlin6</code>.</li> </ul>"},{"location":"merlin6/quick-start-guide/accessing-slurm/#accessing-the-slurm-clusters","title":"Accessing the Slurm clusters","text":"<p>Any job submission must be performed from a Merlin login node. Please refer to the Accessing the Interactive Nodes documentation for further information about how to access the cluster.</p> <p>In addition, any job must be submitted from a high performance storage area visible to the login nodes and the computing nodes. The possible storage areas are the following:</p> <ul> <li><code>/data/user</code></li> <li><code>/data/project</code></li> <li><code>/shared-scratch</code></li> </ul> <p>Please avoid using <code>/psi/home</code> directories for submitting jobs.</p>"},{"location":"merlin6/quick-start-guide/accessing-slurm/#merlin6-cpu-cluster-access","title":"Merlin6 CPU cluster access","text":"<p>The Merlin6 CPU cluster (<code>merlin6</code>) is the default cluster configured in the login nodes. Any job submission will use this cluster by default, unless the option <code>--cluster</code> is set to another of the existing clusters.</p> <p>For further information about how to use this cluster, please visit: Merlin6 CPU Slurm Cluster documentation.</p>"},{"location":"merlin6/quick-start-guide/accessing-slurm/#merlin6-gpu-cluster-access","title":"Merlin6 GPU cluster access","text":"<p>The Merlin6 GPU cluster (<code>gmerlin6</code>) is visible from the login nodes. However, to submit jobs to this cluster, one needs to specify the option <code>--cluster=gmerlin6</code> when submitting a job or allocation.</p> <p>For further information about how to use this cluster, please visit: Merlin6 GPU Slurm Cluster documentation.</p>
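<p>As a minimal illustration of selecting a cluster at submission time (a sketch only; <code>myjob.sh</code> is a placeholder for your own batch script):</p> Bash<pre><code># Submit to the default CPU cluster (merlin6)\nsbatch myjob.sh\n\n# Submit the same script to the GPU cluster instead\nsbatch --clusters=gmerlin6 myjob.sh\n</code></pre>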
"},{"location":"merlin6/quick-start-guide/code-of-conduct/","title":"Code Of Conduct","text":""},{"location":"merlin6/quick-start-guide/code-of-conduct/#code-of-conduct","title":"Code Of Conduct","text":""},{"location":"merlin6/quick-start-guide/code-of-conduct/#the-basic-principle","title":"The Basic principle","text":"<p>The basic principle is courtesy and consideration for other users.</p> <ul> <li>Merlin6 is a system shared by many users, therefore you are kindly requested to apply common courtesy in using its resources. Please follow our guidelines, which aim at providing and maintaining an efficient compute environment for all our users.</li> <li>Basic shell programming skills are an essential requirement in a Linux/UNIX HPC cluster environment; a proficiency in shell programming is greatly beneficial.</li> </ul>"},{"location":"merlin6/quick-start-guide/code-of-conduct/#interactive-nodes","title":"Interactive nodes","text":"<ul> <li>The interactive nodes (also known as login nodes) are for development and quick testing:<ul> <li>It is strictly forbidden to run production jobs on the login nodes. All production jobs must be submitted to the batch system.</li> <li>It is forbidden to run long processes occupying big parts of a login node's resources.</li> <li>According to the previous rules, misbehaving running processes will have to be killed in order to keep the system responsive for other users.</li> </ul> </li> </ul>"},{"location":"merlin6/quick-start-guide/code-of-conduct/#batch-system","title":"Batch system","text":"<ul> <li>Make sure that no broken or run-away processes are left when your job is done. Keep the process space clean on all nodes.</li> <li> <p>During the runtime of a job, it is mandatory to use the <code>/scratch</code> and <code>/shared-scratch</code> partitions for temporary data:</p> <ul> <li>It is forbidden to use <code>/data/user</code>, <code>/data/project</code> or <code>/psi/home/</code> for that purpose.</li> <li>Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.</li> <li>Prefer <code>/scratch</code> over <code>/shared-scratch</code> and use the latter only when you require the temporary files to be visible from multiple nodes.</li> </ul> </li> <li> <p>Read the description in Merlin6 directory structure for learning about the correct usage of each partition type.</p> </li> </ul>"},{"location":"merlin6/quick-start-guide/code-of-conduct/#user-and-project-data","title":"User and project data","text":"<ul> <li> <p>Users are responsible for backing up their own data. It is recommended to back up the data on third-party independent systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).</p> <ul> <li><code>/psi/home</code>, as this contains a small amount of data, is the only directory where we can provide daily snapshots for one week. These can be found in the directory <code>/psi/home/.snapshot/</code>.</li> </ul> </li> <li> <p>When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster: every few months, the storage space will be recycled for those old users who do not have an existing and valid PSI account.</p> </li> </ul> <p>Warning</p> <p>When a user leaves PSI and their account has been removed, their storage space in Merlin may be recycled. 
Hence, when a user leaves PSI, they, their supervisor, or their team must ensure that the data is backed up to external storage.</p>"},{"location":"merlin6/quick-start-guide/code-of-conduct/#system-administrator-rights","title":"System Administrator Rights","text":"<ul> <li>The system administrator has the right to temporarily block access to Merlin6 for an account violating the Code of Conduct in order to maintain the efficiency and stability of the system.<ul> <li>Repetitive violations by the same user will be escalated to the user's supervisor.</li> </ul> </li> <li>The system administrator has the right to delete files in the scratch directories:<ul> <li>after a job, if the job failed to clean up its files.</li> <li>during the job in order to prevent a job from destabilizing a node or multiple nodes.</li> </ul> </li> <li>The system administrator has the right to kill any misbehaving running processes.</li> </ul>"},{"location":"merlin6/quick-start-guide/introduction/","title":"Introduction","text":""},{"location":"merlin6/quick-start-guide/introduction/#introduction","title":"Introduction","text":""},{"location":"merlin6/quick-start-guide/introduction/#the-merlin-local-hpc-cluster","title":"The Merlin local HPC cluster","text":"<p>Historically, the local HPC clusters at PSI were named Merlin. Over the years, multiple generations of Merlin have been deployed.</p> <p>Access to the different Slurm clusters is possible from the Merlin login nodes, which can be accessed through the SSH protocol or the NoMachine (NX) service.</p> <p>The following image shows the Slurm architecture design for the Merlin5 & Merlin6 (CPU & GPU) clusters:</p> <p></p>"},{"location":"merlin6/quick-start-guide/introduction/#merlin6","title":"Merlin6","text":"<p>Merlin6 is the official PSI local HPC cluster for development and mission-critical applications. It was built in 2019 and replaces the Merlin5 cluster.</p> <p>Merlin6 is designed to be extensible, so it is technically possible to add more compute nodes and cluster storage without a significant increase in manpower and operational costs.</p> <p>Merlin6 contains all the main services needed for running the cluster, including login nodes, storage, computing nodes and other subservices, connected to the central PSI IT infrastructure.</p>"},{"location":"merlin6/quick-start-guide/introduction/#cpu-and-gpu-slurm-clusters","title":"CPU and GPU Slurm clusters","text":"<p>The Merlin6 computing nodes are mostly based on CPU resources. However, in the past it also contained a small amount of GPU-based resources, which were mostly used by the BIO Division and by Deep Learning projects. 
Today, only Gwendolen is available on <code>gmerlin6</code>.</p> <p>These computational resources are split into two different Slurm clusters:</p> <ul> <li>The Merlin6 CPU nodes are in a dedicated Slurm cluster called <code>merlin6</code>.<ul> <li>This is the default Slurm cluster configured in the login nodes: any job submitted without the option <code>--cluster</code> will be submited to this cluster.</li> </ul> </li> <li>The Merlin6 GPU resources are in a dedicated Slurm cluster called <code>gmerlin6</code>.<ul> <li>Users submitting to the <code>gmerlin6</code> GPU cluster need to specify the option <code>--cluster=gmerlin6</code>.</li> </ul> </li> </ul>"},{"location":"merlin6/quick-start-guide/requesting-accounts/","title":"Requesting Merlin Accounts","text":""},{"location":"merlin6/quick-start-guide/requesting-accounts/#requesting-merlin-accounts","title":"Requesting Merlin Accounts","text":""},{"location":"merlin6/quick-start-guide/requesting-accounts/#requesting-access-to-merlin6","title":"Requesting Access to Merlin6","text":"<p>In the past, access to the public Merlin6 cluster was regulated via the <code>svc-cluster_merlin6</code> group, which is no longer in use. Merlin6 has become a private cluster, and to request access, users must now be members of one of the Unix groups authorized to use it, including Gwendolen.</p> <p>Requests for Merlin6 access must be submitted using the Request Linux Group Membership form, available in the PSI ServiceNow Service Catalog. Access is granted by requesting membership in a Unix group that is permitted to use the cluster.</p> <p></p>"},{"location":"merlin6/quick-start-guide/requesting-accounts/#mandatory-fields","title":"Mandatory fields","text":"<p>The following fields must be completed:</p> <ul> <li>Order Access for user: Defaults to the currently logged-in user. Access may also be requested on behalf of another user.</li> <li>Request membership for group: Select a valid Unix group that has access to Merlin6.</li> <li>Justification: Provide a brief explanation of why access to this group is required.</li> </ul> <p>Once the request is submitted, the corresponding group administrators will review and approve it as soon as possible (typically within a few working hours). After approval, it may take up to 30 minutes for the account to be fully configured and access to become effective.</p>"},{"location":"merlin6/quick-start-guide/requesting-accounts/#further-documentation","title":"Further documentation","text":"<p>Additional information is available in the Linux Central Documentation:</p> <ul> <li>Unix Group / Group Management for users</li> <li>Unix Group / Group Management for group managers</li> </ul>"},{"location":"merlin6/quick-start-guide/requesting-projects/","title":"Requesting a Merlin Project","text":""},{"location":"merlin6/quick-start-guide/requesting-projects/#requesting-a-merlin-project","title":"Requesting a Merlin Project","text":"<p>A project owns its own storage area in Merlin, which can be accessed by other group members.</p> <p>Projects can receive a higher storage quota than user areas and should be the primary way of organizing bigger storage requirements in a multi-user collaboration.</p> <p>Access to a project's directories is governed by project members belonging to a common Unix group. You may use an existing Unix group or you may have a new Unix group created especially for the project. 
The project responsible will be the owner of the Unix group (this is important)!</p> <p>This document explains how to request a new Unix group, how to request membership of existing groups, and the procedure for requesting a Merlin project.</p>"},{"location":"merlin6/quick-start-guide/requesting-projects/#about-unix-groups","title":"About Unix groups","text":"<p>Before requesting a Merlin project, it is important to have a Unix group that can be used to grant the different members of the project access to it.</p> <p>Unix groups in the PSI Active Directory (which is the PSI central database containing user and group information, and more) are defined by the <code>unx-</code> prefix, followed by a name. In general, PSI employees working on Linux systems (including HPC clusters, like Merlin) can request a new Unix group, and can become responsible for managing it. In addition, a list of administrators can be set. The administrators, together with the group manager, can approve or deny membership requests. Further information about this topic is covered in the Linux Documentation - Services Admin Guides: Unix Groups / Group Management, managed by the Central Linux Team.</p> <p>To grant access to specific Merlin project directories, some users may need to be added to specific Unix groups:</p> <ul> <li>Each Merlin project (i.e. <code>/data/project/{bio|general}/$projectname</code>) or experiment (i.e. <code>/data/experiment/$experimentname</code>) directory has access restricted by ownership and group membership (with very few exceptions allowing public access).</li> <li>Users requiring access to a specific restricted project or experiment directory have to request membership for the corresponding Unix group owning the directory.</li> </ul>"},{"location":"merlin6/quick-start-guide/requesting-projects/#requesting-a-new-unix-group","title":"Requesting a new Unix group","text":"<p>If you need a new Unix group to be created, you need to first get this group through a separate PSI Service Now ticket. Please use the following template. You can also specify the login names of the initial group members and the owner of the group. The owner of the group is the person who will be allowed to modify the group.</p> <ul> <li>Please open an Incident Request with subject:</li> </ul> Text Only<pre><code>Subject: Request for new unix group xxxx\n</code></pre> <ul> <li>and base the text field of the request on this template</li> </ul> Text Only<pre><code>Dear HelpDesk\n\nI would like to request a new unix group.\n\nUnix Group Name: unx-xxxxx\nInitial Group Members: xxxxx, yyyyy, zzzzz, ...\nGroup Owner: xxxxx\nGroup Administrators: aaaaa, bbbbb, ccccc, ....\n\nBest regards,\n</code></pre>"},{"location":"merlin6/quick-start-guide/requesting-projects/#requesting-unix-group-membership","title":"Requesting Unix group membership","text":"<p>Existing Merlin projects already have a Unix group assigned. To have access to a project, users must belong to the proper Unix group owning that project. Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions for that directory. For example:</p> Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/bio/$projectname\n</code></pre>
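<p>To see which Unix groups your account already belongs to, and who is a member of a given group, the standard Linux tools are sufficient (a minimal sketch; <code>unx-myproject</code> is a placeholder group name):</p> Bash<pre><code># Groups your account is currently a member of\nid -Gn\n\n# Members of a specific Unix group\ngetent group unx-myproject\n</code></pre>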
<p>Requesting membership for a specific Unix group has to be done with the corresponding Request Linux Group Membership form, available in the PSI Service Now Service Catalog.</p> <p></p> <p>Once submitted, the person responsible for the Unix group has to approve the request.</p> <p>Important note: Requesting access to specific Unix groups will require validation from the person responsible for the Unix group. If you ask for inclusion in many groups it may take longer, since the fulfillment of the request will depend on more people.</p> <p>Further information can be found in the Linux Documentation - Services User guide: Unix Groups / Group Management</p>"},{"location":"merlin6/quick-start-guide/requesting-projects/#managing-unix-groups","title":"Managing Unix Groups","text":"<p>Other administration operations on Unix groups are mainly covered in the Linux Documentation - Services Admin Guides: Unix Groups / Group Management, managed by the Central Linux Team.</p>"},{"location":"merlin6/quick-start-guide/requesting-projects/#requesting-a-merlin-project_1","title":"Requesting a Merlin project","text":"<p>Once a Unix group is available, a Merlin project can be requested. To request a project, please provide the following information in a PSI Service Now ticket:</p> <ul> <li>Please open an Incident Request with subject:</li> </ul> Text Only<pre><code>Subject: [Merlin6] Project Request for project name xxxxxx\n</code></pre> <ul> <li>and base the text field of the request on this template</li> </ul> Text Only<pre><code>Dear HelpDesk\n\nI would like to request a new Merlin6 project.\n\nProject Name: xxxxx\nUnixGroup: xxxxx # Must be an existing Unix Group\n\nThe project responsible is the Owner of the Unix Group.\nIf you need a storage quota exceeding the defaults, please provide a description\nand motivation for the higher storage needs:\n\nStorage Quota: 1TB with a maximum of 1M Files\nReason: (None for default 1TB/1M)\n\nBest regards,\n</code></pre> <p>The default storage quota for a project is 1TB (with a maximum of 1M files). If you need a larger assignment, you need to request this and provide a description of your storage needs.</p>"},{"location":"merlin6/quick-start-guide/requesting-projects/#further-documentation","title":"Further documentation","text":"<p>Further information is also available in the Linux Central Documentation:</p> <ul> <li>Unix Group / Group Management for users</li> <li>Unix Group / Group Management for group managers</li> </ul> <p>Special thanks to the Linux Central Team and AIT for making this possible.</p>"},{"location":"merlin6/slurm-general-docs/interactive-jobs/","title":"Running Interactive Jobs","text":""},{"location":"merlin6/slurm-general-docs/interactive-jobs/#running-interactive-jobs","title":"Running Interactive Jobs","text":""},{"location":"merlin6/slurm-general-docs/interactive-jobs/#running-interactive-jobs_1","title":"Running interactive jobs","text":"<p>There are two different ways of running interactive jobs in Slurm. 
This is possible by using the <code>salloc</code> and <code>srun</code> commands:</p> <ul> <li><code>salloc</code>: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.</li> <li><code>srun</code>: to run parallel tasks.</li> </ul>"},{"location":"merlin6/slurm-general-docs/interactive-jobs/#srun","title":"srun","text":"<p><code>srun</code> is used to run parallel jobs in the batch system. It can be used within a batch script (which can be run with <code>sbatch</code>), or within a job allocation (which can be run with <code>salloc</code>). Also, it can be used as a direct command (for example, from the login nodes).</p> <p>When used inside a batch script or during a job allocation, <code>srun</code> is constrained to the amount of resources allocated by the <code>sbatch</code>/<code>salloc</code> commands. In <code>sbatch</code>, these resources are usually defined inside the batch script with the format <code>#SBATCH <option>=<value></code>. In other words, if you define 88 tasks (with 1 thread per core) and 2 nodes in your batch script or allocation, <code>srun</code> is constrained to this amount of resources (you can use less, but never exceed those limits).</p> <p>When used from the login node, it is usually used to run a specific command or software interactively. <code>srun</code> is a blocking process (it will block the bash prompt until the <code>srun</code> command finishes, unless you run it in the background with <code>&</code>). This can be very useful for running interactive software which pops up a window and then submits jobs or runs sub-tasks in the background (for example, Relion, cisTEM, etc.).</p> <p>Refer to <code>man srun</code> to explore all possible options for this command.</p> Running 'hostname' command on 3 nodes, using 2 cores (1 task/core) per node Bash Session<pre><code>(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --ntasks=6 --ntasks-per-node=2 --nodes=3 hostname\nsrun: job 135088230 queued and waiting for resources\nsrun: job 135088230 has been allocated resources\nmerlin-c-102.psi.ch\nmerlin-c-102.psi.ch\nmerlin-c-101.psi.ch\nmerlin-c-101.psi.ch\nmerlin-c-103.psi.ch\nmerlin-c-103.psi.ch\n</code></pre>"},{"location":"merlin6/slurm-general-docs/interactive-jobs/#salloc","title":"salloc","text":"<p><code>salloc</code> is used to obtain a Slurm job allocation (a set of nodes). Once the job is allocated, users are able to execute interactive command(s). Once finished (<code>exit</code> or <code>Ctrl+D</code>), the allocation is released. <code>salloc</code> is a blocking command, that is, the command will block until the requested resources are allocated.</p> <p>When running <code>salloc</code>, once the resources are allocated, by default the user will get a new shell on one of the allocated resources (if a user has requested several nodes, a new shell will be opened on the first allocated node). However, this behaviour can be changed by adding a shell (<code>$SHELL</code>) at the end of the <code>salloc</code> command. 
In example:</p> Bash<pre><code># Typical 'salloc' call\n# - Same as running:\n# 'salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL'\nsalloc --clusters=merlin6 -N 2 -n 2\n\n# Custom 'salloc' call\n# - $SHELL will open a local shell on the login node from where ``salloc`` is running\nsalloc --clusters=merlin6 -N 2 -n 2 $SHELL\n</code></pre> Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - default Bash Session<pre><code>(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --ntasks=2 --nodes=2\nsalloc: Pending job allocation 135171306\nsalloc: job 135171306 queued and waiting for resources\nsalloc: job 135171306 has been allocated resources\nsalloc: Granted job allocation 135171306\n\n(base) [caubet_m@merlin-c-213 ~]$ srun hostname\nmerlin-c-213.psi.ch\nmerlin-c-214.psi.ch\n\n(base) [caubet_m@merlin-c-213 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 135171306\n\n(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 -N 2 -n 2 srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 --pty --preserve-env --mpi=none $SHELL\nsalloc: Pending job allocation 135171342\nsalloc: job 135171342 queued and waiting for resources\nsalloc: job 135171342 has been allocated resources\nsalloc: Granted job allocation 135171342\n\n(base) [caubet_m@merlin-c-021 ~]$ srun hostname\nmerlin-c-021.psi.ch\nmerlin-c-022.psi.ch\n\n(base) [caubet_m@merlin-c-021 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 135171342\n</code></pre> Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - <code>$SHELL</code> Bash Session<pre><code>(base) [caubet_m@merlin-export-01 ~]$ salloc --clusters=merlin6 --ntasks=2 --nodes=2 $SHELL\nsalloc: Pending job allocation 135171308\nsalloc: job 135171308 queued and waiting for resources\nsalloc: job 135171308 has been allocated resources\nsalloc: Granted job allocation 135171308\n\n(base) [caubet_m@merlin-export-01 ~]$ srun hostname\nmerlin-c-218.psi.ch\nmerlin-c-117.psi.ch\n\n(base) [caubet_m@merlin-export-01 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 135171308\n</code></pre>"},{"location":"merlin6/slurm-general-docs/interactive-jobs/#running-interactive-jobs-with-x11-support","title":"Running interactive jobs with X11 support","text":""},{"location":"merlin6/slurm-general-docs/interactive-jobs/#requirements","title":"Requirements","text":""},{"location":"merlin6/slurm-general-docs/interactive-jobs/#graphical-access","title":"Graphical access","text":"<p>NoMachine is the official supported service for graphical access in the Merlin cluster. This service is running on the login nodes. Check the document {Accessing Merlin -> NoMachine} for details about how to connect to the NoMachine service in the Merlin cluster.</p> <p>For other non officially supported graphical access (X11 forwarding):</p> <ul> <li>For Linux clients, please follow {How To Use Merlin -> Accessing from Linux Clients}</li> <li>For Windows clients, please follow {How To Use Merlin -> Accessing from Windows Clients}</li> <li>For MacOS clients, please follow {How To Use Merlin -> Accessing from MacOS Clients}</li> </ul>"},{"location":"merlin6/slurm-general-docs/interactive-jobs/#srun-with-x11-support","title":"'srun' with x11 support","text":"<p>Merlin5 and Merlin6 clusters allow running any windows based applications. For that, you need to add the option <code>--x11</code> to the <code>srun</code> command. 
For example:</p> Bash<pre><code>srun --clusters=merlin6 --x11 xclock\n</code></pre> <p>will pop up an X11-based clock.</p> <p>In the same manner, you can create a bash shell with X11 support. To do that, you need to add the option <code>--pty</code> to the <code>srun --x11</code> command. Once the resource is allocated, you can interactively run X11 and non-X11 based commands from there.</p> Bash<pre><code>srun --clusters=merlin6 --x11 --pty bash\n</code></pre> Using 'srun' with X11 support Bash Session<pre><code>(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 xclock\nsrun: job 135095591 queued and waiting for resources\nsrun: job 135095591 has been allocated resources\n\n(base) [caubet_m@merlin-l-001 ~]$\n\n(base) [caubet_m@merlin-l-001 ~]$ srun --clusters=merlin6 --x11 --pty bash\nsrun: job 135095592 queued and waiting for resources\nsrun: job 135095592 has been allocated resources\n\n(base) [caubet_m@merlin-c-205 ~]$ xclock\n\n(base) [caubet_m@merlin-c-205 ~]$ echo \"This was an example\"\nThis was an example\n\n(base) [caubet_m@merlin-c-205 ~]$ exit\nexit\n</code></pre>"},{"location":"merlin6/slurm-general-docs/interactive-jobs/#salloc-with-x11-support","title":"'salloc' with x11 support","text":"<p>The Merlin5 and Merlin6 clusters allow running window-based applications. For that, you need to add the option <code>--x11</code> to the <code>salloc</code> command. For example:</p> Bash<pre><code>salloc --clusters=merlin6 --x11 xclock\n</code></pre> <p>will pop up an X11-based clock.</p> <p>In the same manner, you can create a bash shell with X11 support. To do that, you just need to run <code>salloc --clusters=merlin6 --x11</code>. Once the resource is allocated, you can interactively run X11 and non-X11 based commands from there.</p> Bash<pre><code>salloc --clusters=merlin6 --x11\n</code></pre> Using 'salloc' with X11 support examples Bash Session<pre><code>(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11 xclock\nsalloc: Pending job allocation 135171355\nsalloc: job 135171355 queued and waiting for resources\nsalloc: job 135171355 has been allocated resources\nsalloc: Granted job allocation 135171355\nsalloc: Relinquishing job allocation 135171355\n\n(base) [caubet_m@merlin-l-001 ~]$ salloc --clusters=merlin6 --x11\nsalloc: Pending job allocation 135171349\nsalloc: job 135171349 queued and waiting for resources\nsalloc: job 135171349 has been allocated resources\nsalloc: Granted job allocation 135171349\nsalloc: Waiting for resource configuration\nsalloc: Nodes merlin-c-117 are ready for job\n\n(base) [caubet_m@merlin-c-117 ~]$ xclock\n\n(base) [caubet_m@merlin-c-117 ~]$ echo \"This was an example\"\nThis was an example\n\n(base) [caubet_m@merlin-c-117 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 135171349\n</code></pre>"},{"location":"merlin6/slurm-general-docs/monitoring/","title":"Monitoring","text":""},{"location":"merlin6/slurm-general-docs/monitoring/#monitoring","title":"Monitoring","text":""},{"location":"merlin6/slurm-general-docs/monitoring/#slurm-monitoring","title":"Slurm Monitoring","text":""},{"location":"merlin6/slurm-general-docs/monitoring/#job-status","title":"Job status","text":"<p>The status of submitted jobs can be checked with the <code>squeue</code> command:</p> Bash<pre><code>squeue -u $username\n</code></pre> <p>Common statuses:</p> <ul> <li>merlin-*: Running on the specified host</li> <li>(Priority): Waiting in the queue</li> <li>(Resources): At the head of the queue, waiting for machines to become available</li> 
<li>(AssocGrpCpuLimit), (AssocGrpNodeLimit): Job would exceed per-user limitations on the number of simultaneous CPUs/Nodes. Use <code>scancel</code> to remove the job and resubmit with fewer resources, or else wait for your other jobs to finish.</li> <li>(PartitionNodeLimit): Exceeds all resources available on this partition. Run <code>scancel</code> and resubmit to a different partition (<code>-p</code>) or with fewer resources.</li> </ul> <p>Check in the man pages (<code>man squeue</code>) for all possible options for this command.</p> Using 'squeue' example Bash Session<pre><code># squeue -u feichtinger\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n134332544 general spawner- feichtin R 5-06:47:45 1 merlin-c-204\n134321376 general subm-tal feichtin R 5-22:27:59 1 merlin-c-204\n</code></pre>"},{"location":"merlin6/slurm-general-docs/monitoring/#partition-status","title":"Partition status","text":"<p>The status of the nodes and partitions (a.k.a. queues) can be seen with the <code>sinfo</code> command:</p> Bash<pre><code>sinfo\n</code></pre> <p>Check in the man pages (<code>man sinfo</code>) for all possible options for this command.</p> Using 'sinfo' example Bash Session<pre><code># sinfo -l\nThu Jan 23 16:34:49 2020\nPARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST\ntest up 1-00:00:00 1-infinite no NO all 3 mixed merlin-c-[024,223-224]\ntest up 1-00:00:00 1-infinite no NO all 2 allocated merlin-c-[123-124]\ntest up 1-00:00:00 1-infinite no NO all 1 idle merlin-c-023\ngeneral* up 7-00:00:00 1-50 no NO all 6 mixed merlin-c-[007,204,207-209,219]\ngeneral* up 7-00:00:00 1-50 no NO all 57 allocated merlin-c-[001-005,008-020,101-122,201-203,205-206,210-218,220-222]\ngeneral* up 7-00:00:00 1-50 no NO all 3 idle merlin-c-[006,021-022]\ndaily up 1-00:00:00 1-60 no NO all 9 mixed merlin-c-[007,024,204,207-209,219,223-224]\ndaily up 1-00:00:00 1-60 no NO all 59 allocated merlin-c-[001-005,008-020,101-124,201-203,205-206,210-218,220-222]\ndaily up 1-00:00:00 1-60 no NO all 4 idle merlin-c-[006,021-023]\nhourly up 1:00:00 1-infinite no NO all 9 mixed merlin-c-[007,024,204,207-209,219,223-224]\nhourly up 1:00:00 1-infinite no NO all 59 allocated merlin-c-[001-005,008-020,101-124,201-203,205-206,210-218,220-222]\nhourly up 1:00:00 1-infinite no NO all 4 idle merlin-c-[006,021-023]\ngpu up 7-00:00:00 1-infinite no NO all 1 mixed merlin-g-007\ngpu up 7-00:00:00 1-infinite no NO all 8 allocated merlin-g-[001-006,008-009]\n</code></pre>"},{"location":"merlin6/slurm-general-docs/monitoring/#job-accounting","title":"Job accounting","text":"<p>Users can check detailed information of jobs (pending, running, completed, failed, etc.) with the <code>sacct</code> command. This command is very flexible and can provide a lot of information. For checking all the available options, please read <code>man sacct</code>. 
Below, we summarize some examples that can be useful for the users:</p> Bash<pre><code># Today jobs, basic summary\nsacct\n\n# Today jobs, with details\nsacct --long\n\n# Jobs from January 1, 2022, 12pm, with details\nsacct -S 2021-01-01T12:00:00 --long\n\n# Specific job accounting\nsacct --long -j $jobid\n\n# Jobs custom details, without steps (-X)\nsacct -X --format=User%20,JobID,Jobname,partition,state,time,submit,start,end,elapsed,AveRss,MaxRss,MaxRSSTask,MaxRSSNode%20,MaxVMSize,nnodes,ncpus,ntasks,reqcpus,totalcpu,reqmem,cluster,TimeLimit,TimeLimitRaw,cputime,nodelist%50,AllocTRES%80\n\n# Jobs custom details, with steps\nsacct --format=User%20,JobID,Jobname,partition,state,time,submit,start,end,elapsed,AveRss,MaxRss,MaxRSSTask,MaxRSSNode%20,MaxVMSize,nnodes,ncpus,ntasks,reqcpus,totalcpu,reqmem,cluster,TimeLimit,TimeLimitRaw,cputime,nodelist%50,AllocTRES%80\n</code></pre>"},{"location":"merlin6/slurm-general-docs/monitoring/#job-efficiency","title":"Job efficiency","text":"<p>Users can check how efficient are their jobs. For that, the <code>seff</code> command is available.</p> Bash<pre><code>seff $jobid\n</code></pre> Using 'seff' example Bash Session<pre><code># seff 134333893\nJob ID: 134333893\nCluster: merlin6\nUser/Group: albajacas_a/unx-sls\nState: COMPLETED (exit code 0)\nNodes: 1\nCores per node: 8\nCPU Utilized: 00:26:15\nCPU Efficiency: 49.47% of 00:53:04 core-walltime\nJob Wall-clock time: 00:06:38\nMemory Utilized: 60.73 MB\nMemory Efficiency: 0.19% of 31.25 GB\n</code></pre>"},{"location":"merlin6/slurm-general-docs/monitoring/#list-job-attributes","title":"List job attributes","text":"<p>The <code>sjstat</code> command is used to display statistics of jobs under control of SLURM. To use it</p> Bash<pre><code>sjstat\n</code></pre> Using 'sjstat' example Bash Session<pre><code># sjstat -v\n\nScheduling pool data:\n----------------------------------------------------------------------------------\n Total Usable Free Node Time Other\nPool Memory Cpus Nodes Nodes Nodes Limit Limit traits\n----------------------------------------------------------------------------------\ntest 373502Mb 88 6 6 1 UNLIM 1-00:00:00\ngeneral* 373502Mb 88 66 66 8 50 7-00:00:00\ndaily 373502Mb 88 72 72 9 60 1-00:00:00\nhourly 373502Mb 88 72 72 9 UNLIM 01:00:00\ngpu 128000Mb 8 1 1 0 UNLIM 7-00:00:00\ngpu 128000Mb 20 8 8 0 UNLIM 7-00:00:00\n\nRunning job data:\n---------------------------------------------------------------------------------------------------\n Time Time Time\nJobID User Procs Pool Status Used Limit Started Master/Other\n---------------------------------------------------------------------------------------------------\n13433377 collu_g 1 gpu PD 0:00 24:00:00 N/A (Resources)\n13433389 collu_g 20 gpu PD 0:00 24:00:00 N/A (Resources)\n13433382 jaervine 4 gpu PD 0:00 24:00:00 N/A (Priority)\n13433386 barret_d 20 gpu PD 0:00 24:00:00 N/A (Priority)\n13433382 pamula_f 20 gpu PD 0:00 168:00:00 N/A (Priority)\n13433387 pamula_f 4 gpu PD 0:00 24:00:00 N/A (Priority)\n13433365 andreani 132 daily PD 0:00 24:00:00 N/A (Dependency)\n13433388 marino_j 6 gpu R 1:43:12 168:00:00 01-23T14:54:57 merlin-g-007\n13433377 choi_s 40 gpu R 2:09:55 48:00:00 01-23T14:28:14 merlin-g-006\n13433373 qi_c 20 gpu R 7:00:04 24:00:00 01-23T09:38:05 merlin-g-004\n13433390 jaervine 2 gpu R 5:18 24:00:00 01-23T16:32:51 merlin-g-007\n13433390 jaervine 2 gpu R 15:18 24:00:00 01-23T16:22:51 merlin-g-007\n13433375 bellotti 4 gpu R 7:35:44 9:00:00 01-23T09:02:25 merlin-g-001\n13433358 bellotti 1 gpu R 1-05:52:19 144:00:00 
01-22T10:45:50 merlin-g-007\n13433377 lavriha_ 20 gpu R 5:13:24 24:00:00 01-23T11:24:45 merlin-g-008\n13433370 lavriha_ 40 gpu R 22:43:09 24:00:00 01-22T17:55:00 merlin-g-003\n13433373 qi_c 20 gpu R 15:03:15 24:00:00 01-23T01:34:54 merlin-g-002\n13433371 qi_c 4 gpu R 22:14:14 168:00:00 01-22T18:23:55 merlin-g-001\n13433254 feichtin 2 general R 5-07:26:11 156:00:00 01-18T09:11:58 merlin-c-204\n13432137 feichtin 2 general R 5-23:06:25 160:00:00 01-17T17:31:44 merlin-c-204\n13433389 albajaca 32 hourly R 41:19 1:00:00 01-23T15:56:50 merlin-c-219\n13433387 riemann_ 2 general R 1:51:47 4:00:00 01-23T14:46:22 merlin-c-204\n13433370 jimenez_ 2 general R 23:20:45 168:00:00 01-22T17:17:24 merlin-c-106\n13433381 jimenez_ 2 general R 4:55:33 168:00:00 01-23T11:42:36 merlin-c-219\n13433390 sayed_m 128 daily R 21:49 10:00:00 01-23T16:16:20 merlin-c-223\n13433359 adelmann 2 general R 1-05:00:09 48:00:00 01-22T11:38:00 merlin-c-204\n13433377 zimmerma 2 daily R 6:13:38 24:00:00 01-23T10:24:31 merlin-c-007\n13433375 zohdirad 24 daily R 7:33:16 10:00:00 01-23T09:04:53 merlin-c-218\n13433363 zimmerma 6 general R 1-02:54:20 47:50:00 01-22T13:43:49 merlin-c-106\n13433376 zimmerma 6 general R 7:25:42 23:50:00 01-23T09:12:27 merlin-c-007\n13433371 vazquez_ 16 daily R 21:46:31 23:59:00 01-22T18:51:38 merlin-c-106\n13433382 vazquez_ 16 daily R 4:09:23 23:59:00 01-23T12:28:46 merlin-c-024\n13433376 jiang_j1 440 daily R 7:11:14 10:00:00 01-23T09:26:55 merlin-c-123\n13433376 jiang_j1 24 daily R 7:08:19 10:00:00 01-23T09:29:50 merlin-c-220\n13433384 kranjcev 440 daily R 2:48:19 24:00:00 01-23T13:49:50 merlin-c-108\n13433371 vazquez_ 16 general R 20:15:15 120:00:00 01-22T20:22:54 merlin-c-210\n13433371 vazquez_ 16 general R 21:15:51 120:00:00 01-22T19:22:18 merlin-c-210\n13433374 colonna_ 176 daily R 8:23:18 24:00:00 01-23T08:14:51 merlin-c-211\n13433374 bures_l 88 daily R 10:45:06 24:00:00 01-23T05:53:03 merlin-c-001\n13433375 derlet 88 daily R 7:32:05 24:00:00 01-23T09:06:04 merlin-c-107\n13433373 derlet 88 daily R 17:21:57 24:00:00 01-22T23:16:12 merlin-c-002\n13433373 derlet 88 daily R 18:13:05 24:00:00 01-22T22:25:04 merlin-c-112\n13433365 andreani 264 daily R 4:10:08 24:00:00 01-23T12:28:01 merlin-c-003\n13431187 mahrous_ 88 general R 6-15:59:16 168:00:00 01-17T00:38:53 merlin-c-111\n13433387 kranjcev 2 general R 1:48:47 4:00:00 01-23T14:49:22 merlin-c-204\n13433368 karalis_ 352 general R 1-00:05:22 96:00:00 01-22T16:32:47 merlin-c-013\n13433367 karalis_ 352 general R 1-00:06:44 96:00:00 01-22T16:31:25 merlin-c-118\n13433385 karalis_ 352 general R 1:37:24 96:00:00 01-23T15:00:45 merlin-c-213\n13433374 sato 256 general R 14:55:55 24:00:00 01-23T01:42:14 merlin-c-204\n13433374 sato 64 general R 10:43:35 24:00:00 01-23T05:54:34 merlin-c-106\n67723568 sato 32 general R 10:40:07 24:00:00 01-23T05:58:02 merlin-c-007\n13433265 khanppna 440 general R 3-18:20:58 168:00:00 01-19T22:17:11 merlin-c-008\n13433375 khanppna 704 general R 7:31:24 24:00:00 01-23T09:06:45 merlin-c-101\n13433371 khanppna 616 general R 21:40:33 24:00:00 01-22T18:57:36 merlin-c-208\n</code></pre>"},{"location":"merlin6/slurm-general-docs/monitoring/#graphical-user-interface","title":"Graphical user interface","text":"<p>When using ssh with X11 forwarding (<code>ssh -XY</code>), or when using NoMachine, users can use <code>sview</code>. SView is a graphical user interface to view and modify Slurm states. 
To run sview:</p> Bash<pre><code>ssh -XY $username@merlin-l-001.psi.ch # Not necessary when using NoMachine\nsview\n</code></pre> <p></p>"},{"location":"merlin6/slurm-general-docs/monitoring/#general-monitoring","title":"General Monitoring","text":"<p>The following pages contain basic monitoring for Slurm and computing nodes. Currently, monitoring is based on Grafana + InfluxDB. In the future it will be moved to a different service based on ElasticSearch + LogStash + Kibana.</p> <p>In the meantime, the following monitoring pages are available on a best-effort support basis:</p>"},{"location":"merlin6/slurm-general-docs/monitoring/#merlin6-monitoring-pages","title":"Merlin6 Monitoring Pages","text":"<ul> <li>Slurm monitoring:<ul> <li>Merlin6 Slurm Statistics - XDMOD</li> <li>Merlin6 Slurm Live Status</li> <li>Merlin6 Slurm Overview</li> </ul> </li> <li>Nodes monitoring:<ul> <li>Merlin6 CPU Nodes Overview</li> <li>Merlin6 GPU Nodes Overview</li> </ul> </li> </ul>"},{"location":"merlin6/slurm-general-docs/running-jobs/","title":"Running Slurm Scripts","text":""},{"location":"merlin6/slurm-general-docs/running-jobs/#running-slurm-scripts","title":"Running Slurm Scripts","text":""},{"location":"merlin6/slurm-general-docs/running-jobs/#the-rules","title":"The rules","text":"<p>Before starting to use the cluster, please read the following rules:</p> <ol> <li>To ease and improve scheduling and backfilling, always try to estimate and define a proper run time for your jobs:<ul> <li>Use <code>--time=<D-HH:MM:SS></code> for that.</li> <li>For very long runs, please consider using Job Arrays with Checkpointing.</li> </ul> </li> <li>Try to optimize your jobs for running at most within one day. Please consider the following:<ul> <li>Some software can simply scale up by using more nodes while drastically reducing the run time.</li> <li>Some software allows saving a specific state, and a second job can start from that state: Job Arrays with Checkpointing can help you with that.</li> <li>Jobs submitted to <code>hourly</code> get higher priority than jobs submitted to <code>daily</code>: always use <code>hourly</code> for jobs shorter than 1 hour.</li> <li>Jobs submitted to <code>daily</code> get higher priority than jobs submitted to <code>general</code>: always use <code>daily</code> for jobs shorter than 1 day.</li> </ul> </li> <li>It is forbidden to run very short jobs, as they cause a lot of overhead and can also cause severe problems for the main scheduler.<ul> <li>Question: Is my job a very short job? Answer: If it lasts only a few seconds or a very few minutes, yes.</li> <li>Question: How long should my job run? Answer: As a rule of thumb, anything from 5 minutes starts being acceptable, and 15 minutes or more is preferred (see the submission sketch after this list).</li> <li>Use Packed Jobs for running a large number of short tasks.</li> </ul> </li> <li>Do not submit hundreds of similar jobs!<ul> <li>Use Array Jobs for gathering jobs instead.</li> </ul> </li> </ol>
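<p>As a rough illustration of these rules (a sketch only; <code>myjob.sh</code> is a placeholder for your own batch script), pick the most specific partition that fits your expected run time and declare that time explicitly:</p> Bash<pre><code># Expected to finish within the hour: submit to the hourly partition\nsbatch --clusters=merlin6 --partition=hourly --time=00:50:00 myjob.sh\n\n# Expected to need several hours, but less than a day: submit to the daily partition\nsbatch --clusters=merlin6 --partition=daily --time=12:00:00 myjob.sh\n</code></pre>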
<p>Tip</p> <p>Having a good estimate of the time needed by your jobs, running them in a proper way, and optimizing them to run within one day will help keep the system used fairly and efficiently.</p>"},{"location":"merlin6/slurm-general-docs/running-jobs/#basic-commands-for-running-batch-scripts","title":"Basic commands for running batch scripts","text":"<ul> <li>Use <code>sbatch</code> for submitting a batch script to Slurm.</li> <li>Use <code>srun</code> for running parallel tasks.</li> <li>Use <code>squeue</code> for checking job status.</li> <li>Use <code>scancel</code> for cancelling/deleting a job from the queue.</li> </ul> <p>Tip</p> <p>Use Linux <code>man</code> pages when needed (e.g. <code>man sbatch</code>), mostly for checking the available options for the above commands.</p>"},{"location":"merlin6/slurm-general-docs/running-jobs/#basic-settings","title":"Basic settings","text":"<p>For a complete list of available options and parameters, it is recommended to use the man pages (e.g. <code>man sbatch</code>, <code>man srun</code>, <code>man salloc</code>).</p> <p>Please notice that the behaviour of some parameters might change depending on the command used when running jobs (for example, the <code>--exclusive</code> behaviour in <code>sbatch</code> differs from <code>srun</code>).</p> <p>In this chapter we show the basic parameters which are usually needed in the Merlin cluster.</p>"},{"location":"merlin6/slurm-general-docs/running-jobs/#common-settings","title":"Common settings","text":"<p>The following settings are the minimum required for running a job on the Merlin CPU and GPU nodes. Please consider taking a look at the man pages (e.g. <code>man sbatch</code>, <code>man salloc</code>, <code>man srun</code>) for more information about all possible options. Also, do not hesitate to contact us with any questions.</p> <ul> <li>Clusters: For running jobs in the different Slurm clusters, users should add the following option:</li> </ul> Bash<pre><code>#SBATCH --clusters=<cluster_name> # Possible values: merlin6, gmerlin6\n</code></pre> <p>Refer to the documentation of each cluster (<code>merlin6</code>, <code>gmerlin6</code>) for further information.</p> <ul> <li>Partitions: except when using the default partition for each cluster, one needs to specify the partition:</li> </ul> Bash<pre><code>#SBATCH --partition=<partition_name> # Check each cluster documentation for possible values\n</code></pre> <p>Refer to the documentation of each cluster (<code>merlin6</code>, <code>gmerlin6</code>) for further information.</p> <ul> <li>[Optional] Disabling shared nodes: by default, nodes are not exclusive. Hence, multiple users can run on the same node. One can request exclusive node usage with the following option:</li> </ul> Bash<pre><code>#SBATCH --exclusive # Only if you want a dedicated node\n</code></pre> <ul> <li>Time: it is important to define how long a job should run, as realistically as possible. This will help Slurm when scheduling and backfilling, and will let Slurm manage job queues more efficiently. 
 <ul> <li>[Optional] Disabling shared nodes: by default, nodes are not exclusive; hence, multiple users can run on the same node. One can request exclusive node usage with the following option:</li> </ul> Bash<pre><code>#SBATCH --exclusive # Only if you want a dedicated node\n</code></pre> <ul> <li>Time: it is important to define how long a job will realistically run. This helps Slurm with scheduling and backfilling, and lets Slurm manage the job queues more efficiently. This value can never exceed the <code>MaxTime</code> of the affected partition.</li> </ul> Bash<pre><code>#SBATCH --time=<D-HH:MM:SS> # Cannot exceed the partition `MaxTime`\n</code></pre> <p>Refer to the documentation of each cluster (<code>merlin6</code>, <code>gmerlin6</code>) for further information about partition <code>MaxTime</code> values.</p> <ul> <li>Output and error files: by default, a Slurm script will generate a standard output file (<code>slurm-%j.out</code>, where <code>%j</code> is the job ID) and a standard error file (<code>slurm-%j.err</code>) in the directory from which the job was submitted. Users can change the default names with the following options:</li> </ul> Bash<pre><code>#SBATCH --output=<filename> # Can include path. Patterns accepted (e.g. %j)\n#SBATCH --error=<filename> # Can include path. Patterns accepted (e.g. %j)\n</code></pre> <p>Use man sbatch (<code>man sbatch | grep -A36 '^filename pattern'</code>) to get the full specification of the accepted filename patterns.</p> <ul> <li>Enable/Disable Hyper-Threading: whether a node has Hyper-Threading or not depends on the node configuration. By default, HT nodes have HT enabled, but one should specify the desired behaviour explicitly with one of the following options:</li> </ul> Bash<pre><code>#SBATCH --hint=multithread # Use extra threads with in-core multi-threading.\n#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading.\n</code></pre> <p>Refer to the documentation of each cluster (<code>merlin6</code>, <code>gmerlin6</code>) for further information about node configuration and Hyper-Threading. Consider that, depending on your job requirements, you might also need to set <code>--ntasks-per-core</code> or <code>--cpus-per-task</code> (or other options) in addition to <code>--hint</code>. Please contact us in case of doubt.</p> <p>Tip</p> <p>In general, for the <code>merlin6</code> cluster, <code>--hint=[no]multithread</code> is a recommended option. On the other hand, <code>--ntasks-per-core</code> is only needed when one has to define how a task should be handled within a core; this setting is generally not used in hybrid MPI/OpenMP jobs, where single tasks need multiple cores.</p>
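 <p>As a sketch of how these settings can interact (the program name is a placeholder), a single-task OpenMP run without Hyper-Threading could combine <code>--hint=nomultithread</code> with <code>--cpus-per-task</code> and hand the allocated CPU count to the application:</p> Bash<pre><code>#SBATCH --hint=nomultithread # One thread per core\n#SBATCH --ntasks=1 # A single task...\n#SBATCH --cpus-per-task=8 # ...running on 8 cores\n\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK # Pass the allocated cores to OpenMP\nsrun ./my_openmp_program # Placeholder executable\n</code></pre>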
"},{"location":"merlin6/slurm-general-docs/running-jobs/#batch-script-templates","title":"Batch script templates","text":""},{"location":"merlin6/slurm-general-docs/running-jobs/#cpu-based-jobs-templates","title":"CPU-based jobs templates","text":"<p>The following examples apply to the Merlin6 cluster.</p>"},{"location":"merlin6/slurm-general-docs/running-jobs/#nomultithreaded-jobs-template","title":"Nomultithreaded jobs template","text":"<p>The following template should be used by any user submitting jobs to the Merlin6 CPU nodes:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --cluster=merlin6 # Cluster name\n#SBATCH --partition=general,daily,hourly # Specify one or multiple partitions\n#SBATCH --time=<D-HH:MM:SS> # Strongly recommended\n#SBATCH --output=<output_file> # Generate custom output file\n#SBATCH --error=<error_file> # Generate custom error file\n#SBATCH --hint=nomultithread # Mandatory for non-multithreaded jobs\n##SBATCH --exclusive # Uncomment if you need exclusive node usage\n##SBATCH --ntasks-per-core=1 # Only mandatory for multithreaded single tasks\n\n## Advanced options example\n##SBATCH --nodes=1 # Uncomment and specify #nodes to use\n##SBATCH --ntasks=44 # Uncomment and specify #tasks to use\n##SBATCH --ntasks-per-node=44 # Uncomment and specify #tasks per node\n##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task\n</code></pre>"},{"location":"merlin6/slurm-general-docs/running-jobs/#multithreaded-jobs-template","title":"Multithreaded jobs template","text":"<p>The following template should be used by any user submitting jobs to the Merlin6 CPU nodes:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --cluster=merlin6 # Cluster name\n#SBATCH --partition=general,daily,hourly # Specify one or multiple partitions\n#SBATCH --time=<D-HH:MM:SS> # Strongly recommended\n#SBATCH --output=<output_file> # Generate custom output file\n#SBATCH --error=<error_file> # Generate custom error file\n#SBATCH --hint=multithread # Mandatory for multithreaded jobs\n##SBATCH --exclusive # Uncomment if you need exclusive node usage\n##SBATCH --ntasks-per-core=2 # Only mandatory for multithreaded single tasks\n\n## Advanced options example\n##SBATCH --nodes=1 # Uncomment and specify #nodes to use\n##SBATCH --ntasks=88 # Uncomment and specify #tasks to use\n##SBATCH --ntasks-per-node=88 # Uncomment and specify #tasks per node\n##SBATCH --cpus-per-task=88 # Uncomment and specify the number of cores per task\n</code></pre>"},{"location":"merlin6/slurm-general-docs/running-jobs/#gpu-based-jobs-templates","title":"GPU-based jobs templates","text":"<p>The following template should be used by any user submitting jobs to GPU nodes:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --cluster=gmerlin6 # Cluster name\n#SBATCH --partition=gpu,gpu-short # Specify one or multiple partitions, or\n#SBATCH --partition=gwendolen,gwendolen-long # Only for Gwendolen users\n#SBATCH --gpus=\"<type>:<num_gpus>\" # <type> is optional, <num_gpus> is mandatory\n#SBATCH --time=<D-HH:MM:SS> # Strongly recommended\n#SBATCH --output=<output_file> # Generate custom output file\n#SBATCH --error=<error_file> # Generate custom error file\n##SBATCH --exclusive # Uncomment if you need exclusive node usage\n\n## Advanced options example\n##SBATCH --nodes=1 # Uncomment and specify number of 
nodes to use\n##SBATCH --ntasks=1 # Uncomment and specify number of nodes to use\n##SBATCH --cpus-per-gpu=5 # Uncomment and specify the number of cores per task\n##SBATCH --mem-per-gpu=16000 # Uncomment and specify the number of cores per task\n##SBATCH --gpus-per-node=<type>:2 # Uncomment and specify the number of GPUs per node\n##SBATCH --gpus-per-socket=<type>:2 # Uncomment and specify the number of GPUs per socket\n##SBATCH --gpus-per-task=<type>:1 # Uncomment and specify the number of GPUs per task\n</code></pre>"},{"location":"merlin6/slurm-general-docs/running-jobs/#advanced-configurations","title":"Advanced configurations","text":""},{"location":"merlin6/slurm-general-docs/running-jobs/#array-jobs-launching-a-large-number-of-related-jobs","title":"Array Jobs: launching a large number of related jobs","text":"<p>If you need to run a large number of jobs based on the same executable with systematically varying inputs, e.g. for a parameter sweep, you can do this most easily in form of a simple array job.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=test-array\n#SBATCH --partition=daily\n#SBATCH --ntasks=1\n#SBATCH --time=08:00:00\n#SBATCH --array=1-8\n\necho $(date) \"I am job number ${SLURM_ARRAY_TASK_ID}\"\nsrun myprogram config-file-${SLURM_ARRAY_TASK_ID}.dat\n</code></pre> <p>This will run 8 independent jobs, where each job can use the counter variable <code>SLURM_ARRAY_TASK_ID</code> defined by Slurm inside of the job's environment to feed the correct input arguments or configuration file to the \"myprogram\" executable. Each job will receive the same set of configurations (e.g. time limit of 8h in the example above).</p> <p>The jobs are independent, but they will run in parallel (if the cluster resources allow for it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they also will each have their own output file.</p> <p>Note</p> <ul> <li>Do not use such jobs if you have very short tasks, since each array sub job will incur the full overhead for launching an independent Slurm job. For such cases you should used a packed job (see below).</li> <li>If you want to control how many of these jobs can run in parallel, you can use the <code>#SBATCH --array=1-100%5</code> syntax. 
The <code>%5</code> will define that only 5 sub jobs may ever run in parallel.</li> </ul> <p>You also can use an array job approach to run over all files in a directory, substituting the payload with</p> Bash<pre><code>FILES=(/path/to/data/*)\nsrun ./myprogram ${FILES[$SLURM_ARRAY_TASK_ID]}\n</code></pre> <p>Or for a trivial case you could supply the values for a parameter scan in form of a argument list that gets fed to the program using the counter variable.</p> Bash<pre><code>ARGS=(0.05 0.25 0.5 1 2 5 100)\nsrun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}\n</code></pre>"},{"location":"merlin6/slurm-general-docs/running-jobs/#array-jobs-running-very-long-tasks-with-checkpoint-files","title":"Array jobs: running very long tasks with checkpoint files","text":"<p>If you need to run a job for much longer than the queues (partitions) permit, and your executable is able to create checkpoint files, you can use this strategy:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=test-checkpoint\n#SBATCH --partition=general\n#SBATCH --ntasks=1\n#SBATCH --time=7-00:00:00 # each job can run for 7 days\n#SBATCH --cpus-per-task=1\n#SBATCH --array=1-10%1 # Run a 10-job array, one job at a time.\nif test -e checkpointfile; then\n # There is a checkpoint file;\n myprogram --read-checkp checkpointfile\nelse\n # There is no checkpoint file, start a new simulation.\n myprogram\nfi\n</code></pre> <p>The <code>%1</code> in the <code>#SBATCH --array=1-10%1</code> statement defines that only 1 subjob can ever run in parallel, so this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file if it is present.</p>"},{"location":"merlin6/slurm-general-docs/running-jobs/#packed-jobs-running-a-large-number-of-short-tasks","title":"Packed jobs: running a large number of short tasks","text":"<p>Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate Slurm job. Use job packing, i.e. you run the short tasks within the loop of a single Slurm job.</p> <p>You can launch the short tasks using <code>srun</code> with the <code>--exclusive</code> switch (not to be confused with the switch of the same name used in the SBATCH commands). This switch will ensure that only a specified number of tasks can run in parallel.</p> <p>As an example, the following job submission script will ask Slurm for 44 cores (threads), then it will run the =myprog= program 1000 times with arguments passed from 1 to 1000. But with the =-N1 -n1 -c1 --exclusive= option, it will control that at any point in time only 44 instances are effectively running, each being allocated one CPU. You can at this point decide to allocate several CPUs or tasks by adapting the corresponding parameters.</p> Bash<pre><code>#! /bin/bash\n#SBATCH --job-name=test-checkpoint\n#SBATCH --partition=general\n#SBATCH --ntasks=1\n#SBATCH --time=7-00:00:00\n#SBATCH --ntasks=44 # defines the number of parallel tasks\nfor i in {1..1000}\ndo\n srun -N1 -n1 -c1 --exclusive ./myprog $i &\ndone\nwait\n</code></pre> <p>Note</p> <p>The <code>&</code> at the end of the <code>srun</code> line is needed to not have the script waiting (blocking). 
The <code>wait</code> command waits for all such background tasks to finish and returns the exit code.</p>"},{"location":"merlin6/slurm-general-docs/slurm-basic-commands/","title":"Slurm Basic Commands","text":""},{"location":"merlin6/slurm-general-docs/slurm-basic-commands/#slurm-basic-commands","title":"Slurm Basic Commands","text":"<p>In this document some basic commands for using Slurm are showed. Advanced examples for some of these are explained in other Merlin6 Slurm pages. You can always use <code>man <command></code> pages for more information about options and examples.</p>"},{"location":"merlin6/slurm-general-docs/slurm-basic-commands/#basic-commands","title":"Basic commands","text":"<p>Useful commands for the slurm:</p> Bash<pre><code>sinfo # to see the name of nodes, their occupancy,\n # name of slurm partitions, limits (try out with \"-l\" option)\nsqueue # to see the currently running/waiting jobs in slurm\n # (additional \"-l\" option may also be useful)\nsbatch Script.sh # to submit a script (example below) to the slurm.\nsrun <command> # to submit a command to Slurm. Same options as in 'sbatch' can be used.\nsalloc # to allocate computing nodes. Use for interactive runs.\nscancel job_id # to cancel slurm job, job id is the numeric id, seen by the squeue.\nsview # X interface for managing jobs and track job run information.\nseff # Calculates the efficiency of a job\nsjstat # List attributes of jobs under the SLURM control\nsacct # Show job accounting, useful for checking details of finished jobs.\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-basic-commands/#advanced-basic-commands","title":"Advanced basic commands","text":"Bash<pre><code>sinfo -N -l # list nodes, state, resources (#CPUs, memory per node, ...), etc.\nsshare -a # to list shares of associations to a cluster\nsprio -l # to view the factors that comprise a job's scheduling priority\n # add '-u <username>' for filtering user\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-examples/","title":"Slurm Examples","text":""},{"location":"merlin6/slurm-general-docs/slurm-examples/#slurm-examples","title":"Slurm Examples","text":""},{"location":"merlin6/slurm-general-docs/slurm-examples/#single-core-based-job-examples","title":"Single core based job examples","text":""},{"location":"merlin6/slurm-general-docs/slurm-examples/#example-1-hyperthreaded-job","title":"Example 1: Hyperthreaded job","text":"<p>In this example we want to use hyperthreading (<code>--ntasks-per-core=2</code> and <code>--hint=multithread</code>). 
In our Merlin6 configuration, the default memory per CPU (a CPU is equivalent to a core thread) is 4000MB, hence each task can use up 8000MB (2 threads x 4000MB).</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Using 'hourly' will grant higher priority\n#SBATCH --ntasks-per-core=2 # Request the max ntasks be invoked on each core\n#SBATCH --hint=multithread # Use extra threads with in-core multi-threading\n#SBATCH --time=00:30:00 # Define max time job will run\n#SBATCH --output=myscript.out # Define your output file\n#SBATCH --error=myscript.err # Define your error file\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#example-2-non-hyperthreaded-job","title":"Example 2: Non-hyperthreaded job","text":"<p>In this example we do not want hyper-threading (<code>--ntasks-per-core=1</code> and <code>--hint=nomultithread</code>). In our Merlin6 configuration, the default memory per cpu (a CPU is equivalent to a core thread) is 4000MB. If we do not specify anything else, our single core task will use a default of 4000MB. However, one could double it with <code>--mem-per-cpu=8000</code> if you require more memory (remember, the second thread will not be used so we can safely assign +4000MB to the unique active thread).</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Using 'hourly' will grant higher priority\n#SBATCH --ntasks-per-core=1 # Request the max ntasks be invoked on each core\n#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading\n#SBATCH --time=00:30:00 # Define max time job will run\n#SBATCH --output=myscript.out # Define your output file\n#SBATCH --error=myscript.err # Define your error file\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#multi-core-based-job-examples","title":"Multi core based job examples","text":""},{"location":"merlin6/slurm-general-docs/slurm-examples/#example-1-mpi-with-hyper-threading","title":"Example 1: MPI with Hyper-Threading","text":"<p>In this example we run a job that will run 88 tasks. Merlin6 Apollo nodes have 44 cores each one with hyper-threading enabled. This means that we can run 2 threads per core, in total 88 threads. 
To accomplish that, users should specify <code>--ntasks-per-core=2</code> and <code>--hint=multithread</code>.</p> <p>Use <code>--nodes=1</code> if you want to use a node exclusively (88 hyperthreaded tasks would fit in a Merlin6 node).</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Using 'hourly' will grant higher priority\n#SBATCH --ntasks=88 # Job will run 88 tasks\n#SBATCH --ntasks-per-core=2 # Request the max ntasks be invoked on each core\n#SBATCH --hint=multithread # Use extra threads with in-core multi-threading\n#SBATCH --time=00:30:00 # Define max time job will run\n#SBATCH --output=myscript.out # Define your output file\n#SBATCH --error=myscript.err # Define your error file\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#example-2-mpi-without-hyper-threading","title":"Example 2: MPI without Hyper-Threading","text":"<p>In this example, we want to run a job that will run 44 tasks, and due to performance reasons we want to disable hyper-threading. Merlin6 Apollo nodes have 44 cores, each one with hyper-threading enabled. For ensuring that only 1 thread will be used per task, users should specify <code>--ntasks-per-core=1</code> and <code>--hint=nomultithread</code>. With this configuration, we tell Slurm to run only 1 tasks per core and no hyperthreading should be used. Hence, each tasks will be assigned to an independent core.</p> <p>Use <code>--nodes=1</code> if you want to use a node exclusively (44 non-hyperthreaded tasks would fit in a Merlin6 node).</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Using 'hourly' will grant higher priority\n#SBATCH --ntasks=44 # Job will run 44 tasks\n#SBATCH --ntasks-per-core=1 # Request the max ntasks be invoked on each core\n#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading\n#SBATCH --time=00:30:00 # Define max time job will run\n#SBATCH --output=myscript.out # Define your output file\n#SBATCH --error=myscript.err # Define your output file\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#example-3-hyperthreaded-hybrid-mpiopenmp-job","title":"Example 3: Hyperthreaded Hybrid MPI/OpenMP job","text":"<p>In this example, we want to run a Hybrid Job using MPI and OpenMP using hyperthreading. In this job, we want to run 4 MPI tasks by using 8 CPUs per task. Each task in our example requires 128GB of memory. Then we specify 16000MB per CPU (8 x 16000MB = 128000MB). Notice that since hyperthreading is enabled, Slurm will use 4 cores per task (with hyperthreading 2 threads -a.k.a. 
Slurm CPUs- fit into a core).</p> Bash<pre><code>#!/bin/bash -l\n#SBATCH --clusters=merlin6\n#SBATCH --job-name=test\n#SBATCH --ntasks=4\n#SBATCH --ntasks-per-socket=1\n#SBATCH --mem-per-cpu=16000\n#SBATCH --cpus-per-task=8\n#SBATCH --partition=hourly\n#SBATCH --time=01:00:00\n#SBATCH --output=srun_%j.out\n#SBATCH --error=srun_%j.err\n#SBATCH --hint=multithread\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre> <p>Memory Limit</p> <p>Also, always consider that <code>--mem-per-cpu</code> x <code>--cpus-per-task</code> can never exceed the maximum amount of memory per node (352000MB).</p>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#example-4-non-hyperthreaded-hybrid-mpiopenmp-job","title":"Example 4: Non-hyperthreaded Hybrid MPI/OpenMP job","text":"<p>In this example, we want to run a Hybrid Job using MPI and OpenMP without hyperthreading. In this job, we want to run 4 MPI tasks by using 8 CPUs per task. Each task in our example requires 128GB of memory. Then we specify 16000MB per CPU (8 x 16000MB = 128000MB). Notice that since hyperthreading is disabled, Slurm will use 8 cores per task (disabling hyperthreading we force the use of only 1 thread -a.k.a. 1 CPU- per core).</p> Bash<pre><code>#!/bin/bash -l\n#SBATCH --clusters=merlin6\n#SBATCH --job-name=test\n#SBATCH --ntasks=4\n#SBATCH --ntasks-per-socket=1\n#SBATCH --mem-per-cpu=16000\n#SBATCH --cpus-per-task=8\n#SBATCH --partition=hourly\n#SBATCH --time=01:00:00\n#SBATCH --output=srun_%j.out\n#SBATCH --error=srun_%j.err\n#SBATCH --hint=nomultithread\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre> <p>Memory Limit</p> <p>Also, always consider that <code>--mem-per-cpu</code> x <code>--cpus-per-task</code> can never exceed the maximum amount of memory per node (352000MB).</p>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#gpu-examples","title":"GPU examples","text":"<p>Using GPUs requires two major changes. First, the cluster needs to be specified to <code>gmerlin6</code>. This should also be added to later commands pertaining to the job, e.g. <code>scancel --cluster=gmerlin6 <jobid></code>. Second, the number of GPUs should be specified using <code>--gpus</code>, <code>--gpus-per-task</code>, or similar parameters. 
Here's an example for a simple test job:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=gpu # Or 'gpu-short' for higher priority but 2-hour limit\n#SBATCH --cluster=gmerlin6 # Required for GPU\n#SBATCH --gpus=2 # Total number of GPUs\n#SBATCH --cpus-per-gpu=5 # Request CPU resources\n#SBATCH --time=1-00:00:00 # Define max time job will run\n#SBATCH --output=myscript.out # Define your output file\n#SBATCH --error=myscript.err # Define your error file\n\nmodule purge\nmodule load cuda # load any needed modules here\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre> <p>Slurm will automatically set the gpu visibility (eg <code>$CUDA_VISIBLE_DEVICES</code>).</p>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#advanced-examples","title":"Advanced examples","text":""},{"location":"merlin6/slurm-general-docs/slurm-examples/#array-jobs-launching-a-large-number-of-related-jobs","title":"Array Jobs: launching a large number of related jobs","text":"<p>If you need to run a large number of jobs based on the same executable with systematically varying inputs, e.g. for a parameter sweep, you can do this most easily in form of a simple array job.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=test-array\n#SBATCH --partition=daily\n#SBATCH --ntasks=1\n#SBATCH --time=08:00:00\n#SBATCH --array=1-8\n\necho $(date) \"I am job number ${SLURM_ARRAY_TASK_ID}\"\nsrun $MYEXEC config-file-${SLURM_ARRAY_TASK_ID}.dat\n</code></pre> <p>This will run 8 independent jobs, where each job can use the counter variable <code>SLURM_ARRAY_TASK_ID</code> defined by Slurm inside of the job's environment to feed the correct input arguments or configuration file to the \"myprogram\" executable. Each job will receive the same set of configurations (e.g. time limit of 8h in the example above).</p> <p>The jobs are independent, but they will run in parallel (if the cluster resources allow for it). The jobs will get JobIDs like {some-number}_0 to {some-number}_7, and they also will each have their own output file.</p> <p>Note</p> <ul> <li>Do not use such jobs if you have very short tasks, since each array sub job will incur the full overhead for launching an independent Slurm job. For such cases you should used a packed job (see below).</li> <li>If you want to control how many of these jobs can run in parallel, you can use the <code>#SBATCH --array=1-100%5</code> syntax. 
The <code>%5</code> will define that only 5 sub jobs may ever run in parallel.</li> </ul> <p>You also can use an array job approach to run over all files in a directory, substituting the payload with</p> Bash<pre><code>FILES=(/path/to/data/*)\nsrun $MYEXEC ${FILES[$SLURM_ARRAY_TASK_ID]}\n</code></pre> <p>Or for a trivial case you could supply the values for a parameter scan in form of a argument list that gets fed to the program using the counter variable.</p> Bash<pre><code>ARGS=(0.05 0.25 0.5 1 2 5 100)\nsrun $MYEXEC ${ARGS[$SLURM_ARRAY_TASK_ID]}\n</code></pre>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#array-jobs-running-very-long-tasks-with-checkpoint-files","title":"Array jobs: running very long tasks with checkpoint files","text":"<p>If you need to run a job for much longer than the queues (partitions) permit, and your executable is able to create checkpoint files, you can use this strategy:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=test-checkpoint\n#SBATCH --partition=general\n#SBATCH --ntasks=1\n#SBATCH --time=7-00:00:00 # each job can run for 7 days\n#SBATCH --cpus-per-task=1\n#SBATCH --array=1-10%1 # Run a 10-job array, one job at a time.\nif test -e checkpointfile; then\n # There is a checkpoint file;\n $MYEXEC --read-checkp checkpointfile\nelse\n # There is no checkpoint file, start a new simulation.\n $MYEXEC\nfi\n</code></pre> <p>The <code>%1</code> in the <code>#SBATCH --array=1-10%1</code> statement defines that only 1 subjob can ever run in parallel, so this will result in subjob n+1 only being started when job n has finished. It will read the checkpoint file if it is present.</p>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#packed-jobs-running-a-large-number-of-short-tasks","title":"Packed jobs: running a large number of short tasks","text":"<p>Since the launching of a Slurm job incurs some overhead, you should not submit each short task as a separate Slurm job. Use job packing, i.e. you run the short tasks within the loop of a single Slurm job.</p> <p>You can launch the short tasks using <code>srun</code> with the <code>--exclusive</code> switch (not to be confused with the switch of the same name used in the SBATCH commands). This switch will ensure that only a specified number of tasks can run in parallel.</p> <p>As an example, the following job submission script will ask Slurm for 44 cores (threads), then it will run the =myprog= program 1000 times with arguments passed from 1 to 1000. But with the =-N1 -n1 -c1 --exclusive= option, it will control that at any point in time only 44 instances are effectively running, each being allocated one CPU. You can at this point decide to allocate several CPUs or tasks by adapting the corresponding parameters.</p> Bash<pre><code>#! /bin/bash\n#SBATCH --job-name=test-checkpoint\n#SBATCH --partition=general\n#SBATCH --ntasks=1\n#SBATCH --time=7-00:00:00\n#SBATCH --ntasks=44 # defines the number of parallel tasks\nfor i in {1..1000}\ndo\n srun -N1 -n1 -c1 --exclusive $MYEXEC $i &\ndone\nwait\n</code></pre> <p>Note</p> <p>The <code>&</code> at the end of the <code>srun</code> line is needed to not have the script waiting (blocking). 
The <code>wait</code> command waits for all such background tasks to finish and returns the exit code.</p>"},{"location":"merlin6/slurm-general-docs/slurm-examples/#hands-on-example","title":"Hands-On Example","text":"<p>Copy-paste the following example in a file called myAdvancedTest.batch):</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=daily # name of slurm partition to submit\n#SBATCH --time=2:00:00 # limit the execution of this job to 2 hours, see sinfo for the max. allowance\n#SBATCH --nodes=2 # number of nodes\n#SBATCH --ntasks=44 # number of tasks\n#SBATCH --ntasks-per-core=1 # Request the max ntasks be invoked on each core\n#SBATCH --hint=nomultithread # Don't use extra threads with in-core multi-threading\n\nmodule load gcc/9.2.0 openmpi/3.1.5-1_merlin6\nmodule list\n\necho \"Example no-MPI:\" ; hostname # will print one hostname per node\necho \"Example MPI:\" ; srun hostname # will print one hostname per ntask\n</code></pre> <p>In the above example are specified the options <code>--nodes=2</code> and <code>--ntasks=44</code>. This means that up 2 nodes are requested, and is expected to run 44 tasks. Hence, 44 cores are needed for running that job. Slurm will try to allocate a maximum of 2 nodes, both together having at least 44 cores. Since our nodes have 44 cores / each, if nodes are empty (no other users have running jobs there), job can land on a single node (it has enough cores to run 44 tasks).</p> <p>If we want to ensure that job is using at least two different nodes (i.e. for boosting CPU frequency, or because the job requires more memory per core) you should specify other options.</p> <p>A good example is <code>--ntasks-per-node=22</code>. This will equally distribute 22 tasks on 2 nodes.</p> Bash<pre><code>#SBATCH --ntasks-per-node=22\n</code></pre> <p>A different example could be by specifying how much memory per core is needed. For instance <code>--mem-per-cpu=32000</code> will reserve ~32000MB per core. Since we have a maximum of 352000MB per Apollo node, Slurm will be only able to allocate 11 cores (32000MB x 11cores = 352000MB) per node. It means that 4 nodes will be needed (max 11 tasks per node due to memory definition, and we need to run 44 tasks), in this case we need to change <code>--nodes=4</code> (or remove <code>--nodes</code>). Alternatively, we can decrease <code>--mem-per-cpu</code> to a lower value which can allow the use of at least 44 cores per node (i.e. with <code>16000</code> should be able to use 2 nodes)</p> Bash<pre><code>#SBATCH --mem-per-cpu=16000\n</code></pre> <p>Finally, in order to ensure exclusivity of the node, an option --exclusive can be used (see below). This will ensure that the requested nodes are exclusive for the job (no other users jobs will interact with this node, and only completely free nodes will be allocated).</p> Bash<pre><code>#SBATCH --exclusive\n</code></pre> <p>This can be combined with the previous examples.</p> <p>More advanced configurations can be defined and can be combined with the previous examples. More information about advanced options can be found in the following link: https://slurm.schedmd.com/sbatch.html (or run <code>man sbatch</code>).</p> <p>If you have questions about how to properly execute your jobs, please contact us through merlin-admins@lists.psi.ch. 
Do not run advanced configurations unless your are sure of what you are doing.</p>"},{"location":"merlin6/software-support/ansys-cfx/","title":"ANSYS - CFX","text":""},{"location":"merlin6/software-support/ansys-cfx/#ansys-cfx","title":"ANSYS - CFX","text":"<p>Is always recommended to check which parameters are available in CFX and adapt the below examples according to your needs. For that, run <code>cfx5solve -help</code> for getting a list of options.</p>"},{"location":"merlin6/software-support/ansys-cfx/#running-cfx-jobs","title":"Running CFX jobs","text":""},{"location":"merlin6/software-support/ansys-cfx/#pmodules","title":"PModules","text":"<p>Is strongly recommended the use of the latest ANSYS software available in PModules.</p> Bash<pre><code>module use unstable\nmodule load Pmodules/1.1.6\nmodule use overlay_merlin\nmodule load ANSYS/2022R1\n</code></pre>"},{"location":"merlin6/software-support/ansys-cfx/#interactive-rsm-from-remote-psi-workstations","title":"Interactive: RSM from remote PSI Workstations","text":"<p>Is possible to run CFX through RSM from remote PSI (Linux or Windows) Workstation having a local installation of ANSYS CFX and RSM client. For that, please refer to the ANSYS RSM in the Merlin documentation for further information of how to setup a RSM client for submitting jobs to Merlin.</p>"},{"location":"merlin6/software-support/ansys-cfx/#non-interactive-sbatch","title":"Non-interactive: sbatch","text":"<p>Running jobs with <code>sbatch</code> is always the recommended method. This makes the use of the resources more efficient. Notice that for running non interactive Mechanical APDL jobs one must specify the <code>-batch</code> option.</p>"},{"location":"merlin6/software-support/ansys-cfx/#serial-example","title":"Serial example","text":"<p>This example shows a very basic serial job.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=CFX # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. Must match with 'partition' limits.\n#SBATCH --cpus-per-task=1 # Double if hyperthreading enabled\n#SBATCH --ntasks-per-core=1 # Double if hyperthreading enabled\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define your error file\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nSOLVER_FILE=/data/user/caubet_m/CFX5/mysolver.in\ncfx5solve -batch -def \"$JOURNAL_FILE\"\n</code></pre> <p>One can enable hypertheading by defining <code>--hint=multithread</code>, <code>--cpus-per-task=2</code> and <code>--ntasks-per-core=2</code>. However, this is in general not recommended, unless one can ensure that can be beneficial.</p>"},{"location":"merlin6/software-support/ansys-cfx/#mpi-based-example","title":"MPI-based example","text":"<p>An example for running CFX using a Slurm batch script is the following:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=CFX # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. 
Must match with 'partition' limits.\n#SBATCH --nodes=1 # Number of nodes\n#SBATCH --ntasks=44 # Number of tasks\n#SBATCH --cpus-per-task=1 # Double if hyperthreading enabled\n#SBATCH --ntasks-per-core=1 # Double if hyperthreading enabled\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define a file for standard error messages\n##SBATCH --exclusive # Uncomment if you want exclusive usage of the nodes\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nexport HOSTLIST=$(scontrol show hostname | tr '\\n' ',' | sed 's/,$//g')\n\nJOURNAL_FILE=myjournal.in\n\n# INTELMPI=no for IBM MPI\n# INTELMPI=yes for INTEL MPI\nINTELMPI=no\n\nif [ \"$INTELMPI\" == \"yes\" ]\nthen\n export I_MPI_DEBUG=4\n export I_MPI_PIN_CELL=core\n\n # Simple example: cfx5solve -batch -def \"$JOURNAL_FILE\" -par-dist \"$HOSTLIST\" \\\n # -part $SLURM_NTASKS \\\n # -start-method 'Intel MPI Distributed Parallel'\n cfx5solve -batch -part-large -double -verbose -def \"$JOURNAL_FILE\" -par-dist \"$HOSTLIST\" \\\n -part $SLURM_NTASKS -par-local -start-method 'Intel MPI Distributed Parallel'\nelse\n # Simple example: cfx5solve -batch -def \"$JOURNAL_FILE\" -par-dist \"$HOSTLIST\" \\\n # -part $SLURM_NTASKS \\\n # -start-method 'IBM MPI Distributed Parallel'\n cfx5solve -batch -part-large -double -verbose -def \"$JOURNAL_FILE\" -par-dist \"$HOSTLIST\" \\\n -part $SLURM_NTASKS -par-local -start-method 'IBM MPI Distributed Parallel'\nfi\n</code></pre> <p>In the above example, one can increase the number of nodes and/or ntasks if needed and combine it with <code>--exclusive</code> whenever needed. In general, no hypertheading is recommended for MPI based jobs.</p> <p>Also, one can combine it with <code>--exclusive</code> when necessary. Finally, one can change the MPI technology in <code>-start-method</code> (check CFX documentation for possible values).</p>"},{"location":"merlin6/software-support/ansys-cfx/#cfx5-launcher-cfd-prepost-solve-manager-turbogrid","title":"CFX5 Launcher: CFD-Pre/Post, Solve Manager, TurboGrid","text":"<p>Some users might need to visualize or change some parameters when running calculations with the CFX Solver. For running TurboGrid, CFX-Pre, CFX-Solver Manager or CFD-Post one should run it with the <code>cfx5</code> launcher binary:</p> Bash<pre><code>cfx5\n</code></pre> <p></p> <p>Then, from the launcher, one can open the proper application (i.e. CFX-Solver Manager for visualizing and modifying an existing job run)</p> <p>For running CFX5 Launcher, is required a proper SSH + X11 Forwarding access (<code>ssh -XY</code>) or preferrible NoMachine. If ssh does not work for you, please use NoMachine instead (which is the supported X based access, and simpler).</p>"},{"location":"merlin6/software-support/ansys-fluent/","title":"ANSYS - Fluent","text":""},{"location":"merlin6/software-support/ansys-fluent/#ansys-fluent","title":"ANSYS - Fluent","text":"<p>Is always recommended to check which parameters are available in Fluent and adapt the below example according to your needs. For that, run <code>fluent -help</code> for getting a list of options. 
However, as when running Fluent one must specify one of the following flags:</p> <ul> <li>2d: This is a 2D solver with single point precision.</li> <li>3d: This is a 3D solver with single point precision.</li> <li>2dpp: This is a 2D solver with double point precision.</li> <li>3dpp: This is a 3D solver with double point precision.</li> </ul>"},{"location":"merlin6/software-support/ansys-fluent/#running-fluent-jobs","title":"Running Fluent jobs","text":""},{"location":"merlin6/software-support/ansys-fluent/#pmodules","title":"PModules","text":"<p>Is strongly recommended the use of the latest ANSYS software available in PModules.</p> Bash<pre><code>module use unstable\nmodule load Pmodules/1.1.6\nmodule use overlay_merlin\nmodule load ANSYS/2022R1\n</code></pre>"},{"location":"merlin6/software-support/ansys-fluent/#interactive-rsm-from-remote-psi-workstations","title":"Interactive: RSM from remote PSI Workstations","text":"<p>Is possible to run Fluent through RSM from remote PSI (Linux or Windows) Workstation having a local installation of ANSYS Fluent and RSM client. For that, please refer to the ANSYS RSM in the Merlin documentation for further information of how to setup a RSM client for submitting jobs to Merlin.</p>"},{"location":"merlin6/software-support/ansys-fluent/#non-interactive-sbatch","title":"Non-interactive: sbatch","text":"<p>Running jobs with <code>sbatch</code> is always the recommended method. This makes the use of the resources more efficient. For running it as a job, one needs to run in no graphical mode (<code>-g</code> option).</p>"},{"location":"merlin6/software-support/ansys-fluent/#serial-example","title":"Serial example","text":"<p>This example shows a very basic serial job.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=Fluent # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. Must match with 'partition' limits.\n#SBATCH --cpus-per-task=1 # Double if hyperthreading enabled\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define your error file\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nJOURNAL_FILE=/data/user/caubet_m/Fluent/myjournal.in\nfluent 3ddp -g -i ${JOURNAL_FILE}\n</code></pre> <p>One can enable hypertheading by defining <code>--hint=multithread</code>, <code>--cpus-per-task=2</code> and <code>--ntasks-per-core=2</code>. However, this is in general not recommended, unless one can ensure that can be beneficial.</p>"},{"location":"merlin6/software-support/ansys-fluent/#mpi-based-example","title":"MPI-based example","text":"<p>An example for running Fluent using a Slurm batch script is the following:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=Fluent # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. 
Must match with 'partition' limits.\n#SBATCH --nodes=1 # Number of nodes\n#SBATCH --ntasks=44 # Number of tasks\n#SBATCH --cpus-per-task=1 # Double if hyperthreading enabled\n#SBATCH --ntasks-per-core=1 # Run one task per core\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define a file for standard error messages\n##SBATCH --exclusive # Uncomment if you want exclusive usage of the nodes\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nJOURNAL_FILE=/data/user/caubet_m/Fluent/myjournal.in\nfluent 3ddp -g -t ${SLURM_NTASKS} -i ${JOURNAL_FILE}\n</code></pre> <p>In the above example, one can increase the number of nodes and/or ntasks if needed. One can remove <code>--nodes</code> for running on multiple nodes, but may lead to communication overhead. In general, no hyperthreading is recommended for MPI based jobs. Also, one can combine it with <code>--exclusive</code> when necessary.</p>"},{"location":"merlin6/software-support/ansys-fluent/#interactive-salloc","title":"Interactive: salloc","text":"<p>Running Fluent interactively is strongly not recommended and one should whenever possible use <code>sbatch</code>. However, sometimes interactive runs are needed. For jobs requiring only few CPUs (in example, 2 CPUs) and for a short period of time, one can use the login nodes. Otherwise, one must use the Slurm batch system using allocations:</p> <ul> <li>For short jobs requiring more CPUs, one can use the Merlin shortest partitions (<code>hourly</code>).</li> <li>For longer jobs, one can use longer partitions, however, interactive access is not always possible (depending on the usage of the cluster).</li> </ul> <p>Please refer to the documentation Running Interactive Jobs for firther information about different ways for running interactive jobs in the Merlin6 cluster.</p>"},{"location":"merlin6/software-support/ansys-fluent/#requirements","title":"Requirements","text":""},{"location":"merlin6/software-support/ansys-fluent/#ssh-keys","title":"SSH Keys","text":"<p>Running Fluent interactively requires the use of SSH Keys. This is the way of communication between the GUI and the different nodes. For doing that, one must have a passphrase protected SSH Key. If the user does not have SSH Keys yet (simply run <code>ls $HOME/.ssh/</code> to check whether <code>id_rsa</code> files exist or not). For deploying SSH Keys for running Fluent interactively, one should follow this documentation: Configuring SSH Keys</p>"},{"location":"merlin6/software-support/ansys-fluent/#list-of-hosts","title":"List of hosts","text":"<p>For running Fluent using Slurm computing nodes, one needs to get the list of the reserved nodes. For getting that list, once you have the allocation, one can run the following command:</p> Bash<pre><code>scontrol show hostname\n</code></pre> <p>This list must be included in the settings as the list of hosts where to run Fluent. 
Alternatively, one can give that list as parameter (<code>-cnf</code> option) when running <code>fluent</code>, as follows:</p> Running Fluent with 'salloc' Bash Session<pre><code>$ salloc --nodes=2 --ntasks=88 --hint=nomultithread --time=0-01:00:00 --partition=test $SHELL\nsalloc: Pending job allocation 135030174\nsalloc: job 135030174 queued and waiting for resources\nsalloc: job 135030174 has been allocated resources\nsalloc: Granted job allocation 135030174\n\n$ module use unstable\n$ module load ANSYS/2020R1-1\nmodule load: unstable module has been loaded -- ANSYS/2020R1-1\n\n$ fluent 3ddp -t$SLURM_NPROCS -cnf=$(scontrol show hostname | tr '\\n' ',')\n\n$ exit\nexit\nsalloc: Relinquishing job allocation 135030174\nsalloc: Job allocation 135030174 has been revoked.\n</code></pre>"},{"location":"merlin6/software-support/ansys-hfss/","title":"ANSYS HFSS (ElectroMagnetics)","text":""},{"location":"merlin6/software-support/ansys-hfss/#ansys-hfss-electromagnetics","title":"ANSYS HFSS (ElectroMagnetics)","text":"<p>This recipe is intended to show how to run ANSYS HFSS (ElectroMagnetics) in Slurm. Having in mind that in general, running ANSYS HFSS means running ANSYS Electronics Desktop.</p>"},{"location":"merlin6/software-support/ansys-hfss/#running-hfss-electromagnetics-jobs","title":"Running HFSS / Electromagnetics jobs","text":""},{"location":"merlin6/software-support/ansys-hfss/#pmodules","title":"PModules","text":"<p>Is necessary to run at least ANSYS software ANSYS/2022R1, which is available in PModules:</p> Bash<pre><code>module use unstable\nmodule load Pmodules/1.1.6\nmodule use overlay_merlin\nmodule load ANSYS/2022R1\n</code></pre>"},{"location":"merlin6/software-support/ansys-hfss/#remote-job-submission-hfss-rsm-and-slurm","title":"Remote job submission: HFSS RSM and SLURM","text":"<p>Running jobs through Remote RSM or Slurm is the recommended way for running ANSYS HFSS.</p> <ul> <li>HFSS RSM can be used from ANSYS HFSS installations running on Windows workstations at PSI (as long as are in the internal PSI network).</li> <li>Slurm can be used when submitting directly from a Merlin login node (i.e. <code>sbatch</code> command or interactively from ANSYS Electronics Desktop)</li> </ul>"},{"location":"merlin6/software-support/ansys-hfss/#hfss-rsm-from-remote-workstations","title":"HFSS RSM (from remote workstations)","text":"<p>Running jobs through Remote RSM is the way for running ANSYS HFSS when submitting from an ANSYS HFSS installation on a PSI Windows workstation. A HFSS RSM service is running on each Merlin login node, and the listening port depends on the ANSYS EM version. Current support ANSYS EM RSM release and associated listening ports are the following:</p> ANSYS version Login nodes Listening port 2022R1 merlin-l-001 merlin-l-001 merlin-l-001 32958 2022R2 merlin-l-001 merlin-l-001 merlin-l-001 32959 2023R2 merlin-l-001 merlin-l-001 merlin-l-001 32960 <p>Notice that by default ANSYS EM is listening on port <code>32958</code>, this is the default for ANSYS/2022R1 only.</p> <ul> <li>Workstations connecting to the Merlin ANSYS EM service must ensure that Electronics Desktop is connecting to the proper port.</li> <li>In the same way, the ANSYS Workstation version must be the same as the version running on Merlin.</li> </ul> <p>Notice that HFSS RSM is not the same RSM provided for other ANSYS products. 
Therefore, the configuration is different from ANSYS RSM.</p> <p>To setup HFSS RSM for using it with the Merlin cluster, it must be done from the following ANSYS Electronics Desktop menu:</p> <ol> <li>[Tools]->[Job Management]->[Select Scheduler]:</li> </ol> <p></p> <ol> <li>In the new [Select scheduler] window, setup the following settings and Refresh:</li> </ol> <p></p> Text Only<pre><code>* **Select Scheduler**: `Remote RSM`.\n* **Server**: Add a Merlin login node.\n* **User name**: Add your Merlin username.\n* **Password**: Add you Merlin username password.\n</code></pre> <p>Once refreshed, the Scheduler info box must provide Slurm information of the server (see above picture). If the box contains that information, then you can save changes (<code>OK</code> button).</p> <ol> <li>[Tools]->[Job Management]->[Submit Job...]:</li> </ol> <p></p> <ol> <li>In the new [Submite Job] window, you must specify the location of the ANSYS Electronics Desktop binary:</li> </ol> <p></p> Text Only<pre><code>* In example, for **ANSYS/2022R1**, the location is `/data/software/pmodules/Tools/ANSYS/2021R1/v211/AnsysEM21.1/Linux64/ansysedt.exe`.\n</code></pre>"},{"location":"merlin6/software-support/ansys-hfss/#hfss-slurm-from-login-node-only","title":"HFSS Slurm (from login node only)","text":"<p>Running jobs through Slurm from ANSYS Electronics Desktop is the way for running ANSYS HFSS when submitting from an ANSYS HFSS installation in a Merlin login node. ANSYS Electronics Desktop usually needs to be run from the Merlin NoMachine service, which currently runs on:</p> <ul> <li><code>merlin-l-001.psi.ch</code></li> <li><code>merlin-l-002.psi.ch</code></li> </ul> <p>Since the Slurm client is present in the login node (where ANSYS Electronics Desktop is running), the application will be able to detect and to submit directly to Slurm. Therefore, we only have to configure ANSYS Electronics Desktop to submit to Slurm. This can set as follows:</p> <ol> <li>[Tools]->[Job Management]->[Select Scheduler]:</li> </ol> <p></p> <ol> <li>In the new [Select scheduler] window, setup the following settings and Refresh:</li> </ol> <p></p> Text Only<pre><code>* **Select Scheduler**: `Slurm`.\n* **Server**: must point to `localhost`.\n* **User name**: must be empty.\n* **Password**: must be empty.\n</code></pre> <p>The Server, User name and Password boxes can't be modified, but if value do not match with the above settings, they should be changed by selecting another Scheduler which allows editig these boxes (i.e. RSM Remote).</p> <p>Once refreshed, the Scheduler info box must provide Slurm information of the server (see above picture). If the box contains that information, then you can save changes (<code>OK</code> button).</p>"},{"location":"merlin6/software-support/ansys-mapdl/","title":"ANSYS - MAPDL","text":""},{"location":"merlin6/software-support/ansys-mapdl/#ansys-mapdl","title":"ANSYS - MAPDL","text":""},{"location":"merlin6/software-support/ansys-mapdl/#ansys-mechanical-apdl","title":"ANSYS - Mechanical APDL","text":"<p>Is always recommended to check which parameters are available in Mechanical APDL and adapt the below examples according to your needs. 
For that, please refer to the official Mechanical APDL documentation.</p>"},{"location":"merlin6/software-support/ansys-mapdl/#running-mechanical-apdl-jobs","title":"Running Mechanical APDL jobs","text":""},{"location":"merlin6/software-support/ansys-mapdl/#pmodules","title":"PModules","text":"<p>Is strongly recommended the use of the latest ANSYS software available in PModules.</p> Bash<pre><code>module use unstable\nmodule load Pmodules/1.1.6\nmodule use overlay_merlin\nmodule load ANSYS/2022R1\n</code></pre>"},{"location":"merlin6/software-support/ansys-mapdl/#interactive-rsm-from-remote-psi-workstations","title":"Interactive: RSM from remote PSI Workstations","text":"<p>Is possible to run Mechanical through RSM from remote PSI (Linux or Windows) Workstation having a local installation of ANSYS Mechanical and RSM client. For that, please refer to the ANSYS RSM in the Merlin documentation for further information of how to setup a RSM client for submitting jobs to Merlin.</p>"},{"location":"merlin6/software-support/ansys-mapdl/#non-interactive-sbatch","title":"Non-interactive: sbatch","text":"<p>Running jobs with <code>sbatch</code> is always the recommended method. This makes the use of the resources more efficient. Notice that for running non interactive Mechanical APDL jobs one must specify the <code>-b</code> option.</p>"},{"location":"merlin6/software-support/ansys-mapdl/#serial-example","title":"Serial example","text":"<p>This example shows a very basic serial job.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=MAPDL # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. Must match with 'partition' limits.\n#SBATCH --cpus-per-task=1 # Double if hyperthreading enabled\n#SBATCH --ntasks-per-core=1 # Double if hyperthreading enabled\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define your error file\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nSOLVER_FILE=/data/user/caubet_m/MAPDL/mysolver.in\nmapdl -b -i \"$SOLVER_FILE\"\n</code></pre> <p>One can enable hypertheading by defining <code>--hint=multithread</code>, <code>--cpus-per-task=2</code> and <code>--ntasks-per-core=2</code>. However, this is in general not recommended, unless one can ensure that can be beneficial.</p>"},{"location":"merlin6/software-support/ansys-mapdl/#smp-based-example","title":"SMP-based example","text":"<p>This example shows how to running Mechanical APDL in Shared-Memory Parallelism mode. It limits the use to 1 single node, but by using many cores. In the example below, we use a node by using all his cores and the whole memory.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=MAPDL # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. 
Must match with 'partition' limits.\n#SBATCH --nodes=1 # Number of nodes\n#SBATCH --ntasks=1 # Number of tasks\n#SBATCH --cpus-per-task=44 # Double if hyperthreading enabled\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define a file for standard error messages\n#SBATCH --exclusive # Uncomment if you want exclusive usage of the nodes\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nSOLVER_FILE=/data/user/caubet_m/MAPDL/mysolver.in\nmapdl -b -np ${SLURM_CPUS_PER_TASK} -i \"$SOLVER_FILE\"\n</code></pre> <p>In the above example, one can reduce the number of cpus per task. Here usually <code>--exclusive</code> is recommended if one needs to use the whole memory.</p> <p>For SMP runs, one might try the hyperthreading mode by doubling the proper settings (<code>--cpus-per-task</code>), in some cases it might be beneficial.</p> <p>Please notice that <code>--ntasks-per-core=1</code> is not defined here, this is because we want to run 1 task on many cores! As an alternative, one can explore <code>--ntasks-per-socket</code> or <code>--ntasks-per-node</code> for fine grained configurations.</p>"},{"location":"merlin6/software-support/ansys-mapdl/#mpi-based-example","title":"MPI-based example","text":"<p>This example enables Distributed ANSYS for running Mechanical APDL using a Slurm batch script.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=MAPDL # Job Name\n#SBATCH --partition=hourly # Using 'daily' will grant higher priority than 'general'\n#SBATCH --time=0-01:00:00 # Time needed for running the job. Must match with 'partition' limits.\n#SBATCH --nodes=1 # Number of nodes\n#SBATCH --ntasks=44 # Number of tasks\n#SBATCH --cpus-per-task=1 # Double if hyperthreading enabled\n#SBATCH --ntasks-per-core=1 # Run one task per core\n#SBATCH --hint=nomultithread # Disable Hyperthreading\n#SBATCH --error=slurm-%j.err # Define a file for standard error messages\n##SBATCH --exclusive # Uncomment if you want exclusive usage of the nodes\n\nmodule use unstable\nmodule load ANSYS/2020R1-1\n\n# [Optional:BEGIN] Specify your license server if this is not 'lic-ansys.psi.ch'\nLICENSE_SERVER=<your_license_server>\nexport ANSYSLMD_LICENSE_FILE=1055@$LICENSE_SERVER\nexport ANSYSLI_SERVERS=2325@$LICENSE_SERVER\n# [Optional:END]\n\nSOLVER_FILE=input.dat\n\n# INTELMPI=no for IBM MPI\n# INTELMPI=yes for INTEL MPI\nINTELMPI=no\n\nif [ \"$INTELMPI\" == \"yes\" ]\nthen\n # When using -mpi=intelmpi, KMP Affinity must be disabled\n export KMP_AFFINITY=disabled\n\n # INTELMPI is not aware about distribution of tasks.\n # - We need to define tasks distribution.\n HOSTLIST=$(srun hostname | sort | uniq -c | awk '{print $2 \":\" $1}' | tr '\\n' ':' | sed 's/:$/\\n/g')\n mapdl -b -dis -mpi intelmpi -machines $HOSTLIST -np ${SLURM_NTASKS} -i \"$SOLVER_FILE\"\nelse\n # IBMMPI (default) will be aware of the distribution of tasks.\n # - In principle, no need to force tasks distribution\n mapdl -b -dis -mpi ibmmpi -np ${SLURM_NTASKS} -i \"$SOLVER_FILE\"\nfi\n</code></pre> <p>In the above example, one can increase the number of nodes and/or ntasks if needed and combine it with <code>--exclusive</code> when necessary. In general, no hypertheading is recommended for MPI based jobs. 
Also, one can combine it with <code>--exclusive</code> when necessary.</p>"},{"location":"merlin6/software-support/ansys-rsm/","title":"ANSYS - RSM","text":""},{"location":"merlin6/software-support/ansys-rsm/#ansys-rsm","title":"ANSYS - RSM","text":""},{"location":"merlin6/software-support/ansys-rsm/#ansys-remote-resolve-manager","title":"ANSYS Remote Resolve Manager","text":"<p>ANSYS Remote Solve Manager (RSM) is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop. Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM.</p> <p>Users are responsible for requesting possible necessary network access and debugging any possible connectivity problem with the cluster. In example, in case that the workstation is behind a firewall, users would need to request a firewall rule to enable access to Merlin.</p> <p>Warning</p> <p>The Merlin6 administrators are not responsible for connectivity problems between users workstations and the Merlin6 cluster.</p>"},{"location":"merlin6/software-support/ansys-rsm/#the-merlin6-rsm-service","title":"The Merlin6 RSM service","text":"<p>A RSM service is running on each login node. This service will listen a specific port and will process any request using RSM (in example, from ANSYS users workstations). The following login nodes are configured with such services:</p> <ul> <li><code>merlin-l-01.psi.ch</code></li> <li><code>merlin-l-001.psi.ch</code></li> <li><code>merlin-l-002.psi.ch</code></li> </ul> <p>Each ANSYS release installed in <code>/data/software/pmodules/ANSYS</code> should have its own RSM service running (the listening port is the default one set by that ANSYS release). With the following command users can check which ANSYS releases have an RSM instance running:</p> Bash<pre><code>systemctl | grep pli-ansys-rsm-v[0-9][0-9][0-9].service\n</code></pre> Listing RSM service running on merlin-l-001.psi.ch Bash Session<pre><code># systemctl | grep pli-ansys-rsm-v[0-9][0-9][0-9].service\n pli-ansys-rsm-v195.service loaded active exited PSI ANSYS RSM v195\n pli-ansys-rsm-v202.service loaded active exited PSI ANSYS RSM v202\n pli-ansys-rsm-v211.service loaded active exited PSI ANSYS RSM v211\n pli-ansys-rsm-v212.service loaded active exited PSI ANSYS RSM v212\n pli-ansys-rsm-v221.service loaded active exited PSI ANSYS RSM v221\n</code></pre>"},{"location":"merlin6/software-support/ansys-rsm/#configuring-rsm-client-on-windows-workstations","title":"Configuring RSM client on Windows workstations","text":"<p>Users can setup ANSYS RSM in their workstations to connect to the Merlin6 cluster. The different steps and settings required to make it work are that following:</p> <ol> <li>Open the RSM Configuration service in Windows for the ANSYS release you want to configure.</li> <li>Right-click the HPC Resources icon followed by Add HPC Resource... </li> <li>In the HPC Resource tab, fill up the corresponding fields as follows: <ul> <li>\"Name\": Add here the preffered name for the cluster. In example: <code>Merlin6 cluster - merlin-l-001</code></li> <li>\"HPC Type\": Select <code>SLURM</code></li> <li>\"Submit host\": Add one of the login nodes. 
For example, <code>merlin-l-001</code>.</li> <li>\"Slurm Job submission arguments (optional)\": Add any required Slurm options for running your jobs.</li> <li>In general, <code>--hint=nomultithread</code> should be at least present.</li> <li>Check \"Use SSH protocol for inter and intra-node communication (Linux only)\"</li> <li>Select \"Able to directly submit and monitor HPC jobs\".</li> <li>\"Apply\" changes.</li> </ul> </li> <li>In the \"File Management\" tab, fill in the corresponding fields as follows: <ul> <li>Select \"RSM internal file transfer mechanism\" and add <code>/shared-scratch</code> as the \"Staging directory path on Cluster\"</li> <li>Select \"Scratch directory local to the execution node(s)\" and add <code>/scratch</code> as the HPC scratch directory.</li> <li>Never check the option \"Keep job files in the staging directory when job is complete\" if the previous option \"Scratch directory local to the execution node(s)\" was set.</li> <li>\"Apply\" changes.</li> </ul> </li> <li>In the \"Queues\" tab, use the left button to auto-discover partitions. <ul> <li>If no authentication method was configured before, an authentication window will appear. Use your PSI account to authenticate. Notice that the <code>PSICH\\</code> prefix must not be added. </li> </ul> </li> <li>From the partition list, select the ones you typically want to use.<ul> <li>In general, standard Merlin users must use <code>hourly</code>, <code>daily</code> and <code>general</code> only.</li> <li>Other partitions are reserved for allowed users only.</li> </ul> </li> <li>\"Apply\" changes. </li> <li>[Optional] You can test each selected partition by clicking its Submit button to submit a test job.</li> </ol> <p>Tip</p> <p>Repeat the process for adding other login nodes if necessary. This will give users the alternative of using another login node in case of maintenance windows.</p>"},{"location":"merlin6/software-support/ansys-rsm/#using-rsm-in-ansys","title":"Using RSM in ANSYS","text":"<p>Using the RSM service in ANSYS is slightly different depending on the ANSYS software being used. Please follow the official ANSYS documentation for details about how to use it for that specific software.</p> <p>Alternatively, please refer to some of the examples shown in the following chapters (ANSYS specific software).</p>"},{"location":"merlin6/software-support/ansys-rsm/#using-rsm-in-ansys-fluent","title":"Using RSM in ANSYS Fluent","text":"<p>For further information about using RSM with Fluent, please visit the ANSYS Fluent section.</p>"},{"location":"merlin6/software-support/ansys-rsm/#using-rsm-in-ansys-cfx","title":"Using RSM in ANSYS CFX","text":"<p>For further information about using RSM with CFX, please visit the ANSYS CFX section.</p>"},{"location":"merlin6/software-support/ansys-rsm/#using-rsm-in-ansys-mapdl","title":"Using RSM in ANSYS MAPDL","text":"<p>For further information about using RSM with MAPDL, please visit the ANSYS MAPDL section.</p>"},{"location":"merlin6/software-support/ansys/","title":"ANSYS","text":""},{"location":"merlin6/software-support/ansys/#ansys","title":"ANSYS","text":"<p>This document describes generic information about how to load and run ANSYS software in the Merlin cluster.</p>"},{"location":"merlin6/software-support/ansys/#ansys-software-in-pmodules","title":"ANSYS software in Pmodules","text":"<p>The ANSYS software can be loaded through PModules.</p> <p>The default ANSYS versions are loaded from the central PModules repository. 
However, there are some known problems that can pop up when using some specific ANSYS packages in advanced mode. Due to this, and also to improve the interactive experience of the user, ANSYS has been also installed in the Merlin high performance storage and we have made it available from Pmodules.</p>"},{"location":"merlin6/software-support/ansys/#loading-merlin6-ansys","title":"Loading Merlin6 ANSYS","text":"<p>For loading the Merlin6 ANSYS software, one needs to run Pmodules v1.1.4 or newer, and then use a specific repository (called <code>overlay_merlin</code>) which is only available from the Merlin cluster:</p> Bash<pre><code>module load Pmodules/1.1.6\nmodule use overlay_merlin\n</code></pre> <p>Once <code>overlay_merlin</code> is invoked, it will disable central ANSYS installations with the same version, which will be replaced by the local ones in Merlin. Releases from the central Pmodules repository which have not a local installation will remain visible. For each ANSYS release, one can identify where it is installed by searching ANSYS in PModules with the <code>--verbose</code> option. This will show the location of the different ANSYS releases as follows:</p> <ul> <li>For ANSYS releases installed in the central repositories, the path starts with <code>/opt/psi</code></li> <li>For ANSYS releases installed in the Merlin6 repository (and/or overwritting the central ones), the path starts with <code>/data/software/pmodules</code></li> </ul> [Example] Loading ANSYS from the Merlin6 PModules repository Bash Session<pre><code># module load Pmodules/1.1.6\nmodule load: unstable module has been loaded -- Pmodules/1.1.6\n\n# module use merlin_overlay\n\n# module search ANSYS --verbose\n\nModule Rel.stage Group Dependencies/Modulefile\n-------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nANSYS/2019R3 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2019R3\nANSYS/2020R1 stable Tools dependencies:\n modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1\nANSYS/2020R1-1 stable Tools dependencies:\n modulefile: /opt/psi/Tools/modulefiles/ANSYS/2020R1-1\nANSYS/2020R2 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2020R2\nANSYS/2021R1 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R1\nANSYS/2021R2 stable Tools dependencies:\n modulefile: /data/software/pmodules/Tools/modulefiles/ANSYS/2021R2\n</code></pre> <p>Tip</p> <p>Please only use Merlin6 ANSYS installations from <code>overlay_merlin</code> in the Merlin cluster.</p>"},{"location":"merlin6/software-support/ansys/#ansys-documentation-by-product","title":"ANSYS Documentation by product","text":""},{"location":"merlin6/software-support/ansys/#ansys-rsm","title":"ANSYS RSM","text":"<p>ANSYS Remote Solve Manager (RSM) is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop.</p> <p>Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM.</p> <p>For further information, please visit the ANSYS RSM section.</p>"},{"location":"merlin6/software-support/ansys/#ansys-fluent","title":"ANSYS Fluent","text":"<p>For further information, please visit the ANSYS Fluent section.</p>"},{"location":"merlin6/software-support/ansys/#ansys-cfx","title":"ANSYS CFX","text":"<p>For further information, please visit the ANSYS CFX 
section.</p>"},{"location":"merlin6/software-support/ansys/#ansys-mapdl","title":"ANSYS MAPDL","text":"<p>For further information, please visit the ANSYS MAPDL section.</p>"},{"location":"merlin6/software-support/gothic/","title":"GOTHIC","text":""},{"location":"merlin6/software-support/gothic/#gothic","title":"GOTHIC","text":""},{"location":"merlin6/software-support/gothic/#installation","title":"Installation","text":"<p>Gothic is locally installed in Merlin in the following directory:</p> Bash<pre><code>/data/project/general/software/gothic\n</code></pre> <p>Multiple versions are available. As of August 22, 2022, the latest installed version is Gothic 8.3 QA.</p> <p>Future releases will be placed in the PSI Modules system, therefore, loading it through PModules will be possible at some point. However, in the meantime one has to use the existing installations present in <code>/data/project/general/software/gothic</code>.</p>"},{"location":"merlin6/software-support/gothic/#running-gothic","title":"Running Gothic","text":""},{"location":"merlin6/software-support/gothic/#general-requirements","title":"General requirements","text":"<p>When running Gothic in interactive or batch mode, one has to consider the following requirements:</p> <ul> <li>Use always one node only: Gothic runs a single instance. Therefore, it can not run on multiple nodes. Adding option <code>--nodes=1-1</code> or <code>-N 1-1</code> is strongly recommended: this will prevent Slurm to allocate multiple nodes if the Slurm allocation definition is ambiguous.</li> <li>Use one task only: Gothic spawns one main process, which then will spawn multiple threads depending on the number of available cores. Therefore, one has to specify 1 task (<code>--ntasks=1</code> or <code>-n 1</code>).</li> <li>Use multiple CPUs: since Gothic will spawn multiple threads, then multiple CPUs can be used. Adding <code>--cpus-per-task=<num_cpus></code> or <code>-c <num_cpus></code> is in general recommended. Notice that <code><num_cpus></code> must never exceed the maximum number of CPUS in a compute node (usually 88).</li> <li>Use multithread: Gothic is an OpenMP based software, therefore, running in hyper-threading mode is strongly recommended. Use the option <code>--hint=multithread</code> for enforcing hyper-threading.</li> <li>[Optional] Memory setup: The default memory per CPU (4000MB) is usually enough for running Gothic. If you require more memory, you can always set the <code>--mem=<mem_in_MB></code> option. This is in general not necessary.</li> </ul>"},{"location":"merlin6/software-support/gothic/#interactive","title":"Interactive","text":"<p>Is not allowed to run CPU intensive interactive jobs in the login nodes. Only applications capable to limit the number of cores are allowed to run for longer time. Also, running in the login nodes is not efficient, since resources are shared with other processes and users.</p> <p>Is possible to submit interactive jobs to the cluster by allocating a full compute node, or even by allocating a few cores only. This will grant dedicated CPUs and resources and in general it will not affect other users.</p> <p>For interactive jobs, is strongly recommended to use the <code>hourly</code> partition, which usually has a good availability of nodes.</p> <p>For longer runs, one should use the <code>daily</code> (or <code>general</code>) partition. 
However, getting interactive access to nodes on these partitions is sometimes more difficult if the cluster is pretty full.</p> <p>To submit an interactive job, consider the following requirements:</p> <ul> <li>X11 forwarding must be enabled: Gothic spawns an interactive window which requires X11 forwarding when using it remotely, therefore using the Slurm option <code>--x11</code> is necessary.</li> <li>Ensure that the scratch area is accessible: For running Gothic, one has to define a scratch area with the <code>GTHTMP</code> environment variable. There are two options:</li> <li> <p>Use local scratch: Each compute node has its own <code>/scratch</code> area. This area is independent to any other node, therefore not visible by other nodes. Using the top directory <code>/scratch</code> for interactive jobs is the simplest way, and it can be defined before or after the allocation creation, as follows:</p> Bash<pre><code># Example 1: Define GTHTMP before the allocation\nexport GTHTMP=/scratch\nsalloc ...\n\n# Example 2: Define GTHTMP after the allocation\nsalloc ...\nexport GTHTMP=/scratch\n</code></pre> <p>Notice that if you want to create a custom sub-directory (i.e. <code>/scratch/$USER</code>, one has to create the sub-directory on every new allocation! In example:</p> Bash<pre><code># Example 1:\nexport GTHTMP=/scratch/$USER\nsalloc ...\nmkdir -p $GTHTMP\n\n# Example 2:\nsalloc ...\nexport GTHTMP=/scratch/$USER\nmkdir -p $GTHTMP\n</code></pre> <p>Creating sub-directories makes the process more complex, therefore using just <code>/scratch</code> is simpler and recommended. 2. Shared scratch: Using shared scratch allows to have a directory visible from all compute nodes and login nodes. Therefore, one can use <code>/shared-scratch</code> to achieve the same as in 1., but creating a sub-directory needs to be done just once.</p> <p>Please, consider that <code>/scratch</code> usually provides better performance and, in addition, will offload the main storage. Therefore, using local scratch is strongly recommended. Use the shared scratch only when strongly necessary.</p> </li> <li> <p>Use the <code>hourly</code> partition: Using the <code>hourly</code> partition is recommended for running interactive jobs (latency is in general lower). However, <code>daily</code> and <code>general</code> are also available if you expect longer runs, but in these cases you should expect longer waiting times.</p> </li> </ul> <p>These requirements are in addition to the requirements previously described in the General requirements section.</p>"},{"location":"merlin6/software-support/gothic/#interactive-allocations-examples","title":"Interactive allocations: examples","text":"<ul> <li>Requesting a full node:</li> </ul> Bash<pre><code>salloc --partition=hourly -N 1 -n 1 -c 88 --hint=multithread --x11 --exclusive --mem=0\n</code></pre> <ul> <li>Requesting 22 CPUs from a node, with default memory per CPU (4000MB/CPU):</li> </ul> Bash<pre><code>num_cpus=22\nsalloc --partition=hourly -N 1 -n 1 -c $num_cpus --hint=multithread --x11\n</code></pre>"},{"location":"merlin6/software-support/gothic/#batch-job","title":"Batch job","text":"<p>The Slurm cluster is mainly used by non interactive batch jobs: Users submit a job, which goes into a queue, and waits until Slurm can assign resources to it. In general, the longer the job, the longer the waiting time, unless there are enough free resources to inmediately start running it.</p> <p>Running Gothic in a Slurm batch script is pretty simple. 
One has to mainly consider the requirements described in the General requirements section, and:</p> <ul> <li>Use local scratch for running batch jobs. In general, defining <code>GTHTMP</code> in a batch script is simpler than on an allocation. If you plan to run multiple jobs in the same node, you can even create a second sub-directory level based on the Slurm Job ID:</li> </ul> Bash<pre><code>mkdir -p /scratch/$USER/$SLURM_JOB_ID\nexport GTHTMP=/scratch/$USER/$SLURM_JOB_ID\n... # Run Gothic here\nrm -rf /scratch/$USER/$SLURM_JOB_ID\n</code></pre> <p>Temporary data generated by the job in <code>GTHTMP</code> must be removed at the end of the job, as showed above.</p>"},{"location":"merlin6/software-support/gothic/#batch-script-examples","title":"Batch script: examples","text":"<ul> <li>Requesting a full node:</li> </ul> Bash<pre><code>#!/bin/bash -l\n#SBATCH --job-name=Gothic\n#SBATCH --time=3-00:00:00\n#SBATCH --partition=general\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=88\n#SBATCH --hint=multithread\n#SBATCH --exclusive\n#SBATCH --mem=0\n#SBATCH --clusters=merlin6\n\nINPUT_FILE='MY_INPUT.SIN'\n\nmkdir -p /scratch/$USER/$SLURM_JOB_ID\nexport GTHTMP=/scratch/$USER/$SLURM_JOB_ID\n\n/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK\ngth_exit_code=$?\n\n# Clean up data in /scratch\nrm -rf /scratch/$USER/$SLURM_JOB_ID\n\n# Return exit code from GOTHIC\nexit $gth_exit_code\n</code></pre> <ul> <li>Requesting 22 CPUs from a node, with default memory per CPU (4000MB/CPU):</li> </ul> Bash<pre><code>#!/bin/bash -l\n#SBATCH --job-name=Gothic\n#SBATCH --time=3-00:00:00\n#SBATCH --partition=general\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=22\n#SBATCH --hint=multithread\n#SBATCH --clusters=merlin6\n\nINPUT_FILE='MY_INPUT.SIN'\n\nmkdir -p /scratch/$USER/$SLURM_JOB_ID\nexport GTHTMP=/scratch/$USER/$SLURM_JOB_ID\n\n/data/project/general/software/gothic/gothic8.3qa/bin/gothic_s.sh $INPUT_FILE -m -np $SLURM_CPUS_PER_TASK\ngth_exit_code=$?\n\n# Clean up data in /scratch\nrm -rf /scratch/$USER/$SLURM_JOB_ID\n\n# Return exit code from GOTHIC\nexit $gth_exit_code\n</code></pre>"},{"location":"merlin6/software-support/impi/","title":"Intel MPI Support","text":""},{"location":"merlin6/software-support/impi/#intel-mpi-support","title":"Intel MPI Support","text":"<p>This document describes which set of Intel MPI versions in PModules are supported in the Merlin6 cluster.</p>"},{"location":"merlin6/software-support/impi/#usage","title":"Usage","text":""},{"location":"merlin6/software-support/impi/#srun","title":"srun","text":"<p>We strongly recommend the use of <code>srun</code> over <code>mpirun</code> or <code>mpiexec</code>. Using <code>srun</code> would properly bind tasks in to cores and less customization is needed, while <code>mpirun</code> and <code>mpiexec</code> might need more advanced configuration and should be only used by advanced users. Please, always adapt your scripts for using <code>srun</code> before opening a support ticket. Also, please contact us on any problem when using a module.</p> <p>Tip</p> <p>Always run Intel MPI with the srun command. The only exception is for advanced users, however srun is still recommended.</p> <p>When running with srun, one should tell Intel MPI to use the PMI libraries provided by Slurm. 
For PMI-1:</p> Bash<pre><code>export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so\n\nsrun ./app\n</code></pre> <p>Alternatively, one can use PMI-2, but then one needs to specify it as follows:</p> Bash<pre><code>export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so\nexport I_MPI_PMI2=yes\n\nsrun ./app\n</code></pre> <p>For more information, please read Slurm Intel MPI Guide</p> <p>Note</p> <p>Please note that PMI2 might not work properly in some Intel MPI versions. If so, you can either fallback to PMI-1 or to contact the Merlin administrators.</p>"},{"location":"merlin6/software-support/merlin-rmount/","title":"merlin_rmount","text":""},{"location":"merlin6/software-support/merlin-rmount/#merlin_rmount","title":"merlin_rmount","text":""},{"location":"merlin6/software-support/merlin-rmount/#background","title":"Background","text":"<p>Merlin provides a command for mounting remote file systems, called <code>merlin_rmount</code>. This provides a helpful wrapper over the Gnome storage utilities (GIO and GVFS), and provides support for a wide range of remote file formats, including:</p> <ul> <li>SMB/CIFS (Windows shared folders)</li> <li>WebDav</li> <li>AFP</li> <li>FTP, SFTP</li> <li>complete list</li> </ul>"},{"location":"merlin6/software-support/merlin-rmount/#usage","title":"Usage","text":""},{"location":"merlin6/software-support/merlin-rmount/#start-a-session","title":"Start a session","text":"<p>First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.</p> Bash Session<pre><code>$ merlin_rmount --init\n[INFO] Starting new D-Bus RMOUNT session\n\n(RMOUNT STARTED) [bliven_s@merlin-l-002 ~]$\n</code></pre> <p>Note that behind the scenes this is creating a new dbus daemon. Running multiple daemons on the same login node leads to unpredictable results, so it is best not to initialize multiple sessions in parallel.</p>"},{"location":"merlin6/software-support/merlin-rmount/#standard-endpoints","title":"Standard Endpoints","text":"<p>Standard endpoints can be mounted using</p> Bash<pre><code>merlin_rmount --select-mount\n</code></pre> <p>Select the desired url using the arrow keys.</p> <p></p> <p>From this list any of the standard supported endpoints can be mounted.</p>"},{"location":"merlin6/software-support/merlin-rmount/#other-endpoints","title":"Other endpoints","text":"<p>Other endpoints can be mounted using the <code>merlin_rmount --mount <endpoint></code> command.</p> <p></p>"},{"location":"merlin6/software-support/merlin-rmount/#accessing-files","title":"Accessing Files","text":"<p>After mounting a volume the script will print the mountpoint. It should be of the form</p> Bash<pre><code>/run/user/$UID/gvfs/<endpoint>\n</code></pre> <p>where <code>$UID</code> gives your unix user id (a 5-digit number, also viewable with <code>id -u</code>) and <code><endpoint></code> is some string generated from the mount options.</p> <p>For convenience, it may be useful to add a symbolic link for this gvfs directory. 
For instance, this would allow all volumes to be accessed in ~/mnt/:</p> Bash<pre><code># Create ~/mnt as a symbolic link pointing to the GVFS mount directory\nln -s /run/user/$UID/gvfs ~/mnt\n</code></pre> <p>Files are accessible as long as the <code>merlin_rmount</code> shell remains open.</p>"},{"location":"merlin6/software-support/merlin-rmount/#disconnecting","title":"Disconnecting","text":"<p>To disconnect, close the session with one of the following:</p> <ul> <li>The exit command</li> <li>CTRL-D</li> <li>Closing the terminal</li> </ul> <p>Disconnecting will unmount all volumes.</p>"},{"location":"merlin6/software-support/merlin-rmount/#alternatives","title":"Alternatives","text":""},{"location":"merlin6/software-support/merlin-rmount/#thunar","title":"Thunar","text":"<p>Users who prefer a GUI file browser may use the <code>thunar</code> command, which opens the Gnome File Browser. This is also available in NoMachine sessions in the bottom bar (1). Thunar supports the same remote filesystems as <code>merlin_rmount</code>; just type the URL in the address bar (2).</p> <p></p> <p>When using thunar within a NoMachine session, file transfers continue after closing NoMachine (as long as the NoMachine session stays active).</p> <p>Files can also be accessed at the command line as needed (see 'Accessing Files' above).</p>"},{"location":"merlin6/software-support/merlin-rmount/#resources","title":"Resources","text":"<ul> <li>BIO docs on using these tools for transferring EM data</li> <li>Red Hat docs on GVFS</li> <li>gio reference</li> </ul>"},{"location":"merlin6/software-support/openmpi/","title":"OpenMPI","text":""},{"location":"merlin6/software-support/openmpi/#openmpi","title":"OpenMPI","text":"<p>This document describes which OpenMPI versions in PModules are supported in the Merlin6 cluster.</p>"},{"location":"merlin6/software-support/openmpi/#usage","title":"Usage","text":""},{"location":"merlin6/software-support/openmpi/#srun","title":"srun","text":"<p>We strongly recommend the use of <code>srun</code> over <code>mpirun</code> or <code>mpiexec</code>. Using <code>srun</code> properly binds tasks to cores and requires less customization, while <code>mpirun</code> and <code>mpiexec</code> might need more advanced configuration and should only be used by advanced users. Please always adapt your scripts to use <code>srun</code> before opening a support ticket. Also, please contact us about any problem when using a module.</p> <p>Example:</p> Bash<pre><code>srun ./app\n</code></pre> <p>Tip</p> <p>Always run OpenMPI with the <code>srun</code> command. The only exception is for advanced users; however, <code>srun</code> is still recommended.</p>"},{"location":"merlin6/software-support/openmpi/#openmpi-with-ucx","title":"OpenMPI with UCX","text":"<p>OpenMPI supports UCX starting from version 3.0, but it\u2019s recommended to use version 4.0 or higher due to stability and performance improvements. 
UCX should be used only by advanced users, as it requires to run it with <code>mpirun</code> (needs advanced knowledge) and is an exception for running MPI without <code>srun</code> (UCX is not integrated at PSI within <code>srun</code>).</p> <p>For running UCX, one should:</p> <ul> <li>add the following options to <code>mpirun</code>:</li> </ul> Bash<pre><code>-mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1\n</code></pre> <ul> <li>or alternatively, add the following options before <code>mpirun</code>:</li> </ul> Bash<pre><code>export OMPI_MCA_pml=\"ucx\"\nexport OMPI_MCA_btl=\"^vader,tcp,openib,uct\"\nexport UCX_NET_DEVICES=mlx5_0:1\n</code></pre> <p>In addition, one can add the following options for debugging purposes (visit UCX Logging for possible <code>UCX_LOG_LEVEL</code> values):</p> Bash<pre><code>-x UCX_LOG_LEVEL=<data|debug|warn|info|...> -x UCX_LOG_FILE=<filename>\n</code></pre> <p>This can be also added externally before the <code>mpirun</code> call (see below example). Full example:</p> <ul> <li>Within the <code>mpirun</code> command:</li> </ul> Bash<pre><code>mpirun -np $SLURM_NTASKS -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_LOG_LEVEL=data -x UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log ./app\n</code></pre> <ul> <li>Outside the <code>mpirun</code> command:</li> </ul> Bash<pre><code>export OMPI_MCA_pml=\"ucx\"\nexport OMPI_MCA_btl=\"^vader,tcp,openib,uct\"\nexport UCX_NET_DEVICES=mlx5_0:1\nexport UCX_LOG_LEVEL=data\nexport UCX_LOG_FILE=UCX-$SLURM_JOB_ID.log\n\nmpirun -np $SLURM_NTASKS ./app\n</code></pre>"},{"location":"merlin6/software-support/openmpi/#supported-openmpi-versions","title":"Supported OpenMPI versions","text":"<p>For running OpenMPI properly in a Slurm batch system, OpenMPI and Slurm must be compiled accordingly.</p> <p>We can find a large number of compilations of OpenMPI modules in the PModules central repositories. However, only some of them are suitable for running in a Slurm cluster: any OpenMPI versions with suffixes <code>_slurm</code> are suitable for running in the Merlin6 cluster. Also, OpenMPI with suffix <code>_merlin6</code> can be used, but these will be fully replaced by the <code>_slurm</code> series in the future (so it can be used on any Slurm cluster at PSI). Please, avoid using any other OpenMPI releases.</p> <p>Tip</p> <p>Suitable OpenMPI versions for running in the Merlin6 cluster:</p> <ul> <li><code>openmpi/<version>_slurm</code> [Recommended]</li> <li><code>openmpi/<version>_merlin6</code></li> </ul>"},{"location":"merlin6/software-support/openmpi/#unstable-repository","title":"'unstable' repository","text":"<p>New OpenMPI versions that need to be tested will be compiled first in the <code>unstable</code> repository, and once validated will be moved to <code>stable</code>. 
We cannot ensure that modules in that repository are production ready, but you can use them at your own risk.</p> <p>For using unstable modules, you might need to load the <code>unstable</code> PModules repository as follows:</p> Bash<pre><code>module use unstable\n</code></pre>"},{"location":"merlin6/software-support/openmpi/#stable-repository","title":"'stable' repository","text":"<p>Officially supported OpenMPI versions will be available in the <code>stable</code> repository (which is the default loaded repository).</p> <p>For further information, please check Current and still supported versions in the left-hand sidebar.</p> <p>Usually, not more than 2 minor update releases will be present in the <code>stable</code> repository. Older minor update releases will be moved to <code>deprecated</code> despite still being officially supported. This will ensure that users compile new software with the latest stable versions, while the old versions remain available for software which was compiled with them.</p>"},{"location":"merlin6/software-support/openmpi/#deprecated-repository","title":"'deprecated' repository","text":"<p>Old OpenMPI versions (that is, any official OpenMPI version which has been moved to retired or ancient) will be moved to the <code>deprecated</code> PModules repository. For further information, please check Older versions in the left-hand sidebar.</p> <p>Also, as mentioned before, older officially supported OpenMPI releases (minor updates) will be moved to <code>deprecated</code>.</p> <p>For using deprecated modules, you might need to load the <code>deprecated</code> PModules repository as follows:</p> Bash<pre><code>module use deprecated\n</code></pre> <p>However, this is usually not needed: when directly loading a specific version, if it is not found in <code>stable</code>, PModules will try to fall back to the other repositories (<code>deprecated</code> or <code>unstable</code>).</p>"},{"location":"merlin6/software-support/openmpi/#about-missing-versions","title":"About missing versions","text":""},{"location":"merlin6/software-support/openmpi/#missing-openmpi-versions","title":"Missing OpenMPI versions","text":"<p>For legacy software, some users might require a different OpenMPI version. We always encourage users to try one of the existing stable versions (OpenMPI always with suffix <code>_slurm</code> or <code>_merlin6</code>!), as they will contain the latest bug fixes and they usually should work. In the worst case, you can also try the ones in the deprecated repository (again, OpenMPI always with suffix <code>_slurm</code> or <code>_merlin6</code>!). For very old software which was based on OpenMPI v1, you can follow the guide FAQ: Removed MPI constructs, which provides some easy steps for migrating from OpenMPI v1 to v2 or later and is also useful for finding out why your code does not compile properly.</p> <p>If, after trying the mentioned versions and guide, you are still facing problems, please contact us. Also, please contact us if you require a newer version with a different <code>gcc</code> or <code>intel</code> compiler (for example, Intel v19).</p>"},{"location":"merlin6/software-support/paraview/","title":"ParaView","text":""},{"location":"merlin6/software-support/paraview/#paraview","title":"ParaView","text":"<p>Note</p> <p>NoMachine is the official, strongly recommended and supported tool for running ParaView. 
Consider that running over SSH (X11 forwarding needed) is very slow, and the configuration might not work since it also depends on the client setup (Linux workstation/laptop, Windows with XMing, etc.). Hence, please avoid running ParaView over SSH. The only exception for running over SSH is when running it as a job from a NoMachine client.</p>"},{"location":"merlin6/software-support/paraview/#usage","title":"Usage","text":""},{"location":"merlin6/software-support/paraview/#pmodules","title":"PModules","text":"<p>Using the latest ParaView version available in PModules is strongly recommended. For example, to load ParaView:</p> Bash<pre><code>module use unstable\nmodule load paraview/5.8.1\n</code></pre>"},{"location":"merlin6/software-support/paraview/#running-paraview","title":"Running ParaView","text":"<p>ParaView can be run with VirtualGL to take advantage of the GPU card located on each login node. Once the module is loaded, you can start ParaView as follows:</p> Bash<pre><code>vglrun paraview\n</code></pre> <p>Alternatively, one can run ParaView with Mesa support using the command below. This can be useful when running on CPU computing nodes (with <code>srun</code> / <code>salloc</code>) which have no graphics card (and where <code>vglrun</code> is not possible):</p> Bash<pre><code>paraview-mesa paraview\n</code></pre>"},{"location":"merlin6/software-support/paraview/#running-older-versions-of-paraview","title":"Running older versions of ParaView","text":"<p>Older versions of ParaView available in PModules (e.g. <code>paraview/5.0.1</code> and <code>paraview/5.4.1</code>) might require a different command for running ParaView with Mesa support. The command is the following:</p> Bash<pre><code># Warning: only for Paraview 5.4.1 and older\nparaview --mesa\n</code></pre>"},{"location":"merlin6/software-support/paraview/#running-paraview-interactively-in-the-batch-system","title":"Running ParaView interactively in the batch system","text":"<p>One can run ParaView interactively in the CPU cluster as follows:</p> Bash<pre><code># First, load the module. For example: \"module load paraview/5.8.1\"\nsrun --pty --x11 --partition=general --ntasks=1 paraview-mesa paraview\n</code></pre> <p>One can change the partition, number of tasks or specify extra parameters to <code>srun</code> if needed.</p>"},{"location":"merlin6/software-support/python/","title":"Python","text":""},{"location":"merlin6/software-support/python/#python","title":"Python","text":"<p>PSI provides a variety of ways to execute python code.</p> <ol> <li>Anaconda - Custom environments for installation and development</li> <li>Jupyterhub - Execute Jupyter notebooks on the cluster</li> <li>System Python - Do not use! Only for OS applications.</li> </ol>"},{"location":"merlin6/software-support/python/#anaconda","title":"Anaconda","text":"<p>Anaconda (\"conda\" for short) is a package manager with excellent python integration. Using it you can create isolated environments for each of your python applications, containing exactly the dependencies needed for that app. 
It is similar to the virtualenv python package, but can also manage non-python requirements.</p>"},{"location":"merlin6/software-support/python/#loading-conda","title":"Loading conda","text":"<p>Conda is loaded from the module system:</p> Bash<pre><code>module load anaconda\n</code></pre>"},{"location":"merlin6/software-support/python/#using-pre-made-environments","title":"Using pre-made environments","text":"<p>Loading the module provides the <code>conda</code> command, but does not otherwise change your environment. First an environment needs to be activated. Available environments can be seen with <code>conda info --envs</code> and include many specialized environments for software installs. After activating you should see the environment name in your prompt:</p> Bash<pre><code>conda activate datascience_py37\n</code></pre>"},{"location":"merlin6/software-support/python/#condarc-file","title":"CondaRC file","text":"<p>Creating a <code>~/.condarc</code> file is recommended if you want to create new environments on merlin. Environments can grow quite large, so you will need to change the default storage location from the default (your home directory) to a larger volume (usually <code>/data/user/$USER</code>).</p> <p>Save the following as <code>$HOME/.condarc</code>:</p> YAML<pre><code>always_copy: true\n\nenvs_dirs:\n - /data/user/$USER/conda/envs\n\npkgs_dirs:\n - /data/user/$USER/conda/pkgs\n - $ANACONDA_PREFIX/conda/pkgs\n\nchannels:\n - conda-forge\n - nodefaults\n</code></pre> <p>Run <code>conda info</code> to check that the variables are being set correctly.</p>"},{"location":"merlin6/software-support/python/#creating-environments","title":"Creating environments","text":"<p>We will create an environment named <code>myenv</code> which uses an older version of numpy, e.g. to test for backwards compatibility of our code (the <code>-q</code> and <code>--yes</code> switches are just for not getting prompted and disabling the progress bar). The environment will be created in the default location as defined by the <code>.condarc</code> configuration file (see above).</p> Bash Session<pre><code>$ conda create -q --yes -n 'myenv1' numpy=1.8 scipy ipython\n\nFetching package metadata: ...\nSolving package specifications: .\nPackage plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:\n\nThe following NEW packages will be INSTALLED:\n\n ipython: 2.3.0-py27_0\n numpy: 1.8.2-py27_0\n openssl: 1.0.1h-1\n pip: 1.5.6-py27_0\n python: 2.7.8-1\n readline: 6.2-2\n scipy: 0.14.0-np18py27_0\n setuptools: 5.8-py27_0\n sqlite: 3.8.4.1-0\n system: 5.8-1\n tk: 8.5.15-0\n zlib: 1.2.7-0\n\nTo activate this environment, use:\n$ source activate myenv1\n\nTo deactivate this environment, use:\n$ source deactivate\n</code></pre> <p>The created environment contains just the packages that are needed to satisfy the requirements and it is local to your installation. The python installation is even independent of the central installation, i.e. your code will still work in such an environment, even if you are offline or AFS is down. However, you need the central installation if you want to use the <code>conda</code> command itself.</p> <p>Packages for your new environment will be either copied from the central one into your new environment, or if there are newer packages available from anaconda and you did not specify exactly the version from our central installation, they may get downloaded from the web. 
This will require significant space in the <code>envs_dirs</code> that you defined in <code>.condarc</code>. If you create other environments on the same local disk, they will share the packages using hard links.</p> <p>We can switch to the newly created environment with the <code>conda activate</code> command.</p> Bash<pre><code>conda activate myenv1\n</code></pre> <p>Info</p> <p>Note that anaconda's activate/deactivate scripts are compatible with the bash and zsh shells but not with [t]csh.</p> <p>Let's test whether we indeed got the desired numpy version:</p> Bash Session<pre><code>$ python -c 'import numpy as np; print np.version.version'\n\n1.8.2\n</code></pre> <p>You can install additional packages into the active environment using the <code>conda install</code> command.</p> Bash Session<pre><code>$ conda install --yes -q bottle\n\nFetching package metadata: ...\nSolving package specifications: .\nPackage plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:\n\nThe following NEW packages will be INSTALLED:\n\n bottle: 0.12.5-py27_0\n</code></pre>"},{"location":"merlin6/software-support/python/#jupyterhub","title":"Jupyterhub","text":"<p>Jupyterhub is a service for running code notebooks on the cluster, particularly in python. It is a powerful tool for data analysis and prototyping. For more infomation see the Jupyterhub documentation.</p>"},{"location":"merlin6/software-support/python/#pythons-to-avoid","title":"Pythons to avoid","text":"<p>Avoid using the system python (<code>/usr/bin/python</code>). It is intended for OS software and may not be up to date.</p> <p>Also avoid the 'python' module (<code>module load python</code>). This is a minimal install of python intended for embedding in other modules.</p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/","title":"Accessing Interactive Nodes","text":""},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#accessing-interactive-nodes","title":"Accessing Interactive Nodes","text":""},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#ssh-access","title":"SSH Access","text":"<p>For interactive command shell access, use an SSH client. We recommend to activate SSH's X11 forwarding to allow you to use graphical applications (e.g. a text editor, but for more performant graphical access, refer to the sections below). 
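For example, a connection with X11 forwarding enabled could be opened from a Linux or macOS terminal as follows (a minimal sketch: replace the placeholder username, and <code>login002.merlin7.psi.ch</code> can be used as an alternative login node):</p> Bash<pre><code># Hypothetical example: connect to a Merlin7 login node with X11 forwarding\nssh -X your_psi_username@login001.merlin7.psi.ch\n</code></pre> <p>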
X applications are supported in the login nodes and X11 forwarding can be used for those users who have properly configured X11 support in their desktops, however:</p> <ul> <li>Merlin7 administrators do not offer support for user desktop configuration (Windows, MacOS, Linux).<ul> <li>Hence, Merlin7 administrators do not offer official support for X11 client setup.</li> <li>Nevertheless, a generic guide for X11 client setup (Linux, Windows and MacOS) is provided below.</li> </ul> </li> <li>PSI desktop configuration issues must be addressed through PSI Service Now as an Incident Request.<ul> <li>Ticket will be redirected to the corresponding Desktop support group (Windows, Linux).</li> </ul> </li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#accessing-from-a-linux-client","title":"Accessing from a Linux client","text":"<p>Refer to {How To Use Merlin -> Accessing from Linux Clients} for Linux SSH client and X11 configuration.</p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#accessing-from-a-windows-client","title":"Accessing from a Windows client","text":"<p>Refer to {How To Use Merlin -> Accessing from Windows Clients} for Windows SSH client and X11 configuration.</p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#accessing-from-a-macos-client","title":"Accessing from a MacOS client","text":"<p>Refer to {How To Use Merlin -> Accessing from MacOS Clients} for MacOS SSH client and X11 configuration.</p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#nomachine-remote-desktop-access","title":"NoMachine Remote Desktop Access","text":"<p>X applications are supported in the login nodes and can run efficiently through a NoMachine client. This is the officially supported way to run more demanding X applications on Merlin7.</p> <ul> <li>For PSI Windows workstations, this can be installed from the Software Kiosk as 'NX Client'. If you have difficulties installing, please request support through PSI Service Now as an Incident Request.</li> <li>For other workstations The client software can be downloaded from the Nomachine Website.</li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#configuring-nomachine","title":"Configuring NoMachine","text":"<p>Refer to {How To Use Merlin -> Remote Desktop Access} for further instructions of how to configure the NoMachine client and how to access it from PSI and from outside PSI.</p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-interactive-nodes/#login-nodes-hardware-description","title":"Login nodes hardware description","text":"<p>The Merlin7 login nodes are the official machines for accessing the recources of Merlin7. 
From these machines, users can submit jobs to the Slurm batch system as well as visualize or compile their software.</p> <p>The Merlin7 login nodes are the following:</p> Hostname SSH NoMachine Scratch Scratch Mountpoint login001.merlin7.psi.ch yes yes 1TB NVMe <code>/scratch</code> login002.merlin7.psi.ch yes yes 1TB NVMe <code>/scratch</code>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-slurm/","title":"Accessing Slurm Cluster","text":""},{"location":"merlin7/01-Quick-Start-Guide/accessing-slurm/#accessing-slurm-cluster","title":"Accessing Slurm Cluster","text":""},{"location":"merlin7/01-Quick-Start-Guide/accessing-slurm/#the-merlin-slurm-clusters","title":"The Merlin Slurm clusters","text":"<p>Merlin contains a multi-cluster setup, where multiple Slurm clusters coexist under the same umbrella. It basically contains the following clusters:</p> <ul> <li>The Merlin7 Slurm CPU cluster, which is called <code>merlin7</code>.</li> <li>The Merlin7 Slurm GPU cluster, which is called <code>gmerlin7</code>.</li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-slurm/#accessing-the-slurm-clusters","title":"Accessing the Slurm clusters","text":"<p>Any job submission must be performed from a Merlin login node. Please refer to the Accessing the Interactive Nodes documentation for further information about how to access the cluster.</p> <p>In addition, any job must be submitted from a high performance storage area visible by the login nodes and by the computing nodes. For this, the possible storage areas are the following: * <code>/data/user</code> * <code>/data/project</code> * <code>/data/scratch/shared</code></p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-slurm/#merlin7-cpu-cluster-access","title":"Merlin7 CPU cluster access","text":"<p>The Merlin7 CPU cluster (<code>merlin7</code>) is the default cluster configured in the login nodes. Any job submission will use by default this cluster, unless the option <code>--cluster</code> is specified with another of the existing clusters.</p> <p>For further information about how to use this cluster, please visit: Merlin7 CPU Slurm Cluster documentation.</p>"},{"location":"merlin7/01-Quick-Start-Guide/accessing-slurm/#merlin7-gpu-cluster-access","title":"Merlin7 GPU cluster access","text":"<p>The Merlin7 GPU cluster (<code>gmerlin7</code>) is visible from the login nodes. However, to submit jobs to this cluster, one needs to specify the option <code>--cluster=gmerlin7</code> when submitting a job or allocation.</p> <p>For further information about how to use this cluster, please visit: Merlin7 GPU Slurm Cluster documentation.</p>"},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/","title":"Code Of Conduct","text":""},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/#code-of-conduct","title":"Code Of Conduct","text":""},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/#the-basic-principle","title":"The Basic principle","text":"<p>The basic principle is courtesy and consideration for other users.</p> <ul> <li>Merlin7 is a system shared by many users, therefore you are kindly requested to apply common courtesy in using its resources. 
Please follow our guidelines, which aim at providing and maintaining an efficient compute environment for all our users.</li> <li>Basic shell programming skills are an essential requirement in a Linux/UNIX HPC cluster environment; a proficiency in shell programming is greatly beneficial.</li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/#interactive-nodes","title":"Interactive nodes","text":"<ul> <li>The interactive nodes (also known as login nodes) are for development and quick testing:<ul> <li>It is strictly forbidden to run production jobs on the login nodes. All production jobs must be submitted to the batch system.</li> <li>It is forbidden to run long processes occupying big parts of a login node's resources.</li> <li>According to the previous rules, misbehaving running processes will have to be killed in order to keep the system responsive for other users.</li> </ul> </li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/#batch-system","title":"Batch system","text":"<ul> <li>Make sure that no broken or run-away processes are left when your job is done. Keep the process space clean on all nodes.</li> <li>During the runtime of a job, it is mandatory to use the <code>/scratch</code> and <code>/data/scratch/shared</code> partitions for temporary data:<ul> <li>It is forbidden to use <code>/data/user</code> or <code>/data/project</code> for that purpose.</li> <li>Always remove files you do not need any more (e.g. core dumps, temporary files) as early as possible. Keep the disk space clean on all nodes.</li> <li>Prefer <code>/scratch</code> over <code>/data/scratch/shared</code> and use the latter only when you require the temporary files to be visible from multiple nodes.</li> </ul> </li> <li>Read the description in Merlin7 directory structure for learning about the correct usage of each partition type.</li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/#user-and-project-data","title":"User and project data","text":"<ul> <li>Users are responsible for backing up their own data. It is recommended to back up the data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).</li> <li>When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster: every few months, the storage space of former users who no longer have an existing and valid PSI account will be recycled.</li> </ul> <p>Warning</p> <p>When a user leaves PSI and their account has been removed, their storage space in Merlin may be recycled. 
Hence, when a user leaves PSI, they, their supervisor, or their team must ensure that the data is backed up to external storage!</p>"},{"location":"merlin7/01-Quick-Start-Guide/code-of-conduct/#system-administrator-rights","title":"System Administrator Rights","text":"<ul> <li>The system administrator has the right to temporarily block access to Merlin7 for an account violating the Code of Conduct in order to maintain the efficiency and stability of the system.<ul> <li>Repetitive violations by the same user will be escalated to the user's supervisor.</li> </ul> </li> <li>The system administrator has the right to delete files in the scratch directories<ul> <li>after a job, if the job failed to clean up its files.</li> <li>during the job in order to prevent a job from destabilizing a node or multiple nodes.</li> </ul> </li> <li>The system administrator has the right to kill any misbehaving running processes.</li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/introduction/","title":"Introduction","text":""},{"location":"merlin7/01-Quick-Start-Guide/introduction/#introduction","title":"Introduction","text":"<p>Within his lair, the wizard ever strives for the perfection of his art.</p>"},{"location":"merlin7/01-Quick-Start-Guide/introduction/#about-merlin7","title":"About Merlin7","text":"<p>PSI's Merlin7 cluster is run on top of an IaaS (Infrastructure as a Service) vCluster on the CSCS Alps infrastructure. It is fully integrated with the PSI service landscape and was designed to provide the same end user experience as its PSI-local predecessor clusters.</p> <p>Merlin7 has been in production since the beginning of June 2025.</p> <p>All PSI users can request access to Merlin7; please go to the Requesting Merlin Accounts page and complete the steps given there.</p> <p>In case you identify errors or missing information, please provide feedback through the merlin-admins mailing list or submit a ticket using the PSI service portal.</p>"},{"location":"merlin7/01-Quick-Start-Guide/introduction/#infrastructure","title":"Infrastructure","text":""},{"location":"merlin7/01-Quick-Start-Guide/introduction/#hardware","title":"Hardware","text":"<p>The Merlin7 cluster contains the following node specifications:</p> Node #N CPU RAM GPU #GPUs Login 2 2 AMD EPYC 7742 (64 Cores 2.25GHz) 512GB CPU 77 2 AMD EPYC 7742 (64 Cores 2.25GHz) 512GB GPU A100 8 2 AMD EPYC 7713 (64 Cores 3.2GHz) 512GB A100 80GB 4 GPU GH 5 NVIDIA ARM Grace Neoverse v2 (144 Cores 3.1GHz) 864GB (Unified) GH200 120GB 4"},{"location":"merlin7/01-Quick-Start-Guide/introduction/#network","title":"Network","text":"<p>The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot. This network fabric is able to provide up to 200 Gbit/s throughput between nodes. Further information on Slingshot can be found at HPE.</p> <p>Through software interfaces like libFabric (which is available on Merlin7), applications can leverage the network seamlessly.</p>"},{"location":"merlin7/01-Quick-Start-Guide/introduction/#storage","title":"Storage","text":"<p>Unlike previous iterations of the Merlin HPC clusters, Merlin7 does not have any local storage. 
Instead, storage for the entire cluster is provided through a dedicated storage appliance from HPE/Cray called ClusterStor.</p> <p>The appliance is built of several storage servers:</p> <ul> <li>2 management nodes</li> <li>2 MDS servers, 12 drives per server, 2.9TiB (Raid10)</li> <li>8 OSS-D servers, 106 drives per server, 14.5 TB HDDs (Gridraid / Raid6)</li> <li>4 OSS-F servers, 12 drives per server, 7TiB SSDs (Raid10)</li> </ul> <p>With an effective storage capacity of:</p> <ul> <li>10 PB HDD<ul> <li>value visible on Linux: HDD 9302.4 TiB</li> </ul> </li> <li>162 TB SSD<ul> <li>value visible on Linux: SSD 151.6 TiB</li> </ul> </li> <li>23.6 TiB on Metadata</li> </ul> <p>The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-accounts/","title":"Requesting Merlin Accounts","text":""},{"location":"merlin7/01-Quick-Start-Guide/requesting-accounts/#requesting-merlin-accounts","title":"Requesting Merlin Accounts","text":""},{"location":"merlin7/01-Quick-Start-Guide/requesting-accounts/#requesting-access-to-merlin7","title":"Requesting Access to Merlin7","text":"<p>All PSI users can ask for access to the Merlin7 cluster. Access to Merlin7 is regulated by the PSI user's account being a member of the <code>svc-cluster_merlin7</code> access group.</p> <p>Requesting Merlin7 access has to be done using the Request Linux Group Membership form, available in PSI's central Service Catalog on Service Now.</p> <p></p> <p>Mandatory fields you need to fill in:</p> <ul> <li><code>Order Access for user:</code> Defaults to the logged-in user. However, requesting access for another user is also possible.</li> <li><code>Request membership for group:</code> Choose <code>svc-cluster_merlin7</code>.</li> <li><code>Justification:</code> Please add a short justification of what you will be running on Merlin7.</li> </ul> <p>Once submitted, the Merlin administrators will approve the request as soon as possible (within the next few hours on working days). Once the request is approved, it may take up to 30 minutes to get the account fully configured.</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/","title":"Requesting a Merlin Project","text":""},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#requesting-a-merlin-project","title":"Requesting a Merlin Project","text":"<p>A project owns its own storage area in Merlin, which can be accessed by other group members.</p> <p>Projects can receive a higher storage quota than user areas and should be the primary way of organizing bigger storage requirements in a multi-user collaboration.</p> <p>Access to a project's directories is governed by project members belonging to a common Unix group. You may use an existing Unix group or you may have a new Unix group created especially for the project. 
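To check which Unix groups your account already belongs to, or who is a member of an existing group, standard Linux tools can be used from any login node, for example (a minimal sketch; <code>unx-myproject</code> is a hypothetical group name):</p> Bash<pre><code># List the Unix groups of the current user\nid -Gn\n# Show the members of a specific Unix group (hypothetical name)\ngetent group unx-myproject\n</code></pre> <p>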
The project responsible will be the owner of the Unix group (this is important)!</p> <p>This document explains how to request new Unix group, to request membership for existing groups, and the procedure for requesting a Merlin project.</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#about-unix-groups","title":"About Unix groups","text":"<p>Before requesting a Merlin project, it is important to have a Unix group that can be used to grant access to it to different members of the project.</p> <p>Unix groups in the PSI Active Directory (which is the PSI central database containing user and group information, and more) are defined by the <code>unx-</code> prefix, followed by a name. In general, PSI employees working on Linux systems (including HPC clusters, like Merlin) can request for a non-existing Unix group, and can become responsible for managing it. In addition, a list of administrators can be set. The administrators, together with the group manager, can approve or deny membership requests. Further information about this topic is covered in the Linux Documentation - Services Admin Guides: Unix Groups / Group Management, managed by the Central Linux Team.</p> <p>To gran access to specific Merlin project directories, some users may require to be added to some specific Unix groups: * Each Merlin project (i.e. <code>/data/project/{bio|general}/$projectname</code>) or experiment (i.e. <code>/data/experiment/$experimentname</code>) directory has access restricted by ownership and group membership (with a very few exceptions allowing public access). * Users requiring access to a specific restricted project or experiment directory have to request membership for the corresponding Unix group owning the directory.</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#requesting-a-new-unix-group","title":"Requesting a new Unix group","text":"<p>If you need a new Unix group to be created, you need to first get this group through a separate PSI Service Now ticket. Please use the following template. You can also specify the login names of the initial group members and the owner of the group. The owner of the group is the person who will be allowed to modify the group.</p> <ul> <li> <p>Please open an Incident Request with subject: </p>Text Only<pre><code>Subject: Request for new unix group xxxx\n</code></pre><p></p> </li> <li> <p>and base the text field of the request on this template </p>Text Only<pre><code>Dear HelpDesk\n\nI would like to request a new unix group.\n\nUnix Group Name: unx-xxxxx\nInitial Group Members: xxxxx, yyyyy, zzzzz, ...\nGroup Owner: xxxxx\nGroup Administrators: aaaaa, bbbbb, ccccc, ....\n\nBest regards,\n</code></pre><p></p> </li> </ul>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#requesting-unix-group-membership","title":"Requesting Unix group membership","text":"<p>Existing Merlin projects have already a Unix group assigned. To have access to a project, users must belong to the proper Unix group owning that project.</p> <p>Supervisors should inform new users which extra groups are needed for their project(s). If this information is not known, one can check the permissions for that directory. 
In example: </p>Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/general/$projectname\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# ls -ltrhd /data/project/bio/$projectname\n</code></pre><p></p> <p>Requesting membership for a specific Unix group has to be done with the corresponding Request Linux Group Membership form, available in the PSI Service Now Service Catalog.</p> <p></p> <p>Once submitted, the responsible of the Unix group has to approve the request.</p> <p>Important note: Requesting access to specific Unix Groups will require validation from the responsible of the Unix Group. If you ask for inclusion in many groups it may take longer, since the fulfillment of the request will depend on more people.</p> <p>Further information can be found in the Linux Documentation - Services User guide: Unix Groups / Group Management</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#managing-unix-groups","title":"Managing Unix Groups","text":"<p>Other administration operations on Unix Groups it's mainly covered in the Linux Documentation - Services Admin Guides: Unix Groups / Group Management, managed by the Central Linux Team.</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#requesting-a-merlin-project_1","title":"Requesting a Merlin project","text":"<p>Once a Unix group is available, a Merlin project can be requested. To request a project, please provide the following information in a PSI Service Now ticket</p> <ul> <li> <p>Please open an Incident Request with subject: </p>Text Only<pre><code>Subject: [Merlin7] Project Request for project name xxxxxx\n</code></pre><p></p> </li> <li> <p>and base the text field of the request on this template </p>Text Only<pre><code>Dear HelpDesk\n\nI would like to request a new Merlin7 project.\n\nProject Name: xxxxx\nUnixGroup: xxxxx # Must be an existing Unix Group\n\nThe project responsible is the Owner of the Unix Group.\nIf you need a storage quota exceeding the defaults, please provide a description\nand motivation for the higher storage needs:\n\nStorage Quota: 1TB with a maximum of 1M Files\nReason: (None for default 1TB/1M)\n\nBest regards,\n</code></pre><p></p> </li> </ul> <p>The default storage quota for a project is 1TB (with a maximal Number of Files of 1M). If you need a larger assignment, you need to request this and provide a description of your storage needs.</p>"},{"location":"merlin7/01-Quick-Start-Guide/requesting-projects/#further-documentation","title":"Further documentation","text":"<p>Further information it's also available in the Linux Central Documentation: * Unix Group / Group Management for users * Unix Group / Group Management for group managers</p> <p>Special thanks to the Linux Central Team and AIT to make this possible.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/","title":"Archive & PSI Data Catalog","text":""},{"location":"merlin7/02-How-To-Use-Merlin/archive/#archive-psi-data-catalog","title":"Archive & PSI Data Catalog","text":""},{"location":"merlin7/02-How-To-Use-Merlin/archive/#psi-data-catalog-as-a-psi-central-service","title":"PSI Data Catalog as a PSI Central Service","text":"<p>PSI provides access to the Data Catalog for long-term data storage and retrieval. 
Data is stored on the PetaByte Archive at the Swiss National Supercomputing Centre (CSCS).</p> <p>The Data Catalog and Archive is suitable for:</p> <ul> <li>Raw data generated by PSI instruments</li> <li>Derived data produced by processing some inputs</li> <li>Data required to reproduce PSI research and publications</li> </ul> <p>The Data Catalog is part of PSI's effort to conform to the FAIR principles for data management. In accordance with this policy, data will be publicly released under CC-BY-SA 4.0 after an embargo period expires.</p> <p>The Merlin cluster is connected to the Data Catalog. Hence, users can archive data stored in the Merlin storage under the <code>/data</code> directories (currently, <code>/data/user</code> and <code>/data/project</code>). Archiving from other directories is also possible; however, the process is much slower, as the data cannot be directly retrieved by the PSI archive central servers (central mode) and needs to be indirectly copied to them (decentral mode).</p> <p>Archiving can be done from any node accessible by the users (usually from the login nodes).</p> <p>Tip</p> <p>Archiving can be done in two different ways:</p> <ul> <li>Central mode: Possible for the user and project data directories, this is the fastest way as it does not require a remote copy (data is directly retrieved from Merlin by the central AIT servers).</li> <li>Decentral mode: Possible for any directory, this is the slowest way of archiving, as it requires copying ('rsync') the data from Merlin to the central AIT servers.</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#procedure","title":"Procedure","text":""},{"location":"merlin7/02-How-To-Use-Merlin/archive/#overview","title":"Overview","text":"<p>Below are the main steps for using the Data Catalog.</p> <ul> <li>Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:<ul> <li>Prepare a metadata file describing the dataset</li> <li>Run <code>datasetIngestor</code> script</li> <li>If necessary, the script will copy the data to the PSI archive servers<ul> <li>Usually this is necessary when archiving from directories other than <code>/data/user</code> or <code>/data/project</code>. It would also be necessary when the Merlin export server (<code>merlin-archive.psi.ch</code>) is down for any reason.</li> </ul> </li> </ul> </li> <li>Archive the dataset:<ul> <li>Visit https://discovery.psi.ch</li> <li>Click <code>Archive</code> for the dataset</li> <li>The system will now copy the data to the PetaByte Archive at CSCS</li> </ul> </li> <li>Retrieve data from the catalog:<ul> <li>Find the dataset on https://discovery.psi.ch and click <code>Retrieve</code></li> <li>Wait for the data to be copied to the PSI retrieval system</li> <li>Run <code>datasetRetriever</code> script</li> </ul> </li> </ul> <p>Since large data sets may take a lot of time to transfer, some steps are designed to happen in the background. The discovery website can be used to track the progress of each step.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#account-registration","title":"Account Registration","text":"<p>Two types of account permit access to the Data Catalog. If your data was collected at a beamline, you may have been assigned a <code>p-group</code> (e.g. <code>p12345</code>) for the experiment. Other users are assigned an <code>a-group</code> (e.g. <code>a-12345</code>).</p> <p>Groups are usually assigned to a PI, and then individual user accounts are added to the group. 
This must be done by user request through PSI Service Now. For existing a-groups and p-groups, you can follow the standard central procedures. Alternatively, if you do not know how to do that, follow the Merlin7 Requesting extra Unix groups procedure, or open a PSI Service Now ticket.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#documentation","title":"Documentation","text":"<p>Accessing the Data Catalog is done through the SciCat software. Documentation is here: ingestManual.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#loading-datacatalog-tools","title":"Loading datacatalog tools","text":"<p>The latest datacatalog software is maintained in the PSI module system. To access it from the Merlin systems, run the following command:</p> Bash<pre><code>module load datacatalog\n</code></pre> <p>It can be done from any host in the Merlin cluster accessible by users. Usually, login nodes will be the nodes used for archiving.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#finding-your-token","title":"Finding your token","text":"<p>As of 2022-04-14 a secure token is required to interact with the data catalog. This is a long random string that replaces the previous user/password authentication (allowing access for non-PSI use cases). This string should be treated like a password and not shared.</p> <ol> <li>Go to discovery.psi.ch</li> <li>Click 'Sign in' in the top right corner. Click the 'Login with PSI account' button and log in on the PSI login page.</li> <li>You should be redirected to your user settings and see a 'User Information' section. If not, click on your username in the top right and choose 'Settings' from the menu.</li> <li>Look for the field 'Catamel Token'. This should be a 64-character string. Click the icon to copy the token.</li> </ol> <p></p> <p>You will need to save this token for later steps. To avoid including it in all the commands, we suggest saving it to an environment variable (Linux):</p> Text Only<pre><code>$ SCICAT_TOKEN=RqYMZcqpqMJqluplbNYXLeSyJISLXfnkwlfBKuvTSdnlpKkU\n</code></pre> <p>(Hint: prefix this line with a space to avoid saving the token to your bash history.)</p> <p>Tokens expire after 2 weeks and will need to be fetched from the website again.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#ingestion","title":"Ingestion","text":"<p>The first step to ingesting your data into the catalog is to prepare a file describing what data you have. This is called <code>metadata.json</code>, and can be created with a text editor (e.g. <code>vim</code>). It can in principle be saved anywhere, but keeping it with your archived data is recommended. For more information about the format, see the 'Bio metadata' section below. 
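Since the file must be valid JSON, a quick syntax check can catch typos before ingestion; a minimal example (assuming a <code>python3</code> interpreter is available in your environment) is:</p> Bash<pre><code>python3 -m json.tool metadata.json\n</code></pre> <p>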
An example follows:</p> JSON<pre><code>{\n \"principalInvestigator\": \"albrecht.gessler@psi.ch\",\n \"creationLocation\": \"/PSI/EMF/JEOL2200FS\",\n \"dataFormat\": \"TIFF+LZW Image Stack\",\n \"sourceFolder\": \"/gpfs/group/LBR/pXXX/myimages\",\n \"owner\": \"Wilhelm Tell\",\n \"ownerEmail\": \"wilhelm.tell@psi.ch\",\n \"type\": \"raw\",\n \"description\": \"EM micrographs of amygdalin\",\n \"ownerGroup\": \"a-12345\",\n \"scientificMetadata\": {\n \"description\": \"EM micrographs of amygdalin\",\n \"sample\": {\n \"name\": \"Amygdalin beta-glucosidase 1\",\n \"uniprot\": \"P29259\",\n \"species\": \"Apple\"\n },\n \"dataCollection\": {\n \"date\": \"2018-08-01\"\n },\n \"microscopeParameters\": {\n \"pixel size\": {\n \"v\": 0.885,\n \"u\": \"A\"\n },\n \"voltage\": {\n \"v\": 200,\n \"u\": \"kV\"\n },\n \"dosePerFrame\": {\n \"v\": 1.277,\n \"u\": \"e/A2\"\n }\n }\n }\n}\n</code></pre> <p>It is recommended to use the ScicatEditor for creating metadata files. This is a browser-based tool specifically for ingesting PSI data. Using the tool avoids syntax errors and provides templates for common data sets and options. The finished JSON file can then be downloaded to Merlin or copied into a text editor.</p> <p>Another option is to use the SciCat graphical interface from NoMachine. This provides a graphical interface for selecting data to archive. This is particularly useful for data associated with a DUO experiment and p-group. Type <code>SciCat</code> to get started after loading the <code>datacatalog</code> module. The GUI also replaces the command-line ingestion described below.</p> <p>The following steps can be run from wherever you saved your <code>metadata.json</code>. First, perform a \"dry-run\" which will check the metadata for errors:</p> Bash<pre><code>datasetIngestor --token $SCICAT_TOKEN metadata.json\n</code></pre> <p>It will ask for your PSI credentials and then print some info about the data to be ingested. If there are no errors, proceed to the real ingestion:</p> Bash<pre><code>datasetIngestor --token $SCICAT_TOKEN --ingest --autoarchive metadata.json\n</code></pre> <p>You will be asked whether you want to copy the data to the central system:</p> <ul> <li>If you are on the Merlin cluster and you are archiving data from <code>/data/user</code> or <code>/data/project</code>, answer 'no', since the data catalog can directly read the data.</li> <li>If you are in a directory other than <code>/data/user</code> and <code>/data/project</code>, or you are on a desktop computer, answer 'yes'. Copying large datasets to the PSI archive system may take quite a while (minutes to hours).</li> </ul> <p>If there are no errors, your data has been accepted into the data catalog! From now on, no changes should be made to the ingested data. This is important, since the next step is for the system to copy all the data to the CSCS Petabyte archive. Writing to tape is slow, so this process may take several days, and it will fail if any modifications are detected.</p> <p>If using the <code>--autoarchive</code> option as suggested above, your dataset should now be in the queue. Check the data catalog: https://discovery.psi.ch. Your job should have status 'WorkInProgress'. You will receive an email when the ingestion is complete.</p> <p>If you didn't use <code>--autoarchive</code>, you need to manually move the dataset into the archive queue. From discovery.psi.ch, navigate to the 'Archive' tab. You should see the newly ingested dataset. Check the dataset and click <code>Archive</code>. 
You should see the status change from <code>datasetCreated</code> to <code>scheduleArchiveJob</code>. This indicates that the data is in the process of being transferred to CSCS.</p> <p>After a few days the dataset's status will change to <code>datasetOnArchive</code>, indicating the data is stored. At this point it is safe to delete the data.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#useful-commands","title":"Useful commands","text":"<p>Running the datasetIngestor in dry mode (without <code>--ingest</code>) finds most errors. However, it is sometimes convenient to find potential errors yourself with simple Unix commands.</p> <p>Find problematic filenames</p> Bash<pre><code>find . -iregex '.*/[^/]*[^a-zA-Z0-9_ ./-][^/]*'\n</code></pre> <p>Find broken links</p> Bash<pre><code>find -L . -type l\n</code></pre> <p>Find outside links</p> Bash<pre><code>find . -type l -exec bash -c 'realpath --relative-base \"`pwd`\" \"$0\" 2>/dev/null |egrep \"^[./]\" |sed \"s|^|$0 ->|\" ' '{}' ';'\n</code></pre> <p>Delete certain files (use with caution)</p> Bash<pre><code># Empty directories\nfind . -type d -empty -delete\n# Backup files\nfind . -name '*~' -delete\nfind . -name '*#autosave#' -delete\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#troubleshooting-known-bugs","title":"Troubleshooting & Known Bugs","text":"<ul> <li>The following message can be safely ignored:</li> </ul> Bash<pre><code>key_cert_check_authority: invalid certificate\nCertificate invalid: name is not a listed principal\n</code></pre> It indicates that no Kerberos token was provided for authentication. You can avoid the warning by first running <code>kinit</code> (PSI Linux systems). <ul> <li>For decentral ingestion cases, the copy step is indicated by a message <code>Running [/usr/bin/rsync -e ssh -avxz ...</code>. It is expected that this step will take a long time and may appear to have hung. You can check which files have been successfully transferred using rsync:</li> </ul> Bash<pre><code>rsync --list-only user_n@pb-archive.psi.ch:archive/UID/PATH/\n</code></pre> <p>where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not alphabetical in some cases, but it should be possible to see whether any data has transferred.</p> <ul> <li> <p>There is currently a limit on the number of files per dataset (technically, the limit is from the total length of all file paths). It is recommended to break up datasets into 300'000 files or less.</p> <ul> <li>If it is not possible or desirable to split data between multiple datasets, an alternate work-around is to package files into a tarball. 
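For example, a simple single-threaded tarball with gzip compression could be created as follows (a sketch; replace [output] and [srcdir] with your own paths):</li> </ul> Text Only<pre><code>tar -czf [output].tar.gz [srcdir]\n</code></pre> <ul> <li>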
For datasets which are already compressed, omit the -z option for a considerable speedup:</li> </ul> Text Only<pre><code>tar -cf [output].tar [srcdir]\n</code></pre> <p>Uncompressed data can be compressed on the cluster using the following command:</p> Text Only<pre><code>sbatch /data/software/Slurm/Utilities/Parallel_TarGz.batch -s [srcdir] -t [output].tar -n\n</code></pre> <p>Run <code>/data/software/Slurm/Utilities/Parallel_TarGz.batch -h</code> for more details and options.</p> </li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#sample-ingestion-output-datasetingestor-1111","title":"Sample ingestion output (datasetIngestor 1.1.11)","text":"[Show Example]: Sample ingestion output (datasetIngestor 1.1.11) <pre>/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json\n2019/11/06 11:04:43 Latest version: 1.1.11\n\n2019/11/06 11:04:43 Your version of this program is up-to-date\n2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...\n2019/11/06 11:04:43 Your username:\nuser_n\n2019/11/06 11:04:48 Your password:\n2019/11/06 11:04:52 User authenticated: XXX\n2019/11/06 11:04:52 User is member in following a or p groups: XXX\n2019/11/06 11:04:52 OwnerGroup information a-XXX verified successfully.\n2019/11/06 11:04:52 contactEmail field added: XXX\n2019/11/06 11:04:52 Scanning files in dataset /data/project/bio/myproject/archive\n2019/11/06 11:04:52 No explicit filelistingPath defined - full folder /data/project/bio/myproject/archive is used.\n2019/11/06 11:04:52 Source Folder: /data/project/bio/myproject/archive at /data/project/bio/myproject/archive\n2019/11/06 11:04:57 The dataset contains 100000 files with a total size of 50000000000 bytes.\n2019/11/06 11:04:57 creationTime field added: 2019-07-29 18:47:08 +0200 CEST\n2019/11/06 11:04:57 endTime field added: 2019-11-06 10:52:17.256033 +0100 CET\n2019/11/06 11:04:57 license field added: CC BY-SA 4.0\n2019/11/06 11:04:57 isPublished field added: false\n2019/11/06 11:04:57 classification field added: IN=medium,AV=low,CO=low\n2019/11/06 11:04:57 Updated metadata object:\n{\n \"accessGroups\": [\n \"XXX\"\n ],\n \"classification\": \"IN=medium,AV=low,CO=low\",\n \"contactEmail\": \"XXX\",\n \"creationLocation\": \"XXX\",\n \"creationTime\": \"2019-07-29T18:47:08+02:00\",\n \"dataFormat\": \"XXX\",\n \"description\": \"XXX\",\n \"endTime\": \"2019-11-06T10:52:17.256033+01:00\",\n \"isPublished\": false,\n \"license\": \"CC BY-SA 4.0\",\n \"owner\": \"XXX\",\n \"ownerEmail\": \"XXX\",\n \"ownerGroup\": \"a-XXX\",\n \"principalInvestigator\": \"XXX\",\n \"scientificMetadata\": {\n...\n },\n \"sourceFolder\": \"/data/project/bio/myproject/archive\",\n \"type\": \"raw\"\n}\n2019/11/06 11:04:57 Running [/usr/bin/ssh -l user_n pb-archive.psi.ch test -d /data/project/bio/myproject/archive].\nkey_cert_check_authority: invalid certificate\nCertificate invalid: name is not a listed principal\nuser_n@pb-archive.psi.ch's password:\n2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).\nThe data must first be copied to a rsync cache server.\n\n2019/11/06 11:05:04 Do you want to continue (Y/n)?\nY\n2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012\n2019/11/06 11:05:09 The dataset contains 108057 files.\n2019/11/06 11:05:10 Created file block 0 from file 0 to 1000 with total size of 413229990 bytes\n2019/11/06 11:05:10 Created file block 1 from 
file 1000 to 2000 with total size of 416024000 bytes\n2019/11/06 11:05:10 Created file block 2 from file 2000 to 3000 with total size of 416024000 bytes\n2019/11/06 11:05:10 Created file block 3 from file 3000 to 4000 with total size of 416024000 bytes\n...\n2019/11/06 11:05:26 Created file block 105 from file 105000 to 106000 with total size of 416024000 bytes\n2019/11/06 11:05:27 Created file block 106 from file 106000 to 107000 with total size of 416024000 bytes\n2019/11/06 11:05:27 Created file block 107 from file 107000 to 108000 with total size of 850195143 bytes\n2019/11/06 11:05:27 Created file block 108 from file 108000 to 108057 with total size of 151904903 bytes\n2019/11/06 11:05:27 short dataset id: 0a9fe316-c9e7-4cc5-8856-e1346dd31e31\n2019/11/06 11:05:27 Running [/usr/bin/rsync -e ssh -avxz /data/project/bio/myproject/archive/ user_n@pb-archive.psi.ch:archive\n/0a9fe316-c9e7-4cc5-8856-e1346dd31e31/data/project/bio/myproject/archive].\nkey_cert_check_authority: invalid certificate\nCertificate invalid: name is not a listed principal\nuser_n@pb-archive.psi.ch's password:\nPermission denied, please try again.\nuser_n@pb-archive.psi.ch's password:\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied\n...\n2019/11/06 12:05:08 Successfully updated {\"pid\":\"12.345.67890/12345678-1234-1234-1234-123456789012\",...}\n2019/11/06 12:05:08 Submitting Archive Job for the ingested datasets.\n2019/11/06 12:05:08 Job response Status: okay\n2019/11/06 12:05:08 A confirmation email will be sent to XXX\n12.345.67890/12345678-1234-1234-1234-123456789012\n</pre>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#publishing","title":"Publishing","text":"<p>After datasets are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets available on http://doi.psi.ch.</p> <p>For instructions on this, please read the 'Publish' section in the ingest manual.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#retrieving-data","title":"Retrieving data","text":"<p>Retrieving data from the archive is also initiated through the Data Catalog. Please read the 'Retrieve' section in the ingest manual.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/archive/#further-information","title":"Further Information","text":"<ul> <li>PSI Data Catalog</li> <li>Full Documentation</li> <li>Published Datasets (doi.psi.ch)</li> <li>Data Catalog PSI page</li> <li>Data catalog SciCat Software</li> <li>FAIR definition and SNF Research Policy</li> <li>Petabyte Archive at CSCS</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-linux/","title":"Connecting from a Linux Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-linux/#connecting-from-a-linux-client","title":"Connecting from a Linux Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-linux/#ssh-without-x11-forwarding","title":"SSH without X11 Forwarding","text":"<p>This is the standard method. Official X11 support is provided through NoMachine. 
For normal SSH sessions, use your SSH client as follows:</p> Bash<pre><code>ssh $username@login001.merlin7.psi.ch\nssh $username@login002.merlin7.psi.ch\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-linux/#ssh-with-x11-forwarding","title":"SSH with X11 Forwarding","text":"<p>Official X11 Forwarding support is through NoMachine. Please follow the document {Job Submission -> Interactive Jobs} and {Accessing Merlin -> NoMachine} for more details. However, we provide a small recipe for enabling X11 Forwarding in Linux.</p> <ul> <li>For enabling client X11 forwarding, add the following to the start of <code>~/.ssh/config</code> to implicitly add <code>-X</code> to all ssh connections:</li> </ul> Bash<pre><code>ForwardAgent yes\nForwardX11 yes\nForwardX11Trusted yes\n</code></pre> <ul> <li>Alternatively, you can add the option <code>-X</code> (or <code>-Y</code> for trusted forwarding) to the <code>ssh</code> command. For example:</li> </ul> Bash<pre><code>ssh -X $username@login001.merlin7.psi.ch\nssh -X $username@login002.merlin7.psi.ch\n</code></pre> <ul> <li>For testing that X11 forwarding works, just run <code>sview</code>. An X11-based Slurm view of the cluster should pop up in your client session:</li> </ul> Bash<pre><code>sview\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-macos/","title":"Connecting from a MacOS Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-macos/#connecting-from-a-macos-client","title":"Connecting from a MacOS Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-macos/#ssh-without-x11-forwarding","title":"SSH without X11 Forwarding","text":"<p>This is the standard method. Official X11 support is provided through NoMachine. For normal SSH sessions, use your SSH client as follows:</p> Bash<pre><code>ssh $username@login001.merlin7.psi.ch\nssh $username@login002.merlin7.psi.ch\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-macos/#ssh-with-x11-forwarding","title":"SSH with X11 Forwarding","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-macos/#requirements","title":"Requirements","text":"<p>For running SSH with X11 Forwarding in MacOS, one needs to have an X server running in MacOS. The official X Server for MacOS is XQuartz. Please ensure you have it running before starting an SSH connection with X11 forwarding.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-macos/#ssh-with-x11-forwarding-in-macos","title":"SSH with X11 Forwarding in MacOS","text":"<p>Official X11 support is through NoMachine. Please follow the document {Job Submission -> Interactive Jobs} and {Accessing Merlin -> NoMachine} for more details. However, we provide a small recipe for enabling X11 Forwarding in MacOS.</p> <ul> <li> <p>Ensure that XQuartz is installed and running on your MacOS system.</p> </li> <li> <p>For enabling client X11 forwarding, add the following to the start of <code>~/.ssh/config</code> to implicitly add <code>-X</code> to all ssh connections:</p> </li> </ul> Bash<pre><code>ForwardAgent yes\nForwardX11 yes\nForwardX11Trusted yes\n</code></pre> <ul> <li>Alternatively, you can add the option <code>-X</code> (or <code>-Y</code> for trusted forwarding) to the <code>ssh</code> command. For example:</li> </ul> Bash<pre><code>ssh -X $username@login001.merlin7.psi.ch\nssh -X $username@login002.merlin7.psi.ch\n</code></pre> <ul> <li>For testing that X11 forwarding works, just run <code>sview</code>. 
An X11-based Slurm view of the cluster should pop up in your client session.</li> </ul> Bash<pre><code>sview\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-windows/","title":"Connecting from a Windows Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-windows/#connecting-from-a-windows-client","title":"Connecting from a Windows Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-windows/#ssh-with-putty-without-x11-forwarding","title":"SSH with PuTTY without X11 Forwarding","text":"<p>PuTTY is one of the most common tools for SSH.</p> <p>Check if the following software packages are installed on the Windows workstation by inspecting the Start menu (hint: use the Search box to save time):</p> <ul> <li>PuTTY (should be already installed)</li> <li>[Optional] Xming (needed for SSH with X11 Forwarding)</li> </ul> <p>If they are missing, you can install them using the Software Kiosk icon on the Desktop.</p> <ol> <li> <p>Start PuTTY</p> </li> <li> <p>[Optional] Enable <code>xterm</code> to have similar mouse behaviour as in Linux:</p> </li> </ol> <p></p> <ol> <li>Create a session to a Merlin login node and click Open:</li> </ol> <p></p>"},{"location":"merlin7/02-How-To-Use-Merlin/connect-from-windows/#ssh-with-putty-with-x11-forwarding","title":"SSH with PuTTY with X11 Forwarding","text":"<p>Official X11 Forwarding support is through NoMachine. Please follow the document {Job Submission -> Interactive Jobs} and {Accessing Merlin -> NoMachine} for more details. However, we provide a small recipe for enabling X11 Forwarding in Windows.</p> <p>Check if Xming is installed on the Windows workstation by inspecting the Start menu (hint: use the Search box to save time). If missing, you can install it by using the Software Kiosk icon (should be located on the Desktop).</p> <ol> <li> <p>Ensure that an X server (Xming) is running. Otherwise, start it.</p> </li> <li> <p>Enable X11 Forwarding in your SSH client. For example, in PuTTY:</p> </li> </ol> <p></p>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/","title":"Kerberos and AFS authentication","text":""},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#kerberos-and-afs-authentication","title":"Kerberos and AFS authentication","text":"<p>Projects and users have their own areas in the central PSI AFS service. In order to access these areas, valid Kerberos and AFS tickets must be granted.</p> <p>These tickets are automatically granted when accessing through SSH with username and password. Alternatively, one can get a granting ticket with the <code>kinit</code> (Kerberos) and <code>aklog</code> (AFS ticket, which needs to be run after <code>kinit</code>) commands.</p> <p>Due to PSI security policies, the maximum lifetime of the ticket is 7 days, and the default time is 10 hours. This means that one needs to constantly renew (<code>krenew</code> command) the existing granting tickets, and their validity cannot be extended beyond 7 days. 
At this point, one needs to obtain new granting tickets.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#obtaining-granting-tickets-with-username-and-password","title":"Obtaining granting tickets with username and password","text":"<p>As already described above, the most common use case is to obtain Kerberos and AFS granting tickets by entering username and password:</p> <ul> <li>When logging in to Merlin through the SSH protocol, if this is done with username + password authentication, tickets for Kerberos and AFS will be automatically obtained.</li> <li>When logging in to Merlin through NoMachine, no Kerberos and AFS tickets are granted. Therefore, users need to run <code>kinit</code> (to obtain a granting Kerberos ticket) followed by <code>aklog</code> (to obtain a granting AFS ticket). See further details below.</li> </ul> <p>To manually obtain granting tickets, one has to:</p> <ol> <li>To obtain a granting Kerberos ticket, one needs to run <code>kinit $USER</code> and enter the PSI password.</li> </ol> Bash<pre><code>kinit $USER@D.PSI.CH\n</code></pre> <ol> <li>To obtain a granting ticket for AFS, one needs to run <code>aklog</code>. No password is necessary, but a valid Kerberos ticket is mandatory.</li> </ol> Bash<pre><code>aklog\n</code></pre> <ol> <li>To list the status of your granted tickets, users can use the <code>klist</code> command.</li> </ol> Bash<pre><code>klist\n</code></pre> <ol> <li>To extend the validity of existing granting tickets, users can use the <code>krenew</code> command.</li> </ol> Bash<pre><code>krenew\n</code></pre> Text Only<pre><code>* Keep in mind that the maximum lifetime for granting tickets is 7 days, therefore `krenew` can not be used beyond that limit,\n and then `kinit` should be used instead.\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#obtanining-granting-tickets-with-keytab","title":"Obtaining granting tickets with keytab","text":"<p>Sometimes, obtaining granting tickets by using password authentication is not possible. An example is a user Slurm job requiring access to private areas in AFS. For this, it is possible to generate a keytab file.</p> <p>Be aware that the keytab file must be private, fully protected by correct permissions and not shared with any other users.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#creating-a-keytab-file","title":"Creating a keytab file","text":"<p>For generating a keytab, one has to:</p> <ol> <li>Create a private directory for storing the Kerberos keytab file</li> </ol> Bash<pre><code>mkdir -p ~/.k5\n</code></pre> <ol> <li>Run the <code>ktutil</code> utility:</li> </ol> Bash<pre><code>ktutil\n</code></pre> <ol> <li>In the <code>ktutil</code> console, one has to generate a keytab file as follows:</li> </ol> Bash<pre><code># Replace $USER by your username\nadd_entry -password -k 0 -f -p $USER\nwkt /data/user/$USER/.k5/krb5.keytab\nexit\n</code></pre> <p>Please note: * You will need to enter your password once. This step is required for generating the keytab file. * <code>ktutil</code> does not report an error if you enter a wrong password! You can test with the <code>kinit</code> command documented below. If <code>kinit</code> fails with an error message like \"pre-authentication failed\", this is usually due to a wrong password/key in the keytab file. In this case you have to remove the keytab file and re-run the <code>ktutil</code> command. 
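For example, a minimal check of the newly created keytab (using the keytab path from the steps above, where the home directory corresponds to <code>/data/user/$USER</code>) is:</p> Bash<pre><code># Test the keytab and list the obtained ticket\nkinit -kt ~/.k5/krb5.keytab $USER\nklist\n</code></pre> <p>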
See \"Updating the keytab file\" in the section below.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#updating-an-existing-keytab-file","title":"Updating an existing keytab file","text":"<p>After a password change you have to update your keytab:</p> <ol> <li>Remove the old keytab file</li> </ol> Bash<pre><code>rm -f ~/.k5/krb5.keytab\n</code></pre> <ol> <li>Run the <code>ktutil</code> utility:</li> </ol> Bash<pre><code>ktutil\n</code></pre> <ol> <li>In the <code>ktutil</code> console, one has to generate a keytab file as follows:</li> </ol> Bash<pre><code># Replace $USER by your username\nadd_entry -password -k 0 -f -p $USER\nwkt /data/user/$USER/.k5/krb5.keytab\nexit\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#obtaining-tickets-by-using-keytab-files","title":"Obtaining tickets by using keytab files","text":"<p>Once the keytab is created, one can obtain kerberos tickets without being prompted for a password as follows:</p> Bash<pre><code>kinit -kt ~/.k5/krb5.keytab $USER\naklog\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#slurm-jobs-accessing-afs","title":"Slurm jobs accessing AFS","text":"<p>Some jobs may require to access private areas in AFS. For that, having a valid keytab file is required. Then, from inside the batch script one can obtain granting tickets for Kerberos and AFS, which can be used for accessing AFS private areas.</p> <p>The steps should be the following:</p> <ul> <li>Setup <code>KRB5CCNAME</code>, which can be used to specify the location of the Kerberos5 credentials (ticket) cache. In general it should point to a shared area (<code>$HOME/.k5</code> is a good location), and is strongly recommended to generate an independent Kerberos5 credential cache (it is, creating a new credential cache per Slurm job):</li> </ul> Bash<pre><code>export KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\n</code></pre> <ul> <li>To obtain a Kerberos5 granting ticket, run <code>kinit</code> by using your keytab:</li> </ul> Bash<pre><code>kinit -kt \"$HOME/.k5/krb5.keytab\" $USER@D.PSI.CH\n</code></pre> <ul> <li>To obtain a granting AFS ticket, run <code>aklog</code>:</li> </ul> Bash<pre><code>aklog\n</code></pre> <ul> <li>At the end of the job, you can remove destroy existing Kerberos tickets.</li> </ul> Bash<pre><code>kdestroy\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#slurm-batch-script-example-obtaining-krbafs-granting-tickets","title":"Slurm batch script example: obtaining KRB+AFS granting tickets","text":""},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#example-1-independent-crendetial-cache-per-slurm-job","title":"Example 1: Independent crendetial cache per Slurm job","text":"<p>This is the recommended way. 
At the end of the job, it is strongly recommended to destroy the existing Kerberos tickets.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Specify 'general' or 'daily' or 'hourly'\n#SBATCH --time=01:00:00 # Strictly recommended when using 'general' partition.\n#SBATCH --output=run.out # Generate custom output file\n#SBATCH --error=run.err # Generate custom error file\n#SBATCH --nodes=1 # Specify number of nodes to use\n#SBATCH --ntasks=1 # Specify number of tasks to use\n#SBATCH --cpus-per-task=1\n#SBATCH --constraint=xeon-gold-6152\n#SBATCH --hint=nomultithread\n#SBATCH --job-name=krb5\n\nexport KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\nkinit -kt \"$HOME/.k5/krb5.keytab\" $USER@D.PSI.CH\naklog\nklist\n\necho \"Here should go my batch script code.\"\n\n# Destroy Kerberos tickets created for this job only\nkdestroy\nklist\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/kerberos/#example-2-shared-credential-cache","title":"Example 2: Shared credential cache","text":"<p>Some users may need or prefer to run with a shared cache file. To do that, one needs to set up <code>KRB5CCNAME</code> in the login node session before submitting the job.</p> Bash<pre><code>export KRB5CCNAME=\"$(mktemp \"$HOME/.k5/krb5cc_XXXXXX\")\"\n</code></pre> <p>Then, you can run one or multiple job scripts (or a parallel job with <code>srun</code>). <code>KRB5CCNAME</code> will be propagated to the job script or to the parallel job, therefore a single credential cache will be shared amongst different Slurm runs.</p> Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Specify 'general' or 'daily' or 'hourly'\n#SBATCH --time=01:00:00 # Strictly recommended when using 'general' partition.\n#SBATCH --output=run.out # Generate custom output file\n#SBATCH --error=run.err # Generate custom error file\n#SBATCH --nodes=1 # Specify number of nodes to use\n#SBATCH --ntasks=1 # Specify number of tasks to use\n#SBATCH --cpus-per-task=1\n#SBATCH --constraint=xeon-gold-6152\n#SBATCH --hint=nomultithread\n#SBATCH --job-name=krb5\n\n# KRB5CCNAME is inherited from the login node session\nkinit -kt \"$HOME/.k5/krb5.keytab\" $USER@D.PSI.CH\nsrun aklog\n\necho \"Here should go my batch script code.\"\n\necho \"No need to run 'kdestroy', as it may have to survive for running other jobs\"\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/","title":"Using merlin_rmount","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#using-merlin_rmount","title":"Using merlin_rmount","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#background","title":"Background","text":"<p>Merlin provides a command for mounting remote file systems, called <code>merlin_rmount</code>. This provides a helpful wrapper over the Gnome storage utilities (GIO and GVFS), and provides support for a wide range of remote filesystem protocols, including - SMB/CIFS (Windows shared folders) - WebDav - AFP - FTP, SFTP - complete list</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#usage","title":"Usage","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#start-a-session","title":"Start a session","text":"<p>First, start a new session. This will start a new bash shell in the current terminal where you can add further commands.</p> Text Only<pre><code>$ merlin_rmount --init\n[INFO] Starting new D-Bus RMOUNT session\n\n(RMOUNT STARTED) [bliven_s@login002 ~]$\n</code></pre> <p>Note that behind the scenes this is creating a new dbus daemon. 
Running multiple daemons on the same login node leads to unpredictable results, so it is best not to initialize multiple sessions in parallel.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#standard-endpoints","title":"Standard Endpoints","text":"<p>Standard endpoints can be mounted using</p> Text Only<pre><code>merlin_rmount --select-mount\n</code></pre> <p>Select the desired URL using the arrow keys.</p> <p></p> <p>From this list any of the standard supported endpoints can be mounted.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#other-endpoints","title":"Other endpoints","text":"<p>Other endpoints can be mounted using the <code>merlin_rmount --mount <endpoint></code> command.</p> <p></p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#accessing-files","title":"Accessing Files","text":"<p>After mounting a volume the script will print the mountpoint. It should be of the form</p> Text Only<pre><code>/run/user/$UID/gvfs/<endpoint>\n</code></pre> <p>where <code>$UID</code> gives your Unix user ID (a 5-digit number, also viewable with <code>id -u</code>) and <code><endpoint></code> is some string generated from the mount options.</p> <p>For convenience, it may be useful to add a symbolic link for this gvfs directory. For instance, this would allow all volumes to be accessed in ~/mnt/:</p> Text Only<pre><code>ln -s /run/user/$UID/gvfs ~/mnt\n</code></pre> <p>Files are accessible as long as the <code>merlin_rmount</code> shell remains open.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#disconnecting","title":"Disconnecting","text":"<p>To disconnect, close the session with one of the following:</p> <ul> <li>The exit command</li> <li>CTRL-D</li> <li>Closing the terminal</li> </ul> <p>Disconnecting will unmount all volumes.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#alternatives","title":"Alternatives","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#thunar","title":"Thunar","text":"<p>Users who prefer a GUI file browser can use the <code>thunar</code> command, which opens a graphical file browser. This is also available in NoMachine sessions in the bottom bar (1). Thunar supports the same remote filesystems as <code>merlin_rmount</code>; just type the URL in the address bar (2).</p> <p></p> <p>When using thunar within a NoMachine session, file transfers continue after closing NoMachine (as long as the NoMachine session stays active).</p> <p>Files can also be accessed at the command line as needed (see 'Accessing Files' above).</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin-rmount/#resources","title":"Resources","text":"<ul> <li>BIO docs on using these tools for transferring EM data</li> <li>Red Hat docs on GVFS</li> <li>gio reference</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/","title":"Merlin7 Tools","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#merlin7-tools","title":"Merlin7 Tools","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#about","title":"About","text":"<p>We provide tools to help users get the most out of using the cluster. 
The tools described here are organised by use case and include usage examples.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#files-and-directories","title":"Files and Directories","text":""},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#merlin_quotas","title":"<code>merlin_quotas</code>","text":"<p>This tool is available on all of the login nodes and provides a brief overview of a user's filesystem quotas. These are limits which restrict how much storage (or number of files) a user can create. A generic table of filesystem quotas can be found on the Storage page.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#example-1-viewing-quotas","title":"Example #1: Viewing quotas","text":"<p>Simply calling <code>merlin_quotas</code> will show you a table of our quotas:</p> Bash Session<pre><code>$ merlin_quotas\nPath SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %\n-------------- --------- ---------- ------- --------- ---------- -------\n/data/user 30.26G 1T 03% 367296 2097152 18%\n \u2514\u2500 <USERNAME>\n/afs/psi.ch 3.4G 9.5G 36% 0 0 00%\n \u2514\u2500 user/<USERDIR>\n/data/project 2.457T 10T 25% 58 2097152 00%\n \u2514\u2500 bio/shared\n/data/project 338.3G 10T 03% 199391 2097152 10%\n \u2514\u2500 bio/hpce\n</code></pre> <p>Tip</p> <p>You can change the width of the table by either passing <code>--no-wrap</code> (to disable wrapping of the Path) or <code>--width N</code> (to explicitly set some width by <code>N</code> characters).</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#example-2-project-view","title":"Example #2: Project view","text":"<p>The tool can also be used to list out information about what projects directories there are and who owns/manages these:</p> Bash Session<pre><code>$ merlin_quotas projects\nProject ID Path Owner Group\n---------- ------------------------ --------- --------------\n600000000 /data/project/bio/shared germann_e unx-merlin_adm\n600000001 /data/project/bio/hpce assman_g unx-merlin_adm\n</code></pre> <p>By default this only shows information on projects that you have access to, but to view the whole list you can pass <code>--all</code> flag:</p> Bash Session<pre><code>$ merlin_quotas projects --all\nProject ID Path Owner Group\n---------- ------------------------------- -------------- -----------------\n500000000 /data/project/general/mcnp gac-mcnp unx-mcnp_all\n500000001 /data/project/general/vis_as talanov_v unx-vis_as\n500000002 /data/project/general/mmm krack org-7302\n500000003 /data/project/general laeuch_a org-7201\n \u2514\u2500 LTC_CompPhys\n600000000 /data/project/bio/shared germann_e unx-merlin_adm\n600000001 /data/project/bio/hpce assman_g unx-merlin_adm\n600000002 /data/project/bio/abrahams abrahams_j unx-bio_abrahams\n600000003 /data/project/bio/benoit benoit_r unx-bio_benoit\n600000004 /data/project/bio/ishikawa ishikawa unx-bio_ishikawa\n600000005 /data/project/bio/kammerer kammerer_r unx-bio_kammerer\n600000006 /data/project/bio/korkhov korkhov_v unx-bio_korkhov\n600000007 /data/project/bio/luo luo_j unx-bio_luo\n600000008 /data/project/bio/mueller mueller_e unx-bio_mueller\n600000009 /data/project/bio/poghosyan poghosyan_e unx-bio_poghosyan\n600000010 /data/project/bio/schertler schertler_g unx-bio_schertler\n600000011 /data/project/bio/shivashankar shivashankar_g unx-bio_shivashan\n600000012 /data/project/bio/standfuss standfuss unx-bio_standfuss\n600000013 /data/project/bio/steinmetz steinmetz unx-bio_steinmetz\n</code></pre> <p>Tip</p> <p>As above you can change the table 
width by passing either <code>--no-wrap</code> or <code>--width N</code>.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/merlin_tools/#example-3-project-config","title":"Example #3: Project config","text":"<p>To make tracking quotas of projects easier, <code>merlin_quotas</code> generates a config file in your home directory (called <code>~/.merlin_quotas</code>) which defines the projects to show when you call the tool.</p> <p>The config file simply contains a list (one per line) of project IDs which should be tracked. In theory any (or all available) projects can be tracked, but due to UNIX and Lustre permissions, accessing quota information for a project you're not a member of is not possible.</p> <p>Updating the project config</p> <p>If you are added to or removed from a project, you can update this config file by calling <code>merlin_quotas genconf --all-projects --force</code>. The <code>--all-projects</code> flag will fully check your possible membership of all projects, and the <code>--force</code> flag will overwrite your existing config file. You can also edit the file by hand (not recommended).</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/","title":"Remote Desktop Access to Merlin7","text":""},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#remote-desktop-access-to-merlin7","title":"Remote Desktop Access to Merlin7","text":""},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#overview","title":"Overview","text":"<p>Merlin7 NoMachine provides users with remote desktop access to the Merlin7 computing environment. This service enables users to connect to their computing resources from any location, whether they are inside the PSI network or accessing from outside via secure methods.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#accessing-merlin7-nomachine","title":"Accessing Merlin7 NoMachine","text":""},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#from-inside-psi","title":"From Inside PSI","text":"<p>If you are inside the PSI network, you can directly connect to the Merlin7 NoMachine service without the need to go through another service.</p> <ol> <li>Ensure Network Connectivity: Make sure you are connected to the PSI internal network.</li> <li>Choose Your Access Method: You can access Merlin7 using either a web browser or the NoMachine client.</li> </ol>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#method-1-using-a-web-browser","title":"Method 1: Using a Web Browser","text":"<p>Open your web browser and navigate to https://merlin7-nx.psi.ch:4443.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#method-2-using-the-nomachine-client","title":"Method 2: Using the NoMachine Client","text":"<p>Settings for the NoMachine client:</p> <ul> <li>Host: <code>merlin7-nx.psi.ch</code></li> <li>Port: <code>4000</code></li> <li>Protocol: <code>NX</code></li> <li>Authentication: <code>Use password authentication</code></li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#from-outside-psi","title":"From Outside PSI","text":"<p>Users outside the PSI network have two options for accessing the Merlin7 NoMachine service: through <code>nx.psi.ch</code> or via a VPN connection.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#option-1-via-nxpsich","title":"Option 1: Via <code>nx.psi.ch</code>","text":"<p>Documentation about the <code>nx.psi.ch</code> service can be found here.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#using-a-web-browser","title":"Using a Web Browser","text":"<p>Open your web browser and navigate to 
https://nx.psi.ch.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#using-the-nomachine-client","title":"Using the NoMachine Client","text":"<p>Settings for the NoMachine client:</p> <ul> <li>Host: <code>nx.psi.ch</code></li> <li>Port: <code>4000</code></li> <li>Protocol: <code>NX</code></li> <li>Authentication: <code>Use password authentication</code></li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#option-2-via-vpn","title":"Option 2: Via VPN","text":"<p>Alternatively, you can use a VPN connection to access Merlin7 as if you were inside the PSI network.</p> <ol> <li>Request VPN Access: Contact the IT department to request VPN access if you do not already have it. Submit a request through the PSI Service Now ticketing system: VPN Access (PSI employees).</li> <li>Connect to the VPN: Once access is granted, connect to the PSI VPN using your credentials.</li> <li>Access Merlin7 NoMachine: Once connected to the VPN, you can access Merlin7 using either a web browser or the NoMachine client as if you were inside the PSI network.</li> </ol>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#the-nomachine-client","title":"The NoMachine Client","text":""},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#installation","title":"Installation","text":""},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#windows","title":"Windows","text":"<p>The NoMachine client is available for PSI Windows computers in the Software Kiosk under the name NX Client.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#macos-and-linux","title":"macOS and Linux","text":"<p>The NoMachine client can be downloaded from NoMachine's download page.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#connection-configuration","title":"Connection Configuration","text":"<ol> <li>Launch NoMachine Client: Open the NoMachine client on your computer.</li> <li> <p>Create a New Connection: Click the Add button to create a new connection.</p> <ul> <li>On the Address tab configure:</li> <li>Name: Enter a name for your connection. This can be anything.</li> <li>Host: Enter the appropriate hostname (e.g. 
<code>merlin7-nx.psi.ch</code>).</li> <li>Port: Enter <code>4000</code>.</li> <li>Protocol: Select <code>NX</code>.</li> </ul> <p></p> <ul> <li>On the Configuration tab ensure:</li> <li>Authentication: Select <code>Use password authentication</code>.</li> </ul> <p></p> <ul> <li>Click the Add button to finish creating the new connection.</li> </ul> </li> </ol>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#authenticating","title":"Authenticating","text":"<p>When prompted, use your PSI credentials to authenticate.</p> <p></p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#managing-sessions","title":"Managing Sessions","text":"<p>The Merlin7 NoMachine service is managed through a front-end server and back-end nodes, facilitating balanced and efficient access to remote desktop sessions.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#architecture-overview","title":"Architecture Overview","text":"<ul> <li>Front-End Server: <code>merlin7-nx.psi.ch</code></li> <li>Serves as the entry point for users connecting to the NoMachine service.</li> <li> <p>Handles load-balancing and directs users to available back-end nodes.</p> </li> <li> <p>Back-End Nodes:</p> </li> <li><code>login001.merlin7.psi.ch</code></li> <li><code>login002.merlin7.psi.ch</code></li> <li>These nodes host the NoMachine desktop service and manage the individual desktop sessions.</li> </ul> <p>Access to the login node desktops must be initiated through the <code>merlin7-nx.psi.ch</code> front-end. The front-end service will distribute sessions across available nodes in the back-end, ensuring optimal resource usage.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#opening-nomachine-desktop-sessions","title":"Opening NoMachine Desktop Sessions","text":"<p>When connecting to the <code>merlin7-nx.psi.ch</code> front-end, a new session automatically opens if no existing session is found. Users can manage their sessions as follows:</p> <ul> <li>Reconnect to an Existing Session: If you have an active session, you can reconnect to it by selecting the appropriate icon in the NoMachine client interface. This allows you to resume work without losing any progress. </li> <li>Create a Second Session: If you require a separate session, you can select the <code>New Desktop</code> button. This option creates a second session on another login node, provided the node is available and operational.</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#session-management-considerations","title":"Session Management Considerations","text":"<ul> <li>Load Balancing: The front-end service ensures that sessions are evenly distributed across the available back-end nodes to optimize performance and resource utilization.</li> <li>Session Limits: Users are limited to one session per back-end node to maintain system stability and efficiency.</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#support-and-resources","title":"Support and Resources","text":"<p>If you encounter any issues or need further assistance with the Merlin7 NoMachine service, support is available via email. Please contact us at merlin-admins@lists.psi.ch, and our support team will be happy to assist you.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#advanced-display-settings","title":"Advanced Display Settings","text":"<p>NoMachine provides several options to optimize the display settings for better performance and clarity. 
These settings can be accessed and adjusted when creating a new session or by clicking the top right corner of a running session.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/nomachine/#prevent-rescaling","title":"Prevent Rescaling","text":"<p>Preventing rescaling can help eliminate \"blurriness\" in your display, though it may affect performance. Adjust these settings based on your performance needs:</p> <ul> <li>Display: Choose <code>Resize remote display</code> (forces 1:1 pixel sizes)</li> <li>Display > Change settings > Quality: Choose medium-best quality</li> <li>Display > Change settings > Modify the advanced display settings</li> <li>Check: Disable network-adaptive display quality (turns off lossy compression)</li> <li>Check: Disable client side image post-processing</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/software-repositories/","title":"Software repositories","text":""},{"location":"merlin7/02-How-To-Use-Merlin/software-repositories/#software-repositories","title":"Software repositories","text":""},{"location":"merlin7/02-How-To-Use-Merlin/software-repositories/#module-systems-in-merlin7","title":"Module Systems in Merlin7","text":"<p>Merlin7 provides a modular environment to ensure flexibility, compatibility, and optimized performance. The system supports three primary module types: PSI Environment Modules (PModules), Spack Modules, and Cray Environment Modules.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/software-repositories/#psi-environment-modules-pmodules","title":"PSI Environment Modules (PModules)","text":"<p>The PModules system, developed by PSI, is the officially supported module system on Merlin7. It is the preferred choice for accessing validated software across a wide range of applications.</p> <p>Key Features: * Expert Deployment: Each package is deployed and maintained by specific experts to ensure reliability and compatibility. * Broad Availability: Commonly used software, such as OpenMPI, ANSYS, MATLAB, and other, is provided within PModules. * Custom Requests: If a package, version, or feature is missing, users can contact the support team to explore feasibility for installation.</p> <p>Tip</p> <p>For further information about PModules on Merlin7 please refer to the PSI Modules chapter.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/software-repositories/#spack-modules","title":"Spack Modules","text":"<p>Merlin7 also provides Spack modules, offering a modern and flexible package management system. Spack supports a wide variety of software packages and versions. For more information, refer to the external PSI Spack documentation.</p> <p>Tip</p> <p>For further information about Spack on Merlin7 please refer to the Spack chapter.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/software-repositories/#cray-environment-modules","title":"Cray Environment Modules","text":"<p>Merlin7 also supports Cray Environment Modules, which include compilers, MPI implementations, and libraries optimized for Cray systems. However, Cray modules are not recommended as the default choice due to potential backward compatibility issues when the Cray Programming Environment (CPE) is upgraded to a newer version.</p> <p>Recommendations: * Compiling Software: Cray modules can be used when optimization for Cray hardware is essential. 
* General Use: For most applications, prefer PModules, which ensure stability, backward compatibility, and long-term support.</p> <p>Tip</p> <p>For further information about CPE on Merlin7 please refer to the Cray Modules chapter.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/","title":"Configuring SSH Keys in Merlin","text":""},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#configuring-ssh-keys-in-merlin","title":"Configuring SSH Keys in Merlin","text":"<p>Merlin users sometimes need to access the different Merlin services without being constantly prompted for a password. One can achieve this with Kerberos authentication; however, in some cases software requires the setup of SSH Keys. One example is ANSYS Fluent: when used interactively, the GUI communicates with the different nodes through the SSH protocol, and the use of SSH Keys is enforced.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#setting-up-ssh-keys-on-merlin","title":"Setting up SSH Keys on Merlin","text":"<p>For security reasons, users must always protect SSH Keys with a passphrase.</p> <p>Users can check whether an SSH key already exists; keys are placed in the ~/.ssh/ directory. <code>RSA</code> is usually the default key type, and the files there would be <code>id_rsa</code> (private key) and <code>id_rsa.pub</code> (public key).</p> Bash<pre><code>ls ~/.ssh/id*\n</code></pre> <p>For creating SSH RSA Keys, one should:</p> <ol> <li>Run <code>ssh-keygen</code>; a passphrase will be requested twice. You must remember this passphrase for the future.<ul> <li>For security reasons, always protect the key with a passphrase. The only exception is when running ANSYS software, which in general should use a key without a passphrase to simplify running the software in Slurm.</li> <li>This will generate a private key id_rsa, and a public key id_rsa.pub in your ~/.ssh directory.</li> </ul> </li> <li>Add your public key to the <code>authorized_keys</code> file, and ensure proper permissions for that file, as follows:</li> </ol> Bash<pre><code>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys\nchmod 0600 ~/.ssh/authorized_keys\n</code></pre> <ol> <li>Configure the SSH client in order to force the usage of the psi.ch domain for trusting keys:</li> </ol> Bash<pre><code>echo \"CanonicalizeHostname yes\" >> ~/.ssh/config\n</code></pre> <ol> <li>Configure further SSH options as follows:</li> </ol> Bash<pre><code>echo \"AddKeysToAgent yes\" >> ~/.ssh/config\necho \"ForwardAgent yes\" >> ~/.ssh/config\n</code></pre> <p>Other options may be added.</p> <ol> <li>Check that your SSH config file contains at least the lines mentioned in steps 3 and 4:</li> </ol> Bash Session<pre><code># cat ~/.ssh/config\nCanonicalizeHostname yes\nAddKeysToAgent yes\nForwardAgent yes\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#using-the-ssh-keys","title":"Using the SSH Keys","text":""},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#using-authentication-agent-in-ssh-session","title":"Using Authentication Agent in SSH session","text":"<p>By default, when accessing the login node via SSH (with <code>ForwardAgent=yes</code>), it will automatically add your SSH Keys to the authentication agent. Hence, no further action should be needed by the user. 
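You can quickly verify this by listing the keys currently known to the agent on the login node (a simple illustrative check):</p> Bash<pre><code>ssh-add -l\n</code></pre> <p>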
One can configure <code>ForwardAgent=yes</code> as follows:</p> <ul> <li>(Recommended) In your local Linux (workstation, laptop or desktop) add the following line in the <code>$HOME/.ssh/config</code> (or alternatively in <code>/etc/ssh/ssh_config</code>) file:</li> </ul> Text Only<pre><code>ForwardAgent yes\n</code></pre> <ul> <li>Alternatively, on each SSH you can add the option <code>ForwardAgent=yes</code> in the SSH command. In example:</li> </ul> Bash<pre><code>ssh -XY -o ForwardAgent=yes login001.merlin7.psi.ch\n</code></pre> <p>If <code>ForwardAgent</code> is not enabled as shown above, one needs to run the authentication agent and then add your key to the ssh-agent. This must be done once per SSH session, as follows:</p> <ul> <li>Run <code>eval $(ssh-agent -s)</code> to run the ssh-agent in that SSH session</li> <li>Check whether the authentication agent has your key already added:</li> </ul> Bash<pre><code>ssh-add -l | grep \"/data/user/$(whoami)/.ssh\"\n</code></pre> <ul> <li>If no key is returned in the previous step, you have to add the private key identity to the authentication agent. You will be requested for the passphrase of your key, and it can be done by running:</li> </ul> Bash<pre><code>ssh-add\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#using-authentication-agent-in-nomachine-session","title":"Using Authentication Agent in NoMachine Session","text":"<p>By default, when using a NoMachine session, the <code>ssh-agent</code> should be automatically started. Hence, there is no need of starting the agent or forwarding it.</p> <p>However, for NoMachine one always need to add the private key identity to the authentication agent. This can be done as follows:</p> <ol> <li>Check whether the authentication agent has already the key added:</li> </ol> <p></p>Bash<pre><code>ssh-add -l | grep \"/data/user/$(whoami)/.ssh\"\n</code></pre> 2. If no key is returned in the previous step, you have to add the private key identity to the authentication agent. You will be requested for the passphrase of your key, and it can be done by running:<p></p> Bash<pre><code>ssh-add\n</code></pre> <p>You just need to run it once per NoMachine session, and it would apply to all terminal windows within that NoMachine session.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#troubleshooting","title":"Troubleshooting","text":""},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#errors-when-running-ssh-add","title":"Errors when running 'ssh-add'","text":"<p>If the error <code>Could not open a connection to your authentication agent.</code> appears when running <code>ssh-add</code>, it means that the authentication agent is not running. 
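A minimal recovery sketch for the current shell (these are the same commands described earlier on this page) is:</p> Bash<pre><code># Start an agent in this shell and add your private key (the passphrase will be requested)\neval $(ssh-agent -s)\nssh-add\n</code></pre> <p>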
See the previous sections for the full procedure.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#addupdate-ssh-rsa-key-password","title":"Add/Update SSH RSA Key password","text":"<p>If an existing SSH key does not have a passphrase, or you want to replace an existing passphrase with a new one, you can do it as follows:</p> Bash<pre><code>ssh-keygen -p -f ~/.ssh/id_rsa\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#ssh-keys-deployed-but-not-working","title":"SSH Keys deployed but not working","text":"<p>Please ensure proper permissions of the involved files, and check for typos in the file names:</p> Bash<pre><code>chmod u+rwx,go-rwx,g+s ~/.ssh\nchmod u+rw-x,go-rwx ~/.ssh/authorized_keys\nchmod u+rw-x,go-rwx ~/.ssh/id_rsa\nchmod u+rw-x,go+r-wx ~/.ssh/id_rsa.pub\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/ssh-keys/#testing-ssh-keys","title":"Testing SSH Keys","text":"<p>Once the SSH key is created, you can test that it is valid as follows:</p> <ol> <li>Create a new SSH session in one of the login nodes:</li> </ol> Bash<pre><code>ssh login001\n</code></pre> <ol> <li>In the login node session, destroy any existing Kerberos ticket or active SSH key:</li> </ol> Bash<pre><code>kdestroy\nssh-add -D\n</code></pre> <ol> <li>Add the new private key identity to the authentication agent. You will be prompted for the passphrase.</li> </ol> Bash<pre><code>ssh-add\n</code></pre> <ol> <li>Check that your key is known to the SSH agent:</li> </ol> Bash<pre><code>ssh-add -l\n</code></pre> <ol> <li>SSH to the second login node. No password should be requested:</li> </ol> Bash<pre><code>ssh -vvv login002\n</code></pre> <p>If the last step succeeds, your SSH key is properly set up.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/","title":"Merlin7 Storage","text":""},{"location":"merlin7/02-How-To-Use-Merlin/storage/#merlin7-storage","title":"Merlin7 Storage","text":""},{"location":"merlin7/02-How-To-Use-Merlin/storage/#introduction","title":"Introduction","text":"<p>This document describes the different directories of the Merlin7 cluster.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/#backup-and-data-policies","title":"Backup and data policies","text":"<ul> <li>Users are responsible for backing up their own data. It is recommended to back up data on independent third-party systems (e.g. LTS, Archive, AFS, SwitchDrive, Windows Shares, etc.).</li> <li>When a user leaves PSI, they or their supervisor/team are responsible for backing up and moving the data out of the cluster: every few months, the storage space of former users who no longer have a valid PSI account will be recycled.</li> </ul> <p>Warning</p> <p>When a user leaves PSI and their account is removed, their storage space in Merlin may be recycled. Hence, when a user leaves PSI, they, their supervisor or team must ensure that the data is backed up to external storage!</p>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/#how-to-check-quotas","title":"How to check quotas","text":"<p>Some of the Merlin7 directories have quotas applied. Quotas can be checked with the <code>merlin_quotas</code> command, which shows all quotas for the different user storage directories and partitions (including AFS). 
To check your quotas, please run:</p> Bash Session<pre><code>$ merlin_quotas\nPath SpaceUsed SpaceQuota Space % FilesUsed FilesQuota Files %\n-------------- --------- ---------- ------- --------- ---------- -------\n/data/user 30.26G 1T 03% 367296 2097152 18%\n \u2514\u2500 <USERNAME>\n/afs/psi.ch 3.4G 9.5G 36% 0 0 0%\n \u2514\u2500 user/<USERDIR>\n/data/scratch 688.9M 2T 00% 368471 0 00%\n \u2514\u2500 shared\n/data/project 3.373T 11T 31% 425644 2097152 20%\n \u2514\u2500 bio/shared\n/data/project 4.142T 11T 38% 579596 2097152 28%\n \u2514\u2500 bio/hpce\n</code></pre> <p>Note</p> <p>On first use you will see a message about some configuration being generated; this is expected. Don't be surprised that it takes some time. After this, using <code>merlin_quotas</code> should be faster.</p> <p>The output shows the quota limits and how much of each quota you are using, for each filesystem that has quotas set. Notice that some users will have one or more <code>/data/project/...</code> directories showing, depending on whether you are part of a specific PSI research group or project.</p> <p>The general quota constraints for the different directories are shown in the table below. Further details on how to use <code>merlin_quotas</code> can be found on the Tools page.</p> <p>Tip</p> <p>If you're interested, you can retrieve the Lustre-based quota information directly by calling <code>lfs quota -h -p $(( 100000000 + $(id -u $USER) )) /data</code>. Using the <code>merlin_quotas</code> command is more convenient and shows all your relevant filesystem quotas.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/#merlin7-directories","title":"Merlin7 directories","text":"<p>Merlin7 offers the following directory classes for users:</p> <ul> <li><code>/data/user/<username></code>: Private user home directory</li> <li><code>/data/project/general</code>: project directory for Merlin</li> <li><code>/data/project/bio/$projectname</code>: project directory for BIO</li> <li><code>/data/project/mu3e/$projectname</code>: project directory for Mu3e</li> <li><code>/data/project/meg/$projectname</code>: project directory for MEG</li> <li><code>/scratch</code>: Local scratch disk (only visible by the node running a job).</li> <li><code>/data/scratch/shared</code>: Shared scratch disk (visible from all nodes).</li> </ul> <p>Tip</p> <p>In Lustre there is a concept called grace time. Filesystems have a block (amount of data) and an inode (number of files) quota. These quotas have soft and hard limits. Once the soft limit is reached, users can keep writing up to their hard limit during the grace period. Once the grace time or hard limit is reached, users will be unable to write and will need to remove data to get below the soft limit (or ask for a quota increase when this is possible, see the table below).</p> <p>Properties of the directory classes:</p> Directory Block Quota [Soft:Hard] Inode Quota [Soft:Hard] GraceTime Quota Change Policy: Block Quota Change Policy: Inodes Backup /data/user/$username PRJ [1TB:1.074TB] PRJ [2M:2.1M] 7d Immutable. Need a project. Changeable when justified. no /data/project/bio/$projectname PRJ [1TB:1.074TB] PRJ [1M:1.1M] 7d Subject to project requirements. Subject to project requirements. no /data/project/general/$projectname PRJ [1TB:1.074TB] PRJ [1M:1.1M] 7d Subject to project requirements. Subject to project requirements. no /data/scratch/shared USR [512GB:2TB] 7d Up to x2 when strongly justified. Changeable when justified. 
no /scratch Undef Undef N/A N/A N/A no <p>Warning</p> <p>The use of <code>/scratch</code> and <code>/data/scratch/shared</code> areas as an extension of the quota is forbidden. The <code>/scratch</code> and <code>/data/scratch/shared</code> areas must not contain final data. Keep in mind that auto cleanup policies in the <code>/scratch</code> and <code>/data/scratch/shared</code> areas are applied.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/#user-home-directory","title":"User home directory","text":"<p>This is the default directory users will land when login in to any Merlin7 machine. It is intended for your scripts, documents, software development and data. Do not use it for I/O-hungry tasks.</p> <p>The home directories are mounted in the login and computing nodes under the directory</p> <p>```bash /data/user/$username </p>Text Only<pre><code>Directory policies:\n\n* Read **[Important: Code of Conduct](../01-Quick-Start-Guide/code-of-conduct.md)** for more information about Merlin7 policies.\n* Is **forbidden** to use the home directories for IO-intensive tasks, instead use one of the **[scratch](storage.md#scratch-directories)** areas instead!\n* No backup policy is applied for the user home directories: **users are responsible for backing up their data**.\n\nHome directory quotas are defined in a per Lustre project basis. The quota can be checked using the `merlin_quotas` command described\n[above](storage.md#how-to-check-quotas).\n\n### Project data directory\n\nThis storage is intended for keeping large amounts of a project's data, where the data also can be\nshared by all members of the project (the project's corresponding UNIX group). We recommend to keep most data in\nproject related storage spaces, since it allows users to coordinate. Also, project spaces have more flexible policies\nregarding extending the available storage space.\n\nScientists can request a Merlin project space as described in **[[Accessing Merlin -> Requesting a Project]](../01-Quick-Start-Guide/requesting-projects.md)**.\nBy default, Merlin can offer **general** project space, centrally covered, as long as it does not exceed 10TB (otherwise, it has to be justified).\nGeneral Merlin projects might need to be reviewed after one year of their creation.\n\nOnce a Merlin project is created, the directory will be mounted in the login and computing nodes under the directory:\n\n```bash\n/data/project/general/$projectname\n</code></pre><p></p> <p>Project quotas are defined in a per Lustre project basis. Users can check the project quota by running the following command:</p> Bash<pre><code>lfs quota -h -p $projectid /data\n</code></pre> <p>Warning</p> <p>Checking quotas for the Merlin projects is not yet possible. In the future, a list of <code>projectid</code> will be provided, so users can check their quotas.</p> <p>Directory policies:</p> <ul> <li>Read Important: Code of Conduct for more information about Merlin7 policies.</li> <li>It is forbidden to use the data directories as <code>/scratch</code> area during a job's runtime, i.e. 
for high throughput I/O for a job's temporary files.<ul> <li>Please Use <code>/scratch</code>, <code>/data/scratch/shared</code> for this purpose.</li> </ul> </li> <li>No backups: users are responsible for managing the backups of their data directories.</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/#dedicated-project-directories","title":"Dedicated project directories","text":"<p>Some departments or divisions have bigger storage space requirements on Merlin7. At present, <code>bio</code>, <code>mu3e</code> and <code>meg</code> are the main ones. These are mounted under the following paths:</p> Bash<pre><code>/data/project/bio\n/data/project/mu3e\n/data/project/meg\n</code></pre> <p>They follow the same rules as the general projects, except that they have assigned more space.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/storage/#scratch-directories","title":"Scratch directories","text":"<p>There are two different types of scratch storage: local (<code>/scratch</code>) and shared (<code>/data/scratch/shared</code>).</p> <ul> <li>local scratch should be used for all jobs that do not require the scratch files to be accessible from multiple nodes, which is trivially true for all jobs running on a single node. Mount path:</li> </ul> Bash<pre><code>/scratch\n</code></pre> <ul> <li>shared scratch is intended for files that need to be accessible by multiple nodes, e.g. by a MPI-job where tasks are spread out over the cluster and all tasks need to do I/O on the same temporary files.</li> </ul> Bash<pre><code>/data/scratch/shared\n</code></pre> <p>Scratch directories policies:</p> <ul> <li>Read Important: Code of Conduct for more information about Merlin7 policies.</li> <li>By default, always use local first and only use shared if your specific use case requires it.</li> <li>Temporary files must be deleted at the end of the job by the user.<ul> <li>Remaining files will be deleted by the system if detected.</li> <li>Files not accessed within 28 days will be automatically cleaned up by the system.</li> <li>If for some reason the scratch areas get full, admins have the rights to cleanup the oldest data.</li> </ul> </li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/","title":"Transferring Data","text":""},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#transferring-data","title":"Transferring Data","text":""},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#overview","title":"Overview","text":"<p>Most data transfer methods support both sending and receiving, so you may initiate the transfer from either Merlin or the other system \u2014 depending on network visibility.</p> <ul> <li>From PSI Network to Merlin: Merlin login nodes are visible from the PSI network, so direct transfers using <code>rsync</code>, or ftp are generally preferable. 
Transfers from Merlin7 to PSI may require special firewall rules.</li> <li>From Merlin to the Internet: Merlin login nodes can access the internet with a limited set of protocols:<ul> <li>HTTP-based protocols on ports <code>80</code> or <code>445</code> (e.g., HTTPS, WebDAV).</li> <li>Other protocols (e.g., SSH, FTP, rsync daemon mode) require admin configuration, may only work with specific hosts, and might need new firewall rules.</li> </ul> </li> <li>From the Internet to PSI: Systems outside PSI can access the PSI Data Transfer Service at <code>datatransfer.psi.ch</code> using SSH-based protocols or Globus.</li> </ul> <p>Note</p> <p>SSH-based protocols using port <code>22</code> to most PSI servers are generally not permitted. However, transfers from any PSI host to Merlin7 using port 22 are allowed. Port <code>21</code> is also available for FTP transfers from PSI to Merlin7.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#choosing-the-best-transfer-method","title":"Choosing the best transfer method","text":"Scenario Recommended Method Reason Small dataset, Linux/macOS <code>rsync</code> Resume support, skips existing files, works over SSH Quick one-time small transfer <code>scp</code> Simple syntax, no need to install extra tools Large dataset, high speed needed (not sensitive) FTP via <code>service03.merlin7.psi.ch</code> Fastest transfer speed (unencrypted data channel) Large dataset, high speed needed (sensitive data) FTP via <code>ftp-encrypted.merlin7.psi.ch</code> Encrypted control & data channels for security, but slower than <code>service03</code> Windows interactive GUI transfer WinSCP User-friendly interface, PSI Software Kiosk, supports drag-and-drop Cross-platform interactive GUI transfer FileZilla User-friendly interface, works on Linux/macOS/Windows, supports drag-and-drop From the internet to PSI PSI Data Transfer Service Supports SSH-based protocols and Globus Need for sharing large files SWITCHfilesender Supports sharing large file and expiration date PSI -> Merlin7 over FTP Any FTP-based client Port 21 allowed from PSI to Merlin7 PSI -> Merlin7 over SSH Any SSH-based method Port 22 allowed from PSI to Merlin7 <p>The next chapters contain detailed information about the different transfer methods available on Merlin7.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#direct-transfer-via-merlin7-login-nodes","title":"Direct Transfer via Merlin7 Login Nodes","text":"<p>The following methods transfer data directly via the login nodes. They are suitable for use from within the PSI network.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#rsync-recommended-for-linuxmacos","title":"Rsync (Recommended for Linux/macOS)","text":"<p>Rsync is the preferred method for small datasets from Linux/macOS systems. It supports resuming interrupted transfers and skips already transferred files. Syntax:</p> Bash<pre><code>rsync -avAHXS <src> <dst>\n</code></pre> <p>An example for transferring local files to a Merlin project directory</p> Bash<pre><code>rsync -avAHXS ~/localdata $USER@login001.merlin7.psi.ch:/data/project/general/myproject/\n</code></pre> <p>Tip</p> <p>If a transfer is interrupted, just rerun the command: <code>rsync</code> will skip existing files.</p> <p>Warning</p> <p>Rsync uses SSH (port 22). For large datasets, transfer speed might be limited.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#scp","title":"SCP","text":"<p>SCP works similarly to <code>rsync</code> but does not support resuming interrupted transfers. 
It may be used for quick one-off transfers. Example:</p> Bash<pre><code>scp ~/localfile.txt $USER@login001.merlin7.psi.ch:/data/project/general/myproject/\n</code></pre>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#secure-ftp","title":"Secure FTP","text":"<p>A <code>vsftpd</code> service is available on the login nodes, providing high-speed transfers. Choose the server based on your speed vs. encryption needs:</p> <ul> <li><code>login001.merlin7.psi.ch</code>: Encrypted control & data channels. Use if your data is sensitive. Slower, but secure.</li> <li><code>service03.merlin7.psi.ch</code>: Encrypted control channel only. Use if your data can be transferred unencrypted. Fastest method.</li> </ul> <p>Tip</p> <p>The control channel is always encrypted, therefore, authentication is encrypted and secured.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#ui-based-clients-for-data-transfer","title":"UI-based Clients for Data Transfer","text":""},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#winscp-windows","title":"WinSCP (Windows)","text":"<p>Available in the Software Kiosk on PSI Windows machines.</p> <ul> <li>Using your PSI credentials, connect to<ul> <li>when using port 22, connect to <code>login001.merlin7.psi.ch</code> or <code>login002.merlin7.psi.ch</code>.</li> <li>when using port 21, connect to:<ul> <li><code>ftp-encrypted.merlin7.psi.ch</code>: Fast transfer rates. Both control and data channels encrypted.</li> <li><code>service03.merlin7.psi.ch</code>: Fastest transfer rates, but data channel not encrypted.</li> </ul> </li> </ul> </li> <li> <p>Drag and drop files between your PC and Merlin.</p> </li> <li> <p>FTP (port 21)</p> </li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#filezilla-linuxmacoswindows","title":"FileZilla (Linux/MacOS/Windows)","text":"<p>Download from FileZilla Project, or install from your Linux software repositories if available.</p> <ul> <li>Using your PSI credentials, connect to<ul> <li>when using port 22, connect to <code>login001.merlin7.psi.ch</code> or <code>login002.merlin7.psi.ch</code>.</li> <li>when using port 21, connect to:<ul> <li><code>ftp-encrypted.merlin7.psi.ch</code>: Fast transfer rates. Both control and data channels encrypted.</li> <li><code>service03.merlin7.psi.ch</code>: Fastest transfer rates, but data channel not encrypted.</li> </ul> </li> </ul> </li> <li>Supports drag-and-drop file transfers.</li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#sharing-files-with-switchfilesender","title":"Sharing Files with SWITCHfilesender","text":"<p>SWITCHfilesender is a Swiss-hosted installation of the FileSender project \u2014 a web-based application that allows authenticated users to securely and easily send arbitrarily large files to other users. 
Features:</p> <ul> <li>Secure large file transfers: Send files that exceed normal email attachment limits.</li> <li>Time-limited availability: Files are automatically deleted after the chosen expiration date or number of downloads.</li> <li>Voucher system: Authenticated users can send upload vouchers to external recipients without an account.</li> <li>Designed for research & education: Developed to meet the needs of universities and research institutions.</li> </ul> <p>About the authentication:</p> <ul> <li>It uses SimpleSAMLphp, supporting multiple authentication mechanisms: SAML2, LDAP, RADIUS and more.</li> <li>It's fully integrated with PSI's Authentication and Authorization Infrastructure (AAI).</li> <li>PSI employees can log in using their PSI account:</li> <li>Open SWITCHfilesender.</li> <li>Select PSI as the institution.</li> <li>Authenticate with your PSI credentials.</li> </ul> <p>The service is designed to send large files for temporary availability, not as a permanent publishing platform. Typical use case:</p> <ol> <li>Upload a file.</li> <li>Share the download link with a recipient.</li> <li>File remains available until the specified expiration date is reached, or the download limit is reached.</li> <li>The file is automatically deleted after expiration.</li> </ol> <p>Warning</p> <p>SWITCHfilesender is not a long-term storage or archiving solution.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#psi-data-transfer","title":"PSI Data Transfer","text":"<p>From August 2024, Merlin is connected to the PSI Data Transfer service, <code>datatransfer.psi.ch</code>. This is a central service managed by the Linux team. However, any problems or questions related to it can be directly reported to the Merlin administrators, which will forward the request if necessary.</p> <p>The PSI Data Transfer servers supports the following protocols:</p> <ul> <li>Data Transfer - SSH (scp / rsync)</li> <li>Data Transfer - Globus</li> </ul> <p>Notice that <code>datatransfer.psi.ch</code> does not allow SSH login, only <code>rsync</code>, <code>scp</code> and Globus access is allowed.</p> <p>Access to the PSI Data Transfer uses Multi factor authentication (MFA). Therefore, having the Microsoft Authenticator App is required as explained here.</p> <p>Tip</p> <p>Please follow the Official PSI Data Transfer documentation for further instructions.</p>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#connecting-to-merlin7-from-outside-psi","title":"Connecting to Merlin7 from outside PSI","text":"<p>Merlin7 is fully accessible from within the PSI network. To connect from outside you can use:</p> <ul> <li>VPN (alternate instructions)</li> <li>SSH hopx<ul> <li>Please avoid transferring big amount data through hop</li> </ul> </li> <li>No Machine<ul> <li>Remote Interactive Access through 'nx.psi.ch'</li> <li>Please avoid transferring big amount of data through NoMachine</li> </ul> </li> </ul>"},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#connecting-from-merlin7-to-outside-file-shares","title":"Connecting from Merlin7 to outside file shares","text":""},{"location":"merlin7/02-How-To-Use-Merlin/transfer-data/#merlin_rmount-command","title":"<code>merlin_rmount</code> command","text":"<p>Merlin provides a command for mounting remote file systems, called <code>merlin_rmount</code>. 
This provides a helpful wrapper around the GNOME storage utilities and supports a wide range of remote file systems, including</p> <ul> <li>SMB/CIFS (Windows shared folders)</li> <li>WebDav</li> <li>AFP</li> <li>FTP, SFTP</li> <li>others</li> </ul> <p>More instructions on using <code>merlin_rmount</code></p>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/","title":"Running Interactive Jobs","text":""},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#running-interactive-jobs","title":"Running Interactive Jobs","text":""},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#the-merlin7-interactive-partition","title":"The Merlin7 'interactive' partition","text":"<p>On the <code>merlin7</code> cluster, it is recommended to always run interactive jobs on the <code>interactive</code> partition. This partition allows CPU oversubscription (up to four users may share the same CPU) and has the highest scheduling priority. Access to this partition is typically quick, making it a convenient extension of the login nodes for interactive workloads.</p> <p>On the <code>gmerlin7</code> cluster, additional interactive partitions are available, but these are primarily intended for CPU-only workloads (such as compiling GPU-based software, or creating an allocation for submitting jobs to Grace-Hopper nodes).</p> <p>Warning</p> <p>Because GPU resources are scarce and expensive, interactive allocations that request GPUs should only be submitted when strictly necessary and well justified.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#running-interactive-jobs_1","title":"Running interactive jobs","text":"<p>There are two ways of running interactive jobs in Slurm, using the <code>salloc</code> and <code>srun</code> commands:</p> <ul> <li><code>salloc</code>: to obtain a Slurm job allocation (a set of nodes), execute command(s), and then release the allocation when the command is finished.</li> <li><code>srun</code>: is used for running parallel tasks.</li> </ul>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#srun","title":"srun","text":"<p><code>srun</code> is used to run parallel jobs in the batch system. It can be used within a batch script (which can be run with <code>sbatch</code>), or within a job allocation (which can be created with <code>salloc</code>). It can also be used as a direct command (for example, from the login nodes).</p> <p>When used inside a batch script or during a job allocation, <code>srun</code> is restricted to the amount of resources allocated by the <code>sbatch</code>/<code>salloc</code> commands. In <code>sbatch</code>, these resources are usually defined inside the batch script with the format <code>#SBATCH <option>=<value></code>. In other words, if your batch script or allocation defines 88 tasks (with 1 thread per core) and 2 nodes, <code>srun</code> is restricted to this amount of resources (you can use less, but never exceed those limits).</p> <p>When used from the login node, it is usually used to run a specific command or software interactively. <code>srun</code> is a blocking process: it will block the bash prompt until the <code>srun</code> command finishes, unless you run it in the background with <code>&</code>, as in the sketch below. 
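For example (a minimal sketch; the task count and program name are placeholders to adapt):</p> Bash<pre><code># Run a small interactive task in the background so the prompt stays usable\nsrun --clusters=merlin7 --partition=interactive --ntasks=1 ./my_program &\n\n# Check whether it has finished\njobs\n</code></pre> <p>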
This can be very useful to run interactive software which pops up a Window and then submits jobs or run sub-tasks in the background (in example, Relion, cisTEM, etc.)</p> <p>Refer to <code>man srun</code> for exploring all possible options for that command.</p> [Show 'srun' example]: Running 'hostname' command on 3 nodes, using 2 cores (1 task/core) per node <pre>caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --ntasks=6 --ntasks-per-node=2 --nodes=3 hostname\ncn001.merlin7.psi.ch\ncn001.merlin7.psi.ch\ncn002.merlin7.psi.ch\ncn002.merlin7.psi.ch\ncn003.merlin7.psi.ch\ncn003.merlin7.psi.ch\n</pre>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#salloc","title":"salloc","text":"<p><code>salloc</code> is used to obtain a Slurm job allocation (a set of nodes). Once job is allocated, users are able to execute interactive command(s). Once finished (<code>exit</code> or <code>Ctrl+D</code>), the allocation is released. <code>salloc</code> is a blocking command, it is, command will be blocked until the requested resources are allocated. </p> <p>When running <code>salloc</code>, once the resources are allocated, by default the user will get a new shell on one of the allocated resources (if a user has requested few nodes, it will prompt a new shell on the first allocated node). However, this behaviour can be changed by adding a shell (<code>$SHELL</code>) at the end of the <code>salloc</code> command. In example:</p> Bash<pre><code># Typical 'salloc' call\nsalloc --clusters=merlin7 --partition=interactive -N 2 -n 2\n\n# Custom 'salloc' call\n# - $SHELL will open a local shell on the login node from where ``salloc`` is running\nsalloc --clusters=merlin7 --partition=interactive -N 2 -n 2 $SHELL\n</code></pre> [Show 'salloc' example]: Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - Default <pre>caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive -N 2 -n 2\nsalloc: Granted job allocation 161\nsalloc: Nodes cn[001-002] are ready for job\n\ncaubet_m@login001:~> srun hostname\ncn002.merlin7.psi.ch\ncn001.merlin7.psi.ch\n\ncaubet_m@login001:~> exit\nexit\nsalloc: Relinquishing job allocation 161\n</pre> [Show 'salloc' example]: Allocating 2 cores (1 task/core) in 2 nodes (1 core/node) - $SHELL <pre>caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --ntasks=2 --nodes=2 $SHELL\nsalloc: Granted job allocation 165\nsalloc: Nodes cn[001-002] are ready for job\ncaubet_m@login001:~> srun hostname\ncn001.merlin7.psi.ch\ncn002.merlin7.psi.ch\ncaubet_m@login001:~> exit\nexit\nsalloc: Relinquishing job allocation 165\n</pre>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#running-interactive-jobs-with-x11-support","title":"Running interactive jobs with X11 support","text":""},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#requirements","title":"Requirements","text":""},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#graphical-access","title":"Graphical access","text":"<p>NoMachine is the official supported service for graphical access in the Merlin cluster. This service is running on the login nodes. 
Check the document {Accessing Merlin -> NoMachine} for details about how to connect to the NoMachine service in the Merlin cluster.</p> <p>For other non officially supported graphical access (X11 forwarding):</p> <ul> <li> <p>For Linux clients, please follow {How To Use Merlin -> Accessing from Linux Clients}</p> </li> <li> <p>For Windows clients, please follow {How To Use Merlin -> Accessing from Windows Clients}</p> </li> <li>For MacOS clients, please follow {How To Use Merlin -> Accessing from MacOS Clients}</li> </ul>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#srun-with-x11-support","title":"'srun' with x11 support","text":"<p>Merlin6 and merlin7 clusters allow running any windows based applications. For that, you need to add the option <code>--x11</code> to the <code>srun</code> command. In example:</p> Bash<pre><code>srun --clusters=merlin7 --partition=interactive --x11 sview\n</code></pre> <p>will popup a X11 based slurm view of the cluster.</p> <p>In the same manner, you can create a bash shell with x11 support. For doing that, you need to add the option <code>--pty</code> to the <code>srun --x11</code> command. Once resource is allocated, from there you can interactively run X11 and non-X11 based commands.</p> Bash<pre><code>srun --clusters=merlin7 --partition=interactive --x11 --pty bash\n</code></pre> [Show 'srun' with X11 support examples] <pre>caubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 sview\n\ncaubet_m@login001:~>\n\ncaubet_m@login001:~> srun --clusters=merlin7 --partition=interactive --x11 --pty bash\n\ncaubet_m@cn003:~> sview\n\ncaubet_m@cn003:~> echo \"This was an example\"\nThis was an example\n\ncaubet_m@cn003:~> exit\nexit\n</pre>"},{"location":"merlin7/03-Slurm-General-Documentation/interactive-jobs/#salloc-with-x11-support","title":"'salloc' with x11 support","text":"<p>Merlin6 and merlin7 clusters allow running any windows based applications. For that, you need to add the option <code>--x11</code> to the <code>salloc</code> command. In example:</p> Bash<pre><code>salloc --clusters=merlin7 --partition=interactive --x11 sview\n</code></pre> <p>will popup a X11 based slurm view of the cluster.</p> <p>In the same manner, you can create a bash shell with x11 support. For doing that, you need to add to run just <code>salloc --clusters=merlin7 --partition=interactive --x11</code>. 
Once the resources are allocated, you can interactively run X11 and non-X11 based commands from there.</p> Bash<pre><code>salloc --clusters=merlin7 --partition=interactive --x11\n</code></pre> [Show 'salloc' with X11 support examples] <pre>caubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --x11 sview\nsalloc: Granted job allocation 174\nsalloc: Nodes cn001 are ready for job\nsalloc: Relinquishing job allocation 174\n\ncaubet_m@login001:~> salloc --clusters=merlin7 --partition=interactive --x11\nsalloc: Granted job allocation 175\nsalloc: Nodes cn001 are ready for job\ncaubet_m@cn001:~>\n\ncaubet_m@cn001:~> sview\n\ncaubet_m@cn001:~> echo \"This was an example\"\nThis was an example\n\ncaubet_m@cn001:~> exit\nexit\nsalloc: Relinquishing job allocation 175\n</pre>"},{"location":"merlin7/03-Slurm-General-Documentation/merlin7-configuration/","title":"Slurm cluster 'merlin7'","text":""},{"location":"merlin7/03-Slurm-General-Documentation/merlin7-configuration/#slurm-cluster-merlin7","title":"Slurm cluster 'merlin7'","text":"<p>This documentation shows the basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/merlin7-configuration/#infrastructure","title":"Infrastructure","text":""},{"location":"merlin7/03-Slurm-General-Documentation/merlin7-configuration/#hardware","title":"Hardware","text":"Text Only<pre><code>* 2 CPU-only login nodes\n* 77 CPU-only compute nodes\n* 5 GPU A100 nodes\n* 8 GPU Grace Hopper nodes\n</code></pre> <p>The specification of the node types is:</p> Node #Nodes CPU RAM GRES Login Nodes 2 2x AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) 512GB DDR4 3200MHz CPU Nodes 77 2x AMD EPYC 7742 (x86_64 Rome, 64 Cores, 2.25GHz) 512GB DDR4 3200MHz A100 GPU Nodes 5 2x AMD EPYC 7713 (x86_64 Milan, 64 Cores, 3.2GHz) 512GB DDR4 3200MHz 4 x NV_A100 (80GB) GH GPU Nodes 3 2x NVidia Grace Neoverse-V2 (SBSA ARM 64bit, 144 Cores, 3.1GHz) 2x 480GB DDR5X (CPU+GPU) 4 x NV_GH200 (120GB)"},{"location":"merlin7/03-Slurm-General-Documentation/merlin7-configuration/#network","title":"Network","text":"<p>The Merlin7 cluster builds on top of HPE/Cray technologies, including a high-performance network fabric called Slingshot. This network fabric is able to provide up to 200 Gbit/s throughput between nodes. Further information on Slingshot can be found at HPE and at https://www.glennklockwood.com/garden/slingshot.</p> <p>Through software interfaces like libFabric (which is available on Merlin7), applications can leverage the network seamlessly.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/merlin7-configuration/#storage","title":"Storage","text":"<p>Unlike previous iterations of the Merlin HPC clusters, Merlin7 does not have any local storage. 
Instead storage for the entire cluster is provided through a dedicated storage appliance from HPE/Cray called ClusterStor.</p> <p>The appliance is built of several storage servers:</p> <ul> <li>2 management nodes</li> <li>2 MDS servers, 12 drives per server, 2.9TiB (Raid10)</li> <li>8 OSS-D servers, 106 drives per server, 14.5 T.B HDDs (Gridraid / Raid6)</li> <li>4 OSS-F servers, 12 drives per server 7TiB SSDs (Raid10)</li> </ul> <p>With effective storage capacity of:</p> <ul> <li>10 PB HDD<ul> <li>value visible on linux: HDD 9302.4 TiB</li> </ul> </li> <li>162 TB SSD<ul> <li>value visible on linux: SSD 151.6 TiB</li> </ul> </li> <li>23.6 TiB on Metadata</li> </ul> <p>The storage is directly connected to the cluster (and each individual node) through the Slingshot NIC.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/","title":"Slurm merlin7 Configuration","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#slurm-merlin7-configuration","title":"Slurm merlin7 Configuration","text":"<p>This documentation shows basic Slurm configuration and options needed to run jobs in the Merlin7 cluster.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#public-partitions-configuration-summary","title":"Public partitions configuration summary","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-public-partitions","title":"CPU public partitions","text":"PartitionName DefaultTime MaxTime Priority Account Per Job Limits Per User Limits general 1-00:00:00 7-00:00:00 Low merlin cpu=1024,mem=1920G cpu=1024,mem=1920G daily 0-01:00:00 1-00:00:00 Medium merlin cpu=1024,mem=1920G cpu=2048,mem=3840G hourly 0-00:30:00 0-01:00:00 High merlin cpu=2048,mem=3840G cpu=8192,mem=15T interactive 0-04:00:00 0-12:00:00 Highest merlin cpu=16,mem=30G,node=1 cpu=32,mem=60G,node=1"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#gpu-public-partitions","title":"GPU public partitions","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#a100-nodes","title":"A100 nodes","text":"PartitionName DefaultTime MaxTime Priority Account Per Job Limits Per User Limits a100-general 1-00:00:00 7-00:00:00 Low merlin gres/gpu=4 gres/gpu=8 a100-daily 0-01:00:00 1-00:00:00 Medium merlin gres/gpu=8 gres/gpu=8 a100-hourly 0-00:30:00 0-01:00:00 High merlin gres/gpu=8 gres/gpu=8 a100-interactive 0-01:00:00 0-12:00:00 Very High merlin cpu=16,gres/gpu=1,mem=60G,node=1 cpu=16,gres/gpu=1,mem=60G,node=1"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#grace-hopper-nodes","title":"Grace-Hopper nodes","text":"PartitionName DefaultTime MaxTime Priority Account Per Job Limits Per User Limits gh-general 1-00:00:00 7-00:00:00 Low merlin gres/gpu=4 gres/gpu=8 gh-daily 0-01:00:00 1-00:00:00 Medium merlin gres/gpu=8 gres/gpu=8 gh-hourly 0-00:30:00 0-01:00:00 High merlin gres/gpu=8 gres/gpu=8 gh-interactive 0-01:00:00 0-12:00:00 Very High merlin cpu=16,gres/gpu=1,mem=46G,node=1 cpu=16,gres/gpu=1,mem=46G,node=1"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-cluster-merlin7","title":"CPU cluster: merlin7","text":"<p>By default, jobs will be submitted to <code>merlin7</code>, as it is the primary cluster configured on the login nodes. Specifying the cluster name is typically unnecessary unless you have defined environment variables that could override the default cluster name. 
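If you are unsure which clusters are visible from the login nodes, a quick check (a minimal sketch) is:</p> Bash<pre><code># Show a partition summary for both Merlin7 Slurm clusters\nsinfo --clusters=merlin7,gmerlin7 --summarize\n</code></pre> <p>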
However, when necessary, one can specify the cluster as follows: </p>Bash<pre><code>#SBATCH --cluster=merlin7\n</code></pre><p></p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-general-configuration","title":"CPU general configuration","text":"<p>The Merlin7 CPU cluster is configured with the <code>CR_CORE_MEMORY</code> and <code>CR_ONE_TASK_PER_CORE</code> options.</p> <ul> <li>This configuration treats both cores and memory as consumable resources.</li> <li>Since the nodes are running with hyper-threading enabled, each core thread is counted as a CPU to fulfill a job's resource requirements.</li> </ul> <p>By default, Slurm will allocate one task per core, which means: * Each task will consume 2 CPUs, regardless of whether both threads are actively used by the job.</p> <p>This behavior ensures consistent resource allocation but may result in underutilization of hyper-threading in some cases.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-nodes-definition","title":"CPU nodes definition","text":"<p>The table below provides an overview of the Slurm configuration for the different node types in the Merlin7 cluster. This information is essential for understanding how resources are allocated, enabling users to tailor their submission scripts accordingly.</p> Nodes Sockets CoresPerSocket Cores ThreadsPerCore CPUs MaxMemPerNode DefMemPerCPU Features login[001-002] 2 64 128 2 256 480G 1920M AMD_EPYC_7713 cn[001-077] 2 64 128 2 256 480G 1920M AMD_EPYC_7713 <p>Notes on memory configuration: * Memory allocation options: To request additional memory, use the following options in your submission script: * <code>--mem=<mem_in_MB></code>: Allocates memory per node. * <code>--mem-per-cpu=<mem_in_MB></code>: Allocates memory per CPU (equivalent to a core thread).</p> <p>The total memory requested cannot exceed the <code>MaxMemPerNode</code> value. * Impact of disabling Hyper-Threading: Using the <code>--hint=nomultithread</code> option disables one thread per core, effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly adjusted.</p> <p>For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended. In such cases, you should double the <code>--mem-per-cpu</code> value to account for the reduced number of threads.</p> <p>Tip</p> <p>Always verify the Slurm <code>/var/spool/slurmd/conf-cache/slurm.conf</code> configuration file for potential changes.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#user-and-job-limits-with-qos","title":"User and job limits with QoS","text":"<p>In the <code>merlin7</code> CPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster efficiency. However, applying limits can occasionally impact the cluster\u2019s utilization. For example, user-specific limits may result in pending jobs even when many nodes are idle due to low activity.</p> <p>On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing all available resources, which could block other jobs from running. 
Without job size limits, for instance, a large job might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.</p> <p>Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist effectively.</p> <p>To implement these limits, we utilize Quality of Service (QoS). Different QoS policies are defined and applied to specific partitions in line with the established resource allocation policies. The table below outlines the various QoS definitions applicable to the merlin7 CPU-based cluster. Here: * <code>MaxTRES</code> specifies resource limits per job. * <code>MaxTRESPU</code> specifies resource limits per user.</p> Name MaxTRES MaxTRESPU Scope normal partition cpu_general cpu=1024,mem=1920G cpu=1024,mem=1920G user, partition cpu_daily cpu=1024,mem=1920G cpu=2048,mem=3840G partition cpu_hourly cpu=2048,mem=3840G cpu=8192,mem=15T partition cpu_interactive cpu=16,mem=30G,node=1 cpu=32,mem=60G,node=1 partition <p>Where: * <code>normal</code> QoS: This QoS has no limits and is typically applied to partitions that do not require user or job restrictions. * <code>cpu_general</code> QoS: This is the default QoS for <code>merlin7</code> users. It limits the total resources available to each user. Additionally, this QoS is applied to the <code>general</code> partition, enforcing restrictions at the partition level and overriding user-level QoS. * <code>cpu_daily</code> QoS: Guarantees increased resources for the <code>daily</code> partition, accommodating shorter-duration jobs with higher resource needs. * <code>cpu_hourly</code> QoS: Offers the least constraints, allowing more resources to be used for the <code>hourly</code> partition, which caters to very short-duration jobs. * <code>cpu_interactive</code> QoS: Is restricted to one node and a few CPUs only, and is intended to be used when interactive allocations are necessary (<code>salloc</code>, <code>srun</code>).</p> <p>For additional details, refer to the CPU partitions section.</p> <p>Tip</p> <p>Always verify QoS definitions for potential changes using the <code>sacctmgr show qos format=\"Name%22,MaxTRESPU%35,MaxTRES%35\"</code> command.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-partitions","title":"CPU partitions","text":"<p>This section provides a summary of the partitions available in the <code>merlin7</code> CPU cluster.</p> <p>Key concepts: * <code>PriorityJobFactor</code>: This value is added to a job\u2019s priority (visible in the <code>PARTITION</code> column of the <code>sprio -l</code> command). Jobs submitted to partitions with higher <code>PriorityJobFactor</code> values generally run sooner. However, other factors like job age and especially fair share can also influence scheduling. * <code>PriorityTier</code>: Jobs submitted to partitions with higher <code>PriorityTier</code> values take precedence over pending jobs in partitions with lower <code>PriorityTier</code> values. Additionally, jobs from higher <code>PriorityTier</code> partitions can preempt running jobs in lower-tier partitions, where applicable. * <code>QoS</code>: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability for specific partitions, ensuring that resource allocation aligns with intended usage policies. 
Detailed explanations of the various QoS settings can be found in the User and job limits with QoS section.</p> <p>Tip</p> <p>Always verify partition configurations for potential changes using the <code>scontrol show partition</code> command.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-public-partitions_1","title":"CPU public partitions","text":"PartitionName DefaultTime MaxTime TotalNodes PriorityJobFactor PriorityTier QoS AllowAccounts general 1-00:00:00 7-00:00:00 46 1 1 cpu_general merlin daily 0-01:00:00 1-00:00:00 58 500 1 cpu_daily merlin hourly 0-00:30:00 0-01:00:00 77 1000 1 cpu_hourly merlin interactive 0-04:00:00 0-12:00:00 58 1 2 cpu_interactive merlin <p>All Merlin users are part of the <code>merlin</code> account, which is used as the default account when submitting jobs. Similarly, if no partition is specified, jobs are automatically submitted to the <code>general</code> partition by default.</p> <p>Tip</p> <p>For jobs running less than one day, submit them to the daily partition. For jobs running less than one hour, use the hourly partition. These partitions provide higher priority and ensure quicker scheduling compared to general, which has limited node availability.</p> <p>The <code>hourly</code> partition may include private nodes as an additional buffer. However, the current Slurm partition configuration, governed by <code>PriorityTier</code>, ensures that jobs submitted to private partitions are prioritized and processed first. As a result, access to the <code>hourly</code> partition might experience delays in such scenarios.</p> <p>The <code>interactive</code> partition is designed specifically for real-time, interactive work. Here are the key characteristics:</p> <ul> <li>CPU Oversubscription: This partition allows CPU oversubscription (configured as <code>FORCE:4</code>), meaning that up to four interactive jobs may share the same physical CPU core. This can impact performance, but enables fast access for short-term tasks.</li> <li>Highest Scheduling Priority: Jobs submitted to the interactive partition are always prioritized. They will be scheduled before any jobs in other partitions.</li> <li>Intended Use: This partition is ideal for debugging, testing, compiling, short interactive runs, and other activities where immediate access is important.</li> </ul> <p>Warning</p> <p>Because of CPU sharing, the performance on the interactive partition may not be optimal for compute-intensive tasks. 
For long-running or production workloads, use a dedicated batch partition instead.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cpu-private-partitions","title":"CPU private partitions","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cas-asa","title":"CAS / ASA","text":"PartitionName DefaultTime MaxTime TotalNodes PriorityJobFactor PriorityTier QoS AllowAccounts asa 0-01:00:00 14-00:00:00 10 1 2 normal asa"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cnm-mu3e","title":"CNM / Mu3e","text":"PartitionName DefaultTime MaxTime TotalNodes PriorityJobFactor PriorityTier QoS AllowAccounts mu3e 1-00:00:00 7-00:00:00 4 1 2 normal mu3e, meg"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#cnm-meg","title":"CNM / MeG","text":"PartitionName DefaultTime MaxTime TotalNodes PriorityJobFactor PriorityTier QoS AllowAccounts meg-short 0-01:00:00 0-01:00:00 unlimited 1000 2 normal meg meg-long 1-00:00:00 5-00:00:00 unlimited 1 2 normal meg meg-prod 1-00:00:00 5-00:00:00 unlimited 1000 4 normal meg"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#gpu-cluster-gmerlin7","title":"GPU cluster: gmerlin7","text":"<p>As mentioned in previous sections, by default, jobs will be submitted to <code>merlin7</code>, as it is the primary cluster configured on the login nodes. For submittng jobs to the GPU cluster, the cluster name <code>gmerlin7</code> must be specified, as follows: </p>Bash<pre><code>#SBATCH --cluster=gmerlin7\n</code></pre><p></p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#gpu-general-configuration","title":"GPU general configuration","text":"<p>The Merlin7 GPU cluster is configured with the <code>CR_CORE_MEMORY</code>, <code>CR_ONE_TASK_PER_CORE</code>, and <code>ENFORCE_BINDING_GRES</code> options.</p> <ul> <li>This configuration treats both cores and memory as consumable resources.</li> <li>Since the nodes are running with hyper-threading enabled, each core thread is counted as a CPU to fulfill a job's resource requirements.</li> <li>Slurm will allocate the CPUs to the selected GPU.</li> </ul> <p>By default, Slurm will allocate one task per core, which means:</p> <ul> <li>For hyper-threaded nodes (NVIDIA A100-based nodes), each task will consume 2 CPUs, regardless of whether both threads are actively used by the job.</li> <li>For the NVIDIA GraceHopper-based nodes, each task will consume 1 CPU.</li> </ul> <p>This behavior ensures consistent resource allocation but may result in underutilization of hyper-threading in some cases.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#gpu-nodes-definition","title":"GPU nodes definition","text":"<p>The table below provides an overview of the Slurm configuration for the different node types in the Merlin7 cluster. This information is essential for understanding how resources are allocated, enabling users to tailor their submission scripts accordingly.</p> Nodes Sockets CoresPerSocket Cores ThreadsPerCore CPUs MaxMemPerNode DefMemPerCPU Gres Features gpu[001-007] 4 72 288 1 288 828G 2944M gpu:gh200:4 AMD_EPYC_7713, NV_A100 gpu[101-105] 1 64 64 2 128 480G 3840M gpu:nvidia_a100-sxm4-80gb:4 GH200, NV_H100 <p>Notes on memory configuration: * Memory allocation options: To request additional memory, use the following options in your submission script: * <code>--mem=<mem_in_MB></code>: Allocates memory per node. 
* <code>--mem-per-cpu=<mem_in_MB></code>: Allocates memory per CPU (equivalent to a core thread).</p> <p>The total memory requested cannot exceed the <code>MaxMemPerNode</code> value.</p> <ul> <li>Impact of disabling Hyper-Threading: Using the <code>--hint=nomultithread</code> option disables one thread per core, effectively halving the number of available CPUs. Consequently, memory allocation will also be halved unless explicitly adjusted.</li> </ul> <p>For MPI-based jobs, where performance generally improves with single-threaded CPUs, this option is recommended. In such cases, you should double the <code>--mem-per-cpu</code> value to account for the reduced number of threads.</p> <p>Tip</p> <p>Always verify the Slurm <code>/var/spool/slurmd/conf-cache/slurm.conf</code> configuration file for potential changes.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#user-and-job-limits-with-qos_1","title":"User and job limits with QoS","text":"<p>In the <code>gmerlin7</code> GPU cluster, we enforce certain limits on jobs and users to ensure fair resource usage and prevent overuse by a single user or job. These limits aim to balance resource availability while maintaining overall cluster efficiency. However, applying limits can occasionally impact the cluster\u2019s utilization. For example, user-specific limits may result in pending jobs even when many nodes are idle due to low activity.</p> <p>On the other hand, these limits also enhance cluster efficiency by preventing scenarios such as a single job monopolizing all available resources, which could block other jobs from running. Without job size limits, for instance, a large job might drain the entire cluster to satisfy its resource request, a situation that is generally undesirable.</p> <p>Thus, setting appropriate limits is essential to maintain fair resource usage while optimizing cluster efficiency. These limits should allow for a mix of jobs of varying sizes and types, including single-core and parallel jobs, to coexist effectively.</p> <p>To implement these limits, we utilize Quality of Service (QoS). Different QoS policies are defined and applied to specific partitions in line with the established resource allocation policies. The table below outlines the various QoS definitions applicable to the gmerlin7 GPU-based cluster. Here:</p> <ul> <li><code>MaxTRES</code> specifies resource limits per job.</li> <li><code>MaxTRESPU</code> specifies resource limits per user.</li> </ul> Name MaxTRES MaxTRESPU Scope normal partition gpu_general gres/gpu=4 gres/gpu=8 user, partition gpu_daily gres/gpu=8 gres/gpu=8 partition gpu_hourly gres/gpu=8 gres/gpu=8 partition gpu_gh_interactive cpu=16,gres/gpu=1,mem=46G,node=1 cpu=16,gres/gpu=1,mem=46G,node=1 partition gpu_a100_interactive cpu=16,gres/gpu=1,mem=60G,node=1 cpu=16,gres/gpu=1,mem=60G,node=1 partition <p>Where: * <code>normal</code> QoS: This QoS has no limits and is typically applied to partitions that do not require user or job restrictions. * <code>gpu_general</code> QoS: This is the default QoS for <code>gmerlin7</code> users. It limits the total resources available to each user. Additionally, this QoS is applied to the <code>[a100|gh]-general</code> partitions, enforcing restrictions at the partition level and overriding user-level QoS. * <code>gpu_daily</code> QoS: Guarantees increased resources for the <code>[a100|gh]-daily</code> partitions, accommodating shorter-duration jobs with higher resource needs. 
* <code>gpu_hourly</code> QoS: Offers the least constraints, allowing more resources to be used for the <code>[a100|gh]-hourly</code> partitions, which caters to very short-duration jobs. * <code>gpu_a100_interactive</code> & <code>gpu_gh_interactive</code> QoS: Guarantee interactive access to GPU nodes for software compilation and small testing.</p> <p>For additional details, refer to the GPU partitions section.</p> <p>Tip</p> <p>Always verify QoS definitions for potential changes using the <code>sacctmgr show qos format=\"Name%22,MaxTRESPU%35,MaxTRES%35\"</code> command.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#gpu-partitions","title":"GPU partitions","text":"<p>This section provides a summary of the partitions available in the <code>gmerlin7</code> GPU cluster.</p> <p>Key concepts: * <code>PriorityJobFactor</code>: This value is added to a job\u2019s priority (visible in the <code>PARTITION</code> column of the <code>sprio -l</code> command). Jobs submitted to partitions with higher <code>PriorityJobFactor</code> values generally run sooner. However, other factors like job age and especially fair share can also influence scheduling. * <code>PriorityTier</code>: Jobs submitted to partitions with higher <code>PriorityTier</code> values take precedence over pending jobs in partitions with lower <code>PriorityTier</code> values. Additionally, jobs from higher <code>PriorityTier</code> partitions can preempt running jobs in lower-tier partitions, where applicable. * <code>QoS</code>: Specifies the quality of service associated with a partition. It is used to control and restrict resource availability for specific partitions, ensuring that resource allocation aligns with intended usage policies. Detailed explanations of the various QoS settings can be found in the User and job limits with QoS section.</p> <p>Tip</p> <p>Always verify partition configurations for potential changes using the <code>scontrol show partition</code> command.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#a100-based-partitions","title":"A100-based partitions","text":"PartitionName DefaultTime MaxTime TotalNodes PriorityJobFactor PriorityTier QoS AllowAccounts a100-general 1-00:00:00 7-00:00:00 3 1 1 gpu_general merlin a100-daily 0-01:00:00 1-00:00:00 4 500 1 gpu_daily merlin a100-hourly 0-00:30:00 0-01:00:00 5 1000 1 gpu_hourly merlin a100-interactive 0-01:00:00 0-12:00:00 5 1 2 gpu_a100_interactive merlin <p>All Merlin users are part of the <code>merlin</code> account, which is used as the default account when submitting jobs. Similarly, if no partition is specified, jobs are automatically submitted to the <code>general</code> partition by default.</p> <p>Tip</p> <p>For jobs running less than one day, submit them to the a100-daily partition. For jobs running less than one hour, use the a100-hourly partition. 
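As an illustration only (the partition, GPU count, and run time are placeholders to adapt), a job header targeting an A100 node could look like:</p> Bash<pre><code>#!/bin/bash\n#SBATCH --cluster=gmerlin7      # submit to the GPU cluster\n#SBATCH --partition=a100-daily  # for jobs running less than one day\n#SBATCH --gpus=1                # request a single A100 GPU\n#SBATCH --time=08:00:00         # must stay within the partition MaxTime\n</code></pre> <p>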
These partitions provide higher priority and ensure quicker scheduling compared to a100-general, which has limited node availability.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-configuration/#gh-based-partitions","title":"GH-based partitions","text":"PartitionName DefaultTime MaxTime TotalNodes PriorityJobFactor PriorityTier QoS AllowAccounts gh-general 1-00:00:00 7-00:00:00 5 1 1 gpu_general merlin gh-daily 0-01:00:00 1-00:00:00 6 500 1 gpu_daily merlin gh-hourly 0-00:30:00 0-01:00:00 7 1000 1 gpu_hourly merlin gh-interactive 0-01:00:00 0-12:00:00 7 1 2 gpu_gh_interactive merlin <p>All Merlin users are part of the <code>merlin</code> account, which is used as the default account when submitting jobs. Similarly, if no partition is specified, jobs are automatically submitted to the <code>general</code> partition by default.</p> <p>Tip</p> <p>For jobs running less than one day, submit them to the gh-daily partition. For jobs running less than one hour, use the gh-hourly partition. These partitions provide higher priority and ensure quicker scheduling compared to gh-general, which has limited node availability.</p>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-examples/","title":"Slurm Examples","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-examples/#slurm-examples","title":"Slurm Examples","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-examples/#single-core-based-job-examples","title":"Single core based job examples","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --partition=hourly # Using 'hourly' will grant higher priority\n#SBATCH --ntasks-per-core=2 # Request the max ntasks be invoked on each core\n#SBATCH --hint=multithread # Use extra threads with in-core multi-threading\n#SBATCH --time=00:30:00 # Define max time job will run\n#SBATCH --output=myscript.out # Define your output file\n#SBATCH --error=myscript.err # Define your error file\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-examples/#multi-core-based-jobs-example","title":"Multi-core based jobs example","text":""},{"location":"merlin7/03-Slurm-General-Documentation/slurm-examples/#pure-mpi","title":"Pure MPI","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=purempi\n#SBATCH --partition=daily # Using 'daily' will grant higher priority\n#SBATCH --time=24:00:00 # Define max time job will run\n#SBATCH --output=%x-%j.out # Define your output file\n#SBATCH --error=%x-%j.err # Define your error file\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --ntasks=128\n#SBATCH --hint=nomultithread\n##SBATCH --cpus-per-task=1\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin7/03-Slurm-General-Documentation/slurm-examples/#hybrid","title":"Hybrid","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --job-name=hybrid\n#SBATCH --partition=daily # Using 'daily' will grant higher priority\n#SBATCH --time=24:00:00 # Define max time job will run\n#SBATCH --output=%x-%j.out # Define your output file\n#SBATCH --error=%x-%j.err # Define your error file\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --ntasks=128\n#SBATCH --hint=multithread\n#SBATCH --cpus-per-task=2\n\nmodule purge\nmodule load $MODULE_NAME # where $MODULE_NAME is a software in PModules\nsrun $MYEXEC # 
where $MYEXEC is a path to your binary file\n</code></pre>"},{"location":"merlin7/04-Jupyterhub/jupyterhub/","title":"Jupyterhub on Merlin7","text":""},{"location":"merlin7/04-Jupyterhub/jupyterhub/#jupyterhub-on-merlin7","title":"Jupyterhub on Merlin7","text":"<p>Jupyterhub provides jupyter notebooks that are launched on cluster nodes of merlin and can be accessed through a web portal.</p>"},{"location":"merlin7/04-Jupyterhub/jupyterhub/#accessing-jupyterhub-and-launching-a-session","title":"Accessing Jupyterhub and launching a session","text":"<p>The service is available inside of PSI (or through a VPN connection) at</p> <p>https://merlin7-jupyter01.psi.ch:8000/hub/</p> <ol> <li>Login: You will be presented with a Login web page for authenticating with your PSI account.</li> <li>Spawn job: The Spawner Options page allows you to specify the properties (Slurm partition, running time,...) of the batch jobs that will be running your jupyter notebook. Once you click on the <code>Spawn</code> button, your job will be sent to the Slurm batch system. If the cluster is not currently overloaded and the resources you requested are available, your job will usually start within 30 seconds.</li> </ol>"},{"location":"merlin7/04-Jupyterhub/jupyterhub/#recommended-partitions","title":"Recommended partitions","text":"<p>Running on the <code>merlin7</code> cluster and using the <code>interactive</code> partition would in general guarantee fast access to resources. Keep in mind, that this partition has a limit of 12 hours.</p>"},{"location":"merlin7/04-Jupyterhub/jupyterhub/#requesting-additional-resources","title":"Requesting additional resources","text":"<p>The Spawner Options page covers the most common options. These are used to create a submission script for the jupyterhub job and submit it to the slurm queue. Additional customization can be implemented using the 'Optional user defined line to be added to the batch launcher script' option. This line is added to the submission script at the end of other <code>#SBATCH</code> lines. Parameters can be passed to SLURM by starting the line with <code>#SBATCH</code>, like in Running Slurm Scripts. Some ideas:</p> <p>Request additional memory</p> Text Only<pre><code>#SBATCH --mem=100G\n</code></pre> <p>Request multiple GPUs (gpu partition only)</p> Text Only<pre><code>#SBATCH --gpus=2\n</code></pre> <p>Log additional information</p> Text Only<pre><code>hostname; date; echo $USER\n</code></pre> <p>Output is found in <code>~/jupyterhub_batchspawner_<jobid>.log</code>.</p>"},{"location":"merlin7/04-Jupyterhub/jupyterhub/#contact","title":"Contact","text":"<p>In case of problems or requests, please either submit a PSI Service Now incident containing \"Merlin Jupyterhub\" as part of the subject, or contact us by mail through merlin-admins@lists.psi.ch.</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/","title":"ANSYS RSM (Remote Resolve Manager)","text":""},{"location":"merlin7/05-Software-Support/ansys-rsm/#ansys-rsm-remote-resolve-manager","title":"ANSYS RSM (Remote Resolve Manager)","text":""},{"location":"merlin7/05-Software-Support/ansys-rsm/#ansys-remote-resolve-manager","title":"ANSYS Remote Resolve Manager","text":"<p>ANSYS Remote Solve Manager (RSM) is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop.</p> <p>Warning</p> <p>Merlin7 is running behind a firewall, however, there are firewall policies in place to access the Merlin7 ANSYS RSM service from the main PSI networks. 
If you cannot connect to it, please contact us and provide the IP address of the corresponding workstation: we will check the PSI firewall rules in place and request an update if necessary.</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/#the-merlin7-rsm-service","title":"The Merlin7 RSM service","text":"<p>An RSM service runs on a dedicated Virtual Machine server. This service listens on a specific port and processes any request using RSM (for example, from ANSYS users' workstations). The following nodes are configured with such services: * <code>service03.merlin7.psi.ch</code></p> <p>The earliest version supported in the Merlin7 cluster is ANSYS/2022R2. Older versions are not supported due to existing bugs or missing functionalities. In case you strongly need to run an older version, please do not hesitate to contact the Merlin admins.</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/#configuring-rsm-client-on-windows-workstations","title":"Configuring RSM client on Windows workstations","text":"<p>Users can set up ANSYS RSM on their workstations to connect to the Merlin7 cluster. The steps and settings required to make it work are the following:</p> <ol> <li>Open the RSM Configuration service in Windows for the ANSYS release you want to configure.</li> <li>Right-click the HPC Resources icon followed by Add HPC Resource... </li> <li>In the HPC Resource tab, fill in the corresponding fields as follows: </li> <li> <p>\"Name\": Add here the preferred name for the cluster. For example: <code>Merlin7 cluster</code></p> </li> <li> <p>\"HPC Type\": Select <code>SLURM</code></p> </li> <li>\"Submit host\": <code>service03.merlin7.psi.ch</code></li> <li>\"Slurm Job submission arguments (optional)\": Add any required Slurm options for running your jobs.<ul> <li><code>--hint=nomultithread</code> must be present.</li> <li><code>--exclusive</code> must also be present for now, due to a bug in the <code>Slingshot</code> interconnect which does not allow running shared nodes.</li> </ul> </li> <li>Check \"Use SSH protocol for inter and intra-node communication (Linux only)\"</li> <li>Select \"Able to directly submit and monitor HPC jobs\".</li> <li>\"Apply\" changes.</li> <li>In the \"File Management\" tab, fill in the corresponding fields as follows: </li> <li>Select \"RSM internal file transfer mechanism\" and add <code>/data/scratch/shared</code> as the \"Staging directory path on Cluster\"</li> <li>Select \"Scratch directory local to the execution node(s)\" and add <code>/scratch</code> as the HPC scratch directory.</li> <li>Never check the option \"Keep job files in the staging directory when job is complete\" if the previous option \"Scratch directory local to the execution node(s)\" was set.</li> <li>\"Apply\" changes.</li> <li>In the \"Queues\" tab, use the left button to auto-discover partitions </li> <li>If no authentication method was configured before, an authentication window will appear. Use your PSI account to authenticate. Notice that the <code>PSICH\\</code> prefix must not be added. </li> <li>From the partition list, select the ones you typically want to use.<ul> <li>In general, standard Merlin users must use <code>hourly</code>, <code>daily</code> and <code>general</code> only.</li> <li>Other partitions are reserved for allowed users only.</li> </ul> </li> <li>\"Apply\" changes. </li> <li>[Optional] You can submit a test job to each selected partition by clicking the corresponding Submit button.</li> </ol> <p>Tip</p> <p>In the future, we might also provide this service from the login nodes for better transfer performance.</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/#using-rsm-in-ansys","title":"Using RSM in ANSYS","text":"<p>Using the RSM service in ANSYS is slightly different depending on the ANSYS software being used. Please follow the official ANSYS documentation for details about how to use it for that specific software.</p> <p>Alternatively, please refer to some of the examples shown in the following chapters (ANSYS-specific software).</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/#using-rsm-in-ansys-fluent","title":"Using RSM in ANSYS Fluent","text":"<p>For further information on using RSM with Fluent, please visit the ANSYS RSM section.</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/#using-rsm-in-ansys-cfx","title":"Using RSM in ANSYS CFX","text":"<p>For further information on using RSM with CFX, please visit the ANSYS RSM section.</p>"},{"location":"merlin7/05-Software-Support/ansys-rsm/#using-rsm-in-ansys-mapdl","title":"Using RSM in ANSYS MAPDL","text":"<p>For further information on using RSM with MAPDL, please visit the ANSYS RSM section.</p>"},{"location":"merlin7/05-Software-Support/ansys/","title":"ANSYS","text":""},{"location":"merlin7/05-Software-Support/ansys/#ansys","title":"ANSYS","text":"<p>This document provides generic information about how to load and run ANSYS software on the Merlin cluster.</p>"},{"location":"merlin7/05-Software-Support/ansys/#ansys-software-in-pmodules","title":"ANSYS software in Pmodules","text":"<p>The ANSYS software can be loaded through PModules.</p> <p>The default ANSYS versions are loaded from the central PModules repository.</p> <p>However, we provide local installations on Merlin7, which are needed mainly for some ANSYS packages, such as ANSYS RSM. For this reason, and also to improve the interactive experience, ANSYS has also been installed on the Merlin high-performance storage and made available through PModules.</p>"},{"location":"merlin7/05-Software-Support/ansys/#loading-merlin7-ansys","title":"Loading Merlin7 ANSYS","text":"Bash<pre><code>module purge\nmodule use unstable # Optional\nmodule search ANSYS\n\n# Load the proper ANSYS version, for example 2025R2\nmodule load ANSYS/2025R2\n</code></pre> [Example] Loading ANSYS from the Merlin7 PModules repository <pre>\ud83d\udd25 [caubet_m@login001:~]# module purge\n\ud83d\udd25 [caubet_m@login001:~]# module use unstable\n\ud83d\udd25 [caubet_m@login001:~]# module use deprecated\n\n\ud83d\udd25 [caubet_m@login001:~]# module search ANSYS\n(Re-)building the Pmodules cache. Please be patient ...\nDone ...\n\nModule Rel.stage Group Overlay Requires\n--------------------------------------------------------------------------------\nANSYS/2022R2 deprecated Tools merlin\nANSYS/2023R2 deprecated Tools merlin\nANSYS/2024R2 stable Tools merlin\nANSYS/2025R2 stable Tools merlin\n</pre> <p>Tip</p> <p>Older ANSYS releases, 2022R2 and 2023R2, are <code>deprecated</code>. 
Please always run ANSYS/2024R2 or superior.</p>"},{"location":"merlin7/05-Software-Support/ansys/#ansys-documentation-by-product","title":"ANSYS Documentation by product","text":""},{"location":"merlin7/05-Software-Support/ansys/#ansys-rsm","title":"ANSYS RSM","text":"<p>ANSYS Remote Solve Manager (RSM) is used by ANSYS Workbench to submit computational jobs to HPC clusters directly from Workbench on your desktop. Therefore, PSI workstations with direct access to Merlin can submit jobs by using RSM.</p> <p>For further information, please visit the ANSYS RSM section.</p>"},{"location":"merlin7/05-Software-Support/ansys/#ansys-fluent","title":"ANSYS Fluent","text":"<p>ANSYS Fluent is not currently documented for Merlin7. Please refer to the Merlin6 documentation for information about ANSYS Fluent on Merlin6.</p>"},{"location":"merlin7/05-Software-Support/ansys/#ansys-cfx","title":"ANSYS CFX","text":"<p>ANSYS CFX is not currently documented for Merlin7. Please refer to the Merlin6 documentation for information about ANSYS CFX on Merlin6.</p>"},{"location":"merlin7/05-Software-Support/ansys/#ansys-mapdl","title":"ANSYS MAPDL","text":"<p>ANSYS MAPDL is not currently documented for Merlin7. Please refer to the Merlin6 documentation for information about ANSYS MAPDL on Merlin6.</p>"},{"location":"merlin7/05-Software-Support/cp2k/","title":"CP2k","text":""},{"location":"merlin7/05-Software-Support/cp2k/#cp2k","title":"CP2k","text":""},{"location":"merlin7/05-Software-Support/cp2k/#cp2k_1","title":"CP2k","text":"<p>CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems.</p> <p>CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, \u2026), and classical force fields (AMBER, CHARMM, \u2026). 
CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimization, and transition state optimization using NEB or dimer method</p>"},{"location":"merlin7/05-Software-Support/cp2k/#licensing-terms-and-conditions","title":"Licensing Terms and Conditions","text":"<p>CP2k is a joint effort, with contributions from developers around the world: users agree to acknowledge use of CP2k in any reports or publications of results obtained with the Software (see CP2k Homepage for details).</p>"},{"location":"merlin7/05-Software-Support/cp2k/#how-to-run-on-merlin7","title":"How to run on Merlin7","text":""},{"location":"merlin7/05-Software-Support/cp2k/#cpu-nodes","title":"CPU nodes","text":"Bash<pre><code>module use unstable Spack\nmodule load gcc/12.3 openmpi/5.0.8-hgej cp2k/2025.2-yb6g-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#a100-nodes","title":"A100 nodes","text":"Bash<pre><code>module use unstable Spack\nmodule load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu cp2k/2025.2-hkub-A100-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#gh-nodes","title":"GH nodes","text":"Bash<pre><code>module use unstable Spack\nmodule load gcc/12.3 openmpi/5.0.8-tx2w-GH200-gpu cp2k/2025.2-xk4q-GH200-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#sbatch-cpu-4-mpi-ranks-16-omp-threads","title":"SBATCH CPU, 4 MPI ranks, 16 OMP threads","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --nodes=1 # requesting 1 compute node\n#SBATCH --ntasks=4 # use 4 MPI rank (task)\n#SBATCH --partition=hourly\n#SBATCH --cpus-per-task=16 # modify this number of CPU cores per MPI task\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n\nunset PMODULES_ENV\nmodule purge\nmodule use unstable Spack\nmodule load gcc/12.3 openmpi/5.0.8-hgej cp2k/2025.2-yb6g-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\nexport OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1))\n\nsrun cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#sbatch-a100-4-gpu-16-omp-threads-4-mpi-ranks","title":"SBATCH A100, 4 GPU, 16 OMP threads, 4 MPI ranks","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --nodes=1 # number of A100 nodes\n#SBATCH --ntasks-per-node=4 # 4 MPI ranks per node\n#SBATCH --cpus-per-task=16 # 16 OMP threads per MPI rank\n#SBATCH --cluster=gmerlin7\n#SBATCH --hint=nomultithread\n#SBATCH --partition=a100-hourly\n#SBATCH --gpus=4\n\nunset PMODULES_ENV\nmodule purge\nmodule use unstable Spack\nmodule load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu cp2k/2025.2-hkub-A100-gpu-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\nexport OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1))\n\nsrun cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#sbatch-gh-2-gpu-18-omp-threads-2-mpi-ranks","title":"SBATCH GH, 2 GPU, 18 OMP threads, 2 MPI ranks","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --nodes=1 # number of GH200 nodes with each node having 4 CPU+GPU\n#SBATCH --ntasks-per-node=2 # 2 MPI ranks per node\n#SBATCH --cpus-per-task=18 # 18 OMP threads per MPI 
rank\n#SBATCH --cluster=gmerlin7\n#SBATCH --hint=nomultithread\n#SBATCH --partition=gh-hourly\n#SBATCH --gpus=2\n\nunset PMODULES_ENV\nmodule purge\nmodule use unstable Spack\nmodule load gcc/12.3 openmpi/5.0.8-tx2w-GH200-gpu cp2k/2025.2-xk4q-GH200-gpu-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\nexport OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK - 1))\n\nsrun cp2k.psmp -i <CP2K_INPUT> -o <CP2K_OUTPUT>\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#developing-your-own-cpu-code","title":"Developing your own CPU code","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-hgej dbcsr/2.8.0-4yld-omp openblas/0.3.30-gye6-omp netlib-scalapack/2.2.2-2trj libxsmm/1.17-hwwi libxc/7.0.0-mibp libint/2.11.1-nxhl hdf5/1.14.6-tgzo fftw/3.3.10-t7bo-omp py-fypp/3.1-bteo sirius/7.8.0-uh3i-omp cmake/3.31.8-j47l ninja/1.12.1-afxy\n\ngit clone https://github.com/cp2k/cp2k.git\ncd cp2k\n\nmkdir build && cd build\nCC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_HDF5=ON -DCP2K_USE_FFTW3=ON ..\nninja -j 16\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#developing-your-own-gpu-code","title":"Developing your own GPU code","text":""},{"location":"merlin7/05-Software-Support/cp2k/#a100","title":"A100","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-r5lz-A100-gpu dbcsr/2.8.0-3r22-A100-gpu-omp cosma/2.7.0-y2tr-gpu cuda/12.6.0-3y6a dftd4/3.7.0-4k4c-omp elpa/2025.01.002-bovg-A100-gpu-omp fftw/3.3.10-syba-omp hdf5/1.14.6-pcsd libint/2.11.1-3lxv libxc/7.0.0-u556 libxsmm/1.17-2azz netlib-scalapack/2.2.2-rmcf openblas/0.3.30-ynou-omp plumed/2.9.2-47hk py-fypp/3.1-z25p py-numpy/2.3.2-45ay python/3.13.5-qivs sirius/develop-qz4c-A100-gpu-omp spglib/2.5.0-jl5l-omp spla/1.6.1-hrgf-gpu cmake/3.31.8-j47l ninja/1.12.1-afxy\n\ngit clone <https://github.com/cp2k/cp2k.git>\ncd cp2k\n\nmkdir build && cd build\nCC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_ELPA=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_PLUMED=ON -DCP2K_USE_DFTD4=ON -DCP2K_USE_COSMA=ON -DCP2K_USE_ACCEL=CUDA -DCMAKE_CUDA_ARCHITECTURES=80 -DCP2K_USE_FFTW3=ON ..\nninja -j 16\n</code></pre>"},{"location":"merlin7/05-Software-Support/cp2k/#gh200","title":"GH200","text":"Bash<pre><code>salloc --partition=gh-daily --clusters=gmerlin7 --time=08:00:00 --ntasks=4 --nodes=1 --gpus=1 --mem=40000 $SHELL\nssh <allocated_gpu>\nmodule purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-tx2w-GH200-gpu dbcsr/2.8.0-h3bo-GH200-gpu-omp cosma/2.7.0-dc23-gpu cuda/12.6.0-wak5 dbcsr/2.8.0-h3bo-GH200-gpu-omp dftd4/3.7.0-aa6l-omp elpa/2025.01.002-nybd-GH200-gpu-omp fftw/3.3.10-alp3-omp hdf5/1.14.6-qjob libint/2.11.1-dpqq libxc/7.0.0-ojgl netlib-scalapack/2.2.2-cj5m openblas/0.3.30-rv46-omp plumed/2.9.2-nbay py-fypp/3.1-j4yw py-numpy/2.3.2-yoqr python/3.13.5-xbg5 sirius/develop-v5tb-GH200-gpu-omp spglib/2.5.0-da2i-omp spla/1.6.1-uepy-gpu cmake/3.31.8-2jne ninja/1.13.0-xn4a\n\ngit clone https://github.com/cp2k/cp2k.git\ncd cp2k\n\nmkdir build && cd build\nCC=mpicc CXX=mpic++ FC=mpifort cmake -GNinja -DCMAKE_CUDA_HOST_COMPILER=mpicc -DCP2K_USE_LIBXC=ON -DCP2K_USE_LIBINT2=ON -DCP2K_USE_SPGLIB=ON -DCP2K_USE_ELPA=ON -DCP2K_USE_SPLA=ON -DCP2K_USE_SIRIUS=ON -DCP2K_USE_PLUMED=ON -DCP2K_USE_DFTD4=ON 
-DCP2K_USE_COSMA=ON -DCP2K_USE_ACCEL=CUDA -DCMAKE_CUDA_ARCHITECTURES=90 -DCP2K_USE_FFTW3=ON -DCP2K_USE_HDF5=ON ..\nninja -j 16\n</code></pre>"},{"location":"merlin7/05-Software-Support/cray-module.env/","title":"Cray Programming Environment","text":""},{"location":"merlin7/05-Software-Support/cray-module.env/#cray-programming-environment","title":"Cray Programming Environment","text":""},{"location":"merlin7/05-Software-Support/cray-module.env/#loading-the-cray-module","title":"Loading the Cray module","text":"<p>The Cray Programming Environment, with Cray's compilers and MPI, is not loaded by default.</p> <p>To load it, one has to run the following command:</p> Bash<pre><code>module load cray\n</code></pre> <p>The Cray Programming Environment will load all the necessary dependencies. In example:</p> Bash<pre><code>\ud83d\udd25 [caubet_m@login001:~]# module list\nCurrently Loaded Modules:\n 1) craype-x86-rome 2) libfabric/1.15.2.0\n 3) craype-network-ofi\n 4) xpmem/2.9.6-1.1_20240510205610__g087dc11fc19d 5) PrgEnv-cray/8.5.0\n 6) cce/17.0.0 7) cray-libsci/23.12.5\n 8) cray-mpich/8.1.28 9) craype/2.7.30\n10) perftools-base/23.12.0 11) cpe/23.12\n12) cray/23.12\n</code></pre> <p>You will notice an unfamiliar <code>PrgEnv-cray/8.5.0</code> that was loaded. This is a meta-module that Cray provides to simplify the switch of compilers and their associated dependencies and libraries, as a whole called Programming Environment. In the Cray Programming Environment, there are 4 key modules.</p> <ul> <li><code>cray-libsci</code> is a collection of numerical routines tuned for performance on Cray systems.</li> <li><code>libfabric</code> is an important low-level library that allows you to take advantage of the high performance Slingshot network.</li> <li><code>cray-mpich</code> is a CUDA-aware MPI implementation, optimized for Cray systems.</li> <li><code>cce</code> is the compiler from Cray. C/C++ compilers are based on Clang/LLVM while Fortran supports Fortran 2018 standard. More info: https://user.cscs.ch/computing/compilation/cray/</li> </ul> <p>You can switch between different programming environments. You can check the available module with the <code>module avail</code> command, as follows:</p> Bash<pre><code>\ud83d\udd25 [caubet_m@login001:~]# module avail PrgEnv\n--------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------\n\nPrgEnv-cray/8.5.0 PrgEnv-gnu/8.5.0\nPrgEnv-nvhpc/8.5.0 PrgEnv-nvidia/8.5.0\n</code></pre>"},{"location":"merlin7/05-Software-Support/cray-module.env/#switching-compiler-suites","title":"Switching compiler suites","text":"<p>Compiler suites can be exchanged with PrgEnv (Programming Environments) provided by HPE-Cray. 
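<p>The compiler wrappers <code>cc</code> (C), <code>CC</code> (C++) and <code>ftn</code> (Fortran) should be used instead of calling the compiler drivers directly. As a minimal sketch (assuming a local source file <code>hello.c</code> and the <code>cray</code> module loaded as shown above):</p> Bash<pre><code>module load cray\n\n# The wrapper picks the compiler of the active PrgEnv and links the\n# libraries provided by the loaded modules (e.g. cray-mpich, cray-libsci).\ncc -O2 -o hello hello.c\n\n# Fortran and C++ sources are compiled analogously:\n# ftn -O2 -o hello_f   hello.f90\n# CC  -O2 -o hello_cxx hello.cpp\n</code></pre>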
The wrappers call the correct compiler with appropriate options to build and link applications with relevant libraries, as required by the loaded modules (only dynamic linking is supported) and therefore should replace direct calls to compiler drivers in Makefiles and build scripts.</p> <p>To swap the the compiler suite from the default Cray to GNU compiler, one can run the following.</p> Bash<pre><code>\ud83d\udd25 [caubet_m@login001:~]# module swap PrgEnv-cray/8.5.0 PrgEnv-gnu/8.5.0\n\nLmod is automatically replacing \"cce/17.0.0\" with \"gcc-native/12.3\".\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/","title":"GROMACS","text":""},{"location":"merlin7/05-Software-Support/gromacs/#gromacs","title":"GROMACS","text":""},{"location":"merlin7/05-Software-Support/gromacs/#gromacs_1","title":"GROMACS","text":"<p>GROMACS (GROningen Machine for Chemical Simulations) is a versatile and widely-used open source package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.</p> <p>It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.)</p>"},{"location":"merlin7/05-Software-Support/gromacs/#licensing-terms-and-conditions","title":"Licensing Terms and Conditions","text":"<p>GROMACS is a joint effort, with contributions from developers around the world: users agree to acknowledge use of GROMACS in any reports or publications of results obtained with the Software (see GROMACS Homepage for details).</p>"},{"location":"merlin7/05-Software-Support/gromacs/#how-to-run-on-merlin7","title":"How to run on Merlin7","text":""},{"location":"merlin7/05-Software-Support/gromacs/#20252","title":"2025.2","text":""},{"location":"merlin7/05-Software-Support/gromacs/#cpu-nodes","title":"CPU nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-ax23-A100-gpu gromacs/2025.2-whcq-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#a100-nodes","title":"A100 nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-3vzj-A100-gpu gromacs/2025.2-vbj4-A100-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#gh-nodes","title":"GH nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-blxc-GH200-gpu gromacs/2025.2-cjnq-GH200-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#20253","title":"2025.3","text":""},{"location":"merlin7/05-Software-Support/gromacs/#cpu-nodes_1","title":"CPU nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.9-n4yf-A100-gpu gromacs/2025.3-6ken-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#a100-nodes_1","title":"A100 nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.9-xqhy-A100-gpu gromacs/2025.3-ohlj-A100-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#gh-nodes_1","title":"GH nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.9-inxi-GH200-gpu 
gromacs/2025.3-yqlu-GH200-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#sbatch-cpu-4-mpi-ranks-16-omp-threads","title":"SBATCH CPU, 4 MPI ranks, 16 OMP threads","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --nodes=1 # requesting 1 compute node\n#SBATCH --ntasks=4 # use 4 MPI rank (task)\n#SBATCH --partition=hourly\n#SBATCH --cpus-per-task=16 # modify this number of CPU cores per MPI task\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n\nunset PMODULES_ENV\nmodule purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-ax23-A100-gpu gromacs/2025.2-whcq-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\n\n# Add your input (tpr) file in the command below\nsrun gmx_mpi grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_input.gro -r step5_input.gro -p topol.top -n index.ndx\nsrun gmx_mpi mdrun -s step6.0_minimization.tpr -pin on -ntomp ${SLURM_CPUS_PER_TASK}\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#sbatch-a100-4-gpu-16-omp-threads-4-mpi-ranks","title":"SBATCH A100, 4 GPU, 16 OMP threads, 4 MPI ranks","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --nodes=1 # number of GH200 nodes with each node having 4 CPU+GPU\n#SBATCH --ntasks-per-node=4 # 4 MPI ranks per node\n#SBATCH --cpus-per-task=16 # 16 OMP threads per MPI rank\n#SBATCH --cluster=gmerlin7\n#SBATCH --hint=nomultithread\n#SBATCH --partition=a100-hourly\n#SBATCH --gpus=4\n\nunset PMODULES_ENV\nmodule purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-3vzj-A100-gpu gromacs/2025.2-vbj4-A100-gpu-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\n\nexport GMX_GPU_DD_COMMS=true\nexport GMX_GPU_PME_PP_COMMS=true\nexport GMX_FORCE_UPDATE_DEFAULT_GPU=true\nexport GMX_ENABLE_DIRECT_GPU_COMM=1\nexport GMX_FORCE_GPU_AWARE_MPI=1\n\n# Add your input (tpr) file in the command below\nsrun gmx_mpi grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_input.gro -r step5_input.gro -p topol.top -n index.ndx\nsrun gmx_mpi mdrun -s step6.0_minimization.tpr -ntomp ${SLURM_CPUS_PER_TASK}\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#sbatch-gh-2-gpu-18-omp-threads-2-mpi-ranks","title":"SBATCH GH, 2 GPU, 18 OMP threads, 2 MPI ranks","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --nodes=1 # number of GH200 nodes with each node having 4 CPU+GPU\n#SBATCH --ntasks-per-node=2 # 2 MPI ranks per node\n#SBATCH --cpus-per-task=18 # 18 OMP threads per MPI rank\n#SBATCH --cluster=gmerlin7\n#SBATCH --hint=nomultithread\n#SBATCH --partition=gh-hourly\n#SBATCH --gpus=2\n\nunset PMODULES_ENV\nmodule purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-blxc-GH200-gpu gromacs/2025.2-cjnq-GH200-gpu-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\n\nexport GMX_GPU_DD_COMMS=true\nexport GMX_GPU_PME_PP_COMMS=true\nexport GMX_FORCE_UPDATE_DEFAULT_GPU=true\nexport GMX_ENABLE_DIRECT_GPU_COMM=1\nexport GMX_FORCE_GPU_AWARE_MPI=1\n\n# Add your input (tpr) file in the command below\nsrun gmx_mpi grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_input.gro -r step5_input.gro -p topol.top -n index.ndx\nsrun gmx_mpi mdrun -s step6.0_minimization.tpr -ntomp 
${SLURM_CPUS_PER_TASK}\n</code></pre>"},{"location":"merlin7/05-Software-Support/gromacs/#developing-your-own-gpu-code","title":"Developing your own GPU code","text":""},{"location":"merlin7/05-Software-Support/gromacs/#a100","title":"A100","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.7-3vzj-A100-gpu gromacs/2025.2-vbj4-A100-gpu-omp cmake/3.31.6-o3lb python/3.13.1-cyro\n\ngit clone https://github.com/gromacs/gromacs.git\ncd gromacs\n\nmkdir build && cd build\ncmake -DCMAKE_C_COMPILER=gcc-12 \\\n -DCMAKE_CXX_COMPILER=g++-12 \\\n -DGMX_MPI=on \\\n -DGMX_GPU=CUDA \\\n -GMX_CUDA_TARGET_SM=\"80\" \\ # 90 for the Hopper GPUs\n -DGMX_DOUBLE=off \\ # turn on double precision only if useful\n ..\n\nmake\n</code></pre>"},{"location":"merlin7/05-Software-Support/ippl/","title":"IPPL","text":""},{"location":"merlin7/05-Software-Support/ippl/#ippl","title":"IPPL","text":""},{"location":"merlin7/05-Software-Support/ippl/#ippl_1","title":"IPPL","text":"<p>Independent Parallel Particle Layer (IPPL) is a performance portable C++ library for Particle-Mesh methods. IPPL makes use of Kokkos (https://github.com/kokkos/kokkos), HeFFTe (https://github.com/icl-utk-edu/heffte), and MPI (Message Passing Interface) to deliver a portable, massively parallel toolkit for particle-mesh methods. IPPL supports simulations in one to six dimensions, mixed precision, and asynchronous execution in different execution spaces (e.g. CPUs and GPUs).</p>"},{"location":"merlin7/05-Software-Support/ippl/#licensing-terms-and-conditions","title":"Licensing Terms and Conditions","text":"<p>GNU GPLv3</p>"},{"location":"merlin7/05-Software-Support/ippl/#how-to-run-on-merlin7","title":"How to run on Merlin7","text":""},{"location":"merlin7/05-Software-Support/ippl/#a100-nodes","title":"A100 nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/13.2.0 openmpi/5.0.7-dnpr-A100-gpu boost/1.82.0-lgrt fftw/3.3.10.6-zv2b-omp googletest/1.14.0-msmu h5hut/2.0.0rc7-zy7s openblas/0.3.29-zkwb cmake/3.31.6-ufy7\n\ncd <path to IPPL source directory>\nmkdir build_gpu\ncd build_gpu\n\ncmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ARCH_AMPERE80=ON -DCMAKE_CXX_STANDARD=20 -DIPPL_ENABLE_FFT=ON -DIPPL_ENABLE_TESTS=ON -DUSE_ALTERNATIVE_VARIANT=ON -DIPPL_ENABLE_SOLVERS=ON -DIPPL_ENABLE_ALPINE=True -DIPPL_PLATFORMS=cuda ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/ippl/#gh-nodes","title":"GH nodes","text":"Bash<pre><code>salloc --partition=gh-daily --clusters=gmerlin7 --time=08:00:00 --ntasks=4 --nodes=1 --gpus=1 --mem=40000 $SHELL\nssh <allocated_gpu>\n\nmodule use Spack unstable\nmodule load gcc/13.2.0 openmpi/5.0.3-3lmi-GH200-gpu\nmodule load boost/1.82.0-3ns6 fftw/3.3.10 gnutls/3.8.3 googletest/1.14.0 gsl/2.7.1 h5hut/2.0.0rc7 openblas/0.3.26 cmake/3.31.4-u2nm\n\ncd <path to IPPL source directory>\nmkdir build_gh\ncd build_gh\n\ncmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ARCH_HOPPER90=ON -DCMAKE_CXX_STANDARD=20 -DIPPL_ENABLE_FFT=ON -DIPPL_ENABLE_TESTS=ON -DUSE_ALTERNATIVE_VARIANT=ON -DIPPL_ENABLE_SOLVERS=ON -DIPPL_ENABLE_ALPINE=True -DIPPL_PLATFORMS=cuda ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/lammps/","title":"LAMMPS","text":""},{"location":"merlin7/05-Software-Support/lammps/#lammps","title":"LAMMPS","text":""},{"location":"merlin7/05-Software-Support/lammps/#lammps_1","title":"LAMMPS","text":"<p>LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state. 
It can model atomic, polymeric, biological, metallic, granular, and coarse-grained systems using a variety of force fields and boundary conditions. The current version of LAMMPS is written in C++.</p>"},{"location":"merlin7/05-Software-Support/lammps/#licensing-terms-and-conditions","title":"Licensing Terms and Conditions","text":"<p>LAMMPS is an open-source code, available free-of-charge, and distributed under the terms of the GNU Public License Version 2 (GPLv2), which means you can use or modify the code however you wish for your own purposes, but have to adhere to certain rules when redistributing it - specifically in binary form - or are distributing software derived from it or that includes parts of it.</p> <p>LAMMPS comes with no warranty of any kind.</p> <p>As each source file states in its header, it is a copyrighted code, and thus not in the public domain. For more information about open-source software and open-source distribution, see www.gnu.org or www.opensource.org. The legal text of the GPL as it applies to LAMMPS is in the LICENSE file included in the LAMMPS distribution.</p> <p>Here is a more specific summary of what the GPL means for LAMMPS users:</p> <p>(1) Anyone is free to use, copy, modify, or extend LAMMPS in any way they choose, including for commercial purposes.</p> <p>(2) If you distribute a modified version of LAMMPS, it must remain open-source, meaning you are required to distribute all of it under the terms of the GPLv2. You should clearly annotate such a modified code as a derivative version of LAMMPS. This is best done by changing the name (example: LIGGGHTS is such a modified and extended version of LAMMPS).</p> <p>(3) If you release any code that includes or uses LAMMPS source code, then it must also be open-sourced, meaning you distribute it under the terms of the GPLv2. You may write code that interfaces LAMMPS to a differently licensed library. 
In that case the code that provides the interface must be licensed GPLv2, but not necessarily that library unless you are distributing binaries that require the library to run.</p> <p>(4) If you give LAMMPS files to someone else, the GPLv2 LICENSE file and source file headers (including the copyright and GPLv2 notices) should remain part of the code.</p>"},{"location":"merlin7/05-Software-Support/lammps/#how-to-run-on-merlin7","title":"How to run on Merlin7","text":""},{"location":"merlin7/05-Software-Support/lammps/#cpu-nodes","title":"CPU nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-jsrx-A100-gpu lammps/20250722-37gs-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/lammps/#a100-nodes","title":"A100 nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-jsrx-A100-gpu lammps/20250722-xcaf-A100-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/lammps/#gh-nodes","title":"GH nodes","text":"Bash<pre><code>module use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-fvlo-GH200-gpu lammps/20250722-3tfv-GH200-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/lammps/#sbatch-cpu-4-mpi-ranks-16-omp-threads","title":"SBATCH CPU, 4 MPI ranks, 16 OMP threads","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --nodes=1 # requesting 1 compute node\n#SBATCH --ntasks=4 # use 4 MPI rank (task)\n#SBATCH --partition=hourly\n#SBATCH --cpus-per-task=16 # modify this number of CPU cores per MPI task\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n\nunset PMODULES_ENV\nmodule purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-jsrx-A100-gpu lammps/20250722-37gs-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\nexport OMP_PROC_BIND=spread\nexport OMP_PLACES=threads\n\nsrun --cpu-bind=cores lmp -k on t $OMP_NUM_THREADS -sf kk -in lj_kokkos.in\n</code></pre>"},{"location":"merlin7/05-Software-Support/lammps/#sbatch-a100-4-gpu-16-omp-threads-4-mpi-ranks","title":"SBATCH A100, 4 GPU, 16 OMP threads, 4 MPI ranks","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --nodes=1 # number of GH200 nodes with each node having 4 CPU+GPU\n#SBATCH --ntasks-per-node=4 # 4 MPI ranks per node\n#SBATCH --cluster=gmerlin7\n#SBATCH --hint=nomultithread\n#SBATCH --partition=a100-hourly\n#SBATCH --gpus-per-task=1\n\nunset PMODULES_ENV\nmodule purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-jsrx-A100-gpu lammps/20250722-xcaf-A100-gpu-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\n\nsrun lmp -in lj_kokkos.in -k on g ${SLURM_GPUS_PER_TASK} -sf kk -pk kokkos gpu/aware on\n</code></pre>"},{"location":"merlin7/05-Software-Support/lammps/#sbatch-gh-2-gpu-18-omp-threads-2-mpi-ranks","title":"SBATCH GH, 2 GPU, 18 OMP threads, 2 MPI ranks","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --time=00:10:00 # maximum execution time of 10 minutes\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --nodes=1 # number of GH200 nodes with each node having 4 CPU+GPU\n#SBATCH --ntasks-per-node=2 # 2 MPI ranks per node\n#SBATCH --cluster=gmerlin7\n#SBATCH --hint=nomultithread\n#SBATCH --partition=gh-hourly\n#SBATCH --gpus-per-task=1\n\nunset PMODULES_ENV\nmodule purge\nmodule use Spack unstable\nmodule 
load gcc/12.3 openmpi/5.0.8-fvlo-GH200-gpu lammps/20250722-3tfv-GH200-gpu-omp\n\nexport FI_CXI_RX_MATCH_MODE=software\n\nsrun lmp -in lj_kokkos.in -k on g ${SLURM_GPUS_PER_TASK} -sf kk -pk kokkos gpu/aware on\n</code></pre>"},{"location":"merlin7/05-Software-Support/opal-x/","title":"OPAL-X","text":""},{"location":"merlin7/05-Software-Support/opal-x/#opal-x","title":"OPAL-X","text":""},{"location":"merlin7/05-Software-Support/opal-x/#opal","title":"OPAL","text":"<p>OPAL (Object Oriented Particle Accelerator Library) is an open source C++ framework for general particle accelerator simulations including 3D space charge, short range wake fields and particle matter interaction.</p>"},{"location":"merlin7/05-Software-Support/opal-x/#licensing-terms-and-conditions","title":"Licensing Terms and Conditions","text":"<p>GNU GPLv3</p>"},{"location":"merlin7/05-Software-Support/opal-x/#how-to-run-on-merlin7","title":"How to run on Merlin7","text":""},{"location":"merlin7/05-Software-Support/opal-x/#a100-nodes","title":"A100 nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/13.2.0 openmpi/5.0.7-dnpr-A100-gpu opal-x/master-cbgs-A100-gpu\n</code></pre>"},{"location":"merlin7/05-Software-Support/opal-x/#gh-nodes","title":"GH nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/13.2.0 openmpi/5.0.7-z3y6-GH200-gpu opal-x/master-v6v2-GH200-gpu\n</code></pre>"},{"location":"merlin7/05-Software-Support/opal-x/#developing-your-own-code","title":"Developing your own code","text":""},{"location":"merlin7/05-Software-Support/opal-x/#a100-nodes_1","title":"A100 nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/13.2.0 openmpi/5.0.7-dnpr-A100-gpu\nmodule load boost/1.82.0-lgrt fftw/3.3.10.6-zv2b-omp gnutls/3.8.9-mcdr googletest/1.14.0-msmu gsl/2.7.1-hxwy h5hut/2.0.0rc7-zy7s openblas/0.3.29-zkwb cmake/3.31.6-oe7u\ngit clone https://github.com/OPALX-project/OPALX.git opal-x\ncd opal-x\n./gen_OPALrevision\n\nmkdir build_gpu\ncd build_gpu\n\ncmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ARCH_AMPERE80=ON -DCMAKE_CXX_STANDARD=20 -DIPPL_ENABLE_FFT=ON -DIPPL_ENABLE_TESTS=OFF -DIPPL_ENABLE_SOLVERS=ON -DIPPL_ENABLE_ALPINE=True -DIPPL_PLATFORMS=cuda ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/opal-x/#gh-nodes_1","title":"GH nodes","text":"Bash<pre><code>salloc --partition=gh-daily --clusters=gmerlin7 --time=08:00:00 --ntasks=4 --nodes=1 --gpus=1 --mem=40000 $SHELL\nssh <allocated_gpu>\n\nmodule purge\nmodule use Spack unstable\nmodule load gcc/13.2.0 openmpi/5.0.7-z3y6-GH200-gpu\nmodule load boost/1.82.0-znbt fftw/3.3.10-jctz gnutls/3.8.9-rtrg googletest/1.15.2-odox gsl/2.7.1-j2dk h5hut/2.0.0rc7-k63k openblas/0.3.29-d3m2 cmake/3.31.4-u2nm\n\ngit clone https://github.com/OPALX-project/OPALX.git opal-x\ncd opal-x\n./gen_OPALrevision\nmkdir build_gh\ncd build_gh\n\ncmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ARCH_HOPPER90=ON -DCMAKE_CXX_STANDARD=20 -DIPPL_ENABLE_FFT=ON -DIPPL_ENABLE_TESTS=OFF -DIPPL_ENABLE_SOLVERS=ON -DIPPL_ENABLE_ALPINE=OFF -DIPPL_PLATFORMS=cuda ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/openmpi/","title":"OpenMPI Support","text":""},{"location":"merlin7/05-Software-Support/openmpi/#openmpi-support","title":"OpenMPI Support","text":""},{"location":"merlin7/05-Software-Support/openmpi/#introduction","title":"Introduction","text":"<p>This document outlines the supported OpenMPI versions in the Merlin7 
cluster.</p>"},{"location":"merlin7/05-Software-Support/openmpi/#openmpi-supported-versionso","title":"OpenMPI supported versionso","text":"<p>The Merlin cluster supports OpenMPI versions across three distinct stages: stable, unstable, and deprecated. Below is an overview of each stage:</p>"},{"location":"merlin7/05-Software-Support/openmpi/#stable","title":"Stable","text":"<p>Versions in the <code>stable</code> stage are fully functional, thoroughly tested, and officially supported by the Merlin administrators. These versions are available via PModules and Spack, ensuring compatibility and reliability for production use.</p>"},{"location":"merlin7/05-Software-Support/openmpi/#unstable","title":"Unstable","text":"<p>Versions in the <code>unstable</code> stage are available for testing and early access to new OpenMPI features. While these versions can be used, their compilation and configuration are subject to change before they are promoted to the <code>stable</code> stage. Administrators recommend caution when relying on <code>unstable</code> versions for critical workloads.</p>"},{"location":"merlin7/05-Software-Support/openmpi/#deprecated","title":"Deprecated","text":"<p>Versions in the <code>deprecated</code> stage are no longer supported by the Merlin administrators. Typically, these include versions no longer supported by the official OpenMPI project. While deprecated versions may still be available for use, their functionality cannot be guaranteed, and they will not receive updates or bug fixes.</p>"},{"location":"merlin7/05-Software-Support/openmpi/#using-srun-in-merlin7","title":"Using srun in Merlin7","text":"<p>In OpenMPI versions prior to 5.0.x, using <code>srun</code> for direct task launches was faster than <code>mpirun</code>. Although this is no longer the case, <code>srun</code> remains the recommended method due to its simplicity and ease of use.</p> <p>Key benefits of <code>srun</code>: * Automatically handles task binding to cores. * In general, requires less configuration compared to <code>mpirun</code>. * Best suited for most users, while <code>mpirun</code> is recommended only for advanced MPI configurations.</p> <p>Guidelines: * Always adapt your scripts to use srun before seeking support. * For any module-related issues, please contact the Merlin7 administrators.</p> <p>Example Usage: </p>Bash<pre><code>srun ./app\n</code></pre><p></p> <p>Tip</p> <p>Always run OpenMPI applications with <code>srun</code> for a seamless experience.</p>"},{"location":"merlin7/05-Software-Support/openmpi/#pmix-support-in-merlin7","title":"PMIx Support in Merlin7","text":"<p>Merlin7's SLURM installation includes support for multiple PMI types, including pmix. To view the available options, use the following command:</p> Bash<pre><code>\ud83d\udd25 [caubet_m@login001:~]# srun --mpi=list\nMPI plugin types are...\n none\n pmix\n pmi2\n cray_shasta\nspecific pmix plugin versions available: pmix_v5,pmix_v4,pmix_v3,pmix_v2\n</code></pre> <p>Important Notes: * For OpenMPI, always use <code>pmix</code> by specifying the appropriate version (<code>pmix_$version</code>). When loading an OpenMPI module (via PModules or Spack), the corresponding PMIx version will be automatically loaded. * Users do not need to manually manage PMIx compatibility.</p> <p>Warning</p> <p>PMI-2 is not supported in OpenMPI 5.0.0 or later releases. 
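<p>As noted above, OpenMPI jobs can request the PMIx plugin explicitly; a minimal sketch (the exact <code>pmix_v&lt;N&gt;</code> to pick depends on the OpenMPI module you have loaded):</p> Bash<pre><code># Use the default PMIx plugin\nsrun --mpi=pmix ./app\n\n# Or pin a specific PMIx version matching the loaded OpenMPI module\nsrun --mpi=pmix_v5 ./app\n</code></pre>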
Despite this, pmi2 remains the default SLURM PMI type in Merlin7 as it is the officially supported type and maintains compatibility with other MPI implementations.</p>"},{"location":"merlin7/05-Software-Support/pmodules/","title":"PSI Modules","text":""},{"location":"merlin7/05-Software-Support/pmodules/#psi-modules","title":"PSI Modules","text":""},{"location":"merlin7/05-Software-Support/pmodules/#psi-environment-modules","title":"PSI Environment Modules","text":"<p>On top of the operating system stack we provide different software using the PSI developed PModule system.</p> <p>PModules is the official supported way and each package is deployed by a specific expert. Usually, in PModules software which is used by many people will be found.</p> <p>If you miss any package/versions or a software with a specific missing feature, contact us. We will study if is feasible or not to install it.</p>"},{"location":"merlin7/05-Software-Support/pmodules/#module-release-stages","title":"Module Release Stages","text":"<p>To ensure proper software lifecycle management, PModules uses three release stages: unstable, stable, and deprecated.</p> <ol> <li> <p>Unstable Release Stage:</p> <ul> <li>Contains experimental or under-development software versions.</li> <li> <p>Not visible to users by default. Use explicitly:</p> Bash<pre><code>module use unstable\n</code></pre> </li> <li> <p>Software is promoted to stable after validation.</p> </li> </ul> </li> <li> <p>Stable Release Stage:</p> <ul> <li>Default stage, containing fully tested and supported software versions.</li> <li>Recommended for all production workloads.</li> </ul> </li> <li> <p>Deprecated Release Stage:</p> <ul> <li>Contains software versions that are outdated or discontinued.</li> <li> <p>These versions are hidden by default but can be explicitly accessed:</p> Bash<pre><code>module use deprecated\n</code></pre> </li> <li> <p>Deprecated software can still be loaded directly without additional configuration to ensure user transparency.</p> </li> </ul> </li> </ol>"},{"location":"merlin7/05-Software-Support/pmodules/#pmodules-commands","title":"PModules commands","text":"<p>Below is listed a summary of common <code>module</code> commands:</p> Bash<pre><code>module use # show all available PModule Software Groups as well as Release Stages\nmodule avail # to see the list of available software packages provided via pmodules\nmodule use unstable # to get access to a set of packages not fully tested by the community\nmodule load <package>/<version> # to load specific software package with a specific version\nmodule search <string> # to search for a specific software package and its dependencies.\nmodule list # to list which software is loaded in your environment\nmodule purge # unload all loaded packages and cleanup the environment\n</code></pre> <p>Please refer to the external PSI Modules document for detailed information about the <code>module</code> command.</p>"},{"location":"merlin7/05-Software-Support/pmodules/#module-useunuse","title":"module use/unuse","text":"<p>Without any parameter, <code>use</code> lists all available PModule Software Groups and Release Stages.</p> Bash<pre><code>module use\n</code></pre> <p>When followed by a parameter, <code>use</code>/<code>unuse</code> invokes/uninvokes a PModule Software Group or Release Stage.</p> Bash<pre><code>module use EM # Invokes the 'EM' software group\nmodule unuse EM # Uninvokes the 'EM' software group\nmodule use unstable # Invokes the 'unstable' Release stable\nmodule unuse unstable # 
Uninvokes the 'unstable' Release stable\n</code></pre>"},{"location":"merlin7/05-Software-Support/pmodules/#module-avail","title":"module avail","text":"<p>This option lists all available PModule Software Groups and their packages.</p> <p>Please run <code>module avail --help</code> for further listing options.</p>"},{"location":"merlin7/05-Software-Support/pmodules/#module-search","title":"module search","text":"<p>This is used to search for software packages. By default, if no Release Stage or Software Group is specified in the options of the <code>module search</code> command, it will search from the already invoked Software Groups and Release Stages. Direct package dependencies will be also showed.</p> Bash<pre><code>\ud83d\udd25 [caubet_m@login001:~]# module search openmpi\n\nModule Rel.stage Group Overlay Requires\n--------------------------------------------------------------------------------\nopenmpi/4.1.6 stable Compiler Alps gcc/12.3.0\nopenmpi/4.1.6 stable Compiler Alps gcc/13.3.0\nopenmpi/4.1.6 stable Compiler Alps gcc/14.2.0\nopenmpi/4.1.6 stable Compiler Alps intelcc/22.2\nopenmpi/5.0.5 stable Compiler Alps gcc/8.5.0\nopenmpi/5.0.5 stable Compiler Alps gcc/12.3.0\nopenmpi/5.0.5 stable Compiler Alps gcc/14.2.0\nopenmpi/5.0.5 stable Compiler Alps intelcc/22.2\n</code></pre> <p>Please run <code>module search --help</code> for further search options.</p>"},{"location":"merlin7/05-Software-Support/pmodules/#module-loadunload","title":"module load/unload","text":"<p>This loads/unloads specific software packages. Packages might have direct dependencies that need to be loaded first. Other dependencies will be automatically loaded.</p> <p>In the example below, the <code>openmpi/5.0.5</code> package will be loaded, however <code>gcc/14.2.0</code> must be loaded as well as this is a strict dependency. Direct dependencies must be loaded in advance. Users can load multiple packages one by one or at once. This can be useful for instance when loading a package with direct dependencies.</p> Bash<pre><code># Single line\nmodule load gcc/14.2.0 openmpi/5.0.5\n\n# Multiple line\nmodule load gcc/14.2.0\nmodule load openmpi/5.0.5\n</code></pre>"},{"location":"merlin7/05-Software-Support/pmodules/#module-purge","title":"module purge","text":"<p>This command is an alternative to <code>module unload</code>, which can be used to unload all loaded module files.</p> Bash<pre><code>module purge\n</code></pre>"},{"location":"merlin7/05-Software-Support/pmodules/#requesting-new-pmodules-packages","title":"Requesting New PModules Packages","text":"<p>The PModules system is designed to accommodate the diverse software needs of Merlin7 users. Below are guidelines for requesting new software or versions to be added to PModules.</p>"},{"location":"merlin7/05-Software-Support/pmodules/#requesting-missing-software","title":"Requesting Missing Software","text":"<p>If a specific software package is not available in PModules and there is interest from multiple users:</p> <ul> <li>Contact Support: Let us know about the software, and we will assess its feasibility for deployment.</li> <li>Deployment Timeline: Adding new software to PModules typically takes a few days, depending on complexity and compatibility.</li> <li>User Involvement: If you are interested in maintaining the software package, please inform us. 
Collaborative maintenance helps ensure timely updates and support.</li> </ul>"},{"location":"merlin7/05-Software-Support/pmodules/#requesting-a-missing-version","title":"Requesting a Missing Version","text":"<p>If the currently available versions of a package do not meet your requirements:</p> <ul> <li>New Versions: Requests for newer versions are generally supported, especially if there is interest from multiple users.</li> <li>Intermediate Versions: Installation of intermediate versions (e.g., versions between the current stable and deprecated versions) can be considered if there is a strong justification, such as specific features or compatibility requirements.</li> </ul>"},{"location":"merlin7/05-Software-Support/pmodules/#general-notes","title":"General Notes","text":"<ul> <li>New packages or versions are prioritized based on their relevance and usage.</li> <li>For any request, providing detailed information about the required software or version (e.g., name, version, features) will help expedite the process.</li> </ul>"},{"location":"merlin7/05-Software-Support/quantum-espresso/","title":"Quantum Espresso","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#quantum-espresso","title":"Quantum Espresso","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#quantum-espresso_1","title":"Quantum ESPRESSO","text":"<p>Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials:</p> Text Only<pre><code>PWscf (Plane-Wave Self-Consistent Field)\nFPMD (First Principles Molecular Dynamics)\nCP (Car-Parrinello)\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#licensing-terms-and-conditions","title":"Licensing Terms and Conditions","text":"<p>Quantum ESPRESSO is an open initiative, in collaboration with many groups world-wide, coordinated by the Quantum ESPRESSO Foundation. 
Scientific work done using Quantum ESPRESSO should contain an explicit acknowledgment and reference to the main papers (see Quantum Espresso Homepage for the details).</p>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#how-to-run-on-merlin7","title":"How to run on Merlin7","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#75","title":"7.5","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#cpu-nodes","title":"CPU nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.9-xqhy-A100-gpu quantum-espresso/7.5-zfwh-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#gh-nodes","title":"GH nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load nvhpc/25.7 openmpi/4.1.8-l3jj-GH200-gpu quantum-espresso/7.5-2ysd-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#741","title":"7.4.1","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#a100-nodes","title":"A100 nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load nvhpc/25.3 openmpi/main-6bnq-A100-gpu quantum-espresso/7.4.1-nxsw-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#gh-nodes_1","title":"GH nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load nvhpc/25.3 openmpi/5.0.7-e3bf-GH200-gpu quantum-espresso/7.4.1-gxvj-gpu-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#sbatch-a100-1-gpu-64-openmp-threads-one-mpi-rank-example","title":"SBATCH A100, 1 GPU, 64 OpenMP threads, one MPI rank example","text":"Bash<pre><code>#!/bin/bash\n#SBATCH --no-requeue\n#SBATCH --job-name=\"si64\"\n#SBATCH --get-user-env\n#SBATCH --output=_scheduler-stdout.txt\n#SBATCH --error=_scheduler-stderr.txt\n#SBATCH --partition=a100-daily\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --time=06:00:00\n#SBATCH --cpus-per-task=64\n#SBATCH --cluster=gmerlin7\n#SBATCH --gpus=1\n#SBATCH --hint=nomultithread\n\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\nexport OMP_PROC_BIND=spread\nexport OMP_PLACES=threads\n\n# Load necessary modules\nmodule purge\nmodule use Spack unstable\nmodule load nvhpc/25.3 openmpi/main-6bnq-A100-gpu quantum-espresso/7.4.1-nxsw-gpu-omp\n\nsrun \"$(which pw.x)\" -npool 1 -in aiida.in > \"aiida.out\"\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#developing-your-own-gpu-code","title":"Developing your own GPU code","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#spack","title":"Spack","text":"<ol> <li><code>spack config edit</code></li> <li>Add <code>granularity: microarchitectures</code> to your config (only needed with the nvhpc compiler; not needed for CPU builds) 
Bash<pre><code>spack:\n concretizer:\n unify: false\n targets:\n granularity: microarchitectures\n</code></pre></li> <li><code>spack add quantum-espresso@develop +cuda +mpi +mpigpu hdf5=parallel %nvhpc arch=linux-sles15-zen3 # GPU</code></li> <li><code>spack add quantum-espresso@develop +mpi hdf5=parallel %gcc # CPU</code></li> <li><code>spack develop quantum-espresso@develop # clone the code under /afs/psi.ch/sys/spack/user/$USER/spack-environment/quantum-espresso</code></li> <li>Make changes in /afs/psi.ch/sys/spack/user/$USER/spack-environment/quantum-espresso</li> <li>Build: <code>spack install [-jN] -v --until=build quantum-espresso@develop</code></li> </ol>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#environment-modules","title":"Environment modules","text":""},{"location":"merlin7/05-Software-Support/quantum-espresso/#cpu","title":"CPU","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/main-syah fftw/3.3.10.6-qbxu-omp hdf5/1.14.5-t46c openblas/0.3.29-omp cmake/3.31.6-oe7u\n\ncd <path to QE source directory>\nmkdir build\ncd build\n\ncmake -DQE_ENABLE_MPI:BOOL=ON -DQE_ENABLE_OPENMP:BOOL=ON -DCMAKE_C_COMPILER:STRING=mpicc -DCMAKE_Fortran_COMPILER:STRING=mpif90 -DQE_ENABLE_HDF5:BOOL=ON ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#a100","title":"A100","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load nvhpc/25.3 openmpi/main-6bnq-A100-gpu fftw/3.3.10.6-qbxu-omp hdf5/develop-2.0-rjgu netlib-scalapack/2.2.2-3hgw cmake/3.31.6-oe7u\n\ncd <path to QE source directory>\nmkdir build\ncd build\n\ncmake -DQE_ENABLE_MPI:BOOL=ON -DQE_ENABLE_OPENMP:BOOL=ON -DQE_ENABLE_SCALAPACK:BOOL=ON -DQE_ENABLE_CUDA:BOOL=ON -DQE_ENABLE_MPI_GPU_AWARE:BOOL=ON -DQE_ENABLE_OPENACC:BOOL=ON -DCMAKE_C_COMPILER:STRING=mpicc -DCMAKE_Fortran_COMPILER:STRING=mpif90 -DQE_ENABLE_HDF5:BOOL=ON ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#gh200","title":"GH200","text":"Bash<pre><code>salloc --partition=gh-daily --clusters=gmerlin7 --time=08:00:00 --ntasks=4 --nodes=1 --gpus=1 --mem=40000 $SHELL\nssh <allocated_gpu>\nmodule purge\nmodule use Spack unstable\nmodule load nvhpc/25.3 openmpi/5.0.7-e3bf-GH200-gpu fftw/3.3.10-sfpw-omp hdf5/develop-2.0-ztvo nvpl-blas/0.4.0.1-3zpg nvpl-lapack/0.3.0-ymy5 netlib-scalapack/2.2.2-qrhq cmake/3.31.6-5dl7\n\ncd <path to QE source directory>\nmkdir build\ncd build\n\ncmake -DQE_ENABLE_MPI:BOOL=ON -DQE_ENABLE_OPENMP:BOOL=ON -DQE_ENABLE_SCALAPACK:BOOL=ON -DQE_ENABLE_CUDA:BOOL=ON -DQE_ENABLE_MPI_GPU_AWARE:BOOL=ON -DQE_ENABLE_OPENACC:BOOL=ON -DCMAKE_C_COMPILER:STRING=mpicc -DCMAKE_Fortran_COMPILER:STRING=mpif90 -DQE_ENABLE_HDF5:BOOL=ON ..\nmake [-jN]\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#q-e-sirius","title":"Q-E-SIRIUS","text":"<p>SIRIUS enabled fork of QuantumESPRESSO</p>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#cpu_1","title":"CPU","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-mx6f q-e-sirius/1.0.1-dtn4-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#a100-nodes_1","title":"A100 nodes","text":"Bash<pre><code>module purge\nmodule use Spack unstable\nmodule load gcc/12.3 openmpi/5.0.8-lsff-A100-gpu q-e-sirius/1.0.1-7snv-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/quantum-espresso/#gh-nodes_2","title":"GH nodes","text":"Bash<pre><code>module purge\nmodule use Spack 
unstable\nmodule load gcc/12.3 openmpi/5.0.8-tx2w-GH200-gpu q-e-sirius/1.0.1-3dwi-omp\n</code></pre>"},{"location":"merlin7/05-Software-Support/spack/","title":"Spack","text":""},{"location":"merlin7/05-Software-Support/spack/#spack","title":"Spack","text":"<p>For Merlin7 the package manager for supercomputing Spack is available. It is meant to complement the existing PModules solution, giving users the opportunity to manage their own software environments.</p> <p>Documentation for how to use Spack on Merlin7 is provided here.</p>"},{"location":"merlin7/05-Software-Support/spack/#the-spack-psi-packages","title":"The Spack PSI packages","text":"<p>An initial collection of packages (and Spack recipes) is located at Spack PSI; users can use these directly through calls like <code>spack add ...</code>.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/","title":"Merlin6 to Merlin7 Migration Guide","text":""},{"location":"merlin7/99-support/migration-from-merlin6/#merlin6-to-merlin7-migration-guide","title":"Merlin6 to Merlin7 Migration Guide","text":"<p>Welcome to the official documentation for migrating your data from Merlin6 to Merlin7. Please follow the instructions carefully to ensure a smooth and secure transition.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#migration-schedule","title":"\ud83d\udcc5 Migration Schedule","text":""},{"location":"merlin7/99-support/migration-from-merlin6/#phase-1-users-without-projects-deadline-july-11","title":"Phase 1: Users without Projects \u2014 Deadline: July 11","text":"<p>If you do not belong to any Merlin project, i.e. for</p> <ul> <li>Users not in any group project (<code>/data/projects/general</code>)</li> <li>Users not in BIO, MEG, Mu3e</li> <li>Users not part of PSI-owned private Merlin nodes (ASA, MEG, Mu3e)</li> </ul> <p>You must complete your migration before July 11. You just need to migrate your personal /data/user/$USER and /psi/home/$USER directories.</p> <p>Users are responsible for initiating and completing the migration process as outlined below. Contact the Merlin support team merlin-admins@lists.psi.ch if you need help.</p> <p>\u26a0\ufe0f In this phase, it's important that you don't belong to any project. 
Once the migration is finished, access to Merlin6 will no longer be possible.</p> <p>Please refer to the Phase 1: Step-by-Step Migration Instructions section for detailed information about user data migration.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#phase-2-project-members-and-owners-start-before-august-1","title":"Phase 2: Project Members and Owners \u2014 Start Before August 1","text":"<p>For users in active projects:</p> <ul> <li>Project owners and members will be contacted by the Merlin admins.</li> <li>Migration will be scheduled individually per project.</li> <li>Expect contact before August 1.</li> </ul> <p>\u26a0\ufe0f In this phase, the data and home directories of group owners and members will also need to be migrated in parallel.</p> <p>Please refer to the Phase 2: Migration Instructions section for further information.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#directory-structure-changes","title":"Directory Structure Changes","text":""},{"location":"merlin7/99-support/migration-from-merlin6/#merlin6-vs-merlin7","title":"Merlin6 vs Merlin7","text":"Cluster Home Directory User Data Directory Projects Experiments merlin6 /psi/home/<code>$USER</code> /data/user/<code>$USER</code> /data/project/ /data/experiments merlin7 /data/user/<code>$USER</code> /data/user/<code>$USER</code> /data/project/ /data/project/ <ul> <li>The home directory and user data directory have been merged into the single new home directory <code>/data/user/$USER</code>.</li> <li> <p>The experiments directory has been integrated into <code>/data/project/</code>:</p> <ul> <li><code>/data/project/general</code> contains general Merlin7 projects.</li> <li>Other subdirectories are used for large-scale projects such as CLS division, Mu3e, and MeG.</li> </ul> </li> </ul>"},{"location":"merlin7/99-support/migration-from-merlin6/#prerequisites-and-preparation","title":"\ud83d\udccb Prerequisites and Preparation","text":"<p>Before starting the migration, make sure you:</p> <ul> <li> <p>are registered on Merlin7.</p> <ul> <li>If not yet registered, please do so following these instructions</li> </ul> </li> <li> <p>have cleaned up your data to reduce migration time and space usage.</p> </li> <li> <p>For the user data migration, ensure your total usage on Merlin6 (<code>/psi/home</code>+<code>/data/user</code>) is well below the 1\u202fTB quota (use the <code>merlin_quotas</code> command). 
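As a rough manual cross-check (illustrative only; <code>merlin_quotas</code> remains the authoritative number), you can also sum up the two source directories yourself:</p> Bash<pre><code># approximate combined usage of your Merlin6 home and data directories (can be slow on large trees)\ndu -sh /psi/home/$USER /data/user/$USER\n</code></pre> <p>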
Remember:</p> <ul> <li>Merlin7 also has a 1\u202fTB quota on your home directory, and you might already have data there.</li> <li>If your usage exceeds this during the transfer, the process might fail.</li> </ul> </li> <li> <p>No activity should be running / performed on Merlin6 when the transfer process is ongoing.</p> </li> </ul>"},{"location":"merlin7/99-support/migration-from-merlin6/#recommended-cleanup-actions","title":"Recommended Cleanup Actions","text":"<ul> <li>Remove unused files and datasets.</li> <li>Archive large, inactive data sets.</li> <li> <p>Delete or clean up unused <code>conda</code> or <code>virtualenv</code> Python environments:</p> <ul> <li>These are often large and may not work as-is on Merlin7.</li> <li>You can export your conda environment description to a file with:</li> </ul> <p></p>Bash<pre><code>conda env export -n myenv > $HOME/myenv.yml\n</code></pre> * Then recreate them later on Merlin7 from these files.<p></p> </li> </ul> <p>\ud83e\uddf9 For the user data, you can always remove more old data after migration \u2014 it will be copied into <code>~/merlin6data</code> and <code>~/merlin6home</code> on Merlin7.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#phase-1-step-by-step-migration-instructions","title":"Phase 1: Step-by-Step Migration Instructions","text":""},{"location":"merlin7/99-support/migration-from-merlin6/#step-1-run-merlin7_migrationsetup","title":"Step 1: Run <code>merlin7_migration.setup</code>","text":"<p>Log into any Merlin6 login node (<code>merlin-l-001.psi.ch</code>, <code>merlin-l-002.psi.ch</code>, <code>merlin-l-01.psi.ch</code>) and run:</p> Bash<pre><code>merlin7_migration.setup\n</code></pre> <p>This script will:</p> <ul> <li>Check that you have an account on Merlin7.</li> <li>Configure and check that your environment is ready for transferring files via Slurm job.</li> <li> <p>Create two directories:</p> <ul> <li><code>~/merlin6data</code> \u2192 copy of your old /data/user</li> <li><code>~/merlin6home</code> \u2192 copy of your old home</li> </ul> </li> </ul> <p>\u26a0\ufe0f Important: If <code>~/merlin6home</code> or <code>~/merlin6data</code> already exist on Merlin7, the script will exit.</p> <p>Please remove them or contact support.</p> <p>If there are issues, the script will:</p> <ul> <li>Print clear diagnostic output</li> <li>Give you some hints to resolve the issue</li> </ul> <p>If you are stuck, email: merlin-admins@lists.psi.ch</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#step-2-run-merlin7_migrationstart","title":"Step 2: Run <code>merlin7_migration.start</code>","text":"<p>After setup completes, start the migration by running:</p> Bash<pre><code>merlin7_migration.start\n</code></pre> <p>This script will:</p> <ul> <li>Check the status of your quota on Merlin6.</li> <li>Submit SLURM batch jobs to the <code>xfer</code> partition</li> <li> <p>Queue two jobs:</p> </li> <li> <p><code>migrate_merlin6data.batch</code> (data dir)</p> </li> <li><code>migrate_merlin6home.batch</code> (home dir)<ul> <li>This job will only start if <code>migrate_merlin6data.batch</code> has successfully finished.</li> </ul> </li> <li>Automatically track the job IDs</li> <li>Print log file locations for the different jobs</li> </ul> <p>\u26a0\ufe0f Once both transfers succeed, your access to Merlin6 will be revoked. 
Do not attempt to reconnect to Merlin6 after this.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#if-something-goes-wrong","title":"\u2757 If Something Goes Wrong","text":"<p>If a problem occurs during the migration process:</p> <ul> <li>\ud83d\udd0d Check the job log files mentioned in the script output. They contain detailed messages that explain what failed and why.</li> <li> <p>\ud83d\udee0\ufe0f Fix the root cause on the source system. Common issues include:</p> <ul> <li>Files with incorrect permissions</li> <li>Ownership mismatches</li> <li>Disk quota exceeded on Merlin7</li> <li>\ud83d\udcda Refer to the \u26a0\ufe0f Common rsync/fpsync Migration Issues section below for detailed explanations and solutions.</li> </ul> </li> </ul> <p>\u2139\ufe0f Important: If <code>migrate_merlin6data.batch</code> fails, the migration process will automatically cancel <code>migrate_merlin6home.batch</code> to avoid ending in an inconsistent state.</p> <p>Once the problem is resolved, simply re-run the <code>merlin7_migration.start</code> script to resume the migration.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#step-3-monitor-transfer-jobs","title":"Step 3: Monitor Transfer Jobs","text":"<p>To monitor your transfer jobs, run:</p> Bash<pre><code>squeue -M merlin6 -u $USER -p xfer\n</code></pre> <p>Check the output to ensure your jobs are:</p> <ul> <li>Running (<code>R</code>) or completed (<code>CG</code> or removed from queue)</li> <li>Not failed (<code>F</code>, <code>TO</code>, or stuck)</li> </ul> <p>You can also check logs (as printed by the script) to verify job completion.</p> <p>\u2705 When <code>/data/user/$USER</code> and <code>/psi/home/$USER</code> on Merlin6 are no longer accessible, migration is complete.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#examples","title":"Examples","text":""},{"location":"merlin7/99-support/migration-from-merlin6/#setup-the-migration","title":"Setup the Migration","text":"Bash<pre><code>merlin7_migration.setup\n</code></pre> <p>Expected output:</p> Bash<pre><code>\u2705 login002.merlin7.psi.ch\n\u2705 `$USER` is a member of svc-cluster_merlin7\n\u2705 Skipping key generation\n\u2705 SSH key already added to agent.\n\u2705 SSH ID successfully copied to login00[1|2].merlin7.psi.ch.\n\u2705 Test successful.\n\u2705 /data/software/xfer_logs/caubet_m created.\n\u2705 ~/merlin6data directory created.\n\u2705 ~/merlin6home directory created.\n</code></pre>"},{"location":"merlin7/99-support/migration-from-merlin6/#start-the-migration","title":"Start the Migration","text":"Bash<pre><code>merlin7_migration.start\n</code></pre> <p>Expected output:</p> Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/software/admin/scripts/merlin-user-tools/alps(master)]# ./merlin7_migration.start\n\u2705 Quota check passed.\nUsed: 512 GB, 234001 files\n\n###################################################\nSubmitting transfer jobs to Slurm\n\n Job logs can be found here:\n\u27a1\ufe0f Directory '/data/user/caubet_m' does NOT have 000 permissions. Transfer pending, continuing...\n\u2705 Submitted DATA_MIGRATION job: 24688554. Sleeping 3 seconds...\n - /data/user transfer logs:\n - /data/software/xfer_logs/caubet_m/data-24688554.out\n - /data/software/xfer_logs/caubet_m/data-24688554.err\n\u27a1\ufe0f Directory '/psi/home/caubet_m' does NOT have 000 permissions. Transfer pending, continuing...\n\u2705 Submitted HOME_MIGRATION job with dependency on 24688554: 24688555. 
Sleeping 3 seconds...\n - /psi/home transfer logs:\n - /data/software/xfer_logs/caubet_m/home-24688555.out\n - /data/software/xfer_logs/caubet_m/home-24688555.err\n\n\u2705 You can start manually a monitoring window with:\n tmux new-session -d -s \"xfersession\" \"watch 'squeue -M merlin6 -u caubet_m -p xfer'\"\n tmux attach -t \"xfersession\"\n\n\u2705 FINISHED - PLEASE CHECK JOB TRANSFER PROGRESS\n</code></pre>"},{"location":"merlin7/99-support/migration-from-merlin6/#monitor-progress","title":"Monitor Progress","text":"Bash<pre><code>squeue -M merlin6 -u $USER -p xfer\n</code></pre> <p>Output:</p> Bash<pre><code>$ squeue -M merlin6 -u $USER -p xfer\nCLUSTER: merlin6\n JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n 24688581 xfer HOME_MIG caubet_m PD 0:00 1 (Dependency)\n 24688580 xfer DATA_MIG caubet_m R 0:22 1 merlin-c-017\n</code></pre>"},{"location":"merlin7/99-support/migration-from-merlin6/#phase-2-migration-instructions","title":"Phase 2: Migration Instructions","text":"<p>Please refer to the Prerequisites and Preparation section for initial setup steps. Further instructions will be sent via email once the owning team is contacted by the Merlin administrators.</p>"},{"location":"merlin7/99-support/migration-from-merlin6/#common-rsyncfpsync-migration-issues","title":"\u26a0\ufe0f Common <code>rsync</code>/<code>fpsync</code> Migration Issues","text":""},{"location":"merlin7/99-support/migration-from-merlin6/#file-permission-denied","title":"File Permission Denied","text":"<ul> <li>Cause: Files or directories are not readable by the user running the transfer.</li> <li>Solution: Fix source-side permissions:</li> </ul> Bash<pre><code>chmod -R u+rX /path/to/file_or_dir\n</code></pre>"},{"location":"merlin7/99-support/migration-from-merlin6/#ownership-mismatches","title":"Ownership Mismatches","text":"<ul> <li>Cause: Source files are owned by another user (e.g. root or a collaborator).</li> <li> <p>Solution:</p> <ul> <li>Change ownership before migration:</li> </ul> Bash<pre><code>chown -R $USER /path/to/file\n</code></pre> </li> </ul>"},{"location":"merlin7/99-support/migration-from-merlin6/#special-files-eg-device-files-sockets","title":"Special Files (e.g. 
device files, sockets)","text":"<ul> <li>Cause: <code>rsync</code> tries to copy UNIX sockets, device files, or FIFOs.</li> <li>Effect: Errors or incomplete copies.</li> <li>Solution: Avoid transferring such files entirely (by deleting them).</li> </ul>"},{"location":"merlin7/99-support/migration-from-merlin6/#exceeded-disk-quota","title":"Exceeded Disk Quota","text":"<ul> <li>Cause: Combined size of existing + incoming data exceeds 1\u202fTB quota on Merlin7.</li> <li>Effect: Transfer stops abruptly.</li> <li>Solution: Clean up or archive non-essential data before migration.</li> </ul>"},{"location":"merlin7/99-support/migration-from-merlin6/#very-small-files-or-large-trees-many-small-rsync-calls","title":"Very Small Files or Large Trees \u2192 Many Small rsync Calls","text":"<ul> <li>Cause: Directory with thousands/millions of small files.</li> <li>Effect: Transfer is slow or hits process limits.</li> <li>Solution: Consider archiving to <code>.tar.gz</code> before transferring:</li> </ul> Bash<pre><code>tar -czf myenv.tar.gz myenv/\n</code></pre>"},{"location":"merlin7/99-support/migration-from-merlin6/#need-help","title":"Need Help?","text":"<p>If something doesn't work:</p> <ul> <li>Re-run the scripts and check the logs carefully.</li> <li>Use <code>less</code>, <code>cat</code>, or <code>tail -f</code> to view your job logs.</li> <li>Contact the Merlin support team: \ud83d\udce7 merlin-admins@lists.psi.ch</li> </ul> <p>We are here to help you migrate safely and efficiently.</p>"},{"location":"news/","title":"News","text":""},{"location":"news/#news","title":"News","text":""},{"location":"news/2019/06/12/merlin-6-documentation-available/","title":"Merlin 6 documentation available","text":"","tags":["getting_started"]},{"location":"news/2019/06/12/merlin-6-documentation-available/#merlin-6-documentation-available","title":"Merlin 6 documentation available","text":"<p>Merlin 6 docs are now available at Merlin6 docs!</p> <p>More complete documentation will be coming shortly.</p>","tags":["getting_started"]},{"location":"news/2024/08/07/merlin7-in-preproduction/","title":"Merlin7 in preproduction","text":"","tags":["getting_started"]},{"location":"news/2024/08/07/merlin7-in-preproduction/#merlin7-in-preproduction","title":"Merlin7 in preproduction","text":"<p>The Merlin7 cluster is officially in preproduction. This phase will be tested by a few users, and we will gradually contact other users to take part in it. Keep in mind that access is restricted.</p> <p>Merlin7 documentation is now available at Slurm configuration.</p> <p>More complete documentation will be coming shortly.</p>","tags":["getting_started"]},{"location":"news/2026/01/12/new-user-documentation-site/","title":"New User Documentation Site","text":"","tags":["getting_started"]},{"location":"news/2026/01/12/new-user-documentation-site/#new-user-documentation-site","title":"New User Documentation Site","text":"<p>Starting in 2026, we are changing the design of the user documentation website. Previously we had used the theme Documentation for Jekyll together with the Jekyll SSG, but have now switched to the more modern Material for MkDocs theme and SSG engine. 
This comes with a few improvements:</p> <ul> <li>searching is more complete and provides better results</li> <li>theme-related improvements (day-night coloring, page layout, content formatting)</li> <li>edits for pages can be submitted via the Edit button (taking you to the Gitea editor and letting you submit a pull request)</li> </ul> <p>With the latter new feature, we encourage our users to point out any issues they find with the documentation. Contributions are very welcome and will help in ensuring that the documentation is kept up-to-date.</p> <p>Notice also that we now have a dedicated Support page, making it easier to find and use our different contact options.</p>","tags":["getting_started"]},{"location":"support/faq/","title":"FAQ","text":""},{"location":"support/faq/#frequently-asked-questions","title":"Frequently Asked Questions","text":""},{"location":"support/faq/#how-do-i-register-for-merlin","title":"How do I register for Merlin?","text":"<p>See Requesting Merlin Access.</p>"},{"location":"support/faq/#how-do-i-get-information-about-downtimes-and-updates","title":"How do I get information about downtimes and updates?","text":"<p>See Get updated through the Merlin User list!</p>"},{"location":"support/faq/#how-can-i-request-access-to-a-merlin-project-directory","title":"How can I request access to a Merlin project directory?","text":"<p>Merlin projects are placed in the <code>/data/project</code> directory. Access to each project is controlled by Unix group membership. If you require access to an existing project, please request group membership as described in Requesting Unix Group Membership.</p> <p>Your project leader or project colleagues will know what Unix group you should belong to. Otherwise, you can check what Unix group is allowed to access that project directory (simply run <code>ls -ltrhd</code> for the project directory).</p>"},{"location":"support/faq/#can-i-install-software-myself","title":"Can I install software myself?","text":"<p>Most software can be installed in user directories without any special permissions. We recommend using <code>/data/user/$USER/bin</code> for software since home directories are fairly small. For software that will be used by multiple groups/users, you can also request that the admins install it as a module.</p> <p>How to install depends a bit on the software itself. There are three common installation procedures:</p> <ul> <li>binary distributions. These are easy; just put them in a directory (e.g. <code>/data/user/$USER/bin</code>) and add that to your PATH.</li> <li>source compilation using make/cmake/autoconf/etc. Usually the compilation scripts accept a <code>--prefix=/data/user/$USER</code> directory for where to install it. Then they place files under <code><prefix>/bin</code>, <code><prefix>/lib</code>, etc. The exact syntax should be documented in the installation instructions.</li> </ul> <p>Note</p> <p>The following is based on <code>merlin6</code>, but should still be valid for <code>merlin7</code>.</p> <ul> <li> <p>conda environment. This is now becoming standard for python-based software, including lots of the AI tools. First follow the initial setup instructions to configure conda to use /data/user instead of your home directory. 
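In practice (an illustrative sketch only; the linked setup instructions are authoritative, and the exact paths below are assumptions), this amounts to pointing conda's environment and package cache directories at your data directory:</p> Bash<pre><code># keep conda environments and package caches under /data/user instead of $HOME\nconda config --add envs_dirs /data/user/$USER/conda/envs\nconda config --add pkgs_dirs /data/user/$USER/conda/pkgs\n</code></pre> <p>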
Then you can create environments like:</p> Bash<pre><code>module load anaconda/2019.07\n# if they provide environment.yml\nconda env create -f environment.yml\n\n# or to create manually\nconda create --name myenv python==3.9 ...\n\nconda activate myenv\n</code></pre> </li> </ul>"},{"location":"support/faq/#something-doesnt-work","title":"Something doesn't work","text":"<p>Check the list of known problems to see if a solution is known. If not, please contact the admins.</p>"},{"location":"support/introduction/","title":"Getting Support","text":""},{"location":"support/introduction/#getting-support","title":"Getting Support","text":"<p>Tip</p> <p>It is strongly recommended that users subscribe to the user mailing list; that way you will receive the newest announcements concerning the status of the clusters, information regarding maintenance actions, and other tasks that might affect your work.</p> <p>There are several channels you can use to get support:</p> <ul> <li>the preferred choice is to submit a ticket with PSI Service Now, alternatively</li> <li>you can also use our user mailing list, or</li> <li>you can email the Admins directly merlin-admins@lists.psi.ch</li> </ul> Info <p>Basic contact information is also displayed on every shell login to the system using the Message of the Day mechanism.</p>"},{"location":"support/introduction/#psi-service-now","title":"PSI Service Now","text":"<p>PSI Service Now is the official tool for opening tickets and requests.</p> <ul> <li>PSI HelpDesk will redirect the incident to the corresponding department, or</li> <li>you can always assign it directly by checking the box <code>I know which service is affected</code> and providing the service name <code>Local HPC Resources (e.g. Merlin) [CF]</code> (just type in <code>Local</code> and you should get the valid completions).</li> </ul>"},{"location":"support/introduction/#merlin-user-mailing-list","title":"Merlin User mailing list","text":"<p>This mailing list is the official channel used by Merlin administrators to inform users about downtimes, interventions or problems. Users can be subscribed in two ways:</p> <ul> <li>Preferred way: Self-registration through Sympa</li> <li>If you need to subscribe many people (e.g. your whole group), send a request to the admin list merlin-admins@lists.psi.ch providing a list of email addresses.</li> </ul>"},{"location":"support/introduction/#email-the-admins","title":"Email the Admins","text":"<p>This is the official way to contact Merlin Administrators for discussions which do not fit well into the incident category. Do not hesitate to contact us for such cases.</p> <p>E-Mail: merlin-admins@lists.psi.ch</p>"},{"location":"support/introduction/#who-are-we","title":"Who are we?","text":"<p>The PSI Merlin clusters are managed by the High Performance Computing and Emerging technologies Group, which is part of the Science IT Infrastructure and Services department (AWI) in PSI's Center for Scientific Computing, Theory and Data (SCD).</p>"},{"location":"support/known-problems/","title":"Known Problems","text":""},{"location":"support/known-problems/#known-problems","title":"Known Problems","text":""},{"location":"support/known-problems/#common-errors","title":"Common errors","text":""},{"location":"support/known-problems/#illegal-instruction-error","title":"Illegal instruction error","text":"<p>It may happen that code compiled on one machine will not execute on another, throwing an exception like \"Illegal instruction\". 
This is usually because the software was compiled with an instruction set newer than the one available on the node where the software runs, which mostly depends on the processor generation.</p> <p>For example, <code>merlin-l-001</code> and <code>merlin-l-002</code> contain a newer generation of processors than the old GPU nodes or the Merlin5 cluster. Hence, unless the software is compiled to be compatible with the instruction set of the older processors, it will not run on the older nodes. Sometimes this is set correctly by default at compilation time, but sometimes it is not.</p> <p>For GCC, please refer to GCC x86 Options for the relevant compiler options. In case of doubt, contact us.</p>"},{"location":"support/known-problems/#slurm","title":"Slurm","text":""},{"location":"support/known-problems/#sbatch-using-one-core-despite-setting-c-cpus-per-task","title":"sbatch using one core despite setting -c/--cpus-per-task","text":"<p>From Slurm v22.05.6, the behavior of <code>srun</code> has changed. Merlin has been updated to this version since Tuesday 13.12.2022.</p> <p><code>srun</code> will no longer read in <code>SLURM_CPUS_PER_TASK</code>, which is typically set when defining <code>-c/--cpus-per-task</code> in the <code>sbatch</code> command. This means you will have to explicitly specify <code>-c/--cpus-per-task</code> on your <code>srun</code> calls as well, or set the new <code>SRUN_CPUS_PER_TASK</code> environment variable to accomplish the same thing. Therefore, unless this is explicitly specified, <code>srun</code> will use only one core per task (resulting in 2 CPUs per task when multithreading is enabled).</p> <p>An example for setting up <code>srun</code> with <code>-c/--cpus-per-task</code>:</p> Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method1\n#!/bin/bash\n#SBATCH -n 1\n#SBATCH --cpus-per-task=8\n\necho 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'\nsrun python -c \"import os; print(os.sched_getaffinity(0))\"\n\necho 'One has to explicitly specify $SLURM_CPUS_PER_TASK'\necho 'In this example, by setting -c/--cpus-per-task in srun'\nsrun --cpus-per-task=$SLURM_CPUS_PER_TASK python -c \"import os; print(os.sched_getaffinity(0))\"\n\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method1\nSubmitted batch job 8000813\n\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000813.out\nFrom Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK\n{1, 45}\nOne has to explicitly specify $SLURM_CPUS_PER_TASK\nIn this example, by setting -c/--cpus-per-task in srun\n{1, 2, 3, 4, 45, 46, 47, 48}\n</code></pre> <p>An example to accomplish the same thing with the <code>SRUN_CPUS_PER_TASK</code> environment variable:</p> Bash<pre><code>(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# cat mysbatch_method2\n#!/bin/bash\n#SBATCH -n 1\n#SBATCH --cpus-per-task=8\n\necho 'From Slurm v22.05.8 srun does not inherit $SLURM_CPUS_PER_TASK'\nsrun python -c \"import os; print(os.sched_getaffinity(0))\"\n\necho 'One has to explicitly specify $SLURM_CPUS_PER_TASK'\necho 'In this example, by setting an environment variable SRUN_CPUS_PER_TASK'\nexport SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK\nsrun python -c \"import os; print(os.sched_getaffinity(0))\"\n\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# sbatch mysbatch_method2\nSubmitted batch job 8000815\n\n(base) \u2744 [caubet_m@merlin-l-001:/data/user/caubet_m]# cat slurm-8000815.out\nFrom Slurm v22.05.8 srun does not inherit 
$SLURM_CPUS_PER_TASK\n{1, 45}\nOne has to explicitly specify $SLURM_CPUS_PER_TASK\nIn this example, by setting an environment variable SRUN_CPUS_PER_TASK\n{1, 2, 3, 4, 45, 46, 47, 48}\n</code></pre>"},{"location":"support/known-problems/#general-topics","title":"General topics","text":""},{"location":"support/known-problems/#default-shell","title":"Default SHELL","text":"<p>In general, <code>/bin/bash</code> is the recommended default user SHELL when working on Merlin.</p> <p>Some users might notice that BASH is not the default SHELL when logging in to Merlin systems, or they might need to run a different SHELL. This is probably because, when the PSI account was requested, no SHELL was specified or a different one was explicitly requested. Users can check the default SHELL set for their PSI account with the following command:</p> Bash<pre><code>getent passwd $USER | awk -F: '{print $NF}'\n</code></pre> <p>If the SHELL does not correspond to the one you need, you should request a central change, since Merlin accounts are central PSI accounts. Hence, the change must be requested via PSI Service Now.</p> <p>Alternatively, if you work on other PSI Linux systems but need a different SHELL on Merlin, a temporary change can be performed during login startup. You can update one of the following files:</p> <ul> <li><code>~/.login</code></li> <li><code>~/.profile</code></li> <li>Any <code>rc</code> or <code>profile</code> file in your home directory (e.g. <code>.cshrc</code>, <code>.bashrc</code>, <code>.bash_profile</code>, etc.)</li> </ul> <p>with the following lines:</p> Bash<pre><code># Replace MY_SHELL with the shell you need\nMY_SHELL=/bin/bash\nexec $MY_SHELL -l\n</code></pre> <p>Notice that available shells can be found in the following file:</p> Bash<pre><code>cat /etc/shells\n</code></pre>"},{"location":"support/known-problems/#3d-acceleration-opengl-vs-mesa","title":"3D acceleration: OpenGL vs Mesa","text":"<p>Some applications can run with OpenGL support. This is only possible when the node contains a GPU card.</p> <p>In general, X11 with the Mesa driver is the recommended method, as it works in all cases (no GPU needed). For example, for ParaView:</p> Bash<pre><code>module load paraview\nparaview-mesa paraview # 'paraview --mesa' for old releases\n</code></pre> <p>However, if one needs to run with OpenGL support, this is still possible by running <code>vglrun</code>. For example, for running Paraview:</p> Bash<pre><code>module load paraview\nvglrun paraview\n</code></pre> <p>Officially, the supported method for running <code>vglrun</code> is by using the NoMachine remote desktop. Running <code>vglrun</code> is also possible using SSH with X11 forwarding; however, it is very slow and only recommended when running in Slurm (from NoMachine). Please avoid running <code>vglrun</code> over SSH from a desktop or laptop.</p>"},{"location":"support/known-problems/#software","title":"Software","text":""},{"location":"support/known-problems/#ansys","title":"ANSYS","text":"<p>Sometimes, running ANSYS/Fluent requires X11 support. For that, one should run fluent as follows.</p> Bash<pre><code>module load ANSYS\nfluent -driver x11\n</code></pre>"},{"location":"support/known-problems/#paraview","title":"Paraview","text":"<p>Paraview can be run either with Mesa support or with OpenGL support. 
Please refer to OpenGL vs Mesa for further information about how to run it.</p>"},{"location":"support/known-problems/#module-command-not-found","title":"Module command not found","text":"<p>In some circumstances the module command may not be initialized properly. For instance, you may see the following error upon logon:</p> Text Only<pre><code>bash: module: command not found\n</code></pre> <p>The most common cause for this is a custom <code>.bashrc</code> file which fails to source the global <code>/etc/bashrc</code> responsible for setting up PModules in some OS versions. To fix this, add the following to <code>$HOME/.bashrc</code>:</p> Bash<pre><code>if [ -f /etc/bashrc ]; then\n . /etc/bashrc\nfi\n</code></pre> <p>It can also be fixed temporarily in an existing terminal by running <code>. /etc/bashrc</code> manually.</p>"},{"location":"support/troubleshooting/","title":"Troubleshooting","text":""},{"location":"support/troubleshooting/#troubleshooting","title":"Troubleshooting","text":"<p>For troubleshooting, please contact us through the official channels. See here for more information.</p>"},{"location":"support/troubleshooting/#known-problems","title":"Known Problems","text":"<p>Before contacting us for support, please check the Known Problems page to see if there is an existing workaround for your specific problem.</p>"},{"location":"support/troubleshooting/#troubleshooting-slurm-jobs","title":"Troubleshooting Slurm Jobs","text":"<p>If you want to report a problem or request help when running jobs, please always provide the following information:</p> <ol> <li>Provide your batch script or, alternatively, the path to your batch script.</li> <li> <p>Always add the following commands to your batch script</p> Bash<pre><code>echo \"User information:\"; who am i\necho \"Running hostname:\"; hostname\necho \"Current location:\"; pwd\necho \"User environment:\"; env\necho \"List of PModules:\"; module list\n</code></pre> </li> <li> <p>Whenever possible, provide the Slurm JobID.</p> </li> </ol> <p>Providing this information is extremely important to ease debugging; a description of the issue or the error message alone is insufficient in most cases.</p>"},{"location":"support/troubleshooting/#troubleshooting-ssh","title":"Troubleshooting SSH","text":"<p>Use the ssh command with the \"-vvv\" option and copy and paste the output as text (please don't send us screenshots) into your request in Service-Now. Example:</p> Bash<pre><code>ssh -Y -vvv $username@<hostname>\n</code></pre>"},{"location":"news/archive/2026/","title":"2026","text":""},{"location":"news/archive/2026/#2026","title":"2026","text":""},{"location":"news/archive/2024/","title":"2024","text":""},{"location":"news/archive/2024/#2024","title":"2024","text":""},{"location":"news/archive/2019/","title":"2019","text":""},{"location":"news/archive/2019/#2019","title":"2019","text":""}]}