SD 02.09.2019: merlin-l-002 has now NoMachine

2019-09-02 17:42:24 +02:00
parent f85373afc2
commit a7067d5efd
8 changed files with 3 additions and 2 deletions


@@ -0,0 +1,62 @@
---
title: Downtimes
#tags:
#keywords:
last_updated: 28 June 2019
#summary: "Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin6/downtimes.html
---
On the first Monday of each month the Merlin6 cluster might be subject to interruption due to maintenance.
Users will be informed at least one week in advance when a downtime is scheduled for the next month.
Downtimes will be announced through the <merlin-users@lists.psi.ch> mailing list. A detailed description
of the next scheduled interventions will also be available in [Next Scheduled Downtimes](/merlin6/downtimes.html#next-scheduled-downtimes).
---
## Scheduled Downtime Draining Policy

Scheduled downtimes affecting the storage or the Slurm configuration may require draining the nodes.
When this is required, users will be informed accordingly. Two different types of draining are possible (see the sketch after this list):

* **soft drain**: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition.
Jobs already running on the partition continue to run. This will be the **default** drain method.
* **hard drain**: no new jobs may be queued on the partition (job submission requests will be denied with an error message),
but jobs already queued on the partition may be allocated to nodes and run.
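
The two drain types above appear to correspond to the standard Slurm partition states `DOWN` and `DRAIN`. Below is a minimal sketch of how an administrator could apply them with `scontrol`; the mapping and the partition name `daily` are assumptions for illustration, not the documented Merlin6 procedure:

```bash
# Hedged sketch: applying the two drain types via Slurm partition states
# (assumed mapping: soft drain -> DOWN, hard drain -> DRAIN).

# Soft drain: submissions still accepted, queued jobs not started,
# running jobs keep running.
scontrol update PartitionName=daily State=DOWN

# Hard drain: new submissions rejected, already queued jobs may still
# be allocated nodes and run.
scontrol update PartitionName=daily State=DRAIN

# Restore normal scheduling after the downtime.
scontrol update PartitionName=daily State=UP
```
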
Unless otherwise specified, the default draining policy for each partition is the following:

* The **daily** and **general** partitions will be soft drained 24h before the downtime.
* The **hourly** partition will be soft drained 1 hour before the downtime.
* The **gpu** partition will be soft drained 1 hour before the downtime.
Finally, **any remaining running jobs will be killed** by default when the downtime starts. In some rare cases, jobs will
instead be *paused* and *resumed* once the downtime has finished.
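
Since running jobs are killed by default when the downtime starts, it can be worth checking how much walltime your jobs have left before a downtime. A small sketch using standard `squeue` format fields (not a Merlin6-specific command):

```bash
# Show your jobs with their remaining walltime (%L) and expected end time (%e),
# to judge whether they will finish before the downtime starts.
squeue -u "$USER" -o "%.10i %.20j %.10P %.12L %.20e"
```
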
### Draining Policy Summary
The following table summarizes the draining policies during a Scheduled Downtime (SD):

| **Partition** | **Drain Start** | **Default Drain Type** | **Default Job Policy** |
|:---------------:| -----------------:| ----------------------:| --------------------------------:|
| **general** | 24h before the SD | soft drain | Kill running jobs when SD starts |
| **daily** | 24h before the SD | soft drain | Kill running jobs when SD starts |
| **hourly** | 1h before the SD | soft drain | Kill running jobs when SD starts |
| **gpu** | 1h before the SD | soft drain | Kill running jobs when SD starts |
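
Whether a partition is currently drained can be checked directly with standard Slurm tools; a brief sketch (the output columns are generic `sinfo` fields):

```bash
# Show each partition's availability (up/down/drain/inact) and node states.
sinfo -o "%.12P %.6a %.8D %.12T"
```
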
---
## Next Scheduled Downtimes
The table below describes the next Scheduled Downtime:

| Date | Time | Affected Service/s | Description | Status |
|:------------:| --------- | :----------------------------- |:-----------------------------------------------------------------| ------- |
| *02.09.2019* | From 8h   | *Login + computing nodes*      | Upgrade HPC GPFS cluster from v5.0.2-3 to v5.0.3-2.               | Pending |
| *02.09.2019* | From 8h | *Login + computing nodes* | Upgrade to latest RHEL7.6 with Kernel 3.10.0-957.27.2.el7.x86_64 | Pending |
| *02.09.2019* | From 8h | *Login + computing nodes* | Upgrade PModules from 1.0.0rc4 to 1.0.0rc5 | Pending |
| *02.09.2019* | From 8h | *Login + computing nodes* | Upgrade Infiniband drivers from v4.5 to v4.6 | Pending |
* **Note:** The next downtime date is tentative. Users will be informed by e-mail once it is confirmed.
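
After the downtime, the announced upgrades can be spot-checked from a login node. A minimal sketch; `ofed_info` is only available where the Mellanox OFED stack is installed, and the GPFS and PModules versions are not checked here because the exact commands depend on the local installation:

```bash
# Verify the running kernel after the RHEL 7.6 update.
uname -r    # expected: 3.10.0-957.27.2.el7.x86_64

# Verify the Infiniband driver stack version (Mellanox OFED, if installed).
ofed_info -s
```
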