From 9cfb24828e9a9557ae19d958f33cc397dca004a3 Mon Sep 17 00:00:00 2001
From: caubet_m
Date: Fri, 28 Jun 2019 19:50:36 +0200
Subject: [PATCH] Added Downtimes

---
 _data/sidebars/merlin6_sidebar.yml       |  2 +-
 pages/merlin6/announcements/downtimes.md | 59 ++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 pages/merlin6/announcements/downtimes.md

diff --git a/_data/sidebars/merlin6_sidebar.yml b/_data/sidebars/merlin6_sidebar.yml
index fde7bd6..e55523d 100644
--- a/_data/sidebars/merlin6_sidebar.yml
+++ b/_data/sidebars/merlin6_sidebar.yml
@@ -39,7 +39,7 @@ entries:
       url: /merlin6/slurm-examples.html
   - title: Announcements
     folderitems:
-    - title: Scheduled Downtimes
+    - title: Downtimes
       url: /merlin6/downtimes.html
   - title: Support
     folderitems:
diff --git a/pages/merlin6/announcements/downtimes.md b/pages/merlin6/announcements/downtimes.md
new file mode 100644
index 0000000..aeef106
--- /dev/null
+++ b/pages/merlin6/announcements/downtimes.md
@@ -0,0 +1,59 @@
+---
+title: Downtimes
+#tags:
+#keywords:
+last_updated: 28 June 2019
+#summary: "Merlin 6 cluster overview"
+sidebar: merlin6_sidebar
+permalink: /merlin6/downtimes.html
+---
+
+On the first Monday of each month the Merlin6 cluster might be subject to interruption due to maintenance.
+Users will be informed at least one week in advance when a downtime is scheduled for the next month.
+
+Downtimes will be announced to users through the mailing list. Also, a detailed description
+of the next scheduled interventions will be available in [Next Scheduled Downtimes](#next-scheduled-downtimes).
+
+### Scheduled Downtime Draining Policy
+
+Scheduled downtimes, mostly those affecting the storage and Slurm configurations, may require draining the nodes.
+When this is required, users will be informed accordingly. Two different types of draining are possible:
+
+* **soft drain**: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition.
+Jobs already running on the partition continue to run. This will be the **default** drain method.
+* **hard drain**: no new jobs may be queued on the partition (job submission requests will be denied with an error message),
+but jobs already queued on the partition may be allocated to nodes and run.
+
+Unless explicitly specified, the default draining policy for each partition will be the following:
+
+* The **general** partition will be soft drained on the previous Friday from 8am.
+* The **daily** partition will be soft drained on the previous day from 8am.
+* The **hourly** partition will be soft drained on the same Monday from 7am.
+
+Finally, **remaining running jobs will be killed** by default when the downtime starts. In some specific rare cases jobs will be
+just *paused* and *resumed* when the downtime finishes.
+
+#### Draining Policy Summary
+
+The following table contains a summary of the draining policies during a Scheduled Downtime (SD):
+
+| **Partition** | **Drain Policy**  | **Default Drain Type** | **Default Job Policy**           |
+| ------------- | -----------------:| ---------------------- | -------------------------------- |
+| **general**   | 72h before the SD | soft drain             | Kill running jobs when SD starts |
+| **daily**     | 24h before the SD | soft drain             | Kill running jobs when SD starts |
+| **hourly**    | 1h before the SD  | soft drain             | Kill running jobs when SD starts |
+
+---
+
+## Next Scheduled Downtimes
+
+The table below describes the next scheduled downtimes:
+
+| Date         | Affected Service/s           | Description                                                            |
+| ------------ |:---------------------------- |:---------------------------------------------------------------------- |
+| *06.01.2020* | *Login Nodes*                | *This is an example #1, here a short description will be written*      |
+| *06.01.2020* | *Computing Nodes*            | *This is an example #1, here a short description will be written*      |
+| ------------ | ---------------------------- | ---------------------------------------------------------------------- |
+| *03.02.2020* | *Login Nodes*                | *This is an example #2, here a short description will be written*      |
+| *03.02.2020* | *Storage*                    | *This is an example #2, here a short description will be written*      |
+| *03.02.2020* | *Computing Nodes*            | *This is an example #2, here a short description will be written*      |
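
Note on the draining policy added by this patch: the per-partition drain times follow from the "first Monday of each month" rule and the offsets in the summary table (72h, 24h, 1h). The sketch below is only a reading aid and is not part of the patch. It assumes the downtime starts at 08:00 on the Monday, which is what the "same Monday from 7am" / "1h before the SD" pairing suggests but the page never states. The names `first_monday`, `drain_schedule`, `DRAIN_HOURS_BEFORE` and `SD_START_TIME` are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Hours before the Scheduled Downtime (SD) at which each partition is
# soft drained, copied from the Draining Policy Summary table.
DRAIN_HOURS_BEFORE = {"general": 72, "daily": 24, "hourly": 1}

# Assumption (not stated on the page): the SD starts at 08:00 on the Monday.
SD_START_TIME = time(8, 0)


def first_monday(year: int, month: int) -> date:
    """Return the date of the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday() is 0 for Monday, so this is the gap to the next Monday.
    return first + timedelta(days=(7 - first.weekday()) % 7)


def drain_schedule(year: int, month: int) -> dict:
    """Map each partition to the moment its soft drain would begin."""
    sd_start = datetime.combine(first_monday(year, month), SD_START_TIME)
    return {
        partition: sd_start - timedelta(hours=hours)
        for partition, hours in DRAIN_HOURS_BEFORE.items()
    }


if __name__ == "__main__":
    for partition, start in drain_schedule(2020, 1).items():
        print(f"{partition:8s} drained from {start:%a %d.%m.%Y %H:%M}")
```

For the example date 06.01.2020 in the table, this puts the **general** drain on Friday 03.01.2020 at 08:00, matching "the previous Friday from 8am".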