Added Downtimes
parent 6bc532d4f7
commit 9cfb24828e
@ -39,7 +39,7 @@ entries:
  url: /merlin6/slurm-examples.html
  - title: Announcements
  folderitems:
- - title: Scheduled Downtimes
+ - title: Downtimes
  url: /merlin6/downtimes.html
  - title: Support
  folderitems:
59
pages/merlin6/announcements/downtimes.md
Normal file
@ -0,0 +1,59 @@
---
title: Downtimes
#tags:
#keywords:
last_updated: 28 June 2019
#summary: "Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin6/downtimes.html
---

On the first Monday of each month, the Merlin6 cluster may be interrupted for maintenance.
Users will be informed at least one week in advance when a downtime is scheduled for the following month.

Downtimes will be announced to users through the <merlin-users@lists.psi.ch> mailing list. In addition, a detailed description
of the next scheduled interventions will be available in [Next Scheduled Downtimes](#next-scheduled-downtimes).
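
The "first Monday of each month" rule and the one-week notice period can be worked out programmatically. The following is a minimal sketch using only the Python standard library; the function names `first_monday` and `latest_announcement` are hypothetical and are not part of any Merlin6 tooling.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month (the potential downtime day)."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the number of days
    # to move forward from the 1st to reach a Monday (0 if the 1st is one).
    offset = (7 - first.weekday()) % 7
    return first + timedelta(days=offset)


def latest_announcement(downtime: date) -> date:
    """Return the last day on which a one-week advance notice is still met."""
    return downtime - timedelta(days=7)


downtime = first_monday(2020, 1)
print(downtime)                       # 2020-01-06
print(latest_announcement(downtime))  # 2019-12-30
```

The example dates used further down this page (06.01.2020 and 03.02.2020) are both first Mondays under this calculation.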

### Scheduled Downtime Draining Policy

Scheduled downtimes, mostly those affecting the storage or the Slurm configuration, may require draining the nodes.
When this is required, users will be informed accordingly. Two different types of draining are possible:

* **soft drain**: new jobs may still be queued on the partition, but queued jobs will not be allocated nodes or started.
  Jobs already running on the partition continue to run. This will be the **default** drain method.
* **hard drain**: no new jobs may be queued on the partition (job submission requests will be denied with an error message),
  but jobs already queued on the partition may still be allocated nodes and run.
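
The two drain types above read very much like Slurm's partition states `DOWN` (soft drain) and `DRAIN` (hard drain) as described in the `scontrol` documentation. Assuming that mapping holds for Merlin6, which this page does not state, an administrator-side command could be assembled as in the sketch below. The helper `drain_command` and the `DRAIN_STATE` table are hypothetical; the code only builds the command string and does not execute it.

```python
# Assumed mapping from the drain types described above to Slurm partition states.
# This is an inference from the wording, not something the page confirms.
DRAIN_STATE = {
    "soft": "DOWN",   # new jobs queue, but nothing new starts
    "hard": "DRAIN",  # submissions rejected, queued jobs may still start
}


def drain_command(partition: str, drain_type: str = "soft") -> str:
    """Build (but do not run) the scontrol command for the given drain type."""
    if drain_type not in DRAIN_STATE:
        raise ValueError(f"unknown drain type: {drain_type!r}")
    state = DRAIN_STATE[drain_type]
    return f"scontrol update PartitionName={partition} State={state}"


print(drain_command("general"))
print(drain_command("hourly", "hard"))
```

Soft drain is the default argument here only because the text names it as the default drain method.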

Unless stated otherwise, the default draining policy for each partition is as follows:

* The **general** partition will be soft drained from 8am on the preceding Friday.
* The **daily** partition will be soft drained from 8am on the preceding day.
* The **hourly** partition will be soft drained from 7am on the Monday of the downtime.

Finally, **any jobs still running will be killed** by default when the downtime starts. In rare cases, jobs will instead be
*paused* and then *resumed* once the downtime has finished.

#### Draining Policy Summary

The following table summarizes the draining policies for a Scheduled Downtime (SD):

| **Partition** | **Drain Policy** | **Default Drain Type** | **Default Job Policy** |
| ------------- | -----------------:| ---------------------- | -------------------------------- |
| **general** | 72h before the SD | soft drain | Kill running jobs when SD starts |
| **daily** | 24h before the SD | soft drain | Kill running jobs when SD starts |
| **hourly** | 1h before the SD | soft drain | Kill running jobs when SD starts |
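
As a worked example of the offsets in the summary table, the sketch below computes when each partition would start draining for a given downtime. It assumes the downtime starts at 08:00 on the Monday, which is what the 72h/24h/1h offsets imply when compared with the "Friday 8am" and "Monday 7am" times above, but which this page does not state explicitly. The names `DRAIN_LEAD` and `drain_schedule` are hypothetical.

```python
from datetime import datetime, timedelta

# Lead times before the Scheduled Downtime (SD), taken from the summary table.
DRAIN_LEAD = {
    "general": timedelta(hours=72),
    "daily": timedelta(hours=24),
    "hourly": timedelta(hours=1),
}


def drain_schedule(downtime_start: datetime) -> dict:
    """Map each partition name to the time its soft drain would begin."""
    return {name: downtime_start - lead for name, lead in DRAIN_LEAD.items()}


# Assumption: the downtime begins at 08:00 on Monday, 6 January 2020.
schedule = drain_schedule(datetime(2020, 1, 6, 8, 0))
for partition, start in schedule.items():
    print(f"{partition:8s} drains from {start:%A %d.%m.%Y %H:%M}")
```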

---

## Next Scheduled Downtimes

The table below describes the next scheduled downtimes:

| Date | Affected Service(s) | Description |
| ------------ |:---------------------------- |:---------------------------------------------------------------------- |
| *06.01.2020* | *Login Nodes* | *This is an example #1, here a short description will be written* |
| *06.01.2020* | *Computing Nodes* | *This is an example #1, here a short description will be written* |
| ------------ | ---------------------------- | ---------------------------------------------------------------------- |
| *03.02.2020* | *Login Nodes* | *This is an example #2, here a short description will be written* |
| *03.02.2020* | *Storage* | *This is an example #2, here a short description will be written* |
| *03.02.2020* | *Computing Nodes* | *This is an example #2, here a short description will be written* |