Doc changes
61
pages/merlin6/98-announcements/downtimes.md
Normal file
@@ -0,0 +1,61 @@
---
title: Downtimes
#tags:
#keywords:
last_updated: 28 June 2019
#summary: "Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin6/downtimes.html
---

On the first Monday of each month, the Merlin6 cluster might be subject to interruption due to maintenance.
Users will be informed at least one week in advance when a downtime is scheduled for the next month.

Downtimes will be announced to users through the <merlin-users@lists.psi.ch> mailing list. A detailed description
of the next scheduled interventions is also available in [Next Scheduled Downtimes](/merlin6/downtimes.html#next-scheduled-downtimes).
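
The maintenance window described above can be worked out programmatically. The following is a minimal Python sketch, not an official tool: the function names `first_monday` and `announcement_deadline` are hypothetical, and the deadline simply assumes that "at least one week in advance" means seven days before the downtime date.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month (the potential downtime day)."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the number of days
    # from the 1st of the month to the first Monday (0 if the 1st is a Monday).
    offset = (7 - first.weekday()) % 7
    return first + timedelta(days=offset)


def announcement_deadline(downtime: date) -> date:
    """Latest announcement date, assuming a notice period of exactly 7 days."""
    return downtime - timedelta(days=7)


if __name__ == "__main__":
    downtime = first_monday(2020, 8)
    print("Potential downtime:", downtime.isoformat())
    print("Announce no later than:", announcement_deadline(downtime).isoformat())
```

Whether a downtime actually takes place in a given month is still decided and announced by the administrators; the sketch only identifies the candidate date.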

---

## Scheduled Downtime Draining Policy

Scheduled downtimes, mostly those affecting the storage and Slurm configurations, may require draining the nodes.
When this is required, users will be informed accordingly. Two different types of draining are possible:

* **soft drain**: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition.
  Jobs already running on the partition continue to run. This will be the **default** drain method.
* **hard drain**: no new jobs may be queued on the partition (job submission requests will be denied with an error message),
  but jobs already queued on the partition may be allocated to nodes and run.
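
The difference between the two drain types can be restated as a small model. This is only an illustrative Python sketch of the rules listed above; `DrainType`, `can_submit` and `can_start_queued` are hypothetical names and are not part of Slurm or of any Merlin tooling.

```python
from enum import Enum


class DrainType(Enum):
    SOFT = "soft drain"
    HARD = "hard drain"


def can_submit(drain: DrainType) -> bool:
    """New jobs may still be queued only while the partition is soft drained."""
    return drain is DrainType.SOFT


def can_start_queued(drain: DrainType) -> bool:
    """Jobs already in the queue may be allocated nodes only under a hard drain."""
    return drain is DrainType.HARD


if __name__ == "__main__":
    for drain in DrainType:
        print(f"{drain.value}: submit={can_submit(drain)}, "
              f"start queued={can_start_queued(drain)}")
```

In short, a soft drain keeps the queue open but stops new job starts, while a hard drain closes the queue and lets already queued jobs start.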

Unless explicitly specified, the default draining policy for each partition will be the following:

* The **daily** and **general** partitions will be soft drained 12 hours before the downtime.
* The **hourly** partition will be soft drained 1 hour before the downtime.
* The **gpu** and **gpu-short** partitions will be soft drained 1 hour before the downtime.

Finally, **remaining running jobs will be killed** by default when the downtime starts. In some rare cases, jobs will
just be *paused* and then *resumed* once the downtime has finished.

### Draining Policy Summary

The following table summarizes the draining policies during a Scheduled Downtime (SD):

| **Partition** | **Drain Policy**  | **Default Drain Type** | **Default Job Policy**           |
|:-------------:| -----------------:| ----------------------:| --------------------------------:|
| **general**   | 12h before the SD | soft drain             | Kill running jobs when SD starts |
| **daily**     | 12h before the SD | soft drain             | Kill running jobs when SD starts |
| **hourly**    | 1h before the SD  | soft drain             | Kill running jobs when SD starts |
| **gpu**       | 1h before the SD  | soft drain             | Kill running jobs when SD starts |
| **gpu-short** | 1h before the SD  | soft drain             | Kill running jobs when SD starts |
| **gfa-asa**   | 1h before the SD  | soft drain             | Kill running jobs when SD starts |
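
To see what the default policy means for a concrete downtime, the lead times from the table can be turned into absolute times. The following Python sketch is a rough illustration under stated assumptions: the dictionary `DRAIN_LEAD_HOURS` and the functions `drain_start` and `will_be_killed` are hypothetical names, the lead times are copied from the table above, and the default "kill running jobs" policy is assumed to apply.

```python
from datetime import datetime, timedelta

# Hours before the Scheduled Downtime (SD) at which each partition is
# soft drained by default, as listed in the summary table.
DRAIN_LEAD_HOURS = {
    "general": 12,
    "daily": 12,
    "hourly": 1,
    "gpu": 1,
    "gpu-short": 1,
    "gfa-asa": 1,
}


def drain_start(partition: str, downtime_start: datetime) -> datetime:
    """Time from which queued jobs on the partition are no longer started."""
    return downtime_start - timedelta(hours=DRAIN_LEAD_HOURS[partition])


def will_be_killed(expected_end: datetime, downtime_start: datetime) -> bool:
    """A job still running when the SD starts is killed under the default policy."""
    return expected_end > downtime_start


if __name__ == "__main__":
    sd_start = datetime(2020, 8, 3, 8, 0)  # example downtime start
    for name in DRAIN_LEAD_HOURS:
        print(f"{name}: soft drain from {drain_start(name, sd_start)}")
```

A job should therefore be submitted early enough to start before the drain time of its partition and to finish before the downtime begins.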

---

## Next Scheduled Downtimes

The table below describes the next Scheduled Downtime:

| From             | To               | Service      | Description |
| ---------------- | ---------------- |:------------:|:----------- |
| 05.09.2020 8am   | 05.09.2020 6pm   | <pending>    | <pending>   |

* **Note**: An e-mail will be sent when the services are fully available.
38
pages/merlin6/98-announcements/past-downtimes.md
Normal file
@@ -0,0 +1,38 @@
---
title: Past Downtimes
#tags:
#keywords:
last_updated: 03 September 2019
#summary: "Merlin 6 cluster overview"
sidebar: merlin6_sidebar
permalink: /merlin6/past-downtimes.html
---

## Past Downtimes: Log Changes

### 2020

| From             | To               | Service      | Clusters        | Description                                                   | Exceptions |
| ---------------- | ---------------- |:------------:|:---------------:|:--------------------------------------------------------------|:----------:|
| 03.08.2020 8am   | 03.08.2020 6pm   | Archive      | merlin6         | Replace old merlin-export-01 with merlin-export-02            |            |
| 03.08.2020 8am   | 03.08.2020 6pm   | RemoteAccess | merlin6         | ra-merlin-0[1,2] Remount merlin-export-02                     |            |
| 06.07.2020       | 06.07.2020       | All services | merlin5,merlin6 | GPFS v5.0.4-4,OFED v5.0,YFS v0.195,RHEL7.7,Slurm v19.05.7,f/w |            |
| 04.05.2020       | 04.05.2020       | Login nodes  | merlin6         | Outage. YFS (AFS) update v0.194 and reboot                    |            |
| 04.05.2020       | 04.05.2020       | CN           | merlin5         | Outage. O.S. update, OFED drivers update, YFS (AFS) update.   |            |
| 03.02.2020 9am   | 03.02.2020 10am  | Slurm        | merlin5,merlin6 | Upgrading config [HPCLOCAL-321](https://jira.psi.ch/browse/HPCLOCAL-321) | |
| 10.01.2020 9am   | 10.01.2020 6pm   | All Services | merlin5,merlin6 | Slurm v18->v19, IB Connected Mode, other. [HPCLOCAL-300](https://jira.psi.ch/browse/HPCLOCAL-300) | |

## Older downtimes

| From             | To               | Service      | Clusters        | Description                                  | Exceptions                             |
| ---------------- | ---------------- |:------------:|:---------------:|:---------------------------------------------|:--------------------------------------:|
| 02.09.2019       | 02.09.2019       | GPFS         | merlin5,merlin6 | v5.0.2-3 -> v5.0.3-2                         |                                        |
| 02.09.2019       | 02.09.2019       | O.S.         | merlin5         | RHEL7.4 (rhel-7.4) -> RHEL7.6 (prod-00048)   | merlin-g-40, still running RHEL7.4\*   |
| 02.09.2019       | 02.09.2019       | O.S.         | merlin6         | RHEL7.6 (prod-00030) -> RHEL7.6 (prod-00048) |                                        |
| 02.09.2019       | 02.09.2019       | Infiniband   | merlin5         | OFED v4.4 -> v4.6                            | merlin-g-40, still running OFED v4.4\* |
| 02.09.2019       | 02.09.2019       | Infiniband   | merlin6         | OFED v4.5 -> v4.6                            |                                        |
| 02.09.2019       | 02.09.2019       | PModules     | merlin5,merlin6 | PModules v1.0.0rc4 -> v1.0.0rc5              |                                        |
| 02.09.2019       | 02.09.2019       | AFS(YFS)     | merlin5         | OpenAFS v1.6.22.2-236 -> YFS v188            | merlin-g-40, still running OpenAFS\*   |
| 02.09.2019       | 02.09.2019       | AFS(YFS)     | merlin6         | YFS v186 -> YFS v188                         |                                        |
| 02.09.2019       | 02.09.2019       | O.S.         | merlin5         | RHEL7.4 -> RHEL7.6 (prod-00048)              |                                        |
| 02.09.2019       | 02.09.2019       | Slurm        | merlin5,merlin6 | Slurm v18.08.6 -> v18.08.8                   |                                        |