gitea-pages/pages/merlin6/05 announcements/downtimes.md
Spencer Bliven 95f511a203 Reorganize merlin6 pages to follow navigation menu
The folders are only used for source organization; URLs remain flat.
2019-07-29 15:18:22 +02:00

| title | last_updated | sidebar | permalink |
|---|---|---|---|
| Downtimes | 28 June 2019 | merlin6_sidebar | /merlin6/downtimes.html |

On the first Monday of each month, the Merlin6 cluster may be interrupted for maintenance. Users will be informed at least one week in advance when a downtime is scheduled for the next month.
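
As a small illustration of the schedule above, the date of a month's potential maintenance Monday can be computed with the Python standard library. This is only a sketch: the function name `first_monday` is made up for this example, and an actual downtime takes place only when it is announced.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the date of the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the number of days
    # from the 1st to the next Monday (0 if the 1st is itself a Monday).
    days_until_monday = (0 - first.weekday()) % 7
    return first + timedelta(days=days_until_monday)


if __name__ == "__main__":
    print(first_monday(2019, 8))
```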

Downtimes will be announced through the merlin-users@lists.psi.ch mailing list. A detailed description of the next scheduled interventions is also available under Next Scheduled Downtimes.


Scheduled Downtime Draining Policy

Scheduled downtimes, mostly those affecting the storage and Slurm configurations, may require draining the nodes. When this is required, users will be informed accordingly. Two different types of draining are possible:

  • soft drain: new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition. Jobs already running on the partition continue to run. This will be the default drain method.
  • hard drain: no new jobs may be queued on the partition (job submission requests will be denied with an error message), but jobs already queued on the partition may be allocated to nodes and run.
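
The difference between the two drain types can be modelled as two independent switches: whether new submissions are accepted, and whether jobs that are already queued may start. The sketch below encodes only what the definitions above state. The class and field names are made up for this example, and the `slurm_state` field is an assumption: the two definitions read like the `DOWN` and `DRAIN` partition states in the Slurm documentation, but this page does not say which Slurm states are actually used on Merlin6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrainType:
    accepts_new_submissions: bool  # may users still queue new jobs?
    starts_queued_jobs: bool       # may already queued jobs be allocated nodes?
    slurm_state: str               # ASSUMPTION: closest matching Slurm partition state


# Soft drain: jobs can be queued, but queued jobs do not start.
SOFT_DRAIN = DrainType(accepts_new_submissions=True,
                       starts_queued_jobs=False,
                       slurm_state="DOWN")

# Hard drain: submissions are rejected, but already queued jobs may still start.
HARD_DRAIN = DrainType(accepts_new_submissions=False,
                       starts_queued_jobs=True,
                       slurm_state="DRAIN")


def describe(drain: DrainType) -> str:
    """Summarise what a user can expect on a drained partition."""
    submit = "accepted" if drain.accepts_new_submissions else "rejected"
    start = "may start" if drain.starts_queued_jobs else "stay pending"
    return f"new submissions are {submit}; queued jobs {start}"


if __name__ == "__main__":
    print("soft drain:", describe(SOFT_DRAIN))
    print("hard drain:", describe(HARD_DRAIN))
```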

Unless explicitly specified, the default draining policy for each partition will be the following:

  • The general partition will be soft drained on the previous Friday from 8am.
  • The daily partition will be soft drained on the previous day from 8am.
  • The hourly partition will be soft drained on the same Monday from 7am.
  • The gpu partition will be soft drained on the same Monday from 7am.

Finally, any jobs still running will be killed by default when the downtime starts. In rare cases, jobs will instead be paused and then resumed once the downtime has finished.

Draining Policy Summary

The following table summarizes the draining policies applied before a Scheduled Downtime (SD):

| Partition | Drain Policy | Default Drain Type | Default Job Policy |
|---|---|---|---|
| general | 72h before the SD | soft drain | Kill running jobs when SD starts |
| daily | 24h before the SD | soft drain | Kill running jobs when SD starts |
| hourly | 1h before the SD | soft drain | Kill running jobs when SD starts |
| gpu | 72h before the SD | soft drain | Kill running jobs when SD starts |
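
To see when each partition would stop starting queued jobs, the lead times from the summary table can be subtracted from the start of the downtime. The following is a minimal sketch, not an official tool: the names `DRAIN_LEAD_TIME` and `drain_start` are invented for this example, the lead times are copied from the summary table, and the downtime start of Monday 5 August 2019 at 08:00 is a hypothetical value chosen only for illustration.

```python
from datetime import datetime, timedelta

# Lead times copied from the summary table (time before the SD at which
# the soft drain of each partition begins).
DRAIN_LEAD_TIME = {
    "general": timedelta(hours=72),
    "daily": timedelta(hours=24),
    "hourly": timedelta(hours=1),
    "gpu": timedelta(hours=72),
}


def drain_start(partition: str, downtime_start: datetime) -> datetime:
    """Return the moment the given partition starts draining."""
    return downtime_start - DRAIN_LEAD_TIME[partition]


if __name__ == "__main__":
    # ASSUMPTION: hypothetical downtime start, not an announced date.
    sd_start = datetime(2019, 8, 5, 8, 0)
    for name in DRAIN_LEAD_TIME:
        print(f"{name:8s} drains from {drain_start(name, sd_start):%a %d.%m.%Y %H:%M}")
```

A job that is still pending when its partition starts draining will not start until the downtime is over, so it is worth submitting before the computed time and with a time limit that ends before the downtime begins.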

Next Scheduled Downtimes

The table below describes the next Scheduled Downtime:

| Date | Time | Affected Service/s | Description | Status |
|---|---|---|---|---|
| 02.08.2019 | From 16h | Merlin Cluster, all services | SD for central PSI IT Services affecting Merlin cluster. | Pending |
| 03.08.2019 | All day* | Merlin Cluster, all services | SD for central PSI IT Services affecting Merlin cluster. | Pending |
| 04.08.2019 | Until 9h | Merlin Cluster, all services | SD for central PSI IT Services affecting Merlin cluster. | Pending |
* Note: We will try to make the cluster available as early as Sunday, on a best-effort basis. An email will be sent to the mailing list.