profile::icinga::checks::slurm

This profile should be called when Slurm (server, cn, ui) is being configured.

Sets up specific checks for Slurm nodes:
  • It can check the slurmd service. This check makes sense on computing nodes running slurmd.
  • It can check the munge service. This check makes sense on any node with a configured Slurm client/server.
  • It can check the slurmctld service. This check makes sense on server nodes, where the slurmd check should be disabled.
  • It can check the slurmdbd service. This check makes sense on server nodes.
  • It can check sinfo status, detecting misbehaving nodes.

Since the most common node in a Slurm cluster is a computing node, by default this profile reports on slurmd and munge status.

Parameters

===============  =======  ==========================================================
Name             Type     Default
===============  =======  ==========================================================
skip_munge       Boolean  icinga::checks::options::slurm::skip_munge ``false``
skip_slurmd      Boolean  icinga::checks::options::slurm::skip_slurmd ``false``
check_slurmdbd   Boolean  icinga::checks::options::slurm::check_slurmdbd ``false``
check_slurmctld  Boolean  icinga::checks::options::slurm::check_slurmctld ``false``
ignore_draining  Boolean  icinga::checks::options::slurm::ignore_draining ``false``
ignore_drained   Boolean  icinga::checks::options::slurm::ignore_drained ``false``
check_nodes      Boolean  icinga::checks::options::slurm::check_nodes ``false``
no_reason        Boolean  icinga::checks::options::slurm::no_reason ``true``
no_timestamp     Boolean  icinga::checks::options::slurm::no_timestamp ``false``
===============  =======  ==========================================================

skip_munge

By default the munge service is checked. The check can be disabled by setting icinga::checks::options::slurm::skip_munge: true.

skip_slurmd

By default the slurmd service is checked. The check can be disabled by setting icinga::checks::options::slurm::skip_slurmd: true.

check_slurmdbd

By default the slurmdbd service is not checked. The check can be enabled by setting icinga::checks::options::slurm::check_slurmdbd: true.

check_slurmctld

By default the slurmctld service is not checked. The check can be enabled by setting icinga::checks::options::slurm::check_slurmctld: true. In this case icinga::checks::options::slurm::skip_slurmd should also be set to true.
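
As an illustration, the Hiera data for a Slurm server node could look like the fragment below. It is only a sketch built from the keys documented on this page. Enabling check_slurmdbd assumes that slurmdbd runs on the same host as slurmctld, which may not be true at every site.

```yaml
# Hiera data for a Slurm server node (sketch; adjust to the local layout)
icinga::checks::options::slurm::check_slurmctld: true
# Skip the slurmd check, as advised when check_slurmctld is enabled
icinga::checks::options::slurm::skip_slurmd: true
# Assumption: slurmdbd runs on this same host
icinga::checks::options::slurm::check_slurmdbd: true
```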

ignore_draining

By default nodes in draining state are checked and reported as [WARNING]. This can be changed by setting icinga::checks::options::slurm::ignore_draining: true. When set to true, draining nodes are still reported but are considered [OK].

ignore_drained

By default nodes in drained state are checked and reported as [WARNING]. This can be changed by setting icinga::checks::options::slurm::ignore_drained: true. When set to true, drained nodes are still reported but are considered [OK].
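
A possible Hiera fragment for a site where drained or draining nodes are routine, for example during rolling maintenance, is sketched below. It assumes that check_nodes is the switch that enables the sinfo-based node check. This page lists that parameter but does not describe it, so verify this against the profile code.

```yaml
# Assumption: check_nodes enables the sinfo-based node status check
icinga::checks::options::slurm::check_nodes: true
# Draining/drained nodes are still listed in the output, but count as [OK]
icinga::checks::options::slurm::ignore_draining: true
icinga::checks::options::slurm::ignore_drained: true
```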

no_reason

By default the Reason is not reported, because it adds extra text to the alarm output when nodes are drained/draining/down/failed. Printing the Reason can be turned on by setting icinga::checks::options::slurm::no_reason: false.

no_timestamp

By default the Timestamp is reported. It shows the date and time when a node was set to drained/draining/down/failed. As this setting adds extra text to the alarm output, it can be disabled by setting icinga::checks::options::slurm::no_timestamp: true.
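
The two output options can be combined. The fragment below is a sketch based on the parameter names and defaults in the Parameters table. It would print the Reason for each affected node and leave out the timestamp.

```yaml
# Include the Reason text in the alarm output (no_reason defaults to true)
icinga::checks::options::slurm::no_reason: false
# Omit the timestamp from the alarm output (no_timestamp defaults to false)
icinga::checks::options::slurm::no_timestamp: true
```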

Facts

When specific facts are detected, the profile triggers the corresponding actions.

Fact Value(s) Action description