profile::icinga::checks::slurm
This profile should be called when Slurm (server, cn, ui) is being configured.
- Sets up specific checks for Slurm nodes:
  - It can check the slurmd service. This check makes sense on computing nodes running slurmd.
  - It can check the munge service. This check makes sense on any node with a configured Slurm client/server.
  - It can check the slurmctld service. This check makes sense on server nodes, will disable the slurmd check.
  - It can check the slurmdbd service. This check makes sense on server nodes.
  - It can check sinfo status, detecting misbehaving nodes.
By default, since the most common node in a Slurm
cluster is a computing node, this profile reports the
status of slurmd and munge.
Parameters
| Name | Type | Hiera key | Default |
|------|------|-----------|---------|
| skip_munge | Boolean | icinga::checks::options::slurm::skip_munge | ``false`` |
| skip_slurmd | Boolean | icinga::checks::options::slurm::skip_slurmd | ``false`` |
| check_slurmdbd | Boolean | icinga::checks::options::slurm::check_slurmdbd | ``false`` |
| check_slurmctld | Boolean | icinga::checks::options::slurm::check_slurmctld | ``false`` |
| ignore_draining | Boolean | icinga::checks::options::slurm::ignore_draining | ``false`` |
| ignore_drained | Boolean | icinga::checks::options::slurm::ignore_drained | ``false`` |
| check_nodes | Boolean | icinga::checks::options::slurm::check_nodes | ``false`` |
| no_reason | Boolean | icinga::checks::options::slurm::no_reason | ``true`` |
| no_timestamp | Boolean | icinga::checks::options::slurm::no_timestamp | ``false`` |
skip_munge
By default the munge service is checked. The check can be disabled by
setting
icinga::checks::options::slurm::skip_munge: true.
skip_slurmd
By default the slurmd service is checked. The check can be disabled by
setting
icinga::checks::options::slurm::skip_slurmd: true.
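
As an illustration of the two skip options, the following Hiera sketch covers a node that has a Slurm client configured but does not run slurmd (for example a UI node). The file name in the comment is hypothetical, and whether a given UI node really lacks slurmd is an assumption to verify per site.

```yaml
# Hypothetical node-level Hiera file, e.g. data/nodes/ui01.example.org.yaml
# Assumption: this node does not run slurmd, so only munge is checked.
icinga::checks::options::slurm::skip_slurmd: true
# munge is still checked because skip_munge keeps its default (false).
icinga::checks::options::slurm::skip_munge: false
```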
check_slurmdbd
By default the slurmdbd service is not checked. The check can be enabled
by setting
icinga::checks::options::slurm::check_slurmdbd: true.
check_slurmctld
By default the slurmctld service is not checked. The check can be enabled
by setting
icinga::checks::options::slurm::check_slurmctld: true. In
this case icinga::checks::options::slurm::skip_slurmd
should also be set to true.
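
A minimal Hiera sketch for a Slurm server node, combining the settings described above. The file name is hypothetical, and enabling the slurmdbd check assumes that slurmdbd runs on the same server.

```yaml
# Hypothetical node-level Hiera file, e.g. data/nodes/slurm-server.example.org.yaml
# Check the controller daemon.
icinga::checks::options::slurm::check_slurmctld: true
# Skip slurmd, as recommended when check_slurmctld is enabled.
icinga::checks::options::slurm::skip_slurmd: true
# Assumption: slurmdbd also runs on this server.
icinga::checks::options::slurm::check_slurmdbd: true
```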
ignore_draining
By default nodes in draining state are checked and
reported as [WARNING]. This can be disabled by setting
icinga::checks::options::slurm::ignore_draining: true. When
set to true, draining nodes are still reported
but are considered [OK].
ignore_drained
By default nodes in drained state are checked and
reported as [WARNING]. This can be disabled by setting
icinga::checks::options::slurm::ignore_drained: true. When
set to true, drained nodes are still reported
but are considered [OK].
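
For a cluster where nodes are routinely drained for maintenance, both options could be combined as sketched below. The file name is hypothetical. The check_nodes line is an assumption: the parameter appears in the table above without a description, and it is presumed here to enable the sinfo node-state check that these two options affect.

```yaml
# Hypothetical Hiera file, e.g. data/common.yaml
# Assumption: check_nodes enables the sinfo node-state check.
icinga::checks::options::slurm::check_nodes: true
# Draining and drained nodes are still listed, but counted as [OK].
icinga::checks::options::slurm::ignore_draining: true
icinga::checks::options::slurm::ignore_drained: true
```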
no_reason
By default no Reason is reported, because it adds
extra text to the alarm output when nodes are
drained/draining/down/failed.
Printing the Reason can be turned on by setting
icinga::checks::options::slurm::no_reason: false.
no_timestamp
By default the Timestamp is reported. The Timestamp shows
the date and time when a node was set to
drained/draining/down/failed.
As this setting adds extra text to the alarm, it can be disabled by
setting
icinga::checks::options::slurm::no_timestamp: true.
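
The two output options can be set independently. As a sketch, the following Hiera data (hypothetical file name) produces the most verbose alarm text, printing both the Reason and the Timestamp:

```yaml
# Hypothetical Hiera file, e.g. data/common.yaml
# Print the Reason (off by default, since no_reason defaults to true).
icinga::checks::options::slurm::no_reason: false
# Keep the Timestamp (this is already the default, shown for clarity).
icinga::checks::options::slurm::no_timestamp: false
```

Setting both keys to true instead gives the shortest alarm output.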
Facts
When specific facts are detected, the profile triggers the corresponding actions.
| Fact | Value(s) | Action description |