Merge branch 'icinga2' into 'master'

Icinga2

See merge request linux-infra/documentation!12
2024-01-25 17:04:55 +01:00
2 changed files with 73 additions and 7 deletions


@@ -12,25 +12,91 @@ Enable monitoring with Icinga2 by
```
icinga2::enable: true
```
(which will be default at some point, e.g. for RHEL9).
(which is `false` by default for RHEL7 and RHEL8, but `true` for RHEL9 and later).
This only runs a ping test to check whether the host is reachable on the network. For further checks on the host itself, the agent needs to be started:
By default no alerts are generated. If you wish different, set
```
icinga2::agent::enable: true
```
(here too, it is `false` by default for RHEL7 and RHEL8, but `true` for RHEL9 and later).
Still, no alerts are generated, or rather they are suppressed by a global, infinite service window. To change this, set
```
icinga2::alerting::enable: true
```
## Icinga2 Agent
By default, these alerts are now sent to the admins during office hours. To fine-tune notifications further, check out the chapters [Notifications](#Notifications) and [Check Customization](#Check Customization).
The Icinga2 Agent can be enabled with
## Web Access
Users and groups in `aaa::admins` and `icinga2::web::users` will have access to these nodes on [monitoring.psi.ch](https://monitoring.psi.ch).
Prefix the group name with a `%` to distinguish them from users.
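As a rough sketch, granting web access to one additional user and one group could look like the following. It assumes that `icinga2::web::users` is a plain Hiera list; `jdoe` and `my-beamline-group` are hypothetical names used only for illustration.

```yaml
icinga2::web::users:
  - 'jdoe'                # hypothetical user, by login name
  - '%my-beamline-group'  # hypothetical group, marked with the % prefix
```

The entries are quoted because YAML does not allow an unquoted value to start with `%`.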
## Notifications
### Notification Recipients
By default, notifications are sent to all admins, i.e. the users and groups listed in Hiera at `aaa::admins`, with the exception of the default admins from `common.yaml` and the group `unx-lx_support`. If the admins should not be notified, disable the sending of messages with
```
icinga2::agent::enable: true
icinga2::alerting::notify_admins: false
```
In addition to, or instead of, the admins, you can list notification recipients in the Hiera list `icinga2::alerting::contacts`. You can list
- AD users by login name
- AD groups with `%` as prefix to their name
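A minimal sketch of the "instead of the admins" case, combining the two keys described in this section. The recipient names `jdoe` and `my-oncall-group` are hypothetical placeholders.

```yaml
# Do not notify the admins from aaa::admins ...
icinga2::alerting::notify_admins: false

# ... and send notifications to these contacts instead
icinga2::alerting::contacts:
  - 'jdoe'              # hypothetical AD user, by login name
  - '%my-oncall-group'  # hypothetical AD group, marked with the % prefix
```

Leaving out the `notify_admins` line should, by the description above, notify the listed contacts in addition to the admins.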
### Notification Time Restrictions
By default, notifications for warnings and alerts are sent out during office hours, i.e. from Monday to Friday, 08:00 - 17:00.
This can be configured in Hiera with the `icinga2::alerting::severity` key, which is `4` by default. The following options are possible:
| node severity | media | time |
|---------------|------------------|--------------|
| `1` | SMS and e-mail | 24x7 |
| `2` | e-mail | 24x7 |
| `3` | e-mail | office hours |
| `4` | e-mail | office hours |
| `5` | no notifications | never |
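For example, going by the table above, a node whose problems should be reported by e-mail around the clock, but without SMS, would be set to severity `2`:

```yaml
icinga2::alerting::severity: 2
```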
Please note that for services where the `criticality` variable is set, the time when notifications are sent out is also restricted:
| service criticality | time |
|---------------------|--------------|
| - | 24x7 |
| `A` | 24x7 |
| `B` | office hours |
| `C` | never |
The more restrictive of the two settings applies, e.g. a service with criticality `C` will never cause a notification, regardless of the node severity.
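To illustrate how the two tables interact, here is a hedged sketch that combines a node severity with a per-service criticality. It assumes that `criticality` can be set for a service through `icinga2::service_check::customize`, and the choice of the `CPU Usage` service is only an example.

```yaml
# Node level: notifications allowed 24x7
icinga2::alerting::severity: 1

# Service level: restrict this one service to office hours
icinga2::service_check::customize:
  'CPU Usage':
    criticality: B
```

Going by the tables, `CPU Usage` problems on this node would then be notified only during office hours, while services with no criticality or with criticality `A` would still notify 24x7.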
## Default Checks
By default we already run a comprehensive set of checks. Some of them can be fine-tuned in Hiera.
Whenever you have a use case which is not covered yet, please talk to us.
## Check Customization
Most checks can have custom parameters. The variables you can adapt are listed as "Custom Variables" on the page of the given service. In Hiera, add them below the key `icinga2::service_check::customize` as a multi-level hash: first the service name, and below it the variable names with their new values.
Let's look at the example of the `CPU Usage` service:
!["CPU Usage" service page](icinga2/service_custom_variables.png)
If the machine is a number cruncher and it is fine for the CPU to be fully utilized, you can have the check ignore the load by setting it to always OK:
```
icinga2::service_check::customize:
  'CPU Usage':
    cpu_usage_always_ok: true
```
If, on the contrary, you want an immediate notification when the CPU is overused, the following snippet is more advisable:
```
icinga2::service_check::customize:
  'CPU Usage':
    criticality: A
```
If it is a Linuxfabrik plugin, you will find a link under "Notes" that points to the documentation of the check. This might shed more light on the effect of these variables.

Binary file not shown.
