diff --git a/admin-guide/configuration/icinga2.md b/admin-guide/configuration/icinga2.md
index 8bac20f8..a2d2818a 100644
--- a/admin-guide/configuration/icinga2.md
+++ b/admin-guide/configuration/icinga2.md
@@ -12,25 +12,91 @@ Enable monitoring with Icinga2 by
 ```
 icinga2::enable: true
 ```
-(which will be default at some point, e.g. for RHEL9).
+(which is `false` by default for RHEL7 and RHEL8, but `true` for RHEL9 and later).

-Users and groups in `aaa::admins` and `icinga2::web::users` will have access to these nodes on [monitoring.psi.ch](https://monitoring.psi.ch).
-Prefix the group name with a `%` to distinguish them from users.
+This only runs the ping test to check whether the host is reachable on the network. For further checks on the host itself, the agent needs to be enabled:

-By default no alerts are generated. If you wish different, set
+```
+icinga2::agent::enable: true
+```
+(here too, it is `false` by default for RHEL7 and RHEL8, but `true` for RHEL9 and later).
+
+Still, no alerts are generated, or rather they are suppressed by a global infinite service window. To change this, set
 ```
 icinga2::alerting::enable: true
 ```

-## Icinga2 Agent
+By default, these alerts are then sent to the admins during office hours. For further fine-tuning of notifications, check out the chapters [Notifications](#notifications) and [Check Customization](#check-customization).

-The Icinga2 Agent can be enabled with
+## Web Access
+
+Users and groups in `aaa::admins` and `icinga2::web::users` will have access to these nodes on [monitoring.psi.ch](https://monitoring.psi.ch).
+Prefix the group name with a `%` to distinguish them from users.
+
+## Notifications
+
+### Notification Recipients
+
+By default, notifications are sent to all admins, that is, the users and groups listed in Hiera at `aaa::admins`, with the exception of the default admins from `common.yaml` and the group `unx-lx_support`. If the admins should not be notified, disable the sending of messages with
 ```
-icinga2::agent::enable: true
+icinga2::alerting::notify_admins: false
 ```
+In addition to, or instead of, the admins, you can list notification recipients in the Hiera list `icinga2::alerting::contacts`. You can list
+- AD users by login name
+- AD groups with `%` as prefix to their name
+
+### Notification Time Restrictions
+
+By default, notifications for warnings and alerts are sent out during office hours, that is, Monday to Friday, 08:00 - 17:00.
+
+This can be configured in Hiera with the `icinga2::alerting::severity` key, which is `4` by default. The following options are possible:
+
+| node severity | media            | time         |
+|---------------|------------------|--------------|
+| `1`           | SMS and e-mail   | 24x7         |
+| `2`           | e-mail           | 24x7         |
+| `3`           | e-mail           | office hours |
+| `4`           | e-mail           | office hours |
+| `5`           | no notifications | never        |
+
+Please note that for services where the `criticality` variable is set, the time when notifications are sent out is also restricted:
+
+| service criticality | time         |
+|---------------------|--------------|
+| -                   | 24x7         |
+| `A`                 | 24x7         |
+| `B`                 | office hours |
+| `C`                 | never        |
+
+The more restrictive of the two settings applies, e.g. a service with criticality `C` will never cause a notification, independent of the node severity.
+

 ## Default Checks

 By default we already run a comprehensive set of checks. Some of them can be fine-tuned in Hiera.

 Whenever you have a use case which is not covered yet, please talk to us.
+## Check Customization
+
+Most checks can have custom parameters. The variables you can adapt are listed as "Custom Variables" on the page of the given service. In Hiera, add the service name below the key `icinga2::service_check::customize` as a multi-level hash, and below the service name the variable name with its new value.
+
+Let's look at the example of the `CPU Usage` service:
+
+!["CPU Usage" service page](icinga2/service_custom_variables.png)
+
+If the machine is a number cruncher and it is fine for the CPU to be fully utilized, you can ignore the check by setting it to always OK:
+
+```
+icinga2::service_check::customize:
+  'CPU Usage':
+    cpu_usage_always_ok: true
+```
+
+If, on the contrary, you want to get an immediate notification when the CPU is overused, the following snippet is more advisable:
+```
+icinga2::service_check::customize:
+  'CPU Usage':
+    criticality: A
+```
+
+If it is a Linuxfabrik plugin, you will find a link under "Notes" that points to the documentation of the check. This might shed more light on the effect of these variables.
diff --git a/admin-guide/configuration/icinga2/service_custom_variables.png b/admin-guide/configuration/icinga2/service_custom_variables.png
new file mode 100644
index 00000000..7de9e3c8
Binary files /dev/null and b/admin-guide/configuration/icinga2/service_custom_variables.png differ