Icinga2
Currently only standard checks are supported. Still missing:
- support for the currently automatically installed Icinga1 checks by Puppet
- support for custom checks
You get an overview of your nodes in Icinga2 at monitoring.psi.ch, where you can also handle alerts, create service windows, etc.
The configuration itself is not done there, but in Hiera, from where it is propagated automatically.
TL;DR
I, admin of xyz.psi.ch want ...
... monitoring with e-mails during office hours:
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
... monitoring with SMS around the clock:
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
icinga2::alerting::severity: 1
... just to be able to check the monitoring state on monitoring.psi.ch:
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: false
icinga2::alerting::severity: 5
... no monitoring:
icinga2::enable: false
Basic Configuration
Enable monitoring with Icinga2 by
icinga2::enable: true
(which is false by default for RHEL7 and RHEL8, but true for RHEL9 and later).
This only does the ping test to check if the host is online on the network. For further checks on the host itself the agent needs to be started:
icinga2::agent::enable: true
(this is also false by default for RHEL7 and RHEL8, but true for RHEL9 and later).
Even then no alerts are generated; more precisely, they are suppressed by a global, infinite service window. To enable alerts, set
icinga2::alerting::enable: true
By default these alerts are now sent to the admins during office hours. For further fine-tuning of notifications, check out the sections Notifications and Check Customization.
Finally, if Icinga2 shall be managed without Puppet (not recommended except for Icinga2 infrastructure servers), then set
icinga2::puppet: false
Web Access
Users and groups in aaa::admins and icinga2::web::users will have access to these nodes on monitoring.psi.ch.
Prefix group names with a % to distinguish them from users.
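As a rough sketch, granting web access to one extra user and one extra group could look as follows. The names example_user and example-group are placeholders, and the list form of icinga2::web::users is an assumption. The group entry is quoted because a plain YAML value may not start with %.

```yaml
# Hypothetical names; list form of this key is assumed.
icinga2::web::users:
  - 'example_user'      # individual user, by login name
  - '%example-group'    # group, marked by the % prefix (quoted, as % cannot start a plain YAML scalar)
```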
Notifications
Notification Recipients
By default the notifications are sent to all admins, that is, the users and groups listed in Hiera at aaa::admins, with the exception of the default admins from common.yaml and the group unx-lx_support. If the admins should not be notified, disable this with
icinga2::alerting::notify_admins: false
In addition to, or instead of, the admins you can list notification recipients in the Hiera list icinga2::alerting::contacts. You can list
- AD users by login name
- AD groups with % as prefix to their name
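A minimal sketch of sending notifications to dedicated contacts instead of the admins might look like this. The user and group names are made up for illustration; only the two keys come from this page.

```yaml
# Do not notify the users and groups from aaa::admins ...
icinga2::alerting::notify_admins: false

# ... but notify these contacts instead (hypothetical names).
icinga2::alerting::contacts:
  - 'example_user'            # AD user, by login name
  - '%example-oncall-group'   # AD group, % prefix, quoted for YAML
```

Leaving icinga2::alerting::notify_admins at its default would notify both the admins and the listed contacts.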
Notification Time Restrictions
Notifications for warnings and alerts are sent out by default during office hours, that is, Monday to Friday, 08:00 - 17:00.
This can be configured in Hiera with the icinga2::alerting::severity key, which is 4 by default. The following options are possible:
| node severity | media | time |
|---|---|---|
| 1 | SMS and e-mail | 24x7 |
| 2 | | 24x7 |
| 3 | e-mail | office hours |
| 4 | e-mail | office hours |
| 5 | no notifications | never |
(Currently 3 and 4 behave the same.)
Please note that for services where the criticality variable is set, the time when notifications are sent out is restricted as well:
| service criticality | time |
|---|---|
| - | 24x7 |
| A | 24x7 |
| B | office hours |
| C | never |
The more restrictive of the two settings applies, e.g. a service with criticality C will never cause a notification, independent of the node severity.
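To illustrate how the two tables combine, here is a sketch of a node that alerts around the clock while one service is limited to office hours. Choosing "CPU Usage" for this is only an example.

```yaml
# Node level: SMS and e-mail, 24x7.
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
icinga2::alerting::severity: 1

# Service level: criticality B restricts this one service to office hours,
# because the more restrictive of node severity and service criticality applies.
icinga2::service_check::customize:
  'CPU Usage':
    criticality: B
```

With these settings, a CPU Usage problem at night should not trigger a notification, while other services without a more restrictive criticality would still notify 24x7.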
To receive notification messages over SMS, you need to register your mobile phone with Icinga2. You can request this by writing to icinga2-support@psi.ch. Otherwise, if your phone number is still missing when the first SMS is supposed to be sent to you, you will get an e-mail asking you to register it.
Default Checks
By default we already run a comprehensive set of checks. Some of them can be fine-tuned in Hiera. Whenever you have a use case which is not covered yet, please talk to us.
Check Customization
Most checks can have custom parameters. The variables you can adapt are listed as "Custom Variables" on the page of the given service. In Hiera, the key icinga2::service_check::customize takes a multi-level hash: the service name on the first level and, below it, the variable names with their new values.
Example "CPU Usage"
Let's look at the "CPU Usage" service as an example:
If the machine is a number cruncher and it is fine for the CPU to be fully utilized, you can ignore the CPU load by setting the check to always be OK:
icinga2::service_check::customize:
  'CPU Usage':
    cpu_usage_always_ok: true
If, on the contrary, you want to get an immediate notification when the CPU is overused, the following snippet is more advisable:
icinga2::service_check::customize:
  'CPU Usage':
    criticality: A
If it is a Linuxfabrik plugin, you will find a link under "Notes" that points to the documentation of the check. This might shed more light on the effect of these variables.
Example "Kernel Ring Buffer (dmesg)"
Another check which can easily have false alerts, but also has a big potential to signal severe kernel or hardware issues, is the check of the kernel log (dmesg).
If you conclude that a given message can safely be ignored, you may add it to the ignore list, where a partial string match will make it ignored in the future:
icinga2::service_check::customize:
  'Kernel Ring Buffer (dmesg)':
    'dmesg_ignore':
      - 'blk_update_request: I/O error, dev fd0, sector 0'
      - 'integrity: Problem loading X.509 certificate -126'
If you think that this log message can be globally ignored, please inform the Linux Team so we can ignore it by default.
Note that you can reset this check after dealing with it by executing on the node:
dmesg --clear
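When several checks are customized in the same Hiera file, they presumably all belong below a single icinga2::service_check::customize key, since a YAML mapping cannot hold the same key twice. A sketch combining the two examples from this page:

```yaml
# One customize hash per file, with one entry per service name.
icinga2::service_check::customize:
  'CPU Usage':
    criticality: A
  'Kernel Ring Buffer (dmesg)':
    dmesg_ignore:
      - 'blk_update_request: I/O error, dev fd0, sector 0'
```

How this hash is merged with values from other Hiera levels is not covered here.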
