Icinga2

Currently only standard checks are supported. Still missing:

  • support for the Icinga1 checks that Puppet currently installs automatically
  • support for custom checks

An overview of your nodes in Icinga2 is available at monitoring.psi.ch. There you can also handle alerts, create service windows, etc.

The configuration itself, however, is not done there but in Hiera, from where it is propagated automatically.

TL;DR

I, admin of xyz.psi.ch, want ...

... monitoring with e-mails during office hours:

icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true

... monitoring with SMS around the clock:

icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
icinga2::alerting::severity: 1

... just to be able to check the monitoring state on monitoring.psi.ch:

icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: false
icinga2::alerting::severity: 5

... no monitoring:

icinga2::enable: false

Basic Configuration

Enable monitoring with Icinga2 by setting

icinga2::enable: true

(which is false by default for RHEL7 and RHEL8, but true for RHEL9 and later).

This only runs the ping test, which checks whether the host is reachable on the network. For further checks on the host itself, the agent needs to be started:

icinga2::agent::enable: true

(again false by default for RHEL7 and RHEL8, but true for RHEL9 and later).

Even then no alerts are generated; more precisely, they are suppressed by a global, infinite service window. To receive alerts, set

icinga2::alerting::enable: true

By default these alerts are then sent to the admins during office hours. For further fine-tuning of notifications, see the chapters Notifications and Check Customization.

Finally, if Icinga2 should be managed without Puppet (not recommended except for Icinga2 infrastructure servers), set

icinga2::puppet: false

Web Access

Users and groups listed in aaa::admins and icinga2::web::users have access to these nodes on monitoring.psi.ch. Prefix group names with % to distinguish them from users.
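
As a minimal sketch, assuming icinga2::web::users is a plain Hiera list, the following would grant web access to one additional user and one group. The names jdoe and unx-example_group are hypothetical placeholders. The group entry is quoted because a plain YAML scalar cannot start with %.

```yaml
# Sketch only: the list structure is an assumption, the names are hypothetical.
icinga2::web::users:
  - 'jdoe'                 # a user
  - '%unx-example_group'   # a group, marked by the % prefix
```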

Notifications

Notification Recipients

By default, notifications are sent to all admins, that is, the users and groups listed in Hiera at aaa::admins, with the exception of the default admins from common.yaml and the group unx-lx_support. If the admins should not be notified, disable this with

icinga2::alerting::notify_admins: false

In addition to, or instead of, the admins, you can name notification recipients in the Hiera list icinga2::alerting::contacts. You can list

  • AD users by login name
  • AD groups with % as prefix to their name
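
For illustration, a sketch that sends notifications only to an explicit contact list instead of the admins. The login name and the group name are hypothetical placeholders.

```yaml
# Do not notify the users and groups from aaa::admins
icinga2::alerting::notify_admins: false

# Notify these recipients instead (hypothetical names)
icinga2::alerting::contacts:
  - 'jdoe'                  # AD user by login name
  - '%unx-example_oncall'   # AD group; quoted, since YAML does not allow a bare leading %
```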

Notification Time Restrictions

By default, notifications for warnings and alerts are sent out during office hours, that is, Monday to Friday, 08:00 - 17:00.

This can be configured in Hiera with the icinga2::alerting::severity key, which is 4 by default. The following options are possible:

| Node severity | Media | Time |
|---|---|---|
| 1 | SMS and e-mail | 24x7 |
| 2 | e-mail | 24x7 |
| 3 | e-mail | office hours |
| 4 | e-mail | office hours |
| 5 | no notifications | never |

(Currently 3 and 4 behave the same.)
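
For example, going by the table above, a node that should send e-mail around the clock but no SMS would use severity 2. This is a sketch that reuses the keys from the TL;DR section:

```yaml
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
icinga2::alerting::severity: 2   # e-mail only, 24x7
```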

Please note that for services where the criticality variable is set, the time when notifications are sent out is restricted as well:

| Service criticality | Time |
|---|---|
| - | 24x7 |
| A | 24x7 |
| B | office hours |
| C | never |

The more restrictive of the two settings applies, e.g. a service with criticality C will never cause a notification, regardless of the node severity.
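
A sketch of how the two settings interact, assuming the rule described above. The node asks for notifications around the clock, but one service is limited to office hours through its criticality. The choice of "CPU Usage" as the service is only illustrative.

```yaml
# Node level: SMS and e-mail, 24x7
icinga2::alerting::severity: 1

icinga2::service_check::customize:
  'CPU Usage':
    # Service level: office hours only. The narrower time window applies,
    # so this service should not notify at night or on weekends.
    criticality: B
```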

To receive notification messages by SMS, you need to register your mobile phone with Icinga2. You can request this by writing to icinga2-support@psi.ch. Otherwise, you will receive an e-mail asking you to do so the first time an SMS should have been sent to you while your phone number is still missing.

Default Checks

By default we already run a comprehensive set of checks. Some of them can be fine-tuned in Hiera. Whenever you have a use case which is not covered yet, please talk to us.

Check Customization

Most checks accept custom parameters. The variables you can adapt are listed as "Custom Variables" on the page of the given service. In Hiera, the key icinga2::service_check::customize takes a multi-level hash: the service name on the first level, and below it the variable names with their new values.

Example "CPU Usage"

Let's look at the example of the "CPU Usage" service:

[Image: "CPU Usage" service page]

If the machine is a number cruncher and it is fine for the CPU to be fully utilized, you can make the check always report OK:

icinga2::service_check::customize:
  'CPU Usage':
    cpu_usage_always_ok: true

If, on the contrary, you want an immediate notification when the CPU is overused, the following snippet is more advisable:

icinga2::service_check::customize:
  'CPU Usage':
    criticality: A

If the check is a Linuxfabrik plugin, you will find a link under "Notes" that points to the documentation of the check. This may shed more light on the effect of these variables.

Example "Kernel Ring Buffer (dmesg)"

Another check that can easily produce false alerts, but also has great potential to signal severe kernel or hardware issues, is the check of the kernel log (dmesg).

If you conclude that a given message can safely be ignored, you may add it to the ignore list. Any message that contains one of the listed strings (partial string match) will then be ignored in the future:

icinga2::service_check::customize:
  'Kernel Ring Buffer (dmesg)':
    'dmesg_ignore':
      - 'blk_update_request: I/O error, dev fd0, sector 0'
      - 'integrity: Problem loading X.509 certificate -126'

If you think that this log message can be globally ignored, please inform the Linux Team so we can ignore it by default.

Note that you can reset this check after dealing with it by executing the following on the node:

dmesg --clear
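
Because a YAML mapping may contain each key only once, customizations for several services in the same Hiera file belong under a single icinga2::service_check::customize key. A sketch combining the two examples above:

```yaml
icinga2::service_check::customize:
  'CPU Usage':
    criticality: A
  'Kernel Ring Buffer (dmesg)':
    dmesg_ignore:
      # Quoted because the message itself contains ": "
      - 'blk_update_request: I/O error, dev fd0, sector 0'
```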