# Icinga2

**Currently only standard checks are supported.** Still missing:
- support for the Icinga1 checks that Puppet currently installs automatically
- support for custom checks

The overview of your nodes in Icinga2 is available at [monitoring.psi.ch](https://monitoring.psi.ch). There you can handle alerts, create service windows, etc.

The configuration itself is not done there, but in Hiera, from where it is propagated automatically.

## TL;DR

I, admin of xyz.psi.ch, want ...

... **monitoring with e-mails during office hours**:
```
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
```

... **monitoring with SMS around the clock**:
```
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
icinga2::alerting::severity: 1
```

... **just to be able to check the monitoring state on monitoring.psi.ch**:
```
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: false
icinga2::alerting::severity: 5
```

... **no monitoring**:
```
icinga2::enable: false
```

## Basic Configuration
Enable monitoring with Icinga2 by setting
```
icinga2::enable: true
```
(the default is `false` for RHEL7 and RHEL8, but `true` for RHEL9 and later).

This only enables the ping test, which checks whether the host is online on the network. For further checks on the host itself, the agent needs to be started:

```
icinga2::agent::enable: true
```
(here too the default is `false` for RHEL7 and RHEL8, but `true` for RHEL9 and later).

Even then no alerts are generated; more precisely, they are suppressed by a global, infinite service window. To receive alerts, set
```
icinga2::alerting::enable: true
```

By default these alerts are then sent to the admins during office hours. For further fine-tuning of notifications, see the chapters Notifications and Check Customization.

Finally, if Icinga2 is to be managed without Puppet (not recommended except for Icinga2 infrastructure servers), set
```
icinga2::puppet: false
```

## Web Access

Users and groups listed in `aaa::admins` and `icinga2::web::users` have access to these nodes on [monitoring.psi.ch](https://monitoring.psi.ch).
Prefix group names with `%` to distinguish them from users.

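As a minimal sketch, granting web access to one additional user and one group could look like this. The names `jdoe_a` and `unx-example-group` are hypothetical, and it is assumed here that `icinga2::web::users` is a plain list. The group entry is quoted because a plain YAML scalar must not start with `%`.

```yaml
# Hypothetical names; assumes icinga2::web::users is a plain list.
icinga2::web::users:
    - 'jdoe_a'              # user, by login name
    - '%unx-example-group'  # group, marked by the % prefix
```
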
## Notifications

### Notification Recipients

By default, notifications are sent to all admins, i.e. the users and groups listed in Hiera under `aaa::admins`, with the exception of the default admins from `common.yaml` and the group `unx-lx_support`. If the admins should not be notified, disable this with
```
icinga2::alerting::notify_admins: false
```

In addition to or instead of the admins, you can name notification recipients in the Hiera list `icinga2::alerting::contacts`. You can list
- AD users by login name
- AD groups with `%` as prefix to their name

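For illustration, notifying a dedicated set of contacts instead of the admins might be configured as follows. The user and group names are made up; the two keys are the ones described above. Entries starting with `%` are quoted because plain YAML scalars cannot begin with that character.

```yaml
# Do not notify the admins from aaa::admins ...
icinga2::alerting::notify_admins: false
# ... but these contacts instead (hypothetical names).
icinga2::alerting::contacts:
    - 'jdoe_a'            # AD user, by login name
    - '%example-oncall'   # AD group, % prefix
```

Leaving `icinga2::alerting::notify_admins` at its default would notify these contacts in addition to the admins.
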
### Notification Time Restrictions

By default, notifications for warnings and alerts are sent out during office hours, i.e. Monday to Friday, 08:00 - 17:00.

This can be configured in Hiera with the `icinga2::alerting::severity` key, which is `4` by default. The following options are possible:

| node severity | media            | time         |
|---------------|------------------|--------------|
| `1`           | SMS and e-mail   | 24x7         |
| `2`           | e-mail           | 24x7         |
| `3`           | e-mail           | office hours |
| `4`           | e-mail           | office hours |
| `5`           | no notifications | never        |

(Currently `3` and `4` behave the same.)

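For example, going by the table above, a node whose notifications should go out by e-mail around the clock, but without SMS, would use severity `2`. This sketch assumes that alerting is enabled as described in Basic Configuration:

```yaml
icinga2::alerting::enable: true
icinga2::alerting::severity: 2   # e-mail only, 24x7
```
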
Please note that for services where the `criticality` variable is set, the time when notifications are sent out is restricted as well:

| service criticality | time         |
|---------------------|--------------|
| -                   | 24x7         |
| `A`                 | 24x7         |
| `B`                 | office hours |
| `C`                 | never        |

The more restrictive of the two settings applies, e.g. a service with criticality `C` will never cause a notification, independent of the node severity.

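To illustrate how the two settings combine, consider a node with severity `1` on which one service is marked with criticality `B`. The sketch below assumes that `criticality` is set through `icinga2::service_check::customize`, in the same way as in the CPU Usage example under Check Customization. Going by the two tables, notifications for that service would then be limited to office hours, while services without a criticality would still notify 24x7.

```yaml
icinga2::alerting::enable: true
icinga2::alerting::severity: 1      # node: SMS and e-mail, 24x7
icinga2::service_check::customize:
    'CPU Usage':
        criticality: B              # this service: office hours only
```
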
To receive notification messages over SMS, you need to register your mobile phone with Icinga2. You can request this by writing to icinga2-support@psi.ch. Alternatively, you will get an e-mail asking you to do so when the first SMS was supposed to be sent to you and the phone number is still missing.

## Default Checks

By default we already run a comprehensive set of checks. Some of them can be fine-tuned in Hiera.
Whenever you have a use case which is not covered yet, please talk to us.

## Check Customization

Most checks accept custom parameters. The variables you can adapt are listed as "Custom Variables" on the page of the given service. In Hiera, add a multi-level hash below the key `icinga2::service_check::customize`: first the service name and, below it, the variable names with their new values.

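Schematically, the hash has the following shape. The service and variable names here are placeholders, not real checks; take the actual names from the "Custom Variables" of the service in question. Service names that contain spaces or special characters are best quoted.

```yaml
# Placeholder names only; replace them with real service and variable names.
icinga2::service_check::customize:
    '<service name>':
        <variable name>: <new value>
        <another variable>: <new value>
    '<another service name>':
        <variable name>: <new value>
```
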
### Example "CPU Usage"

Let's look at the example of the `CPU Usage` service:



If the machine is a number cruncher and it is fine for the CPU to be fully utilized, you can make the check always report OK:

```
icinga2::service_check::customize:
    'CPU Usage':
        cpu_usage_always_ok: true
```

If, on the contrary, you want an immediate notification when the CPU is overused, the following snippet is more advisable:
```
icinga2::service_check::customize:
    'CPU Usage':
        criticality: A
```

If the check is a Linuxfabrik plugin, you will find a link under "Notes" that points to the documentation of the check. This might shed more light on the effect of these variables.

### Example "Kernel Ring Buffer (dmesg)"
Another check which can easily raise false alerts, but which also has great potential to signal severe kernel or hardware issues, is the check of the kernel log (dmesg).

If you conclude that a given message can safely be ignored, you may add it to the ignore list. Any future message that contains one of the listed strings (partial match) is then ignored:
```
icinga2::service_check::customize:
    'Kernel Ring Buffer (dmesg)':
        'dmesg_ignore':
            - 'blk_update_request: I/O error, dev fd0, sector 0'
            - 'integrity: Problem loading X.509 certificate -126'
```
If you think that a log message can be ignored globally, please inform the Linux Team so we can ignore it by default.

Note that after dealing with the issue you can reset this check by executing on the node:
```
dmesg --clear
```