# Icinga2 Configuration

Icinga2 is in production, but checks are still being added:

- ✅ standard Linuxfabrik checks
- 🏗️ support for Icinga1 checks installed automatically by Puppet ([see issue](https://git.psi.ch/linux-infra/issues/-/issues/419))
- ✅ support for custom checks

An overview of your nodes is available at [monitoring.psi.ch](https://monitoring.psi.ch), where you can handle alerts, create service windows, etc.

The configuration itself, however, is not done there. It is done in Hiera and propagated automatically.

## TL;DR

I, admin of xyz.psi.ch, want ...

... **monitoring with e-mails during office hours**:

```yaml
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
```

... **monitoring with SMS around the clock**:

```yaml
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: true
icinga2::alerting::severity: 1
```

... **only to be able to check the monitoring state on monitoring.psi.ch**:

```yaml
icinga2::enable: true
icinga2::agent::enable: true
icinga2::alerting::enable: false
icinga2::alerting::severity: 5
```

... **no monitoring**:

```yaml
icinga2::enable: false
```

## Basic Configuration

To enable monitoring with Icinga2, set

```yaml
icinga2::enable: true
```

(which is `false` by default for RHEL7 and RHEL8, but `true` for RHEL9 and later).

On its own, this only runs a ping test to check whether the host is reachable on the network. For further checks on the host itself, the agent needs to be enabled:

```yaml
icinga2::agent::enable: true
```

(Again, this is `false` by default for RHEL7 and RHEL8, but `true` for RHEL9 and later.)

Still, no alerts are generated yet, because they are suppressed by a global, infinite service window. To receive alerts, set

```yaml
icinga2::alerting::enable: true
```

By default, these alerts are then sent to the admins during office hours. For further fine-tuning of notifications, see the sections Notifications and Check Customization below.

Finally, if Icinga2 should be managed without Puppet (not recommended, except for Icinga2 infrastructure servers), set

```yaml
icinga2::puppet: false
```

## Web Access

Users and groups listed in `aaa::admins` and `icinga2::web::users` have access to these nodes on [monitoring.psi.ch](https://monitoring.psi.ch).
Prefix group names with `%` to distinguish them from users.
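
A minimal sketch of granting web access to additional people, assuming `icinga2::web::users` is a YAML list like `aaa::admins`. The user `jdoe` and the group `unx-my_group` are hypothetical placeholders:

```yaml
icinga2::web::users:
  - 'jdoe'           # hypothetical AD user, by login name
  - '%unx-my_group'  # hypothetical AD group; quoted, since a plain YAML scalar may not start with '%'
```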

## Notifications

### Notification Recipients

By default, notifications are sent to all admins, i.e. the users and groups listed in Hiera under `aaa::admins`, except for the default admins from `common.yaml` and the group `unx-lx_support`. If the admins should not be notified, disable this with

```yaml
icinga2::alerting::notify_admins: false
```

In addition to or instead of the admins, you can list notification recipients in the Hiera list `icinga2::alerting::contacts`. Possible entries are:

- AD users by login name
- AD groups, with `%` as prefix to their name
- plain e-mail addresses
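
For example, to notify a specific user, group and mailbox instead of the admins, the keys above can be combined as sketched below. All recipient names are hypothetical placeholders:

```yaml
icinga2::alerting::enable: true
# Do not notify the admins from aaa::admins ...
icinga2::alerting::notify_admins: false
# ... but these recipients instead (hypothetical examples)
icinga2::alerting::contacts:
  - 'jdoe'                 # AD user by login name
  - '%unx-my_team'         # AD group with '%' prefix, quoted for YAML
  - 'my-team@example.com'  # plain e-mail address
```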

### Notification Time Restrictions

By default, notifications for warnings and alerts are only sent during office hours, i.e. Monday to Friday, 08:00-17:00.

This can be configured in Hiera with the `icinga2::alerting::severity` key, which is `4` by default. The following options are possible:

| node severity | media            | time         |
|---------------|------------------|--------------|
| `1`           | SMS and e-mail   | 24x7         |
| `2`           | e-mail           | 24x7         |
| `3`           | e-mail           | office hours |
| `4`           | e-mail           | office hours |
| `5`           | no notifications | never        |

(Currently `3` and `4` behave the same.)
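
For example, according to the table, a node whose admins want e-mails at any time, but no SMS, would use severity `2`:

```yaml
icinga2::alerting::enable: true
icinga2::alerting::severity: 2  # e-mail, 24x7
```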

Note that for services where the `criticality` variable is set, the time when notifications are sent is restricted as well:

| service criticality | time         |
|---------------------|--------------|
| (not set)           | 24x7         |
| `A`                 | 24x7         |
| `B`                 | office hours |
| `C`                 | never        |

The more restrictive of the two settings applies. For example, a service with criticality `C` never causes a notification, regardless of the node severity.
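
The following sketch shows how the two settings combine. The node below has notifications enabled around the clock. However, `CPU Usage` is given criticality `B` (using the customization mechanism described under Check Customization), so notifications for that service are only sent during office hours:

```yaml
icinga2::alerting::enable: true
icinga2::alerting::severity: 1  # node: SMS and e-mail, 24x7
icinga2::service_check::customize:
  'CPU Usage':
    criticality: B              # this service: office hours only
```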

To receive notifications via SMS, you need to register your mobile phone number with Icinga2. You can request this by writing to icinga2-support@psi.ch. Otherwise, you will receive an e-mail asking you to register when the first SMS should have been sent to you and your phone number is still missing.

## Default Checks

By default, we already run a comprehensive set of checks. Some of them can be fine-tuned in Hiera.
Whenever you have a use case that is not covered yet, please talk to us.

## Check Customization

Most checks accept custom parameters. The variables you can adapt are listed as "Custom Variables" on the page of the given service on monitoring.psi.ch. In Hiera, add them below the key `icinga2::service_check::customize` as a multi-level hash: first the service name, then the variable names with their new values below it.

### Example "CPU Usage"

Let's look at the `CPU Usage` service as an example:



If the machine is a number cruncher and the CPU may well be fully utilized, you can make this check ignore the CPU usage by always reporting it as OK:

```yaml
icinga2::service_check::customize:
  'CPU Usage':
    cpu_usage_always_ok: true
```

If, on the contrary, you want to be notified immediately, at any time, when the CPU is overused, the following snippet is more advisable:

```yaml
icinga2::service_check::customize:
  'CPU Usage':
    criticality: A
```

If the check is a Linuxfabrik plugin, the "Notes" field of the service contains a link to the documentation of the check. This may shed more light on the effect of these variables.

### Example "Kernel Ring Buffer (dmesg)"

Another check that can easily produce false alerts, but also has great potential to reveal severe kernel or hardware issues, is the check of the kernel log (dmesg).

If you conclude that a given message can safely be ignored, you can add it to the ignore list. A partial string match is enough for a message to be ignored in the future:

```yaml
icinga2::service_check::customize:
  'Kernel Ring Buffer (dmesg)':
    'dmesg_ignore':
      - 'blk_update_request: I/O error, dev fd0, sector 0'
      - 'integrity: Problem loading X.509 certificate -126'
```

If you think that a log message can be ignored globally, please inform the Linux Team so that we can ignore it by default.

After dealing with the issue, you can reset this check by running the following on the node:

```shell
dmesg --clear
```
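
YAML requires the keys of a mapping to be unique. Within one Hiera file, customizations for several services therefore belong under a single `icinga2::service_check::customize` key. Depending on the parser, a repeated key is either rejected or silently overrides the earlier one. The following sketch combines settings from the examples above:

```yaml
icinga2::service_check::customize:
  'CPU Usage':
    criticality: A
  'Kernel Ring Buffer (dmesg)':
    'dmesg_ignore':
      - 'integrity: Problem loading X.509 certificate -126'
```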
## Extra Checks

### TLS/SSL Certificate Expiration

To monitor the expiration of one or more certificates, assign the additional server role `ssl-cert` to the node in Hiera (except for `role::jupyterserver`):

```yaml
icinga2::additional_server_role:
  - 'ssl-cert'
```

Then list the certificate files you want to have checked:

```yaml
icinga2::service_check::customize:
  'TLS/SSL Certificate Expiration':
    ssl_cert_files:
      - '/etc/xrdp/cert.pem'
      - '/etc/httpd/ssl/node.crt'
```

Besides the file list, you can set the warning threshold in days with the attribute `ssl_cert_warning` (`7` by default) and the critical threshold with the attribute `ssl_cert_critical` (`3` by default).
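
The sketch below gives a warning 30 days and a critical alert 14 days before a certificate expires. It assumes that both attributes sit next to `ssl_cert_files`, as the text above suggests. The values are only illustrative:

```yaml
icinga2::service_check::customize:
  'TLS/SSL Certificate Expiration':
    ssl_cert_files:
      - '/etc/httpd/ssl/node.crt'
    ssl_cert_warning: 30   # days before expiry -> WARNING (assumed placement)
    ssl_cert_critical: 14  # days before expiry -> CRITICAL (assumed placement)
```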

If you run your own PKI, you can also check a CA certificate for expiration:

```yaml
icinga2::additional_server_role:
  - 'ca-cert'

icinga2::service_check::customize:
  'CA Certificate Expiration':
    ssl_cert_files:
      - '/etc/my_pki/ca.pem'
```

By default, this check warns when fewer than 180 days remain and becomes critical below 30 days.

### Check for Systemd Service Status

To check whether a daemon or service has been started successfully by `systemd`, configure:

```yaml
icinga2::custom_service:
  'XRDP Active':
    template: 'st-agent-awi-lx-service-active'
    vars:
      criticality: 'A'
      service_names:
        - 'xrdp'
        - 'xrdp-sesman'
```

The name (here `XRDP Active`) needs to be unique among all Icinga "services" of a host.
The `service_names` variable needs to contain the names of one or more `systemd` services to be monitored.

You can create several of these checks.
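
All checks live under the same `icinga2::custom_service` hash, so additional checks are simply further keys next to the first one. The sketch below adds a second check. The name `Web Server Active` and the unit `httpd` are hypothetical placeholders:

```yaml
icinga2::custom_service:
  'XRDP Active':
    template: 'st-agent-awi-lx-service-active'
    vars:
      criticality: 'A'
      service_names:
        - 'xrdp'
        - 'xrdp-sesman'
  'Web Server Active':          # hypothetical second check, unique name
    template: 'st-agent-awi-lx-service-active'
    vars:
      criticality: 'B'
      service_names:
        - 'httpd'               # hypothetical systemd unit
```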

### External Connection Checks (Active Checks)

These are configured as fully custom service checks.

The example below checks an RDP port:

```yaml
icinga2::custom_service:
  'RDP Access':
    command: 'tcp'
    agent: false
    perf_data: true
    vars:
      criticality: 'A'
      tcp_port: 3389
```

Possible commands are [`http`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#plugin-check-command-http), [`tcp`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#plugin-check-command-tcp), [`udp`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#plugin-check-command-udp), [`ssl`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#plugin-check-command-ssl), [`ssh`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#plugin-check-command-ssh) or [`ftp`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#plugin-check-command-ftp).

If you want to reference the host name, you can use a macro, e.g.:

```yaml
http_vhost: '$host.name$'
```

Note that macros only work for check command arguments.

The actual service name is up to you. It only needs to be unique.
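
As an illustration, an HTTPS check of a web server on the node might combine the `http` command with the host name macro. `http_vhost`, `http_ssl` and `http_uri` are variables of the `http` command in the Icinga Template Library (linked above). The service name and the URI are hypothetical:

```yaml
icinga2::custom_service:
  'Web Frontend HTTPS':          # hypothetical name, must be unique on the host
    command: 'http'
    agent: false                 # run from the satellite, like the RDP example
    perf_data: true
    vars:
      criticality: 'B'
      http_vhost: '$host.name$'  # macro, resolved to the node's host name
      http_ssl: true             # connect via TLS
      http_uri: '/'              # hypothetical path to request
```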

### Other Custom Checks

It is also possible to create completely custom checks. Note, however, that the command or service template used must be available on the Icinga Master, i.e. configured there by other means. Likewise, the check plugin executed on the Icinga Satellite or by the Icinga agent must already be available or be distributed by other means. Please reach out to the [Linux Team](mailto:linux-eng@psi.ch) to find out how best to do it and to ensure that everything is in place.

```yaml
icinga2::custom_service:
  'My Service Check 1':
    template: 'st-agent-lf-file-size'
    vars:
      criticality: 'B'
      file_size_filename: '/var/my/growing/file'
      file_size_warning: '100M'
      file_size_critical: '200M'
  'My Service Check 2':
    command: 'tcp'
    agent: false
    perf_data: true
    vars:
      criticality: 'A'
      tcp_port: 3389
```

Below `icinga2::custom_service`, set the name of the service check as it will appear in Icingaweb. The possible arguments are:

- `command`: the check command to run
- `template`: the service template to inherit from
- `agent`: whether the `command` runs on the agent (`true`) or on the satellite (`false`); only used if `template` is not set; default is `true`
- `vars`: hash with the arguments for the service check
- `perf_data`: whether performance data should be recorded and a performance graph shown; default is `false`

You are free to choose the actual service name. It only needs to be unique.