# Icinga2
We want to support monitoring of the Linux machines in Icinga2. The Icinga2 infrastructure as such is maintained by AIT, currently mainly Heinz Scheffler, with Bernard Bumbak as deputy.
## Icinga2 Servers
- PROD [monitoring.psi.ch](https://monitoring.psi.ch/) (Loadbalancer)
  - Primary [vemonma01a.psi.ch](https://vemonma01a.psi.ch/) (with Icinga Director)
  - Secondary [wmonma01b.psi.ch](https://wmonma01b.psi.ch/)
- DEV
  - Primary [vmonma02a.psi.ch](https://vmonma02a.psi.ch/) (with Icinga Director)
  - Secondary [vmonma02b.psi.ch](https://vmonma02b.psi.ch/)
## Automated Host Configuration
The Linux part of the Icinga2 Master configuration is managed using Ansible in the [`icinga_master` role in the `bootstrap` repo](https://git.psi.ch/linux-infra/bootstrap/-/tree/prod/ansible/roles/icinga_master).
For Puppet managed nodes there is an automated import pipeline using the Icinga Director. For the central infrastructure itself there is a predefined Configuration Basket snapshot which is installed by a manual Ansible run.
Configuration that is shared by both types of systems is found in the [`awi-lx-basic` Configuration Basket](https://git.psi.ch/linux-infra/bootstrap/-/blob/prod/ansible/roles/icinga_master/files/etc/icingaweb2/psi/lx-core/Director-Basket_awi-lx-basic.json).
### Puppet Managed Nodes
The individual host configuration is generated automatically from information that is already known in:
- Sysdb (inventory of nodes)
- Hiera (configuration and customization of the nodes)
- Puppet Facts (OS version, attached networks, partitions)
- NetOps (security levels of networks, needed for selecting the correct Icinga2 Satellite)
![high level idea of Linux computer data import to Icinga2](icinga2/icinga2_import_big_picture.png)
#### Import of Hiera Data to Sysdb
TODO
#### Import of Puppet Facts to Sysdb
TODO
#### Import of NetOps Data
TODO
#### Import into Director
What is a single arrow in the overview diagram is somewhat more complicated in the implementation:
![high level idea of Linux computer data import to Icinga2](icinga2/icinga2_import.png)
It contains two parts:
- the import of host information (a bit simpler)
- the import of notification users (a bit more complex).
The import as such is triggered every 10 minutes by the `sysdb-director-import` timer.
To avoid interfering with manual changes in the Director, the import script first checks for undeployed objects. If there are any, it does not run.
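The guard described above can be sketched as follows. This is only an illustration, not the actual import script: `count_undeployed` and `run_import` are hypothetical callables standing in for however the script queries the Director and starts the import.

```python
from typing import Callable


def import_if_clean(count_undeployed: Callable[[], int],
                    run_import: Callable[[], None]) -> bool:
    """Run the import only when the Director has no undeployed changes.

    Both callables are hypothetical placeholders for this sketch.
    Returns True if the import was started, False if it was skipped.
    """
    pending = count_undeployed()
    if pending > 0:
        # Somebody is working in the Director: do not interfere.
        print(f"skipping import: {pending} undeployed change(s)")
        return False
    run_import()
    return True
```

Because a skipped run simply returns, the next timer run checks again, so the import resumes on its own once the manual changes have been deployed.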
A Director import pipeline first needs an Import Source, which imports the data from an external source. The second element is the Sync Rule, which then creates the actual Director objects out of the data provided by one or more Import Sources.
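The two stages can be pictured with a small model. It is a conceptual sketch only: the column names (`fqdn`, `os`), the example hosts and the `DirectorHost` class are made up for illustration and do not reflect the real Sysdb data or the Director internals.

```python
from dataclasses import dataclass, field


@dataclass
class DirectorHost:
    """Simplified stand-in for a Director host object (hypothetical)."""
    name: str
    vars: dict = field(default_factory=dict)


def import_source() -> list[dict]:
    """Stage 1: fetch raw rows from an external source (hard-coded here)."""
    return [
        {"fqdn": "host1.example.org", "os": "RHEL8"},
        {"fqdn": "host2.example.org", "os": "RHEL9"},
    ]


def sync_rule(rows: list[dict]) -> list[DirectorHost]:
    """Stage 2: map imported columns onto properties of Director objects."""
    return [DirectorHost(name=row["fqdn"], vars={"os": row["os"]}) for row in rows]


hosts = sync_rule(import_source())
```

Keeping the stages separate is what allows several Sync Rules to work on the same imported data.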
##### Director Import of Host Information
The Icinga Director import pipeline is provided as [Configuration Basket template `awi-lx-sysdb`](https://git.psi.ch/linux-infra/bootstrap/-/blob/prod/ansible/roles/icinga_master/templates/Director-Basket_awi-lx-sysdb.json).
The pipeline for host groups imports the list of Sysdb host groups together with a filter that selects the hosts whose `host.vars.host_groups` array contains the group name.
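The effect of such a membership filter can be illustrated in a few lines. The host and group names below are invented, and the real evaluation is done by Icinga2, not by Python.

```python
def members_of(group: str, hosts: dict[str, list[str]]) -> list[str]:
    """Return the hosts whose host_groups list contains the group name."""
    return sorted(name for name, groups in hosts.items() if group in groups)


# Hypothetical stand-in for host.vars.host_groups of three hosts.
example_hosts = {
    "host1.example.org": ["lx-core", "beamline"],
    "host2.example.org": ["beamline"],
    "host3.example.org": [],
}

beamline_members = members_of("beamline", example_hosts)
```

A host can therefore belong to several groups at once, and a host with an empty array belongs to none.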
The host import contains all the special configuration for a host. It needs two sync rules. The first, `awi-lx-sysdb-host-sync-rule`, creates and fills the host object.
The second, `awi-lx-sysdb-service-override-sync-rule`, only fills `host.vars._override_servicevars`, which the Director does not touch in the normal import.
The reason is that this variable usually stores manual service override changes made in the Director, and those should survive normal imports. But as we also manage these overrides in Hiera, we need an `Update only` sync rule which is allowed to set this variable as well.
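The interplay of the two rules can be modelled roughly as below. This is a simplified sketch under stated assumptions: hosts are plain dictionaries of variables, and the two functions only imitate the behaviour described above, they are not how the Director implements its sync policies.

```python
OVERRIDE = "_override_servicevars"


def host_sync(existing: dict, imported: dict) -> dict:
    """First rule: create or update hosts, leaving the override variable alone."""
    result = {name: dict(host_vars) for name, host_vars in existing.items()}
    for name, new_vars in imported.items():
        current = result.setdefault(name, {})
        for key, value in new_vars.items():
            if key != OVERRIDE:
                current[key] = value
    return result


def override_sync(existing: dict, overrides: dict) -> dict:
    """Second rule ("Update only"): set the override variable on existing hosts."""
    result = {name: dict(host_vars) for name, host_vars in existing.items()}
    for name, value in overrides.items():
        if name in result:  # never creates a new host
            result[name][OVERRIDE] = value
    return result
```

In this model a manual override survives `host_sync` untouched and is replaced only when `override_sync` delivers a value for that host.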
##### Director Import of Notification Users
Here the actual user information comes from the AD. But as we do not import all users, but only those who want to be notified, we also need to modify what is being imported.
An LDAP filter selects these users specifically. The filter has three sections:
- users we do not want (service users, etc)
- departments we want
- notification users for Linux machines
One part of the LDAP filter is built from `ignore_list.conf` and `department_list.conf`, which are managed by the AIT/Icinga2 team. The other part is read from the Sysdb API (`user_list.conf`). Together they feed the Configuration Basket `Director-Basket_icinga-notification-user-import.json`, which contains the full notification user import pipeline (import source `icinga-notification-user-import` and sync rule `icinga-notification-user-sync-rule`). This basket is also updated on every import run, before the import source and sync rule are actually triggered.
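How three such sections could be combined into one LDAP filter is sketched below. The structure (ignored users excluded first, then department or listed user), the attribute names `department` and `sAMAccountName`, and the example values are all assumptions for illustration; the real filter and the format of the `.conf` files may differ.

```python
def any_of(attribute: str, values: list[str]) -> str:
    """Build an LDAP OR clause matching any of the values for one attribute."""
    return "(|" + "".join(f"({attribute}={value})" for value in values) + ")"


def build_filter(ignored: list[str], departments: list[str], users: list[str]) -> str:
    """Combine the three sections: not ignored AND (department OR listed user)."""
    unwanted = "(!" + any_of("sAMAccountName", ignored) + ")"
    wanted = ("(|" + any_of("department", departments)
              + any_of("sAMAccountName", users) + ")")
    return "(&" + unwanted + wanted + ")"


# Hypothetical example values, one per section.
ldap_filter = build_filter(["svc_backup"], ["AIT"], ["alice"])
```

The sketch assumes non-empty lists and does not escape special characters in the values, both of which a real implementation would need to handle.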
### Ansible Managed Central Infrastructure (e.g. Puppet Server)
TODO
## Development of Icinga Director Import Pipeline
The basis is always the Configuration Basket snapshots (JSON files) which we have in Git.
To make a change, either edit them directly, or make the change in the Icinga Director web UI, create a new snapshot of the corresponding Configuration Basket, download it, modify it if necessary (e.g. if it is templated, as for the Sysdb import pipeline), and then commit it to the Git repo.
The rollout into production is then done with the bootstrap Ansible role for the Icinga2 Master nodes.
Note that it will only attempt to import the Configuration Basket snapshot as provided from Git when the file changes on disk. So if there is a failure during the import, it is best to delete the snapshot files on the Icinga2 Master so that the next run deploys and imports them again:
```
rm /etc/icingaweb2/psi/lx-core/*
```
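Why deleting the files forces a new import attempt can be seen in a small model of the "only when the file changes on disk" behaviour. This is not the Ansible role itself, just a sketch of the idea; the function name and the checksum comparison are assumptions.

```python
import hashlib
from pathlib import Path


def deploy_snapshot(target: Path, content: bytes) -> bool:
    """Write the snapshot and report whether an import should be triggered.

    Returns True when the file was missing or its content differed.
    """
    if target.exists():
        old_digest = hashlib.sha256(target.read_bytes()).digest()
        if old_digest == hashlib.sha256(content).digest():
            return False  # unchanged on disk: no import attempt
    target.write_bytes(content)
    return True
```

In this model, a failed import leaves the file in place, so later runs see no change and skip the import until the file is removed.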
Furthermore, there is an issue with updated Sync Rules in the Configuration Basket snapshot. There is a [bug which causes their property list not to be updated on import](https://github.com/Icinga/icingaweb2-module-director/issues/2779). To work around this, you need to delete the Sync Rule manually in the Icinga Director UI. Sync Rules cannot be deleted from the shell with `icingacli director` ([feature request](https://github.com/Icinga/icingaweb2-module-director/issues/2706)).
## Bootstrap
The Icinga2 infrastructure is maintained and prepared by AIT. The following items need to be prepared on their side:
- basic setup of Icinga2 Master
- addition of the Icinga Director module
- addition of the Fileshipper module with the following configuration (`/etc/icingaweb2/modules/fileshipper/imports.ini`):
```
[Import AWI Linux Infrastructure Servers]
basedir = "/etc/icingaweb2/psi/lx-core"
```
- in `roles.ini` have a `Generic User Role` with read/monitoring-only permissions
- the `/etc/icingaweb2/psi/merge-roles-ini.py` script provided by AIT to be able to merge in roles via Ansible/Sysdb API
- access to the AD with an LDAP import source named `LDAP PSI`
From our side we need the following manual setup:
- prepare the Scheduled Downtime `Generic Linux Alert Suppression` (cannot be imported with a Configuration Basket, see [feature request](https://github.com/Icinga/icingaweb2-module-director/issues/2795)) with
  - Downtime name: `Generic Linux Alert Suppression`
  - Author: `Core Linux Research Services`
  - Comment:
    ```
    By default managed RHEL systems do not alert or send notifications, they just collect monitoring information in Icinga2.
    To enable alerting, set in Hiera:
    icinga2::alerting::enable: true
    ```
  - Fixed: `Yes`
  - Disabled: `No`
  - Apply to: `Hosts`
  - With Services: `Yes`
  - Assign where: `host.vars.lx_disabled_alerting` `is true (or set)`
  - and finally on "Ranges" add a range with
    - Days: `january 1 - december 31`
    - Timeperiods: `00:00-24:00`
- run the Ansible playbook:
```
ansible-playbook -i inventory_test.yaml --vault-pass-file ./vault-pass prepare_icinga_master.yaml
```
or for production
```
ansible-playbook -i inventory.yaml -i inventory_dmz.yaml --vault-pass-file ./vault-pass prepare_icinga_master.yaml
```