Icinga2
We want to support monitoring of the Linux machines in Icinga2. The Icinga2 infrastructure as such is maintained by AIT, currently mainly Heinz Scheffler, with Bernard Bumbak as deputy.
Icinga2 Servers
- PROD monitoring.psi.ch (Loadbalancer)
  - Primary vemonma01a.psi.ch (with Icinga Director)
  - Secondary wmonma01b.psi.ch
- DEV
  - Primary vmonma02a.psi.ch (with Icinga Director)
  - Secondary vmonma02b.psi.ch
Automated Host Configuration
The Linux part of the Icinga2 Master configuration is managed using Ansible in the icinga_master role in the bootstrap repo.
For Puppet managed nodes there is an automated import pipeline using the Icinga Director. For the central infrastructure itself there is a predefined Configuration Basket snapshot which is installed by a manual Ansible run.
Configuration which is shared by both types of systems is found in the awi-lx-basic Configuration Basket.
Puppet Managed Nodes
The individual host configuration is automatically generated using already known information from
- Sysdb (inventory of nodes)
- Hiera (configuration and customization of the nodes)
- Puppet Facts (OS version, attached networks, partitions)
- NetOps (security levels of networks, needed for selecting the correct Icinga2 Satellite)
Import of Hiera Data to Sysdb
TODO
Import of Puppet Facts to Sysdb
TODO
Import of NetOps Data
TODO
Import into Director
What appears as a single arrow in the overview diagram is a bit more complicated in the implementation.
It consists of two parts:
- the import of host information (a bit simpler)
- the import of notification users (a bit more complex).
The import as such is triggered every 10 minutes by the sysdb-director-import timer.
To avoid interference with manual Director change operations, the import script first checks whether there are undeployed objects. If there are, it will not run.
A Director import pipeline first needs an Import Source, which imports the data from an external source. The second element is the Sync Rule, which then creates actual Director objects out of the data provided by one or more Import Sources.
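The guard on undeployed objects can be sketched as follows. This is a minimal illustration and not the actual import script: the function names and the way the pending changes are passed in are assumptions made for the example.

```python
from typing import Callable, Sequence


def import_allowed(undeployed_changes: Sequence[str]) -> bool:
    """Return True only when the Director has no undeployed objects."""
    return len(undeployed_changes) == 0


def run_import_if_safe(
    undeployed_changes: Sequence[str],
    run_import: Callable[[], None],
) -> bool:
    """Run the import unless manual Director changes are still pending.

    Returns True if the import ran, False if it was skipped.
    """
    if not import_allowed(undeployed_changes):
        # Manual changes are pending; skip and let the next timer run retry.
        return False
    run_import()
    return True
```

With this shape a skipped run is not an error: the timer simply tries again ten minutes later.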
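The two-stage structure can be modelled in a few lines of Python. This is only a conceptual sketch: the class names, fields and the property mapping are hypothetical and do not mirror the Director's internal data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class ImportSource:
    """Fetches raw rows from an external system (hypothetical model)."""
    name: str
    fetch: Callable[[], Iterable[Row]]


@dataclass
class SyncRule:
    """Turns rows from one or more import sources into objects."""
    name: str
    sources: List[ImportSource]
    key_column: str
    # Maps a row column to the property it fills on the resulting object.
    property_map: Dict[str, str] = field(default_factory=dict)

    def build_objects(self) -> Dict[str, Row]:
        objects: Dict[str, Row] = {}
        for source in self.sources:
            for row in source.fetch():
                # Rows with the same key from different sources are merged.
                obj = objects.setdefault(str(row[self.key_column]), {})
                for column, prop in self.property_map.items():
                    if column in row:
                        obj[prop] = row[column]
        return objects
```

The point of the split is that an Import Source only knows how to fetch data, while the Sync Rule decides which objects and properties result from it.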
Director Import of Host Information
The Icinga Director import pipeline is provided as the Configuration Basket template awi-lx-sysdb.
The pipeline for host groups imports the list of sysdb host groups together with a filter that selects the hosts where the group name is listed in the host.vars.host_groups array of the host.
The host import contains all the special configuration for a host. It needs two sync rules, of which awi-lx-sysdb-host-sync-rule creates and fills the host object.
The second, awi-lx-sysdb-service-override-sync-rule, only fills host.vars._override_servicevars, which the Director does not touch in the normal import.
The reason is that this variable usually stores manual service override changes made in the Director, and those should survive normal imports. But as we also manage these in Hiera, we need an "Update only" sync rule which is also allowed to set this variable.
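The effect of the host group filter can be illustrated with a small Python model. It is a sketch of the membership logic only; the function name and the dictionary layout are assumptions, and the real selection is done by the filter in the Director.

```python
from typing import Dict, List


def hosts_in_group(group: str, hosts: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the names of all hosts whose host_groups variable lists the group."""
    return sorted(
        name
        for name, host_vars in hosts.items()
        if group in host_vars.get("host_groups", [])
    )
```

A host without a host_groups variable simply belongs to no group.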
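The division of labour between the two sync rules can be modelled as below. This is a simplified sketch under the assumption that hosts are plain dictionaries with a "vars" entry; the function names are hypothetical and the real behaviour is defined by the sync rule settings in the Director.

```python
import copy
from typing import Dict, Optional

OVERRIDE_VAR = "_override_servicevars"


def apply_host_sync(existing: Optional[Dict], imported: Dict) -> Dict:
    """Model of the main host sync: create or refresh the host, but keep
    any service overrides that are already stored on the object."""
    host = copy.deepcopy(imported)
    host_vars = host.setdefault("vars", {})
    # The normal import never writes the override variable.
    host_vars.pop(OVERRIDE_VAR, None)
    if existing and OVERRIDE_VAR in existing.get("vars", {}):
        host_vars[OVERRIDE_VAR] = copy.deepcopy(existing["vars"][OVERRIDE_VAR])
    return host


def apply_override_sync(existing: Optional[Dict], overrides: Dict) -> Optional[Dict]:
    """Model of the update-only sync: never creates a host, only sets the
    override variable on a host that already exists."""
    if existing is None:
        return None
    host = copy.deepcopy(existing)
    host.setdefault("vars", {})[OVERRIDE_VAR] = copy.deepcopy(overrides)
    return host
```

In this model the first function alone would leave overrides from Hiera unapplied, which is why the second, update-only step exists.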
Director Import of Notification Users
Here the actual user information comes from the AD. But as we do not import all users, but only those who want to be notified, we also need to modify what is being imported.
An LDAP filter selects these users specifically. The filter has three sections:
- users we do not want (service users, etc)
- departments we want
- notification users for Linux machines
The LDAP filter is built from two inputs. The first is ignore_list.conf and department_list.conf, which are managed by the AIT/Icinga2 team. The second is read from the Sysdb API (user_list.conf). Together with these, the Configuration Basket Director-Basket_icinga-notification-user-import.json, which contains the full notification user import pipeline (import source icinga-notification-user-import and sync rule icinga-notification-user-sync-rule), is always updated on an import run before the import source and sync rule are actually triggered.
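How such a three-section filter could be assembled is sketched below. The attribute names (sAMAccountName, department) and the way the sections are combined (exclude the ignored users, then accept anyone matching a department or the user list) are assumptions for illustration; the real filter may be structured differently.

```python
from typing import Sequence


def escape_ldap(value: str) -> str:
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def any_of(attribute: str, values: Sequence[str]) -> str:
    """Build an OR group matching any of the given attribute values."""
    terms = "".join(f"({attribute}={escape_ldap(v)})" for v in values)
    return f"(|{terms})"


def build_user_filter(
    ignored: Sequence[str],
    departments: Sequence[str],
    linux_users: Sequence[str],
    user_attr: str = "sAMAccountName",  # assumed attribute name
    dept_attr: str = "department",      # assumed attribute name
) -> str:
    """Combine the three sections into one LDAP filter string."""
    wanted = []
    if departments:
        wanted.append(any_of(dept_attr, departments))
    if linux_users:
        wanted.append(any_of(user_attr, linux_users))
    if not wanted:
        raise ValueError("at least one department or user is required")
    parts = []
    if ignored:
        parts.append("(!" + any_of(user_attr, ignored) + ")")
    parts.append("(|" + "".join(wanted) + ")")
    return "(&" + "".join(parts) + ")"
```

Escaping the values matters here because the user list comes from an API and a stray parenthesis or asterisk would otherwise change the meaning of the filter.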
Ansible Managed Central Infrastructure (e.g. Puppet Server)
TODO
Development of Icinga Director Import Pipeline
The basis is always the Configuration Basket snapshots (JSON files) which we keep in Git. To make changes, either edit them directly, or change the objects in the Icinga Director web UI, create a new snapshot of the corresponding Configuration Basket, download it, modify it if necessary (e.g. if it is templated, as for the Sysdb import pipeline), and then commit it to the Git repo.
The rollout into production is then done with the bootstrap Ansible role for the Icinga2 Master nodes. Note that it only attempts to import a Configuration Basket snapshot from Git when the file changes on disk. So if there is a failure during the import, it is best to delete the snapshot files on the Icinga2 Master so that the next run writes and imports them again:
rm /etc/icingaweb2/psi/lx-core/*
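Why deleting the files forces a new import can be seen in the following model of the "import only on change" behaviour. It is plain Python for illustration; the actual role is written in Ansible and may detect changes differently, and the function name is hypothetical.

```python
from pathlib import Path


def deploy_snapshot(target: Path, content: bytes) -> bool:
    """Write the snapshot only if it differs from what is on disk.

    Returns True when the file was (re)written, which is the condition
    under which an import would be triggered.
    """
    if target.exists() and target.read_bytes() == content:
        # Unchanged file: nothing is written and no import is attempted.
        return False
    target.write_bytes(content)
    return True
```

After a failed import the file on disk already matches Git, so without removing it the next run would report no change and skip the import.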
Furthermore, there is an issue with updated Sync Rules in the Configuration Basket snapshot: due to a bug, their property list is not updated on import. As a workaround, you need to delete the Sync Rule manually in the Icinga Director UI. Sync Rules cannot be deleted from the shell with icingacli director (feature request).
Bootstrap
The Icinga2 infrastructure is maintained and prepared by AIT. The following items need to be prepared from their side:
- basic setup of Icinga2 Master
- addition of the Icinga Director module
- addition of the Fileshipper module with the following configuration (/etc/icingaweb2/modules/fileshipper/imports.ini):
  [Import AWI Linux Infrastructure Servers]
  basedir = "/etc/icingaweb2/psi/lx-core"
- in roles.ini have a Generic User Role with read/monitoring-only permissions
- the /etc/icingaweb2/psi/merge-roles-ini.py script provided by AIT to be able to merge in roles via Ansible/Sysdb API
- access to the AD with an LDAP import source named LDAP PSI
From our side we need the following manual setup:
- prepare the Scheduled Downtime Generic Linux Alert Suppression (cannot be imported with Configuration Basket, see feature request) with
  - Downtime name: Generic Linux Alert Suppression
  - Author: Core Linux Research Services
  - Comment: By default managed RHEL systems do not alert or send notifications, they just collect monitoring information in Icinga2. To enable alerting, set in Hiera: icinga2::alerting::enable: true
  - Fixed: Yes
  - Disabled: No
  - Apply to: Hosts
  - With Services: Yes
  - Assign where: host.vars.lx_disabled_alerting is true (or set)
  - and finally on "Ranges" add a range with Days: january 1 - december 31 and Timeperiods: 00:00-24:00
- run the Ansible playbook:
  ansible-playbook -i inventory_test.yaml --vault-pass-file ./vault-pass prepare_icinga_master.yaml
  or for production
  ansible-playbook -i inventory.yaml -i inventory_dmz.yaml --vault-pass-file ./vault-pass prepare_icinga_master.yaml

