Metrics Collection - Telegraf Configuration

There is a central metrics server at PSI, accessible via https://metrics.psi.ch. All standard Linux systems can send metrics to this server once Telegraf metrics collection is enabled via Hiera.

The following Hiera setting enables metrics collection:

base::enable_telegraf: true

By default, a number of metrics are collected, including CPU, disk usage and disk I/O (diskio). The full list of defaults can be found in common.yaml in the Puppet repository.

Custom metrics can also be added (documentation is still in progress; please contact the Linux Core group if you need this).

Depending on the system's location, Hiera/Puppet configures it to send data to the central metrics server either directly (PSI intranet) or via a reverse proxy (DMZ, Extranet, tier3).

If you run your own metrics server, or want to explicitly override where the data is sent, set the URL as follows:

telegraf::agent:
  url: http://your-metric-server.psi.ch

You can also tweak how metrics are collected. The values below are the defaults; only specify the keys you want to override:

telegraf::agent:
  interval: '1m'
  collection_jitter: '0s'
  flush_interval: '1m'
  flush_jitter: '10s'
  metric_buffer_limit: 10000
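As a rough illustration of what metric_buffer_limit means in practice: Telegraf keeps up to that many unwritten metrics per output, so the limit bounds how long the metrics server can be unreachable before data is dropped. The sketch below estimates that window. The helper names are hypothetical, and the number of metrics gathered per interval (100 in the example) is an assumption that depends on the inputs enabled on your system.

```bash
#!/bin/bash
# Rough estimate of how long Telegraf can buffer metrics while the
# metrics server is unreachable, before metric_buffer_limit is reached.

# Convert a simple single-unit duration ("30s", "1m", "2h") to seconds.
# Compound durations such as "1m30s" are not handled (simplification).
to_seconds() {
  local value=${1%[smh]} unit=${1: -1}
  case "$unit" in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# buffer_window LIMIT METRICS_PER_INTERVAL INTERVAL
# Prints how many seconds of outage the buffer can absorb (integer division).
buffer_window() {
  local limit=$1 per_interval=$2 interval_s
  interval_s=$(to_seconds "$3") || return 1
  echo $(( limit / per_interval * interval_s ))
}

# Defaults from above, with a hypothetical 100 metrics per collection interval
buffer_window 10000 100 1m
```

With these numbers the script prints 6000, i.e. roughly 100 minutes of outage before metrics start being dropped. Raising metric_buffer_limit extends that window at the cost of more memory used by the agent.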

By default, Puppet purges all config files in /etc/telegraf/telegraf.d and recreates the ones it needs. If you want to deploy your own metrics collection configuration outside of Puppet/Hiera, disable purging with:

telegraf::config::purge: false

You can also configure your own metrics to be collected via Hiera:

telegraf::metrics:
  'your_metric':
    plugin: 'exec'
    timeout: '30s'
    interval: '1m'
    data_format: 'influx'
    commands: ['sudo /your/script/location/script.sh']
    enable: true

This only works if the script (in the example /your/script/location/script.sh) and the necessary sudo rule(s) have been deployed beforehand. For this, you may want to use the techniques described in Distribute Files and/or Custom sudo Rules.
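Before wiring a script into telegraf::metrics, it can help to check that its output is usable with data_format 'influx'. The function below is a hypothetical helper, not part of Telegraf, and is deliberately simplified: it only accepts a measurement name followed by comma-separated numeric field=value pairs. Tags, timestamps, string or boolean fields and escaping from the full InfluxDB line protocol are not covered.

```bash
#!/bin/bash
# Simplified sanity check for one line of InfluxDB line protocol:
#   measurement field1=value1[,field2=value2...]
# Only numeric values (optionally with the integer suffix "i") are accepted;
# tags, timestamps, quoted strings and escaping are NOT supported.
check_influx_line() {
  local line=$1
  local name='[A-Za-z_][A-Za-z0-9_]*'
  local number='-?[0-9]+(\.[0-9]+)?i?'
  local re="^${name} ${name}=${number}(,${name}=${number})*\$"
  if [[ $line =~ $re ]]; then
    echo "ok"
  else
    echo "invalid: $line" >&2
    return 1
  fi
}

# Checking the real script's output (path from the example above):
#   sudo /your/script/location/script.sh | while read -r l; do check_influx_line "$l"; done
check_influx_line "nxserver open_sockets=12,connected_sessions=3"
```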

Examples

Custom Script

A custom Telegraf collector script could look like this:

#!/bin/bash

# Fetch the session history once so all counts come from the same snapshot
HISTORY=$(/usr/NX/bin/nxserver --history)

CONNECTED=$(awk '$7 == "Connected"' <<< "$HISTORY" | wc -l)
DISCONNECTED=$(awk '$7 == "Disconnected"' <<< "$HISTORY" | wc -l)
FINISHED=$(awk '$7 == "Finished"' <<< "$HISTORY" | wc -l)

# Count open internet sockets (skip the lsof header line)
OPEN_SOCKETS=$(lsof -i -n -P | tail -n +2 | wc -l)

# Provide data to telegraf
echo "nxserver open_sockets=${OPEN_SOCKETS},connected_sessions=${CONNECTED},disconnected_sessions=${DISCONNECTED},finished_sessions=${FINISHED}"

The first word of the echoed line (nxserver) is the name of the measurement (series) the data is written into. This name can be overridden in the metric config, e.g. via name_override = "nxserver_report".
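The three session counts in the collector script above can also be computed in a single awk pass over one nxserver --history snapshot. This sketch assumes, as the script does, that the session status is in column 7; the function name count_sessions is hypothetical.

```bash
#!/bin/bash
# Count NX sessions by status in one pass over `nxserver --history` output.
# Assumes the status is in column 7, as in the collector script above.
count_sessions() {
  awk '
    $7 == "Connected"    { c++ }
    $7 == "Disconnected" { d++ }
    $7 == "Finished"     { f++ }
    END {
      # Uninitialized awk counters print as 0 with %d
      printf "connected_sessions=%d,disconnected_sessions=%d,finished_sessions=%d\n", c, d, f
    }'
}

# In the collector this would be fed from the real command, e.g.:
#   FIELDS=$(/usr/NX/bin/nxserver --history | count_sessions)
#   echo "nxserver ${FIELDS}"
```

Reading the history only once also ensures that all three counts describe the same moment in time.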

Custom Config File

A custom config file in /etc/telegraf/telegraf.d could look like this (it is only kept if purging is disabled as described above):

[[inputs.exec]]
  name_override = "remacc_report"
  timeout = "30s"
  interval = "5m"
  data_format = "influx"
  commands = ["sudo /usr/lib/telegraf/scripts/remacc_report.sh"]
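Such a file could be deployed from your own provisioning outside of Puppet with a small shell function like the one below. The function name, its parameters and the default interval are hypothetical; the target directory is a parameter so the sketch can be tried in a scratch directory first. Commands containing double quotes would need escaping, which this sketch does not do.

```bash
#!/bin/bash
# Write a Telegraf exec input config file (hypothetical helper).
# Usage: write_exec_config DIR NAME COMMAND [INTERVAL]
write_exec_config() {
  local dir=$1 name=$2 command=$3 interval=${4:-5m}
  mkdir -p "$dir"
  # Unquoted EOF: ${...} variables are expanded into the file
  cat > "${dir}/${name}.conf" <<EOF
[[inputs.exec]]
  name_override = "${name}"
  timeout = "30s"
  interval = "${interval}"
  data_format = "influx"
  commands = ["${command}"]
EOF
}

# Reproduces the example above in the real config directory (needs root):
#   write_exec_config /etc/telegraf/telegraf.d remacc_report "sudo /usr/lib/telegraf/scripts/remacc_report.sh"
```

After deploying, the exec input can be run once without sending data, for example with telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --input-filter exec --test.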