# Metric Collections - Configuration Telegraf

PSI runs a central metrics server that is accessible via https://metrics.psi.ch. All standard Linux systems can send metrics to this server once telegraf metrics collection is enabled via hiera.

The following statement enables metrics collection:

```yaml
base::enable_telegraf: true
```

By default, a number of metrics are collected, including cpu, disk usage and diskio.
A detailed list of the defaults can be found in [common.yaml](https://git.psi.ch/linux-infra/puppet/-/blob/preprod/data/common.yaml#L855) in the puppet repository.

Custom metrics can also be added (documentation to be done; please contact the Linux Core group if you need this).

Depending on the location of the system, hiera/puppet will configure it to send the data to the central metrics server either directly (PSI intranet) or via a reverse proxy (DMZ, Extranet, tier3).

If you run your own metrics server, or want to explicitly override where the data is sent, you can do so as follows:

```yaml
telegraf::agent:
  url: http://your-metric-server.psi.ch
```

To tweak how metrics are collected, set the following keys (the values shown are the defaults; only specify the keys you want to override):

```yaml
telegraf::agent:
  interval: '1m'
  collection_jitter: '0s'
  flush_interval: '1m'
  flush_jitter: '10s'
  metric_buffer_limit: 10000
```

By default, puppet purges and, if needed, recreates all config files in `/etc/telegraf/telegraf.d`. If you want to deploy your own metrics collection scripts outside of puppet/hiera, you can disable the purging via:

```yaml
telegraf::config::purge: false
```

You can also configure your own metric to be collected via hiera as follows:

```yaml
telegraf::metrics:
  'your_metric':
    plugin: 'exec'
    timeout: '30s'
    interval: '1m'
    data_format: 'influx'
    commands: ['sudo /your/script/location/script.sh']
    enable: true
```

This only works if you have deployed the necessary script (in the example, `/your/script/location/script.sh`) and the necessary sudo rule(s) beforehand. For this you may want to use the techniques described in [Distribute Files](../files/distribute_files) and/or [Custom sudo Rules](../access/sudo).

## Examples

### Custom Script

A custom telegraf collector can look something like this:

```bash
#!/bin/bash

# Fetch the session history once and count sessions by state (column 7)
HISTORY=$(/usr/NX/bin/nxserver --history)
CONNECTED=$(awk '$7 == "Connected"' <<< "${HISTORY}" | wc -l)
DISCONNECTED=$(awk '$7 == "Disconnected"' <<< "${HISTORY}" | wc -l)
FINISHED=$(awk '$7 == "Finished"' <<< "${HISTORY}" | wc -l)

# Count the lines lsof reports for network connections
OPEN_SOCKETS=$(lsof -i -n -P | wc -l)

# Provide data to telegraf
echo "nxserver open_sockets=${OPEN_SOCKETS},connected_sessions=${CONNECTED},disconnected_sessions=${DISCONNECTED},finished_sessions=${FINISHED}"
```

The first word of the echoed line is the name of the series the data is written to. This name can be overridden in the metric config via `name_override = "nxserver_report"`.
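
To make the structure of that output line explicit (series name, a space, then comma-separated `field=value` pairs), here is a minimal sketch of a helper that assembles it. The function name `influx_line` and the example values are hypothetical and not part of telegraf or the PSI setup; the sketch only covers simple fields without tags or timestamps.

```bash
#!/bin/bash

# Hypothetical helper (an assumption, not part of telegraf):
# print one record as "<series> <field>=<value>[,<field>=<value>...]".
influx_line() {
    local series=$1
    shift
    # At least one field is required
    (( $# > 0 )) || { echo "no fields given" >&2; return 1; }
    local field
    for field in "$@"; do
        # Reject fields with a missing name or value, e.g. "sessions="
        if [[ $field != ?*=?* ]]; then
            echo "invalid field: ${field}" >&2
            return 1
        fi
    done
    # Join all remaining arguments with commas
    local IFS=,
    echo "${series} $*"
}

# Example values; a real collector would compute these.
influx_line nxserver open_sockets=12 connected_sessions=3
# prints: nxserver open_sockets=12,connected_sessions=3
```

In a collector like the one above, the final `echo` could be replaced by a call to such a helper, so that an empty value is reported on stderr instead of being written as a malformed line.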

### Custom Config File

A custom config file in `/etc/telegraf/telegraf.d` could look like this:

```toml
[[inputs.exec]]
  name_override = "remacc_report"
  timeout = "30s"
  interval = "5m"
  data_format = "influx"
  commands = ["sudo /usr/lib/telegraf/scripts/remacc_report.sh"]
```
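
Since `data_format = "influx"` means the script's standard output is parsed as line protocol, it can help to check that output by hand before deploying the config file. The following is a rough sketch of such a check. The function `check_influx_output` and the field names in the example line are made up for illustration, and the pattern only accepts the simple form used on this page (one series name followed by numeric fields, no tags, strings or timestamps).

```bash
#!/bin/bash

# Hypothetical check (an assumption, not a telegraf tool): read collector
# output on stdin and report every line that is not of the simple form
#   <series> <field>=<number>[,<field>=<number>...]
check_influx_output() {
    local name='[A-Za-z_][A-Za-z0-9_]*'
    local number='-?[0-9]+(\.[0-9]+)?'
    local re="^${name} ${name}=${number}(,${name}=${number})*\$"
    local line status=0
    while IFS= read -r line; do
        if [[ ! $line =~ $re ]]; then
            echo "not simple line protocol: ${line}" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with a well-formed line (field names are invented).
# In practice, pipe the real script output into the function instead:
#   sudo /usr/lib/telegraf/scripts/remacc_report.sh | check_influx_output
echo 'remacc_report sessions=4,failed_logins=0' | check_influx_output && echo "ok"
```

A non-zero exit status means at least one line did not match; the offending lines are printed on stderr.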