removed all non databuffer specific stuff - moved imagebuffer into separate repo

This commit is contained in:
2022-01-05 16:39:53 +01:00
parent ade36c09a8
commit aec67cc604
64 changed files with 78 additions and 3057 deletions

View File

@@ -38,7 +38,7 @@ More details on the gitutils command can be found at: https://gitutils.readthedo
# Administration
If there are new changes to this configuration (either through a merge request or direct commit) the configuration needs to be uploaded to the Data/ImageBuffer. To do so, clone or pull the latest changes from this repository and execute the `./bufferutils upload` script that comes with this repository (you have to be on a machine that has /opt/gfa/python available!).
If there are new changes to this configuration (either through a merge request or direct commit) the configuration needs to be uploaded to the Data/Buffer. To do so, clone or pull the latest changes from this repository and execute the `./bufferutils upload` script that comes with this repository (you have to be on a machine that has /opt/gfa/python available!).
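As a sketch, the full upload sequence on such a machine might look like this (repository URL taken from the restart procedures below):
```bash
# fetch the latest configuration and push it to the dispatcher
git clone https://git.psi.ch/archiver_config/sf_databuffer.git   # or `git pull` in an existing clone
cd sf_databuffer
./bufferutils upload
```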
## Uploading Sources
To upload and start recording of all configured sources use:
@@ -70,10 +70,10 @@ _Note:_ Labled sources can be individually stopped and/or restarted by the stop/
## Stopping sources by backend
Sources of a specific backend can be stopped like this (currently only the "sf-imagebuffer" backend is supported)
Sources of a specific backend can be stopped like this (currently only the "sf-databuffer" backend is supported)
```bash
./bufferutils stop --backend sf-imagebuffer
./bufferutils stop --backend sf-databuffer
```
## Stopping all sources

View File

@@ -13,8 +13,8 @@ logging.basicConfig(
)
base_directory = Path(".")
upload_url = "https://dispatcher-api.psi.ch/sf/configuration/upload"
delete_url = "https://dispatcher-api.psi.ch/sf/configuration/delete"
upload_url = "https://dispatcher-api.psi.ch/sf-databuffer/configuration/upload"
delete_url = "https://dispatcher-api.psi.ch/sf-databuffer/configuration/delete"
# upload_url = "http://localhost:1234"

Binary file not shown.

View File

@@ -49,13 +49,6 @@ A healthy network load looks something like this:
If more than 2 machines do not show this pattern, the DataBuffer has an issue.
Check memory usage and network load of the ImageBuffer:
https://hpc-monitor02.psi.ch/d/TW0pr_bik/gl2?refresh=30s&orgId=1
A healthy memory consumption is between 50GB and 250GB, any drop below 50GB indicates a crash.
![image](documentation/ImageBufferMemory.png)
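If the dashboard is not reachable, a rough ad-hoc check of the current memory usage on the ImageBuffer nodes is also possible (a sketch, assuming the `imagebuffer` inventory group used elsewhere in this repository):
```bash
# report free/used memory in GB on all ImageBuffer nodes
ansible imagebuffer -m shell -a "free -g"
```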
Check free disk space on the DataBuffer:
http://gmeta00.psi.ch/?r=hour&cs=&ce=&c=sf-daqbuf&h=&tab=m&vn=&hide-hf=false&m=disk_free_percent_data&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
@@ -64,36 +57,9 @@ http://gmeta00.psi.ch/?r=hour&cs=&ce=&c=sf-daqbuf&h=&tab=m&vn=&hide-hf=false&m=d
Checks on the cluster can be performed via ansible ad-hoc commands:
```bash
ansible databuffer_cluster -m shell -a 'uptime'
ansible databuffer -m shell -a 'uptime'
```
Check whether the time is synchronized between the machines:
```bash
ansible databuffer_cluster -m shell -a 'date +%s.%N'
```
Check if the ntp synchronization is enabled and running
```bash
ansible databuffer_cluster -b -m shell -a "systemctl is-enabled chronyd"
ansible databuffer_cluster -b -m shell -a "systemctl is-active chronyd"
```
Check if the tuned service is running:
```bash
ansible databuffer_cluster -b -m shell -a "systemctl is-active tuned"
```
Check latest 10 lines of the dispatcher node logs
```bash
ansible databuffer_cluster -b -m shell -a "journalctl -n 10 -u daq-dispatcher-node.service"
```
Check for failed compactions
```bash
ansible -b databuffer -m shell -a "journalctl -n 50000 -u daq-dispatcher-node.service | grep \"Exception while compacting\" | grep -oP \"\\'\K[^\\']+\" | sort | uniq"
```
## Find Sources With Issues
Find sources with bsread level issues
@@ -125,8 +91,6 @@ Find channels that send corrupt MainHeader:
https://kibana.psi.ch/s/gfa/app/discover#/view/cb725720-ca89-11ea-bc5d-315bf3957d13?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(columns:!(bsread.error.type,bsread.source,message),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'2e885ee0-c5d0-11ea-82c0-2da95a58e9d4',key:systemd.unit,negate:!f,params:(query:daq-dispatcher-node.service),type:phrase),query:(match_phrase:(systemd.unit:daq-dispatcher-node.service)))),index:'2e885ee0-c5d0-11ea-82c0-2da95a58e9d4',interval:auto,query:(language:kuery,query:MainHeader),sort:!(!('@timestamp',desc)))
# Maintenance
## Restart Procedures
@@ -142,65 +106,7 @@ bash -c 'caqtdm -noMsg -stylefile sfop.qss S_OP_Messages_all_stations_small.u
Inform the sf-operations@psi.ch mailing list before the restart!
### Restart Data Retrieval
If there are issues with data retrieval (DataBuffer, ImageBuffer, Epics Channel Archiver) but all checks regarding the DataBuffer show normal operation, use this procedure to restart the SwissFEL data retrieval services. This will only affect the data retrieval of SwissFEL at the time of restart; there will be no interruption in the recording of the data.
- login to sf-lca.psi.ch
- clone the databuffer repository (if you haven't yet - https://git.psi.ch/archiver_config/sf_databuffer.git), change to the `operation-tools` directory and/or pull the latest changes
```bash
cd operation-tools
```
- call the restart_dataretrieval script
```bash
ansible-playbook restart_dataretrieval.yml
```
### Restart Data Retrieval All
If the method above doesn't work, try to restart all of the data retrieval services via this procedure. This will not interrupt any data recording __but this restart will, besides SwissFEL, also affect the data retrieval of GLS, Hipa and Proscan__!
- login to sf-lca.psi.ch
- clone the databuffer repository (if you haven't yet - https://git.psi.ch/archiver_config/sf_databuffer.git), change to the `operation-tools` directory and/or pull the latest changes
```bash
cd operation-tools
```
- call the restart_dataretrieval script
```bash
ansible-playbook restart_dataretrieval_all.yml
```
### Restart ImageBuffer
If the DataBuffer looks healthy but the ImageBuffer seems to be in a buggy state, a restart of only the ImageBuffer can be triggered as follows:
- login to sf-lca.psi.ch (_sf-lca.psi.ch is the machine in the machine network !!!!_)
- clone the databuffer repository (if you haven't yet), change to the repository directory and/or pull the latest changes
```bash
git clone https://git.psi.ch/archiver_config/sf_databuffer.git
cd sf_databuffer
# and/or
git pull
```
- stop the sources belonging to the imagebuffer
```bash
./bufferutils stop --backend sf-imagebuffer
```
- change to the operation-tools directory and call the restart_imagebuffer script
```bash
cd operation-tools
ansible-playbook restart_imagebuffer.yml
```
- Afterwards restart the recording of the image sources:
```bash
cd ..
./bufferutils upload
```
### Restart DataBuffer Cluster
### Restart DataBuffer
This is the procedure to follow to restart the DataBuffer in an emergency.
After checking whether the restart is really necessary, do this:
@@ -215,132 +121,13 @@ cd sf_databuffer/operation-tools
git pull
```
- call the restart_cluster script
- call the restart script
```bash
ansible-playbook restart_cluster.yml
ansible-playbook restart.yml
```
- Afterwards restart the recording again:
```bash
cd ..
./bufferutils upload
```
## Manual Restart Procedures (Experts Only)
### Restart query-node Services
Restart daq-query-node service:
```bash
ansible databuffer_cluster --forks 1 -b -m shell -a "systemctl restart daq-query-node.service"
```
__Important Note:__ To be able to start the query node processes, the dispatcher nodes need to be up and running! After restarting all query nodes you have to restart the data-api service as well. A single restart of a Query Node server should work fine (as there is no complete shutdown of the Hazelcast cluster).
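A possible ad-hoc restart of the data-api afterwards might look like this (a sketch; the group and service names depend on the inventory in use, e.g. `data_api`/`data-api` in the old inventory or `sf_data_api_databuffer`/`data-api-databuffer` in the new one):
```bash
# restart the data retrieval API on the data_api hosts
ansible data_api -b -m shell -a "systemctl restart data-api.service"
```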
### Restart dispatcher-node Services
Restart daq-dispatcher-node service:
```bash
ansible databuffer_cluster --forks 1 -b -m shell -a "systemctl restart daq-dispatcher-node.service"
```
This restart should also restart all recordings and reestablish streams. If there are issues, this recording restart can be enabled/disabled by setting dispatcher.local.sources.restart=false in /home/daqusr/.config/daq/dispatcher.properties. Another option is to delete the restart configurations as follows:
```bash
ansible databuffer_cluster -b -m shell -a "rm -rf /home/daqusr/.config/daq/stores/sources; rm -rf /home/daqusr/.config/daq/stores/streamers"
```
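To verify the current value of that property across the cluster before deleting the stores, an ad-hoc check along these lines can help (a sketch using the file path mentioned above):
```bash
# `|| true` keeps hosts without the property from being reported as failed
ansible databuffer_cluster -b -m shell -a "grep dispatcher.local.sources.restart /home/daqusr/.config/daq/dispatcher.properties || true"
```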
__Note:__ After restarting all dispatcher nodes you have to restart the dispatcher-api service as well. A single restart of a Dispatcher Node server should work fine (as there is no complete shutdown of the Hazelcast cluster).
# Installation
## Prerequisites
To be able to install a new version of the daq system, the binaries need to be built and available in the Maven repository (for details see the top-level [Readme](../Readme.md)).
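As a sketch, building the node binaries follows the same pattern as in the install instructions further down (run from the ch.psi.daq.buildall project):
```bash
# build the dispatcher and query node fat jars without running the tests
./gradlew dropItDispatcherNode -x test
./gradlew dropItQueryNode -x test
```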
## Pre-Checks
Make sure that the time is in sync between the machines:
```bash
ansible databuffer_cluster -m shell -a 'date +%s.%N'
```
Check if the ntp synchronization is enabled and running
```bash
ansible databuffer_cluster -b -m shell -a "systemctl is-enabled chronyd"
ansible databuffer_cluster -b -m shell -a "systemctl is-active chronyd"
```
On the ImageBuffer nodes check that the MTU size of the 25Gb/s interface is set to 9000
```bash
ip link
```
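A quick way to verify the MTU on all nodes at once (a sketch; nodes without an interface at MTU 9000 will show up as failed):
```bash
# list only the interfaces already set to MTU 9000
ansible databuffer_cluster -b -m shell -a "ip link | grep 'mtu 9000'"
```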
On the ImageBuffer nodes test the connection to the camera servers with iperf3. As all the camera servers only have a 10Gb/s interface, the overall throughput (SUM) should be around 9Gb/s. When testing connections to two servers simultaneously, each stream should still show around 9Gb/s.
```
# Start iperf server on the camera servers
iperf3 -s
```
```
# Check speed to different camera servers via
iperf3 -P 3 -c daqsf-sioc-cs-02
iperf3 -P 3 -c daqsf-sioc-cs-31
iperf3 -P 3 -c daqsf-sioc-cs-73
iperf3 -P 3 -c daqsf-sioc-cs-85
```
Check whether the firmware of all servers is at the same level:
```bash
ansible databuffer_cluster -b -m shell -a "dmidecode -s bios-release-date"
```
Check whether the Power Regulator setting in the BIOS is set to __Static High Performance Mode__!
![documentation/BIOSSettings.png](documentation/BIOSSettings.png)
## Steps
Add daqusr user and daq group on all nodes:
```bash
ansible-playbook install_user.yml
```
Install file and memory limits for the _daqusr_ on all nodes:
```bash
ansible-playbook install_limits.yml
```
Install the current JDK on all nodes:
```bash
ansible-playbook install_jdk.yml
```
Install dispatcher node:
```bash
ansible-playbook install_dispatcher_node.yml
```
Install query node:
```bash
ansible-playbook install_query_node.yml
```
The installation of the dispatcher and query nodes does not start the services. See above for how to start them, or use the ad-hoc sketch below.
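If the services are brought up manually, the order matters: dispatcher nodes first, then query nodes (see the restart playbooks). A sketch with ad-hoc commands:
```bash
ansible databuffer_cluster -b -m shell -a "systemctl start daq-dispatcher-node.service"
# give the dispatcher nodes time to form the Hazelcast cluster before starting the query nodes
ansible databuffer_cluster -b -m shell -a "systemctl start daq-query-node.service"
```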
## Post-Checks
Check if the tuned service is running:
```bash
ansible databuffer_cluster -b -m shell -a "systemctl is-active tuned"
```
Check whether all CPUs are set to performance:
```bash
ansible databuffer_cluster -b -m shell -a "cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | uniq -c"
```

Binary file not shown.

View File

@@ -1,247 +0,0 @@
# Troubleshooting
Find sources with issues in validation log:
```
tail -n 10000 /opt/dispatcher_node/latest/logs/data_validation.log | grep "Invalid and drop" | awk -e '{print($4)}' | sort | uniq
```
# Overview
The following are the steps to set up the imagebuffer from scratch:
- Install Users
```bash
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do cat user_settings.sh ../hostlists_daqbufs/env_settings.sh add_user.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
```
- Install Java
```bash
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do cat user_settings.sh ../hostlists_daqbufs/env_settings.sh install_java.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
```
- Install Query Nodes - ImageBuffer
```bash
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do cat user_settings.sh ../hostlists_daqbufs/env_settings.sh install_query_node.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
```
- Install Dispatcher Nodes ImageBuffer
```bash
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do cat user_settings.sh ../hostlists_daqbufs/env_settings.sh install_dispatcher_node.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
```
- Create Required Config File
```
# required content of domain.properties (shown here via cat on one of the nodes)
[root@sf-daq-5 /]# cat /home/daqusr/.config/daq/domain.properties
backend.default=sf-imagebuffer
# make sure the configuration files are owned by the daq user
chown -R daqusr:daq /home/daqusr/.config/daq/domain.properties
chown -R daqusr:daq /home/daqusr/.config/daq/dispatcher.properties
```
- Restart Services (if needed)
```
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do echo -e "systemctl stop daq-query-node.service" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do echo -e "systemctl stop daq-dispatcher-node.service" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
```
```
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do echo -e "hostname \n systemctl start daq-dispatcher-node.service \n sleep 20" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
for THE_HOST in $(sort -u ../hostlists_daqbufs/ImageBufferHosts.txt); do echo -e "hostname \n systemctl start daq-query-node.service \n sleep 10" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done
```
Monitoring of the system is available via telegraf and grafana:
https://hpc-monitor01.psi.ch/d/TW0pr_bik/gl2?refresh=30s&orgId=1
----
TODO CONVERT DOCUMENTATION
## Dispatcher
<a name="dispatcher_node"/>
### Dispatcher Nodes
#### Install
1. Go to ch.psi.daq.buildall and execute: `./gradlew dropItDispatcherNode -x test`
2. Login to master node and follow [these instructions](Readme.md#clone_git) to setup the git environment.
3. Multihost command: `for THE_HOST in $(sort -u ../hostlists/DispatcherNodeHosts.txt); do cat user_settings.sh ../hostlists/env_settings.sh install_dispatcher_node.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
#### De-Install
Multihost command: `for THE_HOST in $(sort -u ../hostlists/DispatcherNodeHosts.txt); do echo -e "systemctl stop daq-dispatcher-node.service \n systemctl disable daq-dispatcher-node.service \n rm /usr/lib/systemd/system/daq-dispatcher-node.service \n rm -rf /opt/dispatcher_node \n systemctl daemon-reload" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
<a name="dispatcher_rest"/>
### Dispatcher REST Server
#### Install
1. Go to ch.psi.daq.buildall and execute: `./gradlew dropItDispatcherREST -x test`
2. Login to master node and follow [these instructions](Readme.md#clone_git) to setup the git environment.
3. Multihost command: `for THE_HOST in $(sort -u ../hostlists/DispatcherRESTHost.txt); do cat user_settings.sh ../hostlists/env_settings.sh install_dispatcher_rest.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
4. Check if ui is running by using a browser: http://sf-nube-13.psi.ch:8080/
#### De-Install
Multihost command: `for THE_HOST in $(sort -u ../hostlists/DispatcherRESTHost.txt); do echo -e "systemctl stop daq-dispatcher-rest.service \n systemctl disable daq-dispatcher-rest.service \n rm /usr/lib/systemd/system/daq-dispatcher-rest.service \n rm -rf /opt/dispatcher_rest \n systemctl daemon-reload" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
## Querying
<a name="query_node"/>
### Query Nodes
#### Install
1. Go to ch.psi.daq.buildall and execute: `./gradlew dropItQueryNode -x test`
2. Login to master node and follow [these instructions](Readme.md#clone_git) to setup the git environment.
3. Multihost command: `for THE_HOST in $(sort -u ../hostlists/QueryNodeHosts.txt); do cat user_settings.sh ../hostlists/env_settings.sh install_query_node.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
#### De-Install
Multihost command: `for THE_HOST in $(sort -u ../hostlists/QueryNodeHosts.txt); do echo -e "systemctl stop daq-query-node.service \n systemctl disable daq-query-node.service \n rm /usr/lib/systemd/system/daq-query-node.service \n rm -rf /opt/query_node \n systemctl daemon-reload" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
<a name="query_rest"/>
### Query REST Server
#### Install
1. Go to ch.psi.daq.buildall and execute: `./gradlew dropItQueryREST -x test`
2. Login to master node and follow [these instructions](Readme.md#clone_git) to setup the git environment.
3. Multihost command: `for THE_HOST in $(sort -u ../hostlists/QueryRESTHost.txt); do cat user_settings.sh ../hostlists/env_settings.sh install_query_rest.sh | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
#### De-Install
Multihost command: `for THE_HOST in $(sort -u ../hostlists/QueryRESTHost.txt); do echo -e "systemctl stop daq-query-rest.service \n systemctl disable daq-query-rest.service \n rm /usr/lib/systemd/system/daq-query-rest.service \n rm -rf /opt/query_rest \n systemctl daemon-reload" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
## DAQLocal
<a name="daqlocal"/>
### DAQLocal
#### Install
1. Go to ch.psi.daq.buildall and execute: `./gradlew dropItDAQLocal -x test`
2. Login to master node and follow [these instructions](Readme.md#clone_git) to setup the git environment.
3. Multihost command: `cat user_settings.sh ../hostlists/env_settings.sh ../hostlists/stream_sources.sh install_daqlocal.sh | bash`
#### De-Install
Multihost command: `systemctl stop daq-daqlocal.service; systemctl disable daq-daqlocal.service; rm -rf /usr/lib/systemd/system/daq-daqlocal.service; rm -rf /opt/daqlocal`
## Helpful Commands
Dispatcher Node service:
`for THE_HOST in $(sort -u ../hostlists/*Host*.txt); do echo -e "echo -e '\n\nHOST:${THE_HOST}' && ls /usr/lib/systemd/system/daq-dispatcher-node* | xargs -n1 basename | xargs -n1 systemctl stop" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
### Miscellaneous
Remove log files:
`for THE_HOST in $(sort -u ../hostlists/*Host*.txt); do echo -e "find /data_meta -name "*.log*" | grep "logs" | xargs rm" | ssh -i ${HOME}/.ssh/id_rsa_daq root@${THE_HOST} ; done`
Monitor log messages
systemd:
`journalctl -f -u daq-dispatcher-rest.service`
`journalctl --since=today -u daq-dispatcher-rest.service`
CPU/Disk usage:
`dstat -d -D sdb1,sda5,total -cm -n`
### Docker Issues
```
systemctl stop docker
rm -rf /var/lib/docker
systemctl start docker
systemctl start nginx
```
### Modify Logging
1. Modify logback-server.xml (e.g. in /opt/dispatcher_node/latest/lib/)
2. Run JConsole
a. /usr/java/latest/bin/jconsole <PID>
b. /usr/java/latest/bin/jconsole and use `Remote Process` (application needs to be started with `-Dcom.sun.management.jmxremote.port=3334 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false`)
a. localhost:3334 for DispatcherNode
b. localhost:3335 for DispatcherRest
c. localhost:3336 for QueryNode
d. localhost:3337 for QueryRest.
3. Go to ch.qos.logback.classic -> ... -> Operations
4. Press `reloadDefaultConfiguration`
### Profiling
1. Run `/usr/java/latest/bin/jvisualvm` (use `/usr/java/latest/bin/jvisualvm -J-Dnetbeans.logger.console=true` for debugging).
2. Add a `JMX Connection` (see [here](Readme.md#modify_logging) for the `hostname:port` settings).
Note: You might need to run `yum install xorg-x11-xauth libXtst`
### Folder Crawler
The NAS system takes incredibly long to list folders. The current workaround is to use a folder crawler that periodically lists the folder structure and thus keeps it in the cache.
1. `mkdir /home/daqusr/scripts && cp ../scripts/folder_crawler.sh /home/daqusr/scripts && chown -R daqusr:daq /home/daqusr/scripts`
2. `cp ../hostlists/systemd/folder-crawler.service /etc/systemd/system/ && systemctl enable folder-crawler.service && systemctl daemon-reload`
3. `cp ../hostlists/systemd/folder-crawler.timer /etc/systemd/system/ && systemctl enable folder-crawler.timer && systemctl daemon-reload && systemctl start folder-crawler.timer`
## Maintenance Utils
### Find Largest Files
`find /data/sf-databuffer/daq_swissfel/daq_swissfel_3 -mtime +5 -printf "%s %n %m %u %g %t %p" \( -type l -printf ' -> %l\n' -o -printf '\n' \) | sort -k1,1 -n`
### Count Disk Usage
`find /data/sf-databuffer/daq_swissfel/daq_swissfel_3 -type f -mtime +5 -printf '%s\n' | awk '{a+=$1;} END {printf "%.1f GB\n", a/2**30;}'`
`find /data/sf-databuffer/daq_swissfel/daq_swissfel_3 -type f -newerct "2000-01-01" ! -newerct "2018-06-27 23:00" -printf '%s\n' | awk '{a+=$1;} END {printf "%.1f GB\n", a/2**30;}'`
### Delete Specific Files (do not forget -empty if needed!!!)
`find /data/sf-databuffer/daq_swissfel/daq_swissfel_3 -type f -mtime +5 -regextype sed -regex '.*LOSS_SIGNAL_RAW.*' -delete`
`find /data/sf-databuffer/daq_swissfel/daq_swissfel_3 -type d -empty -delete`
`find /gpfs/sf-data/sf-imagebuffer/daq_swissfel/daq_swissfel_4 -type f -newerct "2000-01-01" ! -newerct "2018-06-25 23:00" -delete`
`find /gpfs/sf-data/sf-imagebuffer/daq_swissfel/daq_swissfel_4 -type d -empty -delete`
### Delete empty files in parallel
```bash
# use 32 threads
find /gls_data/gls-archive/daq_local/daq_local_*/byTime/ -maxdepth 1 | tail -n +2 | xargs -I {} -P 32 -n 1 find {} -type f -empty -delete
```
### Parallel rsync
```bash
# see: https://stackoverflow.com/a/46611168
# SETUP OPTIONS
export SRCDIR="/home/maerki_f/Downloads/rsync_test/.snapshot/data/test"
# export SRCDIR="/gls_data/.snapshot/daily.2018-07-18_0010/gls-archive/daq_local/daq_local_2/byTime"
export DESTDIR="/home/maerki_f/Downloads/rsync_test/data/test"
# export DESTDIR="/gls_data/gls-archive/daq_local/daq_local_2/byTime"
# use 32 threads
ls -1 $SRCDIR | xargs -I {} -P 32 -n 1 rsync -auvh --progress $SRCDIR/{} $DESTDIR/
```

View File

@@ -2,7 +2,7 @@
# this controls whether an Ansible playbook should prompt for a sudo password by default when sudoing. Default: False
#ask_sudo_pass=True
inventory=inventories/sf
inventory=inventory
host_key_checking = False
[ssh_connection]

Binary file not shown.


Binary file not shown.


Binary file not shown.


View File

@@ -1,13 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- name: Specifying a path directly
fetch:
src: /opt/dispatcher_node/latest/logs/data_validation.log.1.zip
dest: /tmp/daq/prefix-{{ inventory_hostname }}
flat: yes
# - name: Fetch stuff from the remote and save to local
# synchronize: src={{ item }} dest=/tmp/daq mode=pull
# with_items:
# - "/opt/dispatcher_node/latest/logs/data_validation.log"
# - "/opt/dispatcher_node/latest/logs/data_validation.log.1.zip"

View File

@@ -1,10 +0,0 @@
# ensure that autofs is disabled
# that there is no afs
- include: install_user.yml
- include: install_limits.yml
- include: install_jdk.yml
- include: install_dispatcher_node.yml
- include: install_query_node.yml
# a restart of the machine is required to have the limits being applied

View File

@@ -1,67 +0,0 @@
- hosts: databuffer_cluster
become: true
vars:
dispatcher_node_version: 1.14.17
binaries_install_dir: /opt/databuffer
tasks:
- name: Create deployment directory - dispatcher_node
file:
path: "{{binaries_install_dir}}/lib"
owner: daqusr
group: daq
state: directory
- name: Download app binary
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/dispatchernode/{{dispatcher_node_version}}/dispatchernode-{{dispatcher_node_version}}-all.jar
dest: "{{binaries_install_dir}}/lib/"
owner: daqusr
group: daq
# Deploy systemd unit file for dispatchernode
- template:
src: templates/daq-dispatcher-node.service.j2
dest: /etc/systemd/system/daq-dispatcher-node.service
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Make sure the tuned service is enabled and started
systemd:
enabled: yes
state: started
name: tuned
- name: Make sure the daq-dispatcher-node is enabled
systemd:
enabled: yes
name: daq-dispatcher-node
- hosts: imagebuffer
become: true
tasks:
- name: Creates configuration directory
file:
path: /home/daqusr/.config/daq
owner: daqusr
group: daq
state: directory
- template:
src: templates/domain.properties.j2
dest: /home/daqusr/.config/daq/domain.properties
owner: daqusr
group: daq
mode: '644'
- hosts: databuffer
become: true
tasks:
- name: Creates configuration directory
file:
path: /home/daqusr/.config/daq
owner: daqusr
group: daq
state: directory
- template:
src: templates/domain.properties.j2
dest: /home/daqusr/.config/daq/domain.properties
owner: daqusr
group: daq
mode: '644'

View File

@@ -1,66 +0,0 @@
- hosts: databuffer_cluster
become: true
vars:
dispatcher_node_version: 1.14.8
tasks:
- name: Creates deployment directory
file:
path: /opt/dispatcher_node/latest/lib
owner: daqusr
group: daq
state: directory
- name: Download app binary
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/dispatchernode/{{dispatcher_node_version}}/dispatchernode-{{dispatcher_node_version}}-all.jar
dest: /opt/dispatcher_node/latest/lib/
owner: daqusr
group: daq
# Deploy systemd unit file for dispatchernode
- template:
src: templates/daq-dispatcher-node.service_jdk8.j2
dest: /etc/systemd/system/daq-dispatcher-node.service
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Make sure the tuned service is enabled and started
systemd:
enabled: yes
state: started
name: tuned
- name: Make sure the daq-dispatcher-node is enabled
systemd:
enabled: yes
name: daq-dispatcher-node
- hosts: imagebuffer
become: true
tasks:
- name: Creates configuration directory
file:
path: /home/daqusr/.config/daq
owner: daqusr
group: daq
state: directory
- template:
src: templates/imagebuffer_domain.properties
dest: /home/daqusr/.config/daq/domain.properties
owner: daqusr
group: daq
mode: '644'
- hosts: databuffer
become: true
tasks:
- name: Creates configuration directory
file:
path: /home/daqusr/.config/daq
owner: daqusr
group: daq
state: directory
- template:
src: templates/databuffer_domain.properties
dest: /home/daqusr/.config/daq/domain.properties
owner: daqusr
group: daq
mode: '644'

View File

@@ -1,57 +0,0 @@
- hosts: imagebuffer
become: true
tasks:
- name: Install https certificate
template:
src: templates/elastic-stack-ca.pem
dest: /etc/pki/tls/certs/elastic-stack-ca.pem
- name: Install journalbeat
yum:
name: https://artifacts.elastic.co/downloads/beats/journalbeat/journalbeat-7.3.2-x86_64.rpm
state: present
- name: Install journalbeat configuration
template:
src: templates/journalbeat.yml
dest: /etc/journalbeat/journalbeat.yml
# - name: Install auditbeat
# yum:
# name: https://artifacts.elastic.co/downloads/beats/auditbeat/auditbeat-7.3.2-x86_64.rpm
# state: present
# - name: Install auditbeat configuration
# template:
# src: templates/auditbeat.yml
# dest: /etc/auditbeat/auditbeat.yml
#
# - name: Install metricbeat
# yum:
# name: https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.3.2-x86_64.rpm
# state: present
# - name: Install metricbeat configuration
# template:
# src: templates/metricbeat.yml
# dest: /etc/metricbeat/metricbeat.yml
# - name: Install metricbeat system.yml configuration
# template:
# src: templates/system.yml
# dest: /etc/metricbeat/modules.d/system.yml
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Enable and start journalbeat
systemd:
enabled: yes
state: restarted
name: journalbeat
# - name: Enable and start metricbeat
# systemd:
# enabled: yes
# state: restarted
# name: metricbeat
# - name: Enable and start auditbeat
# systemd:
# enabled: yes
# state: restarted
# name: auditbeat

View File

@@ -1,66 +0,0 @@
- hosts: dispatcher_api_office
become: true
tasks:
- name: Install https certificate
template:
src: templates/elastic-stack-ca.pem
dest: /etc/pki/tls/certs/elastic-stack-ca.pem
- name: Install osquery
yum:
name: https://pkg.osquery.io/rpm/osquery-4.0.2-1.linux.x86_64.rpm
state: present
- name: Install heartbeat
yum:
name: https://artifacts.elastic.co/downloads/beats/heartbeat/heartbeat-7.3.2-x86_64.rpm
state: present
- name: Install heartbeat configuration
template:
src: templates/heartbeat.yml
dest: /etc/heartbeat/heartbeat.yml
- name: Install heartbeat monitors
template:
src: templates/reachable.icmp.yml
dest: /etc/heartbeat/monitors.d/reachable.icmp.yml
- name: Install metricbeat
yum:
name: https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.3.2-x86_64.rpm
state: present
- name: Install metricbeat configuration
template:
src: templates/metricbeat.yml
dest: /etc/metricbeat/metricbeat.yml
- name: Install metricbeat system.yml configuration
template:
src: templates/system.yml
dest: /etc/metricbeat/modules.d/system.yml
- name: Install auditbeat
yum:
name: https://artifacts.elastic.co/downloads/beats/auditbeat/auditbeat-7.3.2-x86_64.rpm
state: present
- name: Install auditbeat configuration
template:
src: templates/auditbeat.yml
dest: /etc/auditbeat/auditbeat.yml
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Enable and start journalbeat
systemd:
enabled: yes
state: restarted
name: heartbeat-elastic
- name: Enable and start metricbeat
systemd:
enabled: yes
state: restarted
name: metricbeat
- name: Enable and start auditbeat
systemd:
enabled: yes
state: restarted
name: auditbeat

View File

@@ -1,36 +0,0 @@
- hosts: imageapi
become: true
gather_facts: no
vars:
imageapi_version: 0.0.0.000
tasks:
- name: mkdir deployment directory
file:
path: /opt/imageapi/{{imageapi_version}}/lib
owner: daqusr
group: daq
state: directory
- name: deploy jar
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/imageapi/{{imageapi_version}}/imageapi-{{imageapi_version}}-all.jar
dest: /opt/imageapi/{{imageapi_version}}/lib
owner: daqusr
group: daq
- template:
src: templates/imageapi.service.j2
dest: /etc/systemd/system/imageapi.service
- name: mkdir etc
file:
path: /etc/imageapi
group: daq
state: directory
- template:
src: templates/imageapi.application.properties
dest: /etc/imageapi/application.properties
- name: reload systemd daemon
systemd:
daemon_reload: yes
- name: enable service
systemd:
name: imageapi
enabled: yes

View File

@@ -1,7 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- name: Install jdk-13
yum:
name: java-13-openjdk-devel
state: present

View File

@@ -1,8 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- name: Install jdk from a local file
yum:
# name: https://artifacts.psi.ch/artifactory/releases/jdk-8u162-linux-x64.rpm
name: java-1.8.0-openjdk-devel
state: present

View File

@@ -1,28 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- template:
src: templates/90-daq_limits.d.conf
dest: /etc/security/limits.d/90-daq.conf
mode: '644'
- template:
src: templates/90-daq_sysctl.d.conf
dest: /etc/sysctl.d/90-daq.conf
mode: '644'
# - name: Set limits in /etc/security/limits.conf
# shell: |
# echo "daqusr - memlock unlimited" >> /etc/security/limits.conf
# echo "daqusr - nofile 500000" >> /etc/security/limits.conf
# echo "daqusr - nproc 32768" >> /etc/security/limits.conf
# echo "daqusr - as unlimited" >> /etc/security/limits.conf
# unlimited does not work for nofile (user cannot login ???)
# this should actually go into /etc/security/limits.d/99-daq.conf
# - name: Set limits in /etc/sysctl.conf
# shell: |
# echo "" >> /etc/sysctl.conf
# echo "vm.max_map_count = 131072" >> /etc/sysctl.conf
# echo "vm.swappiness = 1" >> /etc/sysctl.conf
# sysctl -p
# a restart of the machine is required to have the limits being applied

View File

@@ -1,36 +0,0 @@
- hosts: databuffer_cluster
become: true
vars:
query_node_version: 1.14.17
binaries_install_dir: /opt/databuffer
tasks:
- name: Create deployment directory - query_node
file:
path: "{{binaries_install_dir}}/lib"
owner: daqusr
group: daq
state: directory
- name: Download app binary
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/querynode/{{query_node_version}}/querynode-{{query_node_version}}-all.jar
dest: "{{binaries_install_dir}}/lib/"
owner: daqusr
group: daq
# Deploy systemd unit file for querynode
- template:
src: templates/daq-query-node.service.j2
dest: /etc/systemd/system/daq-query-node.service
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Make sure the tuned service is enabled and started
systemd:
enabled: yes
state: started
name: tuned
- name: Make sure the daq-query-node is enabled
systemd:
enabled: yes
name: daq-query-node

View File

@@ -1,34 +0,0 @@
- hosts: databuffer_cluster
become: true
vars:
query_node_version: 1.14.8
tasks:
- name: Creates deployment directory
file:
path: /opt/query_node/latest/lib
owner: daqusr
group: daq
state: directory
- name: Download app binary
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/querynode/{{query_node_version}}/querynode-{{query_node_version}}-all.jar
dest: /opt/query_node/latest/lib/
owner: daqusr
group: daq
# Deploy systemd unit file for querynode
- template:
src: templates/daq-query-node.service_jdk8.j2
dest: /etc/systemd/system/daq-query-node.service
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Make sure the tuned service is enabled and started
systemd:
enabled: yes
state: started
name: tuned
- name: Make sure the daq-query-node is enabled
systemd:
enabled: yes
name: daq-query-node

View File

@@ -1,40 +0,0 @@
- hosts: databuffer_cluster
become: true
vars:
binaries_version: 1.14.17
binaries_install_dir: /opt/databuffer
tasks:
- name: Download jar - dispatcher_rest
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/dispatcherrest/{{binaries_version}}/dispatcherrest-{{binaries_version}}-all.jar
dest: "{{binaries_install_dir}}/lib/"
owner: daqusr
group: daq
- name: Download jar - query_rest
get_url:
url: https://artifacts.psi.ch/artifactory/libs-snapshots-local/ch/psi/daq/queryrest/{{binaries_version}}/queryrest-{{binaries_version}}-all.jar
dest: "{{binaries_install_dir}}/lib/"
owner: daqusr
group: daq
# Deploy systemd unit file for dispatchernode
- template:
src: templates/daq-dispatcher-rest.service.j2
dest: /etc/systemd/system/daq-dispatcher-rest.service
- template:
src: templates/daq-query-rest.service.j2
dest: /etc/systemd/system/daq-query-rest.service
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- systemd:
enabled: yes
name: daq-dispatcher-rest
- systemd:
enabled: yes
name: daq-query-rest

View File

@@ -1,14 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- name: Ensure group "daq" exists
group:
name: daq
gid: 1000
state: present
- name: Add the user 'daqusr'
user:
name: daqusr
uid: 1000
comment: DAQ User
group: daq

View File

@@ -1,79 +0,0 @@
[data_api_office]
data-api.psi.ch
[dispatcher_api_office]
dispatcher-api.psi.ch
[data_api]
sf-data-api.psi.ch
sf-data-api-02.psi.ch
[dispatcher_api]
sf-dispatcher-api.psi.ch
[databuffer_cluster:children]
databuffer
imagebuffer
[imagebuffer]
sf-daq-5.psi.ch
sf-daq-6.psi.ch
[databuffer]
sf-daqbuf-21.psi.ch
sf-daqbuf-22.psi.ch
sf-daqbuf-23.psi.ch
sf-daqbuf-24.psi.ch
sf-daqbuf-25.psi.ch
sf-daqbuf-26.psi.ch
sf-daqbuf-27.psi.ch
sf-daqbuf-28.psi.ch
sf-daqbuf-29.psi.ch
sf-daqbuf-30.psi.ch
sf-daqbuf-31.psi.ch
sf-daqbuf-32.psi.ch
sf-daqbuf-33.psi.ch
[databuffer_cluster:vars]
daq_environment=swissfel
[imagebuffer:vars]
# This was calculated like this:
# SYSTEM_CPU_COUNT=$(cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l)
# SYSTEM_CORE_PER_CPU_COUNT=$(cat /proc/cpuinfo | grep -o -P 'cpu cores\t: [^\n]*' | cut -f2- -d':' | uniq | tr -d ' ')
# # without hyper threading
# SYSTEM_CORE_COUNT=$(( ${SYSTEM_CPU_COUNT} * ${SYSTEM_CORE_PER_CPU_COUNT} ))
number_of_cores=44
# This was calculated like this:
# SYSTEM_THREAD_COUNT=$(egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo)
# QUERY_NODE_COMMON_FORK_JOIN_POOL_PARALLELISM="$((2 * ${SYSTEM_THREAD_COUNT}))"
fork_join_pool_parallelism=88
backend_default=sf-imagebuffer
[databuffer:vars]
# This was calculated like this:
# SYSTEM_CPU_COUNT=$(cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l)
# SYSTEM_CORE_PER_CPU_COUNT=$(cat /proc/cpuinfo | grep -o -P 'cpu cores\t: [^\n]*' | cut -f2- -d':' | uniq | tr -d ' ')
# # without hyper threading
# SYSTEM_CORE_COUNT=$(( ${SYSTEM_CPU_COUNT} * ${SYSTEM_CORE_PER_CPU_COUNT} ))
number_of_cores=28
# This was calculated like this:
# SYSTEM_THREAD_COUNT=$(egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo)
# QUERY_NODE_COMMON_FORK_JOIN_POOL_PARALLELISM="$((2 * ${SYSTEM_THREAD_COUNT}))"
fork_join_pool_parallelism=112
backend_default=sf-databuffer
[test]
sf-nube-11
sf-nube-12
[iodatatest]
sf-daq-5.psi.ch
[imageapi]
sf-daq-5.psi.ch

View File

@@ -1,30 +0,0 @@
[data_api]
# sf-daqbuf-34.psi.ch
[dispatcher_api]
# sf-daqbuf-34.psi.ch
[databuffer_cluster:children]
databuffer
[databuffer]
sf-daqbuf-34.psi.ch
[databuffer_cluster:vars]
daq_environment=sf-rf-databuffer
[databuffer:vars]
# This was calculated like this:
# SYSTEM_CPU_COUNT=$(cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l)
# SYSTEM_CORE_PER_CPU_COUNT=$(cat /proc/cpuinfo | grep -o -P 'cpu cores\t: [^\n]*' | cut -f2- -d':' | uniq | tr -d ' ')
# # without hyper threading
# SYSTEM_CORE_COUNT=$(( ${SYSTEM_CPU_COUNT} * ${SYSTEM_CORE_PER_CPU_COUNT} ))
number_of_cores=28
# This was calculated like this:
# SYSTEM_THREAD_COUNT=$(egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo)
# QUERY_NODE_COMMON_FORK_JOIN_POOL_PARALLELISM="$((2 * ${SYSTEM_THREAD_COUNT}))"
fork_join_pool_parallelism=112
backend_default=sf-rf-databuffer

View File

@@ -1,41 +0,0 @@
[databuffer_cluster:children]
databuffer
imagebuffer
[databuffer]
sf-nube-11.psi.ch
sf-nube-12.psi.ch
sf-nube-13.psi.ch
[imagebuffer]
sf-nube-14.psi.ch
[databuffer_cluster:vars]
daq_environment=swissfel-test
[imagebuffer:vars]
# This was calculated like this:
# SYSTEM_CPU_COUNT=$(cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l)
# SYSTEM_CORE_PER_CPU_COUNT=$(cat /proc/cpuinfo | grep -o -P 'cpu cores\t: [^\n]*' | cut -f2- -d':' | uniq | tr -d ' ')
# # without hyper threading
# SYSTEM_CORE_COUNT=$(( ${SYSTEM_CPU_COUNT} * ${SYSTEM_CORE_PER_CPU_COUNT} ))
number_of_cores=20
# This was calculated like this:
# SYSTEM_THREAD_COUNT=$(egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo)
# QUERY_NODE_COMMON_FORK_JOIN_POOL_PARALLELISM="$((2 * ${SYSTEM_THREAD_COUNT}))"
fork_join_pool_parallelism=80
[databuffer:vars]
# This was calculated like this:
# SYSTEM_CPU_COUNT=$(cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l)
# SYSTEM_CORE_PER_CPU_COUNT=$(cat /proc/cpuinfo | grep -o -P 'cpu cores\t: [^\n]*' | cut -f2- -d':' | uniq | tr -d ' ')
# # without hyper threading
# SYSTEM_CORE_COUNT=$(( ${SYSTEM_CPU_COUNT} * ${SYSTEM_CORE_PER_CPU_COUNT} ))
number_of_cores=22
# This was calculated like this:
# SYSTEM_THREAD_COUNT=$(egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo)
# QUERY_NODE_COMMON_FORK_JOIN_POOL_PARALLELISM="$((2 * ${SYSTEM_THREAD_COUNT}))"
fork_join_pool_parallelism=82

View File

@@ -1,40 +0,0 @@
[data_api_office]
data-api.psi.ch
[dispatcher_api_office]
dispatcher-api.psi.ch
[data_api]
#twlha-data-api.psi.ch
[dispatcher_api]
#twlha-dispatcher-api.psi.ch
[databuffer_cluster:children]
databuffer
[databuffer]
twlha-daqbuf-21.psi.ch
[imagebuffer]
[databuffer_cluster:vars]
daq_environment=twlha-databuffer
[databuffer:vars]
# This was calculated like this:
# SYSTEM_CPU_COUNT=$(cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l)
# SYSTEM_CORE_PER_CPU_COUNT=$(cat /proc/cpuinfo | grep -o -P 'cpu cores\t: [^\n]*' | cut -f2- -d':' | uniq | tr -d ' ')
# # without hyper threading
# SYSTEM_CORE_COUNT=$(( ${SYSTEM_CPU_COUNT} * ${SYSTEM_CORE_PER_CPU_COUNT} ))
number_of_cores=28
# This was calculated like this:
# SYSTEM_THREAD_COUNT=$(egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo)
# QUERY_NODE_COMMON_FORK_JOIN_POOL_PARALLELISM="$((2 * ${SYSTEM_THREAD_COUNT}))"
fork_join_pool_parallelism=112
backend_default=twlha-databuffer
[imagebuffer:vars]
backend_default=twlha-imagebuffer

20
operation-tools/inventory Normal file
View File

@@ -0,0 +1,20 @@
[databuffer]
sf-daqbuf-21.psi.ch
sf-daqbuf-22.psi.ch
sf-daqbuf-23.psi.ch
sf-daqbuf-24.psi.ch
sf-daqbuf-25.psi.ch
sf-daqbuf-26.psi.ch
sf-daqbuf-27.psi.ch
sf-daqbuf-28.psi.ch
sf-daqbuf-29.psi.ch
sf-daqbuf-30.psi.ch
sf-daqbuf-31.psi.ch
sf-daqbuf-32.psi.ch
sf-daqbuf-33.psi.ch
[sf_data_api_databuffer]
sf-data-api.psi.ch
[sf_dispatcher_api_databuffer]
sf-dispatcher-api.psi.ch

View File

@@ -0,0 +1,23 @@
- import_playbook: stop.yml
- import_playbook: start.yml
- name: restart data api
hosts: sf_data_api_databuffer
become: true
tasks:
- name: restart data-api-databuffer
systemd:
state: restarted
name: data-api-databuffer
- name: restart dispatcher api
hosts: sf_dispatcher_api_databuffer
become: true
tasks:
- name: restart dispatcher-api-databuffer
systemd:
state: restarted
name: dispatcher-api-databuffer

View File

@@ -1,108 +0,0 @@
- name: stop data api
hosts: data_api
become: true
tasks:
- name: stop data-api
systemd:
state: stopped
name: data-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: stop dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: stop dispatcher-api
systemd:
state: stopped
name: dispatcher-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: stop nodes
hosts: databuffer_cluster
become: true
tasks:
- name: stop daq-dispatcher-node
systemd:
state: stopped
name: daq-dispatcher-node
- name: stop daq-query-node
systemd:
state: stopped
name: daq-query-node
- name: Remove sources
file:
path: /home/daqusr/.config/daq/stores/sources
state: absent
- name: Remove streamers
file:
path: /home/daqusr/.config/daq/stores/streamers
state: absent
# IMPORTANT: It is necessary to bring up the dispatcher node processes first
# before starting the query node processes!
- name: start dispatcher nodes
hosts: databuffer_cluster
become: true
# serial: 1
tasks:
- name: start daq-dispatcher-node
systemd:
state: started
name: daq-dispatcher-node
- name: wait for dispatcher nodes to come up
hosts: dispatcher_api
tasks:
- name: sleep for 120 seconds and continue with play
wait_for:
timeout: 120
- name: start query nodes
hosts: databuffer_cluster
become: true
# serial: 1
tasks:
- name: start daq-query-node
systemd:
state: started
name: daq-query-node
- name: start data api
hosts: data_api
become: true
tasks:
- name: start data-api
systemd:
state: started
name: data-api
- name: start nginx
systemd:
state: started
name: nginx
- name: start dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: start dispatcher-api
systemd:
state: started
name: dispatcher-api
- name: start nginx
systemd:
state: started
name: nginx

View File

@@ -1,26 +0,0 @@
- name: stop data api
hosts: data_api
become: true
tasks:
- name: stop data-api
systemd:
state: stopped
name: data-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: start data api
hosts: data_api
become: true
tasks:
- name: start data-api
systemd:
state: restarted
name: data-api
- name: start nginx
systemd:
state: restarted
name: nginx

View File

@@ -1,38 +0,0 @@
- name: Restart API3 Processes
hosts: databuffer
become: true
tasks:
- name: Stop nginx
systemd:
state: stopped
name: nginx
- name: Stop retrieval00
systemd:
state: stopped
name: retrieval-00
- name: Stop retrieval01
systemd:
state: stopped
name: retrieval-01
- name: Stop retrieval02
systemd:
state: stopped
name: retrieval-02
- name: Start nginx
systemd:
state: started
name: nginx
- name: Start retrieval00
systemd:
state: started
name: retrieval-00
- name: Start retrieval01
systemd:
state: started
name: retrieval-01
- name: Start retrieval02
systemd:
state: started
name: retrieval-02

View File

@@ -1,54 +0,0 @@
- name: restart dataretrieval
hosts: data_api
become: true
tasks:
- name: stop data-api
systemd:
state: stopped
name: data-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: stop dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: stop dispatcher-api
systemd:
state: stopped
name: dispatcher-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: start data api
hosts: data_api
become: true
tasks:
- name: start data-api
systemd:
state: started
name: data-api
- name: start nginx
systemd:
state: started
name: nginx
- name: start dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: start dispatcher-api
systemd:
state: started
name: dispatcher-api
- name: start nginx
systemd:
state: started
name: nginx

View File

@@ -1,79 +0,0 @@
- name: stop data api
hosts: data_api
become: true
tasks:
- name: stop data-api
systemd:
state: stopped
name: data-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: stop dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: stop dispatcher-api
systemd:
state: stopped
name: dispatcher-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: restart dispatcher api office
hosts: dispatcher_api_office
become: true
tasks:
- name: restart central dispatcher-api
systemd:
name: dispatcher-api-central
state: restarted
- name: restart nginx
systemd:
state: restarted
name: nginx
- name: restart data api office
hosts: data_api_office
become: true
tasks:
- name: restart central data-api
systemd:
name: data-api-central
state: restarted
- name: restart nginx
systemd:
name: nginx
state: restarted
- name: start data api
hosts: data_api
become: true
tasks:
- name: start data-api
systemd:
state: started
name: data-api
- name: start nginx
systemd:
state: started
name: nginx
- name: start dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: start dispatcher-api
systemd:
state: started
name: dispatcher-api
- name: start nginx
systemd:
state: started
name: nginx

View File

@@ -1,11 +0,0 @@
- hosts: imageapi
become: true
gather_facts: no
tasks:
- name: systemd daemon reload
systemd:
daemon_reload: yes
- name: restart service
systemd:
name: imageapi
state: restarted

View File

@@ -1,70 +0,0 @@
- name: stop nodes
hosts: imagebuffer
become: true
tasks:
- name: stop daq-dispatcher-node
systemd:
state: stopped
name: daq-dispatcher-node
- name: stop daq-query-node
systemd:
state: stopped
name: daq-query-node
- name: Remove sources
file:
path: /home/daqusr/.config/daq/stores/sources
state: absent
- name: Remove streamers
file:
path: /home/daqusr/.config/daq/stores/streamers
state: absent
- name: start dispatcher nodes
hosts: imagebuffer
become: true
# serial: 1
tasks:
- name: start daq-dispatcher-node
systemd:
state: started
name: daq-dispatcher-node
- name: wait for dispatcher nodes to come up
hosts: imagebuffer
tasks:
- name: sleep for 30 seconds and continue with play
wait_for:
timeout: 30
- name: start query nodes
hosts: imagebuffer
become: true
# serial: 1
tasks:
- name: start daq-query-node
systemd:
state: started
name: daq-query-node
- name: restart data api
hosts: data_api
become: true
tasks:
- name: restart data-api
systemd:
state: restarted
name: data-api
- name: restart dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: restart dispatcher-api
systemd:
state: restarted
name: dispatcher-api

View File

@@ -1,31 +0,0 @@
- name: Restart API3 Processes
hosts: imagebuffer
become: true
tasks:
- name: Stop nginx
systemd:
state: stopped
name: nginx
- name: Stop retrieval00
systemd:
state: stopped
name: retrieval-00
- name: Stop retrieval01
systemd:
state: stopped
name: retrieval-01
- name: Start nginx
systemd:
state: started
name: nginx
- name: Start retrieval00
systemd:
state: started
name: retrieval-00
- name: Start retrieval01
systemd:
state: started
name: retrieval-01

18
operation-tools/start.yml Normal file
View File

@@ -0,0 +1,18 @@
- name: Start nodes
hosts: databuffer
become: true
tasks:
- name: start daq-dispatcher-node
systemd:
state: started
name: daq-dispatcher-node
- name: sleep for 30 seconds and continue with play
wait_for:
timeout: 30
- name: start daq-query-node
systemd:
state: started
name: daq-query-node

View File

@@ -1,28 +1,24 @@
- name: stop nodes
hosts: databuffer_cluster
- name: Stop nodes
hosts: databuffer
become: true
tasks:
- name: stop daq-dispatcher-node
systemd:
state: stopped
name: daq-dispatcher-node
- name: stop daq-query-node
systemd:
state: stopped
name: daq-query-node
- name: Remove sources
file:
path: /home/daqusr/.config/daq/stores/sources
state: absent
- name: Remove streamers
file:
path: /home/daqusr/.config/daq/stores/streamers
state: absent
- import_playbook: uninstall_query_node.yml
- import_playbook: uninstall_dispatcher_node.yml
- import_playbook: install_query_node_jdk8.yml
- import_playbook: install_dispatcher_node_jdk8.yml
- import_playbook: restart_cluster.yml

View File

@@ -1,48 +0,0 @@
- name: stop data api
hosts: data_api
become: true
tasks:
- name: stop data-api
systemd:
state: stopped
name: data-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: stop dispatcher api
hosts: dispatcher_api
become: true
tasks:
- name: stop dispatcher-api
systemd:
state: stopped
name: dispatcher-api
- name: stop nginx
systemd:
state: stopped
name: nginx
- name: stop nodes
hosts: databuffer_cluster
become: true
tasks:
- name: stop daq-dispatcher-node
systemd:
state: stopped
name: daq-dispatcher-node
- name: stop daq-query-node
systemd:
state: stopped
name: daq-query-node
- name: Remove sources
file:
path: /home/daqusr/.config/daq/stores/sources
state: absent
- name: Remove streamers
file:
path: /home/daqusr/.config/daq/stores/streamers
state: absent

View File

@@ -1,4 +0,0 @@
daqusr - memlock unlimited
daqusr - nofile 500000
daqusr - nproc 32768
daqusr - as unlimited

View File

@@ -1,2 +0,0 @@
vm.max_map_count = 131072
vm.swappiness = 1

View File

@@ -1,220 +0,0 @@
###################### Auditbeat Configuration Example #########################
# This is an example configuration file highlighting only the most common
# options. The auditbeat.reference.yml file from the same directory contains all
# the supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/auditbeat/index.html
#========================== Modules configuration =============================
auditbeat.modules:
- module: auditd
# Load audit rules from separate files. Same format as audit.rules(7).
audit_rule_files: [ '${path.config}/audit.rules.d/*.conf' ]
audit_rules: |
## Define audit rules here.
## Create file watches (-w) or syscall audits (-a or -A). Uncomment these
## examples or add your own rules.
## If you are on a 64 bit platform, everything should be running
## in 64 bit mode. This rule will detect any use of the 32 bit syscalls
## because this might be a sign of someone exploiting a hole in the 32
## bit API.
-a always,exit -F arch=b32 -S all -F key=32bit-abi
## Executions.
-a always,exit -F arch=b64 -S execve,execveat -k exec
## External access (warning: these can be expensive to audit).
-a always,exit -F arch=b64 -S accept,bind,connect -F key=external-access
## Identity changes.
-w /etc/group -p wa -k identity
-w /etc/passwd -p wa -k identity
-w /etc/gshadow -p wa -k identity
## Unauthorized access attempts.
-a always,exit -F arch=b64 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EACCES -k access
-a always,exit -F arch=b64 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access
- module: file_integrity
paths:
- /bin
- /usr/bin
- /sbin
- /usr/sbin
- /etc
- module: system
datasets:
- host # General host information, e.g. uptime, IPs
- login # User logins, logouts, and system boots.
- package # Installed, updated, and removed packages
- process # Started and stopped processes
- socket # Opened and closed sockets
- user # User information
# How often datasets send state updates with the
# current state of the system (e.g. all currently
# running processes, all open sockets).
state.period: 12h
# Enabled by default. Auditbeat will read password fields in
# /etc/passwd and /etc/shadow and store a hash locally to
# detect any changes.
user.detect_password_changes: true
# File patterns of the login record files.
login.wtmp_file_pattern: /var/log/wtmp*
login.btmp_file_pattern: /var/log/btmp*
#==================== Elasticsearch template setting ==========================
setup.template.settings:
index.number_of_shards: 1
#index.codec: best_compression
#_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
host: "https://realstuff.psi.ch:5601"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
#host: "localhost:5601"
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using Auditbeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["realstuff.psi.ch:9200"]
# Optional protocol and basic auth credentials.
protocol: "https"
username: "beats_user"
password: "beats123"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
# Optional protocol and basic auth credentials.
#protocol: "https"
#username: "elastic"
#password: "changeme"
#----------------------------- Logstash output --------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# auditbeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
monitoring.enabled: true
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Auditbeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:
monitoring.cluster_uuid: "57-GhvUVR1WM1D-42XEFYg"
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
#================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
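The Elasticsearch output configured above can be sanity-checked from the host before (re)starting the beat. A minimal sketch, assuming `curl` is available and the CA file referenced in the config is already deployed:

```bash
# Query the configured Elasticsearch endpoint with the same CA and user as the beat
# (prompts for the beats_user password instead of putting it on the command line)
curl --cacert /etc/pki/tls/certs/elastic-stack-ca.pem \
     -u beats_user \
     "https://realstuff.psi.ch:9200"
```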

View File

@@ -1,47 +0,0 @@
[Unit]
Description=Dispatcher Node
After=network.target local-fs.target tuned.service
[Service]
User=daqusr
ExecStart=/usr/lib/jvm/jre/bin/java --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED \
--add-opens java.management/sun.management=ALL-UNNAMED \
-Xms8G \
-Xmx32G \
-Xmn2G \
-Xss256k \
-DDirectMemoryAllocationThreshold=2KB \
-XX:MaxDirectMemorySize=64G \
-DDirectMemoryCleanerThreshold=0.7 \
-XX:+ExitOnOutOfMemoryError \
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED \
--add-opens java.base/java.nio=ALL-UNNAMED \
--add-opens java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens java.base/java.lang=ALL-UNNAMED \
--add-modules jdk.unsupported \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseZGC \
-XX:ConcGCThreads={{ number_of_cores }} \
-Djava.util.concurrent.ForkJoinPool.common.parallelism={{fork_join_pool_parallelism}} \
-Duser.timezone=Europe/Zurich \
-Dcom.sun.management.jmxremote.port=3334 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.local.only=false \
-jar {{binaries_install_dir}}/lib/dispatchernode-{{dispatcher_node_version}}-all.jar \
--daq.config.environment={{daq_environment}}
Restart=on-failure
RestartSec=3s
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
OOMScoreAdjust=-500
LimitNOFILE=500000
LimitMEMLOCK=infinity
LimitNPROC=infinity
LimitAS=infinity
#CPUAccounting=true
#CPUShares=2048
[Install]
WantedBy=multi-user.target
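After this unit file is deployed or changed, systemd has to reload its unit definitions before the new settings take effect. A minimal sketch, assuming the unit is installed as `daq-dispatcher-node.service` (the name used by the removal playbook further down):

```bash
# Pick up the new unit file and (re)start the dispatcher node
sudo systemctl daemon-reload
sudo systemctl enable --now daq-dispatcher-node.service
systemctl status daq-dispatcher-node.service
```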

View File

@@ -1,52 +0,0 @@
[Unit]
Description=Dispatcher Node
After=network.target local-fs.target tuned.service
[Service]
User=daqusr
ExecStart=/usr/java/jdk1.8.0_162/bin/java -XX:+CMSClassUnloadingEnabled \
-XX:+UseThreadPriorities \
-Xms8G \
-Xmx32G \
-Xmn2G \
-DDirectMemoryAllocationThreshold=2KB \
-XX:MaxDirectMemorySize=64G \
-DDirectMemoryCleanerThreshold=0.7 \
-XX:+ExitOnOutOfMemoryError \
-Xss256k \
-XX:StringTableSize=1000003 \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:SurvivorRatio=8 \
-XX:MaxTenuringThreshold=1 \
-XX:CMSInitiatingOccupancyFraction=75 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:+UseTLAB \
-XX:+PerfDisableSharedMem \
-XX:CMSWaitDuration=10000 \
-XX:+CMSParallelInitialMarkEnabled \
-XX:+CMSEdenChunksRecordAlways \
-XX:CMSWaitDuration=10000 \
-XX:+UseCondCardMark \
-Dcom.sun.management.jmxremote.port=3334 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.local.only=false \
-jar /opt/dispatcher_node/latest/lib/dispatchernode-{{dispatcher_node_version}}-all.jar \
--daq.config.environment={{daq_environment}}
Restart=on-failure
RestartSec=3s
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
OOMScoreAdjust=-500
LimitNOFILE=500000
LimitMEMLOCK=infinity
LimitNPROC=infinity
LimitAS=infinity
CPUAccounting=true
CPUShares=2048
[Install]
WantedBy=multi-user.target

View File

@@ -1,46 +0,0 @@
[Unit]
Description=Dispatcher REST Server
After=network.target
PartOf=daq-dispatcher-node.service
[Service]
User=daqusr
ExecStart=/usr/lib/jvm/jre/bin/java \
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED \
--add-opens java.management/sun.management=ALL-UNNAMED \
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED \
--add-opens java.base/java.nio=ALL-UNNAMED \
--add-opens java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens java.base/java.lang=ALL-UNNAMED \
--add-modules jdk.unsupported \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseZGC \
-XX:ConcGCThreads=8 \
-Djava.util.concurrent.ForkJoinPool.common.parallelism=16 \
-Duser.timezone=Europe/Zurich \
-Xms128M \
-Xmx1G \
-Xmn64M \
-Xss256k \
-DDirectMemoryAllocationThreshold=50MB \
-DDirectMemoryCleanerThreshold=0.7 \
-XX:MaxDirectMemorySize=1G \
-XX:+ExitOnOutOfMemoryError \
-jar {{binaries_install_dir}}/lib/dispatcherrest-{{binaries_version}}-all.jar \
--daq.config.environment={{daq_environment}} \
--server.port=8081
Restart=on-failure
RestartSec=3s
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
OOMScoreAdjust=-500
LimitNOFILE=infinity
LimitMEMLOCK=infinity
LimitNPROC=infinity
LimitAS=infinity
CPUAccounting=true
CPUShares=2048
[Install]
WantedBy=multi-user.target

View File

@@ -1,47 +0,0 @@
[Unit]
Description=Query Node
After=network.target local-fs.target tuned.service
[Service]
User=daqusr
ExecStart=/usr/lib/jvm/jre/bin/java --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED \
--add-opens java.management/sun.management=ALL-UNNAMED \
-Xms8G \
-Xmx16G \
-Xmn4G \
-Xss256k \
-DDirectMemoryAllocationThreshold=2KB \
-DDirectMemoryCleanerThreshold=0.7 \
-XX:+ExitOnOutOfMemoryError \
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED \
--add-opens java.base/java.nio=ALL-UNNAMED \
--add-opens java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens java.base/java.lang=ALL-UNNAMED \
--add-modules jdk.unsupported \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseZGC \
-XX:ConcGCThreads={{ number_of_cores }} \
-Djava.util.concurrent.ForkJoinPool.common.parallelism={{fork_join_pool_parallelism}} \
-Duser.timezone=Europe/Zurich \
-XX:MaxDirectMemorySize=64G \
-Dcom.sun.management.jmxremote.port=3336 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.local.only=false \
-jar {{binaries_install_dir}}/lib/querynode-{{query_node_version}}-all.jar \
--daq.config.environment={{daq_environment}}
Restart=on-failure
RestartSec=3s
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
OOMScoreAdjust=-500
LimitNOFILE=500000
LimitMEMLOCK=infinity
LimitNPROC=infinity
LimitAS=infinity
#CPUAccounting=true
#CPUShares=2048
[Install]
WantedBy=multi-user.target

View File

@@ -1,52 +0,0 @@
[Unit]
Description=Query Node
After=network.target local-fs.target tuned.service
[Service]
User=daqusr
ExecStart=/usr/java/jdk1.8.0_162/bin/java -XX:+CMSClassUnloadingEnabled \
-XX:+UseThreadPriorities \
-Xms8G \
-Xmx16G \
-Xmn4G \
-DDirectMemoryAllocationThreshold=2KB \
-XX:MaxDirectMemorySize=64G \
-DDirectMemoryCleanerThreshold=0.7 \
-XX:+ExitOnOutOfMemoryError \
-Xss256k \
-XX:StringTableSize=1000003 \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:SurvivorRatio=8 \
-XX:MaxTenuringThreshold=1 \
-XX:CMSInitiatingOccupancyFraction=75 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:+UseTLAB \
-XX:+PerfDisableSharedMem \
-XX:CMSWaitDuration=10000 \
-XX:+CMSParallelInitialMarkEnabled \
-XX:+CMSEdenChunksRecordAlways \
-XX:CMSWaitDuration=10000 \
-XX:+UseCondCardMark \
-Dcom.sun.management.jmxremote.port=3336 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.local.only=false \
-jar /opt/query_node/latest/lib/querynode-{{query_node_version}}-all.jar \
--daq.config.environment={{daq_environment}}
Restart=on-failure
RestartSec=3s
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
OOMScoreAdjust=-500
LimitNOFILE=500000
LimitMEMLOCK=infinity
LimitNPROC=infinity
LimitAS=infinity
CPUAccounting=true
CPUShares=2048
[Install]
WantedBy=multi-user.target

View File

@@ -1,46 +0,0 @@
[Unit]
Description=Query REST Server
After=network.target
PartOf=daq-query-node.service
[Service]
User=daqusr
ExecStart=/usr/lib/jvm/jre/bin/java \
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED \
--add-opens java.management/sun.management=ALL-UNNAMED \
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED \
--add-opens java.base/java.nio=ALL-UNNAMED \
--add-opens java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens java.base/java.lang=ALL-UNNAMED \
--add-modules jdk.unsupported \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseZGC \
-XX:ConcGCThreads=8 \
-Djava.util.concurrent.ForkJoinPool.common.parallelism=16 \
-Duser.timezone=Europe/Zurich \
-Xms1G \
-Xmx12G \
-Xmn1G \
-Xss256k \
-DDirectMemoryAllocationThreshold=50MB \
-DDirectMemoryCleanerThreshold=0.7 \
-XX:MaxDirectMemorySize=1G \
-XX:+ExitOnOutOfMemoryError \
-jar {{binaries_install_dir}}/lib/queryrest-{{binaries_version}}-all.jar \
--daq.config.environment={{daq_environment}} \
--server.port=8080
Restart=on-failure
RestartSec=3s
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
OOMScoreAdjust=-500
LimitNOFILE=infinity
LimitMEMLOCK=infinity
LimitNPROC=infinity
LimitAS=infinity
CPUAccounting=true
CPUShares=2048
[Install]
WantedBy=multi-user.target

View File

@@ -1 +0,0 @@
backend.default={{backend_default}}

View File

@@ -1,25 +0,0 @@
Bag Attributes
friendlyName: ca
localKeyID: 54 69 6D 65 20 31 35 36 38 36 32 32 33 34 39 32 31 31
subject=/CN=Elastic Certificate Tool Autogenerated CA
issuer=/CN=Elastic Certificate Tool Autogenerated CA
-----BEGIN CERTIFICATE-----
MIIDSjCCAjKgAwIBAgIVALSBEnmcvNWcKOgb37AwpamramBkMA0GCSqGSIb3DQEB
CwUAMDQxMjAwBgNVBAMTKUVsYXN0aWMgQ2VydGlmaWNhdGUgVG9vbCBBdXRvZ2Vu
ZXJhdGVkIENBMB4XDTE5MDkxNjA4MjQ1N1oXDTIyMDkxNTA4MjQ1N1owNDEyMDAG
A1UEAxMpRWxhc3RpYyBDZXJ0aWZpY2F0ZSBUb29sIEF1dG9nZW5lcmF0ZWQgQ0Ew
ggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCWgsvEjbDXlFE6f1OlRg3o
9K9guQfFtio1S1IR+J8itRTc6QtVJ0YSoTLlArj1ZE5SeqctDUFQIwNCm/vD4/6d
kiUrXUamJW+73g1kJgBWi/kn2oMUAUOerSXNF7Y1vKkCwtG9lQqk4ZMt8dKGd0x0
5WkVgAORZrTMUNPYK2HIHG3DhsntHe84u8nR7xMZCuYza/mHC42OiCAEDeIu0R6v
zeQQY0tqxKcQE3FzGzv7fKX0FNjW+fFe4F8qqANy/+YsmIfce/iEd/7bOdIizG3V
P5e1W4jORbhTDnbw79rGgyzLHy0yGLn/o95ixXyM/3qO/aaB44KPIJlFBxz9MsM5
AgMBAAGjUzBRMB0GA1UdDgQWBBRwArjMBG5pxXwo1sWdoY+If3yAzjAfBgNVHSME
GDAWgBRwArjMBG5pxXwo1sWdoY+If3yAzjAPBgNVHRMBAf8EBTADAQH/MA0GCSqG
SIb3DQEBCwUAA4IBAQBNeV3zlAF12/sk4W9icWuuTV2lT6MobTouy0u8zJs4ciQ3
IGzXR6eGfvqulnVNOc754Ndmdj80WbV/WMnZY32IUsMpCebZkUmjYrSej2vozPWU
rc7AOkran3vicUN6J3OWnoWATo04HH0uJnM0HgP/oqelq0Iu4+5J+DP2OhX2kir0
OpktbBPOlhogT15Zt1kZTU3RuY1AL3TLSy9pvfB+bfrd7Z2AKJ9rdrSKgboB/gKv
czcNTwvGAW9m9LlwUqTFzwf0Vb/1bSi8Z93+pGzm2s1LmZ6Ubvr1mOZDjcGibMTm
pIepviI2Nzd6DosV6N9VqA7UxZWklCaXvqbTp72c
-----END CERTIFICATE-----
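To verify that the CA file deployed on the hosts matches this certificate, it can be inspected with openssl; a minimal sketch, assuming the file is installed at the path referenced by the beats configurations:

```bash
# Print subject, issuer and validity period of the deployed CA certificate
openssl x509 -in /etc/pki/tls/certs/elastic-stack-ca.pem -noout -subject -issuer -dates
```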

View File

@@ -1,176 +0,0 @@
################### Heartbeat Configuration Example #########################
# This file is an example configuration file highlighting only some common options.
# The heartbeat.reference.yml file in the same directory contains all the supported options
# with detailed comments. You can use it for reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/heartbeat/index.html
############################# Heartbeat ######################################
# Define a directory to load monitor definitions from. Definitions take the form
# of individual yaml files.
heartbeat.config.monitors:
# Directory + glob pattern to search for configuration files
path: ${path.config}/monitors.d/*.yml
# If enabled, heartbeat will periodically check the config.monitors path for changes
reload.enabled: false
# How often to check for changes
reload.period: 5s
# Configure monitors inline
#heartbeat.monitors:
#- type: http
#
# # List of urls to query
# urls: ["http://localhost:9200"]
#
# # Configure task schedule
# schedule: '@every 10s'
# Total test connection and data exchange timeout
#timeout: 16s
#==================== Elasticsearch template setting ==========================
setup.template.settings:
index.number_of_shards: 1
index.codec: best_compression
#_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
#host: "localhost:5601"
host: "https://realstuff.psi.ch:5601"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using Heartbeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["realstuff.psi.ch:9200"]
# Optional protocol and basic auth credentials.
protocol: "https"
username: "beats_user"
password: "beats123"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
# Optional protocol and basic auth credentials.
#protocol: "https"
#username: "elastic"
#password: "changeme"
#----------------------------- Logstash output --------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
processors:
- add_observer_metadata:
# Optional, but recommended geo settings for the location Heartbeat is running in
#geo:
# Token describing this location
#name: us-east-1a
# Lat, Lon
#location: "37.926868, -78.024902"
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# heartbeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
monitoring.enabled: true
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Heartbeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:
monitoring.cluster_uuid: "57-GhvUVR1WM1D-42XEFYg"
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
#================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

View File

@@ -1 +0,0 @@
server.port=8080

View File

@@ -1,15 +0,0 @@
[Unit]
Description=imageapi
[Service]
User=daqusr
ExecStart=/usr/lib/jvm/java-11/bin/java \
-Xms512m -Xmx2048m \
-Dspring.config.location=/etc/imageapi/application.properties \
-jar /opt/imageapi/{{imageapi_version}}/lib/imageapi-{{imageapi_version}}-all.jar
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target

View File

@@ -1,12 +0,0 @@
[Unit]
Description=iodata
[Service]
User=daqusr
ExecStart=/usr/lib/jvm/java-11/bin/java -Dspring.config.location=/etc/iodata/application.properties \
-jar /opt/iodata/latest/lib/iodata-{{query_node_version}}-all.jar
Restart=on-failure
RestartSec=3s
[Install]
WantedBy=multi-user.target

View File

@@ -1,8 +0,0 @@
rootDir=/gpfs/sf-data/sf-imagebuffer
baseKeyspaceName=daq_swissfel
binSize=3600000
# binSize=86400000
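# 3600000 ms = 1 hour per bin; the commented-out 86400000 ms would be daily bins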
# Not used right now
nodeId=1
spring.mvc.async.request-timeout = 3600000

View File

@@ -1,192 +0,0 @@
###################### Journalbeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The journalbeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/journalbeat/index.html
# For more available modules and options, please see the journalbeat.reference.yml sample
# configuration file.
#=========================== Journalbeat inputs =============================
journalbeat.inputs:
# Paths that should be crawled and fetched. Possible values are files and directories.
# When setting a directory, all journals under it are merged.
# When empty starts to read from local journal.
- paths: []
# The number of seconds to wait before trying to read again from journals.
#backoff: 1s
# The maximum number of seconds to wait before attempting to read again from journals.
#max_backoff: 20s
# Position to start reading from journal. Valid values: head, tail, cursor
seek: cursor
# Fallback position if no cursor data is available.
#cursor_seek_fallback: head
# Exact matching for field values of events.
# Matching for nginx entries: "systemd.unit=nginx"
#include_matches: []
include_matches:
- "systemd.unit=daq-dispatcher-node.service"
- "systemd.unit=daq-query-node.service"
- "systemd.unit=imageapi.service"
# Optional fields that you can specify to add additional information to the
# output. Fields can be scalar values, arrays, dictionaries, or any nested
# combination of these.
#fields:
# env: staging
#========================= Journalbeat global options ============================
#journalbeat:
# Name of the registry file. If a relative path is used, it is considered relative to the
# data path.
#registry_file: registry
#==================== Elasticsearch template setting ==========================
setup.template.settings:
index.number_of_shards: 1
#index.codec: best_compression
#_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
host: "https://realstuff.psi.ch:5601"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using Journalbeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["realstuff.psi.ch:9200"]
pipeline: "imagebuffer-log-pipeline"
# Optional protocol and basic auth credentials.
protocol: "https"
username: "beats_user"
password: "beats123"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
#----------------------------- Logstash output --------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# journalbeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
monitoring.enabled: true
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Journalbeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
monitoring.cluster_uuid: "57-GhvUVR1WM1D-42XEFYg"
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
#================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

View File

@@ -1,163 +0,0 @@
###################### Metricbeat Configuration Example #######################
# This file is an example configuration file highlighting only the most common
# options. The metricbeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/metricbeat/index.html
#========================== Modules configuration ============================
metricbeat.config.modules:
# Glob pattern for configuration loading
path: ${path.config}/modules.d/*.yml
# Set to true to enable config reloading
reload.enabled: false
# Period on which files under path should be checked for changes
#reload.period: 10s
#==================== Elasticsearch template setting ==========================
setup.template.settings:
index.number_of_shards: 1
index.codec: best_compression
#_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
tags: ["swissfel", "daq", "databuffer", "linux"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
host: "https://realstuff.psi.ch:5601"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using Metricbeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["realstuff.psi.ch:9200"]
# Optional protocol and basic auth credentials.
protocol: "https"
username: "beats_user"
password: "beats123"
ssl.certificate_authorities: ["/etc/pki/tls/certs/elastic-stack-ca.pem"]
#----------------------------- Logstash output --------------------------------
#output.logstash:
# The Logstash hosts
#hosts: ["localhost:5044"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# metricbeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
monitoring.enabled: true
# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Metricbeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
monitoring.cluster_uuid: "57-GhvUVR1WM1D-42XEFYg"
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:
#================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
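Metricbeat ships a `test` subcommand that validates this file and the output connection; a minimal sketch, assuming metricbeat is installed from the standard package and reads its default config path:

```bash
# Check configuration syntax and the Elasticsearch output defined above
metricbeat test config
metricbeat test output
```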

View File

@@ -1,52 +0,0 @@
# These files contain a list of monitor configurations identical
# to the heartbeat.monitors section in heartbeat.yml
# The .example extension on this file must be removed for it to
# be loaded.
- type: icmp # monitor type `icmp` (requires root) uses ICMP Echo Request to ping
# configured hosts
# Monitor name used for job name and document type.
#name: icmp
# Enable/Disable monitor
#enabled: true
# Configure task schedule using cron-like syntax
schedule: '*/5 * * * * * *' # exactly every 5 seconds like 10:00:00, 10:00:05, ...
# List of hosts to ping
hosts: ["data-api.psi.ch", "sf-data-api.psi.ch"]
# Configure IP protocol types to ping on if hostnames are configured.
# Ping all resolvable IPs if `mode` is `all`, or only one IP if `mode` is `any`.
ipv4: true
ipv6: true
mode: any
# Total running time per ping test.
timeout: 16s
# Waiting duration until another ICMP Echo Request is emitted.
wait: 1s
# The tags of the monitors are included in their own field with each
# transaction published. Tags make it easy to group servers by different
# logical properties.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# monitor output. Fields can be scalar values, arrays, dictionaries, or any nested
# combination of these.
#fields:
# env: staging
# If this option is set to true, the custom fields are stored as top-level
# fields in the output document instead of being grouped under a fields
# sub-dictionary. Default is false.
#fields_under_root: false
- type: tcp
name: tcp
enabled: true
schedule: '@every 10s'
hosts: ["data-api.psi.ch:22", "sf-data-api.psi.ch:22"]

View File

@@ -1,41 +0,0 @@
# Module: system
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.3/metricbeat-module-system.html
- module: system
period: 10s
metricsets:
- cpu
- load
- memory
- network
- process
- process_summary
- socket_summary
- entropy
- core
- diskio
- socket
cpu.metrics: ["percentages","normalized_percentages"]
process.include_top_n:
by_cpu: 5 # include top 5 processes by CPU
by_memory: 5 # include top 5 processes by memory
- module: system
period: 1m
metricsets:
- filesystem
- fsstat
processors:
- drop_event.when.regexp:
system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'
- module: system
period: 15m
metricsets:
- uptime
#- module: system
# period: 5m
# metricsets:
# - raid
# raid.mount_point: '/'

View File

@@ -1,20 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- name: Make sure the daq-dispatcher-node is stopped and disabled
systemd:
enabled: no
state: stopped
name: daq-dispatcher-node
- name: remove systemd file
file:
path: /etc/systemd/system/daq-dispatcher-node.service
state: absent
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Remove deployment directory
file:
path: /opt/dispatcher_node
state: absent

View File

@@ -1,21 +0,0 @@
- hosts: databuffer_cluster
become: true
tasks:
- name: Make sure the daq-query-node is stopped and disabled
systemd:
enabled: no
state: stopped
name: daq-query-node
- name: remove systemd file
file:
path: /etc/systemd/system/daq-query-node.service
state: absent
- name: Reload systemd unit files
systemd:
daemon_reload: yes
- name: Remove deployment directory
file:
path: /opt/query_node
state: absent
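Both removal playbooks above are run with ansible-playbook against the same inventory as the other cluster commands; a minimal sketch with hypothetical file names, since the actual playbook file names are not shown in this diff:

```bash
# Hypothetical file names -- substitute the actual playbook files from this repository
ansible-playbook remove_dispatcher_node.yml
ansible-playbook remove_query_node.yml
```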

View File

@@ -1,196 +0,0 @@
/*
Get the bsread stream address from an image name (e.g. for SARES20-PROF142-M3:FPICTURE)
caget SARES20-PROF142-M3:BSREADCONFIG
the current camserver/pipeline configuration can be found here: https://git.psi.ch/controls_highlevel_applications/cam_server_configuration/blob/master/configuration/pipeline_config/servers.json
*/
{
"sources": [
/* Gun Laser */
{"stream":"tcp://daqsf-sioc-cs-01.psi.ch:8160","split":4,"backend":"sf-imagebuffer"}
/* Machine */
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:8020","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:8030","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:8040","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9030","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9040","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9050","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9060","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9070","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9080","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9100","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9110","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9120","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9130","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9140","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9150","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9160","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9170","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9180","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-11.psi.ch:9190","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-12.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-12.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-13.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-13.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-14.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-14.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-14.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-21.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-21.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-31.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-31.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-31.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-41.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-41.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-51.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-61.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-61.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-62.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-62.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-63.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-63.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-64.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-64.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
/* Pump Laser */
,{"stream":"tcp://daqsf-sioc-cs-71.psi.ch:8090","split":4,"backend":"sf-imagebuffer"}
/* Athos Machine + Photonics */
,{"stream":"tcp://daqsf-sioc-cs-65.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-65.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-65.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
/* Aramis Photonics */
/* ,{"stream":"tcp://daqsf-sioc-cs-73.psi.ch:9000","split":4,"backend":"sf-imagebuffer"} */
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9030","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9040","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9050","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9060","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9070","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9090","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9100","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9120","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9130","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9140","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9150","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9160","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-74.psi.ch:9170","split":4,"backend":"sf-imagebuffer"}
/* Aramis ESA-Alvra */
,{"stream":"tcp://daqsf-sioc-cs-81.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-81.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-81.psi.ch:9030","split":4,"backend":"sf-imagebuffer"}
/* Aramis ESB-Bernina */
,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:8060","split":4,"backend":"sf-imagebuffer"}
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9000","split":4,"backend":"sf-imagebuffer"} */
,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9030","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9040","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9050","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9060","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9070","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-83.psi.ch:9080","split":4,"backend":"sf-imagebuffer"} */
,{"stream":"tcp://daqsf-sioc-cs-84.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
/* ,{"stream":"tcp://daqsf-sioc-cs-85.psi.ch:9000","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-85.psi.ch:9010","split":4,"backend":"sf-imagebuffer"} */
/* Aramis ESC-Cristallina (none yet) */
/* Athos Photonics */
/* ,{"stream":"tcp://daqsf-sioc-cs-a1.psi.ch:9000","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-a1.psi.ch:9010","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-a1.psi.ch:9020","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-a1.psi.ch:9030","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-a1.psi.ch:9040","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-b1.psi.ch:9000","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-b1.psi.ch:9010","split":4,"backend":"sf-imagebuffer"} */
/* ,{"stream":"tcp://daqsf-sioc-cs-b1.psi.ch:9020","split":4,"backend":"sf-imagebuffer"} */
/* Athos ESE-Maloja */
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9010","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9020","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9030","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9040","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9050","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9060","split":4,"backend":"sf-imagebuffer"}
,{"stream":"tcp://daqsf-sioc-cs-c2.psi.ch:9070","split":4,"backend":"sf-imagebuffer"}
/* Athos ESF-Furka */
,{"stream":"tcp://daqsf-sioc-cs-c6.psi.ch:9000","split":4,"backend":"sf-imagebuffer"}
/* PSSS SARFE10-PSSS059 */
,{"stream":"tcp://daqsf-daqsync-02.psi.ch:8890","split":4,"backend":"sf-imagebuffer", "labels": ["SARFE10-PSSS059"]}
,{"stream":"tcp://sf-daqsync-02:8889", "labels": ["SARFE10-PSSS059"]}
/* PMOS SATOP31-PMOS132-2D */
,{"stream":"tcp://daqsf-daqsync-03.psi.ch:9002","split":4,"backend":"sf-imagebuffer", "labels": ["SATOP31-PMOS132-2D"]}
,{"stream":"tcp://sf-daqsync-03.psi.ch:9001", "labels": ["SATOP31-PMOS132-2D"]}
/* SATES21-CAMS154-M1 */
,{"stream":"tcp://daqsf-daqsync-04.psi.ch:9000","split":4,"backend":"sf-imagebuffer", "labels": ["SATES21-CAMS154-M1"]}
,{"stream": "tcp://sf-daqsync-04.psi.ch:9001", "labels": ["SATES21-CAMS154-M1"]}
/* SATES24-CAMS161-M1 */
,{"stream":"tcp://daqsf-daqsync-04.psi.ch:9010","split":4,"backend":"sf-imagebuffer", "labels": ["SATES24-CAMS161-M1"]}
,{"stream": "tcp://sf-daqsync-04.psi.ch:9011", "labels": ["SATES24-CAMS161-M1"]}
/* SATES21-PATT-M1 */
,{"stream":"tcp://daqsf-daqsync-04.psi.ch:9002","split":4,"backend":"sf-imagebuffer", "labels": ["SATES21-PATT-M1"]}
,{"stream": "tcp://sf-daqsync-04.psi.ch:9003", "labels": ["SATES21-PATT-M1"]}
/* PSEN SARES11-SPEC125-M1 */
,{"stream":"tcp://daqsf-daqsync-05.psi.ch:9000","split":4,"backend":"sf-imagebuffer", "labels": ["SARES11-SPEC125-M1"]}
,{"stream": "tcp://sf-daqsync-05:9001", "labels": ["SARES11-SPEC125-M1"]}
/* SARES11-SPEC125-M2 */
,{"stream":"tcp://daqsf-daqsync-05.psi.ch:9010","split":4,"backend":"sf-imagebuffer", "labels": ["SARES11-SPEC125-M2"]}
,{"stream": "tcp://sf-daqsync-05:9011", "labels": ["SARES11-SPEC125-M2"]}
/* FLEX SARES12-CAMS128-M1 */
,{"stream":"tcp://daqsf-daqsync-05.psi.ch:9002","split":4,"backend":"sf-imagebuffer", "labels": ["SARES12-CAMS128-M1"]}
,{"stream": "tcp://sf-daqsync-05:9003", "labels": ["SARES12-CAMS128-M1"]}
/* PSEN SARES20-CAMS142-M1 */
,{"stream":"tcp://daqsf-daqsync-06.psi.ch:9002","split":4,"backend":"sf-imagebuffer", "labels": ["SARES20-CAMS142-M1"]}
,{"stream": "tcp://sf-daqsync-06:9003", "labels": ["SARES20-CAMS142-M1"]}
/* PSEN SARES20-CAMS142-M4 */
,{"stream":"tcp://daqsf-daqsync-06.psi.ch:9000","split":4,"backend":"sf-imagebuffer", "labels": ["SARES20-CAMS142-M4"]}
,{"stream": "tcp://sf-daqsync-06:9001", "labels": ["SARES20-CAMS142-M4"]}
/* PSEN SARES20-CAMS142-M5 */
,{"stream":"tcp://daqsf-daqsync-06.psi.ch:9010","split":4,"backend":"sf-imagebuffer", "labels": ["SARES20-CAMS142-M5"]}
,{"stream": "tcp://sf-daqsync-06:9011", "labels": ["SARES20-CAMS142-M5"]}
/* SARES20-PROF142-M1 */
,{"stream": "tcp://daqsf-daqsync-06.psi.ch:9005","split":4,"backend":"sf-imagebuffer", "labels": ["SARES20-PROF142-M1"]}
,{"stream": "tcp://sf-daqsync-06.psi.ch:9017", "labels": ["SARES20-PROF142-M1"]}
/* SAROP21-PPRM138 pipeline */
,{"stream": "tcp://sf-daqsync-06.psi.ch:9015", "labels": ["SAROP21-PPRM138"]}
/* SARES20-PROF141-M1 pipeline */
,{"stream": "tcp://sf-daqsync-06.psi.ch:9016", "labels": ["SARES20-PROF141-M1"]}
,{"stream": "tcp://daqsf-daqsync-06.psi.ch:9013","split":4,"backend":"sf-imagebuffer", "labels": ["SARES20-PROF141-M1"]}
/* SARES20-PROF146-M1 pipeline */
,{"stream": "tcp://sf-daqsync-06.psi.ch:9018", "labels": ["SARES20-PROF146-M1"]}
/* SARES20-DSDPPRM pipeline */
,{"stream": "tcp://sf-daqsync-06.psi.ch:9014", "labels": ["SARES20-DSDPPRM"]}
]
}