Overview

This repository holds the bsread data sources recorded by the SwissFEL Data Buffer and their data policies.

Workflow - Request New Sources / Change Sources

  1. If you don't have permissions to push to this repo, contact your supporter.
  2. Clone the repo.
  3. Push a branch with your name.
  4. Make a merge request.

If you know what you are doing and have permissions, push to master directly.
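
For illustration, a minimal shell sketch of this workflow (the repository URL, directory, and branch name are placeholders):

git clone <repository-url>           # placeholder URL
cd <repository-directory>
git checkout -b <your-name>          # branch named after you
# edit the relevant .sources / .policies files
git commit -am "describe your change"
git push origin <your-name>          # then open a merge request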

Administration

If there are new changes to this configuration (either through a merge request or a direct commit), the configuration needs to be uploaded to the Data Buffer. To do so, clone or pull the latest changes from this repository and run ./bufferutils upload using the bufferutils script that comes with this repository (you have to be on a machine that has /opt/gfa/python available!).
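
For illustration, a hedged sketch of this update procedure (the checkout path is a placeholder):

# on a machine that has /opt/gfa/python available
cd <path-to-repository-clone>
git pull
./bufferutils upload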

Uploading Sources

To upload the configuration and start recording all configured sources, use:

./bufferutils upload

Listing labeled sources

./bufferutils list --label

Note: Labeled sources can be individually stopped and/or restarted via the stop/restart subcommands. A label can be attached to more than one source; in that case, a stop or restart affects all sources with the given label.

Restarting a labeled source

./bufferutils restart --label <label>

Stopping a labeled source

./bufferutils stop --label <label>

Stopping sources by backend

Sources of a specific backend can be stopped like this (currently only the "sf-databuffer" backend is supported):

./bufferutils stop --backend sf-databuffer

Stopping all sources

./bufferutils stop --all

Configuration Management

The configuration change workflow is described in the following Memorandum: http://i.psi.ch/PJkql

Data Sources

Data source files are JSON-formatted text files defining IOCs/sources and are stored in the sources folder.

Each group should maintain their own list of IOCs in (a) separate file(s). The filename should start with the group's short name (e.g. rf.sources, llrf.sources). A group might maintain more than one file. In this case all files should start with the group's short name followed by an underscore and then the rest of the name. The suffix should always be .sources (e.g. llrf.sources, llrf_group1.sources, ...).

Source files may contain comments like /* my comment */; these are ignored.

The JSON structure is defined as follows:

{
    "sources": [{
        /* IOC using default port 9999 */
        "stream": "tcp://sf-ioc-xyz.psi.ch:9999"
    }, {
        /* IOC using non default port 20000 */
        "stream": "tcp://sf-ioc-abc.psi.ch:20000"
    }]
}

Explanation

  • sources: List of all IOCs/sources defined in this file.
  • stream: The IOC/source to be recorded (tcp://<hostname>:<port> - the default bsread port is 9999).

Data Policies

Policy files are JSON-formatted text files defining the data reduction scheme to be applied to data points; they are stored in the policies folder. Assigning policies to data points is done using regular expressions applied to channel names (see: Pattern, more precisely using Matcher.find()), where a longer match sequence is considered superior to a shorter match sequence.
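
For illustration only, a minimal Python sketch of this selection rule (the Data Buffer itself uses Java's Pattern/Matcher.find(); Python's re.search behaves analogously here, and the function below is hypothetical, not part of bufferutils):

import re

def select_policy(channel, policies):
    """Return the policy whose pattern yields the longest match
    sequence (re.search result) on the channel name, or None."""
    best, best_len = None, -1
    for policy in policies:
        match = re.search(policy["pattern"], channel)
        if match and len(match.group(0)) > best_len:
            best, best_len = policy, len(match.group(0))
    return best

# '^SINDG01' (match sequence 'SINDG01') beats '^SINDG' (match
# sequence 'SINDG') for channels starting with 'SINDG01'.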

Data points are tagged with the data reduction scheme at the time of their creation/arrival. Accordingly, changes in data policies only affect new data points and have no effect on already persisted data points.

The default data policy is defined in default.policies and is applied to every data point unless there is a data policy providing a superior match sequence. Currently, the default retention time is one day.
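
A plausible shape for default.policies, assuming it uses the same JSON structure as group policy files (the actual file content is an assumption; only the one-day retention is documented above):

{
    "policies": [{
        /* empty pattern matches every channel with match length 0,
           so any policy with a real match wins (assumption) */
        "pattern": "",
        "data_reduction": {
            "default": [{
                "ttl": "P1D",
                "modulo": 1
            }]
        }
    }]
}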

Each group should maintain their own list of data policies in (a) separate file(s). The filename should start with the group's short name (e.g. rf.policies, llrf.policies). A group might maintain more than one file. In this case all files should start with the group's short name followed by an underscore and then the rest of the name. The suffix should always be .policies (e.g. llrf.policies, llrf_group1.policies, ...).

Policy files may contain comments like /* my comment */; these are ignored.

The JSON structure is defined as follows:

{
    "policies": [{
        "pattern": "^SINDG01",
        "data_reduction": {
            "default": [{
                "ttl": "P1D",
                "modulo": 1
            }],
            "scalar": [{
                "ttl": "P2D",
                "modulo": 1
            }, {
                "ttl": "P7D",
                "modulo": 100
            }]
        }
    }]
}

Explanation

  • policies: List of all data policies defined in this file.
  • pattern: The regular expression applied to channel names.
  • data_reduction: The data reduction applied to the channels matching the above regular expression. This section can contain the objects default (applied to all data points unless a more specific scheme is defined), scalar (applied to scalar data points, overriding the default scheme), waveform (applied to waveform data points, overriding the default scheme), and image (applied to image data points, overriding the default scheme).
  • ttl: The time-to-live of the data point (after that time the data point will be deleted). The ttl is defined as an ISO-8601 duration (e.g., PT6H for 6 hours, P2D for 2 days, etc.). Using a ttl of -1 disables recording of channels matching the above pattern. The resolution is in seconds, and thus the minimum ttl is 1 second.
  • modulo: Defines the x-th data point (based on the pulse-id) the ttl should be applied to (e.g., modulo 1: the ttl is applied to every data point; modulo 10: to every 10th data point; modulo 100: to every 100th data point). The maximum matching ttl is always applied (pulse-id 1000 matches both modulo 1 and modulo 100, and thus "P7D" would be applied in the example above).
  • offset: (default: 0) Can be used to define an offset in the pulse-id match (e.g. modulo: 10, offset: 0 matches pulse-ids 0, 10, 20, ... whereas modulo: 10, offset: 2 matches pulse-ids 2, 12, 22, ...). See the sketch after this list.
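
As referenced above, an illustrative policy combining modulo and offset, plus one using ttl -1 (the channel patterns are made up, and the exact JSON encoding of -1 is an assumption):

{
    "policies": [{
        /* keep pulse-ids 2, 12, 22, ... for a week */
        "pattern": "^SINXY01",
        "data_reduction": {
            "default": [{
                "ttl": "P7D",
                "modulo": 10,
                "offset": 2
            }]
        }
    }, {
        /* ttl of -1 disables recording for matching channels
           (the exact encoding of -1 is an assumption) */
        "pattern": "^SINXY02",
        "data_reduction": {
            "default": [{
                "ttl": "-1",
                "modulo": 1
            }]
        }
    }]
}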

Overwriting Policies

It is possible to overwrite existing data policies. Assigning policies to data points is done using regular expressions applied to channel names (see: Pattern, more precisely using Matcher.find()), where a longer match sequence is considered superior to a shorter match sequence (i.e., matching channel name 'SINDG01-RCIR-PUP10:SIG-AMPLT' against the patterns '^SINDG' and '^SINDG01' results in match sequences 'SINDG' vs. 'SINDG01', and thus pattern '^SINDG01' is considered superior). It is also possible to define a data policy for a specific channel by using the channel's exact name (see the sketch below). In case several files define the same pattern, the one defined in the first file (given by the natural ordering of the file system) will be used.
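
For instance, a policy pinned to one specific channel via its exact name might look like this (anchoring with '^' and '$' is an assumption, and the ttl is arbitrary):

{
    "policies": [{
        /* exact channel name as pattern */
        "pattern": "^SINDG01-RCIR-PUP10:SIG-AMPLT$",
        "data_reduction": {
            "default": [{
                "ttl": "P7D",
                "modulo": 1
            }]
        }
    }]
}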

General Rules (Words of Care)

Using regular expressions to assign data policies to data points should assist users and facilitate their work by letting them define just a few rules to handle their full data set. However, users are encouraged not to overstress the power of regular expressions. As a general rule:

  • Do not use wildcards '.*' at the beginning or end of a pattern, as this would always result in match sequences covering the complete channel name, making overwrites impossible (Examples 3 and 4 explain why).
  • Use or-connections like '^SINDG|^SINOG|^SINUG' with care.
  • Do not use complex regular expressions (things difficult to understand, like '((^|, )(part1|part2|part3))+$').

Example Overwrites

Example 1 (Dos)

Let's assume the channel 'SINDG01-RCIR-PUP10:SIG-AMPLT' is matched against the patterns '^SINDG' and '^SINDG01'. This results in the match sequences:

  • '^SINDG': 'SINDG'
  • '^SINDG01': 'SINDG01'

and thus the data policy of '^SINDG01' will be applied to the channel.

Example 2 (Dos with care)

Let's assume a user is responsible for channels starting with 'SINDG', 'SINOG', or 'SINUG'. This user defines a base data policy for all these channels by using the pattern '^SINDG|^SINOG|^SINUG'. Let's further assume this user wants to temporarily override the data policies for all channels starting with 'SINDG01' (e.g. because the machine shows unexpected behavior and channels starting with 'SINDG01' help to identify the problem). This is possible by defining a new data policy with the pattern '^SINDG01'. Let's match the channels 'SINDG01-RCIR-PUP10:SIG-AMPLT' and 'SINOG01-RCIR-PUP10:SIG-AMPLT' against these patterns:

  • '^SINDG|^SINOG|^SINUG': 'SINDG' for 'SINDG01-RCIR-PUP10:SIG-AMPLT' and 'SINOG' for 'SINOG01-RCIR-PUP10:SIG-AMPLT'
  • '^SINDG01': 'SINDG01' for 'SINDG01-RCIR-PUP10:SIG-AMPLT' and no match for 'SINOG01-RCIR-PUP10:SIG-AMPLT'

and thus the data policy of '^SINDG01' will be applied to 'SINDG01-RCIR-PUP10:SIG-AMPLT' and the policy of '^SINDG|^SINOG|^SINUG' to 'SINOG01-RCIR-PUP10:SIG-AMPLT'.

Example 3 (Don'ts)

Let's assume the channel 'SINDG01-RCIR-PUP10:SIG-AMPLT' would be matched against the patterns '^SINDG.*' and '^SINDG01.*'. This would result in the match sequences:

  • '^SINDG.*': 'SINDG01-RCIR-PUP10:SIG-AMPLT'
  • '^SINDG01.*': 'SINDG01-RCIR-PUP10:SIG-AMPLT'

as both patterns match the full channel name. In this case, ties could be broken by considering the length of the pattern, where longer patterns are considered superior to shorter patterns (giving precedence to '^SINDG01.*'). Example 4 shows why this does not work in all cases.

Example 4 (Don'ts)

Let's assume the channels 'SINDG01-RCIR-PUP10:SIG-AMPLT' and 'SINOG01-RCIR-PUP10:SIG-AMPLT' would be matched against the patterns '^SINDG.*|^SINOG.*|^SINUG.*' and '^SINDG01.*'. This would result in the match sequences:

  • '^SINDG.*|^SINOG.*|^SINUG.*': 'SINDG01-RCIR-PUP10:SIG-AMPLT' and 'SINOG01-RCIR-PUP10:SIG-AMPLT'
  • '^SINDG01.*': 'SINDG01-RCIR-PUP10:SIG-AMPLT' (no match for 'SINOG01-RCIR-PUP10:SIG-AMPLT')

as the patterns match the full channel names. Again, ties could be broken by considering the length of the pattern, resulting in '^SINDG.*|^SINOG.*|^SINUG.*' being applied to both channels as it is the longer pattern. Certainly not what the user intended. One could try to overcome this problem by splitting or-connections and using separate but equivalent data policies internally (i.e., a separate data policy for '^SINDG.*', '^SINOG.*', and '^SINUG.*'). However, regex allows for expressions which are difficult to split (not to mention to understand, e.g., '((^|, )(part1|part2|part3))+$'). Therefore, keep to the general rules of pattern definition mentioned above.

Example Policies

Example 1

Group XYZ defines their default data policy for SINDG01 to be 2 days.

{
   "pattern":"^SINDG01",
   "data_reduction":{
      "default":[
         {
            "ttl":"P2D",
            "modulo":1
         }
      ]
   }
}

All data points (scalar, waveform, image) of channels starting with SINDG01 ('^' means 'starts with' in regex) have a ttl of 2 days.

Example 2

Let's assume group XYZ wants to track a problem in SINDG01 and therefore wants to keep waveforms for 3 days, giving them time to analyze the problem.

{
   "pattern":"^SINDG01",
   "data_reduction":{
      "waveform":[
         {
            "ttl":"P3D",
            "modulo":1
         }
      ],
      "default":[
         {
            "ttl":"P2D",
            "modulo":1
         }
      ]
   }
}

All waveform data points of channels starting with SINDG01 have a ttl of 3 days. All other data points (scalar, image) have a ttl of 2 days.

Example 3

Let's assume group XYZ was not able to track down the problem using 3 days' worth of data. Therefore they decide to extend the time horizon to 10 days but realize there is not enough storage space available. However, they are confident they can find the problem using only every 100th data point.

{
   "pattern":"^SINDG01",
   "data_reduction":{
      "waveform":[
         {
            "ttl":"P2D",
            "modulo":1
         },
         {
            "ttl":"P10D",
            "modulo":100
         }
      ],
      "default":[
         {
            "ttl":"P2D",
            "modulo":1
         }
      ]
   }
}

The default ttl of waveform data points of channels starting with SINDG01 is 2 days and every 100th of these data points has a ttl of 10 days. All other data points (scalar, image) have a ttl of 2 days.

Example 4

Let's assume group XYZ solved the problem and (as exemplary DAQ users) decide to release the additional storage space for other DAQ users. However, they still want to track, for 10 days, every 1000th scalar data point of channels starting with SINDG01 and ending with AMPL (as these provide enough information to verify that the problem did not reappear).

{
   "pattern":"^SINDG01",
   "data_reduction":{
      "default":[
         {
            "ttl":"P2D",
            "modulo":1
         }
      ]
   }
}

{
   "pattern":"^SINDG01.*AMPL$",
   "data_reduction":{
      "scalar":[
         {
            "ttl":"P2D",
            "modulo":1
         },
         {
            "ttl":"P10D",
            "modulo":1000
         }
      ],
      "default":[
         {
            "ttl":"P2D",
            "modulo":1
         }
      ]
   }
}

The default ttl for data points of channels starting with SINDG01 is 2 days. All scalar data points of channels starting with SINDG01 and ending with AMPL have a ttl of 2 days, and every 1000th of these data points has a ttl of 10 days.
