mirror of
https://github.com/paulscherrerinstitute/sf_daq_buffer.git
synced 2026-04-23 01:02:42 +02:00
# lib_cpp_h5_writer
This library is used to build C++ stream writers for H5 files. It focuses on the functionality
and performance needed to integrate high-performance detectors.

Key features:
- Get data from a ZMQ stream (Array-1.0 protocol) - [htypes specification](https://github.com/datastreaming/htypes)
- Write NeXus-compliant H5 files (user-specified format) - [nexus format](http://www.nexusformat.org/)
- Specify additional ZMQ stream parameters to write to the file.
- Receive additional parameters to write to the file via the REST API.
- Interact with the writer over the REST API (stop, kill, get_statistics).

# Table of contents
1. [Quick start using the library](#quick_start)
2. [Build](#build)
    1. [Conda build](#conda_build)
    2. [Local build](#local_build)
3. [Basic concepts](#basic_concepts)
    1. [ProcessManager](#process_manager)
    2. [ZmqReceiver](#zmq_receiver)
    3. [H5Writer](#h5_writer)
    4. [H5Format](#h5_format)
    5. [WriterManager](#writer_manager)
    6. [RingBuffer](#ring_buffer)
4. [REST interface](#rest_interface)
5. [Examples](#examples)

<a id="quick_start"></a>
# Quick start for using the library

To create your own stream writer you need to specify:
- The H5 file format you want to write.
- The mapping of REST input variables to your H5 format.
- Additional H5 format fields with default values or calculated fields (based on input or default values).
- The mapping between the stream header metadata and your H5 file format.
- Additional metadata that is transferred in the stream message header.

Under **sf/** and **csaxs/** you can see examples of this. Feel free to use either of these folders as a template.

**IMPORTANT**: We are using a monorepo for this project (all implementations should live in this git repository).
To create a new implementation, please add a folder to the root of the project (like sf/ and csaxs/).

The minimum you need to implement your own writer is:
- Writer runner (example: csaxs/csaxs\_h5\_writer.cpp)
- File format (example: csaxs/CsaxsFormat.cpp)
- Build file (example: csaxs/Makefile)

## Writer runner
Example: **csaxs/csaxs\_h5\_writer.cpp**

The runner is the actual executable you will run to create files. In the writer runner you:
- Specify and parse input parameters.
- Prepare your system for writing (create folders, switch the process user, etc.).
- Instantiate the file format object.
- Define the parameters that come in the stream header.
- Start the writer (mostly boilerplate code, if you do not need any special implementations).

## File format
Example: **csaxs/CsaxsFormat.cpp**

This is a class that extends the **H5Format** class. You need to specify:
- input\_value\_type (REST API value name to type mapping)
- default\_values (Fields in the file format that have default values)
- dataset\_move\_mapping (Move datasets to another place in the file if needed)
- file\_format (The hierarchical structure of your H5 format)

It is best to specify all the values above in the class constructor. All of them except file\_format can be empty,
but none of them should be null.

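As a rough illustration of the "empty but not null" rule, the sketch below models the four members with plain standard-library containers. `FormatSketch`, its member types, `is_usable` and `make_minimal_format` are hypothetical names invented for this example, and the flattened list of group paths is only a stand-in for the hierarchical format. The real declarations live in the **H5Format** class, and csaxs/CsaxsFormat.cpp shows an actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a file format definition. The member names follow
// the list above; the types are assumptions made for this sketch only.
struct FormatSketch {
    using StringMap = std::unordered_map<std::string, std::string>;

    std::shared_ptr<StringMap> input_value_type;      // REST value name -> type name
    std::shared_ptr<StringMap> default_values;        // field path -> default value
    std::shared_ptr<StringMap> dataset_move_mapping;  // temporary path -> final path
    std::shared_ptr<std::vector<std::string>> file_format;  // group paths, flattened

    // Allocate every member in the constructor so none of them is ever null.
    FormatSketch()
        : input_value_type(std::make_shared<StringMap>()),
          default_values(std::make_shared<StringMap>()),
          dataset_move_mapping(std::make_shared<StringMap>()),
          file_format(std::make_shared<std::vector<std::string>>()) {}
};

// Every member must be non-null; only file_format must also be non-empty.
bool is_usable(const FormatSketch& format) {
    return format.input_value_type && format.default_values &&
           format.dataset_move_mapping && format.file_format &&
           !format.file_format->empty();
}

// Smallest format that passes the check: one group, everything else empty.
FormatSketch make_minimal_format() {
    FormatSketch format;
    format.file_format->push_back("/entry/data");
    assert(is_usable(format));
    return format;
}
```

A default-constructed `FormatSketch` fails `is_usable` only because its `file_format` is empty, which mirrors the rule that the other three members may stay empty.
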
The current cSAXS and SF formats are quite simple. As a reference, you can check the old cSAXS file format implementation:
[csaxs_cpp_h5_writer](https://github.com/paulscherrerinstitute/csaxs_cpp_h5_writer/blob/master/CsaxsFormat.cpp)

## Build file
Example: **csaxs/Makefile**

If you want to use Makefiles, you can copy one from an existing implementation (csaxs/) and rename the executable. If
you want something more sophisticated, you will have to provide it yourself.

You can also deploy your writer as an Anaconda package. In this case you need to include the conda-recipe folder
as well (see csaxs/conda-recipe).

<a id="build"></a>
# Build

**You need your compiler to support C++11.**

The easiest way to build the library is via Anaconda. If you are not familiar with Anaconda (and do not want to learn),
you can also install all the dependencies directly in your OS.

The base library is located in **lib/**. Change your current directory to lib/ and run one of:
- make (build the library for production)
- make clean (clean the previous build)
- make deploy (deploy the library to your local conda environment)
- make debug (build the library with debug prints in the standard output)
- make perf (build the library with performance measurements in the standard output)
- make test (build the tests)

The usual procedure would be:
- make test (build the tests)
- ./bin/execute_tests (execute the tests)
- make deploy (deploy the library)

You can then start building your executable. It is also a good idea to automate the base library build from your executable build system
(see csaxs/Makefile, lib target for example).

<a id="conda_build"></a>
## Conda build
If you use conda, you can create an environment with the needed libraries by running:

```bash
conda create -c paulscherrerinstitute --name <env_name> make cppzmq==4.3.0 hdf5==1.10.4 boost==1.61.0 gtest==1.8.1
```

After that you can activate your newly created environment:

```bash
conda activate <env_name>
```

and start linking your builds against the libraries. To do that you can use the environment variables Anaconda sets:

```bash
-L${CONDA_PREFIX}/lib  # for linking libraries you have installed with Anaconda
```

To run your executables inside the Anaconda environment, you also need to export the lib/ path in your environment variables:
```bash
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib
```

<a id="local_build"></a>
## Local build

If you decide not to use Anaconda, you will have to install the following libraries in your system:

- make
- cppzmq ==4.3.0
- hdf5 ==1.10.4
- boost ==1.61.0

<a id="basic_concepts"></a>
# Basic concepts
This chapter describes the basic concepts you need to understand in order to use the library.
If you need more advanced knowledge, please feel free to browse the code. The most important components
are discussed in the subchapters below.

**General overview**

Process and thread management is handled by the [ProcessManager](#process_manager). The process manager initializes,
starts and stops the 3 threads discussed below.

The writer has 3 threads:
- ZMQ receiving thread (listens for incoming ZMQ stream messages).
    - [ZmqReceiver](#zmq_receiver) is the only class really used here.
- H5 writer thread (writes the received data to disk).
    - [H5Writer](#h5_writer) is the base writer implementation that can be extended at will.
- REST thread (listens to incoming REST requests).
    - [REST interface](#rest_interface) describes how the REST interface works.

The communication bridges between threads are:
- REST to H5 thread: [WriterManager](#writer_manager).
- ZMQ to H5 thread: [WriterManager](#writer_manager) for process control and [RingBuffer](#ring_buffer) for data transfer.

The **config.cpp** file is the central place for setting fine-tuning parameters.

The ZMQ thread receives data from the stream, extracts it, and packs it (with additional metadata) into the ring buffer.
Meanwhile, the H5 thread listens for data in the ring buffer. When new data arrives, it writes this data into
temporary datasets (for performance reasons, the file format is written at the end).

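The hand-off between the receiving thread and the writing thread can be pictured as a bounded producer/consumer queue. The sketch below is a minimal illustration using only the C++11 standard library. `Frame`, `FrameRing` and `transfer_frames` are hypothetical names, and the blocking behaviour is an assumption of this example: it is not the library's [RingBuffer](#ring_buffer) and does not reflect its actual interface or memory layout.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame: the image bytes plus the metadata packed alongside them.
struct Frame {
    std::uint64_t frame_index;
    std::vector<char> data;
};

// Minimal bounded ring buffer with blocking push/pop (illustration only).
class FrameRing {
public:
    explicit FrameRing(std::size_t n_slots) : slots_(n_slots) { assert(n_slots > 0); }

    // Called by the receiving thread; blocks while the ring is full.
    void push(Frame frame) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return count_ < slots_.size(); });
        slots_[(head_ + count_) % slots_.size()] = std::move(frame);
        ++count_;
        not_empty_.notify_one();
    }

    // Called by the writing thread; blocks while the ring is empty.
    Frame pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return count_ > 0; });
        Frame frame = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --count_;
        not_full_.notify_one();
        return frame;
    }

private:
    std::vector<Frame> slots_;
    std::size_t head_ = 0;   // index of the oldest stored frame
    std::size_t count_ = 0;  // number of frames currently stored
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Runs one producer and one consumer thread and returns the frame indices in
// the order the consumer received them.
std::vector<std::uint64_t> transfer_frames(std::size_t n_frames, std::size_t n_slots) {
    FrameRing ring(n_slots);
    std::vector<std::uint64_t> written;

    std::thread receiver([&] {
        for (std::uint64_t i = 0; i < n_frames; ++i) {
            ring.push(Frame{i, std::vector<char>(16, 0)});
        }
    });
    std::thread writer([&] {
        for (std::size_t i = 0; i < n_frames; ++i) {
            written.push_back(ring.pop().frame_index);
        }
    });

    receiver.join();
    writer.join();
    return written;
}
```

Calling `transfer_frames(100, 4)` pushes 100 frames through a 4-slot ring. With a single producer and a single consumer the frames come out in the order they went in, and the small ring forces the producer to wait whenever the consumer falls behind.
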
When the end of writing is triggered (via the REST API, when the desired number of frames has been received, or when the user
terminates the process), the writer attempts to write the file format. If the format writing succeeds, the temporary
datasets are moved to their final place in the file format. If the format writing step fails for any reason, the data
remains in the temporary datasets and the user needs to fix the file manually (the goal is to preserve as much of the data as possible).

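The end-of-writing logic can be sketched as follows. This is a simplified model under stated assumptions, not the writer's code: the file is an in-memory map instead of an H5 file, and `FakeFile`, `finalize_file` and the `write_format` callback are hypothetical names introduced for the example. It only shows the ordering: try to write the format first, and move the temporary datasets only if that succeeded.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory stand-in for an H5 file: dataset path -> payload.
using FakeFile = std::map<std::string, std::string>;

// Tries to write the file format, then moves the temporary datasets to their
// final paths. Returns true if the file ended up in its final layout.
bool finalize_file(FakeFile& file,
                   const std::unordered_map<std::string, std::string>& move_mapping,
                   const std::function<void(FakeFile&)>& write_format) {
    try {
        write_format(file);
    } catch (const std::exception&) {
        // Format writing failed: keep the data where it is so nothing is lost.
        return false;
    }

    for (const auto& entry : move_mapping) {
        auto found = file.find(entry.first);
        if (found == file.end()) {
            continue;  // nothing was written to this temporary dataset
        }
        // The destination must not exist yet, otherwise data would be overwritten.
        assert(file.count(entry.second) == 0);
        file[entry.second] = std::move(found->second);
        file.erase(found);
    }
    return true;
}
```

If `write_format` throws, for example a `std::runtime_error`, the function returns `false` and the temporary datasets are left untouched, which corresponds to the case where the user has to fix the file manually.
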
<a id="process_manager"></a>
## ProcessManager

Not yet here :(

<a id="zmq_receiver"></a>
## ZmqReceiver
The stream receiver that gets your data from the stream. This is PSI specific, and currently supports only the **Array-1.0** protocol.
You pass the ZmqReceiver you would like to use in your writer runner, so it should be easy to implement your own if needed.

The protocol specification can be found here: [htypes specification](https://github.com/datastreaming/htypes)

<a id="stream_header_values"></a>
### Stream header values

In addition to the image in the stream, the receiver can also pass data defined in the stream header to the writer, for example:
- pulse_id (the pulse id for the current image)
- source (the source of the current image)
- etc.

These fields are specific to your input stream, and you specify them in your writer runner. You can define both scalars and arrays
(see **csaxs/sf\_h5\_writer.cpp**, variable **header\_values** for an example).

The allowed data types for these values are:

- "uint8"
- "uint16"
- "uint32"
- "uint64"
- "int8"
- "int16"
- "int32"
- "int64"
- "float32"
- "float64"

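To make the type names concrete, the sketch below maps each allowed name to its element size in bytes and computes the space one header value needs. `element_size` and `header_value_size` are hypothetical helpers written for this illustration and are not part of the library. The byte sizes follow directly from the type names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Size in bytes of one element of each allowed header type name.
std::size_t element_size(const std::string& type_name) {
    static const std::unordered_map<std::string, std::size_t> sizes = {
        {"uint8", 1},   {"uint16", 2},  {"uint32", 4}, {"uint64", 8},
        {"int8", 1},    {"int16", 2},   {"int32", 4},  {"int64", 8},
        {"float32", 4}, {"float64", 8},
    };
    auto found = sizes.find(type_name);
    if (found == sizes.end()) {
        throw std::invalid_argument("Unsupported header type: " + type_name);
    }
    return found->second;
}

// Bytes needed for one header value; n_elements is 1 for scalars and the
// array length for arrays.
std::size_t header_value_size(const std::string& type_name, std::size_t n_elements = 1) {
    assert(n_elements > 0);
    return element_size(type_name) * n_elements;
}
```

For example, a scalar "uint64" value needs 8 bytes, and an "int64" array with 4 elements needs 32 bytes. Any name outside the list above is rejected with `std::invalid_argument`.
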
These stream header parameters need to be specified when constructing your ZmqReceiver instance:
```cpp
auto header_values = shared_ptr<unordered_map<string, HeaderDataType>>(new unordered_map<string, HeaderDataType> {
    {"frame", HeaderDataType("uint64")},  // Scalar for frame number
    {"module_number", HeaderDataType("int64", n_modules)}  // Array of n_modules elements for module_number.
});

// Pass the header_values to the ZmqReceiver constructor.
ZmqReceiver receiver(connect_address, n_io_threads, receive_timeout, header_values);
```

Read the [H5Writer](#h5_writer) chapter to see where this data is written in the H5 file.
Knowing where the data is written is important to properly set up the **dataset\_move\_mapping**
in the file format. See chapter [H5Format](#h5_format) for more info.

<a id="h5_writer"></a>
## H5Writer

Not yet here :(

<a id="h5_format"></a>
## H5Format

The H5Format is the base class you need to extend to implement your file format. It specifies that the following variables need to be set:
- input\_value\_type (REST API value name to type mapping)
- default\_values (Fields in the file format that have default values)
- dataset\_move\_mapping (Move datasets to another place in the file if needed)
- file\_format (The hierarchical structure of your H5 format)

We will discuss each one in detail in this chapter.

### input\_value\_type

Not yet here :(

### default\_values

Not yet here :(

### dataset\_move\_mapping

Not yet here :(

### file\_format

Not yet here :(

<a id="writer_manager"></a>
## WriterManager

Not yet here :(

<a id="ring_buffer"></a>
## RingBuffer

Not yet here :(

<a id="rest_interface"></a>
# REST interface

Not yet here :(

<a id="examples"></a>
# Examples

Not yet here :(