diff --git a/README.md b/README.md index 317bd80..e91cb09 100644 --- a/README.md +++ b/README.md @@ -1,277 +1,30 @@ -[![Build Status](https://api.travis-ci.org/paulscherrerinstitute/lib_cpp_h5_writer.svg?branch=master)](https://travis-ci.org/paulscherrerinstitute/lib_cpp_h5_writer/) - -# lib_cpp_h5_writer -This library is used for creating C++ based stream writer for H5 files. It focuses on the functionality -and performance needed for high performance detectors integrations. - -Key features: -- Get data from ZMQ stream (Array-1.0 protocol) - [htypes specification](https://github.com/datastreaming/htypes) -- Write Nexus complaint H5 file (User specified) - [nexus format](http://www.nexusformat.org/) -- Specify additional zmq stream parameters to write to file. -- Receive additional parameters to write to file via REST api. -- Interaction with the writer over REST api (stop, kill, get_statistics). - -# Table of content -1. [Quick start using the library](#quick_start) -2. [Build](#build) - 1. [Conda build](#conda_build) - 2. [Local build](#local_build) -3. [Basic concepts](#basic_concepts) - 1. [ProcessManager](#process_manager) - 2. [ZmqReceiver](#zmq_receiver) - 3. [H5Writer](#h5_writer) - 4. [H5Format](#h5_format) - 5. [WriterManager](#writer_manager) - 5. [RingBuffer](#ring_buffer) -4. [REST interface](#rest_interface) -5. [Examples](#examples) - - - -# Quick start for using the library - -To create your own stream writer you need to specify: -- The H5 file format you want to write. -- The mapping of REST input variables to your H5 format. -- Additional H5 format fields with default values or calculated fields (based on input or default values). -- The mapping between the stream header metadata and your H5 file format. -- Additional metadata that is transfer in the stream message header. - -Under **sf/** and **csaxs/** you can see examples of this. Feel free to use any of this folders as a template. 
- -**IMPORTANT**: We are using a monorepo for this project (all implementations should live in this git repository). -To create a new implementation, please add a folder to the root of the proejct (like sf/ and csaxs/). - -The minimum you need to implement your own writer is: -- Writer runner (example: csaxs/csaxs\_h5\_writer.cpp) -- File format (example: csaxs/CsaxsFormat.cpp) -- Build file (example: csaxs/Makefile) - -## Writer runner -Example: **csaxs/csaxs\_h5\_writer.cpp** - -The runner is the actual executable you will run to create files. In the writer runner you: -- Specify and parse input parameters. -- Prepare your system for writing (creating folders, switch process user etc.) -- Instantiate the file format object. -- Define the parameters that come in the stream header. -- Start the writer (mostly boilerplate code, if you do not need any special implementations). - -## File format -Example: **csaxs/CsaxsFormat.cpp** - -This is a class that extends the **H5Format** class. You need to specify: -- input\_value\_type (REST API value name to type mapping) -- default\_values (Fields in the file format that have default values) -- dataset\_move\_mapping (Move datasets to another place in the file if needed) -- file_format (The hierarchical structure of your H5 format) -It is best to specify all the values above in the class constructor. Some values (all except file_format) can be empty, -but they should not be null. - -The current cSAXS and SF formats are quite simple. As a reference, you can check the old cSAXS file format implementation: -[csaxs_cpp_h5_writer](https://github.com/paulscherrerinstitute/csaxs_cpp_h5_writer/blob/master/CsaxsFormat.cpp) - -## Build file -Example: **csaxs/Makefile** - -If you want to use Makefiles, you can basically copy one from an existing implementation (csaxs/) and rename the executable. In -case you want something more sophisticated you will have to provide it yourself. 
- -In addition, you can deploy your writer also as an anaconda package - you will need to include the conda-recipe folder in this case -as well (see csaxs/conda-recipe). - - -# Build - -**You need your compiler to support C++11.** - -The easiest way to build the library is via Anaconda. If you are not familiar with Anaconda (and do not want to learn), -you can also install all the dependencies directly in your os. - -The base library is located in **lib/**. Change you current directory to lib/ and: -- make (build the library for production) -- make clean (clean the previous build) -- make deploy (deploy library to your local conda environemnt) -- make debug (build library with debug prints in the standard output) -- make perf (build the library with performance measurements in the standard output) -- make test (create tests) - -The usual procedure would be: -- make test (build the tests) -- ./bin/execute_tests (execute the tests) -- make deploy (deploy the library) - -You can then start building your executable. It is also a good idea to automate the base library build from your executable build system -(see csaxs/Makefile, lib target for example). - - -## Conda build -If you use conda, you can create an environment with the needed library by running: - -```bash -conda create -c paulscherrerinstitute --name make cppzmq==4.3.0 hdf5==1.10.4 boost==1.61.0 gtest==1.8.1 -``` - -After that you can just source you newly created environment: - -```base -conda activate -``` - -and start linking your builds against the libraries. 
To do that you can use the environament variables Anaconda sets: - -```bash --L${CONDA_PREFIX}/lib (for linking libraries you have installed with Anaconda) -``` - -To run you executables inside the Anaconda environment, you will need also to export the lib/ path in your env variables: -```bash -export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib -``` - - -## Local build - -If you decide not to use Anaconda, you will have to install the following libraries in your system: - -- make -- cppzmq ==4.3.0 -- hdf5 ==1.10.4 -- boost ==1.61.0 - - -# Basic concepts -In this chapter we will describe the basic concepts you need to get a hold off in order to use the library. -In case more advanced knowledge is needed, please feel free to browse the code. The most important components -are discussed in subchapters below. - -**General overview** - -The process and thread management is taken care by the [ProcessManager](#process_manager). The process manager initializes, -starts and stops the 3 threads discussed below. - -The writer has 3 threads: -- ZMQ receiving thread (listens for incoming ZMQ stream messages). - - [ZmqReceiver](#zmq_receiver) is the only class really used here. -- H5 writer thread (writes the received data to disk). - - [H5Writer](#h5_writer) is the base writer implementation that can be extended at will. -- REST thread (listens to incoming REST requests). - - [REST interface](#rest_interface) describes how the REST interface works. - -The communication bridges between threads are: -- REST to H5 thread: [WriterManager](#writer_manager). -- ZMQ to H5 thread: [WriterManager](#writer_manager) for process control and [RingBuffer](#ring_buffer) for data transfer. - -In order to have a central place where to set fine tunning parameters, the **config.cpp** file is used. - -The ZMQ thread receives data from the stream, it extracts it and packs it (with additional metadata) into the ring buffer. -Meanwhile, the H5 thread is listening for data in the ring buffer. 
When new data arrives, it writes this data down into -temporary datasets (for performance reasons we write the file format in the end). - -When the end of the writing is triggered (via the REST api, when the desired number of frames are received, or when the user -terminates the process), an attempt to write the file format is performed. If the format writing is successful, the temporary -datasets are moved to their final place in the file format. If the format writing step fails for any reason, the data will -remain in the temporary datasets and the user will need to fix the file manually (the goal is to preserve the data as much as possible). - - -## ProcessManager - -Not yet here :( - - -## ZmqReceiver -The stream receiver that gets your data from the stream. This is PSI specific, and currently supports only the **Array-1.0** protocol. -You pass the ZmqReceiver you would like to use in your writer runner, so it should be easy to implement your own if needed. - -The protocol specification can be found here: [htypes specification](https://github.com/datastreaming/htypes) - - -### Stream header values - -In addition to the image in the stream, the receiver can pass to the writer also data defined in the header of the stream, for example: -- pulse_id (The pulse id for the current image) -- source (source of the currect image) -- etc. - -This fields are specific to your input stream, and you specify them in your writer runner. You can define both scalars and arrays -(see **csaxs/sf\_h5\_writer.cpp**, variable **header\_values** for an example). 
- -The allowed data types for this values are: - -- "uint8" -- "uint16" -- "uint32" -- "uint64" -- "int8" -- "int16" -- "int32" -- "int64" -- "float32" -- "float64" - -This stream header parameters need to be specified when constructing your ZmqReceiver instance: -```cpp -auto header_values = shared_ptr>(new unordered_map { - {"frame", HeaderDataType("uint64")}, // Scalar for frame number - {"module_number", HeaderDataType("int64", n_modules)} // Array of n_modules elements for module_number. -}); - -// Pass the header_values to the ZmqReceiver constructor. -ZmqReceiver receiver(connect_address, n_io_threads, receive_timeout, header_values); -``` - -Read the [H5Writer](#h5_writer) chapter to see where this data is written in the H5 file. -Knowing where the data is written is important to properly setup the **dataset\_move\_mapping** -in the file format. See chapter [H5Format](#h5_format) for more info. - - -## H5Writer - -Not yet here :( - - -## H5Format - -The H5Format is the base class you need to extend to implement your file format. It specifies that the following variables need to be set: -- input\_value\_type (REST API value name to type mapping) -- default\_values (Fields in the file format that have default values) -- dataset\_move\_mapping (Move datasets to another place in the file if needed) -- file_format (The hierarchical structure of your H5 format) - -We will discuss each one in details in this chapter. - -### input\_value\_type - -Not yet here :( - -### default\_values - -Not yet here :( - -### dataset\_move\_mapping - -Not yet here :( - -### file\_format - -Not yet here :( - - -## WriterManager - -Not yet here :( - - -## RingBuffer - -Not yet here :( - - -# REST interface - -Not yet here :( - - -# Examples - -Not yet here :( \ No newline at end of file +# sf_daq_buffer + +Proof of concept for the SF DAQ detector needs.
+ +## Useful links + +- Hyperslab selection +https://support.hdfgroup.org/HDF5/Tutor/phypecont.html +- Intro to lock-free programming +https://preshing.com/20120612/an-introduction-to-lock-free-programming/ +- POSIX-compliant write order test on GPFS +https://svn.hdfgroup.org/hdf5/branches/hdf5_1_10_0/test/POSIX_Order_Write_Test_Report.pdf +- Best Practice Guide - Parallel I/O +https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_Parallel-IO.pdf +- MPI-IO/GPFS, an Optimized Implementation of MPI-IO on top of GPFS +https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1592834 + +## Build + +To compile, you will need to install: +- devtoolset-8 +- cmake3 +- zeromq-devel +- hdf5-devel + +Procedure: +- mkdir build +- cd build +- cmake3 .. +- make \ No newline at end of file