From 79d3e86d9e20cdf20dbd7f5f835cfd6e19e80b2a Mon Sep 17 00:00:00 2001
From: Andrej Babic <andrej.babic@psi.ch>
Date: Wed, 22 Jul 2020 14:22:50 +0200
Subject: [PATCH] Added first version of sf-buffer documentation

---
 README.md           |   4 ++
 sf-buffer/README.md | 111 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 sf-buffer/README.md

diff --git a/README.md b/README.md
index 343d1d7..cb60d16 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,10 @@ Overview of current architecture and component interaction.
 
 ![Overview image](docs/sf_daq_buffer-overview.jpg)
 
+Documentation of individual components:
+
+- [sf-buffer](sf-buffer/README.md) (Receive UDP and write buffer files)
+
 ## Useful links
 
 ### Architecture
diff --git a/sf-buffer/README.md b/sf-buffer/README.md
new file mode 100644
index 0000000..d4675a4
--- /dev/null
+++ b/sf-buffer/README.md
@@ -0,0 +1,111 @@
+# sf-buffer
+sf-buffer is the component that receives the detector data in the form of UDP
+packets and writes them to disk in a binary format. In addition, it
+sends a copy of the module frame to sf-stream via ZMQ.
+
+Each sf-buffer process takes care of a single detector module. The
+processes are all independent and do not rely on any external data input,
+which maximizes isolation and minimizes possible interactions in our system.
+
+The main design principle is simplicity and decoupling:
+
+- No interprocess dependencies/communication.
+- No dependencies on external libraries (as much as possible).
+- Using POSIX as much as possible.
+
+We are optimizing for maintainability and long-term stability. Performance is
+of concern only if the performance criteria are not met.
+
+## Overview
+
+### UDP receiving
+
+### File writing
+
+Files are written to disk in "frame" bunches - each frame is first assembled 
+from multiple received packets, and then written to disk as a block. This is 
+the complete frame from one module (module assembly is done in the 
+writer).
+
+#### File format
+
+The binary file on disk is just a serialization of multiple 
+**BufferBinaryFormat** structs:
+```c++
+#pragma pack(push)
+#pragma pack(1)
+struct ModuleFrame {
+    uint64_t pulse_id;
+    uint64_t frame_index;
+    uint64_t daq_rec;
+    uint64_t n_recv_packets;
+    uint64_t module_id;
+};
+#pragma pack(pop)
+
+#pragma pack(push)
+#pragma pack(1)
+struct BufferBinaryFormat {
+    const char FORMAT_MARKER = 0xBE;
+    ModuleFrame metadata;
+    char data[buffer_config::MODULE_N_BYTES];
+};
+#pragma pack(pop)
+```
+
+![file_layout_image](../docs/sf_daq_buffer-FileLayout.jpg)
+
+Each frame is composed of:
+
+- **FORMAT\_MARKER** (0xBE) - a control byte to determine the validity of the frame.
+- **ModuleFrame** - frame metadata used in image assembly phase.
+- **Data** - assembled frame from a single module.
+
+Frames are written one after another to a specific offset in the file. The 
+offset is calculated based on the pulse_id, so each frame has a specific place 
+in the file and there is no need to have an index for frame retrieval.
+
+#### Folder structure
+
+The folder (as well as file) structure is deterministic in the sense that given 
+a specific pulse_id, we can directly calculate the folder, file, and file 
+offset where the data is stored. This allows us to have independent writing 
+and reading from the buffer without building any indexes.
+
+The binary files written by sf-buffer are saved to:
+
+[detector_folder]/[module_name]/[data_folder]/[data_file].bin
+
+- **detector\_folder** should always be passed as an absolute path.
+- **module\_name** usually has the form "M00", "M01", and so on.
+- **data\_folder** and **data\_file** are automatically calculated based on the 
+current pulse_id, FOLDER_MOD and FILE_MOD attributes.
+
+![folder_layout_image](../docs/sf_daq_buffer-FolderLayout.jpg)
+
+```c++
+// FOLDER_MOD = 100000
+uint64_t data_folder = (pulse_id / FOLDER_MOD) * FOLDER_MOD;
+// FILE_MOD = 1000
+uint64_t data_file = (pulse_id / FILE_MOD) * FILE_MOD;
+```
+
+FOLDER_MOD == 100000 means that each data_folder will contain data for 100000
+pulses, while FILE_MOD == 1000 means that each file inside the data_folder 
+will contain 1000 pulses. The total number of data_files in each data_folder 
+will therefore be **FOLDER\_MOD / FILE\_MOD = 100**.
+
+### Analyzing the buffer
+In **sf-utils** there is a Python module that allows you to read the buffer
+directly in order to debug it or to verify the consistency between the HDF5
+file and the received data.
+
+- VerifyH5DataConsistency.py checks the consistency between the H5 file and
+the buffer.
+- BinaryBufferReader.py reads the buffer and prints metadata. The class inside 
+can also be used in external scripts.
+
+### ZMQ sending
+
+
+