next version of dap documentation

2023-11-22 16:13:22 +01:00
parent 1cfa2f5549
commit 42c6d8f157
1 changed files with 126 additions and 104 deletions
--- a/README.md
+++ b/README.md
@@ -1,41 +1,46 @@
-# dap
+# dap (Detector Analysis Pipeline)

-Detector Analysis Pipeline
-Runs on detector data stream produced by sf-daq
+Runs on the detector data stream provided by [sf-daq](https://github.com/paulscherrerinstitute/sf_daq_broker)


-## Installation
-At PSI-Swissfel there is already pre-installed package/conda environment to use with:
-```
+# Installation
+
+## Pre-Installed Package (PSI)
+
+At PSI, a pre-installed conda environment is available:
+```bash
 source /sf/jungfrau/applications/miniconda3/etc/profile.d/conda.sh
 conda activate dap
 ```

-### To install from source
-Create conda environment
-```
+## Installing from Source
+
+Create and activate a conda environment
+
+```bash
 conda create -n test-dap cython numpy pyzmq jungfrau_utils
 conda activate test-dap
 ```
-Clone code of the dap and install peakfinder8_extension in the conda environment (in the same session where conda environment was made)
-```
+
+Clone and install dap
+
+```bash
 git clone https://gitlab.psi.ch/sf-daq/dap.git
 cd dap
 make install
 ``` 

-## Architecture
+# Architecture

-Design of dap is made with the idea to scale horisontally and be able to process different algorithms(very different in computing complexity) running on data from detectors of different sizes (from 1 module to 32 modules detector). Independent **worker**s consumes  zeromq stream of detector data from sf-daq, run desired/selected algorithms on the frame it got from the stream and sends results : 
-  * (frame with metadata information) to visualisation by [streamvis](https://github.com/paulscherrerinstitute/streamvis) 
-  * (only metadata information) to **accumulator**. 
+dap is designed to scale horizontally: it runs algorithms of very different computational cost on data from detectors of different sizes (from 1 to 32 modules). Each independent worker consumes a ZeroMQ stream of detector data from sf-daq, applies the selected algorithms to each received frame, and sends results:
+- Frames with metadata to [streamvis](https://github.com/paulscherrerinstitute/streamvis) for visualisation.
+- Metadata-only results to the accumulator for storage.

-  Purpose of **accumulator** is to save results of dap processing. Since currently dap is running inside network which doesn't allow to send results as BS source(s), results are saved in dap buffer and is written permanently to data space upon of user request to [sf-daq](https://github.com/paulscherrerinstitute/sf_daq_broker) ("save_dap_results: True" in the detector section of request to sf-daq).
-
-### Worker
- Each [worker](https://gitlab.psi.ch/sf-daq/dap/-/blob/main/dap/worker.py) run completely independent from another worker, and do frames processing, getting them from sf-daq by zeromq stream. Detector data is sent as a raw (ADC) values, so before applying any algorithm, worker do a conversion to energy, using [jungfrau_utils](https://github.com/paulscherrerinstitute/jungfrau_utils) package. Input parameters for the worker:
- ```
- $ python dap/worker.py --help
+## Worker
+Each worker runs independently and processes frames received via ZeroMQ. Before applying algorithms, it converts raw (ADC) detector values to energy using [jungfrau_utils](https://github.com/paulscherrerinstitute/jungfrau_utils). 
+Input parameters:
+```bash
+python dap/worker.py --help
 usage: worker.py [-h] [--backend BACKEND] [--accumulator ACCUMULATOR]
                 [--accumulator_port ACCUMULATOR_PORT]
                 [--visualisation VISUALISATION]
@@ -59,11 +64,16 @@ options:
  --skip_frames_rate SKIP_FRAMES_RATE
                        send to streamvis each of skip_frames_rate frames
 ```
-Number of needed workers strongly depends on the detector size and the desired algorithm. In easiest case (1 module detector and algorithm is to make threshold on energy values of pixels) - 1-2 workers are enough; while for the larger detector(8-16 or 32 modules) and heavy algorithm like peakfinder8 - more than hundred workers are needed. It's better to start each worker pinned to a particular processor core, since they are CPU limited application. Since each worker is completely independent from each other - it's possible to run workers on a different nodes, distributing load and increasing number of workers.

-### Accumulator
-Purpose of accumulator is to collect result of the processing done by workers. Currently dap is running in network which doesn't allow sending this result as BS source(s), so as temporary workaround, saving of the sub-sample of dap output results (frame indentification and intensity in the selected roi's) is done to the dap-buffer. Accumulator input parameters:
-```
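+The number of workers needed depends strongly on detector size and algorithm cost. In the simplest case (a 1-module detector with a threshold on pixel energy values), 1-2 workers are enough; a larger detector (8-32 modules) running a heavy algorithm such as peakfinder8 needs more than a hundred workers.
+Workers are CPU-bound, so it is best to pin each one to a specific processor core. Because workers are fully independent of each other, they can also be distributed across multiple nodes.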
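As a minimal sketch of core pinning, the helper below builds one command line per worker and prefixes it with the Linux `taskset -c` utility. The function name `build_pinned_commands` and the round-robin assignment of cores are assumptions made for illustration; the worker arguments are left to the caller because they depend on the deployment.

```python
def build_pinned_commands(n_workers, cores, worker_args=()):
    """Return one argv list per worker, each pinned to a core via taskset.

    Hypothetical helper: cores are assigned round-robin, and worker_args
    are passed through unchanged to dap/worker.py.
    """
    if not cores:
        raise ValueError("at least one core is required")
    commands = []
    for index in range(n_workers):
        core = cores[index % len(cores)]
        commands.append(
            ["taskset", "-c", str(core), "python", "dap/worker.py", *worker_args]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of launching them.
    for command in build_pinned_commands(4, cores=[2, 3]):
        print(" ".join(command))
```

Each returned list can be handed to `subprocess.Popen` on the node where the workers should run.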
+
+## Accumulator
+
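+The accumulator collects the results of the processing done by the workers. dap currently runs in a network that does not allow sending these results as BS source(s), so as a temporary workaround a sub-sample of the output (frame identification and intensities in the selected ROIs) is saved to the dap-buffer. It is written permanently to data space upon user request to sf-daq ("save_dap_results: True" in the detector section of the request).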
+
+Input parameters:
+```bash
 python dap/accumulator.py --help
 usage: accumulator.py [-h] [--accumulator ACCUMULATOR]
                      [--accumulator_port ACCUMULATOR_PORT]
@@ -76,117 +86,128 @@ options:
                        accumulator port
 ```

-### Implemented algorithms
-There are several algorithms implemented in dap, following the request/need of the experiments. 
+# Implemented algorithms
 
-   * peakfinder 
+   * **Peakfinder Algorithm** 
   
-     algorithm, based on peakfinder8 from cheetah package. Identifies peaks as connected pixels with the intensity above background, where background is radial averaged, determined iteratively excluding signal pixels.
-
-     Input parameters to algorithm:
-       * `'do_peakfinder_analysis': 1/0` - to perform or not peakfinder8 algorithm
-       * `'beam_center_x/beam_center_y': float/float` (beam center in the detector coordinates)
-       * `'hitfinder_min_snr': float` - signal-to-noise value to discriminate background and signal
-       * `hitfinder_min_pix_count': float` - minimum number of pixels to form a peak
-       * `hitfinder_adc_thresh: float` - exclude pixels below the threshold from peak formation/determination
-       * `'npeaks_threshold_hit': float` - threshold on number of found peaks to mark frame as *hit* or not
-
-     Output of algorithm:
-       * `'number_of_spots': int` - number of found peaks
-       * `'spot_x/spot_y/spot_intensity': 3*list[float]` - coordinates and intensity of the found peaks in the frame
-       * `'is_hit_frame': True/False` - mark frame as hit or not, in case number of found peaks are above defined threshold
-
-   * radial profile integration
-   
-      Input parameters to algorithm: 
-       * `'do_radial_integration': 1/0` - to perform or not radial integration on dap
-       * `'beam_center_x/beam_center_y': float/float` - beam center in the detector coordinates
-       * `'apply_threshold': 1/0` - apply or not threshold to the pixel intensities before doing radial integaration
-       * `'threshold_min/threshold_max': float/float` - in case of applying threshold - value of the threshold (threshold_max is applied only if it's value larger than threshold_min)
-       * `'radial_integration_silent_min/radial_integration_silent_max': float/float` - if both values are present - normalize radial integrated profile to the region between that values). Needed to be able to combine frames, to exclude different beam intensity in each of it.
-     
-     Output of algorithm:  
-       * `'radint_I': list[float]` - (in pixels coordinates) of radial integrated profile
-       * `'radint_q' : list[float]` - (intensity of normalised intensity) if the profile
-
-   * threshold on pixel intensity
-
-     ignore measured pixel intensity if that intensity is above or below threshold values
-
-     Input parameters to algorithm:
-       * `'apply_threshold': 1/0` - apply or not threshold to pixel intensity
-       * `'threshold_min/threshold_max': float/float` - in case of applying threshold - value of the threshold (threshold_max is applied only if it's value larger than threshold_min)
-       * `'threshold_value': 0/NaN` - subsitute pixel intensity by 0 or NaN value, if pixels intensity is outside of the defined values
-
-   * ROI processing
-
-     it's possible to define multiple ROI's on the detector and dap will output results for each of that ROI. Algorithm to threshold pixel intensity will be applied before roi processing, if requested.
+     This algorithm is based on peakfinder8 from the [cheetah package](https://www.desy.de/~barty/cheetah/Cheetah/Welcome.html). It identifies peaks as connected pixels exhibiting intensity above the background. The background is determined iteratively by radial averaging, excluding signal pixels.

     Input parameters:
-       * `'roi_x1/roi_x2/roi_y1/roi_y2': 4*list[float]` - coordinates of ROI's
+       * `'do_peakfinder_analysis': 1/0` - Specifies whether to execute the peakfinder8 algorithm.
+       * `'beam_center_x/beam_center_y': float/float` - Represents the beam center coordinates in the detector space.
+       * `'hitfinder_min_snr': float` - Signal-to-noise value used to differentiate between background and signal.
+       * `'hitfinder_min_pix_count': float` - Sets the minimum pixel count required to constitute a peak.
+       * `'hitfinder_adc_thresh': float` - Excludes pixels below this threshold from peak determination.
+       * `'npeaks_threshold_hit': float` - Threshold on the number of discovered peaks to categorize a frame as a hit or not.

-     Output of algorithm:
-       * `'roi_intensities': list[float]` - summ of the intensities of pixels in roi
-       * `'roi_intensities_normalised': list[float]` - intensity inside the defined ROI normalised to the ROI size or number of active pixels inside the ROI 
-       * `'roi_intensities_x': list(float, float)` - x1/x2 (left/right) x-coordinates of the roi
-       * `'roi_intensities_proj_x': list(value)` - projection on the x-coordinate of pixel intensities (sum) 
+     Algorithm Output:
+       * `'number_of_spots': int` - Indicates the count of identified peaks.
+       * `'spot_x/spot_y/spot_intensity': 3*list[float]` - Provides coordinates and intensity of the identified peaks within the frame.
+       * `'is_hit_frame': True/False` - Marks whether a frame qualifies as a hit based on the number of identified peaks exceeding the defined threshold.

-   * High intensity frame determination (was used for SPI experiments)
+   * **Radial Profile Integration**
   
-     Determine and mark if frame contains signal in a certain regions. It's a simple algorithm which uses "ROI processing" results and compares normalised intensities in first two ROI's to the threshold values. In case any of the threshold is exceeded - frame marked as hit.
+      This algorithm computes the radially integrated intensity profile of a frame around the beam center.

      Input parameters: 
-       * `'do_spi_analysis': 1/0` - run determination algorithm
-       * `'roi_x1/roi_x2/roi_y1/roi_y2': 4*list[float]` - coordinates of (at least) two ROI's
-       * `'spi_limit': list(float, float)` - threshold values for first to ROI's
+       * `'do_radial_integration': 1/0` - Indicates whether radial integration should occur within dap.
+       * `'beam_center_x/beam_center_y': float/float` - Specifies the beam center coordinates in the detector space.
+       * `'apply_threshold': 1/0` - Determines whether to apply a threshold to pixel intensities before radial integration.
+       * `'radial_integration_silent_min/radial_integration_silent_max': float/float` - If both values are present, normalizes the radial integrated profile within this specified range. This is crucial for frame combination to eliminate variations in beam intensity across frames.
     
     Output of algorithm:  
-       * `'is_hit_frame': True/False` - mark frame as a hit, in case intensity in at least one ROI is above the threshold
-       * `'number_of_spots': int` - 0: if intenisty in both ROI's are below corresponding threshold; 25 - ROI1 is energetic, but not ROI2; 50 - ROI2 is energetic, but not ROI1; 75 - intensities in both ROI's are above the threshold  
+       * `'radint_I': list[float]` - Represents the radial integrated profile in pixel coordinates.
+       * `'radint_q' : [float, float]` - Represents the minimum and maximum x-coordinate values considered during integration in pixel coordinates.

-   * Frame aggregation
+   * **Thresholding Pixel Intensity**

-      For small intensity signal, which is hard to see on single frame, it's possible to aggregate frames on dap level and send resulting (aggregated) frame to visualisation. Note, however, that such aggregation is happening on every worker independently, so number of frames to aggregate should be reasonable compared to number of running workers of dap for that detector. Aggregation does not influence any other algorithms - they are running on real frames. Algorithm to make a threshold can be applied before frames aggregation.
+      This function disregards measured pixel intensity falling above or below specified threshold values.
+
+     Algorithm Input Parameters:
+       * `'apply_threshold': 1/0` - Enables or disables the application of threshold to pixel intensity.
+       * `'threshold_min/threshold_max': float/float` - Specifies threshold values. If applied, `threshold_max` is enforced only when its value surpasses `threshold_min`.
+       * `'threshold_value': 0/NaN` - Replaces pixel intensity with either 0 or NaN if the pixel intensity falls outside the defined threshold values.
+
+   * **Region of Interest (ROI) Processing**
+
+     dap allows the definition of multiple ROIs on the detector, and it generates output results for each defined ROI. Prior to ROI processing, the algorithm to threshold pixel intensity can be applied if requested.

     Input parameters:
-        * `'apply_aggregation' : 1/0` - apply or not aggregation to frames before sending them to visualisation  
-        * `'aggregation_max': int` -number of frames to aggregate before sending them to visualisation. Note that this value is per worker
+       * `'roi_x1/roi_x2/roi_y1/roi_y2': 4*list[float]` - Specifies the coordinates of the ROIs.

-      Output of algorithm:
-        * `'aggregated_images': int` - number of aggregated images  
+     Algorithm Output:
+       * `'roi_intensities': list[float]` - Sum of pixel intensities within the ROI.
+       * `'roi_intensities_normalised': list[float]` - Intensity within the defined ROI normalized to the ROI size (the count of active pixels within the ROI).
+       * `'roi_intensities_x': list(float, float)` - x1/x2 (left/right) x-coordinates of the ROI.
+       * `'roi_intensities_proj_x': list(value)` - Projection onto the x-coordinate of pixel intensities (sum). 

-   * Frame labeling
+   * **Detecting Frames with High Intensity in Specific Regions**

-      In case if the event propagation is implemented for the detector (so some event information is known from the detector header), frame can be marked accordingly on dap level to enable frame sorting on visualisation step.
+     This algorithm identifies frames containing signals within defined regions. It leverages "ROI processing" outcomes by comparing normalized intensities in the first two ROIs against predetermined thresholds. If any threshold is exceeded, the frame is labeled as a *hit*.

-      Currently implemented markers:
-        * `'laser_on': True/False` - frame is marked as laser activated if darkshot event code is False and laser event code is True; in all other cases frame marked as not laser activated
+     Input parameters:
+       * `'do_spi_analysis': 1/0` - Initiates the determination algorithm.
+       * `'roi_x1/roi_x2/roi_y1/roi_y2': 4*list[float]` - Coordinates of (at least) two ROIs.
+       * `'spi_limit': list(float, float)` - Threshold values for first two ROIs.

-   * Saturated pixels
+     Algorithm Output:
+       * `'is_hit_frame': True/False` - Marks frame as a *hit* if intensity in at least one ROI surpasses the threshold.
+       * `'number_of_spots': int` - Indicates:
+         *  0: if intensity in both ROIs falls below the respective thresholds
+         * 25: ROI1 has high energy but not ROI2
+         * 50: ROI2 has high energy but not ROI1
+         * 75: intensities in both ROIs exceed the thresholds  

-      Analysis on number and position of saturated pixels is done for each frame received by the dap. 
+   * **Frame Aggregation**
  
-      Output of algorithm:
-        * `'saturated_pixels': int` - number of saturated pixels in the frame
-        * `'saturated_pixels_x/saturated_pixels_y': list[float]/list[float]` - coordinates of saturated pixels  
+      For faint signals that are hard to see in a single frame, dap can aggregate frames and send the resulting aggregated frame to visualization. Aggregation happens independently on each worker, so the number of frames to aggregate should be reasonable compared to the number of dap workers running for that detector. Aggregation does not affect the other algorithms, which still run on the individual frames. Thresholding, if requested, is applied before aggregation.

-   * Parameters propagated from dap input file to visualisation
+      Input parameters:
+        * `'apply_aggregation' : 1/0` - Enables or disables frame aggregation before transmission to visualization.
+        * `'aggregation_max': int` - Specifies the maximum number of frames to aggregate before transmitting to visualization. This value pertains to each worker.

-      Some of the input parameters to dap are not used by the dap processing, but propagated to visualisation and used there to display certain characteristics of data
-         * 'disabled_modules': list() (list of the number of disabled modules to visualise them)
-         * 'detector_distance': value (distance from sample to detector in meters)
-         * 'beam_energy': value (photon beam energy in eV)
+      Algorithm Output:
+        * `'aggregated_images': int` - Indicates the count of aggregated images. 
       
-   * Apply additonal mask
+   * **Frame Tagging**

-       For sensitive algorithms (like peakfinder) it may be important to remove some of the detector parts (defective pixels, edge pixels of the modules.. ). Currently definition of that detector parts are made manually and hardcoded in the worker.py code. Activation of that removal is done with `'apply_additional_mask': 0/1` input flag                
+      When event propagation is integrated into the detector, dap allows frames to be tagged accordingly, facilitating their categorization during visualization.
      
-### Input parameters (file)
+      Presently supported markers:
+        * `'laser_on': True/False` - Marks frames as "laser activated" when the darkshot event code is False and the laser event code is True. Otherwise, frames are labeled as "not laser activated".
          
- All input parameters to algorithms described in the previous section, are specified in the json file which provided as an input to the worker.py application (--peakfinder_parameters option). Each of the worker constantly monitor update to that json file and in case of the change - re-loads that file to apply new set of parameters to the data stream.
+   * **Detection of Saturated Pixels**

- Example of the json file:
- ```
+      For every frame received by dap, an analysis is performed to ascertain the quantity and positions of saturated pixels.
+    
+      Algorithm Output:
+        * `'saturated_pixels': int` - Number of saturated pixels within the frame.
+        * `'saturated_pixels_x/saturated_pixels_y': list[float]/list[float]` - Coordinates of the saturated pixels.  
+
+   * **Transmitted Parameters from dap Input to Visualization**
+
+      Certain input parameters in dap remain unused during the dap processing phase. However, these parameters are transmitted to the visualization component, where they serve to depict specific data characteristics:
+       * `'disabled_modules': list[int]` - Enumerates the disabled module numbers for visualization.
+       * `'detector_distance': float`  - Distance between sample and detector in meters.
+       * `'beam_energy': float` -  Photon beam energy in eV.
+
+   * **Apply Additional Masking**
+
+       Sensitive algorithms, such as the peakfinder, often necessitate the exclusion of specific detector regions (like defective or module edge pixels). Presently, defining these detector segments requires manual hardcoding within the worker.py code.
+       
+       Use the `'apply_additional_mask': 0/1` input flag to enable this functionality.
+
+   * **Filter Based on Pulse Picker Information**
+       
+       If event propagation is available for the detector and the pulse picker information is correctly configured for propagation, frames can be filtered on that information with the `'select_only_ppicker_events': 0/1` input flag.
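The radial profile step described above can be illustrated with a small pure-Python sketch. It is not the dap implementation: the function name `radial_profile`, the use of integer radius bins, averaging per bin, and normalising by the mean of the silent region are all assumptions chosen to keep the example short.

```python
import math


def radial_profile(frame, beam_center_x, beam_center_y,
                   silent_min=None, silent_max=None):
    """Average pixel intensity per integer radius around the beam center.

    Hypothetical sketch: NaN pixels are skipped, and if both silent_min
    and silent_max are given the profile is divided by its mean value in
    the radius range [silent_min, silent_max).
    """
    sums, counts = {}, {}
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if math.isnan(value):
                continue  # pixel removed by thresholding or masking
            radius = int(round(math.hypot(x - beam_center_x, y - beam_center_y)))
            sums[radius] = sums.get(radius, 0.0) + value
            counts[radius] = counts.get(radius, 0) + 1

    n_bins = max(sums) + 1 if sums else 0
    profile = [sums[r] / counts[r] if r in counts else 0.0 for r in range(n_bins)]

    if silent_min is not None and silent_max is not None:
        region = profile[int(silent_min):int(silent_max)]
        norm = sum(region) / len(region) if region else 0.0
        if norm != 0:
            profile = [value / norm for value in profile]
    return profile


if __name__ == "__main__":
    frame = [[2.0, 2.0, 2.0],
             [2.0, 4.0, 2.0],
             [2.0, 2.0, 2.0]]
    print(radial_profile(frame, 1, 1))
    print(radial_profile(frame, 1, 1, silent_min=1, silent_max=2))
```

Normalising to a region with no expected signal is what makes profiles from frames with different beam intensity comparable before they are combined.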
+
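The thresholding, ROI and high-intensity steps described in the list above chain together, which the following sketch tries to make concrete. The function names, the half-open ROI bounds, and the strict comparisons are assumptions for illustration; only the parameter names and the 0/25/50/75 encoding come from the text.

```python
import math


def apply_threshold(frame, threshold_min, threshold_max, threshold_value="NaN"):
    """Replace pixels outside the thresholds with NaN or 0.

    threshold_max is only enforced when it is larger than threshold_min.
    """
    fill = float("nan") if threshold_value == "NaN" else 0.0
    use_max = threshold_max > threshold_min
    result = []
    for row in frame:
        new_row = []
        for value in row:
            outside = value < threshold_min or (use_max and value > threshold_max)
            new_row.append(fill if outside else value)
        result.append(new_row)
    return result


def roi_results(frame, x1, x2, y1, y2):
    """Return (sum, sum per active pixel, x-projection) for one ROI.

    Assumes half-open bounds [x1, x2) and [y1, y2); NaN pixels are inactive.
    """
    pixels = [row[x1:x2] for row in frame[y1:y2]]
    active = [v for row in pixels for v in row if not math.isnan(v)]
    total = sum(active)
    normalised = total / len(active) if active else 0.0
    proj_x = [sum(v for v in column if not math.isnan(v)) for column in zip(*pixels)]
    return total, normalised, proj_x


def spi_code(normalised, spi_limit):
    """Encode which of the first two ROIs exceed their limit: 0, 25, 50 or 75."""
    code = 0
    if normalised[0] > spi_limit[0]:
        code += 25  # ROI1 is above its threshold
    if normalised[1] > spi_limit[1]:
        code += 50  # ROI2 is above its threshold
    return code


if __name__ == "__main__":
    raw = [[1.0, 5.0, 9.0, 1.0],
           [1.0, 5.0, 50.0, 1.0]]
    frame = apply_threshold(raw, threshold_min=2.0, threshold_max=10.0)
    roi1 = roi_results(frame, 0, 2, 0, 2)
    roi2 = roi_results(frame, 2, 4, 0, 2)
    code = spi_code([roi1[1], roi2[1]], spi_limit=[4.0, 10.0])
    print(roi1, roi2, code, code > 0)
```

In this sketch a frame counts as a hit whenever the code is non-zero, which matches the rule that exceeding either threshold is enough.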
+## Input parameters (File)
+
+Input parameters for the algorithms are specified in a JSON file passed to worker.py (`--peakfinder_parameters`). Each worker constantly monitors this file and reloads it when it changes, applying the new parameters to the data stream.
+
+ Example JSON:
+ ```json
 {
    "beam_center_x": 1119.0,
    "beam_center_y": 1068.0,
@@ -208,6 +229,7 @@ There are several algorithms implemented in dap, following the request/need of t
    "do_radial_integration": 0,
    "do_spi_analysis": 0,
    "threshold_value": "NaN",
+    "select_only_ppicker_events": 0,
    "disabled_modules": [],
    "roi_x1": [],
    "roi_y1": [],
@@ -215,8 +237,8 @@ There are several algorithms implemented in dap, following the request/need of t
    "roi_y2": []
 }
 ```
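One way to monitor such a parameter file is to compare its modification time on every loop iteration and reload only when it changes. The sketch below shows that idea; the class name `ParameterFile` and the choice to keep the previous parameters when the file is unreadable are assumptions, not a description of how worker.py does it.

```python
import json
import os


class ParameterFile:
    """Reload a JSON parameter file whenever its modification time changes."""

    def __init__(self, path):
        self.path = path
        self.parameters = {}
        self._mtime_ns = None

    def refresh(self):
        """Return True if the file was (re)loaded, False otherwise."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except OSError:
            return False  # file missing: keep the previous parameters
        if mtime_ns == self._mtime_ns:
            return False  # unchanged since the last successful load
        try:
            with open(self.path) as handle:
                parameters = json.load(handle)
        except (OSError, json.JSONDecodeError):
            return False  # possibly caught mid-write: retry on the next call
        self.parameters = parameters
        self._mtime_ns = mtime_ns
        return True
```

A processing loop would call `refresh()` before handling each frame and then read settings such as `parameters.get("do_spi_analysis", 0)`.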
-That's the example of real dap input parameter file for 4M detector of Alvra. Peakfinder parameter is selected to process frame. No aggregation, thresholding, module disabling, radial integration are requested.

-## Acknowledgment
+# Acknowledgment
+
+Special thanks to Valerio Mariani for providing the cython implementation of peakfinder8.

- Initially DAP was made for SFX analysis, to run peak finder on a stream of data. peakfinder8 from cheetah was used for this purpose and big thanks to Valerio Mariani who provided cython implementation of peakfinder8