Jungfraujoch/docs/JFJOCH_WRITER.md
2025-05-07 16:24:35 +02:00


jfjoch_writer

jfjoch_writer is a NeXus-compliant HDF5 file writer.

Acknowledgements

  • Zdenek Matej (MAX IV)
  • Felix Engelmann (MAX IV) for testing and multiple improvement suggestions.

Running directory

The writer must run in the base directory for written files: file_prefix is always interpreted relative to the writer's running directory. The writer detects and guards against basic security issues, such as a file_prefix that starts with a slash, starts with ../, or contains /../.
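The three checks named above can be sketched in a few lines of Python. This is only an illustration: is_safe_file_prefix is a hypothetical helper, not a function of jfjoch_writer, and the real writer may apply further rules.

```python
def is_safe_file_prefix(file_prefix: str) -> bool:
    """Return True if file_prefix passes the three checks listed in the text.

    Hypothetical helper for illustration; not part of jfjoch_writer.
    """
    if file_prefix.startswith("/"):
        return False  # absolute path
    if file_prefix.startswith("../"):
        return False  # leaves the running directory right away
    if "/../" in file_prefix:
        return False  # climbs out somewhere in the middle
    return True


for candidate in ["lyso/run1", "/etc/passwd", "../escape", "a/../../b"]:
    print(candidate, is_safe_file_prefix(candidate))
```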

Usage

The writer needs to be started as a background service with the following command:

jfjoch_writer {options} <address to connect via ZeroMQ to DCU>

Options:
-R<string> | --root_dir=<string>     Root directory for file writing
-H<int> | --http_port=<int>          HTTP port for statistics
-r<int> | --zmq_repub_port=<int>     ZeroMQ port for PUSH socket to republish images
-f<int> | --zmq_file_port=<int>      ZeroMQ port for PUB socket for notifications on finalized files
-w<int> | --rcv_watermark=<int>      Receiving ZeroMQ socket watermark (default = 100)
-W<int> | --repub_watermark=<int>    Republish ZeroMQ socket watermark (default = 1000)

For example:

jfjoch_writer -H5234 tcp://dcu-address:5400 

HTTP interface

The writer has a dedicated status interface via HTTP. It allows two operations:

  • check state: query the writer to verify that it is properly synchronized with the DCU (e.g., that file_prefix agrees with what was set on the DCU) and to monitor progress.
  • cancel writing: this closes all HDF5 files being written and restarts the writer. This option should be used only if the DCU process was terminated or disconnected; it SHOULD NOT be used as the standard cancellation procedure (when the DCU receives a cancel command, writing should finish properly as well).

Republish

Republish creates a PUSH socket on the writer, on which all messages are republished for further use by a data analysis pipeline. Republishing is non-blocking, so if there is no receiver on the other end or the sending queue is full, images won't be republished. For START/END messages, republishing attempts to send for 100 ms, but if the send times out it is not retried.

Republish functionality is optional; if the republish port number is omitted, it is not enabled.
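The drop policy described above can be illustrated with Python's standard queue module standing in for the ZeroMQ socket. This is an analogy under stated assumptions, not the writer's code: the republish function, the "start"/"end"/"image" labels, and the queue size are all hypothetical.

```python
import queue

START_END_TIMEOUT_S = 0.1  # 100 ms, as stated in the text


def republish(outbox: queue.Queue, kind: str, payload: bytes) -> bool:
    """Hand a message to the outgoing queue; return True if it was accepted.

    kind is "start", "end", or "image" (hypothetical labels for this sketch).
    """
    try:
        if kind in ("start", "end"):
            # START/END: wait up to 100 ms for space, then give up.
            outbox.put((kind, payload), timeout=START_END_TIMEOUT_S)
        else:
            # Images: never block; drop if the queue is full.
            outbox.put_nowait((kind, payload))
        return True
    except queue.Full:
        return False  # message dropped, no retry


outbox = queue.Queue(maxsize=2)  # stands in for the send watermark
print([republish(outbox, "image", b"img%d" % i) for i in range(3)])
# [True, True, False]
```

With nothing draining the queue, the third image is dropped immediately, while an "end" message would wait 100 ms before being dropped.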

Overwriting files

When jfjoch_writer creates an HDF5 file, it first adds the suffix .<random>.tmp to the file name. The random value depends on the current timestamp and will likely differ for each file of a particular series. After the file is fully saved and closed, it is renamed to remove the suffix. By default, the rename does not happen if it would overwrite an existing file. This behavior can be changed by setting the overwrite parameter to true in the file writer configuration.
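The write-then-rename pattern can be sketched as follows. The helper names are hypothetical, the suffix is derived from the clock only to mirror the description, and what the writer does with a temporary file it cannot rename is not stated in this document (the sketch simply leaves it in place).

```python
import os
import tempfile
import time


def temporary_name(final_path: str) -> str:
    """Append .<random>.tmp; here the 'random' part comes from the clock."""
    return f"{final_path}.{time.time_ns():x}.tmp"


def finalize(tmp_path: str, final_path: str, overwrite: bool = False) -> bool:
    """Rename tmp_path to final_path; return False if that would overwrite."""
    if not overwrite and os.path.exists(final_path):
        return False  # existing file is left untouched
    os.replace(tmp_path, final_path)
    return True


with tempfile.TemporaryDirectory() as root:
    final = os.path.join(root, "dataset_name_data_000001.h5")
    tmp = temporary_name(final)
    with open(tmp, "wb") as handle:
        handle.write(b"placeholder")
    print(finalize(tmp, final))  # first rename succeeds
```

Note that checking for the target and then renaming is not atomic; a production implementation would need to handle a file appearing between the two steps.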

Finalized files information

The writer creates a PUB socket to announce finalized data files. For each closed file, the socket sends a JSON message with the following structure:

{
  "filename": <string> HDF5 data file name (relative to writer root directory),
  "nimages": <int> number of images in the file (counting from 1!),
  "file_number": <int> number of file within the acquisition,
  "sample_name": <string> name of sample,
  "run_name": <string> name of run,
  "run_number": <int> number of run,
  "experiment_group": <string> number of p-group / proposal (optional),
  "user_data": <any json> user_data,
  "beam_x_pxl": <float> beam center (X) in pixels,
  "beam_y_pxl": <float> beam center (Y) in pixels,
  "detector_distance_m": <float> detector distance in m,
  "detector_height_pxl": <int> detector size (Y) in pixels,
  "detector_width_pxl": <int> detector size (X) in pixels,
  "incident_energy_eV": <float> photon energy of the X-ray beam,
  "pixel_size_m": <float> pixel size in meter (assuming pixel X == Y),
  "saturation": <int> this count and higher mean saturation,
  "space_group_number": <int> space group number (optional),
  "underload": <int> pixels with this count should be excluded,
  "unit_cell": <optional> unit cell dimensions in Angstrom/degree {
    "a": <float>, "b": <float>, "c": <float>,
    "alpha": <float>, "beta": <float>, "gamma": <float>
  }
}

user_data is defined as header_appendix in the /start operation in the jfjoch_broker. Other metadata are also carried over from /start operation.

If header_appendix is a string containing valid JSON, it is embedded as JSON; otherwise it is escaped as a string. For example, for a header_appendix of {"param1": "test1", "param2": ["test1", "test2"]}, the message will look as follows:

{
  "filename": "dataset_name_data_000001.h5",
  "nimages": 1000,
  "file_number": 0,
  "sample_name": "lysozyme",
  "run_name": "lyso_cryo",
  "run_number": 25,
  "experiment_group": "p00001",
  "beam_x_pxl": 1200,
  "beam_y_pxl": 1500,
  "detector_distance_m": 0.155,
  "detector_height_pxl": 2164,
  "detector_width_pxl": 2068,
  "image_time_s": 0.001,
  "incident_energy_eV": 12400.0,
  "pixel_size_m": 7.5e-05,
  "saturation": 32766,
  "space_group_number": 96,
  "underload": -32768,
  "unit_cell": {
    "a": 78.0,
    "alpha": 90.0,
    "b": 78.0,
    "beta": 90.0,
    "c": 39.0,
    "gamma": 90.0
  },
  "user_data": {
    "param1": "test1", 
    "param2": ["test1", "test2"]
  }
}
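The embed-or-escape rule for user_data can be sketched with Python's json module. embed_user_data is a hypothetical name used for illustration, and the sketch assumes that "valid JSON" means anything a standard JSON parser accepts.

```python
import json


def embed_user_data(header_appendix: str):
    """Return parsed JSON if header_appendix is valid JSON, else the raw string."""
    try:
        return json.loads(header_appendix)
    except json.JSONDecodeError:
        return header_appendix  # serialized later as an escaped JSON string


message = {"filename": "dataset_name_data_000001.h5"}

message["user_data"] = embed_user_data('{"param1": "test1", "param2": ["test1", "test2"]}')
print(json.dumps(message))  # user_data is a nested JSON object

message["user_data"] = embed_user_data('free text with "quotes"')
print(json.dumps(message))  # user_data is a string with escaped quotes
```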

Notifications for finalized files are optional; if the notification port number is omitted, this functionality is not enabled.
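A consumer of these notifications might look like the sketch below. It assumes pyzmq is installed, that each notification arrives as a single frame containing only the JSON text (no topic prefix), and that the endpoint passed to listen is the writer host with the port given via --zmq_file_port; summarize and listen are hypothetical names.

```python
import json


def summarize(raw: bytes) -> str:
    """Turn one notification into a one-line summary; optional keys may be absent."""
    msg = json.loads(raw)
    group = msg.get("experiment_group", "n/a")
    return f'{msg["filename"]}: {msg["nimages"]} images (group {group})'


def listen(endpoint: str) -> None:
    """Print a summary for every finalized file; requires pyzmq."""
    import zmq  # imported lazily so summarize() works without pyzmq

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything
    sock.connect(endpoint)
    while True:
        print(summarize(sock.recv()))


print(summarize(b'{"filename": "dataset_name_data_000001.h5", "nimages": 1000}'))
# dataset_name_data_000001.h5: 1000 images (group n/a)
```

Because PUB/SUB delivers only to subscribers connected at the time of sending, a consumer started after a file was finalized will not see that file's notification.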

HDF5 file structure

Jungfraujoch aims to generate files compliant with the NXmx format.

Master file

There are custom extensions to the NXmx format. These will be documented in the future.

Specifically, if data collection was configured with a header_appendix containing the key hdf5 whose value is a JSON object with number and string values, these values will be added to /entry/user.
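As an illustration, a header_appendix carrying an hdf5 object could be built and inspected as below. The field names (temperature_K, operator) are invented for the example, and two points are assumptions rather than documented behavior: that each key becomes a dataset name directly under /entry/user, and that values which are neither numbers nor strings are skipped.

```python
import json

# Hypothetical header_appendix, as it might be passed to the /start operation.
header_appendix = json.dumps({
    "hdf5": {"temperature_K": 100.0, "operator": "jdoe", "nested": {"skipped": 1}},
    "param1": "test1",
})


def user_entries(appendix: str) -> dict:
    """Map number and string values of the "hdf5" object to /entry/user paths."""
    parsed = json.loads(appendix)
    hdf5 = parsed.get("hdf5") if isinstance(parsed, dict) else None
    if not isinstance(hdf5, dict):
        return {}
    return {
        f"/entry/user/{key}": value
        for key, value in hdf5.items()
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float, str)) and not isinstance(value, bool)
    }


print(user_entries(header_appendix))
```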

Two versions of the master file are possible.

By default, the legacy version is used. This version is compatible with the DECTRIS file writer version 1.0 format, which ensures compatibility with the Neggia and Durin XDS plugins, as well as with the DECTRIS Albula viewer version 4.0. Its distinctive feature is that if images are split into multiple data files, /entry/data contains multiple links, each corresponding to one data file. However, certain newer HDF5 features, such as virtual datasets, are not available in this format, since it has to remain compatible with HDF5 1.8.

Therefore, a VDS version of the format is also available. It links to all data files via a single virtual dataset, /entry/data/data. Spot finding, azimuthal integration, and other per-image results are linked between master and data files in the same way. This format allows processing results to be displayed in the Jungfraujoch Viewer, which is currently under development. For the time being, it works only with the Durin XDS plugin and requires DECTRIS Albula viewer version 4.1 or later.

Data file

Data file has the following structure:

Location Description Optional Linked in master file v. 2
/entry/data/data Images X
/entry/detector/timestamp Timestamp of the image
/entry/detector/exptime Exposure time of the image
/entry/detector/number Image number; if image rejection was used this will be the original image number
/entry/detector/det_info Debug field of the JF detector X
/entry/detector/storage_cell_image Storage cell number X X *
/entry/detector/rcv_delay Receiver delay for the image (Jungfraujoch debugging) X
/entry/detector/rcv_free_send_buffers Receiver number of free send buffers at the time of sending the image (Jungfraujoch debugging) X
/entry/detector/data_collection_efficiency_image Ratio of received and expected UDP packets X X *
/entry/detector/packets_expected Number of UDP packets expected for the image X
/entry/detector/packets_received Number of UDP packets received for the image X
/entry/image/max_value Max viable value of the image (excl. overloads, etc.) X
/entry/azint/bin_to_q Azimuthal integration - bin-to-Q mapping X
/entry/azint/image Azimuthal integration - per image X X
/entry/MX/peakXPosRaw Peak position X (see CXI format) X X
/entry/MX/peakYPosRaw Peak position Y (see CXI format) X X
/entry/MX/peakTotalIntensity Peak total intensity (see CXI format) X X
/entry/MX/nPeaks Number of peaks per image (see CXI format) X X
/entry/MX/strongPixels Number of strong pixels per image X X
/entry/MX/nPeaksRingFiltered Number of peaks not belonging to rings X X
/entry/MX/imageIndexed Image is successfully indexed X X
/entry/MX/latticeIndexed Crystal lattice for the image, assuming it is indexed X X
/entry/MX/bkgEstimate Mean value of pixels in the radius of 3-5 A X X
/entry/MX/resolutionEstimate Resolution estimate based on ML model from SSRL X X
/entry/roi/{roi_name}/max Max pixel value for roi named {roi_name} X X
/entry/roi/{roi_name}/sum Sum pixel value for roi named {roi_name} X X
/entry/roi/{roi_name}/sum_sq Sum pixel values squared for roi named {roi_name} X X
/entry/roi/{roi_name}/npixel Number of valid pixels for roi named {roi_name} X X
/entry/roi/{roi_name}/x Weighted X-coordinate for roi named {roi_name} X X
/entry/roi/{roi_name}/y Weighted Y-coordinate for roi named {roi_name} X X
/entry/xfel/pulseID Pulse ID (for XFEL only) X X
/entry/xfel/eventCode Event code (for XFEL only) X X

* - Datasets from /entry/detector in data file are mapped to /entry/instrument/detector/detectorSpecific in master file.

If spot finding is enabled, spots are written in the CXI format and are recognized by CrystFEL. The following has to be added to the CrystFEL geometry file:

peak_list = /entry/MX
peak_list_type = cxi

Other formats (CBF and TIFF)

In addition to the HDF5 format, Jungfraujoch can save images in the Crystallographic Binary File (CBF) format. CBF files are written according to the miniCBF format, with only a basic header, and always as 32-bit signed integers. The dynamic range is reduced to a maximum of 2^24, negative numbers are zeroed, and masked and/or bad pixels are set to -1.
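The per-pixel rule for CBF output can be written down as a small function. This is a sketch under two assumptions not confirmed by the text: that "reduced to a maximum of 2^24" means values above 2^24 are clipped to 2^24, and that masked or bad pixels are identified by a separate flag. to_cbf_value is a hypothetical name.

```python
CBF_MAX = 2**24  # upper limit of the reduced dynamic range


def to_cbf_value(value: int, masked: bool = False) -> int:
    """Convert one pixel value according to the rules described above."""
    if masked:
        return -1  # masked and/or bad pixel
    if value < 0:
        return 0  # negative counts are zeroed
    return min(value, CBF_MAX)  # assumed clipping at 2^24


print([to_cbf_value(v) for v in (-5, 0, 1234, 2**30)], to_cbf_value(7, masked=True))
# [0, 0, 1234, 16777216] -1
```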

Writing to TIFF files is also possible, though no metadata are saved in this case.