jfjoch_writer
jfjoch_writer is a NeXus-compliant HDF5 file writer.
Acknowledgements
- Zdenek Matej (MAX IV)
- Felix Engelmann (MAX IV) for testing and multiple improvement suggestions.
Running directory
The writer must be started in the base directory for file writing: file_prefix is always interpreted relative to the writer's running directory.
The writer detects and guards against basic security issues, such as a file_prefix that starts with a slash, starts with ../, or contains /../.
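The three checks named above can be sketched in a few lines of Python. This is a minimal illustration rather than the writer's actual code; the function name is_safe_file_prefix is hypothetical.

```python
def is_safe_file_prefix(file_prefix: str) -> bool:
    """Return True if file_prefix passes the three checks described above."""
    if file_prefix.startswith("/"):
        return False  # absolute path, would ignore the running directory
    if file_prefix.startswith("../"):
        return False  # leaves the running directory immediately
    if "/../" in file_prefix:
        return False  # leaves a directory somewhere inside the path
    return True
```

A client could run such a check before sending file_prefix, so that an unsuitable value is caught early instead of at the writer.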
Usage
The writer should be started as a background service with the following command:
jfjoch_writer {options} <address to connect via ZeroMQ to DCU>
Options:
-R<dir> | --root_dir=<dir> Root directory for file writing
-H<int> | --http_port=<int> HTTP port for statistics
-r<int> | --zmq_repub_port=<int> ZeroMQ port for PUSH socket to republish images
-f<int> | --zmq_file_port=<int> ZeroMQ port for PUB socket for notifications on finalized files
-w<int> | --rcv_watermark=<int> Receiving ZeroMQ socket watermark (default = 100)
-W<int> | --repub_watermark=<int> Republish ZeroMQ socket watermark (default = 1000)
For example:
jfjoch_writer -H5234 tcp://dcu-address:5400
HTTP interface
The writer has a dedicated status interface over HTTP. It allows two operations:
- Check the state of the writer, to verify that the writer is properly synchronized with the DCU (e.g., that file_prefix agrees with what was set on the DCU) and to monitor progress.
- Cancel writing. This closes all HDF5 files being written and restarts the writer. Use this option only if the DCU process was terminated or disconnected; it SHOULD NOT be used as the standard cancellation procedure (when the DCU receives a cancel command, it should finish writing properly as well).
Republish
Republish creates a PUSH socket on the writer, on which all messages are republished for further use by a data analysis pipeline. Republishing is non-blocking: if there is no receiver on the other end, or the sending queue is full, images won't be republished. For START/END messages, republishing attempts to send for 100 ms; if the send times out, it is not retried.
The republish functionality is optional: if the republish port number is omitted, it is not enabled.
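The drop-versus-wait behaviour described above can be modelled with a bounded standard-library queue. This is only an illustrative model, not the writer's ZeroMQ code: the queue stands in for the sending queue limited by the watermark, the function name republish is hypothetical, and the "no receiver" case is not modelled.

```python
import queue

START_END_TIMEOUT_S = 0.1  # the 100 ms stated above


def republish(out_queue: queue.Queue, message: bytes, is_start_or_end: bool) -> bool:
    """Try to queue one message for republishing; return True if it was queued."""
    try:
        if is_start_or_end:
            # START/END: wait up to 100 ms for space, then give up (no retry).
            out_queue.put(message, timeout=START_END_TIMEOUT_S)
        else:
            # Image: never wait; a full queue means the image is skipped.
            out_queue.put_nowait(message)
        return True
    except queue.Full:
        return False
```

The practical consequence for a consumer is that the republished stream can have gaps, so it should not assume that every image number arrives.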
Overwriting files
When jfjoch_writer creates an HDF5 file, it first appends the suffix .<random>.tmp to the file name.
The random value depends on the current timestamp and will likely differ for each file of a particular series.
After the file is fully saved and closed, it is renamed to remove the suffix.
By default, the rename does not happen if it would overwrite an existing file.
However, this behavior can be changed by setting the overwrite parameter to true in the file writer configuration.
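The write-then-rename pattern described above can be sketched as follows. The helper names are hypothetical, the way the random part is derived from the timestamp is an assumption, and leaving the .tmp file in place when the rename is refused is also an assumption, since the text does not say what happens to it.

```python
import os
import time


def temporary_name(final_path: str) -> str:
    """Append a time-derived .<random>.tmp suffix (exact recipe is an assumption)."""
    return f"{final_path}.{time.time_ns():x}.tmp"


def finalize(tmp_path: str, final_path: str, overwrite: bool = False) -> bool:
    """Rename tmp_path to final_path; return False if that would overwrite a file."""
    if not overwrite and os.path.exists(final_path):
        # Default behaviour: keep the existing file (the .tmp file is left as is).
        return False
    os.replace(tmp_path, final_path)  # replaces an existing target if present
    return True
```

Readers of the output directory can rely on the same convention: a file whose name still ends in .tmp is not finished yet.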
Finalized files information
The writer creates a PUB socket to announce finalized data files. For each closed file, the socket sends a JSON message with the following structure:
{
"filename": <string> HDF5 data file name (relative to writer root directory),
"nimages": <int> number of images in the file (counting from 1!),
"file_number": <int> number of file within the acquisition,
"sample_name": <string> name of sample,
"run_name": <string> name of run,
"run_number": <int> number of run,
"experiment_group": <string> number of p-group / proposal (optional),
"user_data": <any json> user_data,
"beam_x_pxl": <float> beam center (X) in pixels,
"beam_y_pxl": <float> beam center (Y) in pixels,
"detector_distance_m": <float> detector distance in m,
"detector_height_pxl": <int> detector size (Y) in pixels,
"detector_width_pxl": <int> detector size (X) in pixels,
"incident_energy_eV": <float> photon energy of the X-ray beam,
"pixel_size_m": <float> pixel size in meters (assuming pixel X == Y),
"saturation": <int> this count and higher mean saturation,
"space_group_number": <int> space group number (optional),
"underload": <int> pixels with this count should be excluded,
"unit_cell": <optional> unit cell dimensions in Angstrom/degree {
"a": <float>, "b": <float>, "c": <float>,
"alpha": <float>, "beta": <float>, "gamma": <float>
}
}
user_data is defined as header_appendix in the /start operation of jfjoch_broker.
Other metadata are also carried over from the /start operation.
If header_appendix is a string containing valid JSON, it is embedded as JSON; otherwise it is escaped as a string.
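The embed-or-escape rule can be sketched with Python's json module. This illustrates the rule as stated, not the writer's code; the function name is hypothetical, and treating any valid JSON value (including a bare number) as embeddable is an assumption.

```python
import json


def user_data_from_header_appendix(header_appendix: str):
    """Return the parsed value if the string is valid JSON, else the string itself."""
    try:
        return json.loads(header_appendix)
    except json.JSONDecodeError:
        # Not valid JSON: keep it as a plain string; json.dumps escapes it later.
        return header_appendix
```

With this rule, a receiver sees user_data either as a structured JSON value or as a single JSON string, and should be prepared for both.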
For example, with a header_appendix of {"param1": "test1", "param2": ["test1", "test2"]}, the message looks as follows:
{
"filename": "dataset_name_data_000001.h5",
"nimages": 1000,
"file_number": 0,
"sample_name": "lysozyme",
"run_name": "lyso_cryo",
"run_number": 25,
"experiment_group": "p00001",
"beam_x_pxl": 1200,
"beam_y_pxl": 1500,
"detector_distance_m": 0.155,
"detector_height_pxl": 2164,
"detector_width_pxl": 2068,
"image_time_s": 0.001,
"incident_energy_eV": 12400.0,
"pixel_size_m": 7.5e-05,
"saturation": 32766,
"space_group_number": 96,
"underload": -32768,
"unit_cell": {
"a": 78.0,
"alpha": 90.0,
"b": 78.0,
"beta": 90.0,
"c": 39.0,
"gamma": 90.0
},
"user_data": {
"param1": "test1",
"param2": ["test1", "test2"]
}
}
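On the receiving side, a message of this shape can be decoded with the standard json module. The sketch below covers only a subset of the fields; the class and function names are hypothetical, and it treats the fields marked optional (and user_data) as possibly absent.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FinalizedFile:
    """Subset of the notification fields (hypothetical container class)."""
    filename: str
    nimages: int
    file_number: int
    experiment_group: Optional[str] = None
    space_group_number: Optional[int] = None
    unit_cell: Optional[dict] = None
    user_data: Any = None


def parse_finalized_message(payload: str) -> FinalizedFile:
    """Decode one JSON notification into a FinalizedFile."""
    msg = json.loads(payload)
    return FinalizedFile(
        filename=msg["filename"],
        nimages=msg["nimages"],
        file_number=msg["file_number"],
        # Fields that may be missing are read with .get() and default to None.
        experiment_group=msg.get("experiment_group"),
        space_group_number=msg.get("space_group_number"),
        unit_cell=msg.get("unit_cell"),
        user_data=msg.get("user_data"),
    )
```

The payload itself would come from a ZeroMQ SUB socket connected to the writer's notification port; that part is omitted here to keep the sketch free of third-party dependencies.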
Notifications for finalized files are optional: if the notification port number is omitted, this functionality is not enabled.
HDF5 file structure
Jungfraujoch writes NXmx-compliant HDF5, with substantial derived metadata (spot finding, indexing,
integration, azimuthal integration, per-image statistics and timing) stored beyond the NXmx
standard. The complete file layout — master vs data files, the three format variants
(NXmxLegacy, NXmxVDS, NXmxIntegrated), every NXmx field that is populated and every
Jungfraujoch extension — is documented in HDF5 / NeXus data format.
If data collection was configured with a header_appendix containing a key hdf5 whose value is a
JSON object of numbers and strings, those entries are written to /entry/user.
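As a rough illustration of which entries qualify, the sketch below picks the numbers and strings out of the hdf5 object of a header_appendix. It is not the writer's code: the function name is hypothetical, and skipping values of other types (rather than rejecting the whole object) is an assumption.

```python
import json


def entries_for_entry_user(header_appendix: str) -> dict:
    """Return the "hdf5" entries of header_appendix that are numbers or strings."""
    try:
        appendix = json.loads(header_appendix)
    except json.JSONDecodeError:
        return {}
    hdf5 = appendix.get("hdf5") if isinstance(appendix, dict) else None
    if not isinstance(hdf5, dict):
        return {}
    return {
        key: value
        for key, value in hdf5.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, str)
        or (isinstance(value, (int, float)) and not isinstance(value, bool))
    }
```

For instance, a header_appendix of {"hdf5": {"temperature_K": 100.0, "operator": "abc"}} would yield two entries under this reading of the rule.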
Other formats (CBF and TIFF)
In addition to the HDF5 format, Jungfraujoch can save images in the Crystallographic Binary File (CBF) format. CBF files are written according to the miniCBF format, with only a basic header, and always as 32-bit signed integers. The dynamic range is reduced to a maximum of 2^24, negative numbers are set to zero, and masked and/or bad pixels are set to -1.
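The per-pixel value mapping described above can be written out as a small function. This is a sketch under assumptions, not the writer's code: it assumes that values above 2^24 are clamped to 2^24 and that the masked/bad check takes precedence over the other two rules; the function name is hypothetical.

```python
CBF_MAX = 2**24  # upper limit stated above; clamping to it is an assumption


def cbf_pixel(value: int, masked_or_bad: bool) -> int:
    """Map one pixel value to the value stored in the CBF file."""
    if masked_or_bad:
        return -1  # masked and/or bad pixels are flagged with -1
    if value < 0:
        return 0  # negative numbers are set to zero
    return min(value, CBF_MAX)  # limit the dynamic range
```

Under these assumptions, -1 is the only negative value that can appear in a CBF file, so downstream software can use it as the mask marker.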
Writing to TIFF files is also possible, though no metadata are saved in this case.
No file option(s)
There are two options to disable writing of files by the writer:
- Setting file_prefix to an empty string: this disables sending files on the ZeroMQ image socket.
- Setting the file format to NoFile: files are streamed over the ZeroMQ socket, but jfjoch_writer will not write anything. This can be useful for debugging purposes, or if you rely only on the republishing functionality of jfjoch_writer.