jfjoch_writer
jfjoch_writer is a NeXus-compliant HDF5 file writer.
Acknowledgements
- Zdenek Matej (MAX IV)
- Felix Engelmann (MAX IV) for testing and multiple improvement suggestions.
Running directory
The writer needs to run in a base directory when writing files; file_prefix is always interpreted relative to the writer's running directory.
The writer detects and protects against basic security issues, such as a file_prefix starting with a slash, starting with ../, or containing /../.
Usage
The writer needs to be started as a background service, with the following command:
jfjoch_writer {options} <address to connect via ZeroMQ to DCU>
Options:
-R<path> | --root_dir=<path> Root directory for file writing
-H<int> | --http_port=<int> HTTP port for statistics
-r<int> | --zmq_repub_port=<int> ZeroMQ port for PUSH socket to republish images
-f<int> | --zmq_file_port=<int> ZeroMQ port for PUB socket for notifications on finalized files
-w<int> | --rcv_watermark=<int> Receiving ZeroMQ socket watermark (default = 100)
-W<int> | --repub_watermark=<int> Republish ZeroMQ socket watermark (default = 1000)
For example:
jfjoch_writer -H5234 tcp://dcu-address:5400
HTTP interface
The writer has a dedicated status interface via HTTP. It allows two operations:
- check the state of the writer, to verify that it is properly synchronized with the DCU (e.g., that file_prefix agrees with what was set on the DCU) and to monitor progress.
- cancel writing, which closes all HDF5 files being written and restarts the writer. This option should be used only if the DCU process was terminated or disconnected; it SHOULD NOT be used as the standard cancellation procedure (when the DCU receives a cancel command, it finishes writing properly on its own).
Republish
Republish creates a PUSH socket on the writer, on which all messages are republished for further use by a data analysis pipeline (consumers attach a PULL socket). Republishing is non-blocking: if there is no receiver on the other end, or the sending queue is full, images won't be republished. For START/END messages, republishing will attempt to send for 100 ms; if the send times out, it won't be retried.
Republish functionality is optional; if the republish port number is omitted, it is not enabled.
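A minimal sketch of the consumer side, assuming only what is stated above (the writer republishes on a PUSH socket, so a pipeline attaches a PULL socket). The in-process PUSH socket here merely stands in for the writer, and the message payload is invented for the demo; the real image message layout is not specified here.

```python
# Sketch: consuming republished images (loopback demo; the PUSH socket
# below is a stand-in for jfjoch_writer's republish socket).
import zmq

ctx = zmq.Context.instance()

# Stand-in for the writer's republish PUSH socket.
push = ctx.socket(zmq.PUSH)
port = push.bind_to_random_port("tcp://127.0.0.1")

# Analysis-pipeline side: PULL socket with a bounded receive queue,
# mirroring the watermark idea from the writer's options.
pull = ctx.socket(zmq.PULL)
pull.set_hwm(100)
pull.connect(f"tcp://127.0.0.1:{port}")

push.send(b"image-frame-0")   # "writer" republishes a frame
frame = pull.recv()           # pipeline picks it up

push.close()
pull.close()
```

Because sending on the writer side is non-blocking, a slow or absent consumer simply misses frames rather than stalling acquisition.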
Overwriting files
When jfjoch_writer creates an HDF5 file, it first appends the suffix .<random>.tmp.
The random value depends on the current timestamp and will likely differ between files of a particular series.
After the file is fully saved and closed, it is renamed to remove the suffix.
By default, the rename will not happen if it would overwrite an existing file.
However, this behavior can be changed by setting the overwrite parameter to true in the file writer configuration.
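The scheme can be illustrated with a plain Python sketch of the behavior described above (not the writer's actual implementation; in particular, the existence check and the rename are not atomic together here).

```python
# Sketch of the temp-suffix-then-rename scheme: write to <path>.<random>.tmp,
# then rename to <path> once the file is complete.
import os
import random
import time

def write_then_publish(path: str, data: bytes, overwrite: bool = False) -> bool:
    """Write data to a temporary file, then rename it into place."""
    # Random suffix derived from the current timestamp, so files of one
    # series get distinct temporary names.
    rnd = random.Random(time.time_ns()).randrange(1 << 32)
    tmp = f"{path}.{rnd:08x}.tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    if not overwrite and os.path.exists(path):
        os.remove(tmp)        # refuse to clobber an existing file
        return False
    os.replace(tmp, path)     # atomic rename on POSIX
    return True
```

A reader watching the directory therefore never sees a half-written file under its final name.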
Finalized files information
The writer creates a PUB socket to announce finalized data files. For each closed file, the socket sends a JSON message with the following structure:
{
"filename": <string>: HDF5 data file name (relative to writer root directory),
"nimages": <int> number of images in the file (counting from 1!),
"file_number": <int> number of file within the acquisition,
"sample_name": <string> name of sample,
"run_name": <string> name of run,
"run_number": <int> number of run,
"experiment_group": <string> number of p-group / proposal (optional),
"user_data": <any json> user_data,
"beam_x_pxl": <float> beam center (X) in pixels,
"beam_y_pxl": <float> beam center (Y) in pixels,
"detector_distance_m": <float> detector distance (X) in m,
"detector_height_pxl": <int> detector size (X) in pixels,
"detector_width_pxl": <int> detector size (Y) in pixels,
"incident_energy_eV": <float> photon energy of the X-ray beam,
"pixel_size_m": <float> pixel size in meter (assuming pixel X == Y),
"saturation": <int> this count and higher mean saturation,
"space_group_number": <int> space group number (optional),
"underload": <int> pixels with this count should be excluded,
"unit_cell": <optinal> unit cell dimensions in Angstrom/degree {
"a": <float>, "b": <float>, "c": <float>,
"alpha": <float>, "beta": <float>, "gamma": <float>
},
}
user_data is defined as header_appendix in the /start operation of jfjoch_broker.
Other metadata are also carried over from the /start operation.
If header_appendix is a string containing valid JSON, it will be embedded as JSON; otherwise it will be escaped as a string.
For example, given a header_appendix of {"param1": "test1", "param2": ["test1", "test2"]}, the message will look as follows:
{
"filename": "dataset_name_data_000001.h5",
"nimages": 1000,
"file_number": 0,
"sample_name": "lysozyme",
"run_name": "lyso_cryo",
"run_number": 25,
"experiment_group": "p00001",
"beam_x_pxl": 1200,
"beam_y_pxl": 1500,
"detector_distance_m": 0.155,
"detector_height_pxl": 2164,
"detector_width_pxl": 2068,
"image_time_s": 0.001,
"nimages": 2,
"incident_energy_eV": 12400.0,
"pixel_size_m": 7.5e-05,
"saturation": 32766,
"space_group_number": 96,
"underload": -32768,
"unit_cell": {
"a": 78.0,
"alpha": 90.0,
"b": 78.0,
"beta": 90.0,
"c": 39.0,
"gamma": 90.0
},
"user_data": {
"param1": "test1",
"param2": ["test1", "test2"]
}
}
Notifications for finalized files are optional; if the notification port number is omitted, this functionality is not enabled.
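A subscriber sketch, assuming only what is documented above: the writer publishes finalized-file notifications on a PUB socket, and the JSON fields follow the structure shown. The endpoint string is a placeholder.

```python
# Sketch: subscribing to finalized-file notifications from jfjoch_writer.
import json

def parse_notification(raw: bytes) -> dict:
    """Decode one notification and pull out a few key fields."""
    msg = json.loads(raw)
    return {
        "filename": msg["filename"],
        "nimages": msg["nimages"],
        "file_number": msg["file_number"],
    }

def listen(endpoint: str) -> None:
    """Connect a SUB socket and print each finalized file as it arrives."""
    import zmq  # imported here so parse_notification stays dependency-free
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect(endpoint)               # e.g. "tcp://writer-host:<file_port>"
    sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything
    while True:
        info = parse_notification(sub.recv())
        print(f"finalized {info['filename']} with {info['nimages']} images")
```

Such a listener is a natural trigger point for downstream processing that should start as soon as a data file is safely on disk.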
HDF5 file structure
Jungfraujoch aims to generate files compliant with NXmx format.
Master file
There are custom extensions to the NXmx format. These will be documented in the future.
Specifically, if data collection was configured with a header_appendix containing the key hdf5 whose value is a JSON
object with number and string values, these values will be added to /entry/user.
Two versions of the master file are possible.
By default, the legacy version is used. This version is compatible with the DECTRIS file writer version 1.0 format.
This ensures file compatibility with the Neggia and Durin XDS plugins, as well as the DECTRIS Albula viewer version 4.0.
A distinctive feature is that if images are split into data files, there will be multiple links in /entry/data,
each corresponding to one data file.
However, certain newer HDF5 features, like virtual datasets, are not possible in this format, since it has to remain compatible with HDF5 1.8 features.
Therefore, a VDS version of the format is also available. It links to all data files via a single virtual dataset /entry/data/data.
In the same way, spot finding, azimuthal integration, and other results are linked between the master and data files.
This format allows displaying processing results in the currently developed Jungfraujoch Viewer.
For the time being it only works with the Durin XDS plugin, and requires DECTRIS Albula viewer version 4.1 or later.
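The VDS mechanism can be illustrated with h5py: the demo below builds two tiny stand-in data files (hypothetical names and toy values, not actual writer output) and a master file whose /entry/data/data is a single virtual dataset spanning both.

```python
# Sketch: a master file exposing several data files through one virtual
# dataset, as in the VDS master-file format described above.
import numpy as np
import h5py

# Two small "data files", each holding a block of 4 one-pixel images.
for i, lo in enumerate((0, 4)):
    with h5py.File(f"demo_data_{i:06d}.h5", "w") as f:
        f.create_dataset(
            "entry/data/data",
            data=np.arange(lo, lo + 4, dtype=np.int32).reshape(4, 1, 1))

# Master file: one virtual dataset mapping both data files end to end.
layout = h5py.VirtualLayout(shape=(8, 1, 1), dtype=np.int32)
for i in range(2):
    layout[i * 4:(i + 1) * 4] = h5py.VirtualSource(
        f"demo_data_{i:06d}.h5", "entry/data/data", shape=(4, 1, 1))
with h5py.File("demo_master.h5", "w") as f:
    f.create_virtual_dataset("entry/data/data", layout)

# A reader sees one contiguous stack of 8 images.
with h5py.File("demo_master.h5", "r") as f:
    stack = f["entry/data/data"][:]
```

This is why the VDS format needs no per-file links in /entry/data: the virtual dataset resolves to the underlying data files transparently on read.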
Data file
Data file has the following structure:
| Location | Description | Optional | Linked in master file v. 2 |
|---|---|---|---|
| /entry/data/data | Images | X | |
| /entry/detector/timestamp | Timestamp of the image | ||
| /entry/detector/exptime | Exposure time of the image | ||
| /entry/detector/number | Image number; if image rejection was used this will be the original image number | ||
| /entry/detector/det_info | Debug field of the JF detector | X | |
| /entry/detector/storage_cell_image | Storage cell number | X | X * |
| /entry/detector/rcv_delay | Receiver delay for the image (Jungfraujoch debugging) | X | |
| /entry/detector/rcv_free_send_buffers | Receiver number of free send buffers at the time of sending the image (Jungfraujoch debugging) | X | |
| /entry/detector/data_collection_efficiency_image | Ratio of received and expected UDP packets | X | X * |
| /entry/detector/packets_expected | Number of UDP packets expected for the image | X | |
| /entry/detector/packets_received | Number of UDP packets received for the image | X | |
| /entry/image/max_value | Max viable value of the image (excl. overloads, etc.) | X | |
| /entry/azint/bin_to_q | Azimuthal integration - bin-to-Q mapping | X | |
| /entry/azint/image | Azimuthal integration - per image | X | X |
| /entry/MX/peakXPosRaw | Peak position X (see CXI format) | X | X |
| /entry/MX/peakYPosRaw | Peak position Y (see CXI format) | X | X |
| /entry/MX/peakTotalIntensity | Peak total intensity (see CXI format) | X | X |
| /entry/MX/nPeaks | Number of peaks per image (see CXI format) | X | X |
| /entry/MX/strongPixels | Number of strong pixel per image | X | X |
| /entry/MX/nPeaksRingFiltered | Number of peaks not belonging to rings | X | X |
| /entry/MX/imageIndexed | Image is successfully indexed | X | X |
| /entry/MX/latticeIndexed | Crystal lattice for the image, assuming it is indexed | X | X |
| /entry/MX/bkgEstimate | Mean value of pixels in the radius of 3-5 A | X | X |
| /entry/MX/resolutionEstimate | Resolution estimate based on ML model from SSRL | X | X |
| /entry/roi/{roi_name}/max | Max pixel value for roi named {roi_name} | X | X |
| /entry/roi/{roi_name}/sum | Sum pixel value for roi named {roi_name} | X | X |
| /entry/roi/{roi_name}/sum_sq | Sum pixel values squared for roi named {roi_name} | X | X |
| /entry/roi/{roi_name}/npixel | Number of valid pixel for roi named {roi_name} | X | X |
| /entry/roi/{roi_name}/x | Weighted X-coordinate for roi named {roi_name} | X | X |
| /entry/roi/{roi_name}/y | Weighted Y-coordinate for roi named {roi_name} | X | X |
| /entry/xfel/pulseID | Pulse ID (for XFEL only) | X | X |
| /entry/xfel/eventCode | Event code (for XFEL only) | X | X |
* Datasets from /entry/detector in the data file are mapped to /entry/instrument/detector/detectorSpecific in the master file.
If spot finding is enabled, spots are written in the CXI format and are recognized by CrystFEL. The following has to be added to the CrystFEL geometry file:
peak_list = /entry/MX
peak_list_type = cxi
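The per-image CXI peak layout can be sketched with h5py on a self-built demo file (hypothetical file name and toy values; dataset names follow the table above), showing how nPeaks bounds the valid entries of each fixed-width row.

```python
# Sketch: reading per-image spot-finding results stored in the CXI layout.
import numpy as np
import h5py

# Build a tiny demo file with two images' worth of peaks.
with h5py.File("demo_peaks.h5", "w") as f:
    f["entry/MX/nPeaks"] = np.array([2, 1], dtype=np.int32)
    # CXI convention: fixed-width per-image rows; slots beyond nPeaks
    # are padding and must be ignored.
    f["entry/MX/peakXPosRaw"] = np.array([[10.0, 20.0, 0.0],
                                          [30.0, 0.0, 0.0]])
    f["entry/MX/peakTotalIntensity"] = np.array([[5.0, 6.0, 0.0],
                                                 [7.0, 0.0, 0.0]])

# Read back only the valid peaks of each image.
with h5py.File("demo_peaks.h5", "r") as f:
    npeaks = f["entry/MX/nPeaks"][:]
    xpos = f["entry/MX/peakXPosRaw"][:]
    peaks_per_image = [xpos[i, :n].tolist() for i, n in enumerate(npeaks)]
```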
Other formats (CBF and TIFF)
In addition to the HDF5 format, Jungfraujoch allows saving images in the Crystallographic Binary File (CBF) format. CBF files are written according to the miniCBF format, with only a basic header, and always with 32-bit signed integers. The dynamic range is reduced to a maximum of 2^24, negative numbers are zeroed, and masked and/or bad pixels are set to -1.
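The stated value mapping can be written down as a small NumPy sketch (an illustration of the rules above, not the writer's actual code).

```python
# Sketch of the miniCBF pixel-value mapping: clamp to 2**24, zero
# negatives, and flag masked/bad pixels with -1.
import numpy as np

def to_cbf_pixels(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Map raw counts to the 32-bit signed values stored in the CBF file."""
    out = np.minimum(image.astype(np.int32), 2 ** 24)  # reduce dynamic range
    out[out < 0] = 0                                   # zero negative counts
    out[mask] = -1                                     # masked / bad pixels
    return out
```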
Writing to TIFF files is also possible, though no metadata are saved in this case.