# HDF5 / NeXus data format

Jungfraujoch stores images and on-the-fly analysis results in HDF5 files that aim to be
[NXmx](https://manual.nexusformat.org/classes/applications/NXmx.html)-compliant. On top of the
NXmx application definition, Jungfraujoch records a substantial amount of *derived* metadata
(spot finding, indexing, integration, azimuthal integration, per-image statistics, timing). These
extra entries do not exist in NXmx and are documented here so that the layout is unambiguous and
reusable.

This page documents the **file layout and the data fields**. The operational behaviour of the
writer (running, republishing, file finalisation, CBF/TIFF output) is described in
[jfjoch_writer](JFJOCH_WRITER.md). The wire format that feeds the writer is described in
[CBOR messages](CBOR.md); fields below frequently correspond one-to-one to CBOR message fields, and
that document is a useful companion for their meaning.

## 1. Motivation: derived metadata and FAIR data

The goal of Jungfraujoch is not only to store high-throughput datasets efficiently, but to keep
them findable, accessible, interoperable and reusable (FAIR). Jungfraujoch is used for both
**rotation** macromolecular crystallography (single- and multi-crystal, including fine-sliced and
helical scans) and **serial** crystallography (stills, grid scans); the same concerns apply to both:

* **Findability.** Raw diffraction images carry almost no descriptive metadata about *content*.
  Quantities such as background level, number of diffraction spots, or indexing outcome let a user
  judge the quality and relevance of a dataset *before* inspecting the raw images.
* **Accessibility at scale.** A single experiment can span tens to hundreds of terabytes. Standard
  retrieval (e.g. HTTP) makes a dataset *available* but not *inspectable*: users would otherwise
  have to download a large fraction of the data just to decide whether it is useful. Compact
  derived representations make discovery, assessment and reuse feasible.

Because Jungfraujoch couples acquisition with real-time analysis used to *steer* experiments,
transparency and reproducibility of that analysis matter. At a minimum, the writer therefore
preserves spot-finding and indexing results together with the filters that were applied, and it can
retain an unbiased, down-sampled reference set of unfiltered images for validation and reuse.

### Two complementary layouts: per-image spots vs. a reflection table

Jungfraujoch stores analysis products in two shapes, matching how each is accessed.

**Per-image spot finding / indexing.** Spot finding and indexing are inherently *image-centric* —
the natural query is "give me the spots for image *n*" — and this holds for serial stills and for
rotation frames alike. For these products Jungfraujoch adopts a layout similar to the
[Coherent X-ray Imaging (CXI) data bank](https://www.cxidb.org) (Maia, 2012) and the convention
understood by [CrystFEL](https://www.desy.de/~twhite/crystfel/): spot properties (position,
intensity, Miller index, …) are stored in fixed-size two-dimensional arrays indexed by image number,
with each image allocated room for up to a predefined maximum number of spots. These dense arrays
are addressed with ordinary HDF5 hyperslab reads, so the spots of a single image are retrieved
without traversing variable-length structures. The cost is some storage overhead for unused slots
(padded with sentinels), which is acceptable for the access pattern.

**Integrated reflections.** Integrated intensities are naturally a *dataset-wide* table, which is
exactly the model of the NeXus
[NXreflections](https://manual.nexusformat.org/classes/base_classes/NXreflections.html) base class.
This fits rotation crystallography well, and Jungfraujoch uses NXreflections for its integration
results (see §4.2 below). We deliberately do *not* force spot finding/indexing into a single
experiment-wide table: across the hundreds of thousands of patterns typical of serial — or
fine-sliced rotation — experiments, that would require aggregating the whole experiment before the
spots of one image can be read. We encourage the community to develop standardised NeXus application
definitions for image-centric crystallography products that combine NeXus interoperability with the
access patterns and scale of modern high-throughput experiments.

## 2. File layout

A run is written as one **master file** plus, except in the integrated format, one or more
**data files**:

```
<prefix>_master.h5        # NXmx master file (metadata + links / virtual datasets)
<prefix>_data_000001.h5   # data file: images + per-image analysis
<prefix>_data_000002.h5
...
```

The master file is produced by `writer/HDF5NXmx.cpp`; data files by `writer/HDF5DataFile.cpp` and
its plugins (`writer/HDF5DataFilePlugin*.cpp`). Files are written to a temporary `*.<random>.tmp`
name and renamed on successful close.
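
The naming scheme above can be captured in a few helpers, for example when a script needs to locate every file of a run or skip files that are still being written. This is a minimal sketch: the function names are hypothetical (not part of Jungfraujoch), the six-digit, 1-based counter is read off the `<prefix>_data_000001.h5` pattern, and the only assumption about temporary names is that they end in `.tmp`.

```python
from pathlib import Path


def master_file_name(prefix: str) -> str:
    """Name of the NXmx master file of a run."""
    return f"{prefix}_master.h5"


def data_file_name(prefix: str, file_number: int) -> str:
    """Name of a data file; the counter starts at 1 and is zero-padded to six digits."""
    if file_number < 1:
        raise ValueError("data files are numbered from 1")
    return f"{prefix}_data_{file_number:06d}.h5"


def is_unfinished(path: str) -> bool:
    """True if the file still carries a temporary '.tmp' suffix.

    Files are renamed on successful close, so a leftover '.tmp' file is either
    still being written or was never closed cleanly.
    """
    return Path(path).suffix == ".tmp"


# Example: the first two data files of run "lyso_1".
print([data_file_name("lyso_1", n) for n in (1, 2)])
# ['lyso_1_data_000001.h5', 'lyso_1_data_000002.h5']
```

A directory scanner built on these would list `<prefix>_data_*.h5` and ignore anything for which `is_unfinished` is true.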

Three master-file variants exist, selected via `file_format`:

| Format | Value | Master ↔ data linking |
|--------|:-----:|------------------------|
| **NXmxLegacy** (default) | 1 | One external link in `/entry/data` per data file (`data_000001`, …). HDF5 1.8 compatible; works with the Neggia/Durin XDS plugins and Albula 4.0. |
| **NXmxVDS** | 2 | A single virtual dataset `/entry/data/data` spans all data files; spot-finding, azimuthal-integration and ROI results are exposed as virtual datasets in the same way, and reflections through links. Requires HDF5 1.10 / Albula 4.1+. |
| **NXmxIntegrated** | 3 | No separate data files: images and all metadata live in one file. Equivalent in content to the VDS format. |

In legacy/VDS mode, image-indexed analysis arrays live in the **data files** and are exposed in the
master file through external links or virtual datasets (with exceptions noted below, e.g. ROI
results in legacy mode); in integrated mode they are written directly into the single file. Where
an entry is visible only in the master file, or not visible from it at all, the sections below say
so.

Images are stored chunked (one image per chunk) and compressed with bitshuffle + LZ4 or
bitshuffle + Zstd. Integer image datasets use a sentinel as the HDF5 fill value (the
"masked / no-data" value): `INTx_MIN` for signed types and `UINTx_MAX` for unsigned types.
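
To mask no-data pixels when reading images, a reader needs the sentinel for the stored integer type. A minimal sketch follows; `fill_value` and `valid_pixels` are hypothetical helpers, and accepting 8/16/32/64-bit types is an assumption of the sketch rather than a list of what Jungfraujoch actually writes.

```python
def fill_value(bits: int, signed: bool) -> int:
    """Sentinel ('masked / no-data') fill value for an integer image type."""
    if bits not in (8, 16, 32, 64):
        raise ValueError(f"unsupported bit depth: {bits}")
    # INTx_MIN for signed types, UINTx_MAX for unsigned types.
    return -(1 << (bits - 1)) if signed else (1 << bits) - 1


def valid_pixels(values, bits: int, signed: bool):
    """Yield (index, value) pairs for pixels that do not hold the sentinel."""
    sentinel = fill_value(bits, signed)
    for i, v in enumerate(values):
        if v != sentinel:
            yield i, v
```

With h5py, the fill value actually stored in a file can be compared against this through the dataset's `fillvalue` property.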

### Reprocessing output: `<prefix>_process.h5`

The offline reprocessing tool [`jfjoch_process`](TOOLS.md) (`tools/jfjoch_process.cpp`) re-runs the
full analysis pipeline (spot finding, indexing, refinement, integration, scaling) on an existing
dataset and writes its results to a master file named **`<prefix>_process.h5`**. This file uses the
**integrated** format, but instead of copying the images its `/entry/data/data` is a *virtual
dataset that links back to the original image files* (`hdf5_source_data` →
`NXmx::LinkToData_ProcessingVDS`). The result is a compact, self-describing companion file that
holds *all* the derived analysis (everything in §4) plus a virtual view of the raw images, without
duplicating terabytes of data.

This is a particularly FAIR-friendly artefact: it can be shared or archived alongside (or instead
of) the raw data to convey what is in a dataset and how it was processed, while the
`/entry/data/data` VDS still resolves to the original images when they are available.
`jfjoch_process` can also process an equally spaced *subset* of images (start/end/stride),
producing a down-sampled reference set.

## 3. NXmx-standard content

The entries below are part of, or valid base classes for, the
[NXmx](https://manual.nexusformat.org/classes/applications/NXmx.html) application definition.
In the **Std** column, "NXmx" means the field is listed in the application definition, and "base"
means it is a valid field of the relevant NeXus base class (`NXdetector`, `NXsample`, `NXsource`)
but not in the NXmx required/recommended subset.

### `/entry` (NXentry)

| Field | Std | Notes |
|-------|:---:|-------|
| `definition` | NXmx | value `"NXmx"` |
| `start_time` | NXmx | arming time |
| `end_time`, `end_time_estimated` | NXmx | approximate end time |

File-level HDF5 attributes `file_name`, `file_time`, `HDF5_Version` are also set.

### `/entry/source` (NXsource), `/entry/instrument` (NXinstrument)

| Field | Std | Units |
|-------|:---:|-------|
| `source/name`, `source/type` | NXmx / base | |
| `source/current` | base | A |
| `instrument/name` | NXmx | |

### `/entry/instrument/beam` (NXbeam)

| Field | Std | Units |
|-------|:---:|-------|
| `incident_wavelength` | NXmx | angstrom |
| `incident_wavelength_spread` | NXmx | angstrom (only if polychromatic) |
| `total_flux` | NXmx | Hz |

### `/entry/instrument/attenuator` (NXattenuator)

| Field | Std |
|-------|:---:|
| `attenuator_transmission` | NXmx |

### `/entry/instrument/detector` (NXdetector)

| Field | Std | Units / notes |
|-------|:---:|-------|
| `depends_on` | NXmx | → `transformations/rot3` |
| `beam_center_x`, `beam_center_y` | NXmx | pixel |
| `distance` | NXmx | m |
| `count_time`, `frame_time` | NXmx | s |
| `sensor_thickness` | NXmx | m |
| `sensor_material` | NXmx | |
| `description` | NXmx | |
| `threshold_energy` | NXmx | eV (EIGER; written only for a single channel) |
| `x_pixel_size`, `y_pixel_size` | base | m |
| `serial_number` | base | |
| `bit_depth_readout` | NXmx | |
| `saturation_value` | NXmx | |
| `flatfield_applied` | NXmx | |
| `pixel_mask`, `pixel_mask_applied` | NXmx | `pixel_mask` is `[y, x]`, hard-linked from `detectorSpecific/pixel_mask` |
| `countrate_correction_applied` | NXmx | |
| `number_of_cycles` | base | frame-summation factor |

### `/entry/instrument/detector/transformations` (NXtransformations)

The NXtransformations *mechanism* (the `depends_on` chain, `transformation_type`, `vector`,
`offset` attributes) is standard. The axis **names** follow the PyFAI PONI convention chosen by
Jungfraujoch (see [DETECTOR_GEOMETRY](DETECTOR_GEOMETRY.md)):

| Axis | Type | Units | Depends on |
|------|------|-------|-----------|
| `translation` | translation | m | `.` |
| `rot1` | rotation | rad | `translation` |
| `rot2` | rotation | rad | `rot1` |
| `rot3` | rotation | rad | `rot2` |
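
The chain in the table is resolved by starting at the detector's `depends_on` target and following each axis's `depends_on` until the `.` terminator. The sketch below does this on a plain name-to-dependency mapping that reproduces the table. `resolve_chain` is a hypothetical helper, and it assumes bare axis names; if a file stores full HDF5 paths in `depends_on`, reduce them to the last path component first.

```python
from __future__ import annotations


def resolve_chain(start: str, depends_on: dict[str, str]) -> list[str]:
    """Follow NeXus depends_on links from `start` until the '.' terminator.

    Returns the axes in the order they are visited. Raises on cycles or on
    references to axes that are not defined.
    """
    chain: list[str] = []
    current = start
    while current != ".":
        if current in chain:
            raise ValueError(f"cycle in depends_on chain at {current!r}")
        if current not in depends_on:
            raise KeyError(f"axis {current!r} is not defined")
        chain.append(current)
        current = depends_on[current]
    return chain


# The detector axes from the table above.
DETECTOR_AXES = {"translation": ".", "rot1": "translation", "rot2": "rot1", "rot3": "rot2"}

print(resolve_chain("rot3", DETECTOR_AXES))  # ['rot3', 'rot2', 'rot1', 'translation']
```

With h5py, the mapping could be built by reading `attrs["depends_on"]` from each member of the `transformations` group (decoding bytes to `str` if needed).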

### `/entry/instrument/detector/module` (NXdetector_module)

`data_origin`, `data_size`, `fast_pixel_direction`, `slow_pixel_direction` and `module_offset` are
all NXmx fields (`fast/slow_pixel_direction` and `module_offset` carry transformation attributes).

### `/entry/sample` (NXsample)

| Field | Std | Units / notes |
|-------|:---:|-------|
| `name` | NXmx | |
| `depends_on` | NXmx | points at the last goniometer / grid-scan axis, or `.` for stills |
| `temperature` | NXmx | K |
| `transformations/` (NXtransformations) | NXmx | rotation axis (e.g. `omega`) or grid-scan translation; hard-linked as `/entry/sample/goniometer` |
| `unit_cell` | base | `[a, b, c, α, β, γ]` |
| `ub_matrix` | base | `[1, 3, 3]`, Å⁻¹ |

For a rotation scan the goniometer axis is written as a per-image angle array `<axis>` plus
`<axis>_end`, the scalars `<axis>_range_average` and `<axis>_range_total`, and, for helical scans,
`<axis>_helical_x/_y/_z`. These goniometer datasets beyond the bare axis array are Jungfraujoch
conveniences.
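
The two goniometer scalars can be cross-checked against the per-image arrays. This sketch assumes that `<axis>` holds each image's start angle and `<axis>_end` its end angle, and it interprets `<axis>_range_average` as the mean per-image width and `<axis>_range_total` as the sum of the widths; those interpretations are read from the dataset names, not from the writer source. `oscillation_summary` is a hypothetical helper.

```python
from statistics import fmean


def oscillation_summary(start_deg, end_deg):
    """Return (mean per-image width, total swept angle) in degrees.

    Assumes start_deg[i] and end_deg[i] are the goniometer angles at the start
    and end of image i (the <axis> and <axis>_end arrays).
    """
    if len(start_deg) != len(end_deg) or len(start_deg) == 0:
        raise ValueError("need two non-empty arrays of equal length")
    widths = [e - s for s, e in zip(start_deg, end_deg)]
    return fmean(widths), sum(widths)


print(oscillation_summary([0.0, 0.5, 1.0], [0.5, 1.0, 1.5]))  # (0.5, 1.5)
```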

### `/entry/data` (NXdata)

`data` (3-D image stack, `[n_images, y, x]`) with `image_nr_low` / `image_nr_high` attributes.
In legacy mode this group instead contains one external link `data_000001`, … per data file.
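
In legacy mode, reading image *n* means first picking the right external link. Assuming each linked dataset carries the same `image_nr_low` / `image_nr_high` attributes as the VDS `data` dataset, and that these are inclusive image numbers, the lookup could be sketched as follows (`find_data_link` is hypothetical):

```python
from __future__ import annotations


def find_data_link(image_nr: int, ranges: dict[str, tuple[int, int]]) -> tuple[str, int]:
    """Return (link name, zero-based row within that dataset) for an image number.

    `ranges` maps each link under /entry/data (data_000001, ...) to its
    inclusive (image_nr_low, image_nr_high) pair.
    """
    for name, (low, high) in sorted(ranges.items()):
        if low <= image_nr <= high:
            return name, image_nr - low
    raise KeyError(f"image {image_nr} is not covered by any data file")


ranges = {"data_000001": (1, 100), "data_000002": (101, 200)}
print(find_data_link(150, ranges))  # ('data_000002', 49)
```

With h5py, the ranges would be read from `f["/entry/data"][name].attrs["image_nr_low"]` and `attrs["image_nr_high"]` for each link. In the VDS and integrated formats no lookup is needed, since `/entry/data/data` already spans the whole run.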

## 4. Extensions beyond NXmx

Everything in this section is **outside the NXmx standard**. Each group is declared with
`NX_class = NXcollection` (the NeXus-sanctioned container for non-standardised content) unless noted.
The per-image arrays are indexed by image number, padded to the run length and filled with a
sentinel (`NaN` for floats, `-1`/`0` for integer indices) where a quantity is absent.

### 4.1 `/entry/MX` — spot finding and indexing (CXI-style)

The flagship extension. Spot ("peak") properties are stored as fixed-size `[n_images, max_spots]`
arrays (CXI layout, recognised by CrystFEL); scalar-per-image quantities as `[n_images]` vectors.
In legacy/VDS mode these live in the data files and are linked/virtual-stacked into the master.

**Per-spot arrays `[n_images, max_spots]`:**

| Dataset | Units | Meaning | Indexing only |
|---------|-------|---------|:---:|
| `peakXPosRaw`, `peakYPosRaw` | pixel | spot position (raw detector frame) | |
| `peakTotalIntensity` | photons | spot intensity | |
| `peakIceRingRes` | | spot lies in an ice-ring resolution band | |
| `peakH`, `peakK`, `peakL` | | Miller indices of the (indexed) spot | ✓ |
| `peakDistEwaldSphere` | Å⁻¹ | distance of the spot from the Ewald sphere | ✓ |
| `peakIndexed` | | spot fits the indexing solution | ✓ |
| `peakLattice` | | lattice the spot belongs to (`-1` = unindexed) | ✓ |

**Per-image vectors `[n_images]`:**

| Dataset | Units | Meaning |
|---------|-------|---------|
| `nPeaks` | | number of spots stored for the image (CXI) |
| `strongPixels` | | strong-pixel count (first spot-finding stage) |
| `peakCountUnfiltered` | | spots found before filtering |
| `peakCountLowRes` | | low-resolution spots |
| `peakCountIceRingRes` | | spots inside ice-ring bands |
| `peakCountIndexed` | | spots fitting the indexing solution |
| `imageIndexed` | | image was indexed (0/1) |
| `indexingLatticeCount` | | number of lattices found for the image |
| `niggliClass` | | Niggli class of the indexed Bravais lattice (see *International Tables for Crystallography* Vol. A (2016), [Table 3.1.3.1](https://onlinelibrary.wiley.com/iucr/itc/Ac/ch3o1v0001/table3o1o3o1.pdf)) |
| `bravaisLattice` | | Bravais lattice short code, e.g. `aP`, `mC`, `oF`, `tI`, `hP`, `hR`, `cF` |
| `profileRadius` | Å⁻¹ | crystal profile radius |
| `mosaicity` | deg | mosaicity estimate |
| `bFactor` | Ų | per-image B-factor estimate |
| `resolutionEstimate` | Å | diffraction resolution estimate |
| `integratedReflections` | | number of integrated reflections |
| `bkgEstimate` | photons | mean background in the 3–5 Å resolution band |
| `beam_corr_x`, `beam_corr_y` | pixel | beam-center correction applied during processing |
| `imageScaleFactor` | | on-the-fly per-image scale factor *g* |
| `imageScaleCC` | | on-the-fly scaling correlation coefficient |
| `imageScaleMosaicity` | deg | scaling-model mosaicity |
| `imageScaleBFactor` | Ų | scaling-model B-factor |
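
Because the per-spot arrays are dense, reading the spots of one image is a row lookup truncated at that image's `nPeaks`. The sketch below does this on in-memory nested lists; it assumes the stored spots occupy the first `nPeaks` slots of each row (with sentinel padding after them), and `spots_for_image` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Sequence


def spots_for_image(
    n: int, n_peaks: Sequence[int], **columns: Sequence[Sequence[float]]
) -> list[dict[str, float]]:
    """Return the stored spots of image `n` as one dict per spot.

    Each keyword argument is a per-spot array of shape [n_images, max_spots]
    (e.g. peakXPosRaw=...). Only the first n_peaks[n] slots are read, so the
    sentinel padding after them is never touched.
    """
    k = int(n_peaks[n])
    return [{name: col[n][i] for name, col in columns.items()} for i in range(k)]


nan = float("nan")
x = [[10.5, 20.0, nan], [nan, nan, nan]]
y = [[3.0, 4.0, nan], [nan, nan, nan]]
print(spots_for_image(0, [2, 0], peakXPosRaw=x, peakYPosRaw=y))
# [{'peakXPosRaw': 10.5, 'peakYPosRaw': 3.0}, {'peakXPosRaw': 20.0, 'peakYPosRaw': 4.0}]
```

Against a file, the same read is one HDF5 hyperslab per column, e.g. `k = int(f["/entry/MX/nPeaks"][n])` followed by `f["/entry/MX/peakXPosRaw"][n, :k]` in h5py, so only one row of each array is requested.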

**Per-image lattices:** `latticeIndexed` `[n_images, 9]` (Å) — the real-space lattice (flattened
3×3); `latticeIndexedExtra` `[n_images, max_extra_lattices, 9]` (Å) — additional orientation
variants.
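
`latticeIndexed` stores the real-space lattice as nine numbers, while `/entry/sample/unit_cell` uses `[a, b, c, α, β, γ]`; converting between the two takes a few dot products. This sketch assumes the nine values are the row-major flattening of a 3×3 matrix whose rows are the **a**, **b** and **c** vectors in Å; if a file stores them as columns instead, the matrix must be transposed first. `unit_cell_from_lattice` is hypothetical.

```python
import math


def unit_cell_from_lattice(flat):
    """Convert a flattened 3x3 real-space lattice to (a, b, c, alpha, beta, gamma).

    Assumes row-major order with rows = a, b, c vectors (Angstrom).
    Lengths are in Angstrom, angles in degrees.
    """
    if len(flat) != 9:
        raise ValueError("expected 9 values")
    a, b, c = flat[0:3], flat[3:6], flat[6:9]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def norm(u):
        return math.sqrt(dot(u, u))

    def angle(u, v):
        cos_uv = dot(u, v) / (norm(u) * norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_uv))))  # clamp round-off

    # alpha = angle(b, c), beta = angle(a, c), gamma = angle(a, b)
    return norm(a), norm(b), norm(c), angle(b, c), angle(a, c), angle(a, b)
```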

**Run-level summaries** (written into the master `/entry/MX` at finalisation):

| Dataset | Units | Meaning |
|---------|-------|---------|
| `indexing_algorithm` | | `FFBIDX` / `FFT (CUDA)` / `FFT (FFTW)` |
| `geom_refinement_algorithm` | | e.g. `beam_center` |
| `rotationLatticeIndexed` | Å | whole-run rotation-indexing lattice (`[9]`) |
| `rotationLatticeIndexedExtra` | Å | additional whole-run lattices (`[m, 9]`) |
| `rotationLatticeNiggliClass` | | Niggli class of the run lattice |
| `imageIndexedMean` | | mean indexing rate over the run |
| `bkgEstimateMean` | photons | mean background over the run |
| `indexedLatticeCount` | | per-image lattice count summary (master). *Note: data files use `indexingLatticeCount`; readers accept either.* |

CrystFEL can read the spots directly with the following geometry-file settings:

```
peak_list = /entry/MX
peak_list_type = cxi
```

### 4.2 `/entry/reflections` — integrated reflections (NXreflections)

Integrated reflections are stored **per image** as
`/entry/reflections/image_NNNNNN` groups, each declared `NX_class = NXreflections`. Most columns map
onto the standard
[NXreflections](https://manual.nexusformat.org/classes/base_classes/NXreflections.html) base class:

| Dataset | Units | NXreflections | Meaning |
|---------|-------|:-------------:|---------|
| `h`, `k`, `l` | | standard | Miller indices |
| `d` | Å | standard | resolution |
| `int_sum` | photons | standard | integrated intensity (summation) |
| `int_err` | photons | non-standard name | σ of the intensity (standard equivalent: `int_sum_errors`) |
| `background_mean` | photons | standard | mean background under the peak |
| `predicted_x`, `predicted_y` | pixel | name standard, units differ | predicted position. NXreflections `predicted_x/_y` are *physical* lengths; the pixel datasets are `predicted_px_x/_y` |
| `observed_x`, `observed_y` | pixel | name standard, units differ | observed centroid (pixels; standard pixel form is `observed_px_x/_y`) |
| `observed_frame` | | standard | image number of the reflection |
| `lp` | | standard | Lorentz–polarization factor (stored as `1/rlp`) |
| `partiality` | | standard | recorded fraction of the reflection |
| `delta_phi` | deg | **extension** | XDS Δφ: offset from the centre of the current frame |
| `zeta` | | **extension** | Lorentz ζ factor (reciprocal-space geometry term) |
| `image_scale_corr` | | **extension** | per-image scale correction; `I_true = image_scale_corr · int_sum` |

In the VDS format the master file exposes these per-image groups through external links under
`/entry/reflections`; in the integrated format they are stored there directly.
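
A reader iterating the per-image reflection groups needs the group path and, usually, the scale-corrected intensity. Minimal sketch with hypothetical helper names: the six-digit zero padding follows the `image_NNNNNN` pattern; whether `NNNNNN` starts at 0 or 1 is not stated here, and scaling `int_err` by the same factor is an assumption (the table only gives the formula for the intensity).

```python
def reflection_group_path(image_nr: int) -> str:
    """HDF5 path of the per-image NXreflections group (six-digit zero padding)."""
    return f"/entry/reflections/image_{image_nr:06d}"


def corrected_intensity(int_sum: float, int_err: float, image_scale_corr: float):
    """Apply I_true = image_scale_corr * int_sum.

    Scaling the sigma by the same factor is an assumption of this sketch.
    """
    return image_scale_corr * int_sum, abs(image_scale_corr) * int_err


print(reflection_group_path(7))              # /entry/reflections/image_000007
print(corrected_intensity(100.0, 10.0, 1.5))  # (150.0, 15.0)
```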

### 4.3 `/entry/azint` — azimuthal integration

| Dataset | Shape | Units | Meaning |
|---------|-------|-------|---------|
| `bin_to_q` | `[φ_bins, q_bins]` | Å⁻¹ | q value of each bin |
| `bin_to_two_theta` | `[φ_bins, q_bins]` | deg | 2θ of each bin |
| `bin_to_phi` | `[φ_bins, q_bins]` | deg | azimuthal angle of each bin |
| `image` | `[n_images, φ_bins, q_bins]` | | per-image integrated profile (NaN for empty bins) |
| `image_std` | `[n_images, φ_bins, q_bins]` | | per-bin standard deviation |
| `image_count` | `[n_images, φ_bins, q_bins]` | | pixels contributing per bin |
| `map` | `[y, x]` | | pixel→bin mapping (master file only) |
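
For a 1-D, powder-style profile, the φ bins of one image can be merged per q bin. This is an assumption-laden illustration: it treats each `image` value as the mean over the bin's pixels and therefore weights by `image_count`, skipping empty (`NaN` or zero-count) bins. `collapse_phi` is hypothetical and takes one image's `[φ_bins][q_bins]` slices as nested lists.

```python
import math


def collapse_phi(profile, count):
    """Merge the phi bins of one image into a 1-D profile over q.

    profile, count: nested lists of shape [phi_bins][q_bins] taken from
    /entry/azint/image and /entry/azint/image_count for a single image.
    Bins with zero count or NaN are skipped; q bins with no data give NaN.
    """
    n_q = len(profile[0])
    merged = []
    for q in range(n_q):
        num, den = 0.0, 0
        for phi_row, count_row in zip(profile, count):
            value, weight = phi_row[q], count_row[q]
            if weight > 0 and not math.isnan(value):
                num += value * weight
                den += weight
        merged.append(num / den if den else math.nan)
    return merged
```

The matching q axis is one row of `bin_to_q` (all φ rows share the same q bins if the binning is separable, which is also an assumption here).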

### 4.4 `/entry/roi` — regions of interest (per-image results)

`/entry/roi/<roi_name>` has one sub-group per configured ROI, holding the **per-image result
vectors** `[n_images]`. These are written into the data files; in VDS mode they are exposed from
the master file through virtual datasets, and in integrated mode they are in the single file.
(In legacy mode they remain only in the data files.)

| Dataset | Meaning |
|---------|---------|
| `max` | maximum pixel value in the ROI |
| `sum` | sum of pixel values |
| `sum_sq` | sum of squared pixel values |
| `npixel` | number of valid pixels |
| `x`, `y` | intensity-weighted centroid |
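
The stored sums are enough to recover the mean and spread of the ROI pixel values without re-reading images: with *N* = `npixel`, the mean is `sum / N` and the population variance is `sum_sq / N − mean²`. A minimal sketch (`roi_mean_std` is a hypothetical helper):

```python
import math


def roi_mean_std(total: float, total_sq: float, npixel: int):
    """Mean and population standard deviation of ROI pixel values.

    total = ROI `sum`, total_sq = ROI `sum_sq`, npixel = ROI `npixel`.
    Returns (nan, nan) when the ROI has no valid pixels.
    """
    if npixel <= 0:
        return math.nan, math.nan
    mean = total / npixel
    variance = max(total_sq / npixel - mean * mean, 0.0)  # clamp tiny negative round-off
    return mean, math.sqrt(variance)


# Pixels 1, 2, 3, 4: sum = 10, sum_sq = 30, npixel = 4.
print(roi_mean_std(10, 30, 4))  # (2.5, 1.118033988749895)
```

Because `sum`, `sum_sq` and `npixel` are additive, values from several images can be summed before applying the same formula. The one-pass variance can lose precision when the mean is large compared with the spread.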

#### 4.4.1 `/entry/roi_defs` — ROI definitions (master file)

The **dataset-wide ROI definitions** (geometry, fixed for the whole acquisition) live in the
master file under a *separate* `/entry/roi_defs` group — kept apart from `/entry/roi` above so
that older readers, which iterate `/entry/roi`, are unaffected by these entries. One sub-group
`/entry/roi_defs/<roi_name>` per ROI:

| Dataset | Meaning |
|---------|---------|
| `bit_index` | which bit of `roi_map` (below) marks this ROI |
| `type` | `box`, `circle` or `azim` |
| `min_x_pxl`, `max_x_pxl`, `min_y_pxl`, `max_y_pxl` | box bounds (type `box`) |
| `center_x_pxl`, `center_y_pxl`, `radius_pxl` | circle (type `circle`) |
| `q_min_recipA`, `q_max_recipA` | Q range (type `azim`) |
| `phi_min_deg`, `phi_max_deg` | azimuthal-angle sector (type `azim`, omitted for a full ring) |

`/entry/roi_defs/roi_map` `[y, x]` is a `uint16` per-pixel bitmask: bit `bit_index` is set for
every pixel belonging to that ROI, so an ROI's footprint can be recovered exactly.
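
Recovering one ROI's footprint from `roi_map` is a bit test per pixel. In this sketch `roi_map` is a nested list standing in for the `[y, x]` dataset; `roi_mask` and `roi_pixel_count` are hypothetical helpers.

```python
def roi_mask(roi_map, bit_index: int):
    """Boolean [y][x] footprint of the ROI whose bit in the uint16 roi_map is `bit_index`."""
    if not 0 <= bit_index < 16:
        raise ValueError("roi_map is uint16, so bit_index must be in 0..15")
    bit = 1 << bit_index
    return [[(value & bit) != 0 for value in row] for row in roi_map]


def roi_pixel_count(roi_map, bit_index: int) -> int:
    """Number of pixels belonging to the ROI."""
    return sum(sum(row) for row in roi_mask(roi_map, bit_index))


roi_map = [[0b01, 0b11],
           [0b10, 0b00]]
print(roi_mask(roi_map, 0))  # [[True, True], [False, False]]
```

The footprint count can exceed a given image's `npixel`, which counts only valid pixels; this is an inference from the field descriptions rather than documented behaviour.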

### 4.5 `/entry/image` — per-image pixel statistics

`[n_images]` vectors: `max_value` and `min_value` (extremes over viable pixels, i.e. excluding
error and saturated pixels), `error_pixels`, `saturated_pixels` and `pixel_sum`. Surfaced in the
master file under `/entry/image`.

### 4.6 `/entry/profiling` — per-image timing

`[n_images]` vectors in seconds: `spotFindingTime`, `indexingTime`, `integrationTime`,
`refinementTime`, `processingTime`, `braggPredictionTime`, `preprocessingTime`, `compressionTime`,
`azIntTime`, `indexAnalysisTime`, `imageScaleTime`.

### 4.7 `/entry/detector` — acquisition diagnostics (data file)

A convenience NXcollection in the data file (note: distinct from the standard
`/entry/instrument/detector`). In **integrated** format these datasets are written under
`/entry/instrument/detector/detectorSpecific` instead.

| Dataset | Meaning |
|---------|---------|
| `timestamp`, `exptime` | per-image timestamp and exposure time |
| `number` | image number (original number if image rejection was used) |
| `det_info` | JUNGFRAU debug field |
| `storage_cell_image` | storage-cell number |
| `rcv_delay`, `rcv_free_send_buffers` | receiver internal diagnostics |
| `packets_expected`, `packets_received` | UDP packets per image |
| `data_collection_efficiency_image` | received / expected packet ratio |

### 4.8 `/entry/xfel` — pulsed-source metadata

`[n_images]` vectors `pulseID` and `eventCode`, written for pulsed sources (e.g. SwissFEL).

### 4.9 Other collections

| Path | Class | Content |
|------|-------|---------|
| `/entry/instrument/detector/detectorSpecific` | NXcollection | Dectris-style detector metadata + Jungfraujoch fields: `x_pixels_in_detector`, `y_pixels_in_detector`, `nimages`, `ntrigger`, `nimages_collected`, `nimages_written`, `data_collection_efficiency`, `max_receiver_delay`, `storage_cell_number`, `storage_cell_delay` [ns], `software_git_commit`, `software_git_date`, `jfjoch_release`, `jfjoch_writer_release`, `summation_mode`, `detect_ice_rings`, `gain_file_names`, `data_reduction_factor_serialmx`, `adu_histogram/`, `data_collection_efficiency_image` |
| `/entry/instrument/detector/calibration` | NXcollection | per-channel pedestal / calibration images (bitshuffle-compressed) |
| `/entry/instrument/fluorescence` | NXcollection | XRF spectrum: `energy` [eV], `data` |
| `/entry/user` | NXcollection | scalar values supplied under `header_appendix.hdf5` |

### 4.10 Non-standard fields inside the NXmx detector group

A few extension scalars are written *inside* the otherwise-standard `/entry/instrument/detector`
group for compatibility with existing tooling:

| Field | Units | Meaning |
|-------|-------|---------|
| `detector_distance` | m | duplicate of `distance` (Dectris/Neggia compatibility) |
| `detector_number` | | detector identifier (Dectris convention) |
| `error_value` | | masked/error pixel sentinel (NXmx standard would be `underload_value`) |
| `bit_depth_image` | | stored image bit depth (NXmx standard is `bit_depth_readout`) |
| `acquisition_type` | | always `triggered` (Dectris convention) |
| `jungfrau_conversion_applied` | | JUNGFRAU photon/keV conversion applied |
| `jungfrau_conversion_factor` | eV | conversion factor |
| `geometry_transformation_applied` | | module→full-detector geometry applied |

### 4.11 User-supplied metadata: `header_appendix` and `image_appendix`

Facilities frequently need to attach metadata that Jungfraujoch does not model explicitly. Two
free-form JSON fields in the `/start` request (`broker/jfjoch_api.yaml`) provide this without any
schema change; both accept *any valid JSON*:

| Field | Carried in | Persisted to HDF5? |
|-------|-----------|--------------------|
| `header_appendix` | the **start** message, under `user_data.user` (see [CBOR](CBOR.md)) | no, except the `hdf5` sub-object (below) |
| `image_appendix` | **every image** message, as `user_data` | no |

Both are forwarded verbatim through the ZeroMQ/CBOR stream to every downstream consumer (writer,
republished analysis, viewers), so they are the recommended channel for facility- or
beamline-specific provenance (proposal, operator, optics state, per-image trigger info, …) that has
no dedicated API field.

**Persisting selected values to HDF5.** `header_appendix` is normally *not* written to the master
file. As an exception, if it contains a key `hdf5` whose value is a JSON object of scalars (strings
and numbers; no arrays or nested objects), the writer stores each entry under `/entry/user/<key>`.

For example, a `/start` request containing:

```json
{
  "header_appendix": {
    "proposal": "p20001",
    "operator": "jdoe",
    "hdf5": { "beamline": "X06SA", "ring_mode": "top-up", "attenuator_foils": 2 }
  },
  "image_appendix": { "trigger_source": "external" }
}
```

forwards the whole `header_appendix` as `user_data.user` on the start message, forwards
`{"trigger_source": "external"}` as `user_data` on every image message, and writes three scalars
into the master file:

```
/entry/user/beamline         = "X06SA"
/entry/user/ring_mode        = "top-up"
/entry/user/attenuator_foils = 2
```
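
The `hdf5` rule above can be mirrored client-side to preview which `/entry/user` datasets a `/start` request would produce. This sketch is hypothetical (`user_datasets` is not a Jungfraujoch function): it keeps strings and numbers and skips arrays, nested objects, `null` and booleans; whether the real writer skips or rejects such values is not specified here.

```python
from __future__ import annotations


def user_datasets(header_appendix: object) -> dict[str, object]:
    """Preview the /entry/user datasets produced from header_appendix["hdf5"].

    Only string and number values are kept; arrays, nested objects, null and
    booleans are skipped in this sketch.
    """
    if not isinstance(header_appendix, dict):
        return {}
    hdf5 = header_appendix.get("hdf5")
    if not isinstance(hdf5, dict):
        return {}
    result: dict[str, object] = {}
    for key, value in hdf5.items():
        if isinstance(value, bool):  # bool is a subclass of int in Python
            continue
        if isinstance(value, (str, int, float)):
            result[f"/entry/user/{key}"] = value
    return result
```

Applied to the `header_appendix` of the request above, it returns the same three paths and values that the master file receives.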

## 5. Notes

* **Units** are written as the HDF5 `units` attribute on the dataset (e.g. `m`, `eV`, `deg`,
  `Angstrom`, `Angstrom^-1`, `Angstrom^2`, `pixel`, `s`).
* **Sentinels.** Missing per-image values are `NaN` (floats) or `-1`/`0` (integer indices); image
  pixels use `INTx_MIN` / `UINTx_MAX`.
* **Master vs data file.** In legacy/VDS formats the analysis arrays physically live in the data
  files; the master file links to them (external links in legacy, virtual datasets in VDS). In the
  integrated format there are no data files and everything is in one place.
* **CXI / CrystFEL.** `/entry/MX` follows the CXI peak-list convention; see
  [CXI file format](https://raw.githubusercontent.com/cxidb/CXI/master/cxi_file_format.pdf).