CPU-side crystallographic data analysis (Jungfraujoch)
This document describes the crystallographic algorithms implemented in Jungfraujoch for CPU- and GPU-side real‑time and near‑real‑time data analysis.
Scope. The pipeline covered here comprises:
- geometry mapping and corrections,
- azimuthal integration (powder/radial profiles),
- Bragg spot finding (strong pixels → connected components → spot descriptors),
- indexing (still and rotation modes),
- Bravais lattice / centering inference,
- geometry and lattice refinement,
- reflection prediction (still and rotation),
- Bragg integration by either 2D summation or reference-driven profile fitting,
- scaling and merging,
- merge-level error modelling and outlier rejection,
- auxiliary statistics (Wilson plot, ⟨I/σ(I)⟩, CC1/2, CCref).
References
The methods are inspired by, and reuse solutions implemented in:
- W. Kabsch, “XDS”, Acta Cryst. D66 (2010), 125–132 and related XDS papers (rotation geometry, partiality, scaling concepts).
- W. Kabsch, “Integration, scaling, space-group assignment and post-refinement”, Acta Cryst. D66 (2010), 133–144 (mosaicity/partiality likelihood treatment; notation such as ζ and rotation factors).
- T. A. White et al., CrystFEL method papers (spot finding, three‑ring integration, serial/still diffraction processing concepts).
- J. Kieffer & J. P. Wright, "PyFAI: a Python library for high performance azimuthal integration on GPU", Powder Diffraction 28 (2013), S339-S350 (detector geometry definition, azimuthal integration)
- H. Powell, "The Rossmann Fourier autoindexing algorithm in MOSFLM", Acta Cryst. D55 (1999), 1690-1695 (FFT indexing).
(The list is not exhaustive.)
1. Geometry, reciprocal-space mapping, and basic quantities
1.1 Coordinate conventions
For a pixel coordinate (x,y) (in pixels), Jungfraujoch converts to a laboratory direction vector via:
- shift by direct-beam position (x_\mathrm{beam}, y_\mathrm{beam}),
- scale by pixel size p (mm),
- set detector distance D (mm),
- apply detector orientation rotation R_\mathrm{det} (PyFAI-like parameterization).
The unnormalized detector coordinate (mm) is: $ \mathbf{r}_\mathrm{det}(x,y) = \begin{pmatrix} (x-x_\mathrm{beam})\,p \\ (y-y_\mathrm{beam})\,p \\ D \end{pmatrix}. $
The lab-frame vector is: $ \mathbf{r}_\mathrm{lab} = R_\mathrm{det}\,\mathbf{r}_\mathrm{det}. $
Let the incident wavevector magnitude be k = 1/\lambda in Å^{-1}, and define:
$
\mathbf{S}_0 = (0,0,k).
$
The reciprocal-space scattering vector associated with pixel (x,y) is:
$
\mathbf{s}(x,y) = k\,\frac{\mathbf{r}_\mathrm{lab}}{\lVert \mathbf{r}_\mathrm{lab}\rVert} - \mathbf{S}_0.
$
This \mathbf{s} is the fundamental quantity used for spot finding (resolution filters), indexing, and refinement.
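The mapping above can be sketched in a few lines of Python. This is an illustration only, not the Jungfraujoch implementation: the function name `pixel_to_s`, its argument names, and the choice of an identity rotation when `r_det` is omitted are assumptions made for this example.

```python
import math

def pixel_to_s(x, y, x_beam, y_beam, pixel_mm, distance_mm, wavelength_A, r_det=None):
    """Map a pixel (x, y) to the scattering vector s in 1/Angstrom."""
    # Detector-frame coordinate in mm: shift by the beam position, scale by pixel size.
    r = [(x - x_beam) * pixel_mm, (y - y_beam) * pixel_mm, distance_mm]
    # Optional 3x3 detector rotation as nested lists (identity if omitted; an assumption).
    if r_det is not None:
        r = [sum(r_det[i][j] * r[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in r))
    k = 1.0 / wavelength_A
    # s = k * r_lab / |r_lab| - S0, with S0 = (0, 0, k).
    return (k * r[0] / norm, k * r[1] / norm, k * r[2] / norm - k)

# Hypothetical geometry: 75 um pixels, 100 mm distance, 1 Angstrom wavelength.
s_center = pixel_to_s(1000, 1000, 1000, 1000, 0.075, 100.0, 1.0)
s_off = pixel_to_s(2000, 1000, 1000, 1000, 0.075, 100.0, 1.0)
d_off = 1.0 / math.sqrt(sum(c * c for c in s_off))  # resolution in Angstrom
```

The direct-beam pixel maps to the zero vector, and for any other pixel 1/|s| reproduces the resolution d defined in §1.2.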
1.2 Two-theta, azimuth, resolution and q
The scattering angle 2\theta is computed from \mathbf{r}_\mathrm{lab} via:
$
2\theta = \arctan\!\left(\frac{\sqrt{x_\mathrm{lab}^2 + y_\mathrm{lab}^2}}{z_\mathrm{lab}}\right).
$
Resolution (Å) at a pixel is: $ d = \frac{\lambda}{2\sin(\theta)} = \frac{\lambda}{2\sin(2\theta/2)}. $
The magnitude q = 2\pi/d is used for radial binning and ice-ring handling.
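As a worked example of these three quantities, the following sketch computes 2θ, d and q from a lab-frame vector. The function name `two_theta_d_q` is hypothetical, and `atan2` is used in place of the plain arctangent as a defensive choice of this example (the two agree for z > 0).

```python
import math

def two_theta_d_q(r_lab, wavelength_A):
    """Return (2theta in rad, d in Angstrom, q in 1/Angstrom) for a lab vector."""
    x, y, z = r_lab
    # atan2 equals the arctan form for z > 0 and stays defined for z <= 0.
    two_theta = math.atan2(math.hypot(x, y), z)
    if two_theta == 0.0:
        return 0.0, math.inf, 0.0  # direct beam: infinite d, zero q
    d = wavelength_A / (2.0 * math.sin(two_theta / 2.0))
    return two_theta, d, 2.0 * math.pi / d

# 75 mm off-axis at 100 mm distance, 1 Angstrom wavelength (illustrative numbers).
tt, d, q = two_theta_d_q((75.0, 0.0, 100.0), 1.0)
```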
1.3 Distance from the Ewald sphere
For a reciprocal lattice point \mathbf{p} (Å^{-1}), define:
$
\Delta_\mathrm{Ewald}(\mathbf{p}) = \lVert \mathbf{p} + \mathbf{S}_0\rVert - k.
$
Jungfraujoch uses |\Delta_\mathrm{Ewald}| as an operational proxy for excitation error. This appears in:
- still prediction (accept if |\Delta_\mathrm{Ewald}|\le \Delta_\mathrm{cut}),
- profile radius estimation (see §11.1),
- still partiality option in scaling/merging (§10.2).
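A minimal sketch of the Ewald-distance definition and the still-prediction acceptance test follows. The function names and the example cutoff value are assumptions for illustration; only the formula itself comes from the text.

```python
import math

def ewald_distance(p, wavelength_A):
    """Delta_Ewald = |p + S0| - k, with S0 = (0, 0, k) and k = 1/lambda."""
    k = 1.0 / wavelength_A
    px, py, pz = p
    return math.sqrt(px * px + py * py + (pz + k) ** 2) - k

def accept_still(p, wavelength_A, delta_cut):
    """Still-prediction test: accept if |Delta_Ewald| <= delta_cut."""
    return abs(ewald_distance(p, wavelength_A)) <= delta_cut

on_sphere = ewald_distance((0.6, 0.0, -0.2), 1.0)   # lies on the Ewald sphere
off_sphere = ewald_distance((0.0, 0.0, 0.1), 1.0)   # 0.1 1/Angstrom outside
```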
2. Azimuthal integration (radial profiles)
Azimuthal integration produces a 1D radial profile I(q) or I(d) by histogramming pixels into radial bins. Pixels are not split across bins; each pixel contributes wholly to a single bin.
2.1 Histogram estimator
Let bin index b(x,y)\in\{0,\dots,B-1\} be precomputed from q(x,y) (or equivalently from d(x,y)). For each bin b:
- accumulate corrected intensity: $ S_b = \sum_{(x,y):\,b(x,y)=b} I(x,y)\,C(x,y), $
- and count: $ N_b = \#\{(x,y):\,b(x,y)=b \text{ and pixel is valid}\}. $
A simple mean profile is then \bar{I}_b = S_b / N_b (when N_b>0). Invalid pixels (masked, saturated, detector error codes) are excluded.
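The histogram estimator can be sketched as below. The flat per-pixel lists and the use of `None` for empty bins are choices of this example, not a description of the actual data layout.

```python
def radial_profile(intensity, correction, bin_index, valid, n_bins):
    """Mean corrected intensity per radial bin; None where a bin has no valid pixel."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, c, b, ok in zip(intensity, correction, bin_index, valid):
        # Each valid pixel contributes wholly to exactly one bin (no pixel splitting).
        if ok and 0 <= b < n_bins:
            sums[b] += i * c
            counts[b] += 1
    return [s / n if n > 0 else None for s, n in zip(sums, counts)]

profile = radial_profile(
    intensity=[10.0, 20.0, 30.0, 40.0],
    correction=[1.0, 1.0, 1.0, 1.0],
    bin_index=[0, 0, 1, 1],
    valid=[True, True, True, False],  # last pixel is masked
    n_bins=3,
)
```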
2.2 Corrections applied
Two standard corrections are available:
(i) Solid angle / geometric correction. A commonly used approximation for flat detectors gives a \cos^3(2\theta) factor:
$
C_\Omega(2\theta) = \cos^3(2\theta).
$
(ii) Polarization correction. With polarization coefficient P (beamline dependent) and azimuth \phi:
$
C_\mathrm{pol}(2\theta,\phi) =
\frac{1}{2}\left(1+\cos^2(2\theta) - P\cos(2\phi)\left(1-\cos^2(2\theta)\right)\right),
$
applied as a divisor to intensities (i.e. scale by 1/C_\mathrm{pol}) when enabled.
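For reference, the two factors can be evaluated as in this short sketch. The function names are hypothetical; the code only computes the factors and leaves their application to the intensities as described in the text.

```python
import math

def solid_angle_factor(two_theta):
    """cos^3(2theta) approximation for a flat detector."""
    return math.cos(two_theta) ** 3

def polarization_factor(two_theta, phi, p_coeff):
    """C_pol(2theta, phi) for polarization coefficient p_coeff and azimuth phi."""
    c2 = math.cos(two_theta) ** 2
    return 0.5 * (1.0 + c2 - p_coeff * math.cos(2.0 * phi) * (1.0 - c2))

# Polarization is applied as a divisor: corrected = raw / C_pol.
raw = 100.0
corrected = raw / polarization_factor(math.radians(30.0), math.pi / 2.0, 0.99)
```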
2.3 Background estimate for profiles
A background estimate is derived from the integrated profile using the azimuthal integration settings (details depend on the configured estimator). This background is used for monitoring and diagnostics; it is not the same as the local Bragg-spot background used in summation integration (§9.2).
3. Spot finding (strong pixels → Bragg spots)
Spot finding is a two-stage process:
- Strong-pixel selection using intensity and/or local signal-to-noise criteria.
- Connected-component labeling (CCL) to group strong pixels into candidate spots, followed by spot-level filtering and feature extraction.
3.1 Strong-pixel detection by local statistics
For each pixel i with value v_i, consider a square window (nominally 31\times 31 pixels) around it. Let the window contain n valid pixels (excluding masked/bad/saturated), and define:
$
\Sigma = \sum v,\qquad \Sigma_2 = \sum v^2.
$
To avoid biasing the local statistics by the test pixel itself, Jungfraujoch evaluates the pixel against the window with the pixel removed: $ \Sigma' = \Sigma - v_i,\quad \Sigma_2' = \Sigma_2 - v_i^2,\quad n' = n-1. $
A variance-like quantity proportional to n'^2 is formed:
$
V = n'\Sigma_2' - (\Sigma')^2,
$
and the deviation-from-mean quantity:
$
\Delta = v_i n' - \Sigma'.
$
A pixel is considered strong if:
- it is above a photon/count threshold, and
- \Delta>0, and
- the squared deviation exceeds a scaled variance:
$
\Delta^2 > V\cdot T^2,
$
where T is the configured signal-to-noise threshold.
This is equivalent to a local z-score criterion but implemented in integer arithmetic to be robust and fast.
Special cases:
- saturated pixels can be forced to “strong” (useful for detecting overloaded Bragg spots),
- invalid pixels are never strong.
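The strong-pixel test can be written out for a single pixel as follows. This is a sketch under assumptions: the window is passed as a plain list of valid values (the real code presumably uses running sums), the count threshold is treated as a strict inequality, and the name `is_strong` is hypothetical. With integer pixel values and an integer T, every operation below stays in integer arithmetic.

```python
def is_strong(v_i, window_values, snr_threshold, count_threshold):
    """window_values: valid pixel values in the window, including v_i itself."""
    n1 = len(window_values) - 1                       # n' = n - 1
    sum1 = sum(window_values) - v_i                   # Sigma'
    sum2 = sum(v * v for v in window_values) - v_i * v_i  # Sigma_2'
    if n1 < 1 or v_i <= count_threshold:
        return False
    var_scaled = n1 * sum2 - sum1 * sum1              # V = n'^2 * variance
    delta = v_i * n1 - sum1                           # Delta = n' * (v_i - mean)
    # Delta^2 > V * T^2 is the z-score test (v_i - mean) > T * std, without division.
    return delta > 0 and delta * delta > var_scaled * snr_threshold * snr_threshold

bright = is_strong(50, [0, 1, 2, 1, 0, 1, 2, 1, 50], snr_threshold=5, count_threshold=0)
faint = is_strong(2, [0, 1, 2, 1, 0, 1, 1, 1, 2], snr_threshold=5, count_threshold=0)
```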
3.2 Resolution and ice-ring handling
Spot finding can be restricted to a resolution range [d_\mathrm{high}, d_\mathrm{low}] by masking pixels outside the range. Optionally, pixels in identified ice-ring regions can be tagged so that subsequent indexing/refinement may include or exclude them (see §4 and §7).
A further optional safeguard removes isolated high-resolution “spur” spots by detecting large gaps in 1/d (or q) space and discarding spots beyond the gap. This is intended for macromolecular diffraction where edge-of-detector backgrounds can be extremely low.
3.3 Connected-component labeling (CCL)
Strong pixels are grouped into connected components (adjacent strong pixels) using a CCL algorithm. Each component yields a candidate spot with:
- centroid (x,y) (often intensity-weighted),
- pixel count (spot size),
- integrated spot intensity proxy (sum of pixel values),
- resolution d at the centroid (or mean over pixels),
- and quality flags (e.g. ice-ring classification).
Spot-level filters include minimum/maximum pixel count and resolution limits.
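The grouping step can be illustrated with a simple breadth-first flood fill. This is a sketch, not the CCL algorithm used in Jungfraujoch: 4-connectivity, the dictionary layout of a spot, and the default size limits are all assumptions made here.

```python
from collections import deque

def find_spots(image, strong, min_pixels=1, max_pixels=100):
    """Group 4-connected strong pixels into spots with intensity-weighted centroids."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    spots = []
    for y0 in range(h):
        for x0 in range(w):
            if not strong[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and strong[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            total = sum(image[y][x] for x, y in pixels)
            # Spot-level filter on pixel count; centroid needs a positive sum.
            if min_pixels <= len(pixels) <= max_pixels and total > 0:
                cx = sum(x * image[y][x] for x, y in pixels) / total
                cy = sum(y * image[y][x] for x, y in pixels) / total
                spots.append({"x": cx, "y": cy, "size": len(pixels), "intensity": total})
    return spots

image = [[0, 0, 0, 0], [0, 10, 30, 0], [0, 0, 0, 0], [0, 0, 0, 20]]
strong = [[v > 5 for v in row] for row in image]
spots = find_spots(image, strong)
```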
4. Indexing overview
Indexing maps observed reciprocal-space vectors \mathbf{s}_i to a lattice such that:
$
\mathbf{s}_i \approx h_i\mathbf{a}^* + k_i\mathbf{b}^* + l_i\mathbf{c}^*,
$
with integer (h_i,k_i,l_i).
Jungfraujoch supports two complementary indexing strategies:
- FFT-based indexing (Rossmann-type): does not require an a priori unit cell; suitable for unknown samples.
- Fast-feedback indexing (TORO-like): requires an approximate unit cell; optimized for speed and feedback.
Both feed into a common robust refinement/selection stage which maximizes the number of inliers under an indexing tolerance.
4.1 Indexed-spot decision (inlier test)
Given a trial lattice with direct basis vectors \mathbf{a},\mathbf{b},\mathbf{c} (used here as reciprocal-space dot-test vectors), fractional indices are estimated by:
$
h_f = \mathbf{s}\cdot\mathbf{a},\quad
k_f = \mathbf{s}\cdot\mathbf{b},\quad
l_f = \mathbf{s}\cdot\mathbf{c}.
$
Let (h,k,l)=(\mathrm{round}(h_f),\mathrm{round}(k_f),\mathrm{round}(l_f)) and define the fractional residual:
$
\delta^2 = (h_f-h)^2 + (k_f-k)^2 + (l_f-l)^2.
$
A spot is indexed if \delta^2 \le \tau^2, where \tau is the configured tolerance.
For indexed spots, the reciprocal lattice point \mathbf{p} = h\mathbf{a}^*+k\mathbf{b}^*+l\mathbf{c}^* is used to compute \Delta_\mathrm{Ewald}(\mathbf{p}) (stored as a diagnostic and later used in profile-radius estimation).
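The inlier test of this section reduces to three dot products and a rounding step, as in this sketch. The function name `index_spot` and the example cell are hypothetical.

```python
def index_spot(s, a, b, c, tol):
    """Return ((h, k, l), indexed) for scattering vector s and direct basis a, b, c."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    frac = (dot(s, a), dot(s, b), dot(s, c))          # fractional indices h_f, k_f, l_f
    hkl = tuple(round(f) for f in frac)
    delta2 = sum((f - n) ** 2 for f, n in zip(frac, hkl))
    return hkl, delta2 <= tol * tol

# Hypothetical 10 Angstrom cubic cell aligned with the lab axes.
a, b, c = (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)
hkl_good, ok_good = index_spot((0.101, 0.2, -0.3), a, b, c, tol=0.1)
_, ok_bad = index_spot((0.15, 0.0, 0.0), a, b, c, tol=0.1)
```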
5. FFT indexing (unknown unit cell)
FFT indexing follows a classical approach: detect dominant periodicities by projecting reciprocal-space points onto many directions and Fourier transforming the resulting 1D histograms.
5.1 Directional projections and histograms
Choose a set of unit vectors \{\mathbf{u}_d\} on a half-sphere (a near-uniform distribution generated via a golden-angle construction). For each direction d, form a histogram in the scalar projection:
$
t_{id} = \left|\mathbf{u}_d\cdot \mathbf{s}_i\right|.
$
Bin width is chosen approximately as:
$
\Delta t \approx \frac{1}{2 L_\mathrm{max}},
$
where L_\mathrm{max} is the maximum expected real-space unit-cell edge (Å). The histogram extent is tied to the maximum q used (set by a high-resolution cutoff for indexing).
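The direction sampling and the per-direction histogram can be sketched as below. The exact golden-angle construction and the function names are assumptions of this example, and the FFT itself is left out because it needs a numerical library.

```python
import math

def half_sphere_directions(n):
    """Near-uniform unit vectors on the z > 0 half-sphere (golden-angle spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = (i + 0.5) / n                 # z in (0, 1): upper half-sphere only
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return dirs

def projection_histogram(spots, u, l_max, t_max):
    """Histogram of |u . s| with bin width 1 / (2 * l_max), up to t_max."""
    dt = 1.0 / (2.0 * l_max)
    hist = [0] * int(math.ceil(t_max / dt))
    for s in spots:
        b = int(abs(sum(si * ui for si, ui in zip(s, u))) / dt)
        if b < len(hist):
            hist[b] += 1
    return hist

dirs = half_sphere_directions(100)
hist = projection_histogram(
    [(0.0, 0.0, 0.0525), (0.0, 0.0, 0.1025), (0.0, 0.0, -0.1025)],
    (0.0, 0.0, 1.0), l_max=100.0, t_max=0.2,
)
```

Taking the absolute value of the projection folds a spot and its Friedel mate into the same bin, which is why a half-sphere of directions is sufficient.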
5.2 FFT peak picking and candidate vectors
For each direction, the FFT magnitude spectrum is computed; peaks correspond to periodicities along \mathbf{u}_d. Each direction yields a candidate real-space length L with maximum spectral magnitude (subject to L\ge L_\mathrm{min}).
Candidate vectors are \mathbf{v}_d = L_d\,\mathbf{u}_d.
A collinearity filter removes nearly parallel vectors (e.g. within 5°) and attempts to resolve harmonic ambiguity: shorter “fundamental” vectors may be preferred over longer harmonics if their peak magnitude is sufficiently strong relative to the dominant peak.
5.3 Lattice reduction and cell candidates
Triples of candidate vectors are combined to form candidate bases (\mathbf{A},\mathbf{B},\mathbf{C}). A simple reduction is applied:
$
\mathbf{B} \leftarrow \mathbf{B} - \mathrm{round}\!\left(\frac{\mathbf{B}\cdot\mathbf{A}}{\mathbf{A}\cdot\mathbf{A}}\right)\mathbf{A},
$
$
\mathbf{C} \leftarrow \mathbf{C} - \mathrm{round}\!\left(\frac{\mathbf{C}\cdot\mathbf{A}}{\mathbf{A}\cdot\mathbf{A}}\right)\mathbf{A}
- \mathrm{round}\!\left(\frac{\mathbf{C}\cdot\mathbf{B}}{\mathbf{B}\cdot\mathbf{B}}\right)\mathbf{B}.
$
Candidates are filtered by allowed length and angle ranges.
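The reduction step can be written directly from the two update formulas. In this sketch both coefficients for C are computed from the unmodified C and the already reduced B, which is one reading of the formula as written; the function name is hypothetical.

```python
def reduce_basis(A, B, C):
    """One pass of the simple size reduction applied to a candidate basis."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    n_ba = round(dot(B, A) / dot(A, A))
    B = tuple(b - n_ba * a for b, a in zip(B, A))
    # Both C coefficients use the original C and the reduced B.
    n_ca = round(dot(C, A) / dot(A, A))
    n_cb = round(dot(C, B) / dot(B, B))
    C = tuple(c - n_ca * a - n_cb * b for c, a, b in zip(C, A, B))
    return A, B, C

A, B, C = reduce_basis((10, 0, 0), (21, 10, 0), (10, 10, 10))
```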
5.4 Robust refinement and best-cell selection
Candidate bases are refined against observed spots using an iterative inlier‑focused least‑squares procedure (trimmed/contracting threshold). The output cell is chosen to:
- maximize the number of indexed spots under the tolerance \tau, and
- break ties by a refined score (smaller residual threshold/score is preferred).
An optional reference unit cell (if supplied) restricts acceptance to cells within a relative distance tolerance in edge lengths (permutation-invariant).
6. Bravais lattice / centering inference (“lattice search”)
If the space group is supplied by the user, its lattice constraints are assumed for refinement and subsequent processing.
If not, Jungfraujoch attempts to infer the most plausible Bravais lattice type from the metric tensor after Niggli reduction:
- Niggli reduction is performed to obtain a reduced cell in G^6 representation (Gruber vector).
- The reduced cell is compared against a list of Niggli classes corresponding to Bravais lattices and centerings.
- The highest-symmetry class that matches within tolerances is selected (relative metric tolerance and angular tolerance).
The output includes:
- a conventional cell,
- crystal system (triclinic, monoclinic, …),
- centering symbol P, A, B, C, I, F, R.
This stage provides centering information used for systematic absences in prediction (§8.4) and for reporting.
Note. In ambiguous or special cases, forcing space group to P1 (no symmetry assumptions) is recommended.
7. Geometry and lattice refinement
Refinement adjusts experimental geometry and crystal parameters to minimize discrepancies between observed spot reciprocal vectors and those predicted by a lattice model with integer indices.
7.1 Parameterization
The refinement jointly optimizes, depending on mode and constraints:
- beam center (x_\mathrm{beam}, y_\mathrm{beam}),
- detector distance D,
- detector tilt angles (two-angle model; third rotation often held at 0),
- rotation axis direction (for rotation datasets),
- crystal orientation (a global rotation),
- unit-cell parameters, with constraints determined by inferred crystal system.
For higher symmetries, constraints are enforced, e.g.
- cubic: a=b=c,\ \alpha=\beta=\gamma=90^\circ,
- tetragonal: a=b,
- hexagonal: a=b,\ \gamma=120^\circ,
- monoclinic (unique axis b): \alpha=\gamma=90^\circ, \beta refined.
7.2 Residuals and objective
For each indexed spot assigned integer (h,k,l), compute:
- observed reciprocal vector \mathbf{s}_\mathrm{obs} from its detector position and current geometry,
- predicted reciprocal vector \mathbf{s}_\mathrm{pred}(h,k,l;\ \text{lattice params}).
Residual is: $ \mathbf{r} = \mathbf{s}_\mathrm{obs} - \mathbf{s}_\mathrm{pred}. $
A non-linear least squares solver minimizes \sum \|\mathbf{r}\|^2 over all selected inlier spots.
7.3 Rotation datasets: bringing observations to a common reference frame
For oscillation/rotation data, each image corresponds to a rotation angle \phi about an axis \mathbf{m}_2. Observed reciprocal vectors are rotated “back to start” so that all images are refined in a single reference crystal frame:
$
\mathbf{s}_\mathrm{obs,ref} = R(\phi)\,\mathbf{s}_\mathrm{obs},
$
with R(\phi) constructed from the axis-angle representation of the goniometer model.
7.4 Multi-stage tightening of inlier tolerance
Refinement is performed in stages with decreasing acceptance tolerance for including reflections (e.g. from coarse to fine), which stabilizes convergence when starting from imperfect indexing and approximate geometry.
8. Reflection prediction
Jungfraujoch predicts reflection positions for integration by enumerating Miller indices within a resolution cutoff and accepting those that satisfy a diffraction condition model.
8.1 Enumerating reciprocal lattice points
For a maximum resolution d_\mathrm{min}, accept (h,k,l) such that:
$
\lVert \mathbf{p}(h,k,l)\rVert^2 = \lVert h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*\rVert^2 \le \left(\frac{1}{d_\mathrm{min}}\right)^2.
$
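A brute-force enumeration consistent with this condition is sketched below. The index bounds use |h| ≤ |a|/d_min (and likewise for k, l), which follows from h = p·a; passing the direct-cell edge lengths separately and skipping (0,0,0) are choices of this example.

```python
import math
from itertools import product

def enumerate_hkl(a_star, b_star, c_star, a_len, b_len, c_len, d_min):
    """All (h, k, l) != (0, 0, 0) with |h a* + k b* + l c*| <= 1 / d_min."""
    lim2 = (1.0 / d_min) ** 2
    hmax, kmax, lmax = (int(math.floor(x / d_min)) for x in (a_len, b_len, c_len))
    out = []
    for h, k, l in product(range(-hmax, hmax + 1),
                           range(-kmax, kmax + 1),
                           range(-lmax, lmax + 1)):
        if (h, k, l) == (0, 0, 0):
            continue
        p = [h * a + k * b + l * c for a, b, c in zip(a_star, b_star, c_star)]
        if sum(x * x for x in p) <= lim2:
            out.append((h, k, l))
    return out

# Hypothetical 10 Angstrom cubic cell, 4.5 Angstrom cutoff.
hkls = enumerate_hkl((0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1),
                     10.0, 10.0, 10.0, d_min=4.5)
```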
8.2 Still prediction (excitation-error cutoff)
For still images, the diffracting condition is approximated by an excitation-error cutoff:
$
\left|\Delta_\mathrm{Ewald}(\mathbf{p})\right| \le \Delta_\mathrm{cut}.
$
Accepted reflections are projected to the detector by intersecting the diffracted direction \mathbf{S}=\mathbf{S}_0+\mathbf{p} with the detector plane, using the current geometry.
8.3 Rotation prediction (Laue equation + partiality model)
For rotation/oscillation datasets, Jungfraujoch solves for rotation angles \phi where the rotated reciprocal lattice point satisfies the Ewald-sphere condition. In an XDS-like notation, define:
- rotation axis unit vector \mathbf{m}_2,
- \mathbf{S}_0 incident vector,
- \mathbf{S}(\phi)=\mathbf{S}_0+\mathbf{p}(\phi).
A key quantity is: $ \zeta = \left|\mathbf{m}_2\cdot \mathbf{e}_1\right|,\quad \mathbf{e}_1 = \frac{\mathbf{S}\times \mathbf{S}_0}{\lVert \mathbf{S}\times \mathbf{S}_0\rVert}, $ which also appears in XDS as the Lorentz component linked to the rotation axis.
A Gaussian mosaicity model yields a partiality fraction over an oscillation width \Delta\phi:
$
P(\phi;\sigma_M,\zeta,\Delta\phi) = \frac{1}{2}\left[\mathrm{erf}\!\left(\frac{\phi+\Delta\phi/2}{\sqrt{2}\,\sigma_M/\zeta}\right) - \mathrm{erf}\!\left(\frac{\phi-\Delta\phi/2}{\sqrt{2}\,\sigma_M/\zeta}\right)\right],
$
with mosaicity \sigma_M in radians.
Reflections are predicted if they meet minimum \zeta and mosaicity-window criteria, and their predicted detector coordinates fall on the active detector area.
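The partiality expression can be evaluated with the standard-library error function. This sketch treats \phi as the offset of the frame centre from the reflection's Bragg angle, in radians; the function name and the sample numbers are illustrative only.

```python
import math

def rotation_partiality(phi, sigma_m, zeta, dphi):
    """Fraction of a Gaussian rocking curve recorded on a frame of width dphi."""
    width = math.sqrt(2.0) * sigma_m / zeta
    return 0.5 * (math.erf((phi + dphi / 2.0) / width)
                  - math.erf((phi - dphi / 2.0) / width))

# A frame spanning exactly +/- one sigma around the Bragg angle records about 68%.
one_sigma = rotation_partiality(0.0, sigma_m=0.001, zeta=1.0, dphi=0.002)
# A very wide frame records essentially the full reflection.
full = rotation_partiality(0.0, sigma_m=0.001, zeta=1.0, dphi=0.1)
```

A smaller \zeta widens the effective rocking curve (\sigma_M/\zeta), so the same frame captures a smaller fraction of the reflection.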
8.4 Systematic absences (centering)
Systematic absences are applied at least at the centering level (prior to full space-group symmetry). By centering symbol:
- I: absent if h+k+l odd,
- A: absent if k+l odd,
- B: absent if h+l odd,
- C: absent if h+k odd,
- F: absent if any of h+k, h+l, k+l is odd,
- R: absent if (-h+k+l)\bmod 3 \ne 0,
- P: no centering absences.
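The centering rules above translate directly into a lookup, sketched here with a hypothetical function name. Python's `%` returns a non-negative result for negative operands, so negative indices need no special handling.

```python
def centering_absent(h, k, l, centering):
    """True if (h, k, l) is systematically absent for the given centering symbol."""
    if centering == "P":
        return False
    if centering == "I":
        return (h + k + l) % 2 != 0
    if centering == "A":
        return (k + l) % 2 != 0
    if centering == "B":
        return (h + l) % 2 != 0
    if centering == "C":
        return (h + k) % 2 != 0
    if centering == "F":
        return (h + k) % 2 != 0 or (h + l) % 2 != 0 or (k + l) % 2 != 0
    if centering == "R":
        return (-h + k + l) % 3 != 0
    raise ValueError(f"unknown centering symbol: {centering!r}")
```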
9. 2D Bragg integration (profile fitting over a three-ring ROI)
Jungfraujoch integrates each predicted reflection in the detector plane over a CrystFEL-inspired “three-ring” region of interest (§9.1). The default extraction is profile fitting (Kabsch; §9.3), which weights each pixel by a fitted spot profile and so recovers weak reflections far better than plain summation; plain box summation (§9.2) is retained as the seed for the profile and as a fallback. Both methods share the same ROI and background model, and emit the same per-reflection (I,\sigma,\text{partiality},d), so scaling, the rotation combine (§10.6) and merging consume either unchanged.
9.1 Regions of interest
For each predicted reflection at (x_p,y_p), define three radii:
- r_1: inner signal radius,
- r_2: inner background radius,
- r_3: outer background radius.
Pixels are classified by their squared distance r^2=(x-x_p)^2+(y-y_p)^2:
- signal region: r^2 < r_1^2,
- background annulus: r_2^2 \le r^2 < r_3^2.
Invalid pixels (masked/bad/saturated) are excluded from both sums.
9.2 Box summation (seed and fallback)
Let:
- S = \sum I(x,y) over signal pixels,
- n_S = number of valid signal pixels,
- B = \sum I(x,y) over background pixels,
- n_B = number of valid background pixels.
Background per pixel and integrated intensity:
$
\hat{b} = \frac{B}{n_B},\qquad
\hat{I} = S - n_S \hat{b},
$
with a Poisson-like uncertainty \sigma(\hat{I})\approx\sqrt{S} (floored at 1). A reflection is accepted as “observed” only if all signal pixels were valid and n_B exceeds a minimum. This box sum is the classical estimator; it is used directly with --integrator boxsum, and otherwise seeds the profile fit below.
The background mean is computed with a single high-outlier reject (drop ring pixels above \hat{b}+3\sqrt{\hat{b}}, recompute): a bandwidth-streaked high-resolution spot or a close neighbour can leak into the ring and bias the mean high, over-subtracting and driving weak high-resolution intensities negative. A clean Poisson background is essentially unchanged by the cut.
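The box-sum estimator, including the single high-outlier reject on the ring, can be sketched as follows. The pixel-tuple layout, the default `min_bkg` value and the function name are assumptions of this example.

```python
import math

def box_sum(pixels, xp, yp, r1, r2, r3, min_bkg=5):
    """pixels: iterable of (x, y, value, valid). Returns (I, sigma, b) or None."""
    signal, ring, all_valid = [], [], True
    for x, y, v, ok in pixels:
        rr = (x - xp) ** 2 + (y - yp) ** 2
        if rr < r1 * r1:
            if ok:
                signal.append(v)
            else:
                all_valid = False          # any invalid signal pixel rejects the spot
        elif r2 * r2 <= rr < r3 * r3 and ok:
            ring.append(v)
    if not all_valid or len(ring) < min_bkg:
        return None
    b = sum(ring) / len(ring)
    # Single high-outlier reject: drop ring pixels above b + 3*sqrt(b), recompute.
    kept = [v for v in ring if v <= b + 3.0 * math.sqrt(max(b, 0.0))]
    if kept:
        b = sum(kept) / len(kept)
    s = sum(signal)
    return s - len(signal) * b, max(math.sqrt(max(s, 0.0)), 1.0), b

def demo_value(x, y):
    if (x, y) == (0, 0):
        return 104.0    # 100 counts of signal on a background of 4
    if (x, y) == (4, 0):
        return 1000.0   # neighbour leaking into the background ring
    return 4.0

pixels = [(x, y, demo_value(x, y), True) for x in range(-5, 6) for y in range(-5, 6)]
intensity, sigma, background = box_sum(pixels, 0, 0, r1=2, r2=3, r3=5)
```

In the demo the contaminated ring pixel would raise the raw background mean to about 26.6 counts; after the reject it returns to 4 and the 100-count signal is recovered.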
9.3 Profile-fitted extraction (default)
A fixed signal disk captures a width-dependent fraction of each spot, which puts a multiplicative floor on the per-observation precision of strong reflections and weights weak reflections poorly. Profile fitting removes this by extracting each intensity against a fitted spot shape, without needing reference intensities. Per frame:
- Seed. Box-sum every reflection (§9.2) to get a rough intensity and observed centroid, and select strong spots (significance \ge 5).
- Build the profile. From the strong spots, form a profile per resolution shell: an isotropic Gaussian of the measured second moment (the default), or an empirical averaged grid. The width is shell-dependent because spot size grows with resolution; the intrinsic spot is essentially round in the detector plane (per-detector-region and crystal-anisotropy profiles were evaluated and add nothing, because the real crystal anisotropy lives in the discarded rocking direction). When a finite energy bandwidth is set, however, it smears each spot radially by \sigma_\mathrm{bw}=\text{bandwidth}\cdot R_\mathrm{px} (distance from the beam centre, large at high resolution), turning high-resolution spots into radial streaks. There the profile is elongated only along the radial direction per reflection (\sigma^2_\mathrm{radial}=\sigma^2_\mathrm{intrinsic}+\sigma_\mathrm{bw}^2, tangential unchanged) on a grid grown to hold the streak, capturing it without the tangential background an isotropic widening would add.
- Fit (Kabsch). With profile P, background B and the shell variance model, the intensity and its uncertainty are $ I = \frac{\sum P\,(c-B)/v}{\sum P^2/v},\qquad \sigma = \sqrt{\frac{1}{\sum P^2/v}},\qquad v = B + \max(I,0)\,P, $ where c is the pixel value and the de-biased variance v (background plus model signal, rather than the down-fluctuating observed count) is iterated. The rotation/excitation partiality is carried exactly as in the box-sum path.
The integrator is selected by --integrator boxsum|gaussian|empirical (default gaussian).
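The iterated fit can be sketched in a few lines. This is an illustration under assumptions: a single scalar background B per spot, a fixed number of iterations, a small floor on v to avoid division by zero, and a hypothetical function name.

```python
def profile_fit(counts, profile, background, n_iter=5):
    """Iterated profile-fitted intensity I and sigma for one spot."""
    intensity, den = 0.0, 1.0
    for _ in range(n_iter):
        # De-biased variance: background plus model signal, not the observed count.
        v = [max(background + max(intensity, 0.0) * p, 1e-9) for p in profile]
        num = sum(p * (c - background) / vi for p, c, vi in zip(profile, counts, v))
        den = sum(p * p / vi for p, vi in zip(profile, v))
        intensity = num / den
    return intensity, (1.0 / den) ** 0.5

# Noise-free toy spot: counts = B + I * P with I = 100, B = 2.
P = [0.1, 0.2, 0.4, 0.2, 0.1]
counts = [2.0 + 100.0 * p for p in P]
fit_i, fit_sigma = profile_fit(counts, P, background=2.0)
```

For noise-free input the estimate is exact at every iteration, since the numerator is then I times the denominator; the iteration only matters for how noisy pixels are weighted.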
9.4 Lorentz–polarization factor handling
For integrated reflections, polarization correction can be applied as a multiplicative correction to the reflection scale via the geometry-based polarization term (§2.2). A Lorentz-like factor is carried as rlp in predictions, and used during scaling/merging (§10).
10. Scaling and merging
After per-image integration, Jungfraujoch scales observations and merges them into unique reflections. The design is intentionally compatible with XDS/XSCALE concepts, while supporting both still and rotation partiality models.
10.1 Observation model
For an observation j of a unique reflection h on image (or image group) i, the predicted measured intensity is modeled as:
$
I_{ij} \approx G_i \, L_{ij} \, P_{ij} \, I_h,
$
where:
- G_i is the image scale factor,
- L_{ij} is a Lorentz-like / geometry factor (stored as rlp or derived),
- P_{ij} is a partiality term (model-dependent),
- I_h is the merged (true) intensity parameter for that unique reflection.
A least-squares objective is minimized:
$
\sum_{ij} \left(\frac{I_{ij}^{\mathrm{pred}} - I_{ij}^{\mathrm{obs}}}{\sigma_{ij}}\right)^2
$
with regularization on G_i and optional smoothness constraints (particularly meaningful for rotation series).
10.2 Partiality models available
Jungfraujoch supports several partiality choices:
- Rotation partiality (XDS-like; see §8.3): $ P_{ij} = \frac{1}{2}\left[ \mathrm{erf}\!\left(\frac{\Delta\phi_{ij}+\Delta\phi/2}{\sqrt{2}\,\sigma_{M,i}/\zeta_{ij}}\right) - \mathrm{erf}\!\left(\frac{\Delta\phi_{ij}-\Delta\phi/2}{\sqrt{2}\,\sigma_{M,i}/\zeta_{ij}}\right) \right]. $ Mosaicity \sigma_{M,i} can be refined per image group with bounds.
- Still partiality (excitation-error proxy): $ P_{ij} = \exp\!\left(-\frac{\Delta_\mathrm{Ewald}^2}{R_i^2}\right), $ where R_i^2 is a refined width parameter (bounded).
- Unity: P_{ij}=1.
- Fixed: use the per-reflection partiality carried from prediction.
Reflections below a minimum partiality can be rejected from merging to avoid unstable corrections.
10.3 Regularization and smoothness
To stabilize scale determination, a weak prior G_i\approx 1 is used. For rotation datasets, optional smoothness encourages slowly varying scales and mosaicity:
$
\log G_{i-1} - 2\log G_i + \log G_{i+1} \approx 0,
$
(and similarly for mosaicity), reflecting the expectation of gradual changes during a rotation scan.
10.4 Merging estimator
After refinement, corrected observations are formed: $ I^{\mathrm{corr}}_{ij} = \frac{I^{\mathrm{obs}}_{ij}}{G_i L_{ij} P_{ij}},\qquad \sigma^{\mathrm{corr}}_{ij} = \frac{\sigma^{\mathrm{obs}}_{ij}}{G_i L_{ij} P_{ij}}. $
Unique intensities are merged by inverse-variance weighted mean: $ I_h = \frac{\sum_j w_j I^{\mathrm{corr}}_{ij}}{\sum_j w_j},\qquad w_j = \frac{1}{(\sigma^{\mathrm{corr}}_{ij})^2}. $
An internal-consistency term can inflate uncertainties when multiple observations are present, in the spirit of XSCALE.
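A minimal sketch of the correction and inverse-variance merge for one unique reflection follows. The tuple layout is hypothetical, and the returned uncertainty 1/\sqrt{\sum w} is the textbook propagated error, stated here as an assumption; the internal-consistency inflation mentioned above is not included.

```python
def merge_reflection(observations):
    """observations: list of (I_obs, sigma_obs, G, L, P) for one unique reflection."""
    num = den = 0.0
    for i_obs, s_obs, g, l, p in observations:
        factor = g * l * p
        i_corr, s_corr = i_obs / factor, s_obs / factor
        w = 1.0 / (s_corr * s_corr)       # inverse-variance weight
        num += w * i_corr
        den += w
    return num / den, den ** -0.5

# Two observations of the same reflection: a half-recorded partial and a full.
merged_i, merged_sigma = merge_reflection([
    (50.0, 5.0, 1.0, 1.0, 0.5),
    (100.0, 10.0, 1.0, 1.0, 1.0),
])
```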
10.5 Merging statistics
Per-shell and overall merging statistics are computed on corrected intensities, including:
- number of observations,
- number of unique reflections,
- mean I/\sigma(I),
- an R$_\mathrm{meas}$-like quantity derived from within‑HKL deviations (shell-binned).
Completeness requires enumeration of possible reflections given a unit cell and symmetry; where this is not fully available, completeness may be reported as 0 or omitted.
10.6 Rotation datasets: combining partials into fulls (3D integration)
In a rotation scan a reflection is recorded as a series of partials spread across the frames its rocking curve crosses. Merging those partials directly would force the merge error model to absorb the rocking-curve slicing as if it were measurement noise, capping the achievable I/\sigma. For rotation data Jungfraujoch instead combines each reflection's partials into a single full intensity first, then scales and merges the fulls — a 3D integration over the rocking curve.
The combine groups each reflection's partials into rocking events (contiguous runs of frames) and reduces each event to one full:
- De-biased weighted sum. Partials are combined by inverse-variance weighting, where each partial's variance is its background-noise component plus the model signal shared across the event (Kabsch profile-fit form). Using the shared model signal rather than the individual down-fluctuating intensity stops weak partials from being over-weighted, which would otherwise inflate the merged error model. The weights depend on the full, so the estimate is iterated.
- Captured fraction. The partiality summed over the event, f=\min(1,\sum_j p_j), measures how completely the rocking curve was sampled; it replaces a per-partial minimum-partiality cut, because an event seen over only a few percent of its curve is unreliable however many frames it spans.
- Capture-aware uncertainty. A full captured incompletely (f<1) is extrapolated and biased high. The unobserved fraction is charged as an extra systematic uncertainty, \sigma^2 \leftarrow \sigma^2 + \big(c\,(1-f)\,I\big)^2, so the merge down-weights these extrapolated fulls and the error model treats their scatter as expected. It is enabled by default for the rotation path.
The fulls are then re-scaled in the XDS sense — a per-image scale refit directly on the complete reflections under the unity partiality model — and merged (§10.4). Because every merged observation is now a counting-statistics-limited full rather than a partiality-divided slice, the error model reaches a far higher asymptotic I/\sigma.
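The captured fraction and the capture-aware uncertainty from the list above amount to a short calculation, sketched here. The value of the constant c is not given in this document, so the `c=0.05` used in the example is purely a placeholder, and the function name is hypothetical.

```python
import math

def capture_aware_sigma(intensity, sigma, partialities, c):
    """Return (f, inflated sigma) for one rocking event."""
    f = min(1.0, sum(partialities))                 # captured fraction
    extra = c * (1.0 - f) * intensity               # systematic term for the unseen part
    return f, math.sqrt(sigma * sigma + extra * extra)

# Event sampled on two frames covering 80% of the rocking curve (placeholder c).
f, sigma_full = capture_aware_sigma(100.0, 3.0, [0.3, 0.5], c=0.05)
```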
11. Mosaicity and “profile radius” monitoring
11.1 Profile radius (intrinsic excitation-error width)
The “profile radius” is the intrinsic angular width of a reflection — crystal mosaicity plus beam divergence — estimated from the spread of \Delta_\mathrm{Ewald} over indexed spots,
$
R \approx \sqrt{\tfrac{1}{N}\sum_i \Delta_{\mathrm{Ewald},i}^2}.
$
When the beam has a finite energy bandwidth, that bandwidth smears each reflection radially by \sigma_\mathrm{bw}\approx \mathrm{bandwidth}\cdot\lambda/2d^2 (largest at high resolution), which also broadens the measured \Delta_\mathrm{Ewald} spread. Since prediction re-applies the bandwidth term per reflection (§8.2), this contribution is deconvolved from the estimate — R^2 = \langle\Delta_\mathrm{Ewald}^2\rangle - \langle\sigma_\mathrm{bw}^2\rangle — so that R is the intrinsic width and bandwidth is not double-counted. Still predictions use an excitation-error cutoff proportional to R.
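The estimate with and without the bandwidth term can be sketched as follows. Clamping a negative deconvolved variance to zero is an assumption of this example, as is the function name.

```python
import math

def profile_radius(deltas, sigma_bw=None):
    """RMS excitation error, optionally with the mean bandwidth variance removed."""
    msq = sum(d * d for d in deltas) / len(deltas)
    if sigma_bw:
        msq -= sum(s * s for s in sigma_bw) / len(sigma_bw)
    return math.sqrt(max(msq, 0.0))    # clamp: the deconvolved variance can go negative

deltas = [0.003, -0.004, 0.005, 0.0]               # 1/Angstrom, illustrative
r_raw = profile_radius(deltas)
r_intrinsic = profile_radius(deltas, sigma_bw=[0.002] * 4)
```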
11.2 Mosaicity from rotation data
For rotation data the mosaicity \sigma_M is estimated by maximum likelihood from the rocking offsets \tau of indexed spots, using the XDS reflection-fraction model R(\tau;\sigma_M/\zeta) (Kabsch 2010): each spot's exact Bragg angle is located near its frame, \zeta (the rotation-axis Lorentz component) is computed, and \sigma_M is chosen to maximize \sum_i \log R(\tau_i;\sigma_M/\zeta_i).
The \phi search window for the Bragg angle is set wider than the oscillation, so that reflections recorded at large rocking offset are included. These tail reflections carry most of the information about the mosaic width; a window limited to the oscillation range would truncate the \tau distribution and bias \sigma_M low.
The estimated mosaicity feeds the rotation prediction (how many frames each reflection spans, §8.3) and the rotation partiality (§10.2). It is held fixed during scaling: in the per-image scale fit the mosaicity is degenerate with the scale G (both rescale the predicted intensity), so refining it there is unstable. A correct mosaicity matters because it controls both how much of each rocking curve is captured and the partiality used to form fulls (§10.6); too small a value truncates the captured curve and over-peaks the partiality, degrading the combined fulls.
12. Wilson statistics and French–Wilson treatment
12.1 Per-shell ⟨I/σ(I)⟩
For monitoring integration quality, Jungfraujoch reports mean \langle I/\sigma(I)\rangle in a fixed number of resolution shells. Shelling is performed in 1/d^2 space (typical of crystallographic practice).
12.2 Wilson plot (B-factor proxy)
A Wilson-type analysis is computed by binning intensities by resolution and fitting:
$
\langle I\rangle \propto \exp\!\left(-\frac{B}{2}\frac{1}{d^2}\right),
$
i.e.
$
\log \langle I\rangle = \mathrm{const} - \frac{B}{2}\left(\frac{1}{d^2}\right).
$
A linear regression of \log\langle I\rangle vs 1/d^2 provides an estimate of B, subject to basic quality checks (e.g. R^2 threshold).
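The regression can be sketched with an ordinary least-squares fit on the per-shell means. The function name, the minimum of three shells and the handling of degenerate input are assumptions of this example.

```python
import math

def wilson_fit(inv_d2, mean_intensity):
    """Fit log<I> = const - (B/2) * (1/d^2). Returns (B, R^2) or None."""
    pts = [(x, math.log(i)) for x, i in zip(inv_d2, mean_intensity) if i > 0.0]
    n = len(pts)
    if n < 3:
        return None
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    if sxx == 0.0 or syy == 0.0:
        return None
    slope = sxy / sxx
    return -2.0 * slope, sxy * sxy / (sxx * syy)   # slope = -B/2

# Synthetic shells generated with B = 20 A^2.
shells = [0.05, 0.10, 0.15, 0.20]
b_est, r2 = wilson_fit(shells, [1000.0 * math.exp(-10.0 * x) for x in shells])
```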
12.3 French–Wilson (posterior expectation of I and |F|)
To mitigate negative intensities and obtain physically meaningful amplitudes, Jungfraujoch implements a French–Wilson style Bayesian treatment using per-shell mean intensity as a prior scale.
For each merged observation I_\mathrm{obs} with uncertainty \sigma, the posterior over true intensity I\ge 0 is:
$
p(I\mid I_\mathrm{obs}) \propto p(I)\,\exp\!\left(-\frac{(I_\mathrm{obs}-I)^2}{2\sigma^2}\right),
$
with priors differing between acentric and centric cases (standard Wilson distributions).
Numerical quadrature over a scaled intensity variable is used to compute posterior moments \langle I\rangle and \langle |F|\rangle = \langle \sqrt{I}\rangle, and an amplitude uncertainty estimate via: $ \sigma_F \approx \sqrt{\langle I\rangle - \langle |F|\rangle^2}. $
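The posterior moments can be approximated with a midpoint rule, as in this sketch. The standard Wilson priors are assumed (exponential for acentric, \propto I^{-1/2}e^{-I/2\Sigma} for centric, with \Sigma the shell mean intensity); the integration limits, the grid size and the function name are choices of this example, not the quadrature used in Jungfraujoch.

```python
import math

def french_wilson(i_obs, sigma, mean_i, centric=False, n=4000):
    """Posterior <I>, <|F|> and sigma_F by midpoint quadrature over I >= 0."""
    # Integrate up to where either the likelihood or the prior has died out.
    upper = min(max(i_obs, 0.0) + 10.0 * sigma, 40.0 * mean_i)
    step = upper / n
    grid = [(j + 0.5) * step for j in range(n)]     # midpoints avoid I = 0
    logs = []
    for i in grid:
        log_prior = (-0.5 * math.log(i) - i / (2.0 * mean_i)) if centric else (-i / mean_i)
        logs.append(log_prior - (i_obs - i) ** 2 / (2.0 * sigma * sigma))
    top = max(logs)                                  # subtract the max to avoid underflow
    w = [math.exp(v - top) for v in logs]
    norm = sum(w)
    mean_int = sum(wi * i for wi, i in zip(w, grid)) / norm
    mean_amp = sum(wi * math.sqrt(i) for wi, i in zip(w, grid)) / norm
    return mean_int, mean_amp, math.sqrt(max(mean_int - mean_amp ** 2, 0.0))

strong = french_wilson(1000.0, 10.0, 1000.0)   # well measured: barely changed
weak = french_wilson(-5.0, 10.0, 100.0)        # negative observation: pulled positive
```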
13. Practical notes and limitations
- Bragg integration is profile-fitted by default (per-shell Gaussian profile, Kabsch extraction; §9.3), with plain box summation available as a fallback (--integrator boxsum). The profiles are built per frame from that frame's strong spots, which suits fast-feedback and serial/streaming use; a profile shared across many frames (as in full offline workflows) is not currently formed.
- Space-group symmetry beyond centering absences is not necessarily enforced during prediction/integration unless the space group is supplied and used downstream.
- Resolution masking and ice rings are controllable; including ice-ring spots in indexing can improve robustness for some samples but may bias refinement in others.
- Rotation vs still modes differ substantially in prediction and scaling because partiality is angle-driven in rotation data and excitation-error-driven in still data.