# CPU-side crystallographic data analysis (Jungfraujoch)

This document describes the crystallographic algorithms implemented in Jungfraujoch for **CPU**- and **GPU**-side real‑time and near‑real‑time data analysis.

**Scope.** The pipeline covered here comprises:

1. geometry mapping and corrections,
2. azimuthal integration (powder/radial profiles),
3. Bragg spot finding (strong pixels → connected components → spot descriptors),
4. indexing (still and rotation modes),
5. Bravais lattice / centering inference,
6. geometry and lattice refinement,
7. reflection prediction (still and rotation),
8. 2D summation integration,
9. scaling and merging,
10. auxiliary statistics (Wilson plot, ⟨I/σ(I)⟩, French–Wilson).

## References

The methods are inspired by solutions implemented in:

- W. Kabsch, “XDS”, *Acta Cryst.* **D66** (2010), 125–132 and related XDS papers (rotation geometry, partiality, scaling concepts).
- W. Kabsch, “Integration, scaling, space-group assignment and post-refinement”, *Acta Cryst.* **D66** (2010), 133–144 (mosaicity/partiality likelihood treatment; notation such as ζ and rotation factors).
- T. A. White et al., CrystFEL method papers (spot finding, three‑ring integration, serial/still diffraction processing concepts).
- J. Kieffer & J. P. Wright, “PyFAI: a Python library for high performance azimuthal integration on GPU”, *Powder Diffraction* **28** (2013), S339–S350 (detector geometry definition, azimuthal integration).
- H. Powell, “The Rossmann Fourier autoindexing algorithm in MOSFLM”, *Acta Cryst.* **D55** (1999), 1690–1695 (FFT indexing).

This list is not exhaustive.

## 1. Geometry, reciprocal-space mapping, and basic quantities

### 1.1 Coordinate conventions

For a pixel coordinate $(x,y)$ (in pixels), Jungfraujoch converts to a laboratory direction vector via:

1. shift by direct-beam position $(x_\mathrm{beam}, y_\mathrm{beam})$,
2. scale by pixel size $p$ (mm),
3. set detector distance $D$ (mm),
4. apply detector orientation rotation $R_\mathrm{det}$ (PyFAI-like parameterization).

The unnormalized detector coordinate (mm) is:
$
\mathbf{r}_\mathrm{det}(x,y) =
\begin{pmatrix}
(x-x_\mathrm{beam})p\\
(y-y_\mathrm{beam})p\\
D
\end{pmatrix}.
$

The lab-frame vector is:
$
\mathbf{r}_\mathrm{lab} = R_\mathrm{det}\,\mathbf{r}_\mathrm{det}.
$

Let the incident wavevector magnitude be $k = 1/\lambda$ in Å$^{-1}$, and define:
$
\mathbf{S}_0 = (0,0,k).
$

The **reciprocal-space scattering vector** associated with pixel $(x,y)$ is:
$
\mathbf{s}(x,y) = k\,\frac{\mathbf{r}_\mathrm{lab}}{\lVert \mathbf{r}_\mathrm{lab}\rVert} - \mathbf{S}_0.
$

This $\mathbf{s}$ is the fundamental quantity used for spot finding (resolution filters), indexing, and refinement.
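
The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the Jungfraujoch implementation: the function name `pixel_to_s`, its parameter names, and the representation of $R_\mathrm{det}$ as a nested 3×3 list (identity when omitted) are assumptions made for the example.

```python
import math

def pixel_to_s(x, y, x_beam, y_beam, pixel_mm, dist_mm, wavelength_A, r_det=None):
    """Return the scattering vector s (1/Angstrom) for pixel (x, y)."""
    # Unnormalized detector-frame coordinate in mm
    r = [(x - x_beam) * pixel_mm, (y - y_beam) * pixel_mm, dist_mm]
    # Optional detector rotation (3x3 nested list); identity if not given
    if r_det is not None:
        r = [sum(r_det[i][j] * r[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in r))
    k = 1.0 / wavelength_A
    # s = k * r_lab / |r_lab| - S0, with S0 = (0, 0, k)
    return (k * r[0] / norm, k * r[1] / norm, k * r[2] / norm - k)
```

At the direct-beam position the result is the zero vector, and for any other pixel $\lVert\mathbf{s}\rVert = 2k\sin\theta = 1/d$.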

### 1.2 Two-theta, azimuth, resolution and $q$

The scattering angle $2\theta$ is computed from $\mathbf{r}_\mathrm{lab}$ via:
$
2\theta = \arctan\!\left(\frac{\sqrt{x_\mathrm{lab}^2 + y_\mathrm{lab}^2}}{z_\mathrm{lab}}\right).
$

Resolution (Å) at a pixel is:
$
d = \frac{\lambda}{2\sin(\theta)} = \frac{\lambda}{2\sin(2\theta/2)}.
$

The magnitude $q = 2\pi/d$ is used for radial binning and ice-ring handling.
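
As a hedged sketch, the three quantities can be derived from a lab-frame vector as follows. The helper name `two_theta_d_q` is hypothetical, and `atan2` is used in place of the plain arctangent (an assumption of this example; the two agree for $z_\mathrm{lab}>0$).

```python
import math

def two_theta_d_q(r_lab, wavelength_A):
    """Return (2theta in rad, d in Angstrom, q in 1/Angstrom) for a lab-frame vector."""
    x, y, z = r_lab
    # atan2 keeps the angle well defined even for z <= 0
    two_theta = math.atan2(math.hypot(x, y), z)
    if two_theta == 0.0:
        return 0.0, math.inf, 0.0  # direct beam: infinite d, zero q
    d = wavelength_A / (2.0 * math.sin(two_theta / 2.0))
    q = 2.0 * math.pi / d
    return two_theta, d, q
```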

### 1.3 Distance from the Ewald sphere

For a reciprocal lattice point $\mathbf{p}$ (Å$^{-1}$), define:
$
\Delta_\mathrm{Ewald}(\mathbf{p}) = \lVert \mathbf{p} + \mathbf{S}_0\rVert - k.
$
Jungfraujoch uses $|\Delta_\mathrm{Ewald}|$ as an operational proxy for excitation error. This appears in:
- still prediction (accept if $|\Delta_\mathrm{Ewald}|\le \Delta_\mathrm{cut}$),
- profile radius estimation (see §11.1),
- still partiality option in scaling/merging (§10.2).
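
A small Python sketch of the definition above; the function name `ewald_distance` is illustrative only.

```python
import math

def ewald_distance(p, wavelength_A):
    """Signed distance (1/Angstrom) of reciprocal lattice point p from the Ewald sphere."""
    k = 1.0 / wavelength_A
    # |p + S0| - k with S0 = (0, 0, k)
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + (p[2] + k) ** 2) - k
```

The value is zero when $\mathbf{p}$ lies exactly on the sphere (including the origin), positive outside it and negative inside.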

---

## 2. Azimuthal integration (radial profiles)

Azimuthal integration produces a 1D radial profile $I(q)$ or $I(d)$ by histogramming pixels into radial bins. Pixels are **not split** across bins; each pixel contributes wholly to a single bin.

### 2.1 Histogram estimator

Let bin index $b(x,y)\in\{0,\dots,B-1\}$ be precomputed from $q(x,y)$ (or equivalently from $d(x,y)$). For each bin $b$:

- accumulate corrected intensity:
  $
  S_b = \sum_{(x,y):\,b(x,y)=b} I(x,y)\,C(x,y),
  $
- and count:
  $
  N_b = \#\{(x,y):\,b(x,y)=b \text{ and pixel is valid}\}.
  $

A simple mean profile is then $\bar{I}_b = S_b / N_b$ (when $N_b>0$). Invalid pixels (masked, saturated, detector error codes) are excluded from both $S_b$ and $N_b$.
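
The estimator can be illustrated with a short Python sketch over flattened per-pixel lists. The function name `radial_profile` and the use of `None` for empty bins are choices made for this example, not a description of the actual code.

```python
def radial_profile(intensity, bin_index, correction, valid, n_bins):
    """Mean corrected intensity per radial bin; None where a bin has no valid pixel."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i_px, b, c, ok in zip(intensity, bin_index, correction, valid):
        # Skip invalid pixels and pixels whose bin falls outside the profile
        if not ok or not (0 <= b < n_bins):
            continue
        sums[b] += i_px * c   # each pixel goes wholly into one bin (no splitting)
        counts[b] += 1
    return [s / n if n > 0 else None for s, n in zip(sums, counts)]
```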

### 2.2 Corrections applied

Two standard corrections are available:

**(i) Solid angle / geometric correction.** A commonly used approximation for flat detectors gives a $\cos^3(2\theta)$ factor:
$
C_\Omega(2\theta) = \cos^3(2\theta).
$

**(ii) Polarization correction.** With polarization coefficient $P$ (beamline dependent) and azimuth $\phi$:
$
C_\mathrm{pol}(2\theta,\phi) =
\frac{1}{2}\left(1+\cos^2(2\theta) - P\cos(2\phi)\left(1-\cos^2(2\theta)\right)\right),
$
applied as a divisor to intensities (i.e. scale by $1/C_\mathrm{pol}$) when enabled.
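
Both factors are simple closed-form expressions. The following Python sketch evaluates them as written above; the function names are hypothetical, and angles are assumed to be in radians.

```python
import math

def solid_angle_factor(two_theta):
    """cos^3(2theta) factor for a flat detector."""
    return math.cos(two_theta) ** 3

def polarization_factor(two_theta, phi, p):
    """Polarization term C_pol for polarization coefficient p and azimuth phi."""
    c2 = math.cos(two_theta) ** 2
    return 0.5 * (1.0 + c2 - p * math.cos(2.0 * phi) * (1.0 - c2))

def correct_polarization(intensity, two_theta, phi, p):
    """Apply C_pol as a divisor, as described in the text."""
    return intensity / polarization_factor(two_theta, phi, p)
```

Both factors equal 1 in the forward direction ($2\theta = 0$). For fully polarized light ($P=1$) the polarization factor vanishes at $2\theta = 90^\circ$, $\phi = 0$, so a caller would need to guard the division there.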

### 2.3 Background estimate for profiles

A background estimate is derived from the integrated profile using the azimuthal integration settings (details depend on the configured estimator). This background is used for monitoring and diagnostics; it is **not** the same as local Bragg-spot background used in summation integration (§9).

---

## 3. Spot finding (strong pixels → Bragg spots)

Spot finding is a two-stage process:

1. **Strong-pixel selection** using intensity and/or local signal-to-noise criteria.
2. **Connected-component labeling (CCL)** to group strong pixels into candidate spots, followed by spot-level filtering and feature extraction.

### 3.1 Strong-pixel detection by local statistics

For each pixel $i$ with value $v_i$, consider a square window (nominally $31\times 31$ pixels) around it. Let the window contain $n$ valid pixels (excluding masked/bad/saturated), and define:
$
\Sigma = \sum v,\qquad \Sigma_2 = \sum v^2.
$

To avoid biasing the local statistics by the test pixel itself, Jungfraujoch evaluates the pixel against the window with the pixel removed:
$
\Sigma' = \Sigma - v_i,\quad \Sigma_2' = \Sigma_2 - v_i^2,\quad n' = n-1.
$

A variance-like quantity proportional to $n'^2$ is formed:
$
V = n'\Sigma_2' - (\Sigma')^2,
$
and the deviation-from-mean quantity:
$
\Delta = v_i n' - \Sigma'.
$

A pixel is considered strong if:
- it is above a photon/count threshold, and
- $\Delta>0$, and
- the squared deviation exceeds a scaled variance:
  $
  \Delta^2 > V\cdot T^2,
  $
  where $T$ is the configured signal-to-noise threshold.

This is equivalent to a local z-score criterion but implemented in integer arithmetic to be robust and fast.

Special cases:
- saturated pixels can be forced to “strong” (useful for detecting overloaded Bragg spots),
- invalid pixels are never strong.
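
The test can be written entirely in integer arithmetic, as in this Python sketch. It assumes the window sums already include the test pixel, that the squared threshold $T^2$ is passed as an integer, and that "above threshold" means strictly greater; the function and parameter names are illustrative.

```python
def is_strong(v_i, window_sum, window_sum_sq, n_valid, snr_threshold_sq, count_threshold):
    """Local z-score test for one pixel using integer sums over its window."""
    n1 = n_valid - 1                      # window size without the test pixel
    if n1 <= 0 or v_i <= count_threshold:
        return False
    s1 = window_sum - v_i                 # sum without the test pixel
    s2 = window_sum_sq - v_i * v_i        # sum of squares without the test pixel
    var = n1 * s2 - s1 * s1               # n1^2 times the local variance
    delta = v_i * n1 - s1                 # n1 times (v_i - local mean)
    return delta > 0 and delta * delta > var * snr_threshold_sq
```

For a 3×3 toy window with background values `0, 2, 0, 2, 0, 2, 0, 2` and a centre value of 10, the sums are 18 and 116; the centre pixel passes with $T^2=9$ and fails with $T^2=100$.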

### 3.2 Resolution and ice-ring handling

Spot finding can be restricted to a resolution range $[d_\mathrm{high}, d_\mathrm{low}]$ by masking pixels outside the range. Optionally, pixels in identified ice-ring regions can be tagged so that subsequent indexing/refinement may include or exclude them (see §4 and §7).

A further optional safeguard removes isolated high-resolution “spur” spots by detecting large gaps in $1/d$ (or $q$) space and discarding spots beyond the gap. This is intended for macromolecular diffraction where edge-of-detector backgrounds can be extremely low.

### 3.3 Connected-component labeling (CCL)

Strong pixels are grouped into connected components (adjacent strong pixels) using a CCL algorithm. Each component yields a candidate spot with:

- centroid $(x,y)$ (often intensity-weighted),
- pixel count (spot size),
- integrated spot intensity proxy (sum of pixel values),
- resolution $d$ at the centroid (or mean over pixels),
- and quality flags (e.g. ice-ring classification).

Spot-level filters include minimum/maximum pixel count and resolution limits.
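
A minimal Python sketch of the grouping step is shown below. It uses a breadth-first flood fill with 4-connectivity, which is an assumption of this example (the connectivity and labeling algorithm used in Jungfraujoch are not specified here); the function name `find_spots` and the dictionary keys are likewise illustrative.

```python
from collections import deque

def find_spots(image, strong):
    """Group strong pixels into 4-connected components and return spot descriptors."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    spots = []
    for y0 in range(h):
        for x0 in range(w):
            if not strong[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            n, total, sx, sy = 0, 0.0, 0.0, 0.0
            while queue:
                x, y = queue.popleft()
                v = image[y][x]
                n += 1
                total += v
                sx += v * x
                sy += v * y
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h and strong[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if total > 0:  # intensity-weighted centroid needs a positive sum
                spots.append({"x": sx / total, "y": sy / total,
                              "n_pixels": n, "intensity": total})
    return spots
```

Spot-level filters (pixel count, resolution) would then be applied to the returned list.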

---

## 4. Indexing overview

Indexing maps observed reciprocal-space vectors $\mathbf{s}_i$ to a lattice such that:
$
\mathbf{s}_i \approx h_i\mathbf{a}^* + k_i\mathbf{b}^* + l_i\mathbf{c}^*,
$
with integer $(h_i,k_i,l_i)$.

Jungfraujoch supports two complementary indexing strategies:

1. **FFT-based indexing** (Rossmann-type): does not require an a priori unit cell; suitable for unknown samples.
2. **Fast-feedback indexing** (TORO-like): requires an approximate unit cell; optimized for speed and feedback.

Both feed into a common robust refinement/selection stage which maximizes the number of inliers under an indexing tolerance.

### 4.1 Indexed-spot decision (inlier test)

Given a trial lattice with direct basis vectors $\mathbf{a},\mathbf{b},\mathbf{c}$ (used here as reciprocal-space dot-test vectors), fractional indices are estimated by:
$
h_f = \mathbf{s}\cdot\mathbf{a},\quad
k_f = \mathbf{s}\cdot\mathbf{b},\quad
l_f = \mathbf{s}\cdot\mathbf{c}.
$
Let $(h,k,l)=(\mathrm{round}(h_f),\mathrm{round}(k_f),\mathrm{round}(l_f))$ and define the fractional residual:
$
\delta^2 = (h_f-h)^2 + (k_f-k)^2 + (l_f-l)^2.
$
A spot is indexed if $\delta^2 \le \tau^2$, where $\tau$ is the configured tolerance.

For indexed spots, the reciprocal lattice point $\mathbf{p} = h\mathbf{a}^*+k\mathbf{b}^*+l\mathbf{c}^*$ is used to compute $\Delta_\mathrm{Ewald}(\mathbf{p})$ (stored as a diagnostic and later used in profile-radius estimation).
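
The inlier test translates directly into code. The sketch below is illustrative only; `index_spot` is a hypothetical name, and vectors are plain 3-tuples.

```python
def index_spot(s, a, b, c, tol):
    """Return integer (h, k, l) if s is indexed by the basis within tol, else None."""
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    frac = (dot(s, a), dot(s, b), dot(s, c))       # fractional indices
    hkl = tuple(round(f) for f in frac)            # nearest integers
    delta_sq = sum((f - i) ** 2 for f, i in zip(frac, hkl))
    return hkl if delta_sq <= tol * tol else None
```

For a 50 Å cubic cell and $\tau = 0.1$, a spot with fractional indices $(2, -1, 3.01)$ is indexed as $(2,-1,3)$, whereas $(2,-1,3.3)$ is rejected.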

---

## 5. FFT indexing (unknown unit cell)

FFT indexing follows a classical approach: detect dominant periodicities by projecting reciprocal-space points onto many directions and Fourier transforming the resulting 1D histograms.

### 5.1 Directional projections and histograms

Choose a set of unit vectors $\{\mathbf{u}_d\}$ on a half-sphere (a near-uniform distribution generated via a golden-angle construction). For each direction $d$, form a histogram in the scalar projection:
$
t_{id} = \left|\mathbf{u}_d\cdot \mathbf{s}_i\right|.
$

Bin width is chosen approximately as:
$
\Delta t \approx \frac{1}{2 L_\mathrm{max}},
$
where $L_\mathrm{max}$ is the maximum expected real-space unit-cell edge (Å). The histogram extent is tied to the maximum $q$ used (set by a high-resolution cutoff for indexing).

### 5.2 FFT peak picking and candidate vectors

For each direction, the FFT magnitude spectrum is computed; peaks correspond to periodicities along $\mathbf{u}_d$. Each direction yields a candidate real-space length $L$ with maximum spectral magnitude (subject to $L\ge L_\mathrm{min}$).

Candidate vectors are $\mathbf{v}_d = L_d\,\mathbf{u}_d$.
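
To make the projection-and-transform step concrete, the following Python sketch handles a single direction. It is a simplified illustration under stated assumptions: a direct discrete Fourier transform replaces a real FFT for brevity, the histogram extent is given as a bin count `n_bins`, the zero-frequency and Nyquist bins are skipped, and the name `dominant_length` is hypothetical.

```python
import cmath
import math

def dominant_length(s_points, u, l_max, n_bins, l_min=0.0):
    """Return (length in Angstrom, spectral magnitude) of the strongest periodicity along u."""
    dt = 1.0 / (2.0 * l_max)              # bin width, as in Section 5.1
    hist = [0] * n_bins
    for s in s_points:
        t = abs(u[0] * s[0] + u[1] * s[1] + u[2] * s[2])
        idx = int(t / dt)
        if idx < n_bins:                  # drop points beyond the histogram extent
            hist[idx] += 1
    best_len, best_mag = None, 0.0
    # Frequency index m corresponds to the real-space length m / (n_bins * dt)
    for m in range(1, n_bins // 2):
        length = m / (n_bins * dt)
        if length < l_min:
            continue
        acc = sum(count * cmath.exp(-2j * math.pi * m * i / n_bins)
                  for i, count in enumerate(hist) if count)
        if abs(acc) > best_mag:
            best_len, best_mag = length, abs(acc)
    return best_len, best_mag
```

With points spaced by 0.02 Å$^{-1}$ along $\mathbf{u}$, $L_\mathrm{max}=100$ Å and 100 bins, the strongest peak is found at $L = 50$ Å, and the candidate vector would be $50\,\mathbf{u}$.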

A collinearity filter removes nearly parallel vectors (e.g. within 5°) and attempts to resolve harmonic ambiguity: shorter “fundamental” vectors may be preferred over longer harmonics if their peak magnitude is sufficiently strong relative to the dominant peak.

### 5.3 Lattice reduction and cell candidates

Triples of candidate vectors are combined to form candidate bases $(\mathbf{A},\mathbf{B},\mathbf{C})$. A simple reduction is applied:
$
\mathbf{B} \leftarrow \mathbf{B} - \mathrm{round}\!\left(\frac{\mathbf{B}\cdot\mathbf{A}}{\mathbf{A}\cdot\mathbf{A}}\right)\mathbf{A},
$
$
\mathbf{C} \leftarrow \mathbf{C} - \mathrm{round}\!\left(\frac{\mathbf{C}\cdot\mathbf{A}}{\mathbf{A}\cdot\mathbf{A}}\right)\mathbf{A}
- \mathrm{round}\!\left(\frac{\mathbf{C}\cdot\mathbf{B}}{\mathbf{B}\cdot\mathbf{B}}\right)\mathbf{B}.
$

Candidates are filtered by allowed length and angle ranges.
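
The reduction step can be sketched as follows. This Python example follows the two update formulas literally ($\mathbf{B}$ is updated first, and the updated $\mathbf{B}$ is used when reducing $\mathbf{C}$); the function name `reduce_basis` is an assumption of the example.

```python
def reduce_basis(a, b, c):
    """Apply the simple pairwise reduction to a candidate basis (3-tuples)."""
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    # B <- B - round(B.A / A.A) A
    n_ba = round(dot(b, a) / dot(a, a))
    b = tuple(bi - n_ba * ai for bi, ai in zip(b, a))
    # C <- C - round(C.A / A.A) A - round(C.B / B.B) B  (both projections use the input C)
    n_ca = round(dot(c, a) / dot(a, a))
    n_cb = round(dot(c, b) / dot(b, b))
    c = tuple(ci - n_ca * ai - n_cb * bi for ci, ai, bi in zip(c, a, b))
    return a, b, c
```

For example, the input $(10,0,0)$, $(21,10,0)$, $(12,21,10)$ reduces to $(10,0,0)$, $(1,10,0)$, $(0,1,10)$.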

### 5.4 Robust refinement and best-cell selection

Candidate bases are refined against observed spots using an iterative inlier‑focused least‑squares procedure (trimmed/contracting threshold). The output cell is chosen to:
1. maximize the number of indexed spots under the tolerance $\tau$, and
2. break ties by a refined score (smaller residual threshold/score is preferred).

An optional reference unit cell (if supplied) restricts acceptance to cells within a relative distance tolerance in edge lengths (permutation-invariant).

---

## 6. Bravais lattice / centering inference (“lattice search”)

If the space group is supplied by the user, its lattice constraints are assumed for refinement and subsequent processing.

If not, Jungfraujoch attempts to infer the most plausible Bravais lattice type from the metric tensor after Niggli reduction:

1. **Niggli reduction** is performed to obtain a reduced cell in $G^6$ representation (Gruber vector).
2. The reduced cell is compared against a list of Niggli classes corresponding to Bravais lattices and centerings.
3. The highest-symmetry class that matches within tolerances is selected (relative metric tolerance and angular tolerance).

The output includes:
- a conventional cell,
- crystal system (triclinic, monoclinic, …),
- centering symbol $P, A, B, C, I, F, R$.

This stage provides centering information used for systematic absences in prediction (§8.4) and for reporting.

**Note.** In ambiguous or special cases, forcing space group to $P1$ (no symmetry assumptions) is recommended.

---

## 7. Geometry and lattice refinement

Refinement adjusts experimental geometry and crystal parameters to minimize discrepancies between observed spot reciprocal vectors and those predicted by a lattice model with integer indices.

### 7.1 Parameterization

The refinement jointly optimizes, depending on mode and constraints:

- beam center $(x_\mathrm{beam}, y_\mathrm{beam})$,
- detector distance $D$,
- detector tilt angles (two-angle model; third rotation often held at 0),
- rotation axis direction (for rotation datasets),
- crystal orientation (a global rotation),
- unit-cell parameters, with constraints determined by inferred crystal system.

For higher symmetries, constraints are enforced, e.g.
- cubic: $a=b=c,\ \alpha=\beta=\gamma=90^\circ$,
- tetragonal: $a=b,\ \alpha=\beta=\gamma=90^\circ$,
- hexagonal: $a=b,\ \alpha=\beta=90^\circ,\ \gamma=120^\circ$,
- monoclinic (unique axis $b$): $\alpha=\gamma=90^\circ$, $\beta$ refined.

### 7.2 Residuals and objective

For each indexed spot assigned integer $(h,k,l)$, compute:

- observed reciprocal vector $\mathbf{s}_\mathrm{obs}$ from its detector position and current geometry,
- predicted reciprocal vector $\mathbf{s}_\mathrm{pred}(h,k,l;\ \text{lattice params})$.

Residual is:
$
\mathbf{r} = \mathbf{s}_\mathrm{obs} - \mathbf{s}_\mathrm{pred}.
$

A non-linear least squares solver minimizes $\sum \|\mathbf{r}\|^2$ over all selected inlier spots.

### 7.3 Rotation datasets: bringing observations to a common reference frame

For oscillation/rotation data, each image corresponds to a rotation angle $\phi$ about an axis $\mathbf{m}_2$. Observed reciprocal vectors are rotated “back to start” so that all images are refined in a single reference crystal frame:
$
\mathbf{s}_\mathrm{obs,ref} = R(\phi)\,\mathbf{s}_\mathrm{obs},
$
with $R(\phi)$ denoting the back-rotation (the inverse of the goniometer rotation at angle $\phi$), constructed from the axis-angle representation of the goniometer model.

### 7.4 Multi-stage tightening of inlier tolerance

Refinement is performed in stages with decreasing acceptance tolerance for including reflections (e.g. from coarse to fine), which stabilizes convergence when starting from imperfect indexing and approximate geometry.

---

## 8. Reflection prediction

Jungfraujoch predicts reflection positions for integration by enumerating Miller indices within a resolution cutoff and accepting those that satisfy a diffraction condition model.

### 8.1 Enumerating reciprocal lattice points

For a maximum resolution $d_\mathrm{min}$, accept $(h,k,l)$ such that:
$
\lVert \mathbf{p}(h,k,l)\rVert^2 = \lVert h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*\rVert^2 \le \left(\frac{1}{d_\mathrm{min}}\right)^2.
$
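
A brute-force enumeration is enough to illustrate the condition. In this hedged Python sketch the reciprocal basis is derived from direct-space vectors, the loop bounds follow from $|h| = |\mathbf{p}\cdot\mathbf{a}| \le \lVert\mathbf{a}\rVert/d_\mathrm{min}$, and the origin $(0,0,0)$ is skipped; all names are illustrative.

```python
import math

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def reciprocal_basis(a, b, c):
    """Reciprocal vectors a*, b*, c* (crystallographic convention, no 2*pi)."""
    vol = _dot(a, _cross(b, c))
    return tuple(tuple(x / vol for x in _cross(u, v))
                 for u, v in ((b, c), (c, a), (a, b)))

def enumerate_hkl(a, b, c, d_min):
    """List all (h, k, l) != (0, 0, 0) with |p| <= 1 / d_min."""
    astar, bstar, cstar = reciprocal_basis(a, b, c)
    limit = 1.0 / (d_min * d_min)
    hmax, kmax, lmax = (int(math.sqrt(_dot(v, v)) / d_min) for v in (a, b, c))
    out = []
    for h in range(-hmax, hmax + 1):
        for k in range(-kmax, kmax + 1):
            for l in range(-lmax, lmax + 1):
                if h == 0 and k == 0 and l == 0:
                    continue
                p = tuple(h * astar[i] + k * bstar[i] + l * cstar[i] for i in range(3))
                if _dot(p, p) <= limit:
                    out.append((h, k, l))
    return out
```

For a 10 Å cubic cell and $d_\mathrm{min} = 4.9$ Å, this accepts the 32 index triples with $h^2+k^2+l^2 \le 4$.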

### 8.2 Still prediction (excitation-error cutoff)

For still images, the diffracting condition is approximated by an excitation-error cutoff:
$
\left|\Delta_\mathrm{Ewald}(\mathbf{p})\right| \le \Delta_\mathrm{cut}.
$
Accepted reflections are projected to the detector by intersecting the diffracted direction $\mathbf{S}=\mathbf{S}_0+\mathbf{p}$ with the detector plane, using the current geometry.

### 8.3 Rotation prediction (Laue equation + partiality model)

For rotation/oscillation datasets, Jungfraujoch solves for rotation angles $\phi$ where the rotated reciprocal lattice point satisfies the Ewald-sphere condition. In an XDS-like notation, define:

- rotation axis unit vector $\mathbf{m}_2$,
- $\mathbf{S}_0$ incident vector,
- $\mathbf{S}(\phi)=\mathbf{S}_0+\mathbf{p}(\phi)$.

A key quantity is:
$
\zeta = \left|\mathbf{m}_2\cdot \mathbf{e}_1\right|,\quad
\mathbf{e}_1 = \frac{\mathbf{S}\times \mathbf{S}_0}{\lVert \mathbf{S}\times \mathbf{S}_0\rVert},
$
which also appears in XDS as the Lorentz component linked to the rotation axis.

A Gaussian mosaicity model yields a partiality fraction over an oscillation width $\Delta\phi$:
$
P(\phi;\sigma_M,\zeta,\Delta\phi) = \frac{1}{2}\left[
\mathrm{erf}\!\left(\frac{\phi+\Delta\phi/2}{\sqrt{2}\,\sigma_M/\zeta}\right)
-
\mathrm{erf}\!\left(\frac{\phi-\Delta\phi/2}{\sqrt{2}\,\sigma_M/\zeta}\right)
\right],
$
with mosaicity $\sigma_M$ in radians.

Reflections are predicted if they meet minimum $\zeta$ and mosaicity-window criteria, and their predicted detector coordinates fall on the active detector area.
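
The partiality expression maps onto the standard error function. A minimal Python sketch, assuming all angles are in radians and using the hypothetical name `rotation_partiality`:

```python
import math

def rotation_partiality(phi, sigma_m, zeta, dphi):
    """Fraction of a reflection recorded within an oscillation of width dphi.

    phi is the angular offset of the reflection from the centre of the image.
    """
    width = math.sqrt(2.0) * sigma_m / zeta
    return 0.5 * (math.erf((phi + dphi / 2.0) / width)
                  - math.erf((phi - dphi / 2.0) / width))
```

For a reflection centred on the image with $\Delta\phi = 2\sigma_M/\zeta$, this gives $\mathrm{erf}(1/\sqrt{2}) \approx 0.683$, the familiar one-sigma fraction of a Gaussian.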

### 8.4 Systematic absences (centering)

Systematic absences are applied at least at the centering level (prior to full space-group symmetry). Depending on the centering symbol, a reflection $(h,k,l)$ is treated as absent as follows:

- $I$: absent if $h+k+l$ odd,
- $A$: absent if $k+l$ odd,
- $B$: absent if $h+l$ odd,
- $C$: absent if $h+k$ odd,
- $F$: absent if any of $h+k, h+l, k+l$ is odd,
- $R$: absent if $(-h+k+l)\bmod 3 \ne 0$,
- $P$: no centering absences.
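
These rules can be expressed as a single predicate. The function name `is_centering_absent` is illustrative; the rules are the ones listed above.

```python
def is_centering_absent(h, k, l, centering):
    """Return True if (h, k, l) is systematically absent for the given centering."""
    if centering == "P":
        return False
    if centering == "I":
        return (h + k + l) % 2 != 0
    if centering == "A":
        return (k + l) % 2 != 0
    if centering == "B":
        return (h + l) % 2 != 0
    if centering == "C":
        return (h + k) % 2 != 0
    if centering == "F":
        return any(n % 2 != 0 for n in (h + k, h + l, k + l))
    if centering == "R":
        return (-h + k + l) % 3 != 0
    raise ValueError(f"unknown centering symbol: {centering!r}")
```

Python's `%` returns a non-negative result for a positive modulus, so negative indices need no special handling.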

---

## 9. 2D summation integration (three-ring method)

Jungfraujoch integrates predicted reflections by **summation** (no profile fitting), using a CrystFEL-inspired “three-circle / three-ring” method in the detector plane.

### 9.1 Regions of interest

For each predicted reflection at $(x_p,y_p)$, define three radii:

- $r_1$: inner signal radius,
- $r_2$: inner background radius,
- $r_3$: outer background radius.

Pixels are classified by their squared distance $r^2=(x-x_p)^2+(y-y_p)^2$:

- **signal region:** $r^2 < r_1^2$,
- **background annulus:** $r_2^2 \le r^2 < r_3^2$.

Invalid pixels (masked/bad/saturated) are excluded from both sums.

### 9.2 Background subtraction and intensity estimate

Let:
- $S = \sum I(x,y)$ over signal pixels,
- $n_S$ = number of valid signal pixels,
- $B = \sum I(x,y)$ over background pixels,
- $n_B$ = number of valid background pixels.

Background per pixel:
$
\hat{b} = \frac{B}{n_B},
$
integrated intensity:
$
\hat{I} = S - n_S \hat{b}.
$

A reflection is accepted as “observed” only if all signal pixels were valid and $n_B$ exceeds a minimum (to avoid unstable background estimates).

### 9.3 Uncertainty model

A Poisson-like estimator is used for the raw summed counts:
$
\sigma(\hat{I}) \approx \sqrt{S},
$
with a minimum $\sigma\ge 1$ to avoid singular weights. (This is a pragmatic online estimate; more elaborate models may be applied downstream.)
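
Sections 9.1 to 9.3 can be combined into one short routine. The sketch below is a simplified illustration: images and masks are nested lists, the default `min_background` value is arbitrary, and the function name `integrate_spot` is hypothetical.

```python
import math

def integrate_spot(image, valid, xp, yp, r1, r2, r3, min_background=10):
    """Three-ring summation; returns (intensity, sigma) or None if rejected."""
    h, w = len(image), len(image[0])
    s_sum, n_s, b_sum, n_b = 0.0, 0, 0.0, 0
    for y in range(max(0, math.floor(yp - r3)), min(h - 1, math.ceil(yp + r3)) + 1):
        for x in range(max(0, math.floor(xp - r3)), min(w - 1, math.ceil(xp + r3)) + 1):
            rr = (x - xp) ** 2 + (y - yp) ** 2
            if rr < r1 * r1:                      # signal region
                if not valid[y][x]:
                    return None                   # every signal pixel must be valid
                s_sum += image[y][x]
                n_s += 1
            elif r2 * r2 <= rr < r3 * r3 and valid[y][x]:
                b_sum += image[y][x]              # background annulus
                n_b += 1
    if n_s == 0 or n_b <= min_background:
        return None                               # background estimate too unstable
    background = b_sum / n_b
    intensity = s_sum - n_s * background
    sigma = max(math.sqrt(max(s_sum, 0.0)), 1.0)  # Poisson-like, floored at 1
    return intensity, sigma
```

For a 7×7 image with a flat background of 2 counts and a centre pixel of 102 counts, using radii 1.5, 2.0 and 3.2, the signal region holds 9 pixels and the annulus 28, giving $\hat{I} = 118 - 9\cdot 2 = 100$.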

### 9.4 Lorentz–polarization factor handling

For integrated reflections, polarization correction can be applied as a multiplicative correction to the reflection scale via the geometry-based polarization term (§2.2). A Lorentz-like factor is carried as `rlp` in predictions, and used during scaling/merging (§10).

---

## 10. Scaling and merging

After per-image integration, Jungfraujoch scales observations and merges them into unique reflections. The design is intentionally compatible with XDS/XSCALE concepts, while supporting both still and rotation partiality models.

### 10.1 Observation model

For an observation $j$ of a unique reflection $h$ on image (or image group) $i$, the predicted measured intensity is modeled as:
$
I_{ij} \approx G_i \, L_{ij}\, P_{ij}\, I_h,
$
where:

- $G_i$ is the image scale factor,
- $L_{ij}$ is a Lorentz-like / geometry factor (stored as `rlp` or derived),
- $P_{ij}$ is a partiality term (model-dependent),
- $I_h$ is the merged (true) intensity parameter for that unique reflection.

A least-squares objective is minimized:
$
\sum_{ij} \left(\frac{I_{ij}^{\mathrm{pred}} - I_{ij}^{\mathrm{obs}}}{\sigma_{ij}}\right)^2
$
with regularization on $G_i$ and optional smoothness constraints (particularly meaningful for rotation series).

### 10.2 Partiality models available

Jungfraujoch supports several partiality choices:

1. **Rotation partiality** (XDS-like; see §8.3):
   $
   P_{ij} = \frac{1}{2}\left[
   \mathrm{erf}\!\left(\frac{\Delta\phi_{ij}+\Delta\phi/2}{\sqrt{2}\,\sigma_{M,i}/\zeta_{ij}}\right)
    -
   \mathrm{erf}\!\left(\frac{\Delta\phi_{ij}-\Delta\phi/2}{\sqrt{2}\,\sigma_{M,i}/\zeta_{ij}}\right)
   \right].
   $
   Mosaicity $\sigma_{M,i}$ can be refined per image group with bounds.

2. **Still partiality** (excitation-error proxy):
   $
   P_{ij} = \exp\!\left(-\frac{\Delta_\mathrm{Ewald}^2}{R_i^2}\right),
   $
   where $R_i^2$ is a refined width parameter (bounded).

3. **Unity**: $P_{ij}=1$.

4. **Fixed**: use the per-reflection partiality carried from prediction.

Reflections below a minimum partiality can be rejected from merging to avoid unstable corrections.

### 10.3 Regularization and smoothness

To stabilize scale determination, a weak prior $G_i\approx 1$ is used. For rotation datasets, optional smoothness encourages slowly varying scales and mosaicity:
$
\log G_{i-1} - 2\log G_i + \log G_{i+1} \approx 0,
$
(and similarly for mosaicity), reflecting the expectation of gradual changes during a rotation scan.

### 10.4 Merging estimator

After refinement, corrected observations are formed:
$
I^{\mathrm{corr}}_{ij} = \frac{I^{\mathrm{obs}}_{ij}}{G_i L_{ij} P_{ij}},\qquad
\sigma^{\mathrm{corr}}_{ij} = \frac{\sigma^{\mathrm{obs}}_{ij}}{G_i L_{ij} P_{ij}}.
$

Unique intensities are merged by inverse-variance weighted mean:
$
I_h = \frac{\sum_j w_j I^{\mathrm{corr}}_{ij}}{\sum_j w_j},\qquad
w_j = \frac{1}{(\sigma^{\mathrm{corr}}_{ij})^2}.
$
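
A compact Python sketch of the correction and merging step for one unique reflection follows. The tuple layout of each observation and the name `merge_observations` are assumptions for the example. The returned uncertainty $1/\sqrt{\sum_j w_j}$ is the standard error of a weighted mean and does not include any internal-consistency inflation.

```python
import math

def merge_observations(observations):
    """Merge observations of one unique reflection.

    Each observation is (i_obs, sigma_obs, g, lorentz, partiality).
    Returns (merged intensity, merged sigma).
    """
    w_sum, wi_sum = 0.0, 0.0
    for i_obs, sigma_obs, g, lorentz, partiality in observations:
        scale = g * lorentz * partiality
        i_corr = i_obs / scale
        sigma_corr = sigma_obs / scale
        w = 1.0 / (sigma_corr * sigma_corr)   # inverse-variance weight
        w_sum += w
        wi_sum += w * i_corr
    return wi_sum / w_sum, 1.0 / math.sqrt(w_sum)
```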

An internal-consistency term can inflate uncertainties when multiple observations are present, in the spirit of XSCALE.

### 10.5 Merging statistics

Per-shell and overall merging statistics are computed on corrected intensities, including:
- number of observations,
- number of unique reflections,
- mean $I/\sigma(I)$,
- an R$_\mathrm{meas}$-like quantity derived from within‑HKL deviations (shell-binned).

Completeness requires enumeration of possible reflections given a unit cell and symmetry; where this is not fully available, completeness may be reported as 0 or omitted.

---

## 11. Mosaicity and “profile radius” monitoring

### 11.1 Profile radius (still excitation error width)

A simple scalar “profile radius” is estimated from indexed spots using the distribution of $\Delta_\mathrm{Ewald}$. Two estimators are available:

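- root-mean-square deviation about zero (equal to the standard deviation when the mean of $\Delta_\mathrm{Ewald}$ is zero):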
  $
  R \approx \sqrt{\frac{1}{N}\sum_i \Delta_{\mathrm{Ewald},i}^2},
  $
- robust MAD-based alternative (median absolute deviation), scaled by 1.4826.

Operationally, predictions for still data may use a cutoff proportional to this width (e.g. $\Delta_\mathrm{cut}\approx 2R$).
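
Both estimators are a few lines each. This Python sketch uses hypothetical function names and assumes a non-empty list of $\Delta_\mathrm{Ewald}$ values.

```python
import math
import statistics

def profile_radius_rms(deltas):
    """Root-mean-square of the Ewald distances (about zero)."""
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))

def profile_radius_mad(deltas):
    """Robust width: 1.4826 times the median absolute deviation from the median."""
    med = statistics.median(deltas)
    mad = statistics.median(abs(d - med) for d in deltas)
    return 1.4826 * mad
```

The factor 1.4826 makes the MAD a consistent estimator of the standard deviation for normally distributed data.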

### 11.2 Mosaicity from rotation data (maximum likelihood)

For rotation data, Jungfraujoch can estimate mosaicity by maximizing a likelihood based on the XDS reflection fraction $R(\tau;\sigma_M/\zeta)$ as described by Kabsch (2010). In brief:

- compute angular deviations $\tau$ from predicted Bragg positions,
- compute $\zeta$ for each reflection,
- maximize $\sum \log R(\tau)$ over $\sigma_M$.

This yields a physically meaningful mosaicity estimate tied to the rotation partiality model.

---

## 12. Wilson statistics and French–Wilson treatment

### 12.1 Per-shell ⟨I/σ(I)⟩

For monitoring integration quality, Jungfraujoch reports mean $\langle I/\sigma(I)\rangle$ in a fixed number of resolution shells. Shelling is performed in $1/d^2$ space (typical of crystallographic practice).

### 12.2 Wilson plot (B-factor proxy)

A Wilson-type analysis is computed by binning intensities by resolution and fitting:
$
\langle I\rangle \propto \exp\!\left(-\frac{B}{2}\frac{1}{d^2}\right),
$
i.e.
$
\log \langle I\rangle = \mathrm{const} - \frac{B}{2}\left(\frac{1}{d^2}\right).
$
A linear regression of $\log\langle I\rangle$ vs $1/d^2$ provides an estimate of $B$, subject to basic quality checks (e.g. $R^2$ threshold).
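
The fit is an ordinary least-squares line through $(1/d^2, \log\langle I\rangle)$, with $B = -2\times\text{slope}$. A minimal sketch, assuming per-shell mean intensities are supplied together with a representative $d$ per shell; non-positive means are skipped, and the name `wilson_b` is illustrative.

```python
import math

def wilson_b(d_values, mean_intensities):
    """Return (B, R^2) from a linear fit of log<I> against 1/d^2."""
    pts = [(1.0 / (d * d), math.log(i))
           for d, i in zip(d_values, mean_intensities) if i > 0]
    if len(pts) < 2:
        raise ValueError("need at least two shells with positive mean intensity")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -2.0 * slope, r2
```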

### 12.3 French–Wilson (posterior expectation of I and |F|)

To mitigate negative intensities and obtain physically meaningful amplitudes, Jungfraujoch implements a French–Wilson style Bayesian treatment using per-shell mean intensity as a prior scale.

For each merged observation $I_\mathrm{obs}$ with uncertainty $\sigma$, the posterior over true intensity $I\ge 0$ is:
$
p(I\mid I_\mathrm{obs}) \propto p(I)\,\exp\!\left(-\frac{(I_\mathrm{obs}-I)^2}{2\sigma^2}\right),
$
with priors differing between acentric and centric cases (standard Wilson distributions).

Numerical quadrature over a scaled intensity variable is used to compute posterior moments:
- $\langle I\rangle$,
- $\langle |F|\rangle = \langle \sqrt{I}\rangle$,
  and an amplitude uncertainty estimate via:
  $
  \sigma_F \approx \sqrt{\langle I\rangle - \langle |F|\rangle^2}.
  $
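
As an illustration of the quadrature, the sketch below treats the acentric case only, with the Wilson prior $p(I) \propto \exp(-I/\Sigma)$ for $I \ge 0$, where $\Sigma$ is the per-shell mean intensity. The midpoint rule, the integration limit of $\max(I_\mathrm{obs},0)+8\sigma$, the step count, and the name `french_wilson_acentric` are all assumptions of this example rather than a description of the Jungfraujoch code.

```python
import math

def french_wilson_acentric(i_obs, sigma, shell_mean, n_steps=4000):
    """Posterior <I>, <|F|> and sigma_F for an acentric reflection (midpoint rule)."""
    upper = max(i_obs, 0.0) + 8.0 * sigma
    step = upper / n_steps
    grid = [(j + 0.5) * step for j in range(n_steps)]
    # Log of prior times Gaussian likelihood, up to an additive constant
    logw = [-i / shell_mean - (i_obs - i) ** 2 / (2.0 * sigma * sigma) for i in grid]
    shift = max(logw)                       # avoid underflow of all weights
    weights = [math.exp(v - shift) for v in logw]
    norm = sum(weights)
    mean_i = sum(w * i for w, i in zip(weights, grid)) / norm
    mean_f = sum(w * math.sqrt(i) for w, i in zip(weights, grid)) / norm
    sigma_f = math.sqrt(max(mean_i - mean_f * mean_f, 0.0))
    return mean_i, mean_f, sigma_f
```

For a strong reflection the posterior mean stays close to the observation (shifted by roughly $-\sigma^2/\Sigma$), while a negative observation yields a small positive $\langle I\rangle$.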

---

## 13. Practical notes and limitations

- **No profile fitting** is currently performed for Bragg integration; all integration is summation-based (§9). This is appropriate for fast feedback and many serial/streaming use cases, but differs from full profile fitting workflows.
- **Space-group symmetry** beyond centering absences is not necessarily enforced during prediction/integration unless the space group is supplied and used downstream.
- **Resolution masking and ice rings** are controllable; including ice-ring spots in indexing can improve robustness for some samples but may bias refinement in others.
- **Rotation vs still modes** differ substantially in prediction and scaling because partiality is angle-driven in rotation data and excitation-error-driven in still data.

---