Jungfraujoch/image_analysis/pixel_refinement/METHODS.md
leonarski_f and Claude Opus 4.8 100fe7b7e7
PixelRefine: make factored Terms 1+2 the model, remove old wiring
PixelRefine is now an intensity-only operation: geometry is fixed (refined
upstream by XtalOptimizer) and the only objective is the factored per-reflection
likelihood (FACTORED_MODEL.md Terms 1+2) - measured per-resolution profile width
R1 plus one Fisher-weighted intensity/scaling residual per reflection, fitting
the per-image scale G and B. Validated on crystal 2 (fixed_master.h5 as stills,
1.7 A): CC1/2 84-92%, CCref 77-92%, flat - reproduces the env-flag prototype and
matches the rotation path from the stills path.

Removed:
- the per-pixel ShoeboxResidual loss and PixelResidual cost functor;
- all in-PixelRefine geometry refinement (orientation/cell/beam/distance/R),
  the regularised-orientation LSQ, signal-weighting, and the global sweep;
- Term 3 (per-spot recentring) - a confirmed no-op on both crystals;
- the diagnostic scaffolding (covariance, centroid, adaptive_R1) and the
  PR_* env knobs + stderr dumps in IndexAndRefine;
- the PredictImage/ChiSquaredImage renderers and the entire viewer
  PixelRefine window/table/params + worker bindings + shoebox overlay.

The sweep box-integrator background median became mean (consistency) by virtue
of removing the sweep. METHODS.md rewritten for the current model; findings
recorded in FINDINGS-2026-06.md. Net -2200 lines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 22:02:18 +02:00

# PixelRefine — methods
`PixelRefine` is the still-image integrator. It integrates the Bragg reflections of one
image by **profile fitting against a reference intensity set** $I^\mathrm{ref}$ (e.g.
`F_calc` from a deposited model, or the current merged estimate in an EM-style outer loop)
and returns already-scaled intensities. It is an **intensity-only** operation: the
geometry (crystal orientation, cell, beam, distance) is taken as fixed, having been refined
upstream by `XtalOptimizer` (`IndexAndRefine::RefineGeometryIfNeeded`), and PixelRefine
only measures the spot shape and fits the per-image scale.
The objective is the factored per-reflection likelihood of `FACTORED_MODEL.md`, **Terms 1
and 2**. This note records the equations and the reasons behind each design choice.
Throughout, a reflection's shoebox is a small box of raw detector pixels $I_p$ with a
local flat background $B$; the area-normalised tangential profile at pixel $p$ is $P_p$,
and $v_p$ is the variance used to weight pixel $p$.
---
## 0. The forward model
The recorded amplitude of a still reflection is the profile-fit amplitude
$$
J \;=\; \frac{\sum_p P_p\,(I_p - B)/v_p}{\sum_p P_p^{2}/v_p},
\qquad
\operatorname{var}(J) = \frac{1}{\sum_p P_p^{2}/v_p}.
$$
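As a minimal sketch of the estimator above (plain Python, not the Jungfraujoch implementation; the function name and the flat-list shoebox layout are assumptions made for illustration), the amplitude and its variance can be computed as:

```python
def profile_fit_amplitude(pixels, profile, background, variances):
    """Return (J, var_J) for one shoebox.

    pixels     -- raw counts I_p
    profile    -- area-normalised profile values P_p
    background -- local flat background B
    variances  -- per-pixel variances v_p (all > 0)
    """
    num = sum(P * (I - background) / v
              for I, P, v in zip(pixels, profile, variances))
    den = sum(P * P / v for P, v in zip(profile, variances))
    return num / den, 1.0 / den


# Toy shoebox: background 2 plus a signal of 10 * P_p, uniform variance.
profile = [0.1, 0.2, 0.4, 0.2, 0.1]
pixels = [2 + 10 * P for P in profile]
J, var_J = profile_fit_amplitude(pixels, profile, 2.0, [2.0] * 5)
# J recovers the injected amplitude of 10.
```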
The full (rotation-equivalent) intensity is recovered by dividing out the factors a still
does not record,
$$
I \;=\; \frac{J}{p\,B_\mathrm{DW}\,\mathrm{pol}},\qquad
p = \exp\!\left(-\frac{\epsilon_r^{2}}{R_{0,\mathrm{eff}}^{2}}\right),\quad
B_\mathrm{DW}=\exp\!\left(-\frac{B_\mathrm{fac}}{4 d^{2}}\right),
$$
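where the **partiality** $p$ (not to be confused with the pixel index $p$) is the fraction
of the mosaic block crossing the Ewald sphere, $\epsilon_r$ is the radial excitation error,
$R_0$ the radial (rocking) width, $B_\mathrm{DW}$ the Debye-Waller attenuation for a
B-factor $B_\mathrm{fac}$ at resolution $d$, and $\mathrm{pol}$ the polarisation
correction. The tangential profile is a separable,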
area-normalised Gaussian of width $R_1$:
$$
P_p = \frac{1}{\pi R_1^{2}}\exp\!\left(-\frac{\epsilon_{t,p}^{2}}{R_1^{2}}\right).
$$
A finite **X-ray bandwidth** thickens the Ewald shell radially, adding a fixed,
resolution-dependent term to the radial width, $R_{0,\mathrm{eff}}^2 = R_0^2 +
(b\lambda)^2/(2d^4)$ ($b$ = relative bandwidth; the pink-beam / DMM signature). $b=0$ is a
monochromatic no-op.
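The correction chain of this section can be sketched as follows. This is an illustrative Python transcription of the formulas only; the function names and argument order are assumptions, and units are taken to be Å for $d$ and $\lambda$ and Å$^{-1}$ for $\epsilon_r$ and $R_0$.

```python
import math


def partiality(eps_r, R0, d, wavelength, bandwidth=0.0):
    """p = exp(-eps_r^2 / R0_eff^2), with the bandwidth term added to R0^2."""
    R0_eff_sq = R0 ** 2 + (bandwidth * wavelength) ** 2 / (2.0 * d ** 4)
    return math.exp(-eps_r ** 2 / R0_eff_sq)


def full_intensity(J, eps_r, R0, d, wavelength, B_fac, pol, bandwidth=0.0):
    """Divide the still amplitude J by partiality, Debye-Waller and polarisation."""
    p = partiality(eps_r, R0, d, wavelength, bandwidth)
    B_dw = math.exp(-B_fac / (4.0 * d ** 2))
    return J / (p * B_dw * pol)


# A reflection one rocking width off the sphere: a 1 % bandwidth thickens
# the shell, so the partiality is larger than in the monochromatic case.
p_mono = partiality(1e-3, 1e-3, d=2.0, wavelength=1.0)
p_pink = partiality(1e-3, 1e-3, d=2.0, wavelength=1.0, bandwidth=0.01)
```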
---
## 1. De-biased variance (the load-bearing fix)
**Symptom.** Mean intensities went **negative** in the high-resolution shells
($\langle I/\sigma\rangle$ down to $-12$), and the per-image scale $G$ collapsed to $0$ on
most images, dropping ~80 % of observations.
**Cause.** The extraction weighted each pixel by its **observed** count, $v_p = I_p$. A
down-fluctuated background pixel ($I_p < B$) then gets a small $v_p$, hence a large
$w_p=1/v_p$, and its contribution $P_p(I_p-B)/v_p < 0$ is large in magnitude. Summed over
the many empty shoebox pixels this drags $J$ below zero — the *inverse-observed-count*
(Poisson-on-data) bias, worst where the true signal is weakest (high resolution).
**Fix.** For background-limited reflections the correct variance is the local background,
**constant over the shoebox**, $v_p = \max(B,1)$, giving the unbiased uniform-variance
estimator $J = \sum_p P_p (I_p-B)/\sum_p P_p^2$. This turned $\langle I/\sigma\rangle$
positive at all resolutions and stopped the scale collapse.
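The bias can be reproduced on synthetic data. The sketch below is a toy simulation, not the project's code: it fills empty shoeboxes with Poisson background and compares the two weightings. The box size, $B = 5$, $R_1 = 1.5$ px and the floor of 1 on the observed-count variance (needed to avoid dividing by zero) are all assumptions chosen for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def gaussian_profile(half, R1):
    """Area-normalised profile on a (2*half+1)^2 pixel grid."""
    return [math.exp(-(x * x + y * y) / R1 ** 2) / (math.pi * R1 ** 2)
            for y in range(-half, half + 1) for x in range(-half, half + 1)]


def amplitude(pixels, profile, B, variances):
    num = sum(P * (I - B) / v for I, P, v in zip(pixels, profile, variances))
    den = sum(P * P / v for P, v in zip(profile, variances))
    return num / den


rng = random.Random(0)
B = 5.0
profile = gaussian_profile(5, 1.5)
biased, debiased = [], []
for _ in range(1000):
    # Empty shoebox: background only, so the true amplitude is zero.
    pixels = [poisson(rng, B) for _ in profile]
    biased.append(amplitude(pixels, profile, B, [max(I, 1) for I in pixels]))
    debiased.append(amplitude(pixels, profile, B, [max(B, 1.0)] * len(pixels)))

mean_biased = sum(biased) / len(biased)        # clearly negative
mean_debiased = sum(debiased) / len(debiased)  # scatters around zero
```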
---
## 2. Prediction band and multiplicity
A reflection is given a shoebox only when it lies within a radial band of the Ewald
sphere, $\bigl|\,|S_{hkl}| - 1/\lambda\,\bigr| \le \delta$. For randomly oriented stills
the number of images on which a given $hkl$ qualifies is $\propto \delta$. The original
$\delta = 5\times10^{-4}\,\text{Å}^{-1}$ was 4-6× tighter than a box integrator, giving 4×
fewer observations per reflection. Widening to $\delta = 2\times10^{-3}\,\text{Å}^{-1}$
(`ewald_dist_cutoff`) restores the multiplicity; the partiality $p$ downweights the
slightly-off-Ewald tails it admits. (Widening is only safe with the de-biased variance of
§1 and the factored objective of §§3-4; with a per-pixel geometry fit it diverged.)
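The band test itself is a one-liner. In this sketch $S_{hkl}$ is taken to be the vector from the centre of the Ewald sphere to the reciprocal-lattice point (an assumption about the convention; the function name is also hypothetical), so the test compares its length with the sphere radius $1/\lambda$.

```python
import math


def in_prediction_band(S, wavelength, delta=2e-3):
    """True if | |S| - 1/lambda | <= delta (S and delta in Å^-1, lambda in Å)."""
    length = math.sqrt(sum(c * c for c in S))
    return abs(length - 1.0 / wavelength) <= delta


# A point 1e-3 Å^-1 off the sphere is admitted by the widened band
# but rejected by the original, tighter one.
S = (0.0, 0.0, 1.001)
wide = in_prediction_band(S, wavelength=1.0)
tight = in_prediction_band(S, wavelength=1.0, delta=5e-4)
```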
---
## 3. Term 2 — measured per-resolution profile width $R_1$
$R_1$ is **measured, not fitted**. Fitting $R_1$ inside a per-image least squares is
degenerate with the scale $G$ (a narrower profile and a larger scale trade off), and that
degeneracy slides the per-image scale and wrecks the merge. But a **second moment** is
normalised by the total intensity, so it carries shape information *decoupled from scale*:
$$
R_1^2 = 2\,\langle \epsilon_t^2\rangle,
\qquad
\langle \epsilon_t^2\rangle = \frac{\sum_p (I_p-B)\,\epsilon_{t,p}^2}{\sum_p (I_p-B)} .
$$
We bin the strong spots ($\mathrm{signif}\ge 5$) by resolution ($1/d^2$, 6 bins) and take
the **median** $\langle\epsilon_t^2\rangle$ per bin, so each reflection integrates with the
$R_1$ of its resolution shell (low-res spots are tight; high-res anisotropic streaks are
wider). Weak spots fall back to the global $R_1$. Measuring the width rather than fitting
it is what makes profile-width refinement stable — and it is a selling point: the mask
adapts to the data per shell.
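A sketch of the measurement, under stated assumptions: the helper names are hypothetical, the bins are taken to be equal-width in $1/d^2$, the significance cut is assumed to have been applied already, and a negative median (possible only from noise) is clamped to zero. None of these details are confirmed by the source.

```python
import math
import statistics


def second_moment(pixels, eps_t_sq, B):
    """Intensity-weighted mean of eps_t^2 over one shoebox."""
    signal = [I - B for I in pixels]
    return sum(s * e for s, e in zip(signal, eps_t_sq)) / sum(signal)


def r1_per_shell(strong_spots, n_bins=6):
    """Median-based R1 per resolution bin.

    strong_spots -- list of (1/d^2, <eps_t^2>) pairs for strong spots.
    Returns one R1 per bin, or None for an empty bin.
    """
    lo = min(x for x, _ in strong_spots)
    hi = max(x for x, _ in strong_spots)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal 1/d^2
    bins = [[] for _ in range(n_bins)]
    for x, moment in strong_spots:
        bins[min(int((x - lo) / width), n_bins - 1)].append(moment)
    # R1^2 = 2 <eps_t^2>, using the per-bin median of the moments.
    return [math.sqrt(2.0 * max(statistics.median(b), 0.0)) if b else None
            for b in bins]


moment = second_moment([5.0, 3.0, 5.0], [1.0, 0.0, 1.0], B=1.0)
shells = r1_per_shell([(0.0, 0.5), (0.0, 0.5), (1.0, 2.0)], n_bins=2)
```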
---
## 4. Term 1 — the intensity / scaling residual
The per-pixel least squares is replaced by **one residual per reflection**: the profile-fit
amplitude $J$ (using the Term-2 $R_1$) should equal the scaled reference,
$$
r_h = \frac{J_h - G\,B_\mathrm{DW}\,p_h\,\mathrm{pol}_h\,I^\mathrm{ref}_h}{\sigma_{J,h}},
\qquad
L = \sum_h r_h^2 + \text{(scale prior)} .
$$
Only the per-image scale $G$ and the Debye-Waller factor $B_\mathrm{fac}$ are optimised;
geometry and $R$ are fixed. Three consequences:
* **Integration and scaling become one objective.** $J$ *is* the integrated intensity and
the residual *is* the scaling residual.
* **The empty-pixel problem disappears by construction.** Empty pixels enter only through
$J$ (with ~zero profile weight); they make no residual of their own and cannot dominate.
* **Fisher weighting puts the reference at maximum leverage.** $\sigma_J$ uses the
*model-expected* variance $v_p = B + \max(J,0)\,P_p$ (background plus expected signal from
$I^\mathrm{ref}$), not the observed counts — so a strong *expected* reflection observed
absent is penalised, and a noise spike with low $I^\mathrm{ref}$ gets no weight.
The scale is regularised towards 1 with a data-scaled weight $w_G=\sqrt{N_\mathrm{refl}/
\sigma_G}$ (mirrors `ScaleOnTheFly`) so weakly-measured images cannot drift and scramble
the merge.
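The objective of this section can be written down compactly. The sketch below evaluates $L$ for given $G$ and $B_\mathrm{fac}$; it is illustrative Python, not the project's solver. The `Reflection` container is hypothetical, and the form of the prior, a single extra residual $w_G\,(G-1)$, is an assumption, since the text only states that the scale is regularised towards 1 with weight $w_G$.

```python
import math
from typing import NamedTuple


class Reflection(NamedTuple):
    J: float        # profile-fit amplitude
    sigma_J: float  # model-expected standard deviation of J
    p: float        # partiality
    pol: float      # polarisation correction
    d: float        # resolution in Å
    I_ref: float    # reference intensity


def term1_loss(reflections, G, B_fac, sigma_G=2.0):
    """Sum of squared per-reflection residuals plus a scale prior."""
    loss = 0.0
    for r in reflections:
        B_dw = math.exp(-B_fac / (4.0 * r.d ** 2))
        model = G * B_dw * r.p * r.pol * r.I_ref
        loss += ((r.J - model) / r.sigma_J) ** 2
    # ASSUMPTION: the prior enters as one extra residual w_G * (G - 1).
    w_G = math.sqrt(len(reflections) / sigma_G)
    return loss + (w_G * (G - 1.0)) ** 2


# Two reflections whose amplitudes match the model exactly at G = 1, B_fac = 0.
refl = [Reflection(50.0, 5.0, 0.5, 1.0, 2.0, 100.0),
        Reflection(8.0, 2.0, 0.8, 1.0, 3.0, 10.0)]
loss_at_truth = term1_loss(refl, G=1.0, B_fac=0.0)
loss_off = term1_loss(refl, G=1.2, B_fac=0.0)
```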
**Geometry is not refined here.** PixelRefine's earlier per-image geometry refinement
(regularised orientation LSQ, signal-weighting, a global orientation/cell sweep) was
removed: on true stills the predictions are already good (radial centroid error ≈ 0,
tangential ≈ a 0.4 px sampling floor that is *not* a recoverable misprediction), and per-image
geometry refinement only overfit the sparse signal. Geometry is the job of `XtalOptimizer`.
---
## 5. Background estimator — mean, not median (both integrators)
**Symptom.** Pushed past the true resolution limit, the no-signal shells reported
$\langle I/\sigma\rangle\approx 4\text{–}6$ at $\mathrm{CC}_{1/2}\approx0$: confident "data"
where there is none. Present in *both* PixelRefine and the classical `BraggIntegrate2D`.
**Cause.** Both used the **median** of the background ring. For a right-skewed (Poisson)
background $\operatorname{median}(B) < \mathbb{E}[B]$, so subtraction under-subtracts by a
tiny but *coherent* per-pixel offset that grows over an $n_\mathrm{pix}$ peak and a
multiplicity-$m$ merge into a fake $\langle I/\sigma\rangle\propto\sqrt{m}$, worst where the
real signal is weakest.
**Fix.** Use the **mean** of the ring (spot cores and saturation sentinels already
excluded). $\langle I/\sigma\rangle$ then collapses to ~0 wherever $\mathrm{CC}\approx0$ and
the honest resolution limit becomes visible. This was the single largest contributor to
untrustworthy σ — a one-line change in each integrator.
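The size of the offset is easy to check numerically. The toy simulation below draws Poisson background rings and compares the two estimators; the rate of 2.3 counts per pixel, the ring size and the number of rings are arbitrary illustrative choices, not values from the source. At this low count rate the median sits about 0.3 counts below the mean; because counts are integers, the exact offset depends on the rate.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(1)
lam = 2.3  # true background per pixel (illustrative)
rings = [[poisson(rng, lam) for _ in range(200)] for _ in range(300)]

mean_of_means = statistics.mean([statistics.mean(r) for r in rings])
mean_of_medians = statistics.mean([statistics.median(r) for r in rings])

# Positive offset: the median estimator leaves this much background
# un-subtracted in every pixel of the peak.
offset = mean_of_means - mean_of_medians
```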
---
## 6. Error model (global $a,b$; XDS form)
Counting statistics under-estimate the variance of strong reflections, which carry
systematic errors proportional to $I$, not $\sqrt{I}$. The standard correction inflates the
variance with a **global** two-parameter model, applied at the merge level so both
integrators benefit:
$$
\sigma'^{\,2} = a\,\sigma^{2} + (b\,\langle I\rangle)^{2},
\qquad
\mathrm{ISa} = \frac{1}{b} = \lim_{I\to\infty}\frac{I}{\sigma'} .
$$
The $I^2$ term uses the reflection **mean** $\langle I\rangle$ (not the per-observation
$I_i$, which would bias the merged mean and collapse CC); $a,b$ are fit from the spread of
symmetry equivalents with a relative ($1/\mathrm{dev}^4$) weight so the strong bins (which
fix $b$) do not swamp the weak bins (which fix $a$). `jfjoch_process` prints the model and
ISa.
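Applying the model is a direct transcription of the formula. The sketch below covers only the inflation step and the ISa limit, not the fit of $a$ and $b$; the function names are hypothetical and the values $a = 4$, $b = 0.05$ are made up for the example.

```python
import math


def inflate_sigma(sigma, mean_I, a, b):
    """sigma' = sqrt(a * sigma^2 + (b * <I>)^2), with <I> the reflection mean."""
    return math.sqrt(a * sigma ** 2 + (b * mean_I) ** 2)


def isa(b):
    """Asymptotic I/sigma' of an infinitely strong reflection."""
    return 1.0 / b


a, b = 4.0, 0.05

# Weak limit: only the a-term matters, so sigma is scaled by sqrt(a).
weak = inflate_sigma(3.0, 0.0, a, b)

# Strong limit with counting statistics (sigma = sqrt(I)): I/sigma' tends to 1/b.
I = 1e12
strong_ratio = I / inflate_sigma(math.sqrt(I), I, a, b)
```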
---
## Results (lysozyme rotation crystal `fixed_master.h5`, treated as stills, 1.7 Å)
| Configuration | N_obs | $\langle I/\sigma\rangle$ | CC$_{1/2}$ | CC$_\mathrm{ref}$ |
|---|---:|---:|---:|---:|
| Baseline per-pixel loss | 799 k | 7.2 | erratic, →0 at 1.7 Å | erratic |
| **Factored Terms 1+2 (this model)** | 1.22 M | 10.7 | **84-92 %, flat** | **77-92 %, flat** |
The factored objective turns the erratic, high-res-collapsing per-pixel result into flat
~90 % CC$_{1/2}$/CC$_\mathrm{ref}$ to 1.7 Å — matching the proper rotation integration path
from the *stills* path. (See `FINDINGS-2026-06.md`.)
---
## Default recipe
§§1-4 are PixelRefine-specific; §§5-6 act at the integration/merge level and apply to the
classical route too.
| Field / behaviour | Default | Section |
|---|---|---|
| fit/extraction variance | local background $B$ | 1 |
| `ewald_dist_cutoff` | $2\times10^{-3}\,\text{Å}^{-1}$ | 2 |
| tangential width $R_1$ | measured per resolution shell | 3 |
| objective | per-reflection intensity residual, Fisher-weighted | 4 |
| refined parameters | per-image $G$ (and $B_\mathrm{fac}$); geometry fixed | 4 |
| `scale_reg_sigma` | 2.0 | 4 |
| local background estimator | **mean** of the ring | 5 |
| merge error model | global $a,b$ (ISa printed) | 6 |
`ewald_dist_cutoff` (multiplicity vs. cost) and `bandwidth` (Si vs. DMM) are the two knobs
worth setting per dataset.