Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 25m0s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 26m42s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 27m7s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 28m25s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 29m44s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 32m14s
Build Packages / build:rpm (rocky8) (push) Successful in 24m39s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 23m52s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 25m19s
Build Packages / Generate python client (push) Successful in 23s
Build Packages / XDS test (durin plugin) (push) Successful in 20m42s
Build Packages / Create release (push) Skipped
Build Packages / build:rpm (rocky9) (push) Successful in 27m2s
Build Packages / Build documentation (push) Successful in 1m23s
Build Packages / DIALS test (push) Successful in 31m5s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 14m55s
Build Packages / XDS test (neggia plugin) (push) Successful in 13m7s
Build Packages / Unit tests (push) Successful in 2h14m40s
PixelRefine is now an intensity-only operation: geometry is fixed (refined upstream by XtalOptimizer) and the only objective is the factored per-reflection likelihood (FACTORED_MODEL.md Terms 1+2) - measured per-resolution profile width R1 plus one Fisher-weighted intensity/scaling residual per reflection, fitting the per-image scale G and B. Validated on crystal 2 (fixed_master.h5 as stills, 1.7 A): CC1/2 84-92%, CCref 77-92%, flat - reproduces the env-flag prototype and matches the rotation path from the stills path. Removed: - the per-pixel ShoeboxResidual loss and PixelResidual cost functor; - all in-PixelRefine geometry refinement (orientation/cell/beam/distance/R), the regularised-orientation LSQ, signal-weighting, and the global sweep; - Term 3 (per-spot recentring) - a confirmed no-op on both crystals; - the diagnostic scaffolding (covariance, centroid, adaptive_R1) and the PR_* env knobs + stderr dumps in IndexAndRefine; - the PredictImage/ChiSquaredImage renderers and the entire viewer PixelRefine window/table/params + worker bindings + shoebox overlay. The sweep box-integrator background median became mean (consistency) by virtue of removing the sweep. METHODS.md rewritten for the current model; findings recorded in FINDINGS-2026-06.md. Net -2200 lines. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
215 lines
9.9 KiB
Markdown
215 lines
9.9 KiB
Markdown
# PixelRefine — methods
|
||
|
||
`PixelRefine` is the still-image integrator. It integrates the Bragg reflections of one
|
||
image by **profile fitting against a reference intensity set** $I^\mathrm{ref}$ (e.g.
|
||
`F_calc` from a deposited model, or the current merged estimate in an EM-style outer loop)
|
||
and returns already-scaled intensities. It is an **intensity-wise** operation: the
|
||
detector geometry (orientation, cell, beam, distance) is taken as fixed — it was refined
|
||
upstream by `XtalOptimizer` (`IndexAndRefine::RefineGeometryIfNeeded`) — and PixelRefine
|
||
only measures the spot shape and fits the per-image scale.
|
||
|
||
The objective is the factored per-reflection likelihood of `FACTORED_MODEL.md`, **Terms 1
|
||
and 2**. This note records the equations and the reasons behind each design choice.
|
||
|
||
Throughout, a reflection's shoebox is a small box of raw detector pixels $I_p$ with a
|
||
local flat background $B$; the area-normalised tangential profile at pixel $p$ is $P_p$,
|
||
and $v_p$ is the variance used to weight pixel $p$.
|
||
|
||
---
|
||
|
||
## 0. The forward model
|
||
|
||
The recorded amplitude of a still reflection is the profile-fit amplitude
|
||
|
||
$$
|
||
J \;=\; \frac{\sum_p P_p\,(I_p - B)/v_p}{\sum_p P_p^{2}/v_p},
|
||
\qquad
|
||
\operatorname{var}(J) = \frac{1}{\sum_p P_p^{2}/v_p}.
|
||
$$
|
||
|
||
The full (rotation-equivalent) intensity is recovered by dividing out the factors a still
|
||
does not record,
|
||
|
||
$$
|
||
I \;=\; \frac{J}{p\,B_\mathrm{DW}\,\mathrm{pol}},\qquad
|
||
p = \exp\!\left(-\frac{\epsilon_r^{2}}{R_{0,\mathrm{eff}}^{2}}\right),\quad
|
||
B_\mathrm{DW}=\exp\!\left(-\frac{B_\mathrm{fac}}{4 d^{2}}\right),
|
||
$$
|
||
|
||
where the **partiality** $p$ is the fraction of the mosaic block crossing the Ewald
|
||
sphere, $\epsilon_r$ is the radial excitation error, $R_0$ the radial (rocking) width, and
|
||
$\mathrm{pol}$ the polarisation correction. The tangential profile is a separable,
|
||
area-normalised Gaussian of width $R_1$:
|
||
|
||
$$
|
||
P_p = \frac{1}{\pi R_1^{2}}\exp\!\left(-\frac{\epsilon_{t,p}^{2}}{R_1^{2}}\right).
|
||
$$
|
||
|
||
A finite **X-ray bandwidth** thickens the Ewald shell radially, adding a fixed,
|
||
resolution-dependent term to the radial width, $R_{0,\mathrm{eff}}^2 = R_0^2 +
|
||
(b\lambda)^2/(2d^4)$ ($b$ = relative bandwidth; the pink-beam / DMM signature). $b=0$ is a
|
||
monochromatic no-op.
|
||
|
||
---
|
||
|
||
## 1. De-biased variance (the load-bearing fix)
|
||
|
||
**Symptom.** Mean intensities went **negative** in the high-resolution shells
|
||
($\langle I/\sigma\rangle$ down to $-12$), and the per-image scale $G$ collapsed to $0$ on
|
||
most images, dropping ~80 % of observations.
|
||
|
||
**Cause.** The extraction weighted each pixel by its **observed** count, $v_p = I_p$. A
|
||
down-fluctuated background pixel ($I_p < B$) then gets a small $v_p$, hence a large
|
||
$w_p=1/v_p$, and its contribution $P_p(I_p-B)/v_p < 0$ is large in magnitude. Summed over
|
||
the many empty shoebox pixels this drags $J$ below zero — the *inverse-observed-count*
|
||
(Poisson-on-data) bias, worst where the true signal is weakest (high resolution).
|
||
|
||
**Fix.** For background-limited reflections the correct variance is the local background,
|
||
**constant over the shoebox**, $v_p = \max(B,1)$, giving the unbiased uniform-variance
|
||
estimator $J = \sum_p P_p (I_p-B)/\sum_p P_p^2$. This turned $\langle I/\sigma\rangle$
|
||
positive at all resolutions and stopped the scale collapse.
|
||
|
||
---
|
||
|
||
## 2. Prediction band and multiplicity
|
||
|
||
A reflection is given a shoebox only when it lies within a radial band of the Ewald
|
||
sphere, $\bigl|\,|S_{hkl}| - 1/\lambda\,\bigr| \le \delta$. For randomly oriented stills
|
||
the number of images on which a given $hkl$ qualifies is $\propto \delta$. The original
|
||
$\delta = 5\times10^{-4}\,\text{Å}^{-1}$ was 4–6× tighter than a box integrator, giving 4×
|
||
fewer observations per reflection. Widening to $\delta = 2\times10^{-3}\,\text{Å}^{-1}$
|
||
(`ewald_dist_cutoff`) restores the multiplicity; the partiality $p$ downweights the
|
||
slightly-off-Ewald tails it admits. (Widening is only safe with the de-biased variance of
|
||
§1 and the factored objective of §§3–4 — with a per-pixel geometry fit it diverged.)
|
||
|
||
---
|
||
|
||
## 3. Term 2 — measured per-resolution profile width $R_1$
|
||
|
||
$R_1$ is **measured, not fitted**. Fitting $R_1$ inside a per-image least squares is
|
||
degenerate with the scale $G$ (a narrower profile and a larger scale trade off), and that
|
||
degeneracy slides the per-image scale and wrecks the merge. But a **second moment** is
|
||
normalised by the total intensity, so it carries shape information *decoupled from scale*:
|
||
|
||
$$
|
||
R_1^2 = 2\,\langle \epsilon_t^2\rangle,
|
||
\qquad
|
||
\langle \epsilon_t^2\rangle = \frac{\sum_p (I_p-B)\,\epsilon_{t,p}^2}{\sum_p (I_p-B)} .
|
||
$$
|
||
|
||
We bin the strong spots ($\mathrm{signif}\ge 5$) by resolution ($1/d^2$, 6 bins) and take
|
||
the **median** $\langle\epsilon_t^2\rangle$ per bin, so each reflection integrates with the
|
||
$R_1$ of its resolution shell (low-res spots are tight; high-res anisotropic streaks are
|
||
wider). Weak spots fall back to the global $R_1$. Measuring the width rather than fitting
|
||
it is what makes profile-width refinement stable — and it is a selling point: the mask
|
||
adapts to the data per shell.
|
||
|
||
---
|
||
|
||
## 4. Term 1 — the intensity / scaling residual
|
||
|
||
The per-pixel least squares is replaced by **one residual per reflection**: the profile-fit
|
||
amplitude $J$ (using the Term-2 $R_1$) should equal the scaled reference,
|
||
|
||
$$
|
||
r_h = \frac{J_h - G\,B_\mathrm{DW}\,p_h\,\mathrm{pol}_h\,I^\mathrm{ref}_h}{\sigma_{J,h}},
|
||
\qquad
|
||
L = \sum_h r_h^2 + \text{(scale prior)} .
|
||
$$
|
||
|
||
Only the per-image scale $G$ and Debye–Waller $B$ are optimised; geometry and $R$ are
|
||
fixed. Three consequences:
|
||
|
||
* **Integration and scaling become one objective.** $J$ *is* the integrated intensity and
|
||
the residual *is* the scaling residual.
|
||
* **The empty-pixel problem disappears by construction.** Empty pixels enter only through
|
||
$J$ (with ~zero profile weight); they make no residual of their own and cannot dominate.
|
||
* **Fisher weighting puts the reference at maximum leverage.** $\sigma_J$ uses the
|
||
*model-expected* variance $v_p = B + \max(J,0)\,P_p$ (background plus expected signal from
|
||
$I^\mathrm{ref}$), not the observed counts — so a strong *expected* reflection observed
|
||
absent is penalised, and a noise spike with low $I^\mathrm{ref}$ gets no weight.
|
||
|
||
The scale is regularised towards 1 with a data-scaled weight $w_G=\sqrt{N_\mathrm{refl}/
|
||
\sigma_G}$ (mirrors `ScaleOnTheFly`) so weakly-measured images cannot drift and scramble
|
||
the merge.
|
||
|
||
**Geometry is not refined here.** PixelRefine's earlier per-image geometry refinement
|
||
(regularised orientation LSQ, signal-weighting, a global orientation/cell sweep) was
|
||
removed: on true stills the predictions are already good (radial centroid error ≈ 0,
|
||
tangential ≈ a 0.4 px sampling floor that is *not* a recoverable misprediction), and per-image
|
||
geometry refinement only overfit the sparse signal. Geometry is the job of `XtalOptimizer`.
|
||
|
||
---
|
||
|
||
## 5. Background estimator — mean, not median (both integrators)
|
||
|
||
**Symptom.** Pushed past the true resolution limit, the no-signal shells reported
|
||
$\langle I/\sigma\rangle\approx4\text{–}6$ at $\mathrm{CC}_{1/2}\approx0$ — confident "data"
|
||
where there is none. Present in *both* PixelRefine and the classical `BraggIntegrate2D`.
|
||
|
||
**Cause.** Both used the **median** of the background ring. For a right-skewed (Poisson)
|
||
background $\operatorname{median}(B) < \mathbb{E}[B]$, so subtraction under-subtracts by a
|
||
tiny but *coherent* per-pixel offset that grows over an $n_\mathrm{pix}$ peak and a
|
||
multiplicity-$m$ merge into a fake $\langle I/\sigma\rangle\propto\sqrt{m}$, worst where the
|
||
real signal is weakest.
|
||
|
||
**Fix.** Use the **mean** of the ring (spot cores and saturation sentinels already
|
||
excluded). $\langle I/\sigma\rangle$ then collapses to ~0 wherever $\mathrm{CC}\approx0$ and
|
||
the honest resolution limit becomes visible. This was the single largest contributor to
|
||
untrustworthy σ — a one-line change in each integrator.
|
||
|
||
---
|
||
|
||
## 6. Error model (global $a,b$; XDS form)
|
||
|
||
Counting statistics under-estimate the variance of strong reflections, which carry
|
||
systematic errors proportional to $I$, not $\sqrt{I}$. The standard correction inflates the
|
||
variance with a **global** two-parameter model, applied at the merge level so both
|
||
integrators benefit:
|
||
|
||
$$
|
||
\sigma'^{\,2} = a\,\sigma^{2} + (b\,\langle I\rangle)^{2},
|
||
\qquad
|
||
\mathrm{ISa} = \frac{1}{b} = \lim_{I\to\infty}\frac{I}{\sigma'} .
|
||
$$
|
||
|
||
The $I^2$ term uses the reflection **mean** $\langle I\rangle$ (not the per-observation
|
||
$I_i$, which would bias the merged mean and collapse CC); $a,b$ are fit from the spread of
|
||
symmetry equivalents with a relative ($1/\mathrm{dev}^4$) weight so the strong bins (which
|
||
fix $b$) do not swamp the weak bins (which fix $a$). `jfjoch_process` prints the model and
|
||
ISa.
|
||
|
||
---
|
||
|
||
## Results (lysozyme rotation crystal `fixed_master.h5`, treated as stills, 1.7 Å)
|
||
|
||
| Configuration | N_obs | $\langle I/\sigma\rangle$ | CC$_{1/2}$ | CC$_\mathrm{ref}$ |
|
||
|---|---:|---:|---:|---:|
|
||
| Baseline per-pixel loss | 799 k | 7.2 | erratic, →0 at 1.7 Å | erratic |
|
||
| **Factored Terms 1+2 (this model)** | 1.22 M | 10.7 | **84–92 % flat** | **77–92 % flat** |
|
||
|
||
The factored objective turns the erratic, high-res-collapsing per-pixel result into flat
|
||
~90 % CC$_{1/2}$/CC$_\mathrm{ref}$ to 1.7 Å — matching the proper rotation integration path
|
||
from the *stills* path. (See `FINDINGS-2026-06.md`.)
|
||
|
||
---
|
||
|
||
## Default recipe
|
||
|
||
§§1–4 are PixelRefine-specific; §§5–6 act at the integration/merge level and apply to the
|
||
classical route too.
|
||
|
||
| Field / behaviour | Default | Section |
|
||
|---|---|---|
|
||
| fit/extraction variance | local background $B$ | 1 |
|
||
| `ewald_dist_cutoff` | $2\times10^{-3}\,\text{Å}^{-1}$ | 2 |
|
||
| tangential width $R_1$ | measured per resolution shell | 3 |
|
||
| objective | per-reflection intensity residual, Fisher-weighted | 4 |
|
||
| refined parameters | per-image $G$ (and $B$); geometry fixed | 4 |
|
||
| `scale_reg_sigma` | 2.0 | 4 |
|
||
| local background estimator | **mean** of the ring | 5 |
|
||
| merge error model | global $a,b$ (ISa printed) | 6 |
|
||
|
||
`ewald_dist_cutoff` (multiplicity vs. cost) and `bandwidth` (Si vs. DMM) are the two knobs
|
||
worth setting per dataset.
|