PixelRefine is now an intensity-only operation: geometry is fixed (refined upstream by XtalOptimizer) and the only objective is the factored per-reflection likelihood (FACTORED_MODEL.md Terms 1+2) - measured per-resolution profile width R1 plus one Fisher-weighted intensity/scaling residual per reflection, fitting the per-image scale G and B. Validated on crystal 2 (fixed_master.h5 as stills, 1.7 A): CC1/2 84-92%, CCref 77-92%, flat - reproduces the env-flag prototype and matches the rotation path from the stills path. Removed: - the per-pixel ShoeboxResidual loss and PixelResidual cost functor; - all in-PixelRefine geometry refinement (orientation/cell/beam/distance/R), the regularised-orientation LSQ, signal-weighting, and the global sweep; - Term 3 (per-spot recentring) - a confirmed no-op on both crystals; - the diagnostic scaffolding (covariance, centroid, adaptive_R1) and the PR_* env knobs + stderr dumps in IndexAndRefine; - the PredictImage/ChiSquaredImage renderers and the entire viewer PixelRefine window/table/params + worker bindings + shoebox overlay. The sweep box-integrator background median became mean (consistency) by virtue of removing the sweep. METHODS.md rewritten for the current model; findings recorded in FINDINGS-2026-06.md. Net -2200 lines. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
9.9 KiB
PixelRefine — methods
PixelRefine is the still-image integrator. It integrates the Bragg reflections of one
image by profile fitting against a reference intensity set I^\mathrm{ref} (e.g.
F_calc from a deposited model, or the current merged estimate in an EM-style outer loop)
and returns already-scaled intensities. It is an intensity-wise operation: the
detector geometry (orientation, cell, beam, distance) is taken as fixed — it was refined
upstream by XtalOptimizer (IndexAndRefine::RefineGeometryIfNeeded) — and PixelRefine
only measures the spot shape and fits the per-image scale.
The objective is the factored per-reflection likelihood of FACTORED_MODEL.md, Terms 1
and 2. This note records the equations and the reasons behind each design choice.
Throughout, a reflection's shoebox is a small box of raw detector pixels I_p with a
local flat background B; the area-normalised tangential profile at pixel p is P_p,
and v_p is the variance used to weight pixel p.
0. The forward model
The recorded amplitude of a still reflection is the profile-fit amplitude
J \;=\; \frac{\sum_p P_p\,(I_p - B)/v_p}{\sum_p P_p^{2}/v_p},
\qquad
\operatorname{var}(J) = \frac{1}{\sum_p P_p^{2}/v_p}.
The full (rotation-equivalent) intensity is recovered by dividing out the factors a still does not record,
I \;=\; \frac{J}{p\,B_\mathrm{DW}\,\mathrm{pol}},\qquad
p = \exp\!\left(-\frac{\epsilon_r^{2}}{R_{0,\mathrm{eff}}^{2}}\right),\quad
B_\mathrm{DW}=\exp\!\left(-\frac{B_\mathrm{fac}}{4 d^{2}}\right),
where the partiality p is the fraction of the mosaic block crossing the Ewald
sphere, \epsilon_r is the radial excitation error, R_0 the radial (rocking) width, and
\mathrm{pol} the polarisation correction. The tangential profile is a separable,
area-normalised Gaussian of width R_1:
P_p = \frac{1}{\pi R_1^{2}}\exp\!\left(-\frac{\epsilon_{t,p}^{2}}{R_1^{2}}\right).
A finite X-ray bandwidth thickens the Ewald shell radially, adding a fixed,
resolution-dependent term to the radial width, $R_{0,\mathrm{eff}}^2 = R_0^2 +
(b\lambda)^2/(2d^4)$ (b = relative bandwidth; the pink-beam / DMM signature). b=0 is a
monochromatic no-op.
1. De-biased variance (the load-bearing fix)
Symptom. Mean intensities went negative in the high-resolution shells
(\langle I/\sigma\rangle down to -12), and the per-image scale G collapsed to 0 on
most images, dropping ~80 % of observations.
Cause. The extraction weighted each pixel by its observed count, v_p = I_p. A
down-fluctuated background pixel (I_p < B) then gets a small v_p, hence a large
w_p=1/v_p, and its contribution P_p(I_p-B)/v_p < 0 is large in magnitude. Summed over
the many empty shoebox pixels this drags J below zero — the inverse-observed-count
(Poisson-on-data) bias, worst where the true signal is weakest (high resolution).
Fix. For background-limited reflections the correct variance is the local background,
constant over the shoebox, v_p = \max(B,1), giving the unbiased uniform-variance
estimator J = \sum_p P_p (I_p-B)/\sum_p P_p^2. This turned \langle I/\sigma\rangle
positive at all resolutions and stopped the scale collapse.
2. Prediction band and multiplicity
A reflection is given a shoebox only when it lies within a radial band of the Ewald
sphere, \bigl|\,|S_{hkl}| - 1/\lambda\,\bigr| \le \delta. For randomly oriented stills
the number of images on which a given hkl qualifies is \propto \delta. The original
\delta = 5\times10^{-4}\,\text{Å}^{-1} was 4–6× tighter than a box integrator, giving 4×
fewer observations per reflection. Widening to \delta = 2\times10^{-3}\,\text{Å}^{-1}
(ewald_dist_cutoff) restores the multiplicity; the partiality p downweights the
slightly-off-Ewald tails it admits. (Widening is only safe with the de-biased variance of
§1 and the factored objective of §§3–4 — with a per-pixel geometry fit it diverged.)
3. Term 2 — measured per-resolution profile width R_1
R_1 is measured, not fitted. Fitting R_1 inside a per-image least squares is
degenerate with the scale G (a narrower profile and a larger scale trade off), and that
degeneracy slides the per-image scale and wrecks the merge. But a second moment is
normalised by the total intensity, so it carries shape information decoupled from scale:
R_1^2 = 2\,\langle \epsilon_t^2\rangle,
\qquad
\langle \epsilon_t^2\rangle = \frac{\sum_p (I_p-B)\,\epsilon_{t,p}^2}{\sum_p (I_p-B)} .
We bin the strong spots (\mathrm{signif}\ge 5) by resolution (1/d^2, 6 bins) and take
the median \langle\epsilon_t^2\rangle per bin, so each reflection integrates with the
R_1 of its resolution shell (low-res spots are tight; high-res anisotropic streaks are
wider). Weak spots fall back to the global R_1. Measuring the width rather than fitting
it is what makes profile-width refinement stable — and it is a selling point: the mask
adapts to the data per shell.
4. Term 1 — the intensity / scaling residual
The per-pixel least squares is replaced by one residual per reflection: the profile-fit
amplitude J (using the Term-2 R_1) should equal the scaled reference,
r_h = \frac{J_h - G\,B_\mathrm{DW}\,p_h\,\mathrm{pol}_h\,I^\mathrm{ref}_h}{\sigma_{J,h}},
\qquad
L = \sum_h r_h^2 + \text{(scale prior)} .
Only the per-image scale G and Debye–Waller B are optimised; geometry and R are
fixed. Three consequences:
- Integration and scaling become one objective.
Jis the integrated intensity and the residual is the scaling residual. - The empty-pixel problem disappears by construction. Empty pixels enter only through
J(with ~zero profile weight); they make no residual of their own and cannot dominate. - Fisher weighting puts the reference at maximum leverage.
\sigma_Juses the model-expected variancev_p = B + \max(J,0)\,P_p(background plus expected signal fromI^\mathrm{ref}), not the observed counts — so a strong expected reflection observed absent is penalised, and a noise spike with lowI^\mathrm{ref}gets no weight.
The scale is regularised towards 1 with a data-scaled weight $w_G=\sqrt{N_\mathrm{refl}/
\sigma_G}$ (mirrors ScaleOnTheFly) so weakly-measured images cannot drift and scramble
the merge.
Geometry is not refined here. PixelRefine's earlier per-image geometry refinement
(regularised orientation LSQ, signal-weighting, a global orientation/cell sweep) was
removed: on true stills the predictions are already good (radial centroid error ≈ 0,
tangential ≈ a 0.4 px sampling floor that is not a recoverable misprediction), and per-image
geometry refinement only overfit the sparse signal. Geometry is the job of XtalOptimizer.
5. Background estimator — mean, not median (both integrators)
Symptom. Pushed past the true resolution limit, the no-signal shells reported
\langle I/\sigma\rangle\approx4\text{–}6 at \mathrm{CC}_{1/2}\approx0 — confident "data"
where there is none. Present in both PixelRefine and the classical BraggIntegrate2D.
Cause. Both used the median of the background ring. For a right-skewed (Poisson)
background \operatorname{median}(B) < \mathbb{E}[B], so subtraction under-subtracts by a
tiny but coherent per-pixel offset that grows over an n_\mathrm{pix} peak and a
multiplicity-m merge into a fake \langle I/\sigma\rangle\propto\sqrt{m}, worst where the
real signal is weakest.
Fix. Use the mean of the ring (spot cores and saturation sentinels already
excluded). \langle I/\sigma\rangle then collapses to ~0 wherever \mathrm{CC}\approx0 and
the honest resolution limit becomes visible. This was the single largest contributor to
untrustworthy σ — a one-line change in each integrator.
6. Error model (global a,b; XDS form)
Counting statistics under-estimate the variance of strong reflections, which carry
systematic errors proportional to I, not \sqrt{I}. The standard correction inflates the
variance with a global two-parameter model, applied at the merge level so both
integrators benefit:
\sigma'^{\,2} = a\,\sigma^{2} + (b\,\langle I\rangle)^{2},
\qquad
\mathrm{ISa} = \frac{1}{b} = \lim_{I\to\infty}\frac{I}{\sigma'} .
The I^2 term uses the reflection mean \langle I\rangle (not the per-observation
I_i, which would bias the merged mean and collapse CC); a,b are fit from the spread of
symmetry equivalents with a relative (1/\mathrm{dev}^4) weight so the strong bins (which
fix b) do not swamp the weak bins (which fix a). jfjoch_process prints the model and
ISa.
Results (lysozyme rotation crystal fixed_master.h5, treated as stills, 1.7 Å)
| Configuration | N_obs | \langle I/\sigma\rangle |
CC$_{1/2}$ | CC$_\mathrm{ref}$ |
|---|---|---|---|---|
| Baseline per-pixel loss | 799 k | 7.2 | erratic, →0 at 1.7 Å | erratic |
| Factored Terms 1+2 (this model) | 1.22 M | 10.7 | 84–92 % flat | 77–92 % flat |
The factored objective turns the erratic, high-res-collapsing per-pixel result into flat
~90 % CC${1/2}$/CC$\mathrm{ref}$ to 1.7 Å — matching the proper rotation integration path
from the stills path. (See FINDINGS-2026-06.md.)
Default recipe
§§1–4 are PixelRefine-specific; §§5–6 act at the integration/merge level and apply to the classical route too.
| Field / behaviour | Default | Section |
|---|---|---|
| fit/extraction variance | local background B |
1 |
ewald_dist_cutoff |
2\times10^{-3}\,\text{Å}^{-1} |
2 |
tangential width R_1 |
measured per resolution shell | 3 |
| objective | per-reflection intensity residual, Fisher-weighted | 4 |
| refined parameters | per-image G (and B); geometry fixed |
4 |
scale_reg_sigma |
2.0 | 4 |
| local background estimator | mean of the ring | 5 |
| merge error model | global a,b (ISa printed) |
6 |
ewald_dist_cutoff (multiplicity vs. cost) and bandwidth (Si vs. DMM) are the two knobs
worth setting per dataset.