Jungfraujoch/image_analysis/pixel_refinement/FACTORED_MODEL.md

# A factored likelihood for joint integration + scaling + geometry

**Status: Terms 1+2 implemented and shipping as the PixelRefine default** (see
`PixelRefine.cpp`, `METHODS.md`, `FINDINGS-2026-06.md`); Term 3 (geometry) and the
priors/NN extensions of §4 remain future work. Goal: replace the per-pixel least-squares
of PixelRefine with a per-*reflection* likelihood that fuses profile-fit integration,
scaling against the reference, and geometry refinement into one differentiable
objective — the foundation for priors (Bayesian) and learned components (NN), and the
thing that dissolves the empty-pixel and parameter-degeneracy problems by construction
rather than by patching.

## 0. Notation

Per image, parameters `θ`: scale `G`, Debye-Waller `B`, orientation + cell (geometry),
profile width `R1` (tangential, possibly a 2×2 tensor), partiality width `R0`
(radial/mosaicity; `R0_eff² = R0² + R_bw²`, `R_bw² = (bλ)²/2d⁴` *known* from bandwidth).
Per reflection `h`: reference intensity `I_ref` (the hypothesis), resolution `d`,
predicted centre `c_pred`, partiality `p = exp(−ε_r²/R0_eff²)`, polarisation `pol`,
`B_term = exp(−B/4d²)`, shoebox pixels `{I_p}` with mean local background `Bg`, and the
area-normalised tangential profile template `P_p = P_tang(ε_t,p; R1)`.

## 1. The factorisation principle

A reflection's shoebox carries three (to first order) **orthogonal** pieces of
information — the 0th, 1st and 2nd moments of its intensity distribution:

| moment | statistic | constrains |
|---|---|---|
| 0th — total | profile-fit amplitude `J` | scale chain `G, B` (and `p`) |
| 1st — position | centroid `c_obs` | geometry (orientation; radial→distance/cell) |
| 2nd — shape | second moment `M₂` | profile width `R1` (and anisotropy) |

The current per-pixel residual mixes all three into one objective over shared pixels —
*that* is what couples the parameters (measured G–R0 ≈ −0.46, G–R1 ≈ +0.51) and lets the
many empty pixels dominate. Residual-ing each **moment** against its model instead gives
a block-diagonal Jacobian: the couplings vanish because each statistic carries one
parameter block's information.

## 2. The three residual terms

### 2.1 Intensity / scaling residual (one scalar per reflection)

Optimal (Diamond) profile-fit amplitude and its model:
```
J      = Σ_p w_p P_p (I_p − Bg) / Σ_p w_p P_p²          w_p = 1/v_p
J_model = G · B_term · p · pol · I_ref
r¹_h    = (J − J_model) / σ_J
```
`J` is ~invariant to `R1` (a well-sampled spot integrates to the same total whatever
width is assumed) → **R1 leaves this residual**. Empty pixels make no residual; they
enter only through `J` with ~zero profile weight → **the empty-pixel problem is gone by
construction.** This residual *is* the scaling residual — integration and scaling are now
one objective.

### 2.2 Shape residual (constrains R1; decoupled from scale)

```
M₂_obs   = Σ_p (I_p − Bg) ε_t,p² / Σ_p (I_p − Bg)      (intensity-weighted variance, Å⁻²)
M₂_model = R1² / 2                                      (variance of exp(−ε_t²/R1²))
r²_h     = (M₂_obs − M₂_model) / σ_M2
```
A moment is normalised by the total → **scale-invariant → `∂r²/∂G = 0`**. The G↔R1
degeneracy disappears. Anisotropic extension: use the 2×2 moment tensor
`Σ(I−Bg)(ε_t⊗ε_t)/Σ(I−Bg)` vs `diag(R1a²/2, R1b²/2)` → elliptical R1 (the DMM streak).
Weak spots have huge `σ_M2` → contribute ~nothing → R1 is set by strong spots
automatically (and may be made `R1(d)` per resolution).

### 2.3 Position residual (constrains geometry; decoupled from scale and shape)

```
c_obs = Σ_p (I_p − Bg)(x_p, y_p) / Σ_p (I_p − Bg)
r³_h  = (c_obs − c_pred(geometry)) / σ_c              (2-vector; split radial / tangential)
```
Centroid is scale- and width-invariant → `∂r³/∂G = ∂r³/∂R1 ≈ 0`. The **radial** component
constrains distance/cell, the **tangential** constrains orientation — exactly the split
the diagnostic measured (radial≈0 = no distance error; tangential∝radius = orientation).

## 3. Fisher / expected-variance weighting (makes it a likelihood)

Every `σ` uses the **model-expected** variance, never observed counts — this is what
makes strong *expected* reflections carry the information and makes the model "feel pain
when something that should be there is not":
```
v_p   = Bg + J_model · P_p           (background + expected signal from I_ref, not I_obs)
σ_J²  = 1 / Σ_p (P_p² / v_p)
σ_M2  ≈ M₂ · √(2 / N_eff),   σ_c ≈ R1 / √(N_eff),   N_eff = (Σ(I−Bg))² / Σ v_p
```
Fisher information about `G` from term 1 is `∝ (B_term·p·pol·I_ref)² / σ_J²` — driven by
`I_ref`, so a noise spike (high counts, low `I_ref`) gets *no* weight while a strong
expected reflection observed absent (`J≈0`, large residual, moderate `σ_J`) gets a large
penalty. The reference enters at maximum leverage: it sets both the target and the weight.

## 4. Joint objective and priors

```
L(θ) = Σ_h [ (r¹_h)² + (r²_h)² + |r³_h|² ]  +  priors
```
No free λ if the σ's are correct — the relative weighting *is* the Fisher information.
Priors are the Bayesian hooks and the principled degeneracy breaks:
- **R0 (partiality/mosaicity) is GLOBAL + prior.** R0 multiplies `J_model` (`p`), so it is
  still degenerate with the per-image `G` *within term 1* — the one degeneracy the
  factorisation does **not** remove. Resolve it physically, not with a directional G prior
  (which would bias every output intensity): `R0 ~ N(mosaicity, σ)`, `R_bw` fixed from the
  known bandwidth, and `R0` fit **globally** (one per crystal, from many reflections'
  partiality distribution) so per-image G can't trade against it.
- orientation `~ N(spot-centroid, σ)`; `G ~ N(1, σ_G)` or tied to the beam monitor;
  distance `~ N(nominal, σ_L)` (loose, since serial/jet alignment is poorly constrained).
- Optional Bayesian intensities: treat `I_true` as a parameter with the reference as its
  prior → posterior over intensities, not point estimates.

## 5. Why the degeneracies vanish (Jacobian structure)

`Jᵀ W J` is approximately block-diagonal in `(G,B,p | R1 | geometry)`:
```
∂r¹/∂{G,B,p} ≠ 0 ;  ∂r¹/∂R1 ≈ 0 ;  ∂r¹/∂geom ≈ 0
∂r²/∂R1     ≠ 0 ;  ∂r²/∂G  = 0 ;  ∂r²/∂geom ≈ 0
∂r³/∂geom   ≠ 0 ;  ∂r³/∂G  = 0 ;  ∂r³/∂R1  ≈ 0
```
So G↔R1 (+0.51) and all the cross-couplings drop to ~0 by construction. Only **G↔R0**
survives (R0 is a scale-multiplier, not a shape), handled by the global+physical prior of
§4. The degeneracies we measured were artifacts of projecting all information onto a
single per-pixel residual.

## 6. Implementation notes

- Per reflection: 1 (intensity) + 1 (shape) + 2 (position) residuals = 4, vs ~49 per-pixel
  residuals → **cheaper**, and Ceres autodiffs the moment formulas through the pixels.
- The per-pixel forward model still *defines* `P_tang`, `p`, etc.; the **loss** moves to
  the moments.
- Geometry (term 3) can run as the global sweep we have (it already maximises a
  position/CC objective); terms 1–2 are the per-image photometry. Or solve all three
  jointly per image with the global R0/mosaicity shared across images (two-level fit).
- Drop-in path: keep the current extraction, add the three residuals as a new objective
  behind a flag, compare against the per-pixel loss on both test crystals.

## 7. Why this serves the goal

It is one differentiable likelihood, factored along the physics, that (a) maximises use of
the reference (target + Fisher weight), (b) is the substrate for priors / posteriors over
intensities (Bayesian), and (c) lets any term — profile `P`, partiality `p`, corrections —
be replaced by a learned function trained through the same likelihood. That is the
qualitative move XDS-style empirical profile fitting cannot make.