This is an UNSTABLE release. It includes many experimental features, as well as many AI-generated fixes. We recommend using rc.152 for production use.
* jfjoch_process: Major rotation (rot3d) data processing overhaul - robust profile-fit integration, Cauchy-loss scaling with optional absorption surface, de-novo indexing and space-group/centering determination fixes, and merging statistics + ISa in the mmCIF output.
* jfjoch_process: Bragg integration now runs on the GPU in the offline/non-FPGA workflow (one box-sum + profile-fit engine, GPU when available, CPU otherwise); the FPGA workflow integrates on the CPU directly from the assembled image. The previous standalone integrators are removed.
* jfjoch_process: Deterministic Bragg prediction - when more reflections are predicted than fit the output, they are ranked by distance to the Ewald sphere before truncation, so repeated runs produce identical reflections.
* jfjoch_process: Judge systematic absences by resolution-normalised intensity instead of I/sigma alone, so screw axes are no longer missed when the error model under-estimates sigma on weak axial reflections (e.g. the monoclinic 2_1 screw).
* jfjoch_process: GPU-accelerated rotation scaling and merging (RotationScaleMerge), substantially faster than the previous CPU path.
* jfjoch_process: Unify still and rotation processing on a single --force-still flag (replaces the -P partiality-model option); rotation is auto-detected from the goniometer and processed as rot3d two-pass by default, the default reflection output is mmCIF, and the experimental --reciprocal-profile option is removed.
* jfjoch_process: Add EXPERIMENTAL ice-ring detection (--detect-ice-rings) that excludes ice reflections from scaling.
* jfjoch_broker: The Bragg integration model (profile-fit Gaussian, empirical, or box-sum) is now selectable via the REST API (/config/bragg_integration) and the web frontend.
* jfjoch_broker: Write smargon chi/phi goniometer positions to NXmx; read sensor thickness/material from HDF5 metadata.
* jfjoch_writer: Don't write empty grid-scan position arrays when the dataset has no images.
* Compression: Add BSHUF_ZSTD_RLE_HUFF, make compression size-aware (drop frames that don't fit rather than aborting), and add the jfjoch_recompress tool.
* jfjoch_viewer: Report "Multiple lattices detected" and grey out "Analyze dataset" on a live connection.
* jfjoch_viewer: Frontend fixes - detector settings widget, panel/preview overflow, and navigation icons.
* CI: Build Windows (CUDA and non-CUDA) installers.
* CI: Ship jfjoch_viewer to the release as a Linux-agnostic .tgz.
Dataset re-processing reads a stored HDF5 file, so it is unavailable for
the live HTTP stream. Disable the "Analyze dataset" hero button while a
source is connected (with an explanatory tooltip) instead of letting the
user click through to an "open a file first" dialog afterwards.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Space-group search (image_analysis/scale_merge/SearchSpaceGroup):
- Two-stage POINTLESS-style determination. Stage A scores each distinct rotation
operator once (was once per candidate space group, ~34x faster on lysozyme:
~26s -> <1s) and picks the largest point group all of whose operators confirm.
Stage B picks the maximal space group whose predicted absences are confirmed
weak, fixing the prototype's default to the symmorphic group (it returned P422
instead of P4(3)2(1)2). Enantiomorphic / origin-ambiguous pairs (P4(1) vs P4(3),
I222 vs I2(1)2(1)2(1)) are reported as indistinguishable.
- Constrain candidates to subgroups of the lattice (metric) holohedry and weigh
centering only as P versus the metric centering, both taken from rotation
indexing's LatticeSearch result.
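The Stage A rule above (score each distinct operator once, then keep the largest point group whose operators all confirm) can be sketched as follows. This is a minimal illustration, not the SearchSpaceGroup code: it assumes each operator's score has already been reduced to a confirmed/not-confirmed flag, and `OperatorId`, `PointGroupCandidate` and `PickPointGroup` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical operator identifier (e.g. an index into a table of rotations).
using OperatorId = int;

struct PointGroupCandidate {
    std::string name;
    std::vector<OperatorId> operators;  // non-identity rotations of the group
};

// Each distinct operator is scored once, upstream, into `confirmed`; a candidate
// point group is accepted only if every one of its operators is confirmed.
// Among accepted groups the one with the most operators wins (P1, with no
// non-identity operators, is always accepted).
std::optional<PointGroupCandidate> PickPointGroup(
        const std::vector<PointGroupCandidate>& candidates,
        const std::map<OperatorId, bool>& confirmed) {
    std::optional<PointGroupCandidate> best;
    for (const auto& pg : candidates) {
        bool all_ok = true;
        for (OperatorId op : pg.operators) {
            auto it = confirmed.find(op);
            if (it == confirmed.end() || !it->second) { all_ok = false; break; }
        }
        if (all_ok && (!best || pg.operators.size() > best->operators.size()))
            best = pg;
    }
    return best;
}
```

Because the per-operator scores are shared, adding more candidate groups only adds cheap set lookups, which is where the speed-up over per-space-group scoring comes from.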
Integration / pipeline:
- With no user-fixed space group, predict in P (IndexAndRefine) so the
centering-absent reflections are integrated and the search can confirm/deny
centering (catching pseudo-centering / a missed superstructure) instead of
trusting the metric; a user-fixed group still rejects absences in integration.
- JFJochProcess: scale+merge in P1 -> determine the space group -> set it and
re-scale+merge in it (statistics then come out in the right symmetry) -> write
it to /entry/sample/space_group_number (new EndMessage.space_group_number,
preferred by NXmx::Sample). jfjoch_scale no longer searches; it consumes the
file's space group (and no longer clobbers it with an empty -S).
Twinning (new image_analysis/scale_merge/TwinningAnalysis): Padilla-Yeates L-test
(<|L|>, <L^2>; acentric-only, positive intensities so L is bounded) plus a
shell-normalised <I^2>/<I>^2 second moment and a twin-fraction estimate. Reported
after the final merge in jfjoch_process and jfjoch_scale, and surfaced in the
jfjoch_viewer merge-statistics window with a red outline when twinning is suspected.
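The two statistics named above can be sketched like this. It is an illustration under stated assumptions, not TwinningAnalysis itself: how the local reflection pairs are chosen is left to the caller, and `LTest` / `SecondMoment` are hypothetical names. The reference values in the comments are the standard expectations for acentric data (untwinned vs. perfect twin); the twin-fraction estimate is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct LTestResult {
    double mean_abs_L = 0.0;  // untwinned acentric: 0.5, perfect twin: 0.375
    double mean_L2 = 0.0;     // untwinned acentric: 1/3, perfect twin: 0.2
    size_t n_pairs = 0;
};

// Padilla-Yeates L-test over pairs of acentric intensities that are close in
// reciprocal space. L = (I1 - I2) / (I1 + I2); pairs with a non-positive
// intensity are skipped so that |L| <= 1.
LTestResult LTest(const std::vector<std::pair<double, double>>& pairs) {
    LTestResult r;
    for (const auto& [i1, i2] : pairs) {
        if (i1 <= 0.0 || i2 <= 0.0) continue;
        const double L = (i1 - i2) / (i1 + i2);
        r.mean_abs_L += std::fabs(L);
        r.mean_L2 += L * L;
        ++r.n_pairs;
    }
    if (r.n_pairs > 0) {
        r.mean_abs_L /= static_cast<double>(r.n_pairs);
        r.mean_L2 /= static_cast<double>(r.n_pairs);
    }
    return r;
}

// Second moment <I^2>/<I>^2 within one resolution shell
// (untwinned acentric: 2.0, perfect twin: 1.5).
double SecondMoment(const std::vector<double>& shell) {
    double s1 = 0.0, s2 = 0.0;
    for (double i : shell) { s1 += i; s2 += i * i; }
    assert(!shell.empty() && s1 != 0.0);
    const double n = static_cast<double>(shell.size());
    const double mean = s1 / n;
    return (s2 / n) / (mean * mean);
}
```

Computing the second moment per shell and then averaging keeps the fall-off of intensity with resolution from inflating it, which is what "shell-normalised" guards against.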
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A serial-crystallography run on a detector with a large converted geometry
(JF17T16, modules tiled vertically + horizontally) aborted with
"Array out of bounds (Not enough memory to save image)". An indexed still on
such a detector predicts/integrates close to the kMaxReflections (10000) cap;
at ~170 B per serialized Reflection that is ~1.7 MB of per-image CBOR metadata,
which overflowed the fixed 1 MB the buffer slot reserved on top of the image.
The serialization guard then threw and cancelled the whole run.
- Raise the per-image metadata headroom from 1 MB to 4 MB
(GetImageBufferLocationSize). The worst case - 10000 reflections + 2000 spots
(API max) + 65534 azimuthal bins - serializes to 2.78 MB, leaving margin while
staying negligible next to the multi-MB image slot.
- When metadata still does not fit, drop just that frame (log metadata/image/slot
sizes + recycle the slot) instead of aborting, in both the FPGA and Lite
receivers.
- Add a regression test asserting the worst-case metadata fits the headroom.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JFJochBitShuffleCompressor::Compress now takes a dest_size and returns a
negative value when the compressed output would not fit, instead of writing
past the destination buffer. The check is lazy: before each block it verifies
the remaining space still covers that block's worst case (mirrored by the new
MaxCompressedBlockSize helper, consistent with MaxCompressedSize so a dest
sized to MaxCompressedSize never fails). On overflow the dest content is
undefined - no rescue.
The receiver uses this to compress directly into the writer buffer slot and
drop just the oversized frame instead of pre-reserving the full worst-case
image size next to the per-image CBOR metadata.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On sparse lyso frames the larger block improves compression ratio across all
bshuf algorithms (16-bit data): ZSTD 8.58 -> 9.30, LZ4 7.38 -> 7.58, RLE
6.82 -> 6.90. 16384 captures most of the gain available from even larger
blocks (ZSTD tops out ~9.55 at 65536) while staying close to the cache sweet
spot: the cheap codecs (LZ4, RLE) peak in throughput once a block's working
set fits L1d (~4096 elem here), so very large blocks trade real throughput for
diminishing ratio - and that penalty is worse on the Xeon Gold/Platinum
production hosts (smaller private L2, shared-L3 contention under many parallel
compression threads).
The block size is stored per-dataset in the bitshuffle HDF5 filter params, so
existing readers (XDS/Neggia/Durin/CrystFEL) stay compatible.
Move the per-block bitshuffle scratch off the inline member array onto a
lazily-sized heap vector, like tmp_space, so the block size no longer bloats
every stack-allocated compressor (incl. the transient ones in
CBORStream2Serializer).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the fixed-element DefaultBlockSize with a byte target divided by
elem_size to get the block element count, so the per-block working set (and
thus cache behaviour) stays constant across pixel bit depths instead of halving
from 8- to 16- to 32-bit. The target is per-algorithm, following the measured
sweet spots on sparse data: LZ4 wants a small, cache-resident block for
throughput (16 kB), ZSTD/RLE want a large block for ratio (128 kB). The gap is
widest on extreme-sparsity inputs such as the uint32 pixel_mask, where
large-block ZSTD reaches 100-1800x vs ~160x for LZ4.
The block size is read back per-dataset from the bitshuffle stream header
(block_size = header_bytes / elem_size) and the HDF5 filter params, so the
decompressor and external readers (XDS/Neggia/Durin/CrystFEL) need no change.
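A minimal sketch of the byte-target rule and its read-back, assuming "kB" means KiB and that blocks are kept to a multiple of 8 elements as bitshuffle requires. `Algo`, `TargetBlockBytes`, `BlockElements` and `BlockElementsFromHeader` are hypothetical names, not the jfjoch API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical algorithm tags; the real CompressionAlgorithm enum has more values.
enum class Algo { LZ4, ZSTD, RLE };

// Per-algorithm byte target for one bitshuffle block (16 kB for LZ4,
// 128 kB for ZSTD/RLE; kB taken as KiB in this sketch).
constexpr size_t TargetBlockBytes(Algo a) {
    return a == Algo::LZ4 ? 16 * 1024 : 128 * 1024;
}

// Elements per block: byte target / element size, rounded down to a multiple
// of 8, so the per-block working set in bytes is the same for 8/16/32-bit data.
size_t BlockElements(Algo a, size_t elem_size) {
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);
    const size_t n = TargetBlockBytes(a) / elem_size;
    return n - n % 8;
}

// Reader side: the element count follows from the block size in bytes that
// the stream header records.
size_t BlockElementsFromHeader(size_t header_bytes, size_t elem_size) {
    assert(elem_size > 0);
    return header_bytes / elem_size;
}
```

For 16-bit pixels this gives 8192-element LZ4 blocks and 65536-element ZSTD blocks; the same byte targets would halve or double the element counts for 32-bit or 8-bit data.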
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Compress() and FrameTransformation::CompressImage() returned int64_t with a
negative value meaning "did not fit". That is a footgun: the negative result
silently converts to a huge size_t if a caller forgets to check it. Return
size_t and instead throw a named CompressionBufferTooSmallException (deriving
from JFJochException, Compression category) when the output would not fit the
destination buffer.
The receiver catches it explicitly and drops just that frame, as before; the
offline/GetCompressedImage path uses a worst-case buffer so it never throws.
Add a test that a too-small destination throws and a worst-case buffer does not.
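The lazy per-block bound check and the receiver's drop-one-frame handling can be sketched together. This is an illustration under assumptions: the block "codec" is a toy (4-byte length prefix plus raw bytes) so its worst case is obvious, the exception derives from `std::runtime_error` rather than JFJochException, and `TryStoreFrame` is a hypothetical name for the receiver-side logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <stdexcept>
#include <vector>

// Stand-in for the named exception (std::runtime_error keeps this self-contained).
struct CompressionBufferTooSmall : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy block codec (hypothetical): 4-byte length prefix + raw block.
constexpr size_t MaxCompressedBlockSize(size_t n) { return 4 + n; }

size_t CompressBlock(const uint8_t* src, size_t n, uint8_t* dst) {
    const uint32_t len = static_cast<uint32_t>(n);
    std::memcpy(dst, &len, sizeof len);
    std::memcpy(dst + sizeof len, src, n);
    return sizeof len + n;
}

// Whole-frame bound, built from the per-block bound: a destination of this
// size can never trigger the exception below.
size_t MaxCompressedSize(size_t src_size, size_t block) {
    size_t total = 0;
    for (size_t off = 0; off < src_size; off += block)
        total += MaxCompressedBlockSize(std::min(block, src_size - off));
    return total;
}

// Lazy check: before each block, verify the remaining space covers that
// block's worst case; throw instead of writing past the destination.
size_t Compress(uint8_t* dest, size_t dest_size, const uint8_t* src,
                size_t src_size, size_t block) {
    size_t out = 0;
    for (size_t off = 0; off < src_size; off += block) {
        const size_t n = std::min(block, src_size - off);
        if (dest_size - out < MaxCompressedBlockSize(n))
            throw CompressionBufferTooSmall("compressed frame does not fit");
        out += CompressBlock(src + off, n, dest + out);
    }
    return out;
}

// Receiver pattern: compress straight into the slot; drop only this frame if
// it does not fit (a real receiver would also log sizes and recycle the slot).
std::optional<size_t> TryStoreFrame(const std::vector<uint8_t>& frame,
                                    std::vector<uint8_t>& slot, size_t block) {
    try {
        return Compress(slot.data(), slot.size(), frame.data(), frame.size(), block);
    } catch (const CompressionBufferTooSmall&) {
        return std::nullopt;
    }
}
```

Returning `size_t` and signalling overflow by exception means a forgotten check fails loudly, rather than a negative value silently becoming a huge size.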
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New CompressionAlgorithm that emits a standard Zstandard frame: zero/0xFF runs
become RLE_Blocks (like BSHUF_ZSTD_RLE) and literal regions become
Compressed_Blocks with per-block adaptive Huffman literals and no sequences
(Number_of_Sequences=0). Short runs are absorbed into the literal stream;
incompressible literals fall back to Raw_Blocks so the worst case stays within
ZSTD_compressBound.
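The run/literal split described above can be sketched as a single pass over the bytes. This is only an illustration: `min_run` is a hypothetical tuning parameter, `SplitRuns` is not the encoder's real function, and each resulting segment would still have to be cut into blocks of at most Block_Maximum_Size (128 KiB) before framing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class SegKind { Run, Literal };

struct Segment {
    SegKind kind;
    size_t offset;
    size_t length;
    uint8_t value;  // run byte (0x00 or 0xFF); unused for literals
};

// Split a byte stream into long 0x00/0xFF runs (candidates for RLE_Blocks) and
// literal regions (candidates for Huffman-coded Compressed_Blocks). Runs
// shorter than min_run are absorbed into the surrounding literal region.
std::vector<Segment> SplitRuns(const std::vector<uint8_t>& data, size_t min_run) {
    assert(min_run >= 1);
    std::vector<Segment> out;
    size_t lit_start = 0;  // start of the pending literal region
    size_t i = 0;
    while (i < data.size()) {
        const uint8_t b = data[i];
        size_t j = i;
        if (b == 0x00 || b == 0xFF)
            while (j < data.size() && data[j] == b) ++j;
        if (j - i >= min_run) {
            if (i > lit_start) out.push_back({SegKind::Literal, lit_start, i - lit_start, 0});
            out.push_back({SegKind::Run, i, j - i, b});
            lit_start = j;
            i = j;
        } else {
            i = (j > i) ? j : i + 1;  // short run or ordinary byte stays literal
        }
    }
    if (data.size() > lit_start)
        out.push_back({SegKind::Literal, lit_start, data.size() - lit_start, 0});
    return out;
}
```

Absorbing short runs keeps the number of block headers low; a 3-byte header per tiny run would cost more than Huffman-coding those bytes with their neighbours.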
The Huffman tree + bitstream are produced by zstd's own HUF_compress{1,4}X_repeat
(the same calls ZSTD_compressLiterals uses); only the frame/block/literals-section
framing is hand-written, with comments citing zstd_compression_format.md so it can
be checked clause by clause. Output decodes with stock ZSTD_decompress, so no
reader changes are needed (decode routes like BSHUF_ZSTD).
On sparse diffraction this gives ~12% smaller files than bitshuffle/LZ4 at about
the same end-to-end speed, sitting between LZ4 and full ZSTD; for maximum ratio
use BSHUF_ZSTD. Robust on any input: tests round-trip pure zeros, Poisson(10),
Mersenne-Twister noise (checked against the size bound), an extreme-sparsity mask,
and a real lyso image through stock ZSTD_decompress.
API: exposed as "bszstd_rlehuf"; regenerate the Python/TS clients (update_version.sh)
to surface the new value there.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builds a single Compressed_Block (Huffman-coded Literals_Section, empty
Sequences_Section) and checks: the block type is Compressed, its trailing
Number_of_Sequences byte is 0, and stock ZSTD_decompress reconstructs the
literals exactly. This is the format guarantee from zstd_compression_format.md
("if Number_of_Sequences == 0 ... Block's decompressed content is defined solely
by the Literals Section content"), locked into the test suite.
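The checks in this test rely on the Block_Header layout from zstd_compression_format.md: 3 bytes, little-endian, with Last_Block in bit 0, Block_Type in bits 1-2 (0 Raw, 1 RLE, 2 Compressed, 3 Reserved) and Block_Size in bits 3-23. A minimal encode/decode sketch follows; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class BlockType : uint32_t { Raw = 0, RLE = 1, Compressed = 2, Reserved = 3 };

// Block_Header: bit 0 Last_Block, bits 1-2 Block_Type, bits 3-23 Block_Size.
// (The format further caps Block_Size at Block_Maximum_Size, i.e. 128 KiB.)
std::array<uint8_t, 3> EncodeBlockHeader(bool last, BlockType type, uint32_t size) {
    assert(size < (1u << 21));
    const uint32_t v = (last ? 1u : 0u) | (static_cast<uint32_t>(type) << 1) | (size << 3);
    return {static_cast<uint8_t>(v), static_cast<uint8_t>(v >> 8),
            static_cast<uint8_t>(v >> 16)};
}

struct BlockHeader { bool last; BlockType type; uint32_t size; };

BlockHeader DecodeBlockHeader(const uint8_t* p) {
    const uint32_t v = static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
                       (static_cast<uint32_t>(p[2]) << 16);
    return {(v & 1u) != 0, static_cast<BlockType>((v >> 1) & 3u), v >> 3};
}

// In a literals-only Compressed_Block the Sequences_Section is the single byte
// Number_of_Sequences == 0, i.e. the last byte of the block content.
bool TrailingNumberOfSequencesIsZero(const uint8_t* content, uint32_t block_size) {
    return block_size > 0 && content[block_size - 1] == 0;
}
```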
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
update_version.sh regenerated the TypeScript client (types.gen.ts, zod.gen.ts)
and Redoc docs to include the new "bszstd_rlehuf" compression enum value added
to jfjoch_api.yaml; the package-lock version field follows VERSION. The C++
server model treats compression as a plain string, so it needed no change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New offline tool that re-compresses /entry/data/data of a _data_NNNNNN.h5 file
from bitshuffle/LZ4 to the standard bitshuffle/Zstd HDF5 filter. Every other
object (groups, datasets, attributes, the dataset's own attributes,
dims/dtype/chunking/block size) is reproduced unchanged.
It writes a fresh file - only /entry/data/data is re-encoded, every other object
is H5Ocopy'd verbatim - which then atomically replaces the original via rename().
This needs no h5repack (the new file has no leftover space) and is crash-safe
(the original is opened read-only until the rename). Frames are streamed one at a
time through the registered bitshuffle filter (decompress LZ4, compress Zstd), so
it is dtype-agnostic and never holds the whole dataset in memory.
Output is read by the standard bitshuffle+zstd HDF5 plugin (verified against the
hdf5plugin/DIALS libh5bshuf.so, which links libzstd and supports the zstd mode).
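The crash-safe replacement step can be sketched with `std::filesystem`. This illustrates only the write-then-rename pattern: the `write_new` callback is a hypothetical stand-in for the HDF5 copy/re-encode work, and the ".tmp" suffix is an assumption. On POSIX, a rename within one filesystem atomically replaces the target, so a crash leaves either the old file or the complete new one.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Write the new file next to the original, then replace the original in one
// rename. `write_new(src, dst)` (hypothetical) only reads `src` while it runs.
void ReplaceAtomically(const fs::path& original,
                       const std::function<void(const fs::path&, const fs::path&)>& write_new) {
    fs::path tmp = original;
    tmp += ".tmp";
    try {
        write_new(original, tmp);
    } catch (...) {
        std::error_code ec;
        fs::remove(tmp, ec);  // on failure the original stays untouched
        throw;
    }
    fs::rename(tmp, original);  // replaces the existing file
}
```

Because the fresh file is written from scratch, it also contains no freed space, which is why no repack step is needed afterwards.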
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The rot3d combine over-extrapolates fulls reconstructed from only a fraction f
of their rocking curve (min_partiality admits f as low as 0.02). Against XDS on
HEWL these low-capture fulls are systematically biased HIGH (+15% at f=0.8 to
+100% at f=0.3), and the bias - not random scatter - is the strong-reflection
floor that hurts anomalous accuracy.
--capture-uncertainty <coeff> (default 0 = off, baseline bit-identical) adds a
systematic uncertainty ~coeff*(1-f)*I to each full's sigma, so the merge
down-weights the over-extrapolated fulls and the error model treats their
scatter as expected. Unlike outlier rejection (which trades accuracy for CC1/2),
this fixes a real bias, so accuracy improves: at coeff=1.0 the anomalous peak
height vs XDS rises CL_CL +16%, SD_MET/SG_CYS +5-6%, ISa 10.7->11.0. Rotation-
only (no-op for stills, which never combine).
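A minimal sketch of the capture-aware sigma, assuming the systematic term coeff*(1-f)*I is added to the counting sigma in quadrature (the text only gives its magnitude). `CaptureAwareSigma` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Sigma of a combined full with captured fraction f of its rocking curve,
// inflated by a systematic term proportional to the uncaptured fraction.
// coeff = 0 returns sigma unchanged, keeping the baseline bit-identical.
double CaptureAwareSigma(double I, double sigma, double f, double coeff) {
    assert(f > 0.0 && f <= 1.0);
    if (coeff == 0.0) return sigma;
    const double sys = coeff * (1.0 - f) * I;
    return std::sqrt(sigma * sigma + sys * sys);
}
```

A fully captured reflection (f = 1) keeps its counting sigma; one reconstructed from 60% of its rocking curve with I = 100 and sigma = 30 ends up with sigma = 50 at coeff = 1, so the merge weights it about 2.8x less.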
Also expose the per-image scale offline: Combine3D now carries the first-pass
per-image scale metadata (G, B, mosaicity, wedge, CC) forward instead of
dropping it, and jfjoch_process -M writes <prefix>_image.dat from it (the
offline self-scaling result was otherwise unobservable - process.h5's per-image
arrays are only filled on the online path). This enabled the XDS DECAY
comparison (jfjoch G tracks XDS, r=0.93).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The rotation per-image mosaicity was ~3x too small (0.045 vs the true 0.13deg),
which crippled the partiality model and capped per-observation precision: it
predicted reflections on too few frames and over-peaked the rocking partiality,
so the rot3d-combined fulls were ~1.7x noisier than XDS's, the integration
bottleneck behind the jungfraujoch-vs-XDS ISa gap.
Two root causes, both fixed:
- CalcMosaicityXDS (Kabsch-2010 MLE) searched each spot's exact-Bragg phi only
within +-wedge (the 0.2deg oscillation). Reflections recorded at larger rocking
offset - the tail that defines the mosaic width - fell outside and were dropped,
truncating the tau distribution so the MLE underestimated ~2x. Widen the search
window to wedge+0.8deg; the MLE then converges to the true 0.13deg (and is
insensitive to widening further, since it weights by the recorded fraction).
- ScaleOnTheFly then re-refined the mosaicity from the intensity residual, which
is degenerate with the per-image scale G and collapses it toward its floor.
Trust the (now correct) indexing mosaicity and keep it fixed during scaling.
With the correct mosaicity, --capture-uncertainty (which down-weights the
over-extrapolated under-captured fulls) now pays off strongly, so default it ON
(1.0) for the rot3d combine; it stays off for non-rot3d. Together on the HEWL
rotation crystal: ISa 10.7 -> 19.1, and anomalous peak height vs XDS goes from
52% to ~78% (CL_CL 1.92x -> 1.29x). This reaches XDS's own published-correction
ceiling (DECAY+ABSORP+MODPIX ~= 19.6); the remaining gap to its quoted ISa 28 is
the I->inf extrapolation. No effect on the stills path (rotation-only code).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The profile radius (intrinsic excitation-error width = mosaicity + divergence)
was the plain RMS of dist_ewald over indexed spots. With a finite energy
bandwidth that spread is broadened by the bandwidth's radial smear
sigma_bw = bandwidth_sigma*lambda/(2 d^2), which prediction then re-applies per
reflection - so bandwidth was counted twice and the radius was inflated (most at
high resolution, sigma_bw ~ 1/d^2). Subtract the bandwidth variance from the
measured spread so the radius is the intrinsic width. bandwidth = 0
(monochromatic / rotation) is unchanged. Small for narrow bandwidths (~6% of the
variance, ~4% radius on the 1% jet); matters for wide-bandwidth / pink beam.
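The bandwidth deconvolution amounts to subtracting the mean bandwidth variance from the mean squared excitation error. Below is a sketch that assumes dist_ewald is in 1/Angstrom and clamps a negative difference to zero. The clamp, `IndexedSpot` and `IntrinsicProfileRadius` are all assumptions of this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct IndexedSpot {
    double dist_ewald;  // excitation error (1/Angstrom)
    double d;           // resolution (Angstrom)
};

// Intrinsic profile radius: RMS of dist_ewald with the bandwidth's radial
// smear sigma_bw = bandwidth_sigma * lambda / (2 d^2) removed in variance.
double IntrinsicProfileRadius(const std::vector<IndexedSpot>& spots,
                              double bandwidth_sigma, double lambda) {
    assert(!spots.empty());
    double sum_dist2 = 0.0, sum_bw2 = 0.0;
    for (const auto& s : spots) {
        const double sigma_bw = bandwidth_sigma * lambda / (2.0 * s.d * s.d);
        sum_dist2 += s.dist_ewald * s.dist_ewald;
        sum_bw2 += sigma_bw * sigma_bw;
    }
    const double n = static_cast<double>(spots.size());
    return std::sqrt(std::max(0.0, (sum_dist2 - sum_bw2) / n));
}
```

With bandwidth_sigma = 0 this reduces to the plain RMS, matching the unchanged monochromatic/rotation behaviour.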
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document the rot3d path that was missing: a new section on combining a
reflection's per-frame partials into one full (de-biased weighted combine,
captured fraction, capture-aware systematic uncertainty, XDS-order full
re-scaling) so the merge sees counting statistics instead of rocking-curve
slicing. Recast the profile-radius and mosaicity sections as what the system
does - profile radius as the intrinsic (bandwidth-deconvolved) width, mosaicity
by ML with a search window wide enough to capture the rocking tail and held
fixed during scaling - rather than the optimisation narrative.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default 2D Bragg integrator is ProfileIntegrate2D (Kabsch profile fit with a
per-resolution-shell Gaussian profile and de-biased variance), with box summation
as the seed/fallback (--integrator boxsum|gaussian|empirical). Section 9 and the
section 13 note both still claimed integration was summation-only with no profile
fitting; rewrite them to describe the profile-fit default.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Weak high-resolution still reflections were systematically over-subtracted: a
bandwidth-streaked high-res spot (or a neighbour) leaks into the r2-r3 background
annulus and biases its mean high, so the subtracted background is too large and
the merged high-resolution intensities go negative (seen as reproducibly negative
<I/sig> at 100% completeness and high multiplicity past ~1.9 A).
Add one high-outlier sigma-clip pass to the box-sum background (reject ring pixels
above mean + 3*sqrt(mean), recompute) so the contamination no longer inflates it.
A clean Poisson background is essentially unchanged (~0.1% exceed the cut). On the
HEWL serial-stills jet this de-biases the high-res band - <I/sig> 2.03 A 0.9 -> 1.6,
1.79 A -0.1 -> +0.7 - extends the usable resolution ~2.2 -> ~2.0 A and improves
overall R-meas 130 -> 124%, with CC1/2 and CC-vs-reference neutral. The rotation
crystal is unchanged (ISa 19.1), its clean backgrounds being barely clipped.
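The single clip pass can be sketched as follows, assuming the ring pixels of the r2-r3 annulus are already collected into a vector. `ClippedBackground` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Box-sum background with one high-outlier pass: take the annulus mean, reject
// pixels above mean + 3*sqrt(mean) (a Poisson 3-sigma cut), recompute the mean.
double ClippedBackground(const std::vector<double>& ring) {
    assert(!ring.empty());
    double sum = 0.0;
    for (double v : ring) sum += v;
    const double mean = sum / static_cast<double>(ring.size());
    const double cut = mean + 3.0 * std::sqrt(std::max(mean, 0.0));
    double kept_sum = 0.0;
    size_t kept = 0;
    for (double v : ring)
        if (v <= cut) { kept_sum += v; ++kept; }
    return kept > 0 ? kept_sum / static_cast<double>(kept) : mean;
}
```

Only the high side is clipped: a neighbour or streak can only add counts to the annulus, so a symmetric clip would remove legitimate low fluctuations and bias the estimate high again.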
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
With a finite energy bandwidth each reflection is smeared RADIALLY by
sigma_bw = bandwidth_sigma * R_px (R_px = distance from the beam centre, so large
at high resolution): high-resolution spots become radial streaks. The isotropic
per-shell Gaussian both mis-weights them and clips the streak tail on the fixed
profile grid, losing intensity (biased low, noisy).
When a bandwidth is set, fit each reflection with a per-reflection Gaussian
elongated only along its radial direction - sigma^2_radial = sigma^2_intrinsic +
sigma_bw^2, sigma^2_tangential = sigma^2_intrinsic - on a grid grown to hold the
streak. Unlike an isotropic widening this adds no tangential background. It only
engages where the smear exceeds the intrinsic spot (high resolution); low/mid
resolution and monochromatic data (bandwidth 0, e.g. rotation) are untouched.
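The radially elongated profile can be written down directly: rotate the pixel offset into radial/tangential components and use different variances along each. This is a sketch only; `RadialGaussian` is a hypothetical name, and the grid growth and per-shell learning of sigma2_intrinsic are not shown.

```cpp
#include <cassert>
#include <cmath>

// Normalised 2D Gaussian at pixel offset (dx, dy) from the predicted spot
// centre, elongated only along the radial unit vector (rx, ry) that points
// from the beam centre to the spot:
//   sigma2_radial = sigma2_intrinsic + sigma_bw^2, sigma2_tangential = sigma2_intrinsic.
double RadialGaussian(double dx, double dy, double rx, double ry,
                      double sigma2_intrinsic, double sigma_bw) {
    assert(sigma2_intrinsic > 0.0);
    constexpr double kPi = 3.14159265358979323846;
    const double s2_rad = sigma2_intrinsic + sigma_bw * sigma_bw;
    const double s2_tan = sigma2_intrinsic;
    const double u = dx * rx + dy * ry;   // radial component of the offset
    const double v = -dx * ry + dy * rx;  // tangential component of the offset
    const double norm = 1.0 / (2.0 * kPi * std::sqrt(s2_rad * s2_tan));
    return norm * std::exp(-0.5 * (u * u / s2_rad + v * v / s2_tan));
}
```

Keeping the tangential width at the intrinsic value is what avoids pulling extra background into the fit, as an isotropic widening would.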
On the HEWL serial-stills jet (with the background sigma-clip) this lifts the
overall CC-vs-reference 52 -> 55% and the high-resolution I/sig (1.7 A 0.5 -> 1.4),
recovering the 2.0-2.5 A band, with CC1/2 preserved (the per-shot noise the wider
region adds is averaged out by the high serial multiplicity). Rotation ISa 19.1
unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RefineErrorModel fits the variance model a*sigma^2 + (b*<I>)^2 to the binned
*median* of the squared symmetry-mate deviations, chosen for robustness. But a
single deviation squared over its variance is chi-square(1)-distributed, whose
median is only 0.4549 of its mean, so the fit was calibrating the variances to
0.4549x their true value: the merged sigmas came out ~1.48x too small and the
achieved reduced chi^2 was 1/0.4549 = 2.2, not 1. The error model was internally
well-behaved (flat chi^2 across resolution) but globally over-confident, which
inflated ISa (=1/b) by ~1.48x and made the exported sigmas too optimistic for
downstream weighting / French-Wilson.
Divide the binned median by the chi-square(1) median (0.4549) to recover an
unbiased estimate of the mean E[dev^2]=sigma^2, keeping the robustness of the
median while targeting reduced chi^2 = 1. Also compute the achieved median
reduced chi^2 (same normalization) and report it on the "Error model" line so
mis-calibration can no longer drift silently.
Verified: HEWL rotation a 0.588->1.292, b 0.052->0.077, ISa 19.1->12.9, chi^2
2.17->1.06; serial Jet8 ISa 1.0->0.7, chi^2 0.92. Relative ISa comparisons and
all CC1/2/CCref/anomalous metrics are unchanged (sigma-independent or a common
constant); only the absolute sigma calibration is corrected.
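The correction is a single constant: the median of a chi-square variable with one degree of freedom is about 0.4549, so a median of squared deviations must be divided by it to estimate the mean. A sketch of the per-bin estimate and the achieved median reduced chi^2 follows (hypothetical function names).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of the chi-square distribution with one degree of freedom.
constexpr double kChi2OneDofMedian = 0.4549364;

// Robust estimate of E[dev^2] in one bin: median of the squared deviations
// divided by the chi-square(1) median (unbiased for Gaussian deviations).
double RobustMeanSquare(std::vector<double> dev) {
    assert(!dev.empty());
    for (double& d : dev) d *= d;
    const size_t mid = dev.size() / 2;
    std::nth_element(dev.begin(), dev.begin() + mid, dev.end());
    double median = dev[mid];
    if (dev.size() % 2 == 0)  // even count: average the two central values
        median = 0.5 * (median + *std::max_element(dev.begin(), dev.begin() + mid));
    return median / kChi2OneDofMedian;
}

// Achieved median reduced chi^2 with the same normalisation (target: 1).
double MedianReducedChi2(const std::vector<double>& dev, const std::vector<double>& sigma) {
    assert(dev.size() == sigma.size());
    std::vector<double> z(dev.size());
    for (size_t i = 0; i < dev.size(); ++i) z[i] = dev[i] / sigma[i];
    return RobustMeanSquare(z);
}
```

Dividing the raw median by 0.4549 is the factor 2.2 from the text; applied to the variances it scales the sigmas by sqrt(2.2), about 1.48.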
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes wrong space-group assignment on cubic insulin (true I2_13, a=77.6).
SearchSpaceGroup: drop the lattice-centering gate (and the now-unused
lattice_centering option). The indexer often returns the conventional cell,
whose geometry hides I/F/C centering - that information lives only in the
systematic absences. Stage B now tests every centering of the point group and
confirms it from the data, so an indexer-reported 'P' no longer excludes
I2_13/I23. (I23 and I2_13 remain indistinguishable by absences and are reported
as such.)
jfjoch_process: discard any unit cell stored in the input HDF5 by default so the
cell is re-determined from scratch. A stale/wrong stored cell otherwise resolves
the indexing algorithm to FFBIDX, which trusts that cell and locks onto the wrong
lattice (Ins_I_3 indexing went 2.7% -> 76%). A user-supplied -C cell still
applies.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ScaleOnTheFly fits each frame's scale G independently with no neighbour
coupling, so the few partials of one rocking event are weight-summed in
combine3D on inconsistent scales - jitter that never enters the full's
counting sigma and instead surfaces as scatter between symmetry mates,
inflating the error-model b (low ISa). A centered moving average of
log(G) over a small frame window (default 9, on for rot3d) removes it,
mirroring XDS's smooth scaling. Complementary to --scale-fulls (which
rescales between fulls, after the combine): smoothing fixes within-event
scale, scale-fulls fixes between-full.
On the rotation lysozyme set (1.4A, merged, with --scale-fulls): ISa
11.7 -> 15.0, R_meas 10.0% -> 8.3%, CCref stable, chi2 ~0.97 (honest).
Anomalous (full-res): ANODE S-peak 0.61x -> 0.80x of XDS.
--smooth-g[=window] tunes/disables it (=0 off); --mosaicity <deg> is a
diagnostic that fixes the scaling mosaicity for sweeps.
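A centred moving average of log(G) is a geometric mean of the neighbouring scales. The sketch below assumes that near the ends of the sweep the window is truncated to the available frames. The source does not specify edge handling, and `SmoothScales` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Smooth per-frame scales G with a centred moving average of log(G).
// window <= 1 disables smoothing; at the sweep ends the window is truncated.
std::vector<double> SmoothScales(const std::vector<double>& G, size_t window) {
    if (window <= 1) return G;
    const long half = static_cast<long>(window / 2);
    const long n = static_cast<long>(G.size());
    std::vector<double> logG(G.size());
    for (size_t i = 0; i < G.size(); ++i) {
        assert(G[i] > 0.0);
        logG[i] = std::log(G[i]);
    }
    std::vector<double> out(G.size());
    for (long i = 0; i < n; ++i) {
        const long lo = std::max(0L, i - half);
        const long hi = std::min(n - 1, i + half);
        double s = 0.0;
        for (long j = lo; j <= hi; ++j) s += logG[j];
        out[i] = std::exp(s / static_cast<double>(hi - lo + 1));
    }
    return out;
}
```

Averaging in log space treats a factor-of-2 jitter up or down symmetrically, which a plain arithmetic mean of G would not.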
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 76e88b0f sigma-clip (reject ring pixels above mean+3*sqrt(mean))
de-biases bandwidth-streaked high-resolution stills, but it ran on all
data. On rotation (no streaks) it clips legitimate high background pixels
and biases the mean low, slightly inflating weak intensities and hurting
the anomalous signal. Gate it to bandwidth>0 (stills), matching how the
814dff34 radial-profile change is already gated.
Rotation lysozyme (self-scaled, smooth-g, -A): anomalous S-peak 0.84x ->
0.86x of XDS (SD_MET 11.71 -> 11.94, CL_CL 1.28x -> 1.25x), ISa unchanged.
Stills (bandwidth set) are byte-identical (clip still applies).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ProfileIntegrate2D::BoxSum now excludes every predicted reflection's r2 disk
from the r2..r3 background annulus (mirroring BraggIntegrate2D), so a neighbour
Bragg peak can no longer bias a reflection's background high and over-subtract.
With neighbours excluded the annulus can safely widen, so the default r_3 goes
8 -> 10 (more background pixels, lower-variance estimate).
Measured (rotation lyso @1.0 A, external CCref/CCxds vs XDS): 1.05 A CCref +1.3
/ CCxds +1.3, R-meas -5 pts, low-res R-meas unchanged. Serial (Jet8 @0.0002
bandwidth): 1.68 A CC1/2 +3.9 / CCref +1.1, 1.58 A +4.0 / +1.6.
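The neighbour exclusion can be sketched as a pixel filter over the annulus. This is a brute-force illustration, not ProfileIntegrate2D::BoxSum: pixel centres are taken at integer coordinates, every other prediction is tested (a real implementation would use a spatial lookup), and `Pred` / `BackgroundPixels` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Pred { double x, y; };  // predicted spot centre (pixels)

// Background pixels for reflection `self`: r2 <= r < r3 from its own centre,
// excluding any pixel inside the r2 disk of another predicted reflection.
std::vector<std::pair<int, int>> BackgroundPixels(const std::vector<Pred>& preds,
                                                  size_t self, double r2, double r3) {
    assert(self < preds.size() && r2 < r3);
    const Pred& c = preds[self];
    std::vector<std::pair<int, int>> out;
    const int y0 = static_cast<int>(std::floor(c.y - r3)), y1 = static_cast<int>(std::ceil(c.y + r3));
    const int x0 = static_cast<int>(std::floor(c.x - r3)), x1 = static_cast<int>(std::ceil(c.x + r3));
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            const double d2 = (x - c.x) * (x - c.x) + (y - c.y) * (y - c.y);
            if (d2 < r2 * r2 || d2 >= r3 * r3) continue;  // outside own annulus
            bool near_other = false;
            for (size_t k = 0; k < preds.size() && !near_other; ++k) {
                if (k == self) continue;
                const double e2 = (x - preds[k].x) * (x - preds[k].x) +
                                  (y - preds[k].y) * (y - preds[k].y);
                near_other = e2 < r2 * r2;
            }
            if (!near_other) out.emplace_back(x, y);
        }
    return out;
}
```

Once neighbours are masked this way, widening r3 mainly adds clean pixels, which is the rationale for the larger default annulus.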
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each partial subtracted its own independently-estimated per-frame background,
so a weak full assembled from N frames accumulated one background-estimation
variance per frame. The true background is flat over the few frames of one
rocking event, so replace each partial's background by the event mean and
correct its intensity by n_bkg*(bkg - pooled), where n_bkg = (sigma^2 - I)/bkg
is the effective background-pixel count -- correct for weak AND strong
reflections (sigma^2/bkg would over-count strong ones and over-correct them).
Single-frame events are a no-op.
Measured (rotation lyso @1.0 A): 1.05 A CC1/2 81.3 -> 83.0, R-meas 81.4 ->
77.2, CCref +0.2, CCxds +0.3; overall R-meas 10.0 -> 9.4%; ISa preserved (13.5).
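The per-event pooling step can be sketched as below. Assumptions: the event mean is an unweighted mean of the partials' per-pixel backgrounds, the sigmas are left unchanged, and `Partial` / `PoolEventBackground` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Partial {
    double I;       // background-subtracted intensity on this frame
    double sigma2;  // its variance
    double bkg;     // per-pixel background estimated on this frame
};

// Replace each partial's own background by the event mean and correct the
// intensity by n_bkg * (bkg - pooled), where n_bkg = (sigma^2 - I) / bkg is the
// effective number of pixels the background was subtracted over.
void PoolEventBackground(std::vector<Partial>& event) {
    if (event.size() < 2) return;  // single-frame events are a no-op
    double pooled = 0.0;
    for (const auto& p : event) pooled += p.bkg;
    pooled /= static_cast<double>(event.size());
    for (auto& p : event) {
        if (p.bkg <= 0.0) continue;
        const double n_bkg = (p.sigma2 - p.I) / p.bkg;
        p.I += n_bkg * (p.bkg - pooled);  // too-high own bkg -> add intensity back
        p.bkg = pooled;
    }
}
```

Deriving n_bkg from (sigma^2 - I) removes the reflection's own Poisson counts from the variance first, so strong reflections are not treated as if they had many more background pixels.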
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
S.S0 = |S||S0| cos(2theta), so dividing the rocking rate |m2 . (S x S0)| by it
added a spurious 1/cos(2theta) factor (1.0 at low res, 1.8x at 1 A) to the
reciprocal Lorentz, hence to the absolute/Wilson scale. Divide by |S||S0| to get
the correct zeta*sin(2theta) (times a constant 1/lambda^2 the overall scale
absorbs). The spurious factor depends on 2theta only, so it cancels within a
resolution shell and between symmetry mates -- CC1/2, CCref and R-meas are
metric-neutral -- but it corrupts the absolute scale and destabilises the
per-image B-factor and cross-resolution error model. CPU and GPU kernels.
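The effect of the two normalisations can be checked numerically. Below is a sketch with hypothetical helper names: for a reflection at 2theta = 30 deg whose rotation axis m2 is perpendicular to the scattering plane (zeta = 1), dividing by |S||S0| gives sin(2theta) = 0.5, while the old division by S.S0 gives tan(2theta), about 0.577.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 Cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
double Dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
double Norm(const Vec3& a) { return std::sqrt(Dot(a, a)); }

// Corrected: |m2 . (S x S0)| / (|S||S0|) = zeta * sin(2theta).
double RockingRate(const Vec3& m2, const Vec3& S, const Vec3& S0) {
    return std::fabs(Dot(m2, Cross(S, S0))) / (Norm(S) * Norm(S0));
}

// Former normalisation by S.S0 = |S||S0| cos(2theta): an extra 1/cos(2theta).
double RockingRateOld(const Vec3& m2, const Vec3& S, const Vec3& S0) {
    return std::fabs(Dot(m2, Cross(S, S0))) / Dot(S, S0);
}
```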
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
HDF5MetadataSource ignored /entry/instrument/detector/sensor_thickness and
sensor_material and left the DetectorSetup default (320 um, Si). Read them when
present (NXmx stores thickness in metres), so reprocessing honours the recorded
sensor -- which the parallax/absorption model needs. (The acquisition path still
records the 320 um default because DetDECTRIS sets no per-model thickness; that
is a separate, acquisition-side fix.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
JFJochBrokerParser set SensorThickness_um / SensorMaterial unconditionally from
the request's Detector model, but that model defaults them to 320 um / Si with
IsSet=false. So any start request that didn't explicitly carry the sensor
overwrote the detector-reported value (DECTRIS SIMPLON read) or the
detector-specific default with 320 um -- the "PILATUS4 ends up 320 um" symptom.
Guard both with the IsSet flag, mirroring highVoltageV just above. The
receiver -> FillMessage -> CBOR -> writer chain was already correct; the value
was simply wrong at the source.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The profile-fit width came from the full 13x13 second moment, which runs 3-8x
wider than the true spot: neighbour reflections leak into the (unmasked) learning
grid -- catastrophic at low res where spots crowd the beam -- and the far corners
(lever arm dx^2+dy^2 up to 72) add rectified background noise. Splitting the spot
moment into radial vs tangential shows the tangential width is isotropic
(mosaicity/divergence) while the radial excess is pure sensor parallax ~tan^2(2th).
- Measure the width over the signal disk r1 on the monochromatic path (inherently
excludes neighbours, caps the radial tail); keep the generous full-grid width on
the broadband/stills path (sparse spots, the centroid-undersampling floor).
- Extend the (was bandwidth-only) radial ellipse with an analytic, material-aware
parallax term c_par*tan^2(2theta), c_par = Var(z)/pixel^2 from sensor thickness +
material + energy (parallax_var_px2; Si and CdTe), plus a fixed weak-spot capture
term on the monochromatic path only.
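The parallax coefficient c_par = Var(z)/pixel^2 needs the variance of the absorption depth z. The sketch below models z as an exponential with linear attenuation coefficient mu truncated to the sensor thickness t, which gives Var(z) = 1/mu^2 - t^2/(4 sinh^2(mu t/2)). It also assumes a sensor normal to the beam, so a depth z shifts the spot radially by z*tan(2theta). mu is taken as an input because the material/energy tables (parallax_var_px2) are not reproduced, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Variance of the absorption depth for an exponential (coefficient mu)
// truncated to [0, t]; mu and t in consistent length units (e.g. 1/um, um).
double AbsorptionDepthVariance(double mu, double t) {
    assert(mu > 0.0 && t > 0.0);
    const double x = mu * t;
    if (x < 1e-2)  // series avoids cancellation: t^2 (1/12 - x^2/240)
        return t * t * (1.0 / 12.0 - x * x / 240.0);
    const double s = std::sinh(0.5 * x);
    return 1.0 / (mu * mu) - t * t / (4.0 * s * s);
}

// Radial parallax variance in px^2: c_par * tan^2(2theta), c_par = Var(z)/pixel^2.
double ParallaxVariancePx2(double mu, double thickness, double pixel, double two_theta) {
    const double c_par = AbsorptionDepthVariance(mu, thickness) / (pixel * pixel);
    const double tt = std::tan(two_theta);
    return c_par * tt * tt;
}
```

The two limits are the intuitive ones: a weakly absorbing sensor (high energy, thin Si) spreads absorption uniformly, giving t^2/12, while a strongly absorbing one (CdTe) gives about 1/mu^2.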
HEWL rotation @1.0A: ISa 13.5->15.7, CC1/2 1.12A 91.3->95.9 / 1.05A 83.0->85.2,
external CCref band 88.1->89.9, CCxds 93.4->94.8, R-meas 9.4->8.7; low/mid flat.
Sharp serial stills gain slightly from parallax; broadband stills neutral.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The profile was learned and applied on the integer pixel round(predicted), so a
shared profile sits up to 0.5 px off the true spot (and stacking spots with random
sub-pixel offsets broadens the learned profile). Build the Gaussian per reflection
instead, centred on the predicted sub-pixel offset -- noise-free geometry, unlike the
observed centroid, which hurt -- and elongated radially as before.
HEWL rotation @1.0A: ISa 15.7->16.2, CCref band 89.9->90.0, CCxds 94.8->95.0 (high-res
1.00A CCref 66.0->66.9); sharp serial stills 1.68A CC1/2 61.6->62.5; anomalous S peak
0.92x XDS (no accuracy traded). De-broadening the learned width by the 1/12 px^2/axis
integer-binning floor was tested and rejected (it over-narrows).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-shell profile width is learned in pixels, so it varies ~4x with resolution
(mostly the geometric projection of a near-constant reciprocal-space relrod) and must
be binned per shell -> it starves at high resolution / on sparse data. The new
--reciprocal-profile flag instead learns ONE global width in reciprocal space,
sigma2_q,tan = A + B|q| + C|q|^2: the Jacobian g_tan=cos(2theta) removes the geometric
projection, and C|q|^2 is the crystal mosaicity relrod (variance ~(eta|q|)^2). Applied
per reflection as sigma2_tan,px = (A + B|q| + C|q|^2)/g_tan^2 (B,C clamped >=0;
quadratic->linear->constant fallback).
Off by default. On the sharp HEWL test crystal (mosaicity 0.091deg, so C fits to ~0 and
it reduces to the validated linear form) it is metric-neutral: ISa 16.2->16.3, anomalous
0.92x unchanged, CCref band 90.0->89.9, CC1/2 a touch lower (per-shell isn't starved at
23k spots/shell, and a global fit is less flexible). So: simpler + more transferable at a
small CC1/2 cost, ISa/anomalous held. Its payoff is on MOSAIC crystals (large C|q|^2),
where per-shell starves on the wide weak high-res spots and 6 shells are too coarse; both
lyso test crystals are sharp, so it ships as a dial to try on mosaic data elsewhere.
A separate radial relrod fit was tried and dropped (no gain). See NEXTGEN_INTEGRATOR.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a space group is supplied without a reference cell, de-novo two-pass
rotation indexing fed the FFT's Niggli-reduced primitive cell straight into
XtalOptimizer as if it were the conventional cell. For non-primitive lattices
(centered I/F/R/C, or hexagonal where the primitive pair sits at gamma=60) the
conventional-system model then refined to a wrong minimum and indexed 0% of
frames: cytC (P3121) gave 103.9/103.9/78 instead of 83.7/83.7/88.6, insulin
(I213) 66.7 instead of 77.65, insulin-R3 51/51/36 instead of 81.4/81.4/33.3.
Run LatticeSearch on the FFT primitive cell (it already yields the correct
conventional cell + reindex for I/F/R/C). For the one remaining gap - a
metrically hexagonal lattice that the geometry-keyed search lands on the
ortho-hexagonal C setting - re-express the reduced primitive cell in
conventional hexagonal axes (b -> b - a opens gamma 60 -> 120).
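You can check the b -> b - a substitution with plain vector arithmetic. For |a| = |b| at gamma = 60 deg, b - a has the same length and sits at 120 deg to a. The helper names below are hypothetical, not the project's CrystalLattice API.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Hypothetical helpers for illustration only.
double Dot(const Vec3 &u, const Vec3 &v) { return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]; }
double Norm(const Vec3 &u) { return std::sqrt(Dot(u, u)); }
double AngleDeg(const Vec3 &u, const Vec3 &v) {
    return std::acos(Dot(u, v) / (Norm(u) * Norm(v))) * 180.0 / std::acos(-1.0);
}

// Re-express a reduced hexagonal primitive pair (|a| == |b|, gamma = 60 deg) in
// conventional hexagonal axes by replacing b with b - a, which opens gamma to 120 deg.
// The transformation (a, b - a, c) has determinant 1: same volume, same handedness.
void ToConventionalHexagonal(const Vec3 &a, Vec3 &b) {
    for (int i = 0; i < 3; ++i) b[i] -= a[i];
}
```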
De-novo "-S" now indexes cytC/insu/Ins_H/lyso/MyoB/EP/lyso_ref at 100% with the
correct cell; the "-C -S" path is unchanged. The helper stays in this .cpp
(g++) rather than CrystalLattice.h to avoid recompiling CUDA units, which is
broken under the box's CUDA-13 nvcc; promote it to a method once that is fixed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two offline-processing ergonomics changes.
scale-fulls is now ON by default for -P rot3d (it refits the per-frame scale on
the combined fulls and lifts ISa substantially, e.g. HEWL rot3d 7.0 -> 16.4).
--scale-fulls stays as the explicit opt-in for non-rot3d order; new
--no-scale-fulls disables it for rot3d. (scale_fulls is now an optional<bool>
defaulting to combine_3d.) Note: on low-completeness data the Unity-reference
refit can cost a little CC1/2 (endothiapepsin ~70% complete: -5% in a mid shell);
pair with --reject-outliers for the full low-symmetry benefit.
When merging (-M), the merged reflections (.mtz/.cif) are the wanted output, so
the large per-image _process.h5 is no longer written by default - it routinely
ran to hundreds of MB. Pass --write-process-h5 to also emit it. Without merging
the _process.h5 is the only output and is always written. Implemented with a
ProcessConfig.write_process_h5 flag gating the FileWriter; reflection and
image.dat writing are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an opt-in smooth absorption correction for rotation scaling. After the
rot3d fulls are scaled, --absorption[=num] fits a multiplicative surface
A(s1_crystal), expanded in a degree<=4 monomial basis (spanning the real spherical
harmonics up to l=4, as in XDS/DIALS) of the diffracted-beam direction in the
crystal/goniometer frame. The fit is a ridge-regularized log-linear least-squares
of I_scaled/I_ref weighted by (I/sigma)^2, run for num iterations (default 3); the
surface divides image_scale_corr
and the fulls are re-merged.
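To make the basis concrete, this sketch enumerates the degree<=4 monomials of the unit direction and evaluates a log-linear surface A = exp(sum_k c_k m_k). The enumeration order and the function names are assumptions, and the ridge solve itself is not shown. There are 35 monomials, but on the unit sphere they span only the 25 spherical harmonics up to l=4. The resulting rank deficiency is one reason a ridge term is helpful.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: monomials x^i y^j z^k with i + j + k <= 4 of a unit
// diffracted-beam direction s = (x, y, z), ordered by total degree.
// Because x^2 + y^2 + z^2 = 1 on the sphere, these 35 functions span only the
// 25-dimensional space of real spherical harmonics up to l = 4.
std::vector<double> MonomialBasisDeg4(const std::array<double, 3> &s) {
    std::vector<double> m;
    m.reserve(35);
    for (int deg = 0; deg <= 4; ++deg)
        for (int i = deg; i >= 0; --i)
            for (int j = deg - i; j >= 0; --j) {
                const int k = deg - i - j;
                m.push_back(std::pow(s[0], i) * std::pow(s[1], j) * std::pow(s[2], k));
            }
    return m;
}

// Log-linear surface: A(s) = exp(sum_k c_k * m_k(s)), one coefficient per monomial.
double AbsorptionSurface(const std::vector<double> &c, const std::array<double, 3> &s) {
    const std::vector<double> m = MonomialBasisDeg4(s);
    assert(c.size() == m.size());
    double log_a = 0.0;
    for (std::size_t k = 0; k < m.size(); ++k) log_a += c[k] * m[k];
    return std::exp(log_a);
}
```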
Off by default and a no-op without rot3d. On the test panel (~13 keV, thin
crystals) it is metric-neutral - fitted rms(log A) ~3-4%, ISa/CC1/2 unchanged -
because absorption is negligible there and the per-frame scale G(phi) already
absorbs the angular part. It is kept as a lever for low-energy data (e.g. 6 keV)
where absorption becomes significant. Stored as ScalingSettings::absorption_iter.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-image G/B/mosaicity fit used a plain L2 loss, so a few outlier
reflections occasionally dragged a frame into a bad optimum - a stochastic
(~15% of runs) per-frame mis-scaling that elevated R-meas and collapsed CC1/2
at low symmetry (the image-level CC1/2 half-split makes the damage look patchy
across shells, while the data is genuinely noisier). A Cauchy loss (3 sigma)
soft-downweights those outliers without a hard cut: MyoB 0/10 catastrophic
(was ~2/10), R-meas stable, and ISa improves on every test crystal
(EP0210 9.2->12.4, MyoB 12.5->14.6, lysoC 10.4->11.2, cytC 11.5->12.6),
most on low symmetry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
--smooth-g now takes a rotation range in degrees (default 5) instead of a frame
count, converted to an odd frame window from the oscillation step inside
JFJochProcess. This makes the per-frame scale-G smoothing physical and
independent of the frame slicing. (Note: G-smoothing is not the cure for the
low-symmetry stochastic collapse - the per-image scale Cauchy loss is - but a
degree-based window is the correct parameterization regardless.)
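This sketch converts the degree range to frames and applies a centred moving average to G. The text only states that the range becomes an odd frame window. The rounding rule, the edge handling and the plain moving average are assumptions, and the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical: --smooth-g range (degrees) -> odd frame window, from the oscillation step.
int SmoothWindowFrames(double range_deg, double step_deg) {
    assert(range_deg > 0.0 && step_deg > 0.0);
    long n = std::lround(range_deg / step_deg);
    if (n < 1) n = 1;
    if (n % 2 == 0) ++n;  // odd, so the window is centred on the frame
    return static_cast<int>(n);
}

// Centred moving average of the per-frame scale G; the window shrinks at the ends.
std::vector<double> SmoothG(const std::vector<double> &g, int window) {
    const int half = window / 2;
    const int n = static_cast<int>(g.size());
    std::vector<double> out(g.size());
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        int cnt = 0;
        for (int j = i - half; j <= i + half; ++j)
            if (j >= 0 && j < n) { sum += g[j]; ++cnt; }
        out[i] = sum / cnt;
    }
    return out;
}
```

With a 0.1 deg oscillation, the default 5 deg gives a 51-frame window. Applying SmoothG twice convolves the box kernel with itself, giving a triangular kernel roughly twice as wide. That is the compounding effect addressed by the later reference-path smoothing fix.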
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fit the reciprocal tangential-width model y(q)=a0+a1*t+a2*t^2 in a centered,
standardized variable t=(q-qbar)/qscale instead of the raw {1,q,q^2} monomials:
the raw normal matrix went near-singular when the strong spots span a narrow
q-range (small cell / sparse still), letting tiny per-frame jitter swing the
curvature into a wild over-wide profile. Adds IRLS (Huber) robustness, a ridge
on the curvature (sharp-crystal prior), and clamps the applied width to the
fitted q-range (no extrapolation). Stays strictly per-frame (no dataset pooling),
so it works online and for stills. Neutral on rotation data (cytC high-res CC1/2
win preserved 66.8 vs 65.6%).
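The conditioning fix can be sketched as a least-squares fit in the standardized variable t. The ridge term is added only to the curvature's diagonal entry, and evaluation is clamped to the fitted q-range. The Huber IRLS reweighting is omitted, and the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: y(q) = a0 + a1*t + a2*t^2 with t = (q - qbar) / qscale.
struct WidthFit {
    double qbar, qscale, qmin, qmax;
    std::array<double, 3> a;
    double Eval(double q) const {
        const double t = (std::clamp(q, qmin, qmax) - qbar) / qscale;  // no extrapolation
        return a[0] + a[1] * t + a[2] * t * t;
    }
};

WidthFit FitWidthModel(const std::vector<double> &q, const std::vector<double> &y, double lambda) {
    assert(q.size() == y.size() && q.size() >= 3);
    const double n = static_cast<double>(q.size());
    WidthFit f{};
    f.qmin = *std::min_element(q.begin(), q.end());
    f.qmax = *std::max_element(q.begin(), q.end());
    for (double v : q) f.qbar += v / n;
    double var = 0.0;
    for (double v : q) var += (v - f.qbar) * (v - f.qbar) / n;
    f.qscale = var > 0.0 ? std::sqrt(var) : 1.0;

    // Augmented normal equations [M | r] for the basis {1, t, t^2}.
    double M[3][4] = {};
    for (std::size_t i = 0; i < q.size(); ++i) {
        const double t = (q[i] - f.qbar) / f.qscale;
        const double phi[3] = {1.0, t, t * t};
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c) M[r][c] += phi[r] * phi[c];
            M[r][3] += phi[r] * y[i];
        }
    }
    M[2][2] += lambda;  // ridge on the curvature only: shrinks a2 towards 0

    // Gaussian elimination with partial pivoting, then back substitution.
    for (int col = 0; col < 3; ++col) {
        int piv = col;
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[piv][col])) piv = r;
        for (int c = 0; c < 4; ++c) std::swap(M[col][c], M[piv][c]);
        for (int r = col + 1; r < 3; ++r) {
            const double fct = M[r][col] / M[col][col];
            for (int c = col; c < 4; ++c) M[r][c] -= fct * M[col][c];
        }
    }
    for (int r = 2; r >= 0; --r) {
        double s = M[r][3];
        for (int c = r + 1; c < 3; ++c) s -= M[r][c] * f.a[c];
        f.a[r] = s / M[r][r];
    }
    return f;
}
```

Because t is centred and scaled to unit variance, the normal matrix stays well conditioned even when the strong spots cover only a narrow q-range.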
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-operator CC alone cannot separate a real weak symmetry operator from a false
moderate one: a noisy crystal's genuine cubic 2-folds (InsI2 I23, CC ~0.51) score
BELOW a pseudo-symmetric crystal's false 2-fold (InsH3, CC 0.64), so no CC threshold
works. Add a self-consistency test: a candidate point group is accepted only if
merging the intensities under it does not inflate R-meas beyond max_rmeas_ratio (1.5x)
times that of the most-consistent candidate. A false operator forces non-equivalent reflections
together and blows R-meas up; a real one leaves it flat.
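For reference, R-meas is the multiplicity-corrected R-merge. This sketch computes it over groups of equivalent intensities and applies the 1.5x acceptance ratio. Excluding singletons from both sums, and the helper names, are assumptions of this sketch.

```cpp
#include <cmath>
#include <vector>

// R_meas = sum_h sqrt(n_h/(n_h-1)) sum_i |I_hi - <I_h>|  /  sum_h sum_i I_hi
// Groups with a single observation carry no consistency information and are skipped.
double RMeas(const std::vector<std::vector<double>> &groups) {
    double num = 0.0, den = 0.0;
    for (const auto &g : groups) {
        if (g.size() < 2) continue;
        const double n = static_cast<double>(g.size());
        double mean = 0.0;
        for (double v : g) mean += v / n;
        double dev = 0.0;
        for (double v : g) dev += std::fabs(v - mean);
        num += std::sqrt(n / (n - 1.0)) * dev;
        for (double v : g) den += v;
    }
    return den > 0.0 ? num / den : 0.0;
}

// Hypothetical acceptance rule: keep a candidate point group only if its R-meas is at
// most max_ratio times that of the most-consistent candidate.
bool AcceptByRMeas(double rmeas_candidate, double rmeas_best, double max_ratio = 1.5) {
    return rmeas_candidate <= max_ratio * rmeas_best;
}
```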
Fixes InsH3: was over-called R32 (ISa collapsed to 2.2 from the forced merge), now
correctly R3/H3 (ISa 10.2, matching XDS). InsI2 stays I23, and lysoC/InsI3/MyoB/
EP0210/cytC/lyso_ref are unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jfjoch_process gains --detect-ice-rings, which (a) activates the existing spot-finder
ice flagging (ice spots de-prioritised in indexing) and (b) drops reflections sitting
on a hexagonal-ice powder ring from scaling/combine/merge/stats, via a new shared
IsOnIceRing() helper over ICE_RING_RES_A using the spot-finder's q half-width. Their
integrated intensity is contaminated by the strong, variable ice background, so leaving
them in mis-scales the whole frame and inflates the error model.
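The ring test itself is a band check in q = 2*pi/d. This sketch takes the ring list as a parameter instead of reproducing ICE_RING_RES_A, and the name IsOnIceRingSketch is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch: a reflection at resolution d (A) is on an ice ring if its
// q = 2*pi/d lies within q_half_width (1/A) of any ring's q.
bool IsOnIceRingSketch(double d_A, const std::vector<double> &ring_res_A, double q_half_width) {
    assert(d_A > 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    const double q = two_pi / d_A;
    for (double ring_d : ring_res_A)
        if (std::fabs(q - two_pi / ring_d) <= q_half_width) return true;
    return false;
}
```

Take the three lowest-resolution hexagonal-ice rings (approximately 3.90, 3.67 and 3.44 A). With a 0.03 half-width, a reflection at 3.72 A is flagged; with 0.02 it is not.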
On EP0117 (ice-ring crystal): de-novo space-group determination recovers from P1 to P2
and CC1/2 improves (31->37%). Off by default; a no-op without the flag. This is the
first, non-controversial step - the residual gap needs ice-aware background/integration
(follow-up).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hexagonal-ice ring FWHM measured on JUNGFRAU data (azimuthal radial background
profile) is ~0.06 1/A in q, so the exclusion half-width should be ~0.03 1/A; the previous 0.02
under-covered the strong low-res rings. On EP0117 with --detect-ice-rings this lifts
CC1/2 37->50%, and combined with --reject-outliers 3 (which down-weights the
radiation-damaged late frames) reaches ~94% (XDS 98.5%). Only active with
--detect-ice-rings, so default behaviour is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
--min-image-cc <num> is documented in percent and the per-image CC is a fraction in
[0,1], but the value was passed straight to MinCCForImage (which requires [0,1]), so
the documented usage (e.g. --min-image-cc 30) threw an uncaught JFJochException and
aborted. Divide the percent argument by 100.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the validated z-score detector (on the azint radial background profile) and
position/width estimation, for a future --detect-ice-rings=auto. Not yet implemented.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hexagonal lattice metric (two equal axes at 60/120 deg, both perpendicular to the
third) is also satisfied by its ortho-hexagonal C-centred supercell, so the
geometry-keyed LatticeSearch lands a metrically-hexagonal lattice on the C-orthorhombic
setting. The -S path already re-expresses it (HexagonalConventional) keyed on the
supplied space group; do the same de-novo, keyed on the metric of the reduced PRIMITIVE
cell (rhombohedral lattices have a rhombohedral primitive cell and are unaffected).
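The metric condition can be written as a tolerance check on the reduced cell. The tolerances and the Cell struct are illustrative, not the project's values. The sketch also assumes the unique axis has already been permuted to c.

```cpp
#include <cmath>

struct Cell { double a, b, c, alpha, beta, gamma; };  // lengths in A, angles in deg

// Hypothetical sketch: two equal axes a, b at 60 or 120 deg, both perpendicular to c.
bool IsMetricallyHexagonal(const Cell &u, double len_tol = 0.01, double ang_tol_deg = 1.0) {
    const bool equal_ab = std::fabs(u.a - u.b) <= len_tol * std::fmax(u.a, u.b);
    const bool perp_c = std::fabs(u.alpha - 90.0) <= ang_tol_deg
                     && std::fabs(u.beta - 90.0) <= ang_tol_deg;
    const bool hex_gamma = std::fabs(u.gamma - 60.0) <= ang_tol_deg
                        || std::fabs(u.gamma - 120.0) <= ang_tol_deg;
    return equal_ab && perp_c && hex_gamma;
}
```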
cytC (P3121) now indexes de-novo as hexagonal 83.8/83.8/88.6 gamma=120 and merges in
P3121 (CC1/2 99.7%, ISa 13.1), for both test crystals, instead of the orthohexagonal
C2/P1. lysoC/InsI3/InsH3 unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The R-meas-ratio over-symmetry guard (commit 7ade6d9) missed a near-perfect
pseudo-2-fold: InsH2 (XDS R3) has a false 2-fold at operator CC 0.85 whose merged
intensities still correlate well, so R-meas inflated only ~1.3x and R32 was accepted
(ISa collapsing to 3.9). The reduced chi^2 (within-orbit scatter / sigma^2) is the
right signal - false equivalents disagree by many sigma even when they correlate well.
Measured chi^2 ratio (candidate / best subgroup) across the test set: true point groups
<= 1.63 (a cubic merge of a noisy crystal, InsI3, is the worst real case), false >= 2.26
(InsH2) and 6.0 (InsH3); threshold 2.0 separates them. InsH2 now resolves to R3 (ISa 6.1
from the false R32's 3.9); InsH3/InsI2/InsI3/cytC/lysoC/MyoB/EP0210 unchanged and correct.
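The reduced chi^2 used by the guard can be sketched as the within-orbit scatter about the inverse-variance-weighted mean, divided by the degrees of freedom. The 2.0 ratio test follows. The Obs struct and the function names are hypothetical.

```cpp
#include <vector>

struct Obs { double I, sigma; };

// Hypothetical sketch: sum over orbits of (I_i - <I>_w)^2 / sigma_i^2, per degree of freedom.
double ReducedChi2(const std::vector<std::vector<Obs>> &orbits) {
    double chi2 = 0.0;
    long dof = 0;
    for (const auto &orbit : orbits) {
        if (orbit.size() < 2) continue;
        double sw = 0.0, swi = 0.0;
        for (const Obs &o : orbit) {
            const double w = 1.0 / (o.sigma * o.sigma);
            sw += w;
            swi += w * o.I;
        }
        const double mean = swi / sw;
        for (const Obs &o : orbit) {
            const double r = (o.I - mean) / o.sigma;
            chi2 += r * r;
        }
        dof += static_cast<long>(orbit.size()) - 1;
    }
    return dof > 0 ? chi2 / static_cast<double>(dof) : 0.0;
}

// Keep a candidate point group only if its reduced chi^2 is at most max_ratio times
// that of the best subgroup.
bool AcceptByChi2(double chi2_candidate, double chi2_best_subgroup, double max_ratio = 2.0) {
    return chi2_candidate <= max_ratio * chi2_best_subgroup;
}
```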
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-observation merge outlier rejection at 6 sigma is neutral-or-better across every test
crystal (CC1/2 up or flat: lysoC 99.7->99.9, MyoB 98.9->99.6, EP0210 97.6->98.8,
cytC 99.7->100.0, InsI2 99.2->99.7; R-meas slightly lower everywhere; ISa unchanged), and
it is the lever for radiation-damaged / ice data (EP0117 reaches CC1/2 ~95% with it). Make
it the rot3d default, like scale-fulls and smooth-g; --reject-outliers 0 disables it. Off
for the non-rot3d partiality models, which were not benchmarked.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The .cif (--scaling-output cif) now carries the per-shell and overall merging
statistics in the standard _reflns / _reflns_shell categories (resolution, redundancy,
completeness, <I/sigma>, R-rim/R-meas, CC1/2) plus the Diederichs asymptotic I/sigma
(ISa) as a free-text _reflns.pdbx_diffrn_ISa item (no standard CIF tag exists). The
MergeStatistics and the ISa string are threaded through WriteReflections to the mmCIF
writer; jfjoch_process and jfjoch_scale pass them. Values match the text statistics table.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MergeOnTheFly::AddImage picked each image's CC1/2 half from a shared
mt19937 drawn in call order (and before Mask), so the split depended on
iteration/thread order and on how many images were masked. The class is
mutex-guarded for concurrent "on-the-fly" use, so any parallel merge would
make CC1/2 non-reproducible - a latent race.
Assign the half as a splitmix64 hash of the image's stable index instead,
computed after Mask. The split is now reproducible run-to-run, independent
of AddImage order, parallel-safe, and decoupled from masking. Callers pass
the outcome's vector index as the image id.
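The order-independent split comes down to hashing a stable image index. Below is the standard splitmix64 mixing step, with the low bit taken as the half. Which bit jfjoch actually uses, and the function names, are assumptions.

```cpp
#include <cstdint>

// splitmix64 mixing step: a cheap, well-distributed 64-bit hash.
std::uint64_t SplitMix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical: the CC1/2 half-set (0 or 1) depends only on the image's stable index,
// not on AddImage call order, thread scheduling or how many images were masked.
int HalfSetForImage(std::uint64_t image_id) {
    return static_cast<int>(SplitMix64(image_id) & 1ULL);
}
```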
Verified: lyso_ref two-pass -M -P rot3d gives identical CC1/2 across runs
(overall 99.6%, P41212); hash split is balanced ~50/50.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional Smargon static positioner (chi/phi angles + rotation axes)
that is reconstructed into the NXmx sample transformation chain. Chi/phi are
appended at the innermost end of the chain (closest to the sample) for both
the goniometer and grid-scan branches, with axes defaulting to chi {0,0,1}
and phi to the omega default {1,0,0}.
- SmargonPosition gains chi_axis/phi_axis (common/JFJochMessages.h)
- OpenAPI: optional phi_axis/chi_axis arrays; clients regenerated
- OpenAPIConvert wires Dataset_settings.smargon -> DatasetSettings
- CBOR serializer/deserializer round-trip the axes
- tests: CBORSerialize_Start_Smargon
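Rotation matrices illustrate the chain order. Because chi and phi are innermost, a sample-frame vector is rotated by phi first, then chi, then omega. This sketch uses hypothetical names and the axis defaults from the text. NXmx's depends_on bookkeeping and its sign conventions are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rodrigues rotation matrix for an angle (rad) about a unit axis k.
Mat3 AxisAngle(const Vec3 &k, double angle) {
    assert(std::fabs(k[0] * k[0] + k[1] * k[1] + k[2] * k[2] - 1.0) < 1e-9);
    const double c = std::cos(angle), s = std::sin(angle), t = 1.0 - c;
    return {{{t * k[0] * k[0] + c, t * k[0] * k[1] - s * k[2], t * k[0] * k[2] + s * k[1]},
             {t * k[0] * k[1] + s * k[2], t * k[1] * k[1] + c, t * k[1] * k[2] - s * k[0]},
             {t * k[0] * k[2] - s * k[1], t * k[1] * k[2] + s * k[0], t * k[2] * k[2] + c}}};
}

Mat3 Mul(const Mat3 &a, const Mat3 &b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Hypothetical: sample-to-lab rotation with chi and phi at the innermost end of the chain.
Mat3 SampleToLab(double omega, double chi, double phi,
                 const Vec3 &omega_axis = {1, 0, 0},
                 const Vec3 &chi_axis = {0, 0, 1},
                 const Vec3 &phi_axis = {1, 0, 0}) {
    return Mul(AxisAngle(omega_axis, omega),
               Mul(AxisAngle(chi_axis, chi), AxisAngle(phi_axis, phi)));
}
```

With chi = 0 and the default axes, phi and omega rotate about the same axis, so their angles simply add.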
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When per-image CBOR metadata comes within 32 bytes of the buffer slot
size, slot_size - (metadata_size + 32) wrapped around (size_t), passing a
huge output_size to CompressImage. That defeated the buffer-too-small
guard and let the compressor write the full image past the end of the
slot, corrupting adjacent memory; AppendImage then threw a plain
JFJochException that aborted the whole collection after the fact.
Detect metadata_size + 32 >= slot_size explicitly and throw
CompressionBufferTooSmallException, so the existing catch drops just this
frame gracefully - the case the change was meant to handle.
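Here is the guard pattern, with std::optional standing in for throwing CompressionBufferTooSmallException. The function name is hypothetical. The key point is that the comparison happens before the unsigned subtraction.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical sketch: bytes left for the compressed image in a slot after the CBOR
// metadata and a 32-byte reserve. Checking first avoids the size_t wrap-around of
// slot_size - (metadata_size + 32) when the metadata nearly fills the slot.
std::optional<std::size_t> AvailableImageBytes(std::size_t slot_size, std::size_t metadata_size) {
    constexpr std::size_t reserve = 32;
    if (metadata_size + reserve >= slot_size) return std::nullopt;  // caller drops this frame
    return slot_size - (metadata_size + reserve);
}
```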
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SmoothImageScaleG rewrites the partials in place (image_scale_corr and
image_scale_g). On the no-reference path that is harmless: each scaling
pass recomputes G from scratch via ScaleAllImages, so smoothing always
runs on freshly-refined G. On the reference path the scaling loop is
skipped, so G is computed once and stays; running scale_and_merge twice
(P1 then the adopted space group) smoothed the already-smoothed G a
second time, compounding into a ~2x wider effective kernel than the
configured --smooth-g and biasing the merged intensities.
Smooth only on the first pass of the reference path (G is unchanged
afterwards, and the smoothed partials persist into the second pass's
combine3D). The no-reference path is unchanged.
Verified on lyso (600 frames, -P rot3d -z ref.mtz -M): the reference run
now logs the smoothing once instead of twice, and the merged MTZ changes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
De-novo two-pass indexing failed on large-cell / superstructure / modulated
crystals (EcwtAL500 0%, EcwtCQ034 0%) and mis-handled a pseudo-symmetric one
(EcwtCQ066 14%). The common cause: the choice of unit cell was made too early,
on the raw pre-refinement spot fraction, which is an unreliable discriminator
(a correct hexagonal cell indexes only ~13% of the un-refined accumulated
spots, while a wrong larger cell can index more).
Move the decision to after geometry refinement, where it is reliable:
- FFTIndexer now OFFERS widened candidate cells instead of deciding. ReduceResults
gains a `widen` flag: the standard path (9 shortest vectors) is unchanged; the
widened path anchors the two short axes and lets the third range over all
filtered vectors (+dedup) to reach the long axis of an elongated cell.
FilterFFTResults takes the peak count as a parameter (30 standard, 60 widened).
RunInternal appends widened candidates only when its standard best indexes
poorly, so compact crystals are untouched.
- RotationIndexer fully refines the top few candidates and keeps the one that
indexes the most spots under its own refined geometry (IndexedFraction). Each
refine is length-bounded (1.2x the found cell) so a free triclinic refine cannot
drift onto a pseudo-translation / modulation supercell (CQ034's satellites). The
earlier (primary) candidate is preferred: a later one is adopted only if it
indexes clearly more and reasonably well in absolute terms, so a twin's noisy
near-tie cannot displace it. Extra/twin lattices are only searched when the
chosen cell is the FFT primary (lattice[0]), since MultiLatticeSearch's
rotations are derived from that primary.
- The pseudo-symmetry guard (de-novo only - a user-fixed space group is always
honored) is a ratio of refined fractions: refine the primitive as triclinic and
drop to P only if the constrained cell indexes less than half of it. A false
promotion indexes badly under its constraint (CQ066 ratio ~0.1) while genuine
higher symmetry, including R-centred, indexes comparably (Ins_H R3 ratio ~0.7)
and is kept.
Validated on the full /data/rotation_test battery: AL500 0->89% (C2), CQ034
0->99%, CQ066 14->93% (ISa 7.2->13.7); the other 15 crystals keep their exact
cell, space group, indexing rate and ISa (no regressions).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a row after the Indexing solution that flags images with more than
one indexed lattice (indexing_lattice_count > 1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The THIRD_PARTY_NOTICES.md manifest lived only at the repo root and was
referenced from docs/SOFTWARE.md via a ../ link that escapes the Sphinx
source tree, so it never rendered in the published docs and was absent
from the navigation.
Add docs/THIRD_PARTY_NOTICES.md to the General toctree and fix the
SOFTWARE.md link. The docs page is generated from the canonical root file
by update_version.sh (like the python-client docs): licenses/*.txt links
point at the repository, and the project-license links point at the
in-docs LICENSE / FPGA_LICENSE pages.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two Windows jobs (build-windows / build-windows-nocuda) differed only in
-DJFJOCH_USE_CUDA=ON/OFF, so collapse them into one matrixed job (variant
cuda/nocuda), mirroring the build-rpm matrix.
On a tag, upload the NSIS installer (jfjoch-<version>-win64-{cuda<major>|cpu}.exe,
named in CMakeLists.txt) to the release via gitea_upload_file.py, the same helper
the RPM/DEB nocuda variants use to attach viewer/writer artifacts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Windows viewer runner has Python but not the 'requests' package, and does
not necessarily have bash. So:
- rewrite gitea_upload_file.py to use only the Python stdlib (urllib), which
works with a bare interpreter on both the Linux package runners and Windows;
also drop the file's unused create_release() (gitea_create_release.py owns that);
- run the Windows 'Upload installer to release' step in PowerShell (always present)
instead of bash, globbing the NSIS .exe with Get-ChildItem.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Default reflection format is now mmCIF (ScalingSettings.h, the single source of
truth). jfjoch_process and jfjoch_scale no longer hard-code a local MTZ default;
they override only when --scaling-output is given. jfjoch_viewer inherits this.
- A dataset with a rotation goniometer axis is processed as rotation data (two-pass
indexing) by default; add --process-as-stills to force per-frame stills. -R still
tunes the first-pass image count.
- rot3d is the default partiality model for rotation processing (fixed for stills)
when no explicit -P is given.
- Update docs/JFJOCH_PROCESS.md: new de-novo rotation/stills examples, corrected
-R/-P/--scaling-output defaults, --process-as-stills, and a real Integration table.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
An 11-crystal mosaicity-stratified re-test (/data/rotation_test, off vs on
vs a de-contaminated variant, plus a per-frame dump of the fitted widths)
showed the dial is net-negative and cannot work in the per-frame paradigm:
- The C|q|^2 mosaicity term - the whole point - is unfittable per-frame: the
fitted curvature a2 comes out ~0 (often negative) on every crystal, with zero
correlation to the XDS mosaicity (0.09..0.42 deg). Strong spots sit at low q
where eta^2 q^2 is invisible; the curvature only appears at high q where there
are ~0 strong spots. The law degenerates to a straight line.
- With a2~0 the high-res width becomes a blind 1/cos^2(2theta) extrapolation,
2-4x wider than per-shell. The per-shell path's high-res "starvation" (flat
narrow fallback) is accidentally correct: weak, crowded high-res spots want a
narrow aperture, not the true wide spot shape.
- The over-wide profile pulls background into weak spots -> R-meas rises, CC1/2
drops in reliable high-multiplicity shells (pding4_001, pding4_003, MyoB,
EcwtCQ066). A cap at the widest well-sampled per-shell width recovers the
regression, confirming over-widening is the harm. No crystal reliably wins;
the apparent overall-CC gains were all in noise shells (mult 2-3, CC<20%).
Delete the CLI flag, the BraggIntegrationSettings::reciprocal_profile setting,
and the per-frame fit block. Default (per-shell) integration is byte-identical.
NEXTGEN_INTEGRATOR.md records the finding as a dead-end for posterity.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The master file's detectorSpecific/detect_ice_rings key is read back by
HDF5MetadataSource onto the experiment, so a dataset collected with ice-ring
detection on was reprocessed with exclusion forced on (dropping ~9% of
reflections, gutting low/mid-res completeness on clean crystals) and
jfjoch_process had no way to turn it off. Make --detect-ice-rings an
optional<bool> taking =on|off (bare = on, back-compatible) and apply it
after the dataset value is copied in, so an explicit CLI choice wins.
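This sketches the tri-state flag. nullopt means "not given", so the dataset value survives. A bare flag or =on means true, and =off means false. The parsing helper and its signature are hypothetical, and the actual CLI plumbing is not shown.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical: absent -> nullopt; bare flag -> on (back-compatible); =on / =off explicit.
std::optional<bool> ParseOnOff(const std::optional<std::string> &arg_value, bool flag_present) {
    if (!flag_present) return std::nullopt;
    if (!arg_value || *arg_value == "on") return true;
    if (*arg_value == "off") return false;
    throw std::invalid_argument("expected on|off, got '" + *arg_value + "'");
}

// Applied after the dataset (master-file) value is copied in: an explicit CLI choice
// wins, an absent flag leaves the dataset setting untouched.
bool ResolveDetectIceRings(bool dataset_value, std::optional<bool> cli_value) {
    return cli_value.value_or(dataset_value);
}
```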
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
In the default rotation (rot3d) path only G is refined - B is fixed, mosaicity
is pinned and the wedge is not refined - so the predicted intensity G*coeff is
linear in G and the robust (Cauchy) per-image scale is a 1-D M-estimate. Solve
it directly by iteratively reweighted least squares (a few closed-form weighted
ratios) instead of building a Ceres problem per image. Ceres is kept for the
cases that are genuinely nonlinear: refining the B-factor or the rotation wedge.
Same Cauchy objective as the Ceres path, but ~4x faster at scaling and ~30%
faster overall on the /data/rotation_test battery, with space group, cell, ISa,
completeness and CC1/2 matching across all 18 crystals (the two that look
different, EP_cs_01-17 and EcwtAL500, are run-to-run unstable for both solvers).
lyso_ref scaling 25.2->4.3s, cytC_2 15.2->2.6s, battery total 468->316s.
Also drop the per-image G/B regularizers (gated by GetScalingRegularize, which
nothing enables) and the now-unused RegularizationResidual.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
--detect-ice-rings previously dropped ice-ring reflections outright before
scaling, which also removed them from the merge and gutted low/mid-resolution
completeness on crystals that merge them fine (e.g. lysoC fell to 80%, with the ice
bands showing as jagged shell completeness). Instead flag them (new Reflection.on_ice_ring,
set in JFJochProcess, carried through the 3D combine) and exclude the flag only
where a model is fit: the per-image scale (ScaleOnTheFly::Accept + the per-image
CC), and - for the de-novo P1 pass - the space-group search and error model
(MergeOnTheFly::ExcludeIceRings, enabled when for_search is set). The final in-symmetry merge
and its statistics keep them, so completeness is preserved.
/data/rotation_test battery vs the previous drop-from-merge behaviour: space
group correct on all 18; completeness recovered broadly with CC1/2 and ISa held
(cytC_2 82->99.7%, cytC_3 73->99.7%, InsI3 76->99.5%, lysoC 80->99.7%, MyoB
80->99.7%, InsH3 78->99%). Excluding ice from the P1 search merge is what keeps
the space group correct: without it InsI3 flipped I23->P1 and EP_cs_01-17 P2->P1.
Known limitation: on heavy-ice crystals (EP_cs_01-17) the strong ice is garbage
and keeping it in the final merge collapses CC1/2 in the ice shells (91.7->6.9%).
Distinguishing strong vs weak/absent rings per crystal needs data-driven,
per-ring ice detection (azimuthal radial profile) - the planned next step.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
BraggPredictionRot halves settings.wedge_deg to the +/- half-wedge of the
partiality erf pair, but IndexAndRefine already passed GetWedge_deg()/2, so the
two /2 compounded to a half-wedge of increment/4 - half the correct Kabsch value
(increment/2, which ScaleOnTheFly's RotationPartiality already used). Pass the
full increment so prediction's partiality matches scaling.
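The half-wedge issue is easiest to see in the erf-pair partiality. This sketch assumes a Gaussian rocking curve of width sigma and ignores the Lorentz/zeta factor; the function name is hypothetical. With the half-wedge equal to increment/2, one reflection's partialities over contiguous frames telescope to 1. Halving a second time leaves gaps between frames.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical Kabsch-style partiality: fraction of a Gaussian rocking curve (centre
// phi_c, width sigma) recorded by a frame spanning [phi_frame - h, phi_frame + h].
double RotationPartiality(double phi_c, double phi_frame, double increment, double sigma) {
    assert(increment > 0.0 && sigma > 0.0);
    const double h = increment / 2.0;  // the full increment is passed in; halve exactly once
    const double d = phi_frame - phi_c;
    const double s = std::sqrt(2.0) * sigma;
    return 0.5 * (std::erf((d + h) / s) - std::erf((d - h) / s));
}
```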
With prediction correct, ScaleOnTheFly now uses the stored r.partiality directly
(the value the reflection was integrated with) rather than recomputing the erf
pair per reflection - recomputing only when scaling overrides the geometry
(-w wedge refinement, --mosaicity, or a scaling wedge override). Output-neutral
on the /data/rotation_test battery (SG/cell/completeness identical, ISa/CC1/2
within run noise on the stable crystals).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
An ice-contaminated ring decorrelates its reflections, so its merged half-set
CC1/2 falls well below the Bragg trend of its resolution shoulders (e.g.
EP_cs_01-17 2.25A ring CC1/2 0.12 vs shoulders 0.89). After the final merge,
compute per-hexagonal-ice-ring CC1/2 in the ring band (+/- ice width in q=2pi/d)
against the shoulders either side, mask any ring more than 0.05 below its
shoulders (reliable shoulder CC1/2 > 0.5, >=20 reflections each) and re-merge
without them. Weak/absent rings track their neighbours and stay, so clean
crystals are untouched.
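This sketches the per-ring decision, with a plain Pearson CC for the half-set correlation. The text does not say how the two shoulders are combined, so comparing against their mean is an assumption, as are all the names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between the two half-set mean intensity vectors (CC1/2).
double PearsonCC(const std::vector<double> &x, const std::vector<double> &y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i] / n; my += y[i] / n; }
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return (sxx > 0.0 && syy > 0.0) ? sxy / std::sqrt(sxx * syy) : 0.0;
}

struct BandCC { double cc; int n_refl; };

// Hypothetical rule: both shoulders must be reliable (CC1/2 > 0.5, >= 20 reflections);
// mask the ring if its CC1/2 is more than 0.05 below the shoulders' mean.
bool MaskIceRing(const BandCC &ring, const BandCC &low_shoulder, const BandCC &high_shoulder) {
    auto reliable = [](const BandCC &s) { return s.cc > 0.5 && s.n_refl >= 20; };
    if (!reliable(low_shoulder) || !reliable(high_shoulder)) return false;
    const double shoulder = 0.5 * (low_shoulder.cc + high_shoulder.cc);
    return ring.cc < shoulder - 0.05;
}
```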
Data-driven and robust on /data/rotation_test: 16/18 crystals mask nothing and
are unchanged (every clean crystal, plus icy-but-fine EP_cs_02-10); EP_cs_01-17
masks 6 rings, CC1/2 6.9% -> 28%; CQ066/pding4_003 mask one marginal ring each.
A well-scaled CC1/2 test, replacing the fragile per-image azimuthal cutoffs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A single per-image ice_ring_score - the strongest hexagonal-ice ring band/shoulder
intensity ratio from the azimuthal profile (1 = no ice) - computed in the CPU and
FPGA analysis paths and the offline azint worker, then plumbed through every layer:
DataMessage/EndMessage, CBOR (frame_serialize), HDF5 (/entry/MX/iceRingScore),
ScanResult, receiver plots (PlotType::IceRingScore), the OpenAPI spec (plot_type +
scan_result schema, with regenerated broker/gen and frontend client) and
OpenAPIConvert, the reader + Qt viewer, and the React frontend plot. Documented in
docs/CBOR.md, docs/HDF5.md and docs/CPU_DATA_ANALYSIS.md, with the general
"add a per-image quantity" recipe added to CLAUDE.md.
Verified in HDF5: lysoC (weak ice) mean 1.23, EP_cs_01-17 (heavy ice) mean 1.67 /
max 2.23. This is a monitoring quantity - it does not gate scaling (which already
excludes all ice rings) or merging (handled by the CC1/2 ring mask).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The per-image ice_ring_score was written to the data files (HDF5DataFilePluginMX)
but the master-file write and the stored-file read path were missing, so opening a
processed HDF5 (e.g. in jfjoch_viewer) would not surface it. Mirror bkg_estimate:
write /entry/MX/iceRingScore in the NXmx master (HDF5NXmx, from EndMessage), and
read it back in HDF5MetadataSource into the dataset (master + data-file reads) and
per image into the message. Verified the write; the read is a byte-for-byte mirror
of the working bkgEstimate round-trip.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The score's baseline was two adjacent shoulder bins with a bin-overlap bug - the
ring's edge bins were counted in both the ring and the shoulder, since
GetMeanValueOfBins is inclusive. At the typical (coarse) azint binning (dq ~ 0.05
in q, wider than the 0.03 ring half-width) a shoulder is only ~1 bin, so the ratio
was noisy and poorly separated. Replace it with the ring intensity over a smooth
whole-profile background: a running median of the non-ice bins, interpolated under
each ring.
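This sketches the background-ratio score on a 1-D azimuthal profile. For simplicity, the running median is evaluated directly at the ring bins from the surrounding non-ice bins, which takes the place of the interpolation step. The ring bin ranges, the window size and the names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical: per-bin background = median of the non-ice bins within +/- half_window.
std::vector<double> RunningMedianBackground(const std::vector<double> &profile,
                                            const std::vector<bool> &is_ice, int half_window) {
    assert(profile.size() == is_ice.size());
    const int n = static_cast<int>(profile.size());
    std::vector<double> bkg(profile.size(), 0.0);
    std::vector<double> win;
    for (int i = 0; i < n; ++i) {
        win.clear();
        for (int j = std::max(0, i - half_window); j <= std::min(n - 1, i + half_window); ++j)
            if (!is_ice[j]) win.push_back(profile[j]);
        if (win.empty()) continue;  // no clean bins nearby: leave 0, ring skipped below
        auto mid = win.begin() + win.size() / 2;
        std::nth_element(win.begin(), mid, win.end());
        bkg[i] = *mid;  // upper median for even counts
    }
    return bkg;
}

// Score = max over rings of sum(profile) / sum(background) over the ring's bins
// (inclusive [first, last] ranges); about 1 when there is no ice.
double IceRingScore(const std::vector<double> &profile, const std::vector<bool> &is_ice,
                    const std::vector<std::pair<int, int>> &rings, int half_window) {
    const std::vector<double> bkg = RunningMedianBackground(profile, is_ice, half_window);
    double score = 0.0;
    for (const auto &[lo, hi] : rings) {
        double sig = 0.0, bk = 0.0;
        for (int i = lo; i <= hi; ++i) { sig += profile[i]; bk += bkg[i]; }
        if (bk > 0.0) score = std::max(score, sig / bk);
    }
    return score;
}
```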
Clean crystals now sit at ~1.0 and ice separates far more cleanly on
/data/rotation_test: cytC 1.06->1.03, lysoC 1.23->2.77, EP_cs_01-17 1.67->4.51
(max 11.4). A z-score / abnormality probability was tried but is uninformative
here - with many photons any real ice ring is highly significant, so the useful
discriminator is the ice magnitude (this ratio), noted in CPU_DATA_ANALYSIS.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add -q/--azim-q-spacing, --azim-min-q, --azim-max-q, --azim-phi-bins (mirroring
jfjoch_azint) so offline processing can set the radial binning, applied before
the azint mapping is built. Set the AzimuthalIntegrationSettings default spacing
to 0.01 1/A (was 0.05): the coarse default barely resolved the narrow ice rings,
diluting the ice-ring score. Finer binning sharpens it a lot with no effect on
processing - EP_cs_01-17 ice score 4.6->7.3 (max 11->23), clean cytC stays ~1.0,
and space group / cell / ISa / completeness are unchanged (cytC, InsI3, MyoB,
pding4_001 verified full-image). Documented in JFJOCH_PROCESS.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reimplement BraggIntegrate2D (box sum) and ProfileIntegrate2D (Kabsch
profile fit) under one roof as a base + CPU + GPU engine, mirroring the
AzIntEngine / ROIIntegration pattern. Reads the preprocessed int32
ImagePreprocessorBuffer (masked=INT32_MIN, saturated=INT32_MAX), the same
buffer AzIntEngineGPU/ROIIntegrationGPU consume.
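Here is a CPU sketch of a box sum over such a buffer that honours the two sentinels. The struct, the constant per-pixel background and the treatment of out-of-image pixels as masked are assumptions, not the engine's interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct BoxSumResult {
    double intensity = 0.0;  // sum(counts) - n_pixels * background
    int n_pixels = 0;        // usable pixels in the box
    int n_masked = 0;        // INT32_MIN pixels plus pixels outside the image
    bool saturated = false;  // any INT32_MAX pixel in the box
};

// Hypothetical box sum centred on pixel (cx, cy) with edge 2*half_box + 1.
BoxSumResult BoxSum(const std::vector<std::int32_t> &img, int width, int height,
                    int cx, int cy, int half_box, double background) {
    BoxSumResult r;
    double sum = 0.0;
    for (int y = cy - half_box; y <= cy + half_box; ++y)
        for (int x = cx - half_box; x <= cx + half_box; ++x) {
            if (x < 0 || y < 0 || x >= width || y >= height) { ++r.n_masked; continue; }
            const std::int32_t v = img[static_cast<std::size_t>(y) * width + x];
            if (v == INT32_MIN) { ++r.n_masked; continue; }  // masked: skip
            if (v == INT32_MAX) r.saturated = true;          // saturated: flag
            sum += v;
            ++r.n_pixels;
        }
    r.intensity = sum - r.n_pixels * background;
    return r;
}
```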
The CUDA engine runs one block per reflection with shared-memory
reductions across six kernels (reset, mask, box-sum, profile learning,
profile build, Kabsch fit); the resolution shell is computed inline. The
learning/fit hot path is single precision (FP64 is throttled on consumer
GPUs; reproduces the double CPU path to ~1e-4). Collapsing the per-frame
CUDA API calls into one reset kernel keeps launch-latency overhead low.
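Below is a single-precision sketch of a profile-fit estimate in the spirit of the Kabsch fit. It iterates the closed-form weighted least-squares update with Poisson-like variances. The variance floor, the starting value and the iteration count are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: minimise sum_i (c_i - b_i - I p_i)^2 / v_i with v_i = b_i + I p_i,
// iterating I = sum(p (c - b) / v) / sum(p^2 / v). p is a normalized profile over the box,
// b the per-pixel background.
float ProfileFitIntensity(const std::vector<float> &counts, const std::vector<float> &p,
                          const std::vector<float> &b, int iterations = 5) {
    assert(counts.size() == p.size() && p.size() == b.size());
    float I = 0.0f;
    for (std::size_t i = 0; i < counts.size(); ++i) I += counts[i] - b[i];  // box-sum start
    for (int it = 0; it < iterations; ++it) {
        float num = 0.0f, den = 0.0f;
        for (std::size_t i = 0; i < counts.size(); ++i) {
            float v = b[i] + (I > 0.0f ? I * p[i] : 0.0f);
            if (v < 1.0f) v = 1.0f;  // floor the variance
            num += p[i] * (counts[i] - b[i]) / v;
            den += p[i] * p[i] / v;
        }
        if (den > 0.0f) I = num / den;
    }
    return I;
}
```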
Standalone for now: NOT wired into IndexAndRefine. See
BRAGG_INTEGRATION_ENGINE.md for the design and the binding steps.
BraggIntegrationEngineGPUTest checks GPU == CPU across all three modes
(box/gaussian/empirical) within numeric tolerance, plus a [bragg_bench]
perf sweep.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two per-object scan_result API tweaks (C++ internal names unchanged):
- Rename the per-image scan_result field ice_ring_score -> ice (it is
serialized once per image, so keep it short). Only the OpenAPI property and
its regenerated C++/TS clients change; the DataMessage/EndMessage/CBOR/HDF5
field stays ice_ring_score, and the ice_ring_score plot_type enum is untouched.
- Add a global rotation_bravais string (crystal-system letter + centering, e.g.
"tP", "cF", "hR") to scan_result, alongside rotation_unit_cell /
rotation_crystal_lattice. It comes from the RotationIndexer result
(LatticeSearchResult system+centering) via the receiver, formatted by a new
BravaisSymbol() helper in ScanResult.h.
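This sketches the formatting using the conventional Bravais lattice symbols (aP, mP, mC, oP, oC, oI, oF, tP, tI, hP, hR, cP, cI, cF). The enum and the function name are hypothetical stand-ins for LatticeSearchResult's system and centering.

```cpp
#include <string>

enum class CrystalSystem { Triclinic, Monoclinic, Orthorhombic, Tetragonal, Trigonal, Hexagonal, Cubic };

// Hypothetical: crystal-family letter + centering letter; trigonal and hexagonal share 'h'.
std::string BravaisSymbolSketch(CrystalSystem system, char centering) {
    char family = 'a';
    switch (system) {
        case CrystalSystem::Triclinic:    family = 'a'; break;
        case CrystalSystem::Monoclinic:   family = 'm'; break;
        case CrystalSystem::Orthorhombic: family = 'o'; break;
        case CrystalSystem::Tetragonal:   family = 't'; break;
        case CrystalSystem::Trigonal:
        case CrystalSystem::Hexagonal:    family = 'h'; break;
        case CrystalSystem::Cubic:        family = 'c'; break;
    }
    return std::string{family, centering};
}
```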
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The rot3d post-pass (scale -> smooth-G -> 3D combine -> scale-fulls -> merge ->
error model -> stats) dominated offline wall clock because ScaleOnTheFly + MergeAll
+ CombineRotationObservations + MergeOnTheFly rebuild a std::map keyed by hkl on every
scaling iteration and every merge (7-14 map rebuilds per space-group pass), each re-walking
and re-keying millions of per-frame partials.
RotationScaleMerge ingests the per-frame partials ONCE into flat vectors and reuses them
across both space-group passes: the raw-hkl ordering is sorted a single time at ingest, so
the per-pass 3D combine only re-splits events (no sort) and the ASU grouping is one gemmi
reduction per distinct raw hkl (~10x fewer) rather than per observation, reusing that order.
Every hot step is a flat loop (segmented reduction + per-frame robust IRLS + parallel
per-run combine) that also maps directly onto CUDA kernels. CC1/2 and the per-image CC are
computed once at the end, not every iteration.
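The sort-once layout can be sketched with parallel arrays and a packed hkl key. There is one sort at ingest, and after that every pass is a segmented reduction over runs of equal keys. The packing scheme and the names are assumptions, and the gemmi ASU reduction is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

struct FlatObs {
    std::vector<std::int64_t> key;  // packed (h, k, l)
    std::vector<double> I;          // per-observation intensity
};

// Hypothetical packing, assuming |h|, |k|, |l| < 2^20.
std::int64_t PackHKL(int h, int k, int l) {
    const std::int64_t off = 1 << 20;
    return ((h + off) << 42) | ((k + off) << 21) | (l + off);
}

// The single sort at ingest: reorder all parallel arrays by key.
void SortOnce(FlatObs &o) {
    assert(o.key.size() == o.I.size());
    std::vector<std::size_t> idx(o.key.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return o.key[a] < o.key[b]; });
    FlatObs s;
    for (std::size_t i : idx) { s.key.push_back(o.key[i]); s.I.push_back(o.I[i]); }
    o = std::move(s);
}

// Segmented reduction: one mean per run of equal keys, no map rebuild.
std::vector<double> SegmentedMean(const FlatObs &o) {
    std::vector<double> out;
    for (std::size_t i = 0; i < o.key.size();) {
        std::size_t j = i;
        double sum = 0.0;
        while (j < o.key.size() && o.key[j] == o.key[i]) sum += o.I[j++];
        out.push_back(sum / static_cast<double>(j - i));
        i = j;
    }
    return out;
}
```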
It is a distinct path from ScaleOnTheFly, used only for the self-scaling rotation case
(Rotation partiality + combine3d, per-image G, no B refinement, no external reference, no
absorption surface, no wedge/mosaicity override). Stills, B-factor refinement, reference
scaling and the absorption surface stay on the classic path.
Numerically equivalent to the classic path (same robust per-frame G, same 3D combine, same
XDS-order scale-fulls, same global error model, same merge statistics), validated on the
18-crystal /data/rotation_test battery: 16/18 bit-identical in space group / ISa / CC1/2;
the 2 differing crystals are ice-heavy / marginal ones on which the classic path is equally
non-deterministic run-to-run (a pre-existing upstream integration race). Scale/merge wall
time drops ~3.4x (median 9.8 -> 4.1 s), making higher --scaling-iterations cheap.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-observation corr update (7.6M items) ran through a work-stealing ParallelFor that
does one atomic fetch_add PER item - pure contention for trivial work (measured: update 0.60s
vs reduce 0.15s / fit 0.13s in the scale-partials loop). Add ParallelChunks (one contiguous
range per worker, no per-item sync) and use it for UpdateCorr, and parallelise the ASU keying
(gemmi reduction per distinct raw hkl - HKLKeyGenerator is const, safe to read concurrently)
and the group-stamping over disjoint raw-hkl runs.
scale-partials 0.90 -> 0.28s, group-hkl 0.20 -> 0.09s, per-pass warm 0.83s, whole scale/merge
phase ~3.3 -> ~2.0s. Bit-identical output (same space group, ISa, CC1/2). ParallelChunks is
the CPU stand-in for a flat CUDA grid-stride kernel; ParallelFor stays for the heavy, uneven
per-frame fits where the atomic amortises and work-stealing balances the load.
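Here is a sketch of the ParallelChunks idea using plain std::thread. The name ParallelChunksSketch and its signature are hypothetical. Each worker receives one contiguous [begin, end) range up front, so there is no shared per-item counter and the per-item cost is just the loop body. This suits cheap, uniform per-item work like the corr update. For the uneven per-frame fits, a work-stealing loop balances better, as noted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One contiguous [begin, end) range per worker, assigned before the threads start:
// no atomic fetch_add per item. body(i) must be safe to call concurrently for distinct i.
template <typename F>
void ParallelChunksSketch(std::size_t n, unsigned nthreads, F &&body) {
    nthreads = std::max(1u, nthreads);
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end)
            break;
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i)
                body(i);
        });
    }
    for (auto &w : workers)
        w.join();
}
```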
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prediction applied a mosaicity/profile-radius moving average (RotationParameters) over the
last N *processed* frames. Under the parallel per-image loop that window is thread-arrival
order, so the smoothed value - and hence which reflections are predicted/integrated - was
non-deterministic run-to-run, swinging CC1/2 (and even the space group) on marginal crystals.
`-N 1` was deterministic; `-N 32` was not.
Fix (as designed with FL): prediction now uses each frame's OWN mosaicity/profile-radius
(image-local, deterministic membership - a reflection on the cutoff contributes ~nothing).
The smoothing that actually matters is moved into RotationScaleMerge and done in FRAME order
(deterministic): per-frame mosaicity is smoothed with the same window as smooth-G, then every
partial's partiality is recomputed from it BEFORE the 3D combine. This is the mosaicity analogue
of smooth-G: combining a reflection's per-frame partials only tiles the rocking curve correctly
(captured fractions summing toward 1) if neighbouring frames share a consistent mosaicity.
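Below is a sketch of deterministic frame-order smoothing. It assumes a centered box window of +/- half_window frames; the text above only says the window matches smooth-G, so the exact shape is an assumption. Each output depends only on neighbouring frame numbers, so the result is the same however the per-image loop was scheduled. The subsequent partiality recompute is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Centered moving average in FRAME order: frame f uses frames [f - half_window, f + half_window],
// independent of thread arrival order. Non-finite entries (frames without an estimate) are skipped.
std::vector<double> SmoothInFrameOrder(const std::vector<double> &per_frame,
                                       std::size_t half_window) {
    const std::size_t n = per_frame.size();
    std::vector<double> out(n, std::numeric_limits<double>::quiet_NaN());
    for (std::size_t f = 0; f < n; ++f) {
        const std::size_t lo = (f >= half_window) ? f - half_window : 0;
        const std::size_t hi = std::min(n - 1, f + half_window);
        double sum = 0.0;
        std::size_t cnt = 0;
        for (std::size_t j = lo; j <= hi; ++j) {
            if (std::isfinite(per_frame[j])) {
                sum += per_frame[j];
                ++cnt;
            }
        }
        if (cnt > 0)
            out[f] = sum / static_cast<double>(cnt);
    }
    return out;
}
```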
Battery (18 crystals, /data/rotation_test, 2 runs each): 15/18 now bit-identical run-to-run
(the good crystals unchanged - lyso P41212 ISa 7.8 CC1/2 99.7%). The 3 residual crystals
(EcwtAL500, EcwtCQ066S, pding4_003 - all large/triclinic cells) still jitter ~0.002%, traced
to a SEPARATE, benign cause: the GPU prediction buffer overflow (BraggPredictionRotGPU
max_reflections=10000 with a racy atomicAdd/atomicSub) on dense frames - cell/space group stay
stable; to be addressed in the GPU prediction/integration rework (naively raising the cap also
changes prediction quality, so it is not a one-line bump). Minor label refinements from the
recomputed partiality: cytC_2 P321 -> P3121 (now consistent with cytC_3), Ins_I_2/3 report the
honest I23/I213 screw-axis ambiguity.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
First stage of moving the rotation scale/merge onto the GPU. The per-frame partial-scaling loop
(inverse-variance group-mean reduction -> robust per-frame IRLS G -> corr update, x scaling_iter)
now runs in RotationScaleMergeGPU (.cu) when a GPU is present; the CPU loops remain the fallback.
The host keeps the one-time raw-hkl sort and the per-space-group gemmi ASU keying, and hands the
GPU a group-ordered permutation + CSR so the per-group reduction is a DETERMINISTIC segmented
reduction (one thread per group, fixed order, no atomics) - preserving the run-to-run determinism
just won on the CPU path (a float atomicAdd reduction would have re-introduced jitter). Reduction is
one-thread-per-group (groups average tens of obs, so a block-per-group wastes threads); the IRLS is
one block per frame with a deterministic shared-memory reduction.
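The following is a host-side sketch of the group CSR and the deterministic segmented reduction described above. All names are hypothetical, and the loop over groups stands in for "one thread per group". The CSR is built with a stable counting sort. Each group's inverse-variance mean is then accumulated in a fixed order, so the result does not depend on scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct GroupCSR {
    std::vector<std::uint32_t> offset;  // n_groups + 1 entries
    std::vector<std::uint32_t> perm;    // observation indices, grouped by group id
};

// Stable counting sort by group id: O(n), no atomics; within a group the
// observations stay in ascending index order, so the layout is reproducible.
GroupCSR BuildGroupCSR(const std::vector<std::uint32_t> &group, std::uint32_t n_groups) {
    GroupCSR csr;
    csr.offset.assign(n_groups + 1, 0);
    for (std::uint32_t g : group)
        ++csr.offset[g + 1];
    for (std::uint32_t g = 0; g < n_groups; ++g)
        csr.offset[g + 1] += csr.offset[g];
    std::vector<std::uint32_t> cursor(csr.offset.begin(), csr.offset.end() - 1);
    csr.perm.resize(group.size());
    for (std::size_t i = 0; i < group.size(); ++i)
        csr.perm[cursor[group[i]]++] = static_cast<std::uint32_t>(i);
    return csr;
}

// One "thread" per group: each group's sum runs in a fixed order, so the floating-point
// result is identical run to run (a float atomicAdd would sum in arrival order instead).
std::vector<double> GroupMeans(const GroupCSR &csr,
                               const std::vector<float> &I,
                               const std::vector<float> &sigma) {
    std::vector<double> mean(csr.offset.size() - 1, 0.0);
    for (std::size_t g = 0; g + 1 < csr.offset.size(); ++g) {
        double sw = 0.0, swi = 0.0;
        for (std::uint32_t k = csr.offset[g]; k < csr.offset[g + 1]; ++k) {
            const std::uint32_t i = csr.perm[k];
            const double w = 1.0 / (double(sigma[i]) * sigma[i]);
            sw += w;
            swi += w * I[i];
        }
        if (sw > 0.0)
            mean[g] = swi / sw;
    }
    return mean;
}
```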
Validated: bit-identical to the CPU path and deterministic run-to-run on lyso/cytC/Ins_H/pding
(P41212 ISa 7.8 CC1/2 99.7%, etc.). The scaling kernels are ~7x faster than the CPU compute
(~36 ms for 3 iters vs ~0.28 s); end-to-end scale/merge ~2.0 -> ~1.5 s. The remaining gap to the
<1 s target is the per-pass host round-trip (corr down/upload for the CPU combine + per-SG group-CSR
rebuild); phase 2 keeps the data resident by moving the 3D combine and the merge/error-model onto
the GPU too, so nothing round-trips.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port Combine() (partials->fulls) to CUDA, mirroring process_rawrun bit-for-bit:
one thread per raw-hkl run splits its usable partials into rocking events (frame
gap <= 2), pools background, seeds F, runs 3 de-biased Poisson reweights and the
capture-uncertainty term. Emission is deterministic - a count pass, a host
exclusive prefix sum for per-run offsets, then an emit pass at those offsets - so
fulls come out in raw-run-major/event order, identical to the CPU path; both pass
instantiations share the same arithmetic so count == emit exactly. Dmax/Dmin/Fmax
reproduce std::max/min NaN semantics (not fmax) for parity.
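Below is a simplified sketch of the count / exclusive-prefix-sum / emit pattern. It splits each raw-hkl run into rocking events wherever the frame gap exceeds 2. It emits only each event's frame range, with no background pooling, Poisson reweighting or capture term, and all names are hypothetical.

MaxLikeStd illustrates the std::max NaN semantics mentioned above: (a < b) ? b : a keeps a NaN passed as the first argument, whereas fmax would discard it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Same NaN behaviour as std::max ((a < b) ? b : a), unlike fmax, which drops NaN.
inline float MaxLikeStd(float a, float b) { return (a < b) ? b : a; }

// Count pass: events in one run (frames ascending); a gap > max_gap starts a new event.
std::size_t CountEvents(const std::vector<int> &frames, int max_gap) {
    if (frames.empty())
        return 0;
    std::size_t n = 1;
    for (std::size_t i = 1; i < frames.size(); ++i)
        if (frames[i] - frames[i - 1] > max_gap)
            ++n;
    return n;
}

struct Event { std::size_t run; int first_frame; int last_frame; };

// Count -> exclusive prefix sum -> emit at fixed offsets. Each run writes only its own
// slice, so output order (run-major, then event) is deterministic whatever the schedule.
std::vector<Event> EmitEvents(const std::vector<std::vector<int>> &runs, int max_gap) {
    std::vector<std::size_t> count(runs.size()), offset(runs.size());
    for (std::size_t r = 0; r < runs.size(); ++r)
        count[r] = CountEvents(runs[r], max_gap);
    std::exclusive_scan(count.begin(), count.end(), offset.begin(), std::size_t{0});
    const std::size_t total = runs.empty() ? 0 : offset.back() + count.back();
    std::vector<Event> out(total);
    for (std::size_t r = 0; r < runs.size(); ++r) {  // iterations are independent
        const auto &f = runs[r];
        if (f.empty())
            continue;
        std::size_t k = offset[r];
        int start = f[0];
        for (std::size_t i = 1; i <= f.size(); ++i) {
            if (i == f.size() || f[i] - f[i - 1] > max_gap) {
                out[k++] = {r, start, f[i - 1]};
                if (i < f.size())
                    start = f[i];
            }
        }
    }
    return out;
}
```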
Validated across the 18-crystal rotation battery: all 15 deterministic crystals
(P1/P2/C2/H3/I23/P41212/P222/P422) match the CPU combine exactly on SG/ISa/CC1.2/
completeness and run-to-run (fulls count bit-identical); the 3 upstream-nondet
crystals vary from GPU-prediction overflow, not the combine.
Gated opt-in behind JFJOCH_RSM_GPU_COMBINE (default = CPU combine): combine alone
is timing-neutral because the shared 1.2M-element SortFullsByFrame std::sort dominates and
the fulls round-trip adds a copy - it only pays off once the fulls stay resident
for scale-fulls + merge. Also add JFJOCH_RSM_NO_GPU master switch to force the CPU
fallback (incl. phase-1 scaling) from one binary for A/B parity. SortFullsByFrame
extracted from the Combine tail and shared by both paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scale the combined fulls (Unity model) on the device so they no longer round-trip
between the combine and the merge: after the GPU combine, build the fulls' per-frame
and per-ASU-group CSRs on the host from just the small key arrays (f_frame/f_group)
with a deterministic counting sort - no GPU stable-sort - then scale in place and
download once.
The four scaling kernels are reused unchanged except FitPerFrameGKernel, which gains
an optional `perm` argument (null for the partials, whose arrays are already
frame-contiguous; a frame-grouping permutation for the emit-ordered fulls) so the
fulls are scaled without a physical reorder. The Unity model falls out of giving the
fulls all-ones partiality/rlp/zeta (coeff = mean), so no other kernel changes and the
committed phase-1 partial-scaling path is bit-identical (perm == null -> idx == i).
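Here is a sketch of the optional perm argument. For illustration it assumes a plain least-squares per-frame scale, I_obs ~ G * I_ref, rather than the robust IRLS fit. FitPerFrameGSketch and its parameters are hypothetical. With perm == nullptr, slot k is observation k. Otherwise perm[k] redirects to the emit-ordered observation, so no physical reorder is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-frame scale over a frame CSR (frame_offset). perm == nullptr: arrays are already
// frame-contiguous (idx == k). Otherwise perm maps frame-grouped slot k to the physical index.
// Model (illustrative assumption): I_obs ~ G * I_ref, unweighted least squares.
std::vector<double> FitPerFrameGSketch(const std::vector<std::uint32_t> &frame_offset,
                                       const std::uint32_t *perm,
                                       const std::vector<float> &I_obs,
                                       const std::vector<float> &I_ref) {
    std::vector<double> G(frame_offset.size() - 1, 1.0);
    for (std::size_t f = 0; f + 1 < frame_offset.size(); ++f) {
        double num = 0.0, den = 0.0;
        for (std::uint32_t k = frame_offset[f]; k < frame_offset[f + 1]; ++k) {
            const std::uint32_t idx = perm ? perm[k] : k;
            num += double(I_obs[idx]) * I_ref[idx];
            den += double(I_ref[idx]) * I_ref[idx];
        }
        if (den > 0.0)
            G[f] = num / den;
    }
    return G;
}
```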
Validated across the rotation battery (JFJOCH_RSM_GPU_COMBINE=1): all 15 deterministic
crystals stay run-to-run deterministic and their merged output is bit-identical to the
CPU path (SG/ISa/CC1.2/completeness). The lone exception is EP_cs_01-24 (CC1/2 2%,
R_meas 379% - unindexable noise): merged intensities/CC/completeness match exactly, but
the ill-conditioned 16-bin error-model b fit amplifies the ~1e-7 scale-fulls rounding
to ISa 10.6 vs 10.8 - benign, same class as the accepted phase-1 GPU rounding. The 3
upstream-nondeterministic crystals vary as before (GPU-prediction overflow, not this).
Scale-fulls drops from ~0.09s to ~0 across the two passes; combine+scale-fulls region
~0.32s GPU vs ~0.46s CPU on lyso. Still opt-in (fulls are downloaded for the host merge;
the win grows once the merge/error-model also stay resident).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Steps 1-2 (GPU 3D-combine + resident scale-fulls) are validated bit-parity and
run-to-run deterministic against the CPU path across the rotation battery, and cut
the combine+scale-fulls region from ~0.46s to ~0.32s on lyso, so make them the
default when a GPU is present (consistent with phase-1 partial scaling already being
default-on). JFJOCH_RSM_CPU_COMBINE forces the CPU combine/scale-fulls for A/B or
debugging; JFJOCH_RSM_NO_GPU still disables the whole GPU path.
The only battery crystal whose reported metrics move is EP_cs_01-24 (CC1/2 2%,
unindexable noise) whose upstream integration is itself nondeterministic; its merged
intensities/CC/completeness are unchanged, only the ill-conditioned error-model b.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The single-threaded ReduceGroupMeans over the 6.3M partials (~0.07s/2-pass) and the
per-frame diagnostic CC now run on the resident partials on the GPU: after SmoothG,
the smoothed corr is uploaded once (and left resident for the combine, dropping the
combine's redundant re-upload), then the post-smooth group means (reusing the scaling
reduce) and the per-frame Pearson CC (a new one-block-per-frame kernel) run there and
only the tiny per-frame cc/cc_n come back. FinalizePerFrameScale is split into
ComputePerFrameCC (host reference) + the writeback; the GPU path uses ComputePartialCC.
The per-frame CC is diagnostic only (the per-image scaling table), so the tree
reduction's ~ulp difference from the CPU is immaterial and it does not touch merged
intensities. smooth+CC region ~0.10s GPU vs ~0.15s CPU on lyso. Validated across the
battery: 15/15 deterministic crystals run-to-run deterministic and merged output
bit-identical to the CPU path (only EP_cs_01-24, unindexable noise, keeps its benign
error-model-b wobble). CPU fallbacks (JFJOCH_RSM_CPU_COMBINE / _NO_GPU) unchanged.
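The following is a host reference sketch of the per-frame Pearson CC; names are hypothetical. It sums in a fixed sequential order. A GPU tree reduction adds the same terms in a different order, which is why the two can differ by a few ulp, as noted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct FrameCC {
    double cc;      // Pearson correlation, NaN if undefined
    std::size_t n;  // number of pairs
};

// Correlate one frame's scaled observations x with the group means y of the same reflections.
FrameCC PearsonCC(const std::vector<double> &x, const std::vector<double> &y) {
    const std::size_t n = x.size();
    const double nan = std::nan("");
    if (n < 2 || y.size() != n)
        return {nan, n};
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx <= 0.0 || syy <= 0.0)
        return {nan, n};  // zero variance: correlation undefined
    return {sxy / std::sqrt(sxx * syy), n};
}
```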
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Profiling showed the per-space-group "group hkl" step (~0.30s/2-pass on lyso) is not
gemmi-bound (the ASU keying is ~6ms) but memory-bandwidth-bound: stamping the group id
onto, and reading it back from, the `group` field scattered across the 56-byte Obs
struct touches the whole ~350MB partials array twice per pass.
Precompute the per-obs AcceptReflection finiteness once (immutable) into a flat 1-byte
array, then stamp the ASU-group id from rawrun_group + that flat array into a flat
group_ids vector for the GPU, and build the group CSR (a stable counting sort, now
parallel) from group_ids - all sequential/flat reads. The Obs.group field is written
only when a CPU stage will read it (no GPU: scaling/CC/combine otherwise use
group_ids / rawrun_group, never partials.group), so the default path skips the strided
Obs pass entirely. group hkl ~0.31 -> ~0.20 s/2-pass on lyso.
Output is bit-identical (group_ids values and the obs-index-ordered gperm are unchanged),
so the merged results are unchanged; validated across the battery (15/15 deterministic
crystals bit-identical to the CPU path, only EP_cs_01-24 noise keeps its benign wobble).
Non-CUDA build unaffected (need_obs_group is always true there).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Apply smooth-G's corr adjustment on the device (a small kernel: corr[i] *=
ratio[frame[i]] for flagged frames, double-then-float, matching host SmoothG) so the
per-image corr never leaves the GPU: it now stays resident through scaling ->
smooth-G -> per-frame CC -> combine, and across the two space-group passes exactly as
the old host round-trip did. The host only builds the tiny per-frame ratio (g/g_smooth
via the extracted ComputeSmoothGWindow) and refreshes host partials[].corr solely for
the CPU-combine path (JFJOCH_RSM_CPU_COMBINE or the diagnostic dump).
This drops the post-scale GetCorr and the two SetCorr re-uploads (~3x25MB/pass) plus the
6.3M host corr-adjust loop: scale-partials ~0.21->~0.10s and the smooth+combine region
shrinks, taking RSM on lyso to ~0.91s (was ~1.47s with phase-1-only, ~1.71s full-CPU) -
under the 1s target for this crystal; merge+stats (~0.49s) is now the dominant chunk.
Bit-identical (GPU smooth-G == host SmoothG on the resident corr); validated across the
battery (15/15 deterministic crystals bit-identical to CPU across default / CPU-combine /
NO_GPU, only EP_cs_01-24 noise wobbles). Non-CUDA build unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port the four fulls-walking reductions of MergeAndStats to the GPU, over the fulls
group-CSR already resident from scale-fulls: the per-group inv-var mean + leverage-
corrected error-model samples, the merge accumulate (inv-var sums + deterministic
half-sets, error-model-corrected sigma, with outlier rejection), and R_meas + the
per-shell usable count. The host keeps the parts that don't parallelise cleanly or are
tiny: the I2-sort + 16-bin (a,b) median fit, the per-group reject median (a per-group
median is awkward on the GPU - cheap on the host from the GPU cnt), the merged export,
the shells and the gemmi completeness. Only per-group arrays (~55k) + the samples
(~n_fulls, for the fit) come back - the fulls are not re-walked on the host.
Device HalfForImage (splitmix64) + IceRingIndex mirror the host; the corrected-sigma
uses (b*I_for_b)^2 (not b^2*I^2) to match the host rounding; the R_meas usable count
requires finite d (the host counts only fulls with a valid shell, and a group's fulls
share d, so the shell is assigned per group). Gated on fulls_resident (GPU
combine+scale-fulls active); reject is fully supported so it runs for the default
rot3d command.
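Below is a sketch of two details above, with hypothetical names. The first is a hash-based half-set assignment built on the standard SplitMix64 mixing constants; the real HalfForImage may combine the image number with a seed differently. The second is the corrected sigma, with b*I formed before squaring. The variance form a*sigma^2 + (b*I)^2 follows the error-model description elsewhere in these notes and is an assumption about the exact implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// SplitMix64 mixing function: a cheap, well-mixed 64-bit hash.
inline std::uint64_t SplitMix64(std::uint64_t x) {
    std::uint64_t z = x + 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical half-set choice: all observations from one image land in the same half,
// decided only by the image number, so host and device agree regardless of scheduling.
inline int HalfForImageSketch(std::uint64_t image_number) {
    return static_cast<int>(SplitMix64(image_number) & 1ULL);
}

// Corrected sigma, assuming variance = a*sigma^2 + (b*I)^2; b*I is formed first, then squared.
inline double CorrectedSigma(double a, double b, double sigma, double I) {
    const double bI = b * I;
    return std::sqrt(a * sigma * sigma + bI * bI);
}
```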
merge+stats ~0.49 -> ~0.37s, taking RSM on lyso to ~0.78s (was ~0.91). Validated across
the battery: 15/15 deterministic crystals bit-identical to the CPU path (SG / ISa /
CC1.2 / completeness / total-obs, and the exact outlier-reject count), only EP_cs_01-24
noise wobbles. The em-sort + a,b fit are the remaining host floor. Non-CUDA build
unaffected (use_gpu_merge is always false there).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove the [rsm] per-stage lap timing and the JFJOCH_RSM_NO_GPU / JFJOCH_RSM_CPU_COMBINE
env gates now that the GPU-resident path is the validated default (it runs whenever a GPU
is present, with the CPU loops as the bit-parity fallback; the diagnostic-dump path still
uses the CPU combine).
Honour a fixed (forced) mosaicity: SmoothMosaicityAndPartiality now overrides every frame
with GetForcedMosaicity() when set, instead of always reading the per-frame integration
value - so the caller can route the --mosaicity case through RotationScaleMerge (its
partiality recompute makes it a natural fit) rather than a separate path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Consolidate the offline scaling/merging on two engines and delete the old classic
rotation chain:
- ScaleOnTheFly now implements ONLY the fixed-partiality model: it uses each reflection's
stored partiality as-is (1 for stills, the zeta/erf rocking-curve value already set at
prediction for rotation). Dropped the Rotation partiality recompute, the unity override,
the ceres rotation residual, and all wedge/mosaicity refinement.
- Remove Combine3D (CombineRotationObservations): its 3D-combine lives in RotationScaleMerge.
- JFJochProcess / jfjoch_scale: rotation (-P rot3d) goes through RotationScaleMerge for the
whole self-scale -> 3D combine -> scale-fulls -> merge; stills self-scale with ScaleOnTheFly
+ MergeOnTheFly. RotationScaleMerge does not support external-reference scaling, B-factor
refinement, an absorption surface or wedge refinement, so those combinations now throw
instead of silently falling back. Deleted the classic ScaleFulls / AbsorptionSurface /
SmoothImageScaleG helpers.
- CLI cleanup: drop -w/--wedge (wedge refinement) and --absorption (both tools), and the now
dead -P values rot and unity (keep fixed | rot3d). Fix the -P round-trip in the reproduced
command line.
Behaviour on the rotation path is unchanged: the full 18-crystal battery matches the
pre-cleanup metrics (SG/ISa/CC1.2/completeness) exactly on all 15 deterministic crystals.
Non-CUDA build unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
With ScaleOnTheFly now fixed-only and rotation routed through RotationScaleMerge, the
PartialityModel enum carried no information (it was always Rotation for rotation processing,
Fixed otherwise, and mirrored combine_3d). Drop it:
- ScalingSettings: `partiality_mode` (PartialityModel) -> `force_still_processing` (bool);
SetPartialityModel/GetPartialityModel -> ForceStillProcessing/GetForceStillProcessing.
Remove the enum.
- DiffractionExperiment: drop GetPartialityModel(); the rotation-vs-stills decision is now
just GetCombine3D() (set by the tool = rotation && !force_still). The wedge getters no longer
key off the model (dead since wedge refinement was removed).
- jfjoch_process: `-P/--partiality fixed|rot3d` -> `--force-still-processing` (a rotation
dataset scaled as independent stills). Auto-detect sets combine_3d for rotation data unless
the flag is given.
- jfjoch_scale: `-P fixed|rot3d` now toggles combine_3d directly (no PartialityModel).
- jfjoch_viewer: the "process as stills" toggle sets ForceStillProcessing(!rotation_mode) -
UI unchanged, just wired to the new field.
PartialityModel was never in the OpenAPI, so no generated clients change. Rotation path
behaviour is unchanged (lyso 16.4/99.6%/87.3%); --force-still-processing correctly routes to
ScaleOnTheFly. CUDA + non-CUDA + viewer all build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold the two overlapping knobs into one. Rotation-vs-stills is now a single decision
(IsRotationIndexing() = goniometer && rotation-indexing) that drives BOTH indexing and
scaling/merging - there is no independent scaling-model switch:
- ScalingSettings: drop combine_3d and force_still_processing (and Combine3D/GetCombine3D,
ForceStillProcessing/GetForceStillProcessing). The scaling stage reads
experiment.IsRotationIndexing() instead - rotation indexing implies RotationScaleMerge,
no rotation indexing implies per-image ScaleOnTheFly.
- jfjoch_process: merge --process-as-stills and --force-still-processing into one
--force-still (turns rotation processing off entirely: still indexing + still scaling).
The rotation-specific scaling defaults (scale-fulls, smooth-G, capture uncertainty,
outlier rejection) now key on rotation_indexing.
- jfjoch_scale: replace -P fixed|rot3d with --force-still; rotation is auto-detected from the
goniometer (RotationIndexing set accordingly) so it matches jfjoch_process.
- jfjoch_viewer: the "process as stills" toggle already sets RotationIndexing; drop the now
redundant Combine3D/ForceStillProcessing calls.
Rotation path unchanged (lyso 16.4/99.6%/87.3%); --force-still routes to ScaleOnTheFly; the
-R + --force-still conflict still errors. CUDA + non-CUDA + viewer all build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a build-viewer-tgz job on the oldest runner (RHEL 8) that builds
JFJOCH_VIEWER_ONLY with cuda12.9 and nocuda and uploads a self-contained
jfjoch_viewer-<version>-linux-{cuda<major>|cpu}.tgz to the release, mirroring
the Windows installer naming. CMake gains a Linux viewer-only CPack path that
emits a single TGZ instead of an RPM/DEB. Drop the per-distro viewer RPM/DEB
from the release uploads (writer + XDS plugin still go there); the viewer
packages continue to publish to the package repositories unchanged.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Dashboard: the Measurement statistics and Spot finding panels pinned
their Paper to an exact height matching the neighbouring plots, but
their content is taller, so rows spilled past the border. Use minHeight
so the box still aligns as a floor but grows to contain its content.
Live preview: the image Paper had minWidth 1200 (wider than its column
inside the 1300px main area), so it overflowed and overlapped the
settings panel; its fixed height 1250 also left a large empty area
below the image. Fill the column instead and centre the image.
Detector settings: the Count time row sat in an xs=8 column inside a
left-packed ListItem, shifting the field off-centre relative to the
other rows. Match the xs=1/10/1 layout and centre the switch + field.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replace the free functions BraggIntegrate2D/ProfileIntegrate2D with the
BraggIntegrationEngine (CPU/GPU) as the live integrator.
- IndexAndRefine no longer holds the integrator: ProcessImage takes a
per-worker BraggIntegrateFn callback (ProcessImage is called concurrently by
the shared IndexAndRefine, so the stateful engine must not be a member).
- WithoutFPGA/jfjoch_process: owns a GPU engine when a GPU is present, else CPU,
and passes the GPU-resident preprocessed buffer so integration runs on-device.
- AfterFPGA: forces CPU and integrates straight off the assembled CompressedImage
via a templated per-pixel sampler - only the reflection-disk pixels are read,
no whole-image copy (the FPGA host runs up to 36 GB/s). Sampler maps type
min/max to INT32_MIN/INT32_MAX on read; special/saturation only, no +/-1 band.
- Remove BraggIntegrate2D/ProfileIntegrate2D and their test; keep IntegratorMode.
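Here is a sketch of such a per-pixel sampler for signed integer pixel types. SamplePixel is a hypothetical name, and how unsigned types are handled is not stated here. Only the exact type minimum and maximum are remapped. Values one step away pass through unchanged, i.e. there is no +/-1 band.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Read one pixel straight from the image buffer and widen it to int32_t.
// T's min -> INT32_MIN (special pixel), T's max -> INT32_MAX (saturation); nothing else is remapped.
template <typename T>
std::int32_t SamplePixel(const T *pixels, std::size_t idx) {
    static_assert(std::is_integral_v<T> && std::is_signed_v<T> && sizeof(T) <= sizeof(std::int32_t),
                  "sketch covers signed integer pixel types up to 32 bits");
    const T v = pixels[idx];
    if (v == std::numeric_limits<T>::min())
        return std::numeric_limits<std::int32_t>::min();
    if (v == std::numeric_limits<T>::max())
        return std::numeric_limits<std::int32_t>::max();
    return static_cast<std::int32_t>(v);
}
```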
Prediction: buffer up to 20000 candidates but return the 10000 closest to the
Ewald sphere (deterministic partial_sort on |dist_ewald|, hkl tiebreak) instead
of the GPU atomic-fill order. Serialized output stays <=10000, so the frame
transport headroom and its CBOR guard are unchanged.
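The deterministic truncation can be sketched with std::partial_sort on |dist_ewald|, using hkl as the tie-break. Candidate and KeepClosestToEwald are hypothetical names. Two runs that produce the same candidates in different orders keep the same reflections in the same order. The sketch assumes finite distances, because a NaN would break the strict weak ordering the comparator must provide.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::array<int, 3> hkl;
    float dist_ewald;  // signed distance to the Ewald sphere (assumed finite)
};

// Keep the `limit` candidates closest to the Ewald sphere; ties on |dist| broken by hkl,
// so the result is independent of arrival order (e.g. GPU atomic-append order).
void KeepClosestToEwald(std::vector<Candidate> &c, std::size_t limit) {
    auto closer = [](const Candidate &a, const Candidate &b) {
        const float da = std::fabs(a.dist_ewald), db = std::fabs(b.dist_ewald);
        if (da != db)
            return da < db;
        return a.hkl < b.hkl;
    };
    const std::size_t keep = std::min(limit, c.size());
    std::partial_sort(c.begin(), c.begin() + static_cast<std::ptrdiff_t>(keep), c.end(), closer);
    c.resize(keep);
}
```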
integration_model exposed via OpenAPI (bragg_integration_settings schema,
/config/bragg_integration PUT/GET, added to jfjoch_settings and jfjoch_statistics)
and the frontend (BraggIntegrationSettings dropdown). Regenerated C++/TS clients
and redoc.
Validated old-vs-new on all 18 /data/rotation_test crystals: indexing rate and
space group bit-identical; ISa/CC identical on 16/18 (one improved, EcwtAL500
ISa 0.0->6.7); new CompressedImage-vs-buffer and GPU-vs-CPU parity tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The grid-scan branch of NXmx::Sample wrote grid_scan_x/grid_scan_y from
GetXContainer_m/GetYContainer_m, which return max_image_number entries.
When a scan stops at the first image (max_image_number == 0) these are
empty and SaveVector throws "Cannot write empty vector", surfacing as
"Writer 0: HDF5 error". Guard the two per-image writes on
max_image_number > 0, mirroring the goniometer branch; the grid-scan
scalar metadata and smargon transformation are still written.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The monoclinic 2_1 screw was missed on EP_cs_02-10 and EP_cs_01-17 (adopted P2
instead of P2_1). The 0k0-odd reflections are correctly measured as weak (~0.1-1%
of 0k0-even, matching XDS), but their merged sigmas are ~2x too small, so their
I/sigma clears the present_i_over_sigma cut and they count as screw-axis
violations - the search then falls back to the symmorphic group.
Add a resolution-normalised intensity E^2 = I / <I>(shell), computed from
equal-count resolution shells, and require a reflection to reach present_e_squared
(0.3) as well as present_i_over_sigma before it counts as violating a predicted
absence. This only tightens "present", so it cannot manufacture a screw whose
predicted-absent class carries real intensity (a symmorphic crystal's axial
reflections sit at E^2 ~ 1 and still register as violations).
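Below is a sketch of the E^2 gate; function names are hypothetical. It assumes shells ordered by d, each holding an equal number of reflections, and sets E^2 to 0 when a shell's mean intensity is not positive. present_e_squared defaults to 0.3 as stated above. present_i_over_sigma is left to the caller because its default is not given here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// E^2 = I / <I>(shell); shells are (near) equal-count after sorting by d, low -> high resolution.
// E^2 is set to 0 when a shell's mean is not positive (an assumption of this sketch).
std::vector<double> NormalisedIntensity(const std::vector<double> &intensity,
                                        const std::vector<double> &d,
                                        std::size_t n_shells) {
    const std::size_t n = intensity.size();
    std::vector<double> e2(n, 0.0);
    if (n == 0 || n_shells == 0)
        return e2;
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return d[a] > d[b]; });
    for (std::size_t s = 0; s < n_shells; ++s) {
        const std::size_t lo = s * n / n_shells, hi = (s + 1) * n / n_shells;
        if (lo == hi)
            continue;
        double mean = 0.0;
        for (std::size_t k = lo; k < hi; ++k)
            mean += intensity[order[k]];
        mean /= static_cast<double>(hi - lo);
        for (std::size_t k = lo; k < hi; ++k)
            e2[order[k]] = (mean > 0.0) ? intensity[order[k]] / mean : 0.0;
    }
    return e2;
}

// A predicted-absent reflection violates the absence only if it is both significant
// (I/sigma) and strong relative to its resolution shell (E^2).
bool ViolatesAbsence(double i_over_sigma, double e_squared,
                     double present_i_over_sigma, double present_e_squared = 0.3) {
    return i_over_sigma >= present_i_over_sigma && e_squared >= present_e_squared;
}
```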
On the 18-crystal rotation battery this recovers the screw on EP_cs_02-10 and
EP_cs_01-17 (-> P2_1) and, as a side effect, on MyoB (-> P2_1), pding4_001
(-> P4_122/P4_322) and pding4_003 (-> P2_2_2_1) - all confirmed by genuine absences
in the reference intensities (absent class at 0.02-0.76% of the allowed class),
which the old sigma-only test also missed. The other 13 crystals, indexing rate
and ISa are unchanged; the screw-free control (Ins_I -> I23/I2_13) is unaffected.
Add a regression test that reproduces the under-estimated-sigma screw and checks
the gate recovers it (and that disabling the gate reproduces the miss).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the user-facing changes committed since the rc.156 entry was written: GPU
Bragg integration in the offline workflow (old integrators removed), deterministic
Ewald-ranked prediction, intensity-based systematic-absence test (recovers missed
screw axes), GPU-accelerated RotationScaleMerge, the --force-still / auto-rotation
/ mmCIF-default processing changes, the integration-model REST/frontend setting,
and the writer, viewer and CI fixes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The error model splits a reflection's variance into a statistical part (a*sigma^2,
which averages down with multiplicity) and a systematic part ((b*I)^2 - absorption,
beam flicker, partiality, detector non-uniformity - correlated across a reflection's
repeats). Inverse-variance merging (sigma = 1/sqrt(sum_w)) divided BOTH by the
multiplicity, so high-multiplicity reflections got an unphysically small merged sigma:
on lysoC the merged I/sigma reached 300 with a third of reflections above the ISa of
11.1, whereas XDS's merged I/sigma tops out at its ISa (23.2 vs 23.3). The too-small sigmas
also faked screw-axis violations (the missed monoclinic 2_1) and mis-weight refinement.
Floor the merged sigma at b*|I| (= I/sigma capped at ISa = 1/b), applied in the shared
export loops of both merge paths: Merge.cpp (stills) and RotationScaleMerge.cpp (the
rotation path, whose export loop is shared by the CPU and GPU merges, so parity is
preserved). Intensities, weights, CC1/2 and R_meas are unchanged - only sigma changes.
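Here is a sketch of the floor; FloorMergedSigma is a hypothetical name. Taking max(sigma, b*|I|) caps I/sigma at 1/b without touching I. When there is no usable b, sigma is returned unchanged. This sketch assumes "no usable b" means a non-finite or non-positive b; how the real code detects that case is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Floor the merged sigma at b*|I| so that I/sigma <= 1/b (= ISa).
// No usable error-model b (assumed: non-finite or <= 0) -> sigma unchanged, i.e. no floor.
inline double FloorMergedSigma(double merged_sigma, double I, double b) {
    if (!std::isfinite(b) || b <= 0.0)
        return merged_sigma;
    return std::max(merged_sigma, b * std::fabs(I));
}
```

For example, a reflection with I = 1000 and an inverse-variance sigma of 2 (I/sigma 500) gets sigma 100 when b = 0.1, i.e. I/sigma 10 = 1/b.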
Validated on the 18-crystal rotation battery: merged I/sigma now caps at each crystal's
ISa (was up to 300), overall <I/sigma> becomes honest (lysoC 11.6 -> 5.4), and no space
group or indexing rate regresses. The one crystal still uncapped (EP_cs_01-24, ISa 0.0)
is the known ice-ring failure where the error model does not fit, so there is no b*I
floor - a separate issue. This is the root cause the SearchSpaceGroup E^2 gate patched
downstream; the two now reinforce each other. It does not close the per-observation ISa
gap vs XDS (integration quality), only makes the merged sigmas honest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
TCPImageCommTest_DisconnectMidWrite_NoHang launched the sender before the
puller's connection was accepted by the pusher's background accept thread.
When StartDataCollection ran first it threw "No writers connected", which
escaped the sender lambda (re-thrown by sender.get()) and starved the
receiver thread's PollImage, whose REQUIRE failure aborted with SIGABRT.
Wait for GetConnectedWriters()==1 before starting, matching every other
test in the file.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The build-viewer-tgz job failed at the CPack step: the archive (TGZ)
generator defaults to a monolithic install, so it walked every install()
rule in the tree -- the fetched HDF5 command-line tools, abseil's
EXCLUDE_FROM_ALL sub-libraries, Findlibaec.cmake and the non-viewer
components -- none of which a viewer-only build produces, so cpack aborted
with "cannot find <artifact>".
Scope the archive to CPACK_COMPONENTS_ALL (viewer) via
CPACK_ARCHIVE_COMPONENT_INSTALL, mirroring the DEB/RPM branches;
ALL_COMPONENTS_IN_ONE keeps the output a single tarball named
CPACK_PACKAGE_FILE_NAME so the CI upload glob still matches.
The viewer component also contains the portable CLI tools
(jfjoch_process/scale/azint/recompress/extract_hkl), so build the whole
viewer-only tree (ninja -j16) instead of just jfjoch_viewer, otherwise
those binaries are missing when cpack installs the component.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
New CompressionAlgorithm that emits a standard Zstandard frame: zero/0xFF runs become RLE_Blocks (like BSHUF_ZSTD_RLE) and literal regions become Compressed_Blocks with per-block adaptive Huffman literals and no sequences (Number_of_Sequences=0). Short runs are absorbed into the literal stream; incompressible literals fall back to Raw_Blocks so the worst case stays within ZSTD_compressBound. The Huffman tree + bitstream are produced by zstd's own HUF_compress{1,4}X_repeat (the same calls ZSTD_compressLiterals uses); only the frame/block/literals-section framing is hand-written, with comments citing zstd_compression_format.md so it can be checked clause by clause. Output decodes with stock ZSTD_decompress, so no reader changes are needed (decode routes like BSHUF_ZSTD).
On sparse diffraction this gives ~12% smaller files than bitshuffle/LZ4 at about the same end-to-end speed, sitting between LZ4 and full ZSTD; for maximum ratio use BSHUF_ZSTD.
Robust on any input: tests round-trip pure zeros, Poisson(10), Mersenne-Twister noise (checked against the size bound), an extreme-sparsity mask, and a real lyso image through stock ZSTD_decompress.
API: exposed as "bszstd_rlehuf"; regenerate the Python/TS clients (update_version.sh) to surface the new value there.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builds a single Compressed_Block (Huffman-coded Literals_Section, empty Sequences_Section) and checks: the block type is Compressed, its trailing Number_of_Sequences byte is 0, and stock ZSTD_decompress reconstructs the literals exactly. This is the format guarantee from zstd_compression_format.md ("if Number_of_Sequences == 0 ... Block's decompressed content is defined solely by the Literals Section content"), locked into the test suite.
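Both messages above rely on the Zstandard block framing. A sketch of the 3-byte Block_Header and an RLE_Block follows; the function names are hypothetical, while the field layout is from the Zstandard format specification (RFC 8878).

The header is little-endian. Bit 0 is Last_Block, bits 1-2 are Block_Type (0 Raw, 1 RLE, 2 Compressed) and bits 3-23 are Block_Size. For an RLE_Block, the content is a single byte and Block_Size is the repeat count. A zero run of any length up to the block maximum therefore costs 4 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Block_Type values from the Zstandard format (RFC 8878); 3 is reserved.
enum class ZstdBlockType : std::uint32_t { Raw = 0, RLE = 1, Compressed = 2 };

// Block_Header: 3 bytes, little-endian. bit 0 = Last_Block, bits 1-2 = Block_Type,
// bits 3-23 = Block_Size.
void AppendBlockHeader(std::vector<std::uint8_t> &out, bool last,
                       ZstdBlockType type, std::uint32_t block_size) {
    const std::uint32_t h = (last ? 1u : 0u)
                          | (static_cast<std::uint32_t>(type) << 1)
                          | (block_size << 3);
    out.push_back(static_cast<std::uint8_t>(h & 0xFF));
    out.push_back(static_cast<std::uint8_t>((h >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((h >> 16) & 0xFF));
}

// RLE_Block: one content byte, repeated Block_Size times by the decoder. Block_Size may not
// exceed the block maximum (at most 128 KiB), so the caller must split longer runs.
void AppendRleBlock(std::vector<std::uint8_t> &out, bool last,
                    std::uint8_t value, std::uint32_t run_length) {
    AppendBlockHeader(out, last, ZstdBlockType::RLE, run_length);
    out.push_back(value);
}
```

For a last RLE_Block of 1000 zero bytes the header value is 1 | (1 << 1) | (1000 << 3) = 0x1F43, written as 43 1F 00, followed by the content byte 00.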
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fit the reciprocal tangential-width model y(q)=a0+a1*t+a2*t^2 in a centered, standardized variable t=(q-qbar)/qscale instead of the raw {1,q,q^2} monomials: the raw normal matrix went near-singular when the strong spots span a narrow q-range (small cell / sparse still), letting tiny per-frame jitter swing the curvature into a wild over-wide profile. Adds IRLS (Huber) robustness, a ridge on the curvature (sharp-crystal prior), and clamps the applied width to the fitted q-range (no extrapolation). Stays strictly per-frame (no dataset pooling), so it works online and for stills. Neutral on rotation data (cytC high-res CC1/2 win preserved 66.8 vs 65.6%).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional Smargon static positioner (chi/phi angles + rotation axes) that is reconstructed into the NXmx sample transformation chain. Chi/phi are appended at the innermost end of the chain (closest to the sample) for both the goniometer and grid-scan branches, with axes defaulting to chi {0,0,1} and phi = omega default {1,0,0}.
- SmargonPosition gains chi_axis/phi_axis (common/JFJochMessages.h)
- OpenAPI: optional phi_axis/chi_axis arrays; clients regenerated
- OpenAPIConvert wires Dataset_settings.smargon -> DatasetSettings
- CBOR serializer/deserializer round-trip the axes
- tests: CBORSerialize_Start_Smargon
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two Windows jobs (build-windows / build-windows-nocuda) differed only in -DJFJOCH_USE_CUDA=ON/OFF, so collapse them into one matrixed job (variant cuda/nocuda), mirroring the build-rpm matrix. On a tag, upload the NSIS installer (jfjoch-<version>-win64-{cuda<major>|cpu}.exe, named in CMakeLists.txt) to the release via gitea_upload_file.py, the same helper the RPM/DEB nocuda variants use to attach viewer/writer artifacts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a build-viewer-tgz job on the oldest runner (RHEL 8) that builds JFJOCH_VIEWER_ONLY with cuda12.9 and nocuda and uploads a self-contained jfjoch_viewer-<version>-linux-{cuda<major>|cpu}.tgz to the release, mirroring the Windows installer naming. CMake gains a Linux viewer-only CPack path that emits a single TGZ instead of an RPM/DEB. Drop the per-distro viewer RPM/DEB from the release uploads (writer + XDS plugin still go there); the viewer packages continue to publish to the package repositories unchanged.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>