Jungfraujoch/image_analysis/bragg_integration/BraggIntegrationEngine.h
leonarski_f and Claude Opus 4.8 ddddfb6ffc
bragg_integration: GPU box + profile-fit integrator (standalone engine)
Reimplement BraggIntegrate2D (box sum) and ProfileIntegrate2D (Kabsch
profile fit) under one roof as a base + CPU + GPU engine, mirroring the
AzIntEngine / ROIIntegration pattern. Reads the preprocessed int32
ImagePreprocessorBuffer (masked=INT32_MIN, saturated=INT32_MAX), the same
buffer AzIntEngineGPU/ROIIntegrationGPU consume.

The CUDA engine runs one block per reflection with shared-memory
reductions across six kernels (reset, mask, box-sum, profile learning,
profile build, Kabsch fit); the resolution shell is computed inline. The
learning/fit hot path is single precision (FP64 is throttled on consumer
GPUs; reproduces the double CPU path to ~1e-4). Collapsing the per-frame
CUDA API calls into one reset kernel keeps launch-latency overhead low.

Standalone for now: NOT wired into IndexAndRefine. See
BRAGG_INTEGRATION_ENGINE.md for the design and the binding steps.
BraggIntegrationEngineGPUTest checks GPU == CPU across all three modes
(box/gaussian/empirical) within numeric tolerance, plus a [bragg_bench]
perf sweep.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 20:59:45 +02:00


// SPDX-FileCopyrightText: 2026 Filip Leonarski, Paul Scherrer Institute <filip.leonarski@psi.ch>
// SPDX-License-Identifier: GPL-3.0-only
#pragma once
// =============================================================================
// BraggIntegrationEngine — box-sum + profile-fitting 2D integrator, GPU-ready
// =============================================================================
//
// A reimplementation of BraggIntegrate2D (box sum) and ProfileIntegrate2D (Kabsch profile
// fit) under one roof, following the AzIntEngine / ROIIntegration pattern: a base class that
// extracts the fixed per-experiment configuration, a plain-C++ CPU engine (the fallback and the
// numeric oracle), and a CUDA engine (BraggIntegrationEngineGPU) that reaches the same result up
// to floating-point precision.
//
// Unlike BraggIntegrate2D/ProfileIntegrate2D, which read the raw CompressedImage per pixel type
// and reject the special/saturation +/-1 band, this engine reads the already-preprocessed int32
// image held in an ImagePreprocessorBuffer (the same buffer AzIntEngineGPU/ROIIntegrationGPU
// consume): masked/bad pixels are INT32_MIN and saturated pixels INT32_MAX, so bad-pixel identity
// is owned by the preprocessor and a pixel is valid iff v != INT32_MIN && v != INT32_MAX.
//
// The integrator is selected by BraggIntegrationSettings::Integrator:
//   BoxSum           -> BraggIntegrate2D equivalent (rough disk sum minus ring-mean background)
//   ProfileGaussian  -> per-reflection measured-width Gaussian profile fit (the default)
//   ProfileEmpirical -> per-shell learned empirical profile fit
// The box sum is also the seed pass (Pass A) of the two profile modes, so it always runs.
//
// This class is intentionally standalone: it is NOT yet wired into IndexAndRefine. It takes a
// preprocessed image + the predicted reflections and returns the same vector<Reflection> shape
// (I, sigma, bkg, partiality, ...) that the downstream scaling/merge consumes unchanged.
// =============================================================================

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

#include "../../common/BraggIntegrationSettings.h"
#include "../../common/DiffractionExperiment.h"
#include "../../common/DiffractionGeometry.h"
#include "../../common/Reflection.h"
#include "../image_preprocessing/ImagePreprocessorBuffer.h"

namespace bragg_engine {
// Shared with both engines so the CPU and GPU paths stay numerically aligned.
constexpr int    N_SHELL              = 6;    // resolution shells for per-shell profile learning
constexpr double STRONG_I_OVER_SIGMA  = 5.0;  // strong-spot threshold that seeds the profile
constexpr int    MIN_STRONG_PER_SHELL = 30;   // below this a shell falls back to the global profile
constexpr double C_CAPTURE            = 2.5;  // weak-spot radial capture term (monochromatic only)
} // namespace bragg_engine

// One reflection's extracted intensity, produced by the derived engine and turned into a
// Reflection by Finalize() (which owns the polarization correction and scale bookkeeping).
struct BraggFitResult {
    float I = 0.0f;
    float sigma = NAN;
    float bkg = 0.0f;
    float observed_x = 0.0f; // intensity-weighted centroid (BoxSum mode only)
    float observed_y = 0.0f;
    bool ok = false;
    bool has_observed = false;
};

class BraggIntegrationEngine {
protected:
    // --- fixed configuration extracted from the experiment (see ProfileIntegrate2D) ---
    IntegratorMode mode;
    bool empirical;       // ProfileEmpirical (vs ProfileGaussian)
    size_t xpixel, ypixel, npixel;
    float r1_sq;
    float r2, r2_sq;
    float r3, r3_sq;
    float min_sigma_ratio;
    int R, G, GG;         // profile-grid half-size, edge (2R+1) and area (G*G)
    bool broadband;       // a set bandwidth (stills) vs monochromatic (rotation)
    double bw_sigma;      // bandwidth sigma [dimensionless, * Rpx -> px]
    bool apply_bkg_clip;  // stills-only high-outlier background sigma-clip
    bool use_ellipse;     // radially elongate the per-reflection Gaussian
    double c_radial;      // radial variance coefficient of tan^2(2theta): parallax + capture
    double F_px;          // detector distance expressed in pixels
    float beam_x, beam_y;
    DiffractionGeometry geom; // kept for the per-reflection polarization correction
    std::optional<float> polarization;

    // Assemble output reflections from the per-reflection fit results (polarization + scale corr).
    std::vector<Reflection> Finalize(const std::vector<Reflection> &predicted, size_t npredicted,
                                     const std::vector<BraggFitResult> &results,
                                     int64_t image_number) const;

public:
    explicit BraggIntegrationEngine(const DiffractionExperiment &experiment);
    virtual ~BraggIntegrationEngine() = default;

    // predicted[0..npredicted) are the reflections to extract; image is the preprocessed int32
    // frame (image.size() == npixel). Returns only the observed reflections.
    virtual std::vector<Reflection> Run(const ImagePreprocessorBuffer &image,
                                        const std::vector<Reflection> &predicted, size_t npredicted,
                                        int64_t image_number) = 0;
};
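
The header comment describes the BoxSum pass as a "rough disk sum minus ring-mean background" over pixels that are neither INT32_MIN (masked) nor INT32_MAX (saturated). The sketch below illustrates that idea on a plain int32 buffer; it is not the engine's implementation. The names `IsValidPixel`, `BoxSumSketchResult` and `BoxSumSketch` are hypothetical, and so is the geometry: the signal disk is taken as d^2 < r1_sq and the background ring as r2_sq <= d^2 < r3_sq, which is only one plausible reading of the `r1_sq`, `r2_sq` and `r3_sq` members. Sigma estimation, the centroid and the stills-only background clip are left out.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Validity rule as stated in the header comment: a pixel is usable iff it is
// neither the masked sentinel (INT32_MIN) nor the saturated one (INT32_MAX).
inline bool IsValidPixel(int32_t v) {
    return v != std::numeric_limits<int32_t>::min()
        && v != std::numeric_limits<int32_t>::max();
}

// Hypothetical result type for this sketch (not the engine's BraggFitResult).
struct BoxSumSketchResult {
    float I = 0.0f;    // background-subtracted disk sum
    float bkg = 0.0f;  // mean background per pixel, from the ring
    bool ok = false;   // false if the disk or the ring has no valid pixel
};

// Assumed geometry: signal disk d^2 < r1_sq, background ring r2_sq <= d^2 < r3_sq,
// with d measured from the predicted position (cx, cy) in pixels.
BoxSumSketchResult BoxSumSketch(const std::vector<int32_t> &image,
                                size_t xpixel, size_t ypixel,
                                float cx, float cy,
                                float r1_sq, float r2_sq, float r3_sq) {
    BoxSumSketchResult out;
    const long r3 = static_cast<long>(std::ceil(std::sqrt(r3_sq)));
    const long x0 = std::lround(cx);
    const long y0 = std::lround(cy);

    double sig_sum = 0.0, bkg_sum = 0.0;
    size_t n_sig = 0, n_bkg = 0;

    for (long y = y0 - r3; y <= y0 + r3; ++y) {
        for (long x = x0 - r3; x <= x0 + r3; ++x) {
            // Skip pixels outside the detector.
            if (x < 0 || y < 0
                || x >= static_cast<long>(xpixel) || y >= static_cast<long>(ypixel))
                continue;
            const int32_t v = image[static_cast<size_t>(y) * xpixel + static_cast<size_t>(x)];
            if (!IsValidPixel(v))
                continue;
            const float dx = static_cast<float>(x) - cx;
            const float dy = static_cast<float>(y) - cy;
            const float d_sq = dx * dx + dy * dy;
            if (d_sq < r1_sq) {
                sig_sum += v;
                ++n_sig;
            } else if (d_sq >= r2_sq && d_sq < r3_sq) {
                bkg_sum += v;
                ++n_bkg;
            }
        }
    }

    if (n_sig == 0 || n_bkg == 0)
        return out;  // nothing to integrate, or no background estimate

    const double bkg_mean = bkg_sum / static_cast<double>(n_bkg);
    out.bkg = static_cast<float>(bkg_mean);
    out.I = static_cast<float>(sig_sum - static_cast<double>(n_sig) * bkg_mean);
    out.ok = true;
    return out;
}
```

Because invalid pixels are skipped in both regions, a masked pixel in the ring does not bias the background mean. If every pixel in the signal disk is invalid, the sketch reports `ok == false` instead of returning a number.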