Runs ROIIntegrationGPU and ROIIntegrationCPU on identical input and asserts
every per-ROI field (sum, sum_square, max, pixels, weighted centre, masked
count) matches bit-for-bit. The input uses overlapping ROI boxes (multi-bit
masks), negative pixel values (signed weighted-sum path), and, per ROI, one
injected saturated pixel and one injected masked pixel to cover the "max only"
and "fully excluded" branches respectively.
The test is guarded by JFJOCH_USE_CUDA and skips with a warning when no CUDA
GPU is present, mirroring ImageSpotFinderGPUTest.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>