Commit Graph

2 Commits

Author SHA1 Message Date
kferjaoui 3ed773e520 Add multi-stream ClusterFinderCUDA with batched processing
- Wrap per-stream CUDA resources (device buffers, stream handle)
  in StreamContext struct; ClusterFinderCUDA owns a vector of
  n_streams contexts with independent pedestal arrays
- Split ClusterFinderCUDA.cuh into clusterfinder_kernel.cuh
  (device kernel) and ClusterFinderCUDA.hpp (host RAII wrapper)
- Add find_clusters_batched(): processes N frames round-robin
  across streams, returns per-frame cluster vectors
- Update ClusterFinderCUDA.test.cu
- Update Makefile for new file layout
2026-04-23 11:26:29 +02:00
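
The round-robin distribution described for find_clusters_batched() can be illustrated with a minimal host-only sketch. This is not the code from the commit: the StreamContext here is a simplified stand-in (the real one wraps device buffers and a stream handle), and stream_for_frame and assign_frames are hypothetical helper names. It assumes frame i is handled by stream i % n_streams.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the per-stream context named in the commit.
// Assumption: the real struct owns device buffers and a CUDA stream handle;
// here we only record which frames each stream would process.
struct StreamContext {
    std::vector<std::size_t> frames;  // indices of frames assigned to this stream
};

// Hypothetical helper: round-robin mapping from frame index to stream index.
inline std::size_t stream_for_frame(std::size_t frame, std::size_t n_streams) {
    assert(n_streams > 0);
    return frame % n_streams;
}

// Hypothetical helper: assign n_frames to the contexts in round-robin order
// and return, for each frame, the index of the stream that owns it.
inline std::vector<std::size_t> assign_frames(std::vector<StreamContext>& ctxs,
                                              std::size_t n_frames) {
    assert(!ctxs.empty());
    std::vector<std::size_t> owner(n_frames);
    for (std::size_t i = 0; i < n_frames; ++i) {
        const std::size_t s = stream_for_frame(i, ctxs.size());
        ctxs[s].frames.push_back(i);
        owner[i] = s;
    }
    return owner;
}
```

With 3 contexts and 7 frames, stream 0 would take frames 0, 3 and 6, stream 1 frames 1 and 4, and stream 2 frames 2 and 5. Because each context keeps its own pedestal array, frames on different streams would not share pedestal state under this scheme.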
kferjaoui 69151de3c7 Add in-kernel pedestal update, disable quadrant test
- Update the pedestal for non-photon pixels directly in the
  kernel (push_fast equivalent); no atomics needed
- Comment out the quadrant significance test (c2): it is absent
  from the sequential CPU code and was producing GPU-only clusters
- Add d_pd_sum to the device allocations and host upload

Build (sm_89): 46 registers, 0 spills, 100% occupancy.

Verified on 256x256 Jungfrau data, 5000 frames, nSigma=5.0:
  CPU 8428 vs GPU 8471 clusters, 99.8% match
  0.63 ms/frame CPU vs 0.04 ms/frame GPU (~16x)
2026-04-13 11:28:03 +02:00
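
The in-kernel pedestal update from the commit above can be sketched on the host as follows. This is an illustration under stated assumptions, not the kernel itself: it assumes the push_fast equivalent is a running-window update of a per-pixel sum (sum += value - sum / n_samples), and PedestalState, update_pedestal_pixel and update_frame are hypothetical names. The loop iterations stand in for GPU threads, each touching only its own pixel, which is one plausible reading of why no atomics are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirror of the per-pixel pedestal sum (cf. d_pd_sum).
struct PedestalState {
    std::vector<double> sum;  // running sum per pixel
    std::size_t n_samples;    // length of the averaging window
};

// Assumed push_fast-style update: replace one "average" sample with the new value.
inline void update_pedestal_pixel(PedestalState& p, std::size_t pixel, double value) {
    assert(p.n_samples > 0 && pixel < p.sum.size());
    p.sum[pixel] += value - p.sum[pixel] / static_cast<double>(p.n_samples);
}

// Current pedestal estimate for one pixel.
inline double pedestal_mean(const PedestalState& p, std::size_t pixel) {
    return p.sum[pixel] / static_cast<double>(p.n_samples);
}

// Stand-in for the kernel: each iteration plays the role of one thread and
// writes only its own pixel, so no two writers touch the same element.
inline void update_frame(PedestalState& p,
                         const std::vector<double>& frame,
                         const std::vector<unsigned char>& is_photon) {
    assert(frame.size() == p.sum.size() && is_photon.size() == p.sum.size());
    for (std::size_t px = 0; px < frame.size(); ++px) {
        if (!is_photon[px]) {
            update_pedestal_pixel(p, px, frame[px]);  // only non-photon pixels update
        }
    }
}
```

For example, with n_samples = 4 and a stored sum of 40 (mean 10), a non-photon reading of 14 moves the sum to 44 and the mean to 11, while a pixel flagged as a photon hit keeps its pedestal unchanged.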