detectors/aare - aare - PSI GIT Service

2 Commits

Author	SHA1	Message	Date

kferjaoui

3ed773e520

Add multi-stream ClusterFinderCUDA with batched processing

- Wrap per-stream CUDA resources (device buffers, stream handle)
  in StreamContext struct; ClusterFinderCUDA owns a vector of
  n_streams contexts with independent pedestal arrays
- Split ClusterFinderCUDA.cuh into clusterfinder_kernel.cuh
  (device kernel) and ClusterFinderCUDA.hpp (host RAII wrapper)
- Add find_clusters_batched(): processes N frames round-robin
  across streams, returns per-frame cluster vectors.
- Update ClusterFinderCUDA.test.cu
- Update Makefile for new file layout.

2026-04-23 11:26:29 +02:00
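The round-robin distribution described in this commit can be illustrated with a host-only sketch. It is not the code from the commit: `StreamContextSketch` and `assign_round_robin` are hypothetical names, the device buffers and stream handle are left out, and the rule "frame i goes to stream i % n_streams" is an assumption about what "round-robin" means here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-stream state. The StreamContext in the
// commit wraps device buffers and a stream handle; both are omitted here.
struct StreamContextSketch {
    std::vector<float> pedestal;        // one independent pedestal array per stream
    std::vector<std::size_t> frame_ids; // frames this stream would process
};

// Assumed scheduling rule: frame i is handled by stream i % n_streams.
std::vector<StreamContextSketch> assign_round_robin(std::size_t n_frames,
                                                    std::size_t n_streams,
                                                    std::size_t n_pixels) {
    assert(n_streams > 0); // avoid a modulo by zero below
    std::vector<StreamContextSketch> contexts(n_streams);
    for (auto &ctx : contexts) {
        ctx.pedestal.assign(n_pixels, 0.0f);
    }
    for (std::size_t i = 0; i < n_frames; ++i) {
        contexts[i % n_streams].frame_ids.push_back(i);
    }
    return contexts;
}
```

Under this assumed rule, each stream sees only every n_streams-th frame, so each independent pedestal array is updated from a subsample of the frame sequence rather than from every frame.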

kferjaoui

69151de3c7

Add in-kernel pedestal update, disable quadrant test

Build on RHEL8 / build (push) Successful in 2m48s

Build on RHEL9 / build (push) Successful in 3m4s

Run tests using data on local RHEL8 / build (push) Successful in 3m35s

- Non-photon pixels now update pedestal (push_fast equivalent)
  directly in the kernel, no atomics needed
- Commented out quadrant significance test (c2): absent from
  sequential CPU code, was producing GPU-only clusters.
- Added d_pd_sum to device allocations and host upload

Build (sm_89): 46 registers, 0 spills, 100% occupancy.

Verified on 256x256 Jungfrau data, 5000 frames, nSigma=5.0:
  CPU 8428 vs GPU 8471 clusters, 99.8% match
  0.63 ms/frame CPU vs 0.04 ms/frame GPU (~16x)

2026-04-13 11:28:03 +02:00
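The per-pixel pedestal update mentioned in this commit can be sketched on the host as below. This is an illustration under assumptions, not the kernel from the commit: it assumes the "push_fast equivalent" is a steady-state running-sum update over a fixed window of `n_samples`, and `PedestalSketch` and `update_pixel` are hypothetical names. If each GPU thread owns exactly one pixel index, an update of this shape writes only to that pixel's entries, which would be consistent with the note that no atomics are needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-pixel pedestal state kept as running sums
// (the commit mentions a device-side sum buffer, d_pd_sum).
struct PedestalSketch {
    std::size_t n_samples;    // assumed fixed window length
    std::vector<double> sum;  // running sum of pixel values
    std::vector<double> sum2; // running sum of squared pixel values

    double mean(std::size_t i) const {
        return sum[i] / static_cast<double>(n_samples);
    }
};

// Assumed steady-state update: remove one "average" sample, add the new one.
// Only index i is read and written, so per-pixel calls do not interfere.
void update_pixel(PedestalSketch &p, std::size_t i, double val) {
    const double n = static_cast<double>(p.n_samples);
    p.sum[i] += val - p.sum[i] / n;
    p.sum2[i] += val * val - p.sum2[i] / n;
}
```

In this form a pixel that keeps reading its current mean leaves the sums unchanged, while a higher or lower reading shifts the mean by (val - mean) / n_samples.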