aare/python at feature/cuda_clusterfinder - aare

detectors/aare

mirror of https://github.com/slsdetectorgroup/aare.git synced 2026-07-30 09:53:39 +02:00

kferjaoui 5922c73c07

Build on RHEL8 / build (push) Successful in 3m16s

Build on RHEL9 / build (push) Successful in 3m26s

Run tests using data on local RHEL8 / build (push) Successful in 9m42s

feat(ClusterFinderCUDA): async submit_batch/collect API

- Eliminate the ~200–300 µs idle gap between batches by allowing two
  batches to be in flight at the same time:
  - submit_batch() enqueues the H2D copy, the kernel, and the D2H copy
    without blocking.
  - collect() waits with cudaEventSynchronize (not cudaStreamSynchronize),
    so a second batch that is already queued keeps running uninterrupted.

- Two ping-pong output slots (NUM_SLOTS=2), each with its own pinned
  buffers and a sync event created with cudaEventDisableTiming.
- find_clusters_batched() keeps its direct implementation.

- Measured: 0.026 -> 0.022 ms/frame (~18% higher throughput).
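
The submit/collect split described in this commit message can be illustrated with a small CPU-only model. This is a sketch under stated assumptions, not the aare API: `PingPongPipeline`, `run_pipelined`, `in_flight`, and the `work` callback are hypothetical names, a single worker thread stands in for an in-order CUDA stream, and a `Future` stands in for the per-slot sync event. Only the names `submit_batch`, `collect`, and `NUM_SLOTS` come from the commit message, and their real signatures may differ.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

NUM_SLOTS = 2  # two ping-pong slots, as named in the commit message


class PingPongPipeline:
    """Toy CPU model of a two-slot submit/collect pipeline (hypothetical)."""

    def __init__(self, work: Callable[[Sequence[int]], int]) -> None:
        self._work = work
        # One worker thread plays the role of a single in-order stream.
        self._stream = ThreadPoolExecutor(max_workers=1)
        # Each slot stores the Future ("event") of the batch occupying it.
        self._slots: List[Optional[Future]] = [None] * NUM_SLOTS
        self._submitted = 0
        self._collected = 0

    @property
    def in_flight(self) -> int:
        return self._submitted - self._collected

    def submit_batch(self, batch: Sequence[int]) -> None:
        """Enqueue a batch and return immediately, without waiting for it."""
        slot = self._submitted % NUM_SLOTS
        if self._slots[slot] is not None:
            raise RuntimeError("slot still in flight; call collect() first")
        self._slots[slot] = self._stream.submit(self._work, batch)
        self._submitted += 1

    def collect(self) -> int:
        """Wait for the oldest batch only; later batches keep running."""
        slot = self._collected % NUM_SLOTS
        future = self._slots[slot]
        if future is None:
            raise RuntimeError("nothing to collect")
        result = future.result()  # per-slot wait, not a whole-stream wait
        self._slots[slot] = None
        self._collected += 1
        return result

    def close(self) -> None:
        self._stream.shutdown(wait=True)


def run_pipelined(batches, work):
    """Process batches in order while keeping up to NUM_SLOTS in flight."""
    pipe = PingPongPipeline(work)
    results = []
    try:
        for batch in batches:
            if pipe.in_flight == NUM_SLOTS:
                # Free the oldest slot; the other batch is still queued.
                results.append(pipe.collect())
            pipe.submit_batch(batch)
        while pipe.in_flight:
            results.append(pipe.collect())
    finally:
        pipe.close()
    return results
```

In this model, `run_pipelined([[1, 2], [3], [4, 5, 6]], sum)` returns the per-batch sums in submission order. The point of waiting per slot is that collecting batch N never has to wait for batch N+1, which has already been queued behind it; a whole-stream wait would drain both before returning.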

2026-05-28 16:23:37 +02:00

aare

Refactor ClusterFinderCUDA

2026-05-18 16:30:13 +02:00

examples

Feature/minuit2 wrapper (#279)

2026-03-30 09:12:23 +02:00

src

feat(ClusterFinderCUDA): async submit_batch/collect API

2026-05-28 16:23:37 +02:00

tests

feat(ClusterFinderCUDA): async submit_batch/collect API

2026-05-28 16:23:37 +02:00

CMakeLists.txt

Apply cmake-format to build files

2026-04-27 11:53:56 +02:00