Optimize CUDA cluster finder transfers and kernel hot path · 88e0e8d678 - aare

mirror of https://github.com/slsdetectorgroup/aare.git synced 2026-07-28 21:12:52 +02:00

Build on RHEL8 / build (push) Successful in 2m51s

Build on RHEL9 / build (push) Successful in 3m15s

Run tests using data on local RHEL8 / build (push) Successful in 3m47s

- Use per-stream pinned host staging buffers for truly async CUDA transfers.
- Avoid reserving full device capacity per result frame.
- Reduce kernel work by delaying cluster payload construction.
- Use squared comparisons, removing per-pixel sqrtf() calls.
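
The last point can be illustrated with a minimal host-side C++ sketch. It assumes the per-pixel test has the form `value > n_sigma * sqrt(variance)`, which is common in cluster finders but is not confirmed by this commit message. The function names and parameters below are hypothetical and are not taken from the aare sources.

```cpp
#include <cmath>

// Hypothetical reference form: one square root per pixel.
// Assumes variance >= 0 and n_sigma >= 0.
inline bool above_threshold_sqrt(float value, float variance, float n_sigma) {
    return value > n_sigma * std::sqrt(variance);
}

// Hypothetical squared form: no square root in the hot path.
// For a non-negative threshold t, (value > t) is equivalent to
// (value > 0 && value * value > t * t), so the explicit sign check keeps
// negative pixel values from passing once they are squared.
// n_sigma_sq = n_sigma * n_sigma can be computed once, outside the pixel loop.
inline bool above_threshold_squared(float value, float variance, float n_sigma_sq) {
    return value > 0.0f && value * value > n_sigma_sq * variance;
}
```

The two forms agree mathematically, but floating-point rounding may differ for values that sit exactly on the threshold, so results right at the boundary are worth checking against the reference data.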

kferjaoui

2026-04-30 18:23:31 +02:00

parent 34e69a8065

commit 88e0e8d678

3 changed files with 241 additions and 91 deletions

python/tests/ClusterFinderCUDA.ipynb

+105 -36

File diff suppressed because one or more lines are too long
