mirror of
https://github.com/slsdetectorgroup/aare.git
synced 2026-06-09 08:28:40 +02:00
perf(ClusterFinderCUDA): FP32 device pedestal and bulk memcpy drain
- Device pedestal arrays (mean/sum/sum2) are now float instead of double: this halves global-memory bandwidth for pedestal reads/writes and eliminates FP64 arithmetic in the kernel (3.3x kernel speedup, 15 µs -> 4.6 µs).
- Replace the per-cluster push_back loop in the D2H drain with a single resize()+memcpy().
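
For illustration only, the drain change can be sketched on the host side roughly as follows. This is a minimal sketch, not the code from the suppressed diff: `Cluster3x3`, `drain_push_back` and `drain_bulk` are hypothetical names, and the real aare cluster type and staging buffer may look different. It assumes the clusters have already been copied from the device into a contiguous host staging buffer and that the cluster type is trivially copyable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical cluster record (assumption); the actual aare type may differ.
struct Cluster3x3 {
    std::int16_t x;
    std::int16_t y;
    float data[9];
};

// memcpy into vector storage is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<Cluster3x3>::value,
              "bulk drain requires a trivially copyable cluster type");

// Per-cluster drain: one push_back (with its capacity check) per element.
inline void drain_push_back(const Cluster3x3* staging, std::size_t count,
                            std::vector<Cluster3x3>& out) {
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(staging[i]);
    }
}

// Bulk drain: grow the vector once, then copy all clusters in one memcpy.
inline void drain_bulk(const Cluster3x3* staging, std::size_t count,
                       std::vector<Cluster3x3>& out) {
    if (count == 0) {
        return;  // nothing to append
    }
    assert(staging != nullptr);
    const std::size_t old_size = out.size();
    out.resize(old_size + count);  // single growth step
    std::memcpy(out.data() + old_size, staging, count * sizeof(Cluster3x3));
}
```

Both functions append the same elements in the same order; the bulk version just replaces `count` individual appends with one growth step and one contiguous copy.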