mirror of
https://github.com/slsdetectorgroup/aare.git
synced 2026-06-05 23:08:40 +02:00
Optimize CUDA cluster finder transfers and kernel hot path
- Use per-stream pinned host staging buffers so that CUDA transfers are truly asynchronous.
- Avoid reserving full device capacity per result frame.
- Reduce kernel work by delaying cluster payload construction.
- Use squared comparisons to remove per-pixel sqrtf() calls.
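The squared-comparison point can be illustrated with a minimal host-side C++ sketch. It assumes the threshold test has the form `value > n_sigma * sqrt(variance)`, which this commit message does not confirm. The function names and parameters are hypothetical and are not taken from the aare sources. For non-negative `n_sigma` and `variance`, both sides of the inequality can be squared, provided the sign of `value` is checked first.

```cpp
#include <cmath>

// Hypothetical reference form (assumption, not aare code):
// one square root per pixel.
inline bool above_threshold_sqrt(float value, float variance, float n_sigma) {
    return value > n_sigma * std::sqrt(variance);
}

// Squared form: no square root.
// Assumes n_sigma >= 0 and variance >= 0.
// The explicit sign check is needed because squaring discards the sign:
// a large negative value would otherwise pass the squared test.
inline bool above_threshold_squared(float value, float variance, float n_sigma) {
    return value > 0.0f && value * value > n_sigma * n_sigma * variance;
}
```

The two forms agree mathematically, but floating-point rounding can make them differ for values that sit exactly on the threshold. In a kernel, `n_sigma * n_sigma` would typically be computed once outside the per-pixel loop.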