mirror of
https://github.com/slsdetectorgroup/aare.git
synced 2026-06-05 19:48:40 +02:00
ac96d1f688
- Stencil arithmetic and shared memory use float (COMPUTE_TYPE alias). - Pedestal accumulation stays double to preserve variance accuracy. Notes: - On RTX 4090, FP32 throughput is ~64× higher than FP64, so moving stencil math to float improves performance. - Using float also avoids shared memory bank conflicts: stride-18 maps to distinct banks for 32-bit values, but caused conflicts with 64-bit.