mirror of
https://github.com/slsdetectorgroup/aare.git
synced 2026-06-05 23:38:43 +02:00
Implement mixed precision: f32 stencil, f64 pedestal
- Stencil arithmetic and shared memory use float (COMPUTE_TYPE alias).
- Pedestal accumulation stays double to preserve variance accuracy.

Notes:
- On RTX 4090, FP32 throughput is ~64× higher than FP64, so moving stencil math to float improves performance.
- Using float also avoids shared memory bank conflicts: stride-18 maps to distinct banks for 32-bit values, but caused conflicts with 64-bit.
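The precision split described in this commit message can be illustrated with a small host-side C++ sketch. It is not the code from this commit (the diff is not shown here): `PedestalAccumulator`, `accumulate_frames`, and `subtract_pedestal` are hypothetical names, the sum and sum-of-squares variance formula is an assumption about how the pedestal is tracked, and `COMPUTE_TYPE` is only assumed to be a plain alias for `float`.

```cpp
#include <cassert>
#include <cstddef>

// Assumed definition: the commit message names COMPUTE_TYPE but does not show it.
using COMPUTE_TYPE = float;

// Hypothetical running pedestal statistics (sum and sum of squares per pixel).
// T selects the accumulation precision, so float and double can be compared.
template <typename T>
struct PedestalAccumulator {
    T sum = 0;
    T sum2 = 0;
    std::size_t n = 0;

    void push(T x) {
        sum += x;
        sum2 += x * x;
        ++n;
    }
    T mean() const {
        assert(n > 0);
        return sum / static_cast<T>(n);
    }
    // Variance as E[x^2] - E[x]^2; sensitive to rounding in sum2.
    T variance() const {
        const T m = mean();
        return sum2 / static_cast<T>(n) - m * m;
    }
};

// Feed `frames` synthetic samples alternating between 2999 and 3001.
// For an even frame count the true mean is 3000 and the true variance is 1.
template <typename T>
PedestalAccumulator<T> accumulate_frames(std::size_t frames) {
    PedestalAccumulator<T> acc;
    for (std::size_t i = 0; i < frames; ++i) {
        acc.push(i % 2 == 0 ? T(2999) : T(3001));
    }
    return acc;
}

// Pedestal subtraction happens in double; only the result is narrowed
// to COMPUTE_TYPE for the (float) stencil arithmetic.
inline COMPUTE_TYPE subtract_pedestal(double raw,
                                      const PedestalAccumulator<double>& ped) {
    return static_cast<COMPUTE_TYPE>(raw - ped.mean());
}
```

With `accumulate_frames<double>(100000)` every intermediate value is an integer below 2^53, so the mean and variance come out exact. With `accumulate_frames<float>(100000)` the running sum of squares grows far beyond what a 24-bit significand can hold exactly, so the variance is generally not exact, which is the motivation the commit gives for keeping pedestal accumulation in double.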