Implement mixed precision: f32 stencil, f64 pedestal
Build on RHEL8 / build (push) Successful in 2m53s
Build on RHEL9 / build (push) Successful in 3m15s
Run tests using data on local RHEL8 / build (push) Successful in 3m47s

- Stencil arithmetic and shared memory use float (COMPUTE_TYPE alias).
- Pedestal accumulation stays double to preserve variance accuracy.
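
A minimal host-side sketch of the split described above, not the actual
kernel code. Only the COMPUTE_TYPE alias name comes from this commit;
PEDESTAL_TYPE, stencil3, PedestalStats and the stencil weights are
hypothetical names chosen for illustration. It assumes the variance is
derived from running sums, where the subtraction of two large, nearly
equal terms is the step that benefits from double.

```cpp
#include <cassert>
#include <cstddef>

using COMPUTE_TYPE  = float;   // stencil arithmetic (alias name from the commit)
using PEDESTAL_TYPE = double;  // hypothetical alias for the accumulator type

// Hypothetical 3-point smoothing stencil, evaluated entirely in float.
inline COMPUTE_TYPE stencil3(COMPUTE_TYPE left, COMPUTE_TYPE centre,
                             COMPUTE_TYPE right) {
    return 0.25f * left + 0.5f * centre + 0.25f * right;
}

// Running pedestal statistics kept in double: float samples are widened
// once on entry, and every accumulation after that is done in double.
struct PedestalStats {
    PEDESTAL_TYPE sum    = 0.0;
    PEDESTAL_TYPE sum_sq = 0.0;
    std::size_t   n      = 0;

    void add(COMPUTE_TYPE sample) {
        const PEDESTAL_TYPE v = static_cast<PEDESTAL_TYPE>(sample);
        sum    += v;
        sum_sq += v * v;
        ++n;
    }

    PEDESTAL_TYPE mean() const {
        assert(n > 0);
        return sum / static_cast<PEDESTAL_TYPE>(n);
    }

    // Unbiased sample variance from the running sums.
    PEDESTAL_TYPE variance() const {
        assert(n > 1);
        const PEDESTAL_TYPE count = static_cast<PEDESTAL_TYPE>(n);
        return (sum_sq - sum * sum / count) / (count - 1.0);
    }
};
```

In a kernel, the same idea would mean reading and writing shared memory
as COMPUTE_TYPE and widening to double only where a value is added to
the pedestal sums.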

Notes:
- On RTX 4090, FP32 throughput is ~64× higher than FP64, so moving
  stencil math to float improves performance.
- Using float also avoids shared-memory bank conflicts: with stride 18,
  32-bit values map to distinct banks, whereas 64-bit values caused
  conflicts.
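
The bank mapping behind the note above can be sketched as follows. This
assumes the usual NVIDIA layout of 32 banks, each 4 bytes wide, and an
array that starts on a bank-0 boundary; first_bank and banks_per_element
are hypothetical helper names. Whether a given warp access actually
conflicts also depends on the kernel's thread-to-element mapping and on
how the hardware splits 64-bit requests, neither of which is shown here.

```cpp
#include <cstddef>

constexpr std::size_t kNumBanks       = 32;  // assumed bank count
constexpr std::size_t kBankWidthBytes = 4;   // one 32-bit word per bank

// Bank that holds the first 32-bit word of element `index` in an array
// of `elem_size`-byte elements starting at a bank-0-aligned address.
constexpr std::size_t first_bank(std::size_t index, std::size_t elem_size) {
    return (index * elem_size / kBankWidthBytes) % kNumBanks;
}

// Number of consecutive banks a single element occupies:
// 1 for a 4-byte float, 2 for an 8-byte double.
constexpr std::size_t banks_per_element(std::size_t elem_size) {
    return (elem_size + kBankWidthBytes - 1) / kBankWidthBytes;
}
```

For example, element 18 starts in bank 18 when stored as float, but in
bank 4 when stored as double (word 36 modulo 32), and each double spans
two banks instead of one.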
kferjaoui
2026-04-27 14:56:40 +02:00
parent 40f08fad92
commit ac96d1f688
3 changed files with 112 additions and 103 deletions