aare/python/tests
kferjaoui ac96d1f688
Implement mixed precision: f32 stencil, f64 pedestal

- Stencil arithmetic and shared memory use float (COMPUTE_TYPE alias).
- Pedestal accumulation stays double to preserve variance accuracy.

Notes:
- On RTX 4090, FP32 throughput is ~64× higher than FP64, so moving
  stencil math to float improves performance.
- Using float also avoids shared memory bank conflicts: stride-18 maps
  to distinct banks for 32-bit values, but causes conflicts for 64-bit
  values.
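
Why the pedestal accumulation stays in double can be illustrated with a small sketch. It is not the code from this commit: it assumes the variance is derived from a running sum and a running sum of squares (mean of squares minus squared mean), and the helper names `to_f32` and `pedestal_stats` as well as the four frame values are hypothetical. Float32 is emulated with the standard `struct` module by rounding after every operation.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def identity(x: float) -> float:
    """Keep full binary64 precision."""
    return x


def pedestal_stats(frames, rnd):
    """Return (mean, variance) for one pixel, rounding every step with rnd.

    Assumed formula (not taken from the commit): var = E[x^2] - E[x]^2.
    """
    total = 0.0
    total_sq = 0.0
    for x in frames:
        total = rnd(total + x)
        total_sq = rnd(total_sq + rnd(x * x))
    n = len(frames)
    mean = rnd(total / n)
    variance = rnd(rnd(total_sq / n) - rnd(mean * mean))
    return mean, variance


# Hypothetical pixel values: mean 4096, true variance 1.
frames = [4095.0, 4097.0, 4095.0, 4097.0]

mean64, var64 = pedestal_stats(frames, identity)
mean32, var32 = pedestal_stats(frames, to_f32)

print(mean64, var64)  # 4096.0 1.0
print(mean32, var32)  # 4096.0 0.0
```

With only four frames the float32 accumulator already loses the variance entirely: 4097² = 16785409 exceeds 2^24, so it is not representable in float32, and the rounding errors in the sum of squares are as large as the quantity being measured. The double accumulator represents every intermediate value exactly here.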
2026-04-27 14:56:40 +02:00