NUMA CPU/memory pinning is no longer worthwhile: the FPGA DMA buffers are
placed device-local by the kernel (dma_alloc_coherent), the big RAM ring
buffer is random-access (first-touch handles placement), and GPU work is
already spread across all visible devices. So drop the pinning entirely
and with it libnuma.
- Delete NUMAHWPolicy; the only concern worth keeping - GPU selection -
is done directly via pin_gpu() (round-robin over visible GPUs) in the
indexer pool and the Lite analysis threads. CPU-only threads
(FPGA acquire/pedestal/summation/frame-transform) no longer bind
anything.
- Drop get_gpu_numa_node() (sysfs lookup) - only SelectGPUAndItsNUMA
used it.
- numa_policy broker setting is deprecated and ignored (kept in the API
for backward compatibility; warns once on startup).
- Remove NUMA_LIBRARY / numa.h / numaif.h detection from CMake.
- Docs: drop the NUMA dependency, remove the numa_policy config example,
and document running multiple brokers on disjoint GPUs via
CUDA_VISIBLE_DEVICES.
- Remove NUMA_GPU_REVIEW.md (the planning note; this work is now done).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The offline worker threads built MXAnalysisWithoutFPGA without selecting a CUDA
device, so all per-image preprocessing/spot-finding/azimuthal integration ran on
GPU 0 (only the indexer pool was distributed). Add pin_gpu() to CUDAWrapper - a
process-wide round-robin counter (counter++ % get_gpu_count(), no thread id, no-op
without a GPU, honours CUDA_VISIBLE_DEVICES) - and call it once per worker before
building the analysis resources so their CUDA streams/engines land on distinct
devices.
Also add NUMA_GPU_REVIEW.md: a working note mapping ImageBuffer/NUMAHWPolicy/GPU
dispatch with goals and a staged plan (multi-broker GPU isolation via
CUDA_VISIBLE_DEVICES, dropping libnuma, reassessing NUMA pinning for the FPGA path).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>