The offline worker threads built MXAnalysisWithoutFPGA without selecting a CUDA device, so all per-image preprocessing/spot-finding/azimuthal integration ran on GPU 0 (only the indexer pool was distributed).

Add pin_gpu() to CUDAWrapper - a process-wide round-robin counter (counter++ % get_gpu_count(), no thread id, no-op without a GPU, honours CUDA_VISIBLE_DEVICES) - and call it once per worker before building the analysis resources so their CUDA streams/engines land on distinct devices.

Also add NUMA_GPU_REVIEW.md: a working note mapping ImageBuffer/NUMAHWPolicy/GPU dispatch with goals and a staged plan (multi-broker GPU isolation via CUDA_VISIBLE_DEVICES, dropping libnuma, reassessing NUMA pinning for the FPGA path).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
16 lines
534 B
C++
// SPDX-FileCopyrightText: 2024 Filip Leonarski, Paul Scherrer Institute <filip.leonarski@psi.ch>
// SPDX-License-Identifier: GPL-3.0-only

#pragma once

#include <cstdint>

int32_t get_gpu_count();
void set_gpu(int32_t dev_id);
int get_gpu_numa_node(int dev_id);

// Pin the calling thread to the next GPU in round-robin order, using a process-wide counter
// (counter++ % get_gpu_count()). Call once per thread; no thread id needed. No-op when no GPU
// is visible. Honours CUDA_VISIBLE_DEVICES via get_gpu_count().
void pin_gpu();
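The round-robin behaviour described in the comment on pin_gpu() can be sketched as below. This is an illustration under stated assumptions, not the repository's implementation: get_gpu_count() and set_gpu() are replaced by hypothetical stand-ins so the sketch builds without CUDA (the real ones presumably wrap cudaGetDeviceCount() and cudaSetDevice()), and fake_gpu_count, current_gpu, gpu_counter and run_workers() are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the CUDA-backed functions declared in the header,
// so that the sketch builds and runs on a machine without a GPU.
static int32_t fake_gpu_count = 4;      // pretend four devices are visible
thread_local int32_t current_gpu = -1;  // device selected by the calling thread

int32_t get_gpu_count() { return fake_gpu_count; }

void set_gpu(int32_t dev_id) {
    assert(dev_id >= 0 && dev_id < get_gpu_count());
    current_gpu = dev_id;
}

// Process-wide counter shared by every thread (assumed name).
static std::atomic<uint32_t> gpu_counter{0};

void pin_gpu() {
    const int32_t count = get_gpu_count();
    if (count <= 0)
        return;  // no visible GPU: do nothing
    // fetch_add returns the value before the increment, i.e. counter++.
    const uint32_t ticket = gpu_counter.fetch_add(1, std::memory_order_relaxed);
    set_gpu(static_cast<int32_t>(ticket % static_cast<uint32_t>(count)));
}

// Start n workers; each pins itself once and records the device it received.
std::vector<int32_t> run_workers(int n) {
    std::vector<int32_t> assigned(n, -1);
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&assigned, i] {
            pin_gpu();                  // once, before building per-thread resources
            assigned[i] = current_gpu;  // a real worker would build its analysis object here
        });
    }
    for (auto &w : workers)
        w.join();
    return assigned;
}
```

Because the counter is atomic, each worker receives a unique ticket regardless of start order, so with 4 devices and 8 workers every device is selected exactly twice. Which worker gets which device is not deterministic.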