Jungfraujoch/reader/HDF5ImageLocator.h
leonarski_f 23d27f30c4 reader: split raw-image reading into HDF5ImageLocator + HDF5ImageSource
Decouple the raw-pixel side of JFJochHDF5Reader from the rest as the first
step toward swappable per-dataset metadata snapshots.

- HDF5ImageLocator: single owner of the legacy/VDS/contiguous layout resolution
  plus a persistent open-file cache, replacing the four duplicated resolvers
  (GetImageLocation, ReadSpots, ReadReflections) and their per-call file caches.
  Also hosts the source-mapping logic (former GetHDF5DataSource body).
- HDF5ImageSource: raw-pixel reading (locator + LoadImageDataset); the part whose
  links to files stay fixed while the metadata master may change.
- JFJochHDF5Reader keeps a thin GetImageLocation/GetRawImage/GetHDF5DataSource that
  delegate to image_source_; the six layout members are gone, parsed into a local
  Layout handed to the source at the end of ReadFile. Cache cleared on Close().

Verified: tests/jfjoch_test [HDF5] (79 cases / 1775 assertions), and
jfjoch_process/azint/extract_hkl/scale relink unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 10:15:09 +02:00


// SPDX-FileCopyrightText: 2026 Filip Leonarski, Paul Scherrer Institute <filip.leonarski@psi.ch>
// SPDX-License-Identifier: GPL-3.0-only
#pragma once
#include <cstddef> // size_t
#include <cstdint> // uint32_t, int64_t, uint64_t
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>
#include "../writer/HDF5Objects.h" // HDF5ReadOnlyFile, HDF5VirtualDatasetMapping, HDF5DataSetLayout
#include "../common/JFJochMessages.h" // FileWriterFormat, HDF5DataSourceMessage
// Turns a global image number into the HDF5 file + local index that physically holds its pixels,
// for all three on-disk layouts (legacy linked data files, VDS, contiguous/integrated). This is
// the part of the reader whose links to files stay constant: it knows where the raw images
// live, independent of which master file the per-image metadata is read from.
//
// Open data-file handles are cached, so scanning many images (e.g. reprocessing) does not reopen
// the same file on every read. HDF5 is not thread-safe, so every call must be made with the
// global hdf5_mutex held by the caller; the locator does no locking of its own.
class HDF5ImageLocator {
public:
    struct Location {
        std::shared_ptr<HDF5ReadOnlyFile> file;
        uint32_t local_index = 0;
    };

    // Layout description, filled by the reader once the master file has been parsed. All paths
    // are absolute: legacy data files and VDS mapping filenames are resolved relative to the
    // master before being handed over, so the locator never deals with relative paths.
    struct Layout {
        FileWriterFormat format = FileWriterFormat::NoFile;
        HDF5DataSetLayout data_layout = HDF5DataSetLayout::CONTIGUOUS;
        std::shared_ptr<HDF5ReadOnlyFile> master_file;
        std::string master_filename;
        std::vector<std::string> legacy_files;
        size_t images_per_file = 1;
        std::vector<HDF5VirtualDatasetMapping> vds_mappings;
    };

    void Configure(Layout layout);
    void Clear();

    // Resolve a global image number to {file, local index}. Throws if the image is not covered
    // by the layout. Does not bounds-check against the total image count - the caller does that.
    Location Resolve(int64_t global_image) const;

    // Source mapping for re-writing a derived file (e.g. _process.h5) so it links back to the
    // original pixel sources rather than to a master. total_images is supplied by the caller.
    std::vector<HDF5DataSourceMessage> GetSourceMapping(uint64_t first_image,
                                                        std::optional<uint64_t> image_count,
                                                        uint64_t total_images) const;

private:
    Layout layout_;
    mutable std::map<std::string, std::shared_ptr<HDF5ReadOnlyFile> > file_cache_;
    std::shared_ptr<HDF5ReadOnlyFile> OpenCached(const std::string &path) const;
};
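
The header only declares Resolve and OpenCached, so the following is a minimal sketch of how the legacy layout might be resolved, not the actual implementation. It assumes that image g lives in file g / images_per_file at local index g % images_per_file, which the header does not confirm. FakeFile, LegacyLocatorSketch and OpenCount are hypothetical names introduced for illustration; FakeFile stands in for HDF5ReadOnlyFile so the example builds without HDF5. Like the real locator, the sketch does no locking of its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HDF5ReadOnlyFile: it only records its path.
struct FakeFile {
    explicit FakeFile(std::string p) : path(std::move(p)) {}
    std::string path;
};

// Hypothetical miniature of the locator, covering only the legacy layout.
class LegacyLocatorSketch {
public:
    struct Location {
        std::shared_ptr<FakeFile> file;
        uint32_t local_index = 0;
    };

    LegacyLocatorSketch(std::vector<std::string> files, size_t images_per_file)
        : files_(std::move(files)), images_per_file_(images_per_file) {
        if (images_per_file_ == 0)
            throw std::invalid_argument("images_per_file must be positive");
    }

    // Assumed mapping: image g is stored in file g / N at local index g % N,
    // where N is images_per_file. Throws if no file covers the image.
    Location Resolve(int64_t global_image) const {
        if (global_image < 0)
            throw std::out_of_range("negative image number");
        const auto g = static_cast<uint64_t>(global_image);
        const uint64_t file_idx = g / images_per_file_;
        if (file_idx >= files_.size())
            throw std::out_of_range("image not covered by layout");
        Location loc;
        loc.file = OpenCached(files_[static_cast<size_t>(file_idx)]);
        loc.local_index = static_cast<uint32_t>(g % images_per_file_);
        return loc;
    }

    // Number of distinct files opened so far (illustration only).
    size_t OpenCount() const { return open_count_; }

private:
    // Returns the cached handle for path, opening it on first use.
    // The cache is mutable so that Resolve can stay const.
    std::shared_ptr<FakeFile> OpenCached(const std::string &path) const {
        auto it = file_cache_.find(path);
        if (it != file_cache_.end())
            return it->second;
        ++open_count_;
        auto file = std::make_shared<FakeFile>(path);
        file_cache_.emplace(path, file);
        return file;
    }

    std::vector<std::string> files_;
    size_t images_per_file_;
    mutable std::map<std::string, std::shared_ptr<FakeFile>> file_cache_;
    mutable size_t open_count_ = 0;
};
```

With two files of 100 images each, resolving images 150 and 199 in this sketch returns the same shared handle and opens the second file only once, which is the behaviour the persistent cache is meant to provide when scanning many images.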