Pipelines and workflows
- pipelines.data_integration.copy_subtree_and_create_hdf5(src, dst, select_dir_keywords, select_file_keywords, allowed_file_extensions, root_metadata_dict)[source]
Helper function that copies a directory tree subject to the given keyword and extension constraints and creates an HDF5 file from the result.
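A minimal usage sketch; all argument values below are placeholders, and the assumption that root_metadata_dict is written to the HDF5 root group is not stated in the signature:

```python
from pipelines.data_integration import copy_subtree_and_create_hdf5

# Placeholder paths and filters; adjust to your directory layout.
copy_subtree_and_create_hdf5(
    src='data/raw',                            # source directory tree
    dst='data/staging',                        # destination for the constrained copy
    select_dir_keywords=['experiment'],        # keep only directories matching these keywords
    select_file_keywords=['2024'],             # keep only files matching these keywords
    allowed_file_extensions=['.txt', '.csv'],  # restrict copied files to these extensions
    root_metadata_dict={'project': 'demo'},    # metadata attached to the HDF5 root (assumed)
)
```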
- pipelines.data_integration.load_config_and_setup_logging(yaml_config_file_path, log_dir)[source]
Load the YAML configuration file, set up logging, and validate the required keys and datetime_steps.
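A usage sketch; the paths are placeholders, and the assumption that the parsed configuration is returned as a dict is not stated in the signature:

```python
from pipelines.data_integration import load_config_and_setup_logging

# Assumed to return the parsed configuration; path values are placeholders.
config = load_config_and_setup_logging(
    yaml_config_file_path='config/data_integration.yaml',
    log_dir='logs/',
)
```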
- pipelines.data_integration.run_pipeline(path_to_config_yamlFile, log_dir='logs/')[source]
Integrates data sources specified by the input configuration file into HDF5 files.
- Parameters:
path_to_config_yamlFile (str): Path to the YAML configuration file.
log_dir (str): Directory in which to save the log file.
- Returns:
list: List of Paths to the created HDF5 file(s).
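For example (the configuration path is a placeholder; the return value is the documented list of created HDF5 paths):

```python
from pipelines.data_integration import run_pipeline

# 'config/data_integration.yaml' is a placeholder path.
created_files = run_pipeline('config/data_integration.yaml', log_dir='logs/')
for hdf5_path in created_files:
    print(hdf5_path)
```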
- pipelines.metadata_revision.update_hdf5_file_with_review(input_hdf5_file, review_yaml_file)[source]
Updates, appends, or deletes metadata attributes in an HDF5 file based on a provided YAML dictionary.
- Parameters:
- input_hdf5_file : str
  Path to the HDF5 file.
- review_yaml_file : str
  Path to a YAML file specifying objects and their attributes with operations. Example format:
  {
      "object_name": {
          "attributes": {
              "attr_name": {
                  "value": attr_value,
                  "delete": true | false
              }
          }
      }
  }
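A sketch of a review file and the corresponding call; the file names and attribute names are hypothetical, and the YAML layout below is one plausible rendering of the dictionary format documented above:

```python
from pipelines.metadata_revision import update_hdf5_file_with_review

# review.yaml (hypothetical contents, following the documented format):
#   instrument_1:
#     attributes:
#       operator:
#         value: "J. Doe"
#         delete: false    # update or append this attribute
#       obsolete_flag:
#         value: 0
#         delete: true     # remove this attribute
update_hdf5_file_with_review('experiment.hdf5', 'review.yaml')
```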