Pipelines and workflows

pipelines.data_integration.copy_subtree_and_create_hdf5(src, dst, select_dir_keywords, select_file_keywords, allowed_file_extensions, root_metadata_dict)[source]

Helper function that copies a directory tree subject to the given selection constraints and creates an HDF5 file from the result.
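
A minimal call sketch; all paths, keywords, and metadata values below are illustrative assumptions, not values taken from the package:

    from pipelines.data_integration import copy_subtree_and_create_hdf5

    # Illustrative values; adjust to your directory layout.
    copy_subtree_and_create_hdf5(
        src="raw_data/campaign_2023",                  # source directory tree
        dst="staging/campaign_2023",                   # destination of the constrained copy
        select_dir_keywords=["gas", "particles"],      # keep only directories matching these keywords
        select_file_keywords=["2023"],                 # keep only files matching these keywords
        allowed_file_extensions=[".txt", ".csv"],      # extension whitelist
        root_metadata_dict={"project": "my_project"},  # metadata attached at the HDF5 root
    )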

pipelines.data_integration.load_config_and_setup_logging(yaml_config_file_path, log_dir)[source]

Load YAML configuration file, set up logging, and validate required keys and datetime_steps.
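
A hedged usage sketch; the paths are placeholders, and the return value is assumed here to be the parsed configuration:

    from pipelines.data_integration import load_config_and_setup_logging

    # Assumption: the function returns the loaded and validated configuration.
    config = load_config_and_setup_logging("config/campaign.yaml", "logs/")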

pipelines.data_integration.run_pipeline(path_to_config_yamlFile, log_dir='logs/')[source]

Integrates data sources specified by the input configuration file into HDF5 files.

Parameters:

path_to_config_yamlFile (str): Path to the YAML configuration file.

log_dir (str): Directory to save the log file.

Returns:

list: List of Paths to the created HDF5 file(s).
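
A minimal end-to-end sketch; the configuration path is a placeholder:

    from pipelines.data_integration import run_pipeline

    # Returns the list of paths to the HDF5 file(s) created from the configured sources.
    hdf5_files = run_pipeline("config/campaign.yaml", log_dir="logs/")
    for path in hdf5_files:
        print(path)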

pipelines.metadata_revision.count(hdf5_obj, yml_dict)[source]
pipelines.metadata_revision.load_yaml(review_yaml_file)[source]
pipelines.metadata_revision.update_hdf5_file_with_review(input_hdf5_file, review_yaml_file)[source]

Updates, appends, or deletes metadata attributes in an HDF5 file based on a provided YAML dictionary.

Parameters:

input_hdf5_file (str): Path to the HDF5 file.

review_yaml_file (str): Path to a YAML file whose contents specify objects and their attributes with operations. Example format:

    {
        "object_name": {
            "attributes": {
                "attr_name": {
                    "value": attr_value,
                    "delete": true | false
                }
            }
        }
    }
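
A sketch of a review YAML file matching the documented format, followed by the corresponding call; the file paths, object name, and attribute names are all illustrative:

    from pipelines.metadata_revision import update_hdf5_file_with_review

    # Illustrative review file following the documented structure:
    # update "operator" on experiment_1 and delete its "obsolete_note".
    review_text = """\
    experiment_1:
      attributes:
        operator:
          value: "J. Doe"
          delete: false
        obsolete_note:
          value: ""
          delete: true
    """
    with open("review.yaml", "w") as f:
        f.write(review_text)

    update_hdf5_file_with_review("data/experiment.h5", "review.yaml")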

pipelines.metadata_revision.validate_yaml_dict(input_hdf5_file, yaml_dict)[source]