dima/notebooks/example_workflow_metadata_annotation.ipynb
2025-02-03 10:31:48 +01:00

Metadata Annotation Process

In this notebook, we will go through a simple metadata annotation process. This involves the following steps:

  1. Define an HDF5 file.
  2. Create a YAML representation of the HDF5 file.
  3. Edit and augment the YAML with metadata.
  4. Update the original file based on the edited YAML.

Import libraries and modules

  • Execute (or run) the cell below.
In [ ]:
import os
from nbutils import add_project_path_to_sys_path


# Add project root to sys.path
add_project_path_to_sys_path()

try:
    import src.hdf5_ops as hdf5_ops
    import pipelines.metadata_revision as metadata_revision
    print("Imports successful!")
except ImportError as e:
    print(f"Import error: {e}")
Imports successful!

Step 1: Define an HDF5 file

  • Set the string variable hdf5_file_path to the path of the HDF5 file of interest.
  • Execute the cell.
In [ ]:
hdf5_file_path = "../output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.h5"
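
Before moving on, it can help to confirm that the path points to an existing file, since a typo here would only surface in later steps. The following is a minimal sketch using only the standard library; the helper name check_hdf5_path is hypothetical and not part of DIMA.

```python
from pathlib import Path


def check_hdf5_path(path_str):
    """Return a Path if path_str names an existing .h5/.hdf5 file, else raise.

    Hypothetical helper for illustration; it only checks the file name
    and existence, not whether the content is valid HDF5.
    """
    path = Path(path_str)
    if path.suffix.lower() not in {".h5", ".hdf5"}:
        raise ValueError(f"Unexpected file extension: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"No HDF5 file found at {path}")
    return path
```

Calling check_hdf5_path(hdf5_file_path) right after the assignment raises immediately if the relative path does not resolve from the notebook's working directory.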

Step 2: Create a YAML Representation of the File

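We now convert the HDF5 file's structure and existing metadata into a YAML representation, which is then used to add and edit metadata attributes. Note that the cell below passes output_format='json', so the file it writes has a .json extension, as the output shows.

  • Execute the cell.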
In [4]:
yaml_file_path = hdf5_ops.serialize_metadata(hdf5_file_path, output_format='json')

if os.path.exists(yaml_file_path):
    print(f'The YAML file representation {yaml_file_path} of the HDF5 file {hdf5_file_path} was created successfully.')
The YAML file representation output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.json of the HDF5 file output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.h5 was created successfully.
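
To get a first look at what was serialized before editing it, the file can be loaded and its top-level keys listed. This is a sketch under two assumptions: the file is JSON (as requested with output_format='json' above) and its top level is a mapping. The helper name summarize_metadata_file is hypothetical, and the actual layout produced by serialize_metadata should be checked in the file itself.

```python
import json


def summarize_metadata_file(path, max_keys=10):
    """Load a JSON metadata file and return up to max_keys top-level keys.

    Assumes (not confirmed by the notebook) that the top level is a mapping.
    """
    with open(path, "r", encoding="utf-8") as f:
        content = json.load(f)
    if not isinstance(content, dict):
        raise TypeError("Expected a mapping at the top level of the metadata file")
    # Keys come back in the order they appear in the file.
    return list(content)[:max_keys]
```

For example, print(summarize_metadata_file(yaml_file_path)) shows which entries are available for annotation.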

Step 3: Edit and Augment YAML with Metadata

We can now manually edit the YAML file to add metadata.

  • (Optional) Automate your metadata annotation process by writing a program that takes the YAML file and returns a modified version of it.
  • Execute the cell.
In [ ]:
def metadata_annotation_process(yaml_file_path):
    """Placeholder for custom annotation logic; returns the path to the edited file."""

    # Include metadata annotation logic here, e.g., load the file and modify its content accordingly.

    print(f'Ensure your edits to {yaml_file_path} have been properly incorporated and saved.')

    return yaml_file_path

yaml_file_path = metadata_annotation_process(yaml_file_path)
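
The annotation logic left open above could, for instance, load the serialized file, merge in new attributes, and write it back. The sketch below assumes the file is JSON with a mapping at the top level and that adding top-level keys is a meaningful edit; the schema expected by update_hdf5_file_with_review may differ, so inspect the generated file first. The function name add_attributes and the attribute key in the usage note are hypothetical.

```python
import json


def add_attributes(metadata_path, new_attributes):
    """Merge new_attributes into the top-level mapping of a JSON metadata file.

    Hypothetical helper: existing keys with the same name are overwritten,
    all other content is written back unchanged.
    """
    with open(metadata_path, "r", encoding="utf-8") as f:
        metadata = json.load(f)
    metadata.update(new_attributes)
    with open(metadata_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=4)
    return metadata_path
```

A call such as add_attributes(yaml_file_path, {"hypothetical_operator_note": "reviewed"}) could then replace the manual editing step inside metadata_annotation_process.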

Step 4: Update the Original File Based on the Edited YAML

Lastly, we will update the original file with the metadata from the YAML file.

  • Execute the cell.
In [ ]:
metadata_revision.update_hdf5_file_with_review(hdf5_file_path, yaml_file_path)
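
Because this step updates the original HDF5 file, it may be worth keeping a copy made before the update cell is run. The sketch below assumes the update happens in place, as the step title suggests; backup_file is a hypothetical helper built only on the standard library.

```python
import shutil
from pathlib import Path


def backup_file(path_str, suffix=".bak"):
    """Copy a file next to itself with suffix appended and return the copy's Path."""
    src = Path(path_str)
    dst = src.with_name(src.name + suffix)
    # copy2 also preserves timestamps and other file metadata where possible.
    shutil.copy2(src, dst)
    return dst
```

Running backup_file(hdf5_file_path) before update_hdf5_file_with_review leaves a .h5.bak file that can be restored if the review produces unexpected changes.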