{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Metadata Annotation Process\n",
"\n",
"In this notebook, we will go through a simple metadata annotation process. This involves the following steps:\n",
"\n",
"1. Define an HDF5 file.\n",
"2. Create a YAML representation of the HDF5 file.\n",
"3. Edit and augment the YAML with metadata.\n",
"4. Update the original file based on the edited YAML.\n",
"\n",
"\n",
"## Import libraries and modules\n",
"\n",
"* Execute (run) the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Imports successful!\n"
]
}
],
"source": [
"import os\n",
"from nbutils import add_project_path_to_sys_path\n",
"\n",
"\n",
"# Add project root to sys.path\n",
"add_project_path_to_sys_path()\n",
"\n",
"try:\n",
" import src.hdf5_ops as hdf5_ops\n",
" import pipelines.metadata_revision as metadata_revision\n",
" print(\"Imports successful!\")\n",
"except ImportError as e:\n",
" print(f\"Import error: {e}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Define an HDF5 File\n",
"\n",
"* Set the string variable `hdf5_file_path` to the path of the HDF5 file of interest.\n",
"* Execute the cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hdf5_file_path = \"../output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.h5\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Create a YAML Representation of the File\n",
"\n",
"We now serialize the HDF5 file's structure and existing metadata into a YAML file, which we then use to add and edit metadata attributes.\n",
"\n",
"* Execute the cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Serialize the HDF5 file's structure and existing metadata into a YAML file\n",
"yaml_file_path = hdf5_ops.serialize_metadata(hdf5_file_path, output_format='yaml')\n",
"\n",
"if os.path.exists(yaml_file_path):\n",
"    print(f'The YAML representation {yaml_file_path} of the HDF5 file {hdf5_file_path} was created successfully.')\n",
"else:\n",
"    print(f'Expected YAML file {yaml_file_path} was not found.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Edit and Augment YAML with Metadata\n",
"\n",
"We can now manually edit the YAML file to add or revise metadata attributes.\n",
"\n",
"* (Optional) Automate the annotation by writing a function that loads the YAML file, modifies its content, and saves the result.\n",
"* Execute the cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def metadata_annotation_process(yaml_file_path):\n",
"\n",
" # Include metadata annotation logic, e.g., load yaml file and modify its content accordingly\n",
"\n",
" print(f'Ensure your edits to {yaml_file_path} have been properly incorporated and saved.')\n",
"\n",
" return yaml_file_path\n",
"\n",
"yaml_file_path = metadata_annotation_process(yaml_file_path)"
]
},
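{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Optional sketch.* The annotation step above can also be scripted. The example below is a minimal sketch, not part of the `pipelines` or `src` modules: it assumes PyYAML is installed and that the serialized file loads as a nested mapping. The helper names `set_nested` and `annotate_yaml` are hypothetical, and so are the key paths in the usage line, so inspect your own YAML file before choosing where to write. It is shown as text rather than as a code cell so that *Run All* does not modify your file; copy it into a code cell to use it.\n",
"\n",
"```python\n",
"import yaml  # PyYAML\n",
"\n",
"\n",
"def set_nested(mapping, key_path, value):\n",
"    # Walk mapping[k1][k2]... creating missing dicts, then set the last key.\n",
"    # Assumes every intermediate node on the path is a dict.\n",
"    node = mapping\n",
"    for key in key_path[:-1]:\n",
"        node = node.setdefault(key, {})\n",
"    node[key_path[-1]] = value\n",
"    return mapping\n",
"\n",
"\n",
"def annotate_yaml(yaml_file_path, edits):\n",
"    # Hypothetical helper: apply {key_path_tuple: value} edits to the YAML file in place.\n",
"    with open(yaml_file_path, 'r', encoding='utf-8') as f:\n",
"        data = yaml.safe_load(f) or {}\n",
"    for key_path, value in edits.items():\n",
"        set_nested(data, list(key_path), value)\n",
"    with open(yaml_file_path, 'w', encoding='utf-8') as f:\n",
"        # sort_keys=False keeps the key order of the loaded file\n",
"        yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)\n",
"    return yaml_file_path\n",
"```\n",
"\n",
"Usage, with a hypothetical key path: `annotate_yaml(yaml_file_path, {('attributes', 'project'): 'kinetic flowtube study'})`. Loading and re-dumping the whole file rewrites its formatting and drops any YAML comments, which is one reason to keep the manual edit as the default."
]
},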
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Update the Original File Based on the Edited YAML\n",
"\n",
"Lastly, we update the original HDF5 file with the metadata from the edited YAML file.\n",
"\n",
"* Execute the cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write the reviewed metadata from the YAML file back into the HDF5 file\n",
"metadata_revision.update_hdf5_file_with_review(hdf5_file_path, yaml_file_path)"
]
}
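,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Optional check.* To confirm that the update was written, you can read the attributes back from the HDF5 file. The example below is a read-only sketch that assumes the `h5py` package is installed; `collect_attributes` is a hypothetical helper, not a function of `src.hdf5_ops`. It walks the root group and every group and dataset with `h5py`'s `visititems` and returns their attributes keyed by object path.\n",
"\n",
"```python\n",
"import h5py\n",
"\n",
"\n",
"def collect_attributes(hdf5_file_path):\n",
"    # Return {object_path: {attribute_name: value}} for the root and every group/dataset.\n",
"    found = {}\n",
"    with h5py.File(hdf5_file_path, 'r') as f:  # opened read-only, nothing is modified\n",
"        found['/'] = dict(f.attrs)\n",
"\n",
"        def visit(name, obj):\n",
"            # Called once per group/dataset below the root; returning None continues the walk.\n",
"            found['/' + name] = dict(obj.attrs)\n",
"\n",
"        f.visititems(visit)\n",
"    return found\n",
"```\n",
"\n",
"Usage: `for path, attrs in collect_attributes(hdf5_file_path).items(): print(path, attrs)`. Compare the printed attributes with the entries you added to the YAML file."
]
}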
],
"metadata": {
"kernelspec": {
"display_name": "multiphase_chemistry_env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}