{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Metadata Annotation Process\n",
    "\n",
    "In this notebook, we walk through a simple metadata annotation process, which consists of the following steps:\n",
    "\n",
    "1. Define an HDF5 file.\n",
    "2. Create a YAML representation of the HDF5 file.\n",
    "3. Edit and augment the YAML with metadata.\n",
    "4. Update the original file based on the edited YAML.\n",
    "\n",
    "## Import libraries and modules\n",
    "\n",
    "* Execute (or run) the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Imports successful!\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from nbutils import add_project_path_to_sys_path\n",
    "\n",
    "\n",
    "# Add project root to sys.path\n",
    "add_project_path_to_sys_path()\n",
    "\n",
    "try:\n",
    "    import src.hdf5_ops as hdf5_ops\n",
    "    import pipelines.metadata_revision as metadata_revision\n",
    "    print(\"Imports successful!\")\n",
    "except ImportError as e:\n",
    "    print(f\"Import error: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Define an HDF5 File\n",
    "\n",
    "* Set the string variable `hdf5_file_path` to the path of the HDF5 file of interest.\n",
    "* Execute the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hdf5_file_path = \"../output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.h5\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Create a YAML Representation of the File\n",
    "\n",
    "We now serialize the HDF5 file structure and its existing metadata into a human-readable representation, which will be used to add and edit metadata attributes. This example passes `output_format='json'`, so the file is written as JSON, which is, with minor exceptions, also valid YAML 1.2.\n",
    "\n",
    "* Execute the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The YAML file representation output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.json of the HDF5 file output_files/collection_kinetic_flowtube_study_LuciaI_2022-01-31_2023-06-29/kinetic_flowtube_study_LuciaI_2023-06-29.h5 was created successfully.\n"
     ]
    }
   ],
   "source": [
    "yaml_file_path = hdf5_ops.serialize_metadata(hdf5_file_path, output_format='json')\n",
    "\n",
    "if os.path.exists(yaml_file_path):\n",
    "    print(f'The YAML file representation {yaml_file_path} of the HDF5 file {hdf5_file_path} was created successfully.')"
   ]
  },
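  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following cell is a minimal, hypothetical sketch of how an HDF5 file's structure and attributes could be serialized to YAML with `h5py` and PyYAML. It is not the implementation of `hdf5_ops.serialize_metadata`: the helper names (`_to_builtin`, `sketch_serialize_metadata`) and the output layout (a mapping from HDF5 object paths to their type and attributes) are assumptions chosen for illustration, and both packages are assumed to be installed. The cell only defines functions; the example call is commented out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import h5py\n",
    "import numpy as np\n",
    "import yaml\n",
    "\n",
    "\n",
    "def _to_builtin(value):\n",
    "    # Convert bytes, NumPy scalars, and arrays into plain Python types that YAML can dump.\n",
    "    if isinstance(value, bytes):\n",
    "        return value.decode('utf-8', errors='replace')\n",
    "    if isinstance(value, np.ndarray):\n",
    "        return _to_builtin(value.tolist())\n",
    "    if isinstance(value, (list, tuple)):\n",
    "        return [_to_builtin(v) for v in value]\n",
    "    if isinstance(value, np.generic):\n",
    "        return _to_builtin(value.item())\n",
    "    return value\n",
    "\n",
    "\n",
    "def sketch_serialize_metadata(h5_path, yaml_path):\n",
    "    # Hypothetical helper: map each HDF5 object path to its type and attributes.\n",
    "    entries = {'/': {'type': 'group', 'attributes': {}}}\n",
    "\n",
    "    with h5py.File(h5_path, 'r') as h5:\n",
    "        entries['/']['attributes'] = {k: _to_builtin(v) for k, v in h5.attrs.items()}\n",
    "\n",
    "        def collect(name, obj):\n",
    "            # Called once for every object below the root (groups and datasets).\n",
    "            entries[name] = {\n",
    "                'type': 'group' if isinstance(obj, h5py.Group) else 'dataset',\n",
    "                'attributes': {k: _to_builtin(v) for k, v in obj.attrs.items()},\n",
    "            }\n",
    "\n",
    "        h5.visititems(collect)\n",
    "\n",
    "    with open(yaml_path, 'w', encoding='utf-8') as f:\n",
    "        yaml.safe_dump(entries, f, sort_keys=False, allow_unicode=True)\n",
    "    return yaml_path\n",
    "\n",
    "\n",
    "# Example (writes a separate file; the output path is hypothetical):\n",
    "# sketch_serialize_metadata(hdf5_file_path, 'metadata_sketch.yaml')"
   ]
  },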
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Edit and Augment YAML with Metadata\n",
    "\n",
    "We can now edit the YAML file manually to add or revise metadata attributes.\n",
    "\n",
    "* (Optional) Automate the annotation process by writing a program that takes the YAML file as input and returns a modified version of it.\n",
    "* Execute the cell once your edits are saved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def metadata_annotation_process(yaml_file_path):\n",
    "\n",
    "    # Include metadata annotation logic, e.g., load the YAML file and modify its content accordingly.\n",
    "\n",
    "    print(f'Ensure your edits to {yaml_file_path} have been properly incorporated and saved.')\n",
    "\n",
    "    return yaml_file_path\n",
    "\n",
    "yaml_file_path = metadata_annotation_process(yaml_file_path)"
   ]
  },
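  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional illustration of the automated route mentioned in Step 3, the next cell sketches a program that loads the review file, recursively merges a dictionary of updates into it, and writes it back in the same format (JSON if the path ends in `.json`, YAML otherwise). The helper names (`deep_update`, `automated_annotation`) and the example keys are hypothetical; the keys you pass must match the actual structure of the file produced by `hdf5_ops.serialize_metadata`, which this sketch does not assume. PyYAML is assumed to be installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "import yaml\n",
    "\n",
    "\n",
    "def deep_update(target, updates):\n",
    "    # Recursively merge `updates` into `target` in place: nested dicts are merged, other values overwritten.\n",
    "    for key, value in updates.items():\n",
    "        if isinstance(value, dict) and isinstance(target.get(key), dict):\n",
    "            deep_update(target[key], value)\n",
    "        else:\n",
    "            target[key] = value\n",
    "    return target\n",
    "\n",
    "\n",
    "def automated_annotation(yaml_file_path, updates):\n",
    "    # Hypothetical helper: load the file, merge the updates, and save it back in its original format.\n",
    "    is_json = yaml_file_path.endswith('.json')\n",
    "    with open(yaml_file_path, 'r', encoding='utf-8') as f:\n",
    "        content = json.load(f) if is_json else (yaml.safe_load(f) or {})\n",
    "    if not isinstance(content, dict):\n",
    "        raise ValueError(f'Expected a mapping at the top level of {yaml_file_path}')\n",
    "\n",
    "    deep_update(content, updates)\n",
    "\n",
    "    with open(yaml_file_path, 'w', encoding='utf-8') as f:\n",
    "        if is_json:\n",
    "            json.dump(content, f, indent=4, ensure_ascii=False)\n",
    "        else:\n",
    "            yaml.safe_dump(content, f, sort_keys=False, allow_unicode=True)\n",
    "    return yaml_file_path\n",
    "\n",
    "\n",
    "# Example usage (the keys below are placeholders, not part of any real schema):\n",
    "# yaml_file_path = automated_annotation(yaml_file_path, {'<key>': {'<attribute>': '<value>'}})"
   ]
  },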
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Update the Original File Based on the Edited YAML\n",
    "\n",
    "Finally, we update the original HDF5 file with the metadata from the edited YAML file.\n",
    "\n",
    "* Execute the cell."
   ]
  },
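  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For orientation only, the next cell sketches how attributes from an edited review file could be written back to an HDF5 file with `h5py`. This is not the implementation of `metadata_revision.update_hdf5_file_with_review`; it assumes a hypothetical layout in which the file maps HDF5 object paths (such as `/` or `group/dataset`) to `{'attributes': {...}}`, and the helper names (`_to_attr_value`, `sketch_apply_attributes`) are illustrative. The cell only defines functions, so running it does not modify any file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "import h5py\n",
    "import numpy as np\n",
    "import yaml\n",
    "\n",
    "\n",
    "def _to_attr_value(value):\n",
    "    # h5py generally cannot store NumPy unicode ('<U') arrays, so a list of strings\n",
    "    # is converted to a variable-length string array instead.\n",
    "    if isinstance(value, list) and value and all(isinstance(v, str) for v in value):\n",
    "        return np.array(value, dtype=h5py.string_dtype())\n",
    "    return value\n",
    "\n",
    "\n",
    "def sketch_apply_attributes(h5_path, review_path):\n",
    "    # Hypothetical helper: create or overwrite the attributes listed in the review file.\n",
    "    with open(review_path, 'r', encoding='utf-8') as f:\n",
    "        review = json.load(f) if review_path.endswith('.json') else (yaml.safe_load(f) or {})\n",
    "\n",
    "    with h5py.File(h5_path, 'r+') as h5:\n",
    "        for path, entry in review.items():\n",
    "            obj = h5.get(path)  # None if the path does not exist in the file\n",
    "            if obj is None:\n",
    "                print(f'Skipping {path!r}: not found in {h5_path}')\n",
    "                continue\n",
    "            for name, value in (entry or {}).get('attributes', {}).items():\n",
    "                if value is not None:\n",
    "                    obj.attrs[name] = _to_attr_value(value)\n",
    "\n",
    "\n",
    "# Example (modifies the HDF5 file in place; back it up first):\n",
    "# sketch_apply_attributes(hdf5_file_path, yaml_file_path)"
   ]
  },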
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata_revision.update_hdf5_file_with_review(hdf5_file_path, yaml_file_path)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "multiphase_chemistry_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}