Refactor step 1 in notebook to facilitate usage of campaign descriptors

This commit is contained in:
2025-06-20 10:37:01 +02:00
parent b610b4e337
commit 177bcee400


{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data integration workflow of experimental campaign\n",
    "\n",
    "In this notebook, we will go through our data integration workflow. This involves the following steps:\n",
    "\n",
    "1. Specify the data integration task through a YAML configuration file.\n",
    "2. Create an integrated HDF5 file of the experimental campaign from the configuration file.\n",
    "3. Display the created HDF5 file using a treemap.\n",
    "\n",
    "## Import libraries and modules\n",
    "\n",
    "* Execute (or Run) the Cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nbutils import add_project_path_to_sys_path\n",
    "\n",
    "# Add project root to sys.path\n",
    "add_project_path_to_sys_path()\n",
    "\n",
    "try:\n",
    "    import visualization.hdf5_vis as hdf5_vis\n",
    "    import pipelines.data_integration as data_integration\n",
    "    print(\"Imports successful!\")\n",
    "except ImportError as e:\n",
    "    print(f\"Import error: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Specify data integration task through YAML configuration file\n",
    "\n",
    "* Create your configuration file (i.e., a *.yaml file) adhering to the example YAML file in the input folder.\n",
    "* Set up the input and output directory paths and execute the Cell.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "number, initials = 1, 'LI'  # Set as either 2, 'TBR' or 3, 'NG'\n",
    "campaign_descriptor_path = f'../input_files/campaignDescriptor{number}_{initials}.yaml'\n",
    "\n",
    "print(campaign_descriptor_path)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Create an integrated HDF5 file of experimental campaign\n",
    "\n",
    "* Execute the Cell. Here we run the function `run_pipeline` with the previously specified campaign descriptor (YAML config) file as input argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hdf5_file_path = data_integration.run_pipeline(campaign_descriptor_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hdf5_file_path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Display integrated HDF5 file using a treemap\n",
    "\n",
    "* Execute the Cell. A visual representation of the integrated file in HTML format should be displayed and stored in the output directory folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if isinstance(hdf5_file_path, list):\n",
    "    for path_item in hdf5_file_path:\n",
    "        hdf5_vis.display_group_hierarchy_on_a_treemap(path_item)\n",
    "else:\n",
    "    hdf5_vis.display_group_hierarchy_on_a_treemap(hdf5_file_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import src.hdf5_ops as h5de\n",
    "h5de.serialize_metadata(hdf5_file_path[0], folder_depth=3, output_format='yaml')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import src.hdf5_ops as h5de\n",
    "print(hdf5_file_path)\n",
    "DataOpsAPI = h5de.HDF5DataOpsManager(hdf5_file_path[0])\n",
    "\n",
    "DataOpsAPI.load_file_obj()\n",
    "\n",
    "#DataOpsAPI.reformat_datetime_column('ICAD/HONO/2022_11_22_Channel1_Data.dat/data_table',\n",
    "#                                    'Start Date/Time (UTC)',\n",
    "#                                    '%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S')\n",
    "DataOpsAPI.extract_and_load_dataset_metadata()\n",
    "df = DataOpsAPI.dataset_metadata_df\n",
    "print(df.head())\n",
    "\n",
    "DataOpsAPI.unload_file_obj()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DataOpsAPI.load_file_obj()\n",
    "\n",
    "DataOpsAPI.append_metadata('/', {'test_attr': 'this is a test value'})\n",
    "\n",
    "DataOpsAPI.unload_file_obj()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
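The refactored Step 1 derives the campaign descriptor path from a `(number, initials)` pair, and the later cells must cope with `run_pipeline` returning either a single HDF5 path or a list of paths. A minimal self-contained sketch of those two patterns follows; `campaign_descriptor_path` and `visit_outputs` are illustrative helper names invented here (in the notebook, the real work is done by `data_integration.run_pipeline` and `hdf5_vis.display_group_hierarchy_on_a_treemap`):

```python
def campaign_descriptor_path(number: int, initials: str) -> str:
    # Mirrors the notebook's f-string for locating the campaign descriptor YAML.
    return f'../input_files/campaignDescriptor{number}_{initials}.yaml'

def visit_outputs(hdf5_file_path, visit):
    # run_pipeline may return one path or a list of paths; normalize to a list
    # before visiting, as the treemap cell does with isinstance(..., list).
    paths = hdf5_file_path if isinstance(hdf5_file_path, list) else [hdf5_file_path]
    for p in paths:
        visit(p)

# Example usage with a stand-in visitor instead of the treemap display function.
seen = []
visit_outputs(['a.h5', 'b.h5'], seen.append)  # list of outputs
visit_outputs('c.h5', seen.append)            # single output
print(campaign_descriptor_path(1, 'LI'))
print(seen)
```

The normalization keeps the display cell agnostic to whether the pipeline integrated one campaign file or several.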