{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data integration workflow of experimental campaign\n", "\n", "In this notebook, we will go through our data integration workflow. This involves the following steps:\n", "\n", "1. Specify the data integration task through a YAML configuration file.\n", "2. Create an integrated HDF5 file of the experimental campaign from the configuration file.\n", "3. Display the created HDF5 file using a treemap.\n", "\n", "## Import libraries and modules\n", "\n", "* Execute (or run) the cell below." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\python311.zip\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\DLLs\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\Lib\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\n", "\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\Lib\\site-packages\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\Lib\\site-packages\\win32\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\Lib\\site-packages\\win32\\lib\n", "c:\\Users\\florez_j\\.conda\\envs\\dash_multi_chem_env\\Lib\\site-packages\\Pythonwin\n", "c:\\Users\\florez_j\\Documents\\GitLab\\acsmnode\n", "c:\\Users\\florez_j\\Documents\\GitLab\\acsmnode\\dima\n" ] } ], "source": [ "import sys\n", "import os\n", "# Set up project root directory\n", "\n", "\n", "notebook_dir = os.getcwd() # Current working directory (assumes running from notebooks/)\n", "project_path = os.path.normpath(os.path.join(notebook_dir, \"..\")) # Move up to project root\n", "dima_path = os.path.normpath(os.path.join(project_path, \"dima\")) # Path to the dima package inside the project root\n", "\n", "for item in sys.path:\n", "    print(item)\n", "\n", "\n", "if project_path not in sys.path: # Avoid duplicate entries\n", "    sys.path.append(project_path)\n", "    print(project_path)\n", "if dima_path not in sys.path:\n", "    sys.path.insert(0, dima_path)\n", "    print(dima_path)\n", "#sys.path.append(os.path.join(root_dir,'dima','instruments'))\n", "#sys.path.append(os.path.join(root_dir,'dima','src'))\n", "#sys.path.append(os.path.join(root_dir,'dima','utils'))\n", "\n", "import dima.visualization.hdf5_vis as hdf5_vis\n", "import dima.pipelines.data_integration as data_integration\n", "\n", "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Specify data integration task through YAML configuration file\n", "\n", "* Create your configuration file (i.e., *.yaml file) adhering to the example YAML file in the input folder.\n", "* Set up the input and output directory paths and execute the cell.\n", "\n" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#yaml_config_file_path = 'dima/input_files/data_integr_config_file_TBR.yaml' \n", "yaml_config_file_path = '../campaignDescriptor.yaml'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Create an integrated HDF5 file of the experimental campaign\n", "\n", "* Execute the cell. Here we run the function `run_pipeline` from `dima.pipelines.data_integration`, passing the previously specified YAML config file as its input argument."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "[Start] Data integration :\n", "Source: ..\\data\\collection_JFJ_2024_2025-03-14_2025-03-14\n", "Destination: ..\\data\\collection_JFJ_2024_2025-03-14_2025-03-14.h5\n", "\n", "Starting data transfer from instFolder: /ACSM_TOFWARE/2024\n", "Completed transfer for //ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt\n", "Completed transfer for //ACSM_TOFWARE/2024/ACSM_JFJ_2024_timeseries.txt\n", "Completed transfer for //ACSM_TOFWARE/2024/Org_data_valid.csv\n", "Completed transfer for //ACSM_TOFWARE/2024/Org_err_valid.csv\n", "Completed transfer for //ACSM_TOFWARE/2024/Org_mz_valid.csv\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\florez_j\\Documents\\GitLab\\acsmnode\\dima\\instruments\\readers\\acsm_tofware_reader.py:112: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.\n", " df = pd.read_csv(tmp_filename,\n", "c:\\Users\\florez_j\\Documents\\GitLab\\acsmnode\\dima\\instruments\\readers\\acsm_tofware_reader.py:112: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.\n", " df = pd.read_csv(tmp_filename,\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Completed transfer for //ACSM_TOFWARE/2024/Org_time_valid.csv\n", "[====================================================================================================] 100.0% ...\n", "Completed data transfer for instFolder: /ACSM_TOFWARE/2024\n", "[End] Data integration\n" ] } ], "source": [ "\n", "hdf5_file_path = data_integration.run_pipeline(yaml_config_file_path)" ] }, 
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['..\\\\data\\\\collection_JFJ_2024_LeilaS_2025-02-22_2025-02-22.h5']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hdf5_file_path" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Display integrated HDF5 file using a treemap\n", "\n", "* Execute the cell. A visual representation of the integrated file, in HTML format, should be displayed and stored in the output directory." ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/ACSM_TOFWARE\n", "/ACSM_TOFWARE/2024\n", "/ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt\n", "/ACSM_TOFWARE/2024/ACSM_JFJ_2024_timeseries.txt\n", "/ACSM_TOFWARE/2024/Org_data_valid.csv\n", "/ACSM_TOFWARE/2024/Org_err_valid.csv\n", "/ACSM_TOFWARE/2024/Org_mz_valid.csv\n", "/ACSM_TOFWARE/2024/Org_time_valid.csv\n" ] }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "branchvalues": "remainder", "customdata": [ "

project: Building FAIR data chains for atmospheric observations in the ACTRIS Switzerland Network
experiment: acsm_campaign
contact: LeilaS
level: 1", "/ACSM_TOFWARE", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt", "/ACSM_TOFWARE/2024/ACSM_JFJ_2024_timeseries.txt", "/ACSM_TOFWARE/2024/Org_data_valid.csv", "/ACSM_TOFWARE/2024/Org_err_valid.csv", "/ACSM_TOFWARE/2024/Org_mz_valid.csv", "/ACSM_TOFWARE/2024/Org_time_valid.csv" ], "hovertemplate": "%{label}
Count: %{value}
Path: %{customdata}", "labels": [ "/", "/ACSM_TOFWARE", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt", "/ACSM_TOFWARE/2024/ACSM_JFJ_2024_timeseries.txt", "/ACSM_TOFWARE/2024/Org_data_valid.csv", "/ACSM_TOFWARE/2024/Org_err_valid.csv", "/ACSM_TOFWARE/2024/Org_mz_valid.csv", "/ACSM_TOFWARE/2024/Org_time_valid.csv" ], "name": "", "parents": [ "", "/", "/ACSM_TOFWARE", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024", "/ACSM_TOFWARE/2024" ], "root": { "color": "lightgrey" }, "type": "treemap", "values": [ 1, 1, 6, 1, 1, 1, 1, 1, 1 ] } ], "layout": { "height": 600, "margin": { "b": 25, "l": 25, "r": 25, "t": 50 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, 
"ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { 
"colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 
0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", 
"zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "width": 800 } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "if isinstance(hdf5_file_path ,list):\n", " for path_item in hdf5_file_path :\n", " hdf5_vis.display_group_hierarchy_on_a_treemap(path_item)\n", "else:\n", " hdf5_vis.display_group_hierarchy_on_a_treemap(hdf5_file_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import dima.pipelines.metadata_revision as metadata\n", "\n", "import dima.src.hdf5_ops as h5de\n", "\n", "channels1 = ['Chl_11000','NH4_11000','SO4_11000','NO3_11000','Org_11000']\n", "channels2 = ['FilamentEmission_mA','VaporizerTemp_C','FlowRate_mb','ABsamp']\n", "\n", "target_channels = {'location':'ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_timeseries.txt/data_table',\n", " 'names': ','.join(['t_start_Buf','Chl_11000','NH4_11000','SO4_11000','NO3_11000','Org_11000'])\n", " }\n", "diagnostic_channels = {'location':'ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_meta.txt/data_table',\n", " 'names': ','.join(['t_base','FilamentEmission_mA','VaporizerTemp_C','FlowRate_mb','ABsamp'])}\n", "\n", "DataOpsAPI = h5de.HDF5DataOpsManager(hdf5_file_path[0])\n", "\n", "DataOpsAPI.load_file_obj()\n", 
"# Record the target and diagnostic channel descriptions as metadata on the instrument group\n", "DataOpsAPI.append_metadata('/ACSM_TOFWARE/', {'target_channels': target_channels, 'diagnostic_channels': diagnostic_channels})\n", "\n", "# Reformat the timestamp columns; src_format describes how the timestamps are currently written\n", "DataOpsAPI.reformat_datetime_column('ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_timeseries.txt/data_table', 't_start_Buf', src_format='%d.%m.%Y %H:%M:%S.%f')\n", "DataOpsAPI.reformat_datetime_column('ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_meta.txt/data_table', 't_base', src_format='%d.%m.%Y %H:%M:%S')\n", "\n", "DataOpsAPI.unload_file_obj()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "dash_multi_chem_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 4 }