dima/workflow_data_integration.ipynb
Data integration workflow of experimental campaign

In this notebook, we will go through our data integration workflow. This involves the following steps:

  1. Specify the data integration task through a YAML configuration file.
  2. Create an integrated HDF5 file of the experimental campaign from the configuration file.
  3. Display the created HDF5 file using a treemap.

Import libraries and modules

  • Execute (or Run) the Cell below.
In [ ]:
import sys
import os
# Set up project root directory
root_dir = os.path.abspath(os.curdir)
sys.path.append(root_dir)

import src.hdf5_vis as hdf5_vis
import src.data_integration_lib as dilib

Step 1: Specify data integration task through YAML configuration file

  • Create your configuration file (i.e., a *.yaml file), adhering to the example YAML file in the input folder.
  • Set up the input and output directory paths and Execute Cell.
In [ ]:
#output_filename_path = 'output_files/unified_file_smog_chamber_2024-04-07_UTC-OFST_+0200_NG.h5'
yaml_config_file_path = 'input_files/data_integr_config_file_LI.yaml'
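For orientation, a configuration file along these lines might look as follows. The key names below are illustrative assumptions, not the project's actual schema; consult the example YAML file in the input folder for the authoritative structure.

```yaml
# Hypothetical layout -- the real schema is defined by the example
# config file shipped in the input folder.
experiment: smog_chamber
input_file_directory: 'input_files/'
output_file_directory: 'output_files/'
instrument_datafolder:      # data sources to integrate (illustrative)
  - gas_analyzer
  - particle_counter
```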

Step 2: Create an integrated HDF5 file of the experimental campaign

  • Execute Cell. Here we run the function integrate_data_sources with the previously specified YAML config file as its input argument.
In [ ]:
hdf5_file_path = dilib.integrate_data_sources(yaml_config_file_path)

Step 3: Display the integrated HDF5 file using a treemap

  • Execute Cell. A visual representation of the integrated file, in HTML format, should be displayed and stored in the output directory.
In [ ]:
# integrate_data_sources may return a single path or a list of paths
if isinstance(hdf5_file_path, list):
    for path_item in hdf5_file_path:
        hdf5_vis.display_group_hierarchy_on_a_treemap(path_item)
else:
    hdf5_vis.display_group_hierarchy_on_a_treemap(hdf5_file_path)
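The treemap is built from the group hierarchy of the HDF5 file. As a rough illustration of what that involves, the sketch below walks a file with plain h5py and collects the path of every group and dataset; the function name `collect_group_paths` is hypothetical, and the actual traversal lives inside `src/hdf5_vis`.

```python
import h5py

def collect_group_paths(hdf5_file_path):
    """Return (path, kind) tuples for every group and dataset in the file.

    Illustrative helper only -- the notebook's real visualization is done
    by hdf5_vis.display_group_hierarchy_on_a_treemap.
    """
    paths = []

    def visitor(name, obj):
        # visititems passes each member's path and the h5py object
        kind = 'group' if isinstance(obj, h5py.Group) else 'dataset'
        paths.append((name, kind))

    with h5py.File(hdf5_file_path, 'r') as f:
        f.visititems(visitor)  # depth-first walk over all members
    return paths
```

Each `(path, kind)` pair corresponds to one node of the treemap, with the `/`-separated path encoding the parent-child nesting.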