acsm-fairifier/workflow_data_integration.ipynb

Data integration workflow of experimental campaign

In this notebook, we go through our data integration workflow. This involves the following steps:

  1. Specify the data integration task through a YAML configuration file.
  2. Create an integrated HDF5 file of the experimental campaign from the configuration file.
  3. Display the created HDF5 file using a treemap.

Import libraries and modules

  • Execute (or Run) the Cell below.
In [1]:
import sys
import os
# Set up project root directory
root_dir = os.path.abspath(os.curdir)
sys.path.append(root_dir)
sys.path.append(os.path.join(root_dir,'dima'))
#sys.path.append(os.path.join(root_dir,'dima','instruments'))
#sys.path.append(os.path.join(root_dir,'dima','src'))
#sys.path.append(os.path.join(root_dir,'dima','utils'))

import dima.src.hdf5_vis as hdf5_vis
import dima.pipelines.data_integration as dilib
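The setup above appends paths unconditionally, so re-running the cell grows `sys.path` with duplicate entries. A minimal, hedged variant using `pathlib` that skips entries already present (the directory layout is taken from the cell above; this is a sketch, not part of dima):

```python
import sys
from pathlib import Path

root_dir = Path.cwd()  # assumes the notebook is run from the repository root

# Append each project directory only if it is not already on sys.path,
# so re-executing the cell stays idempotent.
for candidate in (root_dir, root_dir / 'dima'):
    entry = str(candidate)
    if entry not in sys.path:
        sys.path.append(entry)
```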

Step 1: Specify data integration task through YAML configuration file

  • Create your configuration file (i.e., a *.yaml file) following the example YAML file in the input folder.
  • Set up the input and output directory paths, then Execute the Cell.
In [2]:
yaml_config_file_path = 'dima_config.yaml'

Step 2: Create an integrated HDF5 file of the experimental campaign

  • Execute Cell. Here we run the function integrate_data_sources with the previously specified YAML config file as its input argument.
In [ ]:
hdf5_file_path = dilib.integrate_data_sources(yaml_config_file_path)
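As the display cell below suggests, `integrate_data_sources` may return either a single file path or a list of paths. A small hedged helper (illustrative, not part of dima) that normalizes the result so downstream code can always iterate:

```python
def as_path_list(result):
    """Wrap the pipeline output in a list if it is a single path.

    integrate_data_sources may hand back one HDF5 path or several;
    normalizing lets later cells loop uniformly over the files.
    """
    return result if isinstance(result, list) else [result]
```

With such a helper, the display step below would reduce to a single `for` loop over `as_path_list(hdf5_file_path)`.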

Step 3: Display the integrated HDF5 file using a treemap

  • Execute Cell. A visual representation of the integrated file, in HTML format, should be displayed and stored in the output directory.
In [ ]:
if isinstance(hdf5_file_path, list):
    for path_item in hdf5_file_path:
        hdf5_vis.display_group_hierarchy_on_a_treemap(path_item)
else:
    hdf5_vis.display_group_hierarchy_on_a_treemap(hdf5_file_path)
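For intuition, a treemap over an HDF5 hierarchy needs each group's label paired with its parent group. A dependency-free sketch of that flattening (the nested dict and helper are illustrative, not dima's implementation):

```python
# Illustrative group hierarchy; a real file would be walked with h5py/dima.
hierarchy = {'ACSM_TOFWARE': {'timeseries': {}, 'meta': {}}}

def flatten(tree, parent=''):
    """Flatten a nested group mapping into (labels, parents) lists,
    the shape treemap libraries such as plotly expect."""
    labels, parents = [], []
    for name, children in tree.items():
        labels.append(name)
        parents.append(parent)
        sub_labels, sub_parents = flatten(children, name)
        labels += sub_labels
        parents += sub_parents
    return labels, parents
```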

Append channel metadata and reformat datetime columns

  • Execute Cell. Here we annotate the ACSM_TOFWARE group with target and diagnostic channel descriptors, then normalize the datetime columns of both data tables.
In [8]:
import dima.pipelines.metadata_revision as metadata
import dima.src.hdf5_ops as h5de

# Channels of interest: measured species (target) and instrument diagnostics
channels1 = ['Chl_11000', 'NH4_11000', 'SO4_11000', 'NO3_11000', 'Org_11000']
channels2 = ['FilamentEmission_mA', 'VaporizerTemp_C', 'FlowRate_mb', 'ABsamp']

target_channels = {'location': 'ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_timeseries.txt/data_table',
                   'names': ','.join(['t_start_Buf'] + channels1)}
diagnostic_channels = {'location': 'ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_meta.txt/data_table',
                       'names': ','.join(['t_base'] + channels2)}

DataOpsAPI = h5de.HDF5DataOpsManager(hdf5_file_path[0])

DataOpsAPI.load_file_obj()
# Record which channels are targets vs. diagnostics as group-level metadata
DataOpsAPI.append_metadata('/ACSM_TOFWARE/', {'target_channels': target_channels, 'diagnostic_channels': diagnostic_channels})

# Normalize the datetime columns from their source string formats
DataOpsAPI.reformat_datetime_column('ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_timeseries.txt/data_table', 't_start_Buf', src_format='%d.%m.%Y %H:%M:%S.%f')
DataOpsAPI.reformat_datetime_column('ACSM_TOFWARE/ACSM_JFJ_2024_JantoFeb_meta.txt/data_table', 't_base', src_format='%d.%m.%Y %H:%M:%S')

DataOpsAPI.unload_file_obj()