# ACSM Data Chain Workflow

This notebook walks through the **ACSM Data Chain**, which involves the following steps:

1. Run the data integration pipeline to retrieve ACSM input data and prepare it for processing.
2. Calibrate the campaign data and save the resulting data products.
3. Perform QC/QA analysis, with optional visual inspection to validate the flags.
4. Apply diagnostic and manual flags to the variables of interest.
5. Prepare the input data and QC/QA results for submission to the EBAS database.
6. Save the data products to an HDF5 file.

## Import Libraries and Data Chain Steps

* Run the cell below.

In [None]:
import os
import subprocess  # Only needed for the commented-out command-line calls in later cells
import sys

# Set up the project root directory.
notebook_dir = os.getcwd()  # Current working directory (assumes the notebook runs from notebooks/)
project_path = os.path.normpath(os.path.join(notebook_dir, ".."))  # Move up to the project root
dima_path = os.path.normpath(os.path.join(project_path, "dima"))  # Path to the dima directory inside the project

# Make the project and dima modules importable, avoiding duplicate entries.
if project_path not in sys.path:
    sys.path.append(project_path)
if dima_path not in sys.path:
    sys.path.insert(0, dima_path)

# Print the module search path to confirm the setup.
for item in sys.path:
    print(item)

from dima.pipelines.data_integration import run_pipeline as get_campaign_data
from pipelines.steps.apply_calibration_factors import main as apply_calibration_factors
from pipelines.steps.generate_flags import main as generate_flags
from pipelines.steps.prepare_ebas_submission import main as prepare_ebas_submission

## Step 1: Retrieve Input Data from a Network Drive

* Create a configuration file (i.e., a `.yaml` file) following the example provided in the input folder.
* Set up the input and output directory paths.
* Execute the cell.
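
The cell below derives `APPEND_DATA_DIR` by stripping the extension from the selected HDF5 file path. The following is a minimal sketch of that one step. The helper name `derive_append_dir` and the example file name are hypothetical and are not part of the pipeline.

```python
import os


def derive_append_dir(hdf5_path: str) -> str:
    """Return the path without its final extension (hypothetical helper)."""
    root, _extension = os.path.splitext(hdf5_path)
    return root


# Illustrative path only; the real file name comes from the pipeline output.
example_file = os.path.join("..", "data", "collection_example.h5")
example_dir = derive_append_dir(example_file)
print(example_dir)
```

Note that `os.path.splitext` removes only the last extension, so a name with several dots keeps everything before the final one.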

In [None]:
path_to_config_file = '../campaignDescriptor.yaml'
paths_to_hdf5_files = get_campaign_data(path_to_config_file)

# Select campaign data file and append directory
CAMPAIGN_DATA_FILE = paths_to_hdf5_files[0]
APPEND_DATA_DIR = os.path.splitext(CAMPAIGN_DATA_FILE)[0]

## Step 2: Calibrate Input Campaign Data and Save Data Products

* Set the paths to the campaign data file and the calibration factors file.
* Execute the cell.

In [None]:
path_to_data_file = CAMPAIGN_DATA_FILE
path_to_calibration_file = '../pipelines/params/calibration_factors.yaml'
# dataset_name is only used by the commented-out command-line call below.
dataset_name = 'ACSM_TOFWARE/2024/ACSM_JFJ_2024_timeseries.txt/data_table'
#command = ['python', 'pipelines/steps/apply_calibration_factors.py', path_to_data_file, dataset_name, path_to_calibration_file]
#status = subprocess.run(command, capture_output=True, check=True)
#print(status.stdout.decode())

apply_calibration_factors(path_to_data_file, path_to_calibration_file)


## Step 3: Perform QC/QA Analysis

* Generate automated flags based on validity thresholds for diagnostic channels.
* (Optional) Generate manual flags using the **Data Flagging App**, accessible at:  
  [http://localhost:8050/](http://localhost:8050/)
* Execute the cell.
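
To illustrate what threshold-based flagging means, here is a minimal sketch in plain Python. It is not the implementation behind `generate_flags`. The function name `flag_out_of_range`, the channel name, and the limits are all assumptions chosen for illustration.

```python
from typing import List, Optional


def flag_out_of_range(
    values: List[Optional[float]], lower: float, upper: float
) -> List[bool]:
    """Return True for entries that are missing or outside [lower, upper]."""
    return [v is None or not (lower <= v <= upper) for v in values]


# Hypothetical diagnostic channel and validity limits, for illustration only.
flow_rate = [1.40, 1.38, 0.20, None, 1.41]
flags = flag_out_of_range(flow_rate, lower=1.2, upper=1.6)
print(flags)  # [False, False, True, True, False]
```

Because any comparison with NaN is false, NaN readings are also flagged by this sketch.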

In [None]:
# path_to_data_file was set in Step 2.
# dataset_name and path_to_config_file are only used by the commented-out command-line call below.
dataset_name = 'ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt/data_table'
path_to_config_file = 'pipelines/params/validity_thresholds.yaml'
#command = ['python', 'pipelines/steps/compute_automated_flags.py', path_to_data_file, dataset_name, path_to_config_file]
#status = subprocess.run(command, capture_output=True, check=True)
#print(status.stdout.decode())
generate_flags(path_to_data_file, 'diagnostics')


## (Optional) Step 3.1: Inspect Previously Generated Flags for Correctness

* Perform flag validation using the Jupyter Notebook workflow available at:  
  [demo_visualize_diagnostic_flags_from_hdf5_file.ipynb](demo_visualize_diagnostic_flags_from_hdf5_file.ipynb)
* Follow the notebook steps to visually inspect previously generated flags.

## Step 4: Apply Diagnostic and Manual Flags to Variables of Interest

* Generate flags for species based on previously collected QC/QA flags.
* Execute the cell.
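
One simple way to think about deriving a species flag from previously collected flags is to mark a time step invalid whenever any contributing flag is raised. The sketch below shows that rule under stated assumptions: `combine_flags` and the column names are hypothetical, and the actual logic in `generate_flags` may differ.

```python
from typing import Dict, List


def combine_flags(flag_columns: Dict[str, List[bool]]) -> List[bool]:
    """Mark a time step invalid if any contributing flag is raised."""
    columns = list(flag_columns.values())
    if not columns:
        return []
    n_steps = len(columns[0])
    if any(len(column) != n_steps for column in columns):
        raise ValueError("all flag columns must have the same length")
    # zip(*columns) yields one tuple of flags per time step.
    return [any(step) for step in zip(*columns)]


# Hypothetical diagnostic and manual flag columns, for illustration only.
collected_flags = {
    "flow_rate": [False, True, False, False],
    "manual": [False, False, False, True],
}
species_flag = combine_flags(collected_flags)
print(species_flag)  # [False, True, False, True]
```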

In [None]:
path_to_data_file = CAMPAIGN_DATA_FILE
# dataset_name and path_to_config_file are only used by the commented-out command-line call below.
dataset_name = 'ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt/data_table'
path_to_config_file = 'pipelines/params/validity_thresholds.yaml'
#command = ['python', 'pipelines/steps/compute_automated_flags.py', path_to_data_file, dataset_name, path_to_config_file]
#status = subprocess.run(command, capture_output=True, check=True)
#print(status.stdout.decode())
generate_flags(path_to_data_file, 'species')

## Step 5: Generate Campaign Data in EBAS Format

* Gather and set paths to the required data products produced in the previous steps.
* Execute the cell.
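
Since this step depends on files written by earlier steps, it can help to confirm that they exist before preparing the submission. The following is a minimal sketch of such a check. The helper `find_missing_files` is hypothetical, and the demonstration uses throwaway files in a temporary directory rather than the real data products.

```python
import os
import tempfile
from typing import Iterable, List


def find_missing_files(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    present = os.path.join(tmp_dir, "timeseries_calibrated.csv")
    absent = os.path.join(tmp_dir, "timeseries_flags.csv")
    open(present, "w").close()  # Create an empty placeholder file
    missing = find_missing_files([present, absent])

print(missing)
```

In the notebook, the same helper could be called with the four paths defined in the cell below, stopping early if the returned list is not empty.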

In [None]:
# Calibrated time series, its uncertainties, and the calibration factors (outputs of Step 2).
PATH1 = "../data/collection_JFJ_2024_2025-03-14_2025-03-14/ACSM_TOFWARE_processed/2024/ACSM_JFJ_2024_timeseries_calibrated.csv"
PATH2 = "../data/collection_JFJ_2024_2025-03-14_2025-03-14/ACSM_TOFWARE_processed/2024/ACSM_JFJ_2024_timeseries_calibrated_err.csv"
PATH3 = "../data/collection_JFJ_2024_2025-03-14_2025-03-14/ACSM_TOFWARE_processed/2024/ACSM_JFJ_2024_timeseries_calibration_factors.csv"
# Flags file for the time series.
PATH4 = "../data/collection_JFJ_2024_2025-03-14_2025-03-14/ACSM_TOFWARE_flags/2024/ACSM_JFJ_2024_timeseries_flags.csv"
month = 4
prepare_ebas_submission([PATH1, PATH2, PATH3], PATH4, month)

## Step 6: Save Data Products to an HDF5 File

* Gather and set paths to the required data products produced in the previous steps.
* Execute the cell.


In [None]:
import dima.src.hdf5_ops as dataOps

# Open the campaign HDF5 file and update it with the contents of APPEND_DATA_DIR.
dataManager = dataOps.HDF5DataOpsManager(CAMPAIGN_DATA_FILE)
print(dataManager.file_path)
print(APPEND_DATA_DIR)
dataManager.update_file(APPEND_DATA_DIR)


In [None]:
# Reopen the file and print the first rows of the dataset metadata table to check the update.
dataManager = dataOps.HDF5DataOpsManager(path_to_data_file)
dataManager.load_file_obj()
dataManager.extract_and_load_dataset_metadata()
df = dataManager.dataset_metadata_df
print(df.head(10))
dataManager.unload_file_obj()