Save changes.

2025-02-15 18:20:58 +01:00
parent 25f3ee12a4
commit 0911260f26
5 changed files with 228 additions and 228 deletions


@@ -1,20 +1,20 @@
cff-version: 1.2.0
title: >-
QC/QA flagging app for ACSM experimental campaigns
message: >-
If you use our code and datasets, please cite our
repository and related paper.
type: software
authors:
- given-names: Juan Felipe
family-names: Florez-Ospina
email: juanflo16@gmail.com
orcid: 'https://orcid.org/0000-0001-5971-9042'
- given-names: Robin Lewis
family-names: Modini
email: robin.modini@psi.ch
orcid: 'https://orcid.org/0000-0002-2982-1369'
date-released: 2024-11-26
url: "https://gitlab.psi.ch/apog/acsmnode.git"
doi:
license:
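If a quick validity check of this citation metadata is wanted, the third-party `cffconvert` tool can be used. This is an optional sketch; it assumes the file is saved under the conventional name `CITATION.cff` in the repository root and that the tool has been installed (e.g. `pip install cffconvert`):

```bash
cffconvert --validate
```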

README.md

@@ -1,56 +1,56 @@
# QC/QA Data Flagging Application
This repository hosts a Dash Plotly data flagging app for ACSM data structured in HDF5 format using the DIMA submodule. The provided Jupyter notebooks walk you through the steps to append metadata about diagnostic and target channels, which are necessary for the app to run properly.
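For orientation, the HDF5 files handled by the app organize instrument data under dataset paths such as `ACSM_TOFWARE/<year>/<source file>/data_table` (see the commented examples in the plotting utility further below). A minimal sketch for inspecting such a file with `h5py`, using a hypothetical file name:

```python
import h5py

# 'collection.h5' is a placeholder; substitute the HDF5 file produced by the DIMA pipeline
with h5py.File('collection.h5', 'r') as f:
    # Print every group and dataset path in the file hierarchy
    f.visit(print)
```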
## Getting Started
### Requirements
For Windows users, the following are required:
1. **Git Bash**: Git Bash will be used to run shell scripts (`.sh` files).
2. **Conda**: You must have [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) installed on your system. Git Bash needs access to Conda to set up the environment properly. Ensure that Conda is added to your system's PATH during installation.
3. **PSI Network Access (for data retrieval)**: Real data retrieval can only be performed when connected to the PSI network and with the appropriate access rights to the source network drive.
## Clone the Repository
Open a **Git Bash** terminal.
1. Navigate to your GitLab folder, clone the repository, and navigate to the `acsmnode` folder:
```bash
cd GitLab
git clone --recurse-submodules https://gitlab.psi.ch/apog/acsmnode.git
cd acsmnode
```
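If the repository was cloned without `--recurse-submodules`, the DIMA submodule can still be fetched afterwards:

```bash
git submodule update --init --recursive
```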
### Set Up the Python Environment
Skip this step if the **Git Bash** terminal already has access to a suitable Python interpreter.
Otherwise, run the following command to create and configure the `multiphase_chemistry_env` Conda environment:
```bash
bash env_setup.sh
```
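The script creates a Conda environment named `multiphase_chemistry_env`; if your terminal does not activate it automatically, activate it before launching the app:

```bash
conda activate multiphase_chemistry_env
```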
## Run the Dashboard App
Run the following command to start the dashboard app:
```bash
python data_flagging_app.py
```
This command will launch the data flagging app. Unless the script configures a different host or port, Dash apps are served at `http://127.0.0.1:8050` by default.
## Stop the Dashboard App
To stop the dashboard app, press the following key combination in the terminal where it is running:
```bash
CTRL + C
```
This command will terminate the server process running the app.

TODO.md

@@ -1,28 +1,28 @@
# TODO
* Implement flagging-app-specific data operations such as:
1. [New item] When the 'verify flags' checklist item is active, enable the delete-flag button to delete the flag associated with the active cell in the table.
2. [New item] When the 'verify' and 'ready to transfer' checklist items are active, enable the record-flags button to record verified flags into the HDF5 file.
3. [New item] When all checklist items are active, enable the apply button to apply flags to the time series data and save it to the HDF5 file.
1. ~~Define data manager obj with apply flags behavior.~~
2. Define metadata recording who performed the flagging and the quality assurance tests.
3. Update instruments/dictionaries/ACSM_TOFWARE_flags.yaml and instruments/readers/flag_reader.py to describe metadata elements based on the dictionary.
4. ~~Update DIMA data integration pipeline to allow a user-defined file naming template~~
5. ~~Design and implement flag visualization feature: click flag on table and display on figure shaded region when feature is enabled~~
6. Implement schema validator on yaml/json representation of hdf5 metadata
7. Implement updates to 'actris level' and 'processing_script' after an operation is applied to the data/file.
* ~~When `Create Flag` is clicked, modify the title to indicate that we are in flagging mode and ROIs can be drawn by dragging.~~
* ~~Update `Commit Flag` logic:~~
~~3. Update recorded flags directory, and add provenance information to each flag (which instrument and channel it belongs to).~~
* Record collected flag information initially in a YAML or JSON file. Is this faster than writing directly to the HDF5 file?
* Should we actively transfer collected flags by clicking a button? After the commit button is pressed, each flag is now stored in an independent JSON file (see the sketch below).
* Enable some form of chunked storage and visualization from the HDF5 file, iterating over chunks to balance display speed against access time.
1. Do I need to modify DIMA?
2. What is a good chunk size?
3. What Dash component can we use to iterate over the chunks?
![Screenshot](figures/flagging_app_screenshot.JPG)
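A minimal sketch of what one such independent flag file could contain and how it might be written; the field names and output directory are illustrative assumptions, not the app's actual schema:

```python
import json
from pathlib import Path

# Hypothetical flag record; field names are illustrative, not the app's actual schema
flag_record = {
    "instrument": "ACSM_TOFWARE",
    "channel": "VaporizerTemp_C",      # assumed diagnostic channel name
    "startdate": "2024-07-01T10:15:00",
    "enddate": "2024-07-01T11:40:00",
    "flag_code": 456,
    "created_by": "jdoe",
}

flags_dir = Path("flags")              # assumed output directory
flags_dir.mkdir(exist_ok=True)
with open(flags_dir / "flag_0001.json", "w", encoding="utf-8") as f:
    json.dump(flag_record, f, indent=2)
```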


@@ -1,48 +1,48 @@
#!/bin/bash

# Define the name of the environment
ENV_NAME="multiphase_chemistry_env"

# Check if mamba is available and use it instead of conda for faster installation
if command -v mamba &> /dev/null; then
    CONDA_COMMAND="mamba"
else
    CONDA_COMMAND="conda"
fi

# Create the conda environment with all dependencies, resolving from conda-forge and defaults
$CONDA_COMMAND create -y -n "$ENV_NAME" -c conda-forge -c defaults python=3.11 \
    jupyter numpy h5py pandas matplotlib plotly=5.24 scipy pip

# Check if the environment was successfully created
if [ $? -ne 0 ]; then
    echo "Failed to create the environment '$ENV_NAME'. Please check the logs above for details."
    exit 1
fi

# Activate the new environment
if source activate "$ENV_NAME" 2>/dev/null || conda activate "$ENV_NAME" 2>/dev/null; then
    echo "Environment '$ENV_NAME' activated successfully."
else
    echo "Failed to activate the environment '$ENV_NAME'. Please check your conda setup."
    exit 1
fi

# Install additional pip packages only if the environment is activated
echo "Installing additional pip packages..."
pip install pybis==1.35 igor2 ipykernel sphinx && \
    pip install dash dash-bootstrap-components

# Check if the pip installations were successful (the && chain above makes $? reflect both installs)
if [ $? -ne 0 ]; then
    echo "Failed to install pip packages. Please check the logs above for details."
    exit 1
fi

# Optional: Export the environment to a YAML file (commented out)
# $CONDA_COMMAND env export -n "$ENV_NAME" > "$ENV_NAME-environment.yaml"

# Print success message
echo "Environment '$ENV_NAME' created and configured successfully."
# echo "Environment configuration saved to '$ENV_NAME-environment.yaml'."


@@ -1,78 +1,78 @@
import dima.src.hdf5_ops as dataOps
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


def visualize_table_variables(data_file_path, dataset_name, flags_dataset_name, x_var, y_vars):

    if not os.path.exists(data_file_path):
        raise ValueError(f"Path to input file {data_file_path} does not exist. The parameter 'data_file_path' must be a valid path to a suitable HDF5 file.")

    # Create data manager object
    dataManager = dataOps.HDF5DataOpsManager(data_file_path)
    dataManager.load_file_obj()

    # Specify diagnostic variables and the associated flags
    #dataset_name = 'ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt/data_table'
    #flags_dataset_name = 'ACSM_TOFWARE_flags/2024/ACSM_JFJ_2024_meta_flags.csv/data_table'
    dataset_df = dataManager.extract_dataset_as_dataframe(dataset_name)
    flags_df = dataManager.extract_dataset_as_dataframe(flags_dataset_name)

    if x_var not in dataset_df.columns or x_var not in flags_df.columns:
        raise ValueError(f'Invalid x_var: {x_var}. x_var must refer to a time variable name that is present in both {dataset_name} and {flags_dataset_name}.')

    flags_df[x_var] = pd.to_datetime(flags_df[x_var].apply(lambda x: x.decode(encoding="utf-8")))

    dataManager.unload_file_obj()

    if not all(var in dataset_df.columns for var in y_vars):
        raise ValueError(f'Invalid y_vars: {y_vars}. y_vars must be a subset of {dataset_df.columns}.')

    #fig, ax = plt.subplots(len(y_vars), 1, figsize=(12, 5))
    for var_idx, var in enumerate(y_vars):
        #y = dataset_df[var].to_numpy()

        # Plot the diagnostic variable against the time variable
        fig = plt.figure(var_idx, figsize=(12, 2.5))
        ax = plt.gca()
        #ax = fig.get_axes()
        ax.plot(dataset_df[x_var], dataset_df[var], label=var, alpha=0.8, color='tab:blue')

        # Specify the flag name associated with var. By construction, flag columns are assumed
        # to follow the prefix convention flag_<var>.
        var_flag_name = f"flag_{var}"
        if var_flag_name in flags_df.columns:
            # Identify valid and invalid indices
            ind_valid = flags_df[var_flag_name].to_numpy()
            ind_invalid = np.logical_not(ind_valid)

            # Detect start and end indices of invalid regions by locating the transition
            # points of the boolean invalid mask
            invalid_starts = np.diff(np.concatenate(([False], ind_invalid, [False]))).nonzero()[0][::2]
            invalid_ends = np.diff(np.concatenate(([False], ind_invalid, [False]))).nonzero()[0][1::2]

            # Shade invalid regions
            t_base = dataset_df[x_var].to_numpy()
            for start, end in zip(invalid_starts, invalid_ends):
                ax.fill_betweenx([dataset_df[var].min(), dataset_df[var].max()], t_base[start], t_base[end],
                                 color='red', alpha=0.3, label="Invalid Data" if start == invalid_starts[0] else "")

        # Labels and legends
        ax.set_xlabel(x_var)
        ax.set_ylabel(var)
        ax.legend()
        ax.grid(True)

    #plt.tight_layout()
    #plt.show()
    return fig, ax
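A minimal usage sketch, following the commented-out dataset paths above; the file path, time column, and diagnostic variable names are assumptions for illustration:

```python
# Hypothetical call; adjust paths and column names to the actual HDF5 file
fig, ax = visualize_table_variables(
    data_file_path='data/collection_JFJ_2024.h5',  # assumed HDF5 file location
    dataset_name='ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt/data_table',
    flags_dataset_name='ACSM_TOFWARE_flags/2024/ACSM_JFJ_2024_meta_flags.csv/data_table',
    x_var='t_start_Buf',            # assumed name of the shared time column
    y_vars=['VaporizerTemp_C'],     # assumed diagnostic variable; its flag column would be 'flag_VaporizerTemp_C'
)
fig.savefig('figures/VaporizerTemp_C_flagged.png', dpi=150)
```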