Save changes.

2025-02-15 18:20:58 +01:00
parent 25f3ee12a4
commit 0911260f26
5 changed files with 228 additions and 228 deletions


@@ -1,20 +1,20 @@
cff-version: 1.2.0
title: >-
  QC/QA flagging app for acsm experimental campaigns
message: >-
  If you use our code and datasets, please cite our
  repository and related paper.
type: software
authors:
  - given-names: Juan Felipe
    family-names: Florez-Ospina
    email: juanflo16@gmail.com
    orcid: 'https://orcid.org/0000-0001-5971-9042'
  - given-names: Robbin Lewis
    family-names: Modini
    email: robin.modini@psi.ch
    orcid: 'https://orcid.org/0000-0002-2982-1369'
date-released: '2024-11-26'
url: "https://gitlab.psi.ch/apog/acsmnode.git"
doi:
license:

README.md

@@ -1,56 +1,56 @@
# QC/QA Data Flagging Application

This repository hosts a Dash Plotly data flagging app for ACSM data structured in HDF5 format using the DIMA submodule. The provided Jupyter notebooks walk you through the steps to append metadata about diagnostic and target channels, which are necessary for the app to run properly.

## Getting Started

### Requirements

For Windows users, the following are required:

1. **Git Bash**: Git Bash will be used to run shell scripts (`.sh` files).
2. **Conda**: You must have [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) installed on your system. Git Bash needs access to Conda to set up the environment properly. Ensure that Conda is added to your system's PATH during installation.
3. **PSI Network Access (for data retrieval)**: Real data retrieval can only be performed when connected to the PSI network and with the appropriate access rights to the source network drive.

## Clone the Repository

Open a **Git Bash** terminal.

1. Navigate to your GitLab folder, clone the repository, and change into the `acsmnode` folder:

   ```bash
   cd GitLab
   git clone --recurse-submodules https://gitlab.psi.ch/apog/acsmnode.git
   cd acsmnode
   ```

### Set Up the Python Environment

Skip this step if the **Git Bash** terminal already has access to a suitable Python interpreter. Otherwise, set up an appropriate Python interpreter by running:

```bash
bash env_setup.sh
```

## Run the Dashboard App

Run the following command to start the dashboard app:

```bash
python data_flagging_app.py
```

This command will launch the data flagging app.

## Stop the Dashboard App

Press the following key combination in the terminal running the app:

```bash
CTRL + C
```

This terminates the server process running the app.

TODO.md

@@ -1,28 +1,28 @@
# TODO

* Implement flagging-app-specific data operations, such as:
  1. [New item] When the verify-flags checklist item is active, enable the delete-flag button to delete the flag associated with the active cell in the table.
  2. [New item] When the verify and ready-to-transfer checklist items are active, enable the record-flags button to record verified flags into the HDF5 file.
  3. [New item] When all checklist items are active, enable the apply button to apply the flags to the time series data and save it to the HDF5 file.
  1. ~~Define data manager obj with apply flags behavior.~~
  2. Define metadata answering who did the flagging and quality assurance tests.
  3. Update instruments/dictionaries/ACSM_TOFWARE_flags.yaml and instruments/readers/flag_reader.py to describe metadata elements based on the dictionary.
  4. ~~Update DIMA data integration pipeline to allow a user-defined file naming template~~
  5. ~~Design and implement flag visualization feature: click flag on table and display on figure shaded region when feature is enabled~~
  6. Implement a schema validator on the YAML/JSON representation of the HDF5 metadata.
  7. Implement updates to 'actris level' and 'processing_script' after an operation is applied to the data/file?
* ~~When `Create Flag` is clicked, modify the title to indicate that we are in flagging mode and ROIs can be drawn by dragging.~~
* ~~Update `Commit Flag` logic:~~
  ~~3. Update recorded flags directory, and add provenance information to each flag (which instrument and channel it belongs to).~~
* Record collected flag information initially in a YAML or JSON file. Is this faster than writing directly to the HDF5 file?
* Should we actively transfer collected flags by clicking a button? After the commit button is pressed, each flag is now stored in an independent JSON file (see the sketch after this list).
* Enable some form of chunk storage and visualization from the HDF5 file. Iterate over chunks for faster display versus access time.
  1. Do I need to modify DIMA?
  2. What is a good chunk size?
  3. What Dash component can we use to iterate over the chunks?

![Screenshot](figures/flagging_app_screenshot.JPG)
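
As a point of reference for the JSON-per-flag note above, here is a minimal sketch of how a committed flag could be written to its own file. The field names (`parent_instrument`, `parent_channel`, `startdate`, `enddate`, `flag_code`) and the output folder are hypothetical; they only illustrate the provenance information mentioned in the list, not the app's actual schema.

```python
import json
import os


def save_flag_as_json(flags_dir, instrument, channel, start, end, flag_code):
    """Store a single committed flag as an independent JSON file (hypothetical schema)."""
    os.makedirs(flags_dir, exist_ok=True)
    flag = {
        "parent_instrument": instrument,  # provenance: which instrument the flag belongs to
        "parent_channel": channel,        # provenance: which channel the flag belongs to
        "startdate": start,
        "enddate": end,
        "flag_code": flag_code,
    }
    # One file per flag, named after its channel and start time
    fname = f"flag_{instrument}_{channel}_{start.replace(':', '')}.json"
    path = os.path.join(flags_dir, fname)
    with open(path, "w") as f:
        json.dump(flag, f, indent=2)
    return path


# Example with hypothetical values:
# save_flag_as_json("flags", "ACSM_TOFWARE", "VaporizerTemp_Amp",
#                   "2024-01-01T00:00:00", "2024-01-01T01:00:00", 1)
```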


@@ -1,48 +1,48 @@
#!/bin/bash

# Define the name of the environment
ENV_NAME="multiphase_chemistry_env"

# Check if mamba is available and use it instead of conda for faster installation
if command -v mamba &> /dev/null; then
    CONDA_COMMAND="mamba"
else
    CONDA_COMMAND="conda"
fi

# Create the conda environment with all dependencies, resolving from conda-forge and defaults
$CONDA_COMMAND create -y -n "$ENV_NAME" -c conda-forge -c defaults python=3.11 \
    jupyter numpy h5py pandas matplotlib plotly=5.24 scipy pip

# Check if the environment was successfully created
if [ $? -ne 0 ]; then
    echo "Failed to create the environment '$ENV_NAME'. Please check the logs above for details."
    exit 1
fi

# Activate the new environment
if source activate "$ENV_NAME" 2>/dev/null || conda activate "$ENV_NAME" 2>/dev/null; then
    echo "Environment '$ENV_NAME' activated successfully."
else
    echo "Failed to activate the environment '$ENV_NAME'. Please check your conda setup."
    exit 1
fi

# Install additional pip packages only if the environment is activated
echo "Installing additional pip packages..."
pip install pybis==1.35 igor2 ipykernel sphinx
pip install dash dash-bootstrap-components

# Check if pip installations were successful
if [ $? -ne 0 ]; then
    echo "Failed to install pip packages. Please check the logs above for details."
    exit 1
fi

# Optional: Export the environment to a YAML file (commented out)
# $CONDA_COMMAND env export -n "$ENV_NAME" > "$ENV_NAME-environment.yaml"

# Print success message
echo "Environment '$ENV_NAME' created and configured successfully."
# echo "Environment configuration saved to '$ENV_NAME-environment.yaml'."


@@ -1,78 +1,78 @@
import dima.src.hdf5_ops as dataOps
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


def visualize_table_variables(data_file_path, dataset_name, flags_dataset_name, x_var, y_vars):

    if not os.path.exists(data_file_path):
        raise ValueError(f"Path to input file {data_file_path} does not exist. The parameter 'data_file_path' must be a valid path to a suitable HDF5 file.")

    # Create data manager object
    dataManager = dataOps.HDF5DataOpsManager(data_file_path)
    dataManager.load_file_obj()

    # Specify diagnostic variables and the associated flags
    # dataset_name = 'ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt/data_table'
    # flags_dataset_name = 'ACSM_TOFWARE_flags/2024/ACSM_JFJ_2024_meta_flags.csv/data_table'
    dataset_df = dataManager.extract_dataset_as_dataframe(dataset_name)
    flags_df = dataManager.extract_dataset_as_dataframe(flags_dataset_name)

    if x_var not in dataset_df.columns or x_var not in flags_df.columns:
        raise ValueError(f'Invalid x_var: {x_var}. x_var must refer to a time variable name that is present in both {dataset_name} and {flags_dataset_name}.')

    # Decode the byte-encoded timestamps of the flags dataset into datetimes
    flags_df[x_var] = pd.to_datetime(flags_df[x_var].apply(lambda x: x.decode(encoding="utf-8")))

    dataManager.unload_file_obj()

    if not all(var in dataset_df.columns for var in y_vars):
        raise ValueError(f'Invalid y_vars: {y_vars}. y_vars must be a subset of {dataset_df.columns}.')

    # fig, ax = plt.subplots(len(y_vars), 1, figsize=(12, 5))
    for var_idx, var in enumerate(y_vars):
        # y = dataset_df[var].to_numpy()

        # Plot the variable against the time axis
        fig = plt.figure(var_idx, figsize=(12, 2.5))
        ax = plt.gca()
        # ax = fig.get_axes()
        ax.plot(dataset_df[x_var], dataset_df[var], label=var, alpha=0.8, color='tab:blue')

        # Specify the flag name associated with var. By construction, it is assumed the name satisfies the following suffix convention.
        var_flag_name = f"flag_{var}"
        if var_flag_name in flags_df.columns:
            # Identify valid and invalid indices
            ind_valid = flags_df[var_flag_name].to_numpy()
            ind_invalid = np.logical_not(ind_valid)

            # Detect start and end indices of invalid regions
            # Find transition points in invalid regions
            invalid_starts = np.diff(np.concatenate(([False], ind_invalid, [False]))).nonzero()[0][::2]
            invalid_ends = np.diff(np.concatenate(([False], ind_invalid, [False]))).nonzero()[0][1::2]

            # Fill invalid regions
            t_base = dataset_df[x_var].to_numpy()
            for start, end in zip(invalid_starts, invalid_ends):
                ax.fill_betweenx([dataset_df[var].min(), dataset_df[var].max()], t_base[start], t_base[end],
                                 color='red', alpha=0.3, label="Invalid Data" if start == invalid_starts[0] else "")

        # Labels and legends
        ax.set_xlabel(x_var)
        ax.set_ylabel(var)
        ax.legend()
        ax.grid(True)

    # plt.tight_layout()
    # plt.show()

    return fig, ax
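
For orientation, a minimal usage sketch of the function above follows. The dataset and flag-dataset names are taken from the commented-out examples inside the function; the HDF5 file path and the variable names (`t_start_Buf`, `VaporizerTemp_Amp`) are hypothetical placeholders and must be replaced with names that actually exist in your file.

```python
# Minimal usage sketch (hypothetical file path and variable names; the dataset
# names come from the commented-out examples in the function above).
fig, ax = visualize_table_variables(
    data_file_path='data/collection_JFJ_2024.h5',   # hypothetical HDF5 file path
    dataset_name='ACSM_TOFWARE/2024/ACSM_JFJ_2024_meta.txt/data_table',
    flags_dataset_name='ACSM_TOFWARE_flags/2024/ACSM_JFJ_2024_meta_flags.csv/data_table',
    x_var='t_start_Buf',            # hypothetical time variable present in both datasets
    y_vars=['VaporizerTemp_Amp'],   # hypothetical diagnostic variable
)
fig.savefig('diagnostic_with_flags.png', dpi=150)
```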