mirror of
https://gitea.psi.ch/APOG/acsmnode.git
synced 2025-06-24 13:11:08 +02:00
Update to README.md, includes main features, authors, and funding
139
README.md
@ -1,6 +1,73 @@
# QC/QA Data Flagging Application
# ACSM FAIRifier

This repository hosts a Dash Plotly data flagging app for ACSM data structured in HDF5 format using the DIMA submodule. The provided Jupyter notebooks walk you through the steps to append metadata about diagnostic and target channels, which are necessary for the app to run properly.
**ACSM FAIRifier** is a containerized JupyterLab-based toolkit for preparing Aerosol Chemical Speciation Monitor (ACSM) datasets for EBAS submission and domain-agnostic reuse. It enables users to transform raw or processed ACSM data into:

- **EBAS-compliant outputs**, with appropriate metadata and file structure
- **Self-describing HDF5 files**, containing final and intermediate data products for transparent, reusable, and reproducible science

---

### Key Features

- Notebook-driven pipelines with automatic **provenance tracking**
- Notebook-driven visualizations of data products
- **Dash Plotly app** for interactively annotating data during quality control
- Direct integration with an HDF5-based data structure
- HDF5 output that includes **intermediate data products** in addition to final outputs

---

### Output Formats

- **NAS EBAS-compliant files**, structured and metadata-rich for archive submission
- **Self-describing HDF5 files**, including:
  - Project-level, contextual, and data lineage metadata
  - Intermediate and final processed datasets
- **YAML workflow file**, automatically generated in [Renku format](https://renku.readthedocs.io/en/latest/user/reference/yaml.html),
  recording the **prospective provenance** of the data processing chain (i.e., planned steps, parameters, and dependencies)

---

### Extensibility

While designed for ACSM datasets, the FAIRifier framework is modular and adaptable to new instruments and processing pipelines. Email the authors for details.

---

### Visual Overview of Domain-Agnostic Data Products

<p align="center">
<img src="docs/poster/figures/hdf5_before_after.svg" alt="HDF5 structure before and after">
</p>

<p align="center">
<img src="docs/poster/figures/workflow_acsm_data_JFJ_2024.svg" alt="Workflow visualization">
</p>

---

## Repository Structure
<details>
<summary> <b> Click here to see the structure </b> </summary>

- `app/`: Dash Plotly app for interactive data flagging
- `data/`: ACSM datasets in HDF5 format (input/output)
- `dima/`: Submodule supporting the HDF5 metadata structure
- `notebooks/`: Jupyter notebooks for stepwise FAIRification and submission preparation
- `pipelines/`: Data chain scripts powering the transformation workflow
- `docs/`: Additional documentation resources
- `figures/`: Generated plots and visualizations
- `third_party/`: External code dependencies
- `workflows/`: Workflow automation (e.g., CI/CD pipelines)
- Configuration files:
  - `Dockerfile.acsmchain` for container builds
  - `docker-compose.yaml` for orchestrating multi-container setups
  - `env_setup.sh` to bootstrap the local environment
- Project metadata files: `README.md`, `LICENSE`, `CITATION.cff`, `TODO.md`, and `campaigndescriptor.yaml`.

</details>

## Getting Started
@ -8,11 +75,13 @@ This repository hosts a Dash Plotly data flagging app for ACSM data structured i

For Windows users, the following are required:

1. **Git Bash**: Git Bash will be used to run shell scripts (`.sh` files).
1. **Docker Desktop**: Required to run the toolkit using containers. [Download and install Docker Desktop](https://www.docker.com/products/docker-desktop/).

2. **Conda**: You must have [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) installed on your system. Git Bash needs access to Conda to set up the environment properly. Ensure that Conda is added to your system’s PATH during installation.
2. **Git Bash**: Used to run shell scripts (`.sh` files).

3. **PSI Network Access (for data retrieval)**: Real data retrieval can only be performed when connected to the PSI network and with the appropriate access rights to the source network drive.
3. **Conda (Optional)**: Required only if you plan to run the toolkit **outside of Docker**. You can install [Anaconda](https://www.anaconda.com/products/distribution) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html).

4. **PSI Network Access** *(for data retrieval)*: Needed only if accessing live data from a PSI network-mounted drive.
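
A quick way to confirm the prerequisites listed above from Git Bash is sketched below. The function name `check_tools` is hypothetical and not part of the repository. It only tests whether each command is found on `PATH`; it does not verify that Docker Desktop is actually running.

```bash
# Report whether each named command is available on PATH.
# Prints one "found: <tool>" or "missing: <tool>" line per tool and
# returns a non-zero status if at least one tool is missing.
# Hypothetical helper, not part of the repository.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For the container-based setup, `check_tools docker git` would be enough; `check_tools conda` is only relevant when running the toolkit outside of Docker.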

## Clone the Repository
@ -24,7 +93,9 @@ git clone --recurse-submodules https://gitea.psi.ch/apog/acsmnode.git
cd acsmnode
```

## Run the Data Chain App
## Run the ACSM FAIRifier Toolkit

This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.

1. Open **PowerShell as Administrator** and navigate to the `acsmnode` repository.
2. Create a `.env` file in the root of `acsmnode/`.
@ -40,9 +111,18 @@ cd acsmnode
```bash
docker build -f Dockerfile.acsmchain -t datachain_processor .
```
5. Run the app:
```bash
docker compose --file docker-compose.yaml up datachain_processor
```
5. Start the toolkit:

- **Locally without network drive mount:**

```bash
docker compose up datachain_processor
```

- **With network drive mount:**

```bash
docker compose up datachain_processor_networked
```

6. Access:
- **JupyterLab**: [http://localhost:8889/lab/tree/notebooks/](http://localhost:8889/lab/tree/notebooks/)
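
After starting the container, JupyterLab may take a few seconds to begin answering on port 8889. The following sketch polls the URL given above until it responds. The helper name `wait_for_url` and the default of 30 attempts are assumptions made for illustration, and `curl` must be available in the shell.

```bash
# Poll a URL once per second until it answers or the attempts run out.
# Hypothetical helper, not part of the repository.
# Usage: wait_for_url <url> [attempts]
wait_for_url() {
  local url="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (status >= 400) as failures
    # -sS: no progress meter, but still report errors
    # -o /dev/null: discard the response body
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not reachable: $url"
  return 1
}
```

For example, `wait_for_url http://localhost:8889/lab/tree/notebooks/` returns as soon as the server answers, or gives up after about 30 seconds.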
@ -56,7 +136,10 @@ cd acsmnode
docker rm $(docker ps -aq --filter ancestor=datachain_processor)
```
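
The `docker rm` command above exits with a usage error when no container matches the filter, because `docker rm` then receives no arguments. A small guard avoids that case. This is a sketch only; the helper name `remove_image_containers` is hypothetical and not part of the repository.

```bash
# Remove the containers created from a given image, if there are any.
# Running containers must be stopped first, otherwise `docker rm` refuses them.
# Hypothetical helper, not part of the repository.
remove_image_containers() {
  local image="$1" ids
  # -a: include stopped containers, -q: print container IDs only.
  ids=$(docker ps -aq --filter "ancestor=$image")
  if [ -z "$ids" ]; then
    echo "no containers found for image: $image"
    return 0
  fi
  # Left unquoted on purpose so that each ID becomes a separate argument.
  docker rm $ids
}
```

Calling `remove_image_containers datachain_processor` then behaves like the original command when containers exist and prints a short message otherwise.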

## Set Up the Python Environment

## (Optional) Set Up the Python Environment

> Required only if you plan to run the toolkit outside of Docker.

If **Git Bash** lacks a suitable Python interpreter, run:
@ -65,7 +148,7 @@ If **Git Bash** lacks a suitable Python interpreter, run:
```

## Run the Dashboard App
## (Optional) Run the Dashboard App

Run the following command to start the dashboard app:

```bash
@ -74,11 +157,41 @@ Run the following command to start the dashboard app:

This command will launch the data flagging app.

## Stop the Dashboard App
## (Optional) Stop the Dashboard App

Press the following key combination in the terminal where the dashboard app is running:

```bash
CTRL + C
```
This terminates the server process running the app.

## Authors

This toolkit was developed by:

- Juan F. Flórez-Ospina
- Leïla H. Simon
- Nora K. Nowak
- Benjamin T. Brem
- Martin Gysel-Beer
- Robin L. Modini

All authors are affiliated with the **PSI Center for Energy and Environmental Sciences**, 5232 Villigen PSI, Switzerland.

- For general correspondence: [robin.modini@psi.ch](mailto:robin.modini@psi.ch)
- For implementation-specific questions: [juan.florez-ospina@psi.ch](mailto:juan.florez-ospina@psi.ch), [juanflo16@gmail.com](mailto:juanflo16@gmail.com)

---

## Funding

This work was funded by the **ETH-Domain Open Research Data (ORD) Program – Measure 1**.

It is part of the project
**“Building FAIR Data Chains for Atmospheric Observations in the ACTRIS Switzerland Network”**,
which is described in more detail at the [ORD Program project portal](https://open-research-data-portal.ch/projects/building-fairdata-chains-for-atmospheric-observations-in-the-actris-switzerland-network/).

---