ACSM FAIRifier

ACSM FAIRifier is a containerized JupyterLab-based toolkit for preparing Aerosol Chemical Speciation Monitor (ACSM) datasets for EBAS submission and domain-agnostic reuse. It enables users to transform raw or processed ACSM data into:

  • EBAS-compliant outputs, with appropriate metadata and file structure
  • Self-describing HDF5 files, containing final and intermediate data products for transparent, reusable, and reproducible science

Key Features

  • Notebook-driven pipelines with automatic provenance tracking
  • Notebook-driven visualizations of data products
  • Plotly Dash app for interactive quality-control annotation of data (see the sketch after this list)
  • Direct integration with an HDF5-based data structure
  • HDF5 output includes intermediate data products in addition to final outputs
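
The flagging pattern is easiest to see in miniature. Below is a minimal sketch, assuming a toy time series and made-up flag values; it illustrates the interaction pattern only and is not the bundled app's code or interface:

  # Minimal sketch of interactive flagging with Plotly Dash.
  # All names (the toy series, the flag values) are hypothetical.
  import pandas as pd
  import plotly.express as px
  from dash import Dash, Input, Output, dcc, html

  # Toy stand-in for an ACSM time series.
  df = pd.DataFrame({
      "time": pd.date_range("2025-01-01", periods=48, freq="h"),
      "concentration": [i % 10 for i in range(48)],
  })

  app = Dash(__name__)
  app.layout = html.Div([
      dcc.RadioItems(id="flag", options=["valid", "suspect", "invalid"],
                     value="valid"),
      dcc.Graph(id="series"),
  ])

  @app.callback(Output("series", "figure"), Input("flag", "value"))
  def redraw(flag):
      # In a real app the chosen flag would be written back to the
      # dataset; here it only annotates the plot title.
      return px.line(df, x="time", y="concentration",
                     title=f"current flag: {flag}")

  if __name__ == "__main__":
      app.run(debug=True)

The real app lives under app/ (see Repository Structure below).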

Output Formats

  • NASA-Ames (.nas) EBAS-compliant files, structured and metadata-rich for archive submission
  • Self-describing HDF5 files (see the inspection sketch after this list), including:
    • Project-level, contextual, and data lineage metadata
    • Intermediate and final processed datasets
  • YAML workflow file, automatically generated in Renku format, recording the prospective provenance of the data processing chain (i.e., planned steps, parameters, and dependencies)
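
To make "self-describing" concrete, the sketch below walks such an HDF5 file with h5py and prints the metadata attached at each level. The file path is a placeholder, and the group and attribute names are whatever the toolkit actually wrote:

  # Sketch: inspect a self-describing HDF5 output with h5py.
  # The path below is a placeholder; point it at a real output file.
  import h5py

  def show(name, obj):
      # Print each group/dataset and its attached metadata attributes.
      kind = "group" if isinstance(obj, h5py.Group) else "dataset"
      print(f"{kind}: /{name}")
      for key, value in obj.attrs.items():
          print(f"  {key} = {value!r}")

  with h5py.File("data/example_acsm_output.h5", "r") as f:
      # File-level (project and contextual) metadata.
      for key, value in f.attrs.items():
          print(f"{key} = {value!r}")
      # Recurse through intermediate and final data products.
      f.visititems(show)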

Extensibility

While designed for ACSM datasets, the FAIRifier framework is modular and adaptable to new instruments and processing pipelines; a schematic sketch follows. Email the authors for details.
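
Purely as a schematic illustration of that modularity (every name below is invented, not taken from pipelines/), a new instrument could contribute its own steps to a registry-style pipeline:

  # Hypothetical sketch: each processing step is a plain function
  # registered under a name, so supporting a new instrument means
  # contributing new steps. None of these names come from the repository.
  from typing import Callable, Dict, List

  STEPS: Dict[str, Callable[[dict], dict]] = {}

  def step(name: str):
      """Register a processing step under a pipeline step name."""
      def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
          STEPS[name] = func
          return func
      return register

  @step("calibrate")
  def calibrate(data: dict) -> dict:
      data["calibrated"] = True
      return data

  @step("flag")
  def flag(data: dict) -> dict:
      data["flags"] = []
      return data

  def run_pipeline(data: dict, names: List[str]) -> dict:
      # Thread the data through the named steps in order.
      for name in names:
          data = STEPS[name](data)
      return data

  if __name__ == "__main__":
      print(run_pipeline({"instrument": "ACSM"}, ["calibrate", "flag"]))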


Visual Overview of Domain-Agnostic Data Products

  • Figure: HDF5 structure before and after
  • Figure: Workflow visualization


Repository Structure

  • app/ — Plotly Dash app for interactive data flagging
  • data/ — Contains ACSM datasets in HDF5 format (input/output)
  • dima/ — Submodule supporting HDF5 metadata structure
  • notebooks/ — Jupyter notebooks for stepwise FAIRification and submission preparation
  • pipelines/ — Data chain scripts powering the transformation workflow
  • docs/ — Additional documentation resources
  • figures/ — Generated plots and visualizations
  • third_party/ — External code dependencies
  • workflows/ — Workflow automation (e.g., CI/CD pipelines)
  • Configuration files:
    • Dockerfile.acsmchain for container builds
    • docker-compose.yaml for orchestrating multi-container setups
    • env_setup.sh to bootstrap a local environment
  • Project metadata files: README.md, LICENSE, CITATION.cff, TODO.md, and campaigndescriptor.yaml

Getting Started

Requirements

For Windows users, the following are required:

  1. Docker Desktop: Required to run the toolkit using containers. Download and install Docker Desktop.

  2. Git Bash: Used to run shell scripts (.sh files).

  3. Conda (Optional): Required only if you plan to run the toolkit outside of Docker. You can install Anaconda or Miniconda.

  4. PSI Network Access (for data retrieval): Needed only if accessing live data from a PSI network-mounted drive.

Clone the Repository

Open Git Bash, navigate to the directory where you keep your repositories, and run:

git clone --recurse-submodules https://gitea.psi.ch/apog/acsmnode.git
cd acsmnode

Run the ACSM FAIRifier Toolkit

This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.

  1. Open PowerShell as Administrator and navigate to the acsmnode repository.
  2. Create a .env file in the root of acsmnode/.
  3. Securely store your network drive access credentials in the .env file by adding the following lines:
    CIFS_USER=<your-username>
    CIFS_PASS=<your-password>   
    NETWORK_MOUNT=//your-server/your-share
    
    To protect your credentials:
    • Do not share the .env file with others.
    • Ensure the file is excluded from version control by adding .env to your .gitignore and .dockerignore files.
  4. Open Docker Desktop, then build the container image:
    docker build -f Dockerfile.acsmchain -t datachain_processor .
    
  5. Start the toolkit:
  • Locally without network drive mount:

    docker compose up datachain_processor
    
    
  • With network drive mount:

    docker compose up datachain_processor_networked
    
    
  6. Access: open the JupyterLab link printed in the container logs in your browser.

  7. Stop the app: in the still-open PowerShell terminal, press:

    Ctrl + C

    Once the container has stopped, remove it with:

    docker rm $(docker ps -aq --filter ancestor=datachain_processor)
    

(Optional) Set Up the Python Environment

Required only if you plan to run the toolkit outside of Docker.

If Git Bash lacks a suitable Python interpreter, run:

   bash env_setup.sh

(Optional) Run the Dashboard App

Start the dashboard app:

python data_flagging_app.py

Then open the local address that Dash prints in the terminal.

(Optional) Stop the Dashboard App

Press Ctrl + C in the terminal running the app to terminate the server process.

Authors

This toolkit was developed by:

  • Juan F. Flórez-Ospina
  • Leïla H. Simon
  • Nora K. Nowak
  • Benjamin T. Brem
  • Martin Gysel-Beer
  • Robin L. Modini

All authors are affiliated with the PSI Center for Energy and Environmental Sciences, 5232 Villigen PSI, Switzerland.


Funding

This work was funded by the ETH-Domain Open Research Data (ORD) Program Measure 1.

It is part of the project “Building FAIR Data Chains for Atmospheric Observations in the ACTRIS Switzerland Network”, described in more detail at the ORD Program project portal.

