IDEAR Project Name

This is a containerized, JupyterLab-based data toolkit developed as part of the IDEAR project. It supports efficient, reproducible, and metadata-enriched data processing workflows for instrument-generated datasets.


Key Features

  • Modular pipeline with reusable notebook workflows
  • Metadata-driven HDF5 outputs for long-term data reuse
  • Optional network-mounted input for seamless integration with shared drives

Output Format

  • Self-describing HDF5 files, including:
    • Project-level, contextual, and data lineage metadata
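Because the HDF5 outputs are self-describing, their metadata can be inspected with standard tools. The sketch below uses h5py to list root-level attributes; the file path and attribute keys are hypothetical examples, since the actual keys depend on your campaign descriptor and the dima/ schema.

```python
# Sketch: inspecting metadata in a self-describing HDF5 output file.
# File name and attribute keys are hypothetical; real keys come from
# the campaign descriptor and the dima/ schema utilities.
import h5py

def print_metadata(path):
    """Print root-level attributes (e.g. project and lineage metadata)."""
    with h5py.File(path, "r") as f:
        for key, value in f.attrs.items():
            print(f"{key}: {value}")

# Example (hypothetical path):
# print_metadata("data/output/campaign_2024.h5")
```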

Extensibility

New instruments can be supported by extending the file parsing capabilities in the dima/ module.
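The dima/ module defines its own parser interface, so the following is only an illustrative sketch of the general pattern: register a reader function for a new instrument's file format, then dispatch on the file extension. All names here are hypothetical.

```python
# Illustrative sketch only: the real dima/ module defines its own parser
# interface. This shows the general pattern of registering a reader for
# a new instrument file format; all names here are hypothetical.
import csv

INSTRUMENT_READERS = {}

def register_reader(extension):
    """Associate a file extension with a reader function."""
    def decorator(func):
        INSTRUMENT_READERS[extension] = func
        return func
    return decorator

@register_reader(".txt")
def read_plain_table(path):
    """Parse a tab-separated instrument dump into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def read_instrument_file(path):
    """Dispatch to the reader registered for this file's extension."""
    for ext, reader in INSTRUMENT_READERS.items():
        if path.endswith(ext):
            return reader(path)
    raise ValueError(f"No reader registered for {path}")
```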

Repository Structure

  • data/ — Input and output datasets (mounted volume)
  • figures/ — Output visualizations (mounted volume)
  • notebooks/ — Jupyter notebooks for processing and metadata integration
  • scripts/ — Supplementary processing logic
  • dima/ — Metadata and HDF5 schema utilities (persisted module)
  • Dockerfile — Container image definition
  • docker-compose.yaml — Local and networked deployment options
  • env_setup.sh — Optional local environment bootstrap
  • CITATION.cff, LICENCE, README.md, .gitignore, .dockerignore — Project metadata and config
  • campaignDescriptor.yaml — Campaign-specific metadata
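The schema of campaignDescriptor.yaml is project-specific; a hypothetical descriptor might look like the following (all keys shown are illustrative, not the actual schema):

```yaml
# Hypothetical example only; the actual campaignDescriptor.yaml schema
# is defined by your project and the dima/ module.
campaign_name: example_campaign_2024
principal_investigator: Jane Doe
instruments:
  - name: example_instrument
    data_format: txt
```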

Getting Started

Requirements

For Docker-based usage:

  • Docker Desktop
  • Git Bash (for running shell scripts on Windows)

Optional for local (non-Docker) usage:

  • Conda (miniconda or anaconda)

If accessing network drives (e.g., PSI):

  • PSI credentials with access to mounted network shares

Clone the Repository

git clone --recurse-submodules <your-repo-url>
cd <your-repo-name>

Run with Docker

This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.

  1. Open PowerShell as Administrator and navigate to the your-repo-name repository folder.
  2. Create a .env file in the root of your-repo-name/.
  3. Securely store your network drive access credentials in the .env file by adding the following lines:
    CIFS_USER=<your-username>
    CIFS_PASS=<your-password>
    JUPYTER_TOKEN=my-token
    NETWORK_MOUNT=//your-server/your-share
    
    To protect your credentials:
    • Do not share the .env file with others.
    • Ensure the file is excluded from version control by adding .env to your .gitignore and .dockerignore files.
  4. Open Docker Desktop, then build the container image:
    docker build -f Dockerfile -t idear_processor .
    
  5. Start the environment:
  • Locally, without a network drive mount: regardless of the value in .env, NETWORK_MOUNT defaults to <your-repo-name>/data/.

    docker compose up idear_processor
    
    
  • With network drive mount:

    docker compose up idear_processor_networked
    
    
  6. Access: open JupyterLab in your browser via the URL printed in the container logs, supplying the JUPYTER_TOKEN value from your .env file when prompted.

  7. Stop the app: in the previously opened PowerShell terminal, press:

    Ctrl + C
    

    After the container has stopped, remove it with:

    docker rm $(docker ps -aq --filter ancestor=idear_processor)
    

(Optional) Set Up the Python Environment

Required only if you plan to run the toolkit outside of Docker.

If Git Bash lacks a suitable Python interpreter, run:

   bash env_setup.sh

Citation

License

Description
IDEAR (Improve Data Exchange And Reuse) is a project template and lightweight framework based on DIMA and Docker, designed to support structured, reproducible workflows for scientific data analysis and curation.