IDEAR Project Name
This is a containerized, JupyterLab-based data toolkit developed as part of the IDEAR project. It supports efficient, reproducible, and metadata-enriched data processing workflows for instrument-generated datasets.
Key Features
- Modular pipeline with reusable notebook workflows
- Metadata-driven HDF5 outputs for long-term data reuse
- Optional network-mounted input for seamless integration with shared drives
Output Format
- Self-describing HDF5 files, including:
  - Project-level, contextual, and data lineage metadata
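As an illustration of what "self-describing" means in practice, the sketch below reads the metadata attributes of an output file with h5py. The file name, attribute keys, and layout are assumptions for the example, not the toolkit's actual schema.

```python
# Illustrative sketch only: the file name and attribute keys are placeholders,
# not the toolkit's actual metadata schema.
import h5py

with h5py.File("data/example_output.h5", "r") as f:
    # Root-level attributes typically carry project/campaign context.
    for key, value in f.attrs.items():
        print(f"{key}: {value}")

    # Report metadata attached to nested groups and datasets as well.
    def print_attrs(name, obj):
        for key, value in obj.attrs.items():
            print(f"{name}/@{key}: {value}")

    f.visititems(print_attrs)
```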
Extensibility
New instruments can be supported by extending the file parsing capabilities in the dima/ module.
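A rough sketch of what such an extension could look like is shown below. The function name, expected file format, and column names are illustrative assumptions; the actual parser interface expected by dima/ is defined in that module and may differ.

```python
# Hypothetical example: the function name, file layout, and column names are
# assumptions; the real interface expected by dima/ lives in that module.
import pandas as pd

def read_my_instrument(path: str) -> pd.DataFrame:
    """Parse a raw file from a new instrument into a tabular structure
    that the downstream HDF5/metadata utilities can consume."""
    df = pd.read_csv(path, sep="\t", comment="#", parse_dates=["timestamp"])
    # Attach minimal provenance so outputs remain self-describing.
    df.attrs["instrument"] = "my_instrument"
    df.attrs["source_file"] = path
    return df
```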
Repository Structure
- data/: Input and output datasets (mounted volume)
- figures/: Output visualizations (mounted volume)
- notebooks/: Jupyter notebooks for processing and metadata integration
- scripts/: Supplementary processing logic
- dima/: Metadata and HDF5 schema utilities (persisted module)
- Dockerfile: Container image definition
- docker-compose.yaml: Local and networked deployment options
- env_setup.sh: Optional local environment bootstrap
- CITATION.cff, LICENCE, README.md, .gitignore, .dockerignore: Project metadata and config
- campaignDescriptor.yaml: Campaign-specific metadata
Getting Started
Requirements
For Docker-based usage:
- Docker Desktop
- Git Bash (for running shell scripts on Windows)
Optional for local (non-Docker) usage:
- Conda (Miniconda or Anaconda)
If accessing network drives (e.g., PSI):
- PSI credentials with access to mounted network shares
Clone the Repository
git clone --recurse-submodules <your-repo-url>
cd <your-repo-name>
Run with Docker
This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.
- Open PowerShell as Administrator and navigate to the your-repo-name repository.
- Create a .env file in the root of your-repo-name/.
- Securely store your network drive access credentials in the .env file by adding the following lines:
  CIFS_USER=<your-username>
  CIFS_PASS=<your-password>
  JUPYTER_TOKEN=my-token
  NETWORK_MOUNT=//your-server/your-share
  To protect your credentials:
  - Do not share the .env file with others.
  - Ensure the file is excluded from version control by adding .env to your .gitignore and .dockerignore files.
- Open Docker Desktop, then build the container image:
docker build -f Dockerfile -t idear_processor .
- Start the environment:
  - Locally, without the network drive mount (regardless of the value set in .env, NETWORK_MOUNT defaults to <your-repo-name>/data/):
    docker compose up idear_processor
  - With the network drive mount:
    docker compose up idear_processor_networked
- Access:
  - JupyterLab: http://localhost:8889/lab/
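  If JUPYTER_TOKEN is set in your .env file, Jupyter will ask for it on first access; you can also pass it in the URL, for example http://localhost:8889/lab?token=my-token (using the example token from the .env snippet above).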
- Stop the app: In the previously opened PowerShell terminal, press:
  Ctrl + C
  After the container has stopped properly, remove it with:
  docker rm $(docker ps -aq --filter ancestor=idear_processor)
(Optional) Set Up the Python Environment
Required only if you plan to run the toolkit outside of Docker
If Git Bash lacks a suitable Python interpreter, run:
bash env_setup.sh