# IDEAR Project Name

This is a **containerized, JupyterLab-based data toolkit** developed as part of the IDEAR project. It supports efficient, reproducible, and metadata-enriched data processing workflows for instrument-generated datasets.

---

### Key Features

- Modular pipeline with reusable notebook workflows
- Metadata-driven HDF5 outputs for long-term data reuse
- Optional network-mounted input for seamless integration with shared drives

---

### Output Format

- **Self-describing HDF5 files**, including:
  - Project-level, contextual, and data lineage metadata

---

### Extensibility

New instruments can be supported by extending the file-parsing capabilities in the `dima/` module.

## Repository Structure
- `data/` — Input and output datasets (mounted volume)
- `figures/` — Output visualizations (mounted volume)
- `notebooks/` — Jupyter notebooks for processing and metadata integration
- `scripts/` — Supplementary processing logic
- `dima/` — Metadata and HDF5 schema utilities (persisted module)
- `Dockerfile` — Container image definition
- `docker-compose.yaml` — Local and networked deployment options
- `env_setup.sh` — Optional local environment bootstrap
- `CITATION.cff`, `LICENCE`, `README.md`, `.gitignore`, `.dockerignore` — Project metadata and config
- `campaignDescriptor.yaml` — Campaign-specific metadata
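As a minimal sketch of what a self-describing HDF5 output might look like (assuming `h5py` is available; the file name, attribute names, and values below are illustrative only, not the toolkit's actual schema):

```python
import h5py

# Write a small HDF5 file whose attributes carry project-level and
# lineage metadata alongside the data itself (names are hypothetical).
with h5py.File("example_output.h5", "w") as f:
    f.attrs["project"] = "IDEAR"
    f.attrs["campaign"] = "example-campaign"
    f.attrs["source_file"] = "instrument_raw_001.dat"
    dset = f.create_dataset("measurements", data=[1.0, 2.0, 3.0])
    dset.attrs["units"] = "arbitrary"

# Any HDF5-aware reader can recover the metadata without extra context.
with h5py.File("example_output.h5", "r") as f:
    print(dict(f.attrs))
```

Because the metadata travels inside the file, downstream consumers do not need the original processing environment to interpret the data.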
---

## Getting Started

### Requirements

#### For Docker-based usage:

- **Docker Desktop**
- **Git Bash** (for running shell scripts on Windows)

#### Optional for local (non-Docker) usage:

- **Conda** (`miniconda` or `anaconda`)

#### If accessing network drives (e.g., PSI):

- PSI credentials with access to mounted network shares

---

## Clone the Repository

```bash
git clone --recurse-submodules <repository-url>
cd your-repo-name
```

## Run with Docker

This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.

1. Open **PowerShell as Administrator** and navigate to the `your-repo-name` repository.

2. Create a `.env` file in the root of `your-repo-name/`.

3. **Securely store your network drive access credentials** in the `.env` file by adding the following lines:

   ```plaintext
   CIFS_USER=
   CIFS_PASS=
   JUPYTER_TOKEN=my-token
   NETWORK_MOUNT=//your-server/your-share
   ```

   **To protect your credentials:**

   - Do not share the `.env` file with others.
   - Ensure the file is excluded from version control by adding `.env` to your `.gitignore` and `.dockerignore` files.

4. Open **Docker Desktop**, then build the container image:

   ```bash
   docker build -f Dockerfile -t idear_processor .
   ```

5. Start the environment:

   - **Locally, without the network drive mount** (regardless of the value in `.env`, `NETWORK_MOUNT` defaults to `/data/`):

     ```bash
     docker compose up idear_processor
     ```

   - **With the network drive mount:**

     ```bash
     docker compose up idear_processor_networked
     ```

6. Access:

   - **JupyterLab**: [http://localhost:8889/lab/](http://localhost:8889/lab/)
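The local-mode fallback for `NETWORK_MOUNT` described in step 5 can be sketched as follows (a minimal illustration of resolving an environment variable with a default, not the toolkit's actual resolution code):

```python
import os

# If NETWORK_MOUNT is unset (local mode), fall back to the default /data/.
# The variable name comes from the .env file above; the logic is illustrative.
mount = os.environ.get("NETWORK_MOUNT", "/data/")
print(mount)
```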
**Stop the app:** In the previously opened PowerShell terminal, press `Ctrl + C`.

After the container has stopped properly, remove the stopped container:

```bash
docker rm $(docker ps -aq --filter ancestor=idear_processor)
```

## (Optional) Set Up the Python Environment

> Required only if you plan to run the toolkit outside of Docker.

If **Git Bash** lacks a suitable Python interpreter, run:

```bash
bash env_setup.sh
```

## Citation

## License