IDEAR Project Name

This is a containerized, JupyterLab-based data toolkit developed as part of the IDEAR project. It supports efficient, reproducible, and metadata-enriched data processing workflows for instrument-generated datasets.


Key Features

  • Modular pipeline with reusable notebook workflows
  • Metadata-driven HDF5 outputs for long-term data reuse
  • Optional network-mounted input for seamless integration with shared drives

Output Format

  • Self-describing HDF5 files, including:
    • Project-level, contextual, and data lineage metadata
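Because the HDF5 outputs are self-describing, their metadata can be inspected with standard tools. The sketch below uses h5py to list root-level attributes; the file path and attribute keys are hypothetical examples, since the actual keys depend on your campaign descriptor and the dima/ schema.

```python
# Sketch: inspecting metadata in a self-describing HDF5 output file.
# File name and attribute keys are hypothetical; real keys come from
# the campaign descriptor and the dima/ schema utilities.
import h5py

def print_metadata(path):
    """Print root-level attributes (e.g. project and lineage metadata)."""
    with h5py.File(path, "r") as f:
        for key, value in f.attrs.items():
            print(f"{key}: {value}")

# Example (hypothetical path):
# print_metadata("data/output/campaign_2024.h5")
```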

Extensibility

New instruments can be supported by extending the file parsing capabilities in the dima/ module.
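The dima/ module defines its own parser interface, so the following is only an illustrative sketch of the general pattern: register a reader function for a new instrument's file format, then dispatch on the file extension. All names here are hypothetical.

```python
# Illustrative sketch only: the real dima/ module defines its own parser
# interface. This shows the general pattern of registering a reader for
# a new instrument file format; all names here are hypothetical.
import csv

INSTRUMENT_READERS = {}

def register_reader(extension):
    """Associate a file extension with a reader function."""
    def decorator(func):
        INSTRUMENT_READERS[extension] = func
        return func
    return decorator

@register_reader(".txt")
def read_plain_table(path):
    """Parse a tab-separated instrument dump into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def read_instrument_file(path):
    """Dispatch to the reader registered for this file's extension."""
    for ext, reader in INSTRUMENT_READERS.items():
        if path.endswith(ext):
            return reader(path)
    raise ValueError(f"No reader registered for {path}")
```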

Repository Structure

  • data/ — Input and output datasets (mounted volume)
  • figures/ — Output visualizations (mounted volume)
  • notebooks/ — Jupyter notebooks for processing and metadata integration
  • scripts/ — Supplementary processing logic
  • dima/ — Metadata and HDF5 schema utilities (persisted module)
  • Dockerfile — Container image definition
  • docker-compose.yaml — Local and networked deployment options
  • env_setup.sh — Optional local environment bootstrap
  • CITATION.cff, LICENCE, README.md, .gitignore, .dockerignore — Project metadata and config
  • campaignDescriptor.yaml — Campaign-specific metadata
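The schema of campaignDescriptor.yaml is project-specific; a hypothetical descriptor might look like the following (all keys shown are illustrative, not the actual schema):

```yaml
# Hypothetical example only; the actual campaignDescriptor.yaml schema
# is defined by your project and the dima/ module.
campaign_name: example_campaign_2024
principal_investigator: Jane Doe
instruments:
  - name: example_instrument
    data_format: txt
```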

Getting Started

Requirements

For Docker-based usage:

  • Docker Desktop
  • Git Bash (for running shell scripts on Windows)

Optional for local (non-Docker) usage:

  • Conda (miniconda or anaconda)

If accessing network drives (e.g., PSI):

  • PSI credentials with access to mounted network shares

Clone the Repository

git clone --recurse-submodules <your-repo-url>
cd <your-repo-name>

Run with Docker

This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.

  1. Open PowerShell as Administrator and navigate to the your-repo-name repository folder.
  2. Create a .env file in the root of your-repo-name/.
  3. Securely store your network drive access credentials in the .env file by adding the following lines:
    CIFS_USER=<your-username>
    CIFS_PASS=<your-password>
    JUPYTER_TOKEN=my-token
    NETWORK_MOUNT=//your-server/your-share
    
    To protect your credentials:
    • Do not share the .env file with others.
    • Ensure the file is excluded from version control by adding .env to your .gitignore and .dockerignore files.
  4. Open Docker Desktop, then build the container image:
    docker build -f Dockerfile -t idear_processor .
    
  5. Start the environment:
  • Locally, without a network drive mount: regardless of the value in .env, NETWORK_MOUNT defaults to <your-repo-name>/data/.

    docker compose up idear_processor
    
    
  • With network drive mount:

    docker compose up idear_processor_networked
    
    
  6. Access: open JupyterLab in your browser via the URL printed in the container logs, supplying the JUPYTER_TOKEN value from your .env file when prompted.

  7. Stop the app: in the previously opened PowerShell terminal, press:

    Ctrl + C
    

    After the container has stopped, remove it with:

    docker rm $(docker ps -aq --filter ancestor=idear_processor)
    

(Optional) Set Up the Python Environment

Required only if you plan to run the toolkit outside of Docker.

If Git Bash lacks a suitable Python interpreter, run:

   bash env_setup.sh

Citation

License

Description
IDEAR (Improve Data Exchange And Reuse) is a project template and lightweight framework based on DIMA and Docker, designed to support structured, reproducible workflows for scientific data analysis and curation.