# IDEAR Project Name
This is a **containerized, JupyterLab-based data toolkit** developed as part of the IDEAR project. It supports efficient, reproducible, and metadata-enriched data processing workflows for instrument-generated datasets.

---

### Key Features

- Modular pipeline with reusable notebook workflows
- Metadata-driven HDF5 outputs for long-term data reuse
- Optional network-mounted input for seamless integration with shared drives

---

### Output Format

- **Self-describing HDF5 files**, including:
  - Project-level, contextual, and data lineage metadata
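
As an illustration, a self-describing file of this kind can be written with `h5py`; the attribute names below are hypothetical — the actual schema is defined by the utilities in `dima/`:

```python
# Minimal sketch of a self-describing HDF5 output (hypothetical attribute
# names; the real schema is defined by the dima/ utilities).
import h5py

with h5py.File("example_output.h5", "w") as f:
    # Project-level metadata stored as root attributes
    f.attrs["project"] = "IDEAR"
    f.attrs["institution"] = "PSI"

    # Dataset with contextual and lineage metadata attached
    dset = f.create_dataset("measurements", data=[1.2, 3.4, 5.6])
    dset.attrs["units"] = "arbitrary"
    dset.attrs["source_file"] = "instrument_raw.txt"
```

Because the metadata travels inside the file itself, the data remains interpretable long after the original processing environment is gone.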

---

### Extensibility

New instruments can be supported by extending the file-parsing capabilities in the `dima/` module.
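
One common pattern for this kind of extension is a reader registry keyed by file extension; the sketch below is purely illustrative — the actual `dima/` interface may look quite different:

```python
# Hypothetical sketch of an extensible file-parser registry; the real
# dima/ module may expose a different interface.
import csv
from pathlib import Path

READERS = {}

def register_reader(extension):
    """Decorator that registers a reader for a given file extension."""
    def wrap(fn):
        READERS[extension] = fn
        return fn
    return wrap

@register_reader(".csv")
def read_csv(path):
    # Return rows as dictionaries keyed by column header
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))

def parse(path):
    """Dispatch to the reader registered for this file's extension."""
    reader = READERS.get(Path(path).suffix)
    if reader is None:
        raise ValueError(f"No reader registered for {path}")
    return reader(path)
```

Supporting a new instrument then amounts to adding one `@register_reader(...)` function for its file format.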

## Repository Structure

<details>
<summary><b>Click to expand</b></summary>

- `data/` — Input and output datasets (mounted volume)
- `figures/` — Output visualizations (mounted volume)
- `notebooks/` — Jupyter notebooks for processing and metadata integration
- `scripts/` — Supplementary processing logic
- `dima/` — Metadata and HDF5 schema utilities (persisted module)
- `Dockerfile` — Container image definition
- `docker-compose.yaml` — Local and networked deployment options
- `env_setup.sh` — Optional local environment bootstrap
- `CITATION.cff`, `LICENCE`, `README.md`, `.gitignore`, `.dockerignore` — Project metadata and config
- `campaignDescriptor.yaml` — Campaign-specific metadata

</details>

---

## Getting Started

### Requirements

#### For Docker-based usage:

- **Docker Desktop**
- **Git Bash** (for running shell scripts on Windows)

#### Optional for local (non-Docker) usage:

- **Conda** (`miniconda` or `anaconda`)

#### If accessing network drives (e.g., PSI):

- PSI credentials with access to the mounted network shares

---

## Clone the Repository

```bash
git clone --recurse-submodules <your-repo-url>
cd <your-repo-name>
```

## Run with Docker

This toolkit includes a containerized JupyterLab environment for executing the data processing pipeline, plus an optional dashboard for manual flagging.
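
The two deployment options referenced below might be arranged along these lines in `docker-compose.yaml`; this is a hypothetical sketch — the container paths, port mapping, and CIFS options are assumptions, and the shipped file may differ:

```yaml
# Hypothetical sketch of the two deployment options; the shipped
# docker-compose.yaml may differ in detail.
services:
  idear_processor:
    image: idear_processor
    env_file: .env
    ports:
      - "8889:8888"              # JupyterLab on the host at :8889 (assumed container port)
    volumes:
      - ./data:/app/data         # local input/output
      - ./figures:/app/figures

  idear_processor_networked:
    image: idear_processor
    env_file: .env
    ports:
      - "8889:8888"
    volumes:
      - network_share:/app/data  # CIFS-mounted shared drive as input
      - ./figures:/app/figures

volumes:
  network_share:
    driver: local
    driver_opts:
      type: cifs
      o: "username=${CIFS_USER},password=${CIFS_PASS}"
      device: "${NETWORK_MOUNT}"
```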

1. Open **PowerShell as Administrator** and navigate to the `your-repo-name` repository.

2. Create a `.env` file in the root of `your-repo-name/`.

3. **Securely store your network drive access credentials** in the `.env` file by adding the following lines:

   ```plaintext
   CIFS_USER=<your-username>
   CIFS_PASS=<your-password>
   JUPYTER_TOKEN=my-token
   NETWORK_MOUNT=//your-server/your-share
   ```

   **To protect your credentials:**

   - Do not share the `.env` file with others.
   - Exclude the file from version control by adding `.env` to your `.gitignore` and `.dockerignore` files.

4. Open **Docker Desktop**, then build the container image:

   ```bash
   docker build -f Dockerfile -t idear_processor .
   ```

5. Start the environment:

   - **Locally, without the network drive mount** (regardless of the value in `.env`, `NETWORK_MOUNT` defaults to `<your-repo-name>/data/`):

     ```bash
     docker compose up idear_processor
     ```

   - **With the network drive mount:**

     ```bash
     docker compose up idear_processor_networked
     ```

6. Access:

   - **JupyterLab**: [http://localhost:8889/lab/](http://localhost:8889/lab/)

7. Stop the app:

   In the previously opened PowerShell terminal, press `Ctrl + C`.

   After the container has stopped, remove the container process:

   ```bash
   docker rm $(docker ps -aq --filter ancestor=idear_processor)
   ```

## (Optional) Set Up the Python Environment

> Required only if you plan to run the toolkit outside of Docker.

If **Git Bash** lacks a suitable Python interpreter, run:

```bash
bash env_setup.sh
```

## Citation

## License