Update readme with getting started section

2024-12-04 16:24:14 +01:00
parent 3e37854445
commit 49dff5b87b


@ -1,11 +1,6 @@
# DIMA: Data Integration and Metadata Annotation
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including **beamtimes**, **kinetic flowtube studies**, **smog chamber experiments**, and **field campaigns**.
## Key Features
@ -21,44 +16,63 @@ DIMA provides reusable operations for data integration, manipulation, and extrac
4. **Jupyter notebooks**
Demonstrate DIMA's core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.
## Adaptability to Experimental Campaign Needs
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`
as suggested by [CF metadata conventions](http://cfconventions.org/).
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
## Repository Structure
## Requirements
For **Windows** users, the following are required:
1. **Git Bash**: Install [Git Bash](https://git-scm.com/downloads) to run shell scripts (`.sh` files).
2. **Conda**: Install [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
3. **PSI Network Access**: Ensure access to PSI's network and access rights to source drives for retrieving campaign data from YAML files in the `input_files/` folder.
:bulb: **Tip**: Editing your system's PATH variable ensures both Conda and Git are available in the terminal environment used by Git Bash.
## Getting Started
### Download DIMA
Open a **Git Bash** terminal.
Navigate to your `GitLab` folder, clone the repository, and navigate to the `dima` folder as follows:
```bash
cd path/to/GitLab
git clone --recurse-submodules https://gitlab.psi.ch/5505/dima.git
cd dima
```
### Install Python Interpreter
Open a **Git Bash** terminal.
**Option 1**: Install a suitable conda environment `pyenv5505` inside the repository `dima` as follows:
```bash
cd path/to/GitLab/dima
bash setup_env.sh
```
Open **Anaconda Prompt** or a terminal with access to conda.
**Option 2**: Install the conda environment from the YAML file as follows:
```bash
cd path/to/GitLab/dima
conda env create --file environment.yml
```
## Software architecture
<p align="center">
<img src="docs/software_arquitecture_diagram.svg" alt="Alt Text">
</p>
## Installation
Follow these steps to install and set up the project:
1. Download our GitLab repository in your GitLab folder, or alternatively open a Git Bash terminal and run the following commands:
```
cd Path/to/GitLab
git clone https://gitlab.psi.ch/5505/data-integration-and-metadata-annotation.git
```
2. Open an Anaconda Prompt (Anaconda3) as administrator, and set the current directory to the path of the project's folder.
3. Create the project's environment `multiphase_chemistry_env` by running the following command:
```
conda env create -f environment.yml
```
### Working with Jupyter Notebook on the `multiphase_chemistry_env`
1. Open an Anaconda Prompt as a regular user, ensure that `multiphase_chemistry_env` is in the list of available environments, and activate it by running the following commands:
``` ```
@ -99,7 +113,16 @@ and select the `multiphase_chemistry_env` environment from the kernel options.
| processing_filename | - | Denotes the name of the file used to process an initial version (e.g., original version) of the dataset into a processed dataset. |
| processing_date | - | The date when the data processing was completed. |
## Adaptability to Experimental Campaign Needs
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`
as suggested by [CF metadata conventions](http://cfconventions.org/).
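The annotation step described above can be sketched in Python as follows. This is a minimal illustration rather than DIMA's actual implementation: the dictionary `INSTRUMENT_TERMS` stands in for the contents of an instrument-specific YAML file, and the variable names, the entries, and the helper `annotate_variables` are all assumptions made for this example.

```python
# Hypothetical dictionary of terms for one instrument, standing in for the
# parsed contents of a YAML file. Variable names and entries are illustrative.
INSTRUMENT_TERMS = {
    "temperature": {
        "standard_name": "air_temperature",
        "units": "K",
        "description": "Air temperature measured at the inlet.",
    },
    "rh": {
        "standard_name": "relative_humidity",
        "units": "1",
        "description": "Relative humidity of the sample flow.",
    },
}

# The three annotation fields named in the text above.
ANNOTATION_KEYS = ("standard_name", "units", "description")


def annotate_variables(variable_names, terms):
    """Map each variable name to its annotation fields.

    Variables that are missing from the dictionary of terms get an empty
    annotation, so they can be spotted and added to the vocabulary later.
    """
    annotations = {}
    for name in variable_names:
        entry = terms.get(name, {})
        annotations[name] = {key: entry[key] for key in ANNOTATION_KEYS if key in entry}
    return annotations


annotations = annotate_variables(["temperature", "rh", "flow_rate"], INSTRUMENT_TERMS)
for name, fields in annotations.items():
    print(name, fields)
```

Keeping the vocabulary in a separate dictionary means that supporting a new observed variable only requires a new entry in the terms file, not a change to the annotation code.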
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
### Specifying a compound attribute in YAML
Consider the compound attribute *relative_humidity*, which has subattributes *value*, *units*, *range*, and *definition*. The YAML description of
@ -126,7 +149,7 @@ relative_humidity:
``` ```
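Once parsed, a compound attribute of this kind is naturally represented in Python as a dictionary with one key per subattribute. The sketch below assumes that representation and checks that all four subattributes named above are present. The values shown and the helper `missing_subattributes` are hypothetical and are not taken from DIMA.

```python
# Subattributes named in the text above.
REQUIRED_SUBATTRIBUTES = ("value", "units", "range", "definition")

# Hypothetical parsed form of the compound attribute; all values are
# placeholders chosen for illustration only.
relative_humidity = {
    "value": 65,
    "units": "percent",
    "range": "[0, 100]",
    "definition": "Placeholder definition of relative humidity.",
}


def missing_subattributes(attribute, required=REQUIRED_SUBATTRIBUTES):
    """Return the required subattributes that the compound attribute lacks."""
    return [key for key in required if key not in attribute]


missing = missing_subattributes(relative_humidity)
print("Missing subattributes:", missing)
```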
# Extend DIMA's file reading capabilities for new instruments
We now explain how to extend DIMA's file-reading capabilities by adding support for a new instrument. The process involves adding instrument-specific files and registering the new instrument's file reader.
@ -158,7 +181,6 @@ file_extensions.append('.json')
file_readers.update({'ACSM_TOFWARE_flags_json' : lambda x: read_jsonflag_as_dict(x)})
``` ```
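To show how a registration like the one above might be used, here is a self-contained sketch of a reader registry. Only the names `file_extensions`, `file_readers`, `read_jsonflag_as_dict`, and the key `ACSM_TOFWARE_flags_json` come from the snippet above. The body of `read_jsonflag_as_dict`, the initial registry contents, and the lookup helper `read_with_registered_reader` are assumptions made for illustration and may differ from DIMA's actual code.

```python
import json
import os
import tempfile

# Registry objects named as in the snippet above; their initial contents
# are assumptions made for this sketch.
file_extensions = ['.txt']
file_readers = {}


def read_jsonflag_as_dict(path):
    """Stand-in reader (assumption): parse a JSON file into a dict."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# Registration step, as in the snippet above.
file_extensions.append('.json')
file_readers.update({'ACSM_TOFWARE_flags_json': lambda x: read_jsonflag_as_dict(x)})


def read_with_registered_reader(key, path):
    """Hypothetical helper: look up a reader by key and apply it to a file."""
    extension = os.path.splitext(path)[1]
    if extension not in file_extensions:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    if key not in file_readers:
        raise KeyError(f"No file reader registered under {key!r}")
    return file_readers[key](path)


# Usage: write a small JSON file and read it back through the registry.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, 'example_flags.json')
    with open(example_path, 'w', encoding='utf-8') as f:
        json.dump({'flag_code': 1}, f)
    result = read_with_registered_reader('ACSM_TOFWARE_flags_json', example_path)

print(result)
```

Because readers are looked up by key, adding support for another instrument only requires one more entry in the dictionary and leaves the calling code unchanged.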
## -------------------
## Getting started
To make it easy for you to get started with GitLab, here's a list of recommended next steps.