Update readme with getting started section

2024-12-04 16:24:14 +01:00
parent 3e37854445
commit 49dff5b87b

@ -1,11 +1,6 @@
# DIMA: Data Integration and Metadata Annotation
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including:
- **beamtimes**,
- **kinetic flowtube studies**,
- **smog chamber experiments**, and
- **field campaigns**.
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including **beamtimes**, **kinetic flowtube studies**, **smog chamber experiments**, and **field campaigns**.
## Key Features
@ -21,44 +16,63 @@ DIMA provides reusable operations for data integration, manipulation, and extrac
4. **Jupyter notebooks**
Demonstrates DIMA's core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, openBIS ETL, and workflow demos.
## Adaptability to Experimental Campaign Needs
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`
as suggested by [CF metadata conventions](http://cfconventions.org/).
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
## Repository Structure
## Requirements
For **Windows** users, the following are required:
1. **Git Bash**: Install [Git Bash](https://git-scm.com/downloads) to run shell scripts (`.sh` files).
2. **Conda**: Install [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
3. **PSI Network Access**: Ensure access to PSI's network and access rights to source drives for retrieving campaign data from YAML files in the `input_files/` folder.
:bulb: **Tip**: Editing your system's PATH variable ensures both Conda and Git are available in the terminal environment used by Git Bash.
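As a quick sanity check, a short Python sketch can confirm that both tools resolve on the current PATH. The helper name `find_tools` is hypothetical and not part of DIMA; it relies only on `shutil.which` from the standard library.

```python
import shutil


def find_tools(names=("git", "conda")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Report which of the required tools the current terminal can see.
    for tool, path in find_tools().items():
        status = path if path else "NOT FOUND (check your PATH)"
        print(f"{tool}: {status}")
```

If either tool is reported as not found, add its install directory to PATH and open a new Git Bash terminal before continuing.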
## Getting Started
### Download DIMA
Open a **Git Bash** terminal.
Navigate to your `GitLab` folder, clone the repository, and navigate to the `dima` folder as follows:
```bash
cd path/to/GitLab
git clone --recurse-submodules https://gitlab.psi.ch/5505/dima.git
cd dima
```
### Install Python Interpreter
Open a **Git Bash** terminal.
**Option 1**: Install a suitable conda environment `pyenv5505` inside the repository `dima` as follows:
```bash
cd path/to/GitLab/dima
bash setup_env.sh
```
Open **Anaconda Prompt** or a terminal with access to conda.
**Option 2**: Install the conda environment from the YAML file as follows:
```bash
cd path/to/GitLab/dima
conda env create --file environment.yml
```
## Software architecture
<p align="center">
<img src="docs/software_arquitecture_diagram.svg" alt="Alt Text">
</p>
## Installation
Follow these steps to install and set up the project:
1. Download our GitLab repository in your GitLab folder, or alternatively open a Git Bash terminal and run the following commands:
```
cd Path/to/GitLab
git clone https://gitlab.psi.ch/5505/data-integration-and-metadata-annotation.git
```
2. Open an Anaconda Prompt (Anaconda3) as administrator, and set the current directory to the path of the project's folder.
3. Create the project's environment `multiphase_chemistry_env` by running the following command:
```
conda env create -f environment.yml
```
### Working with Jupyter Notebook on the `multiphase_chemistry_env`
## Working with Jupyter Notebook on the `multiphase_chemistry_env`
1. Open an Anaconda Prompt as a regular user, ensure that `multiphase_chemistry_env` is in the list of available environments, and activate it by running the following commands:
```
@ -99,7 +113,16 @@ and select the `multiphase_chemistry_env` environment from the kernel options.
| processing_filename | - | Denotes the name of the file used to process an initial version (e.g., original version) of the dataset into a processed dataset. |
| processing_date | - | The date when the data processing was completed. |
## Adaptability to Experimental Campaign Needs
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`
as suggested by [CF metadata conventions](http://cfconventions.org/).
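The annotation step described above could look roughly like the following sketch. The dictionary contents, the variable names, and the function `annotate_variables` are illustrative assumptions, not DIMA's actual API; in practice the terms would be loaded from an instrument-specific YAML file rather than written inline.

```python
# Hypothetical stand-in for an instrument dictionary after its YAML file is loaded.
instrument_terms = {
    "rh": {
        "standard_name": "relative_humidity",
        "units": "percent",
        "description": "Relative humidity reported by the instrument.",
    },
    "temp": {
        "standard_name": "air_temperature",
        "units": "K",
        "description": "Air temperature reported by the instrument.",
    },
}

# The three attributes named in the text above.
ANNOTATION_KEYS = ("standard_name", "units", "description")


def annotate_variables(variable_names, terms):
    """Return the annotation attributes for every variable found in `terms`."""
    annotations = {}
    for name in variable_names:
        entry = terms.get(name)
        if entry is None:
            continue  # variables missing from the dictionary stay unannotated
        annotations[name] = {key: entry[key] for key in ANNOTATION_KEYS if key in entry}
    return annotations
```

Calling `annotate_variables(["rh", "unknown"], instrument_terms)` returns attributes for `rh` only, so adding support for a new variable amounts to adding an entry to the dictionary rather than changing code.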
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
### Specifying a compound attribute in YAML
Consider the compound attribute *relative_humidity*, which has subattributes *value*, *units*, *range*, and *definition*. The YAML description of
@ -126,7 +149,7 @@ relative_humidity:
```
# How to Extend DIMA's File Reading Capabilities for New Instruments
# Extend DIMA's file reading capabilities for new instruments
We now explain how to extend DIMA's file-reading capabilities by adding support for a new instrument. The process involves adding instrument-specific files and registering the new instrument's file reader.
@ -158,7 +181,6 @@ file_extensions.append('.json')
file_readers.update({'ACSM_TOFWARE_flags_json' : lambda x: read_jsonflag_as_dict(x)})
```
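To illustrate how a registry of this kind might be consulted, here is a minimal self-contained sketch. Only the names `file_extensions`, `file_readers`, `read_jsonflag_as_dict`, and the key `'ACSM_TOFWARE_flags_json'` come from the snippet above; the reader body, the helper `select_reader`, and the rule that builds the key from an instrument folder name plus the file extension are assumptions made for illustration.

```python
import json
import os


def read_jsonflag_as_dict(path):
    """Stand-in reader (assumption): load a JSON flag file into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return {"name": os.path.basename(path), "data": json.load(f)}


# Registry objects, mirroring the registration step shown above.
file_extensions = []
file_readers = {}

file_extensions.append(".json")
file_readers.update({"ACSM_TOFWARE_flags_json": read_jsonflag_as_dict})


def select_reader(instrument_folder, filename):
    """Return the reader registered for this folder and extension, or None."""
    _, ext = os.path.splitext(filename)
    if ext not in file_extensions:
        return None  # extension has not been registered
    key = f"{instrument_folder}_{ext.lstrip('.')}"  # assumed key convention
    return file_readers.get(key)
```

Under these assumptions, `select_reader("ACSM_TOFWARE_flags", "flags.json")` returns `read_jsonflag_as_dict`, while a file with an unregistered extension yields `None`.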
## -------------------
## Getting started
To make it easy for you to get started with GitLab, here's a list of recommended next steps.