Update readme with getting started section
96
README.md
@@ -1,11 +1,6 @@
# DIMA: Data Integration and Metadata Annotation

DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including:

- **beamtimes**,
- **kinetic flowtube studies**,
- **smog chamber experiments**, and
- **field campaigns**.
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including **beamtimes**, **kinetic flowtube studies**, **smog chamber experiments**, and **field campaigns**.

## Key Features

@@ -21,44 +16,63 @@ DIMA provides reusable operations for data integration, manipulation, and extrac
4. **Jupyter notebooks**
Demonstrate DIMA’s core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.

## Adaptability to Experimental Campaign Needs

The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`

as suggested by [CF metadata conventions](http://cfconventions.org/).
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.

## Repository Structure

## Requirements

For **Windows** users, the following are required:

1. **Git Bash**: Install [Git Bash](https://git-scm.com/downloads) to run shell scripts (`.sh` files).

2. **Conda**: Install [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html).

3. **PSI Network Access**: Ensure you have access to PSI’s network and access rights to the source drives for retrieving campaign data from YAML files in the `input_files/` folder.

:bulb: **Tip**: Add Conda and Git to your system’s PATH variable so that both are available in the terminal environment used by Git Bash.
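
A quick way to confirm that the PATH setup took effect is to ask Python which executables it can resolve. The sketch below is illustrative and not part of DIMA: the helper name `find_tools` is made up for this example, and it relies only on `shutil.which` from the standard library.

```python
import shutil


def find_tools(names=("git", "conda")):
    """Map each tool name to its resolved executable path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing entry is easy to spot.
    for tool, path in find_tools().items():
        print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

Run it from the same Git Bash terminal you intend to use, since PATH can differ between terminals.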

## Getting Started

### Download DIMA

Open a **Git Bash** terminal.

Navigate to your `GitLab` folder, clone the repository, and navigate to the `dima` folder as follows:

```bash
cd path/to/GitLab
git clone --recurse-submodules https://gitlab.psi.ch/5505/dima.git
cd dima
```

### Install Python Interpreter

Open a **Git Bash** terminal.

**Option 1**: Create the conda environment `pyenv5505` inside the `dima` repository as follows:

```bash
cd path/to/GitLab/dima
bash setup_env.sh
```

Open **Anaconda Prompt** or a terminal with access to conda.

**Option 2**: Install the conda environment from the YAML file as follows:
```bash
cd path/to/GitLab/dima
conda env create --file environment.yml
```

## Software architecture

<p align="center">
<img src="docs/software_arquitecture_diagram.svg" alt="DIMA software architecture diagram">
</p>

## Installation

Follow these steps to install and set up the project:

1. Download the repository into your GitLab folder, or alternatively open a Git Bash terminal and run the following commands:
```
cd path/to/GitLab
git clone https://gitlab.psi.ch/5505/data-integration-and-metadata-annotation.git
```

2. Open an Anaconda Prompt (Anaconda3) as administrator, and set the current directory to the path of the project's folder.

3. Create the project's environment `multiphase_chemistry_env` by running the following command:
```
conda env create -f environment.yml
```

### Working with Jupyter Notebook on the `multiphase_chemistry_env`
## Working with Jupyter Notebook on the `multiphase_chemistry_env`

1. Open an Anaconda Prompt as a regular user, check that `multiphase_chemistry_env` is in the list of available environments, and activate it by running the following commands:
```
@@ -99,7 +113,16 @@ and select the `multiphase_chemistry_env` environment from the kernel options.
| processing_filename | - | Denotes the name of the file used to process an initial version (e.g., the original version) of the dataset into a processed dataset. |
| processing_date | - | The date when the data processing was completed. |

## Adaptability to Experimental Campaign Needs

The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`

as suggested by [CF metadata conventions](http://cfconventions.org/).
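
As a rough illustration of the annotation step described above, the following sketch looks up each observed variable in an instrument dictionary and attaches the three CF-style attributes. It is not DIMA's implementation: the `annotate` helper, the variable names, and the dictionary entries are hypothetical, and a plain Python dict stands in for the parsed YAML file.

```python
# Hypothetical instrument dictionary, shaped like a YAML file after parsing.
instrument_terms = {
    "temperature": {
        "standard_name": "air_temperature",
        "units": "K",
        "description": "Air temperature measured at the inlet.",
    },
    "rh": {
        "standard_name": "relative_humidity",
        "units": "1",
        "description": "Relative humidity of the sampled air.",
    },
}


def annotate(variable_names, terms):
    """Return {variable: attributes} for every variable found in the dictionary."""
    keys = ("standard_name", "units", "description")
    annotated = {}
    for name in variable_names:
        entry = terms.get(name)
        if entry is None:
            continue  # variables without a dictionary entry stay unannotated
        annotated[name] = {key: entry.get(key, "") for key in keys}
    return annotated


annotations = annotate(["rh", "pressure"], instrument_terms)
print(annotations)
```

Here `pressure` has no dictionary entry, so only `rh` receives attributes; whether unknown variables should be skipped or reported is a design choice this sketch does not settle.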

### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.

### Specifying a compound attribute in YAML
Consider the compound attribute *relative_humidity*, which has subattributes *value*, *units*, *range*, and *definition*. The YAML description of
@@ -126,7 +149,7 @@ relative_humidity:

```

# How to Extend DIMA’s File Reading Capabilities for New Instruments
# Extend DIMA’s file reading capabilities for new instruments

We now explain how to extend DIMA's file-reading capabilities by adding support for a new instrument. The process involves adding instrument-specific files and registering the new instrument's file reader.

@@ -158,7 +181,6 @@ file_extensions.append('.json')
file_readers.update({'ACSM_TOFWARE_flags_json' : lambda x: read_jsonflag_as_dict(x)})
```
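
To show how such a registration might be consumed, here is a minimal, self-contained sketch of a reader registry. It assumes `file_extensions` is a list and `file_readers` is a dict keyed by a reader name, as the snippet above suggests; the body of `read_jsonflag_as_dict`, the `read_file` dispatcher, and the sample flag file are hypothetical stand-ins, not DIMA's actual code.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for the registries kept by the instruments module.
file_extensions = []
file_readers = {}


def read_jsonflag_as_dict(path):
    """Load a JSON flag file into a dict (assumed behaviour for this sketch)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Registration, mirroring the two calls shown in the README.
file_extensions.append(".json")
file_readers.update({"ACSM_TOFWARE_flags_json": read_jsonflag_as_dict})


def read_file(path, reader_key):
    """Dispatch to the registered reader after checking the file extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in file_extensions:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    if reader_key not in file_readers:
        raise KeyError(f"No reader registered under {reader_key!r}")
    return file_readers[reader_key](path)


# Usage: write a small sample file, then read it back through the registry.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "flags.json")
    with open(sample_path, "w", encoding="utf-8") as handle:
        json.dump({"flag_code": 1}, handle)
    loaded = read_file(sample_path, "ACSM_TOFWARE_flags_json")

print(loaded)
```

Passing the function object directly, rather than wrapping it in a `lambda`, behaves the same way for a single-argument reader and keeps the registry entry easier to inspect.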

## -------------------
## Getting started

To make it easy for you to get started with GitLab, here's a list of recommended next steps.