Update readme with getting started section
README.md
@ -1,11 +1,6 @@
|
|||||||
# DIMA: Data Integration and Metadata Annotation
|
# DIMA: Data Integration and Metadata Annotation
|
||||||
|
|
||||||
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including:
|
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including **beamtimes**, **kinetic flowtube studies**, **smog chamber experiments**, and **field campaigns**.
|
||||||
|
|
||||||
- **beamtimes**,
|
|
||||||
- **kinetic flowtube studies**,
|
|
||||||
- **smog chamber experiments**, and
|
|
||||||
- **field campaigns**.
|
|
||||||
|
|
||||||
## Key Features
|
## Key Features
|
||||||
|
|
||||||
@ -21,44 +16,63 @@ DIMA provides reusable operations for data integration, manipulation, and extrac
|
|||||||
4. **Jupyter notebooks**
|
4. **Jupyter notebooks**
|
||||||
Demonstrates DIMA’s core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.
|
Demonstrates DIMA’s core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.
|
||||||
|
|
||||||
## Adaptability to Experimental Campaign Needs
|
|
||||||
|
|
||||||
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
|
|
||||||
- `standard_name`
|
|
||||||
- `units`
|
|
||||||
- `description`
|
|
||||||
|
|
||||||
as suggested by [CF metadata conventions](http://cfconventions.org/).
|
|
||||||
### Versioning and Community Collaboration
|
|
||||||
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
|
|
||||||
|
|
||||||
## Repository Structure
|
## Repository Structure
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
For **Windows** users, the following are required:
|
||||||
|
|
||||||
|
1. **Git Bash**: Install [Git Bash](https://git-scm.com/downloads) to run shell scripts (`.sh` files).
|
||||||
|
|
||||||
|
2. **Conda**: Install [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
|
||||||
|
|
||||||
|
3. **PSI Network Access**: Ensure you have access to PSI’s network and the access rights to the source drives needed to retrieve the campaign data specified in the YAML files in the `input_files/` folder.
|
||||||
|
|
||||||
|
:bulb: **Tip**: Add Conda and Git to your system’s PATH variable so that both are available in the terminal environment used by Git Bash.
|
||||||
|
|
||||||
|
|
||||||
|
## Getting Started
|
||||||
|
|
||||||
|
### Download DIMA
|
||||||
|
|
||||||
|
Open a **Git Bash** terminal.
|
||||||
|
|
||||||
|
Navigate to your `GitLab` folder, clone the repository, and navigate to the `dima` folder as follows:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd path/to/GitLab
|
||||||
|
git clone --recurse-submodules https://gitlab.psi.ch/5505/dima.git
|
||||||
|
cd dima
|
||||||
|
```
|
||||||
|
|
||||||
|
### Install Python Interpreter
|
||||||
|
|
||||||
|
Open a **Git Bash** terminal.
|
||||||
|
|
||||||
|
**Option 1**: Install a suitable conda environment `pyenv5505` inside the repository `dima` as follows:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd path/to/GitLab/dima
|
||||||
|
bash setup_env.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
For **Option 2**, open **Anaconda Prompt** or a terminal with access to conda.
|
||||||
|
|
||||||
|
**Option 2**: Install the conda environment from the YAML file as follows:
|
||||||
|
```bash
|
||||||
|
cd path/to/GitLab/dima
|
||||||
|
conda env create --file environment.yml
|
||||||
|
```
|
||||||
|
|
||||||
## Software architecture
|
## Software architecture
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<img src="docs/software_arquitecture_diagram.svg" alt="Alt Text">
|
<img src="docs/software_arquitecture_diagram.svg" alt="Alt Text">
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
## Installation
|
## Working with Jupyter Notebook on the `multiphase_chemistry_env`
|
||||||
|
|
||||||
Follow these steps to install and set up the project:
|
|
||||||
|
|
||||||
1. Download our GitLab repository in your GitLab folder, or alternatively open a Git Bash terminal and run the following commands:
|
|
||||||
```
|
|
||||||
cd Path/to/GitLab
|
|
||||||
git clone https://gitlab.psi.ch/5505/data-integration-and-metadata-annotation.git
|
|
||||||
```
|
|
||||||
|
|
||||||
2. Open an Anaconda Prompt (Anaconda3) as administrator, and set the current directory to the path of the project's folder.
|
|
||||||
|
|
||||||
3. Create the project's environment `multiphase_chemistry_env` by running the following command:
|
|
||||||
```
|
|
||||||
conda env create -f environment.yml
|
|
||||||
```
|
|
||||||
|
|
||||||
### Working with Jupyter Notebook on the `multiphase_chemistry_env`
|
|
||||||
|
|
||||||
1. Open an Anaconda Prompt as a regular user, ensure that `multiphase_chemistry_env` is in the list of available environments, and activate it by running the following commands:
|
1. Open an Anaconda Prompt as a regular user, ensure that `multiphase_chemistry_env` is in the list of available environments, and activate it by running the following commands:
|
||||||
```
|
```
|
||||||
@ -99,7 +113,16 @@ and select the `multiphase_chemistry_env` environment from the kernel options.
|
|||||||
| processing_filename | - | Denotes the name of the file used to process an initial version (e.g., original version) of the dataset into a processed dataset. |
|
| processing_filename | - | Denotes the name of the file used to process an initial version (e.g., original version) of the dataset into a processed dataset. |
|
||||||
| processing_date | - | The date when the data processing was completed. | |
|
| processing_date | - | The date when the data processing was completed. | |
|
||||||
|
|
||||||
|
## Adaptability to Experimental Campaign Needs
|
||||||
|
|
||||||
|
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
|
||||||
|
- `standard_name`
|
||||||
|
- `units`
|
||||||
|
- `description`
|
||||||
|
|
||||||
|
as suggested by [CF metadata conventions](http://cfconventions.org/).
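
The annotation step described above can be sketched in Python as follows. This is a minimal illustration, not DIMA's actual implementation: the `instrument_terms` dictionary, its entries, and the `annotate_variables` helper are hypothetical, and the dictionary is written inline here where in practice it would be parsed from an instrument-specific YAML file.

```python
# Hypothetical term dictionary, shaped like the content of an
# instrument-specific YAML file after parsing. Entries are illustrative only.
instrument_terms = {
    "temperature": {
        "standard_name": "air_temperature",
        "units": "K",
        "description": "Air temperature measured at the instrument inlet.",
    },
    "rh": {
        "standard_name": "relative_humidity",
        "units": "1",
        "description": "Relative humidity of the sampled air.",
    },
}

# The three annotation fields named in the text above.
ANNOTATION_KEYS = ("standard_name", "units", "description")


def annotate_variables(variable_names, terms):
    """Return a {variable: attributes} mapping for every variable found in `terms`."""
    annotations = {}
    for name in variable_names:
        entry = terms.get(name)
        if entry is None:
            # Variables without a dictionary entry are left unannotated.
            continue
        annotations[name] = {key: entry[key] for key in ANNOTATION_KEYS if key in entry}
    return annotations


annotations = annotate_variables(["temperature", "rh", "unknown_column"], instrument_terms)
for name, attrs in annotations.items():
    print(name, attrs)
```

Skipping unknown variables rather than raising an error is a design choice made for this sketch; a stricter workflow might instead report columns that have no dictionary entry.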
|
||||||
|
### Versioning and Community Collaboration
|
||||||
|
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
|
||||||
|
|
||||||
### Specifying a compound attribute in YAML
|
### Specifying a compound attribute in YAML
|
||||||
Consider the compound attribute *relative_humidity*, which has subattributes *value*, *units*, *range*, and *definition*. The YAML description of
|
Consider the compound attribute *relative_humidity*, which has subattributes *value*, *units*, *range*, and *definition*. The YAML description of
|
||||||
@ -126,7 +149,7 @@ relative_humidity:
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
# How to Extend DIMA’s File Reading Capabilities for New Instruments
|
# Extend DIMA’s file reading capabilities for new instruments
|
||||||
|
|
||||||
We now explain how to extend DIMA's file-reading capabilities by adding support for a new instrument. The process involves adding instrument-specific files and registering the new instrument's file reader.
|
We now explain how to extend DIMA's file-reading capabilities by adding support for a new instrument. The process involves adding instrument-specific files and registering the new instrument's file reader.
|
||||||
|
|
||||||
@ -158,7 +181,6 @@ file_extensions.append('.json')
|
|||||||
file_readers.update({'ACSM_TOFWARE_flags_json' : lambda x: read_jsonflag_as_dict(x)})
|
file_readers.update({'ACSM_TOFWARE_flags_json' : lambda x: read_jsonflag_as_dict(x)})
|
||||||
```
|
```
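
To illustrate how a registration like the one above might be used, here is a self-contained sketch. Only the names `file_extensions`, `file_readers`, `read_jsonflag_as_dict`, and the key `ACSM_TOFWARE_flags_json` come from the snippet above; the body of `read_jsonflag_as_dict`, the `select_reader` helper, and the `<instrument folder>_<extension>` key scheme are assumptions made for the example and may differ from DIMA's actual code.

```python
import json
import os
import tempfile

# Stand-ins for the registry objects named in the snippet above.
file_extensions = []
file_readers = {}


def read_jsonflag_as_dict(path):
    """Hypothetical reader body: parse a JSON flag file into a dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Registration, mirroring the two calls shown above.
file_extensions.append(".json")
file_readers.update({"ACSM_TOFWARE_flags_json": read_jsonflag_as_dict})


def select_reader(instrument_folder, path):
    """Look up a reader by '<instrument folder>_<extension>' (assumed key scheme)."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in file_extensions:
        return None
    key = f"{instrument_folder}_{extension.lstrip('.')}"
    return file_readers.get(key)


# Usage with a throwaway file, so the sketch runs without real campaign data.
with tempfile.TemporaryDirectory() as tmp_dir:
    flag_path = os.path.join(tmp_dir, "flags.json")
    with open(flag_path, "w", encoding="utf-8") as handle:
        json.dump({"flag": 1}, handle)
    reader = select_reader("ACSM_TOFWARE_flags", flag_path)
    flags = reader(flag_path)

print(flags)
```

Keeping the readers in a dictionary means that supporting a new instrument only requires adding one entry, without touching the lookup logic.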
|
||||||
|
|
||||||
## -------------------
|
|
||||||
## Getting started
|
## Getting started
|
||||||
|
|
||||||
To make it easy for you to get started with GitLab, here's a list of recommended next steps.
|
To make it easy for you to get started with GitLab, here's a list of recommended next steps.
|
||||||
|