diff --git a/README.md b/README.md
index 5564f74..e357a35 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,6 @@
 # DIMA: Data Integration and Metadata Annotation
 
-DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including:
-
-- **beamtimes**,
-- **kinetic flowtube studies**,
-- **smog chamber experiments**, and
-- **field campaigns**.
+DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including **beamtimes**, **kinetic flowtube studies**, **smog chamber experiments**, and **field campaigns**.
 
 ## Key Features
 
@@ -21,44 +16,63 @@ DIMA provides reusable operations for data integration, manipulation, and extrac
 4. **Jupyter notebooks** Demonstrates DIMA’s core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.
 
-## Adaptability to Experimental Campaign Needs
-
-The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- - `standard_name`
- - `units`
- - `description`
-
- as suggested by [CF metadata conventions](http://cfconventions.org/).
-### Versioning and Community Collaboration
- The instrument-specific dictionaries in YAML format provide a human readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
-
 ## Repository Structure
+
+## Requirements
+
+For **Windows** users, the following are required:
+
+1. **Git Bash**: Install [Git Bash](https://git-scm.com/downloads) to run shell scripts (`.sh` files).
+
+2. **Conda**: Install [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
+
+3. **PSI Network Access**: Ensure access to PSI’s network and access rights to source drives for retrieving campaign data from YAML files in the `input_files/` folder.
+
+:bulb: **Tip**: Editing your system’s PATH variable ensures both Conda and Git are available in the terminal environment used by Git Bash.
+
+
+## Getting Started
+
+### Download DIMA
+
+Open a **Git Bash** terminal.
+
+Navigate to your `GitLab` folder, clone the repository, and navigate to the `dima` folder as follows:
+
+ ```bash
+ cd path/to/GitLab
+ git clone --recurse-submodules https://gitlab.psi.ch/5505/dima.git
+ cd dima
+ ```
+
+### Install Python Interpreter
+
+Open **Git Bash** terminal.
+
+**Option 1**: Install a suitable conda environment `pyenv5505` inside the repository `dima` as follows:
+
+ ```bash
+ cd path/to/GitLab/dima
+ Bash setup_env.sh
+ ```
+
+Open **Anaconda Prompt** or a terminal with access to conda.
+
+**Option 2**: Install conda enviroment from YAML file as follows:
+ ```bash
+ cd path/to/GitLab/dima
+ conda env create --file environment.yml
+ ```
+
 ## Software arquitecture

Alt Text

-## Installation
-
-Follow these steps to install and set up the project:
-
-1. Download our GitLab repository in your GitLab folder, or alternatively open a Git Bash terminal and run the following commands:
- ```
- cd Path/to/GitLab
- git clone https://gitlab.psi.ch/5505/data-integration-and-metadata-annotation.git
- ```
-
-2. Open an Anaconda Prompt (Anaconda3) as administrator, and set the current directory to the path of the project's folder.
-
-3. Create the project's environment `multiphase_chemistry_env` by running the following command:
- ```
- conda env create -f environment.yml
- ```
-
-### Working with Jupyter Notebook on the `multiphase_chemistry_env`
+## Working with Jupyter Notebook on the `multiphase_chemistry_env`
 
 1. Open an Anaconda Prompt as a regular user, ensure that `multiphase_chemistry_env` is in the list of available enviroments and activate it by running the following commands:
 ```
@@ -99,7 +113,16 @@ and select the `multiphase_chemistry_env` environment from the kernel options.
 | processing_filename | - | Denotes the name of the file used to process an initial version (e.g, original version) of the dataset into a processed dataset. | |
 | processing_date | - | The date when the data processing was completed. | |
 
+## Adaptability to Experimental Campaign Needs
+The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
+ - `standard_name`
+ - `units`
+ - `description`
+
+ as suggested by [CF metadata conventions](http://cfconventions.org/).
+### Versioning and Community Collaboration
+ The instrument-specific dictionaries in YAML format provide a human readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
 ### Specifying a compound attribute in yaml language.
 Consider the compound attribute *relative_humidity*, which has subattributes *value*, *units*, *range*, and *definition*. The yaml description of
@@ -126,7 +149,7 @@ relative_humidity:
 ```
 
-# How to Extend DIMA’s File Reading Capabilities for New Instruments
+# Extend DIMA’s file reading capabilities for new instruments
 
 We now explain how to extend DIMA's file-reading capabilities by adding support for a new instrument. The process involves adding instrument-specific files and registering the new instrument's file reader.
 
@@ -158,7 +181,6 @@ file_extensions.append('.json')
 file_readers.update({'ACSM_TOFWARE_flags_json' : lambda x: read_jsonflag_as_dict(x)})
 ```
 
-## -------------------
 ## Getting started
 
 To make it easy for you to get started with GitLab, here's a list of recommended next steps.