Update readme with key features of the repo.

2024-12-03 13:50:53 +01:00
parent 31d8af6aef
commit 47c9bd8e3d
1 changed file with 17 additions and 10 deletions
--- a/README.md
+++ b/README.md
@@ -1,24 +1,31 @@
 # DIMA: Data Integration and Metadata Annotation

-DIMA (Data Integration and Metadata Annotation) is a Python package, developed for the Lab of Atmospheric Chemistry that supports integration of multi-instrument data in HDF5 format, collected across a wide range of experimental campaigns, from beamtimes and kinetic flowtube studies to smog chamber studies and field campaigns. 
+DIMA (Data Integration and Metadata Annotation) is a Python package designed for **the Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including beamtimes, kinetic flowtube studies, smog chamber experiments, and field campaigns.

-creation of semiformat descriptions using
-. This repository is a Python package, consisting of modules data integration, metadata annotation, data manipulation, and visualization of experimental campaign data in HDF5 files.
+## Key Features

+DIMA provides reusable operations for data integration, manipulation, and extraction using HDF5 files. These serve as the foundation for the following higher-level operations:

+1. **Data Integration Pipeline:** Harmonizes and integrates multi-instrument data sources by converting a human-readable campaign descriptor YAML file into a unified HDF5 format.

-provides tools and workflows for efficient data integration and metadata management, particularly for experimental campaign data stored in the HDF5 file format.
+2. **Metadata Revision Workflow:** Updates and refines metadata through a human-in-the-loop process, optimizing HDF5 file metadata serialization in YAML format to align with conventions and develop campaign-centric vocabularies. 

-Repository for integrating data in HDF5 from various sources and managing metadata updates in the integrated files. 
+3. **Visualization Pipeline:**
+  Generates a treemap visualization of an HDF5 file, highlighting its structure and key metadata elements.

-Includes tools and workflows for comprehensive data integration and automated metadata review processes.
+4. **Jupyter Notebooks:**
+  Demonstrate DIMA’s core functionality, including data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks cover data sharing, openBIS ETL, and workflow demos.

+## Adaptability to Experimental Campaign Needs 

-## Overview
-DIMA is a collection of reusable data operation modules and high-level workflows designed to streamline the following tasks:
+The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
+  - `standard_name`
+  - `units`
+  - `description`  

- **Data Integration Pipeline**: Harmonizes and integrates diverse data sources into a unified HDF5 format.
- **Metadata Revision Workflow**: Updates and refines metadata to ensure consistency and accuracy for experimental campaigns.
+  as suggested by [CF metadata conventions](http://cfconventions.org/).
+### Versioning and Community Collaboration
+The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
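The annotation step described in the added "Adaptability" section can be illustrated with a minimal sketch. It is not DIMA's actual implementation: the `TERMS` dictionary, the variable names, and the `annotate` helper are all hypothetical, and plain Python dictionaries stand in for both the YAML dictionary of terms and the HDF5 attributes so that the example has no dependencies.

```python
# Hypothetical dictionary of terms, shaped as it might look after loading an
# instrument-specific YAML file. Variable names and descriptions are made up
# for illustration; the standard names follow the CF standard name table.
TERMS = {
    "T_amb": {
        "standard_name": "air_temperature",
        "units": "K",
        "description": "Ambient air temperature at the inlet",
    },
    "RH": {
        "standard_name": "relative_humidity",
        "units": "1",
        "description": "Relative humidity expressed as a fraction",
    },
}

# The three annotation fields named in the README.
ANNOTATION_KEYS = ("standard_name", "units", "description")


def annotate(variable_names, terms):
    """Map each variable to its annotation attributes.

    Variables without an entry in `terms` get an empty dict, so they can be
    flagged for manual review instead of being silently dropped.
    """
    annotated = {}
    for name in variable_names:
        entry = terms.get(name, {})
        annotated[name] = {key: entry[key] for key in ANNOTATION_KEYS if key in entry}
    return annotated


annotations = annotate(["T_amb", "RH", "unknown_signal"], TERMS)
for name, attrs in annotations.items():
    print(name, attrs)
```

In a real pipeline the resulting attributes would presumably be attached to the corresponding HDF5 datasets. Supporting a new instrument would then mostly mean adding a new dictionary of terms rather than changing code.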

 ## Repository Structure