diff --git a/README.md b/README.md
index 6dc5be1..1733a95 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,31 @@
 # DIMA: Data Integration and Metadata Annotation
-DIMA (Data Integration and Metadata Annotation) is a Python package, developed for the Lab of Atmospheric Chemistry that supports integration of multi-instrument data in HDF5 format, collected across a wide range of experimental campaigns, from beamtimes and kinetic flowtube studies to smog chamber studies and field campaigns.
+DIMA (Data Integration and Metadata Annotation) is a Python package designed for **the Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including beamtimes, kinetic flowtube studies, smog chamber experiments, and field campaigns.
-creation of semiformat descriptions using
-. This repository is a Python package, consisting of modules data integration, metadata annotation, data manipulation, and visualization of experimental campaign data in HDF5 files.
+## Key Features
+DIMA provides reusable operations for data integration, manipulation, and extraction using HDF5 files. These serve as the foundation for the following higher-level operations:
+1. **Data Integration Pipeline:** Harmonizes and integrates multi-instrument data sources by converting a human-readable campaign descriptor YAML file into a unified HDF5 format.
-provides tools and workflows for efficient data integration and metadata management, particularly for experimental campaign data stored in the HDF5 file format.
+2. **Metadata Revision Workflow:** Updates and refines metadata through a human-in-the-loop process, optimizing HDF5 file metadata serialization in YAML format to align with conventions and develop campaign-centric vocabularies.
-Repository for integrating data in HDF5 from various sources and managing metadata updates in the integrated files.
+3. **Visualization pipeline:**
+ Generates a treemap visualization of an HDF5 file, highlighting its structure and key metadata elements.
-Includes tools and workflows for comprehensive data integration and automated metadata review processes.
+4. **Jupyter notebooks**
+ Demonstrates DIMA’s core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.
+## Adaptability to Experimental Campaign Needs
-## Overview
-DIMA is a collection of reusable data operation modules and high-level workflows designed to streamline the following tasks:
+The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
+ - `standard_name`
+ - `units`
+ - `description`
-- **Data Integration Pipeline**: Harmonizes and integrates diverse data sources into a unified HDF5 format.
-- **Metadata Revision Workflow**: Updates and refines metadata to ensure consistency and accuracy for experimental campaigns.
+ as suggested by [CF metadata conventions](http://cfconventions.org/).
+### Versioning and Community Collaboration
+ The instrument-specific dictionaries in YAML format provide a human readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
 ## Repository Structure