Update readme with key features of the repo.

2024-12-03 13:50:53 +01:00
parent 31d8af6aef
commit 47c9bd8e3d


@@ -1,24 +1,31 @@
# DIMA: Data Integration and Metadata Annotation
DIMA (Data Integration and Metadata Annotation) is a Python package, developed for the Lab of Atmospheric Chemistry that supports integration of multi-instrument data in HDF5 format, collected across a wide range of experimental campaigns, from beamtimes and kinetic flowtube studies to smog chamber studies and field campaigns.
DIMA (Data Integration and Metadata Annotation) is a Python package designed for **the Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including beamtimes, kinetic flowtube studies, smog chamber experiments, and field campaigns.
creation of semiformat descriptions using
. This repository is a Python package, consisting of modules data integration, metadata annotation, data manipulation, and visualization of experimental campaign data in HDF5 files.
## Key Features
DIMA provides reusable operations for data integration, manipulation, and extraction using HDF5 files. These serve as the foundation for the following higher-level operations:
1. **Data Integration Pipeline:** Harmonizes and integrates multi-instrument data sources by converting a human-readable campaign descriptor YAML file into a unified HDF5 format.
provides tools and workflows for efficient data integration and metadata management, particularly for experimental campaign data stored in the HDF5 file format.
2. **Metadata Revision Workflow:** Updates and refines metadata through a human-in-the-loop process, optimizing HDF5 file metadata serialization in YAML format to align with conventions and develop campaign-centric vocabularies.
Repository for integrating data in HDF5 from various sources and managing metadata updates in the integrated files.
3. **Visualization Pipeline:** Generates a treemap visualization of an HDF5 file, highlighting its structure and key metadata elements.
Includes tools and workflows for comprehensive data integration and automated metadata review processes.
4. **Jupyter Notebooks:** Demonstrate DIMA's core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, openBIS ETL, and workflow demos.
## Adaptability to Experimental Campaign Needs
## Overview
DIMA is a collection of reusable data operation modules and high-level workflows designed to streamline the following tasks:
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`
- **Data Integration Pipeline**: Harmonizes and integrates diverse data sources into a unified HDF5 format.
- **Metadata Revision Workflow**: Updates and refines metadata to ensure consistency and accuracy for experimental campaigns.
as suggested by [CF metadata conventions](http://cfconventions.org/).
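The dictionary-based annotation described above can be sketched as follows. This is a minimal illustration, not DIMA's actual implementation: the dictionary contents, the variable names (`temp_C`, `rh`, `flow_raw`), and the helper `annotate_variables` are all hypothetical. In practice the terms would presumably be loaded from an instrument-specific YAML file (for example with PyYAML's `yaml.safe_load`); a plain Python dict stands in for the loaded result here.

```python
# Hypothetical instrument dictionary, shaped like what a YAML file of terms
# might load into. Variable names and entries are illustrative assumptions.
INSTRUMENT_TERMS = {
    "temp_C": {
        "standard_name": "air_temperature",
        "units": "degC",
        "description": "Air temperature measured at the instrument inlet.",
    },
    "rh": {
        "standard_name": "relative_humidity",
        "units": "%",
        "description": "Relative humidity of the sampled air.",
    },
}

# The three annotation fields named in the README.
ANNOTATION_KEYS = ("standard_name", "units", "description")


def annotate_variables(variable_names, terms):
    """Look up each observed variable in the instrument dictionary.

    Returns a pair (annotations, unmatched): annotations maps each known
    variable to its three metadata fields, and unmatched lists variables
    with no dictionary entry, so a person can review them later.
    """
    annotations = {}
    unmatched = []
    for name in variable_names:
        entry = terms.get(name)
        if entry is None:
            unmatched.append(name)
            continue
        # Missing fields default to an empty string instead of raising.
        annotations[name] = {key: entry.get(key, "") for key in ANNOTATION_KEYS}
    return annotations, unmatched


annotations, unmatched = annotate_variables(
    ["temp_C", "rh", "flow_raw"], INSTRUMENT_TERMS
)
```

Variables without a dictionary entry are collected instead of being dropped silently, so extending the instrument's vocabulary only requires adding the missing terms to its dictionary.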
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
## Repository Structure