Update README with key features of the repo.
# DIMA: Data Integration and Metadata Annotation
DIMA (Data Integration and Metadata Annotation) is a Python package designed for the **Laboratory of Atmospheric Chemistry** to support the integration of multi-instrument data in HDF5 format. It is tailored for data from diverse experimental campaigns, including beamtimes, kinetic flowtube studies, smog chamber experiments, and field campaigns.
## Key Features
DIMA provides reusable operations for data integration, manipulation, and extraction using HDF5 files. These serve as the foundation for the following higher-level operations:
1. **Data Integration Pipeline:** Harmonizes and integrates multi-instrument data sources by converting a human-readable campaign descriptor YAML file into a unified HDF5 format (a descriptor sketch follows this list).
2. **Metadata Revision Workflow:** Updates and refines metadata through a human-in-the-loop process, optimizing HDF5 file metadata serialization in YAML format to align with conventions and develop campaign-centric vocabularies (a metadata snippet follows this list).
3. **Visualization Pipeline:** Generates a treemap visualization of an HDF5 file, highlighting its structure and key metadata elements (a treemap sketch follows this list).
4. **Jupyter Notebooks:** Demonstrate DIMA's core functionalities, such as data integration, HDF5 file creation, visualization, and metadata annotation. Key notebooks include examples for data sharing, OpenBis ETL, and workflow demos.
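
As an illustration of the first pipeline, a campaign descriptor could look roughly like the sketch below. Every key name here is an assumption made for illustration, not necessarily DIMA's actual schema.

```yaml
# Hypothetical campaign descriptor (all key names are illustrative).
campaign: smog_chamber_study
input_file_directory: ./data/raw    # raw instrument output, one folder per instrument
output_file_directory: ./data/hdf5  # destination of the unified HDF5 file
instrument_folders:
  - gas_analyzer
  - particle_counter
datetime_range:
  start: 2023-06-01
  end: 2023-06-15
```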
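In the revision workflow, the HDF5 metadata is serialized to YAML so a reviewer can edit it before the changes are written back to the file. A sketch of what such a serialization could contain, with purely illustrative group and attribute names:

```yaml
# Hypothetical YAML serialization of HDF5 metadata for human review;
# group and attribute names are illustrative, not DIMA's actual layout.
/smog_chamber/gas_analyzer:
  attributes:
    institution: Laboratory of Atmospheric Chemistry
    experiment: smog_chamber_study
    comment: reviewer corrects or confirms entries like these
```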
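For the visualization pipeline, a treemap can be derived generically from the group/dataset hierarchy of an HDF5 file. The following is a minimal sketch using `h5py` and `plotly` (both assumed installed); DIMA's own implementation may differ.

```python
# Minimal sketch: render an HDF5 file's hierarchy as a treemap.
# Assumes h5py and plotly are installed; not DIMA's actual code.
import h5py
import plotly.express as px

def hdf5_treemap(path):
    names, parents = ["/"], [""]
    with h5py.File(path, "r") as f:
        def collect(name, obj):
            names.append("/" + name)
            # the parent of "/a/b" is "/a"; top-level objects hang off "/"
            parents.append("/" + name.rsplit("/", 1)[0] if "/" in name else "/")
        f.visititems(collect)
    return px.treemap(names=names, parents=parents)

# hdf5_treemap("campaign_data.h5").show()
```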
## Adaptability to Experimental Campaign Needs
The `instruments/` module is designed to be highly adaptable, accommodating new instrument types or file reading capabilities with minimal code refactoring. The module is complemented by instrument-specific dictionaries of terms in YAML format, which facilitate automated annotation of observed variables with:
- `standard_name`
- `units`
- `description`

as suggested by [CF metadata conventions](http://cfconventions.org/).
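
An entry in such a dictionary might look like the sketch below; the variable key and file layout are assumptions, and only the three annotation fields come from the list above.

```yaml
# Illustrative instrument dictionary entry; layout is hypothetical.
temperature:
  standard_name: air_temperature   # CF standard name
  units: K
  description: air temperature recorded by the instrument
```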
### Versioning and Community Collaboration
The instrument-specific dictionaries in YAML format provide a human-readable interface for community-based development of instrument vocabularies. These descriptions can potentially be enhanced with semantic annotations for interoperability across research domains.
## Repository Structure