489 Commits

Author SHA1 Message Date
5f6d0e4f2b Decorate readers to capture data lineage using record_data_lineage from src.meta_ops 2025-09-20 11:02:09 +02:00
e96ecfa951 Implement record_data_lineage.py to be used as a parameterized decorator. This is to simplify provenance tracking on newly added file readers. 2025-09-19 19:01:08 +02:00
8daa57c396 Merge branch 'main' of https://gitea.psi.ch/5505-public/dima 2025-09-19 18:56:27 +02:00
11b9e35526 Rename variable actris_level --> data_level 2025-09-19 18:55:31 +02:00
2a9e39b9ca Update changelog for version v1.2.0 (tag: v1.2.0) 2025-06-29 08:51:32 +02:00
1d2f311b1f Update README.md with conda forge instructions. 2025-06-29 08:47:29 +02:00
d43ead5f6c Merge branch 'main' of https://gitea.psi.ch/5505-public/dima 2025-06-29 07:57:46 +02:00
978101f9c2 Refactor README to use Miniforge and conda-forge for env setup; remove unreliable shell script instructions 2025-06-29 07:53:20 +02:00
c40d138563 Create new version v1.1.0. The data integration pipeline now performs a disk space check and skips the data transfer if destination files already exist. (tag: v1.1.0) 2025-06-28 16:21:26 +02:00
Juan Felipe Flórez Ospina
d6bb20ae7d Update pipelines/data_integration.py with new functionality. The copy subtree and create hdf5 function now checks whether a copy of the src directory already exists, to avoid replacing files, and verifies that there is enough free storage space before starting the data transfer and subsequent ingestion into HDF5. 2025-06-28 14:45:06 +02:00
0115745433 Add initial changelog for v1.0.0 (tag: v1.0.0) 2025-06-26 10:10:22 +02:00
eb853ee948 Fix bug in instruments/readers/g5505_text_reader.py: the fallback format does not contain the desired_format key, which led to a KeyError. 2025-06-25 16:54:03 +02:00
6c908e6686 Update src/hdf5_writer.py to record unflattened path from original folder 2025-06-25 14:11:56 +02:00
dacafb6234 Update src/hdf5_ops.py to allow for replicates after flattening directory structures. 2025-06-25 14:11:02 +02:00
ee8540e04e Fix typos in instruments/readers/g5505_text_reader.py 2025-06-25 14:09:13 +02:00
e6df345578 Modify instruments/readers/g5505_text_reader.py to include the new instrument CEDOAS, which produces multi-format files, and update file dependencies accordingly. 2025-06-25 12:00:55 +02:00
334335387e Update to notebooks/demo_data_integration.ipynb. Step description now includes information about set up with network_mount env variable 2025-06-22 12:16:26 +02:00
cbcebd998a Fix typos in README.md and complement some information about network drives. 2025-06-22 12:15:34 +02:00
e851131269 Append new functions to utils/g5505_utils.py. These search for a .env file in the root directory. 2025-06-22 12:13:14 +02:00
f3ff32e049 Update to pipelines/data_integration.py. Added feature to use environment variable MOUNT_DRIVE defined in .env file. 2025-06-22 12:11:48 +02:00
be7cf0ba12 Re-add sanitized config files
- Replaced sensitive server paths with placeholders
- Ensure .env is used to provide values
2025-06-22 10:45:44 +02:00
630189c5d7 Update input directory in input_files/campaignDescriptor3_NG.yaml due to changes in the source directory 2025-06-22 10:42:20 +02:00
e6b8b60258 update gitignore 2025-06-21 20:28:49 +02:00
a70b012da5 Simplify output dir and file naming 2025-06-20 11:47:19 +02:00
cc702ee17d Change output directory to data/ for all descriptors 2025-06-20 11:45:25 +02:00
177bcee400 Refactor step 1 in notebook to facilitate usage of campaign descriptors 2025-06-20 10:37:01 +02:00
b610b4e337 Rename yaml files in input_files/ as campaign descriptors for consistency with idear project. 2025-06-20 10:24:53 +02:00
f4ddd36ef2 Clean import statements 2025-06-19 20:49:36 +02:00
8e6ee49188 Modify utils/g5505_utils.py. Implement handling of Unicode character errors. 2025-06-19 20:49:14 +02:00
617a923fb6 Add processing_Script and processing_data and actris_level to output metadata 2025-06-19 20:41:17 +02:00
b96c04fc01 Refactor instruments/readers/g5505_text_reader.py; some code abstracted into functions to improve readability. 2025-06-19 20:40:14 +02:00
f555f7f199 Implement skipping in convert_attrdict_to_np_structured_array(attr_value: dict) when dictionary values are not scalar. This ensures compatible values are transferred while the rest are simply discarded. 2025-06-10 16:03:01 +02:00
7d710c1e62 Fix bug while reading yaml file from utils/g5505_utils.py 2025-06-10 14:38:29 +02:00
ab897018d9 Add exclude paths set through yaml file 2025-06-10 11:08:14 +02:00
83cec97e83 Fix bug in instruments/readers/structured_file_reader.py. pd.to_dict returns a list of dicts, so each item must be handled separately in a loop. 2025-06-07 19:14:53 +02:00
f640205b12 Add new file reader instruments/readers/structured_file_reader.py, and update registry.py and yaml 2025-06-07 18:15:41 +02:00
e80c19ef61 Update src/hdf5_writer.py to consider data lineage metadata in data ingestion process 2025-06-07 15:31:13 +02:00
a95fc1fc6a Set the output_file_directory attribute in YAML files to ../output_files/ 2025-06-07 15:30:19 +02:00
87462211a9 Update instruments/readers/nasa_ames_reader.py to handle dirty text entries. Dirty entries of time variables that cannot be properly processed are set to NaT. 2025-05-27 10:12:20 +02:00
6851f03dbd Restore instruments/readers/nasa_ames_reader.py to previous version. 2025-05-26 19:39:21 +02:00
bd74f8310c Record missing values for each variable according to EBAS value convention 2025-05-21 13:53:18 +02:00
e4b2a4cd5a Split header into three parts; detect variables and add variable descriptions to the attribute dictionary 2025-05-21 09:19:16 +02:00
a22532d08d Register new file reader in the reader registry system. 2025-05-14 13:51:28 +02:00
ad4339a76b Add new file reader dictionary pair for NASA Ames files. This is a first version that may change. 2025-05-14 13:50:08 +02:00
773b6a6fbe Fix import statement in pipelines.data_integration.py 2025-03-14 10:12:57 +01:00
9276f060b0 WIP: Update contributing and acknowledgement sections. 2025-03-11 14:03:22 +01:00
32abd4cd56 Implemented hdf5_file_reader.py and updated register.yaml and hdf5_writer.py. This replaces previous function __copy_file_in_group(). 2025-02-25 12:25:15 +01:00
5f9f09d288 Merge branch 'feature/DB_for_FileReader_Repo' into 'main'
Restructuring of file reader system to process multi-instrument data folders.

See merge request 5505-public/dima!3
2025-02-25 10:48:59 +01:00
295b43a89a Merge branch 'main' into 'feature/DB_for_FileReader_Repo'
# Conflicts:
#   instruments/filereader_registry.py
#   pipelines/data_integration.py
#   src/hdf5_writer.py
2025-02-25 10:41:02 +01:00
064b8b3a62 Update import statements in pipelines/data_integration.py. from instruments.readers import ... -> from instruments import ... 2025-02-25 09:21:52 +01:00
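Commits e96ecfa951 and 5f6d0e4f2b describe record_data_lineage as a parameterized decorator applied to file readers to simplify provenance tracking. The log does not show its real signature or output layout, so the following is only a minimal sketch under assumptions: the `data_level` parameter (named after 11b9e35526), the `attributes_dict` key, and the attribute names (loosely following 617a923fb6) are all hypothetical.

```python
import functools
import os
from datetime import datetime, timezone
from typing import Any, Callable, Dict

Reader = Callable[..., Dict[str, Any]]


def record_data_lineage(data_level: int = 0) -> Callable[[Reader], Reader]:
    """Hypothetical sketch of a parameterized decorator that stamps
    provenance metadata onto the dict a file reader returns."""
    def decorator(reader: Reader) -> Reader:
        @functools.wraps(reader)  # keep the reader's name and docstring
        def wrapper(path: str, *args: Any, **kwargs: Any) -> Dict[str, Any]:
            result = reader(path, *args, **kwargs)
            # Assumed output layout: lineage goes under an "attributes_dict" key
            attrs = result.setdefault("attributes_dict", {})
            # setdefault: never overwrite attributes the reader already set
            attrs.setdefault("processing_script", f"{reader.__module__}.{reader.__qualname__}")
            attrs.setdefault("processing_date", datetime.now(timezone.utc).isoformat())
            attrs.setdefault("data_level", data_level)
            attrs.setdefault("source_file", os.path.basename(path))
            return result
        return wrapper
    return decorator


# Usage: decorating a stand-in reader (a real one would parse the file)
@record_data_lineage(data_level=0)
def read_example_file(path: str) -> Dict[str, Any]:
    return {"name": os.path.basename(path), "attributes_dict": {}}
```

The outer function captures configuration, the middle one receives the reader, and the inner wrapper runs on every call, so a new reader gains lineage tracking from a single decorator line rather than repeated bookkeeping code.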
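Commits c40d138563 and d6bb20ae7d describe the data integration pipeline checking for existing copies before transferring data and verifying free disk space first. The helpers below are a hypothetical sketch of that idea using only the standard library; `plan_copy`, `has_enough_space`, and the 10% safety margin are assumptions, and this sketch checks per file rather than per directory, which may differ from what pipelines/data_integration.py actually does.

```python
import os
import shutil
from typing import List, Tuple


def plan_copy(src_dir: str, dst_dir: str) -> Tuple[List[str], int]:
    """Hypothetical helper: return the relative paths under src_dir that are
    not yet present in dst_dir, plus the total bytes needed to copy them."""
    pending: List[str] = []
    required = 0
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src_dir)
            if os.path.exists(os.path.join(dst_dir, rel)):
                continue  # already copied: skip instead of replacing
            pending.append(rel)
            required += os.path.getsize(src_path)
    return sorted(pending), required


def has_enough_space(dst_dir: str, required_bytes: int, margin: float = 1.1) -> bool:
    """Check free space on the filesystem that will hold dst_dir.
    The 10% margin is an assumption, leaving room for the HDF5 output."""
    probe = os.path.abspath(dst_dir)
    # dst_dir may not exist yet: walk up to the nearest existing ancestor
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    return shutil.disk_usage(probe).free >= required_bytes * margin
```

A caller would run `plan_copy` first, stop if `has_enough_space` returns False, and otherwise copy only the pending files before ingesting them into HDF5.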