Commit Graph

480 Commits

SHA1 Message Date
c40d138563 Create new version v1.1.0. Data integration pipeline now does disk space check and skips data transfer if destination files already exist. v1.1.0 2025-06-28 16:21:26 +02:00
0115745433 Add initial changelog for v1.0.0 v1.0.0 2025-06-26 10:10:22 +02:00
eb853ee948 Fix bug in instruments/readers/g5505_text_reader.py. The fallback format does not contain the desired_format key, which led to a key error 2025-06-25 16:54:03 +02:00
6c908e6686 Update src/hdf5_writer.py to record unflattened path from original folder 2025-06-25 14:11:56 +02:00
dacafb6234 Update src/hdf5_ops.py to allow for replicates after flattening directory structures. 2025-06-25 14:11:02 +02:00
ee8540e04e Fix typos in instruments/readers/g5505_text_reader.py 2025-06-25 14:09:13 +02:00
e6df345578 Modify instruments/readers/g5505_text_reader.py to include new instrument CEDOAS, which produces multi-format files. Also update the file dependencies. 2025-06-25 12:00:55 +02:00
334335387e Update notebooks/demo_data_integration.ipynb. The step description now includes information about setup with the network_mount env variable 2025-06-22 12:16:26 +02:00
cbcebd998a Fix typos in README.md and complement some information about network drives. 2025-06-22 12:15:34 +02:00
e851131269 Append new functions to utils/g5505_utils.py. These search for a .env file in the root directory 2025-06-22 12:13:14 +02:00
f3ff32e049 Update pipelines/data_integration.py. Add feature to use the environment variable MOUNT_DRIVE defined in the .env file. 2025-06-22 12:11:48 +02:00
be7cf0ba12 Re-add sanitized config files
- Replaced sensitive server paths with placeholders
- Ensure .env is used to provide values
2025-06-22 10:45:44 +02:00
630189c5d7 Update input directory in input_files/campaignDescriptor3_NG.yaml due to changes in source directory 2025-06-22 10:42:20 +02:00
e6b8b60258 Update .gitignore 2025-06-21 20:28:49 +02:00
a70b012da5 Simplify output dir and file naming 2025-06-20 11:47:19 +02:00
cc702ee17d Change output directory to data/ for all descriptors 2025-06-20 11:45:25 +02:00
177bcee400 Refactor step 1 in notebook to facilitate usage of campaign descriptors 2025-06-20 10:37:01 +02:00
b610b4e337 Rename yaml files in input_files/ as campaign descriptors for consistency with idear project. 2025-06-20 10:24:53 +02:00
f4ddd36ef2 Clean import statements 2025-06-19 20:49:36 +02:00
8e6ee49188 Modify utils/g5505_utils.py. Implement handling of Unicode character errors. 2025-06-19 20:49:14 +02:00
617a923fb6 Add processing_Script and processing_data and actris_level to output metadata 2025-06-19 20:41:17 +02:00
b96c04fc01 Refactor instruments/readers/g5505_text_reader.py; some code abstracted into functions to improve readability. 2025-06-19 20:40:14 +02:00
f555f7f199 Implement skipping in convert_attrdict_to_np_structured_array(attr_value: dict) when dictionary values are not scalar. This ensures compatible values are transferred while the rest are simply discarded. 2025-06-10 16:03:01 +02:00
7d710c1e62 Fix bug while reading yaml file from utils/g5505_utils.py 2025-06-10 14:38:29 +02:00
ab897018d9 Add exclude paths set through yaml file 2025-06-10 11:08:14 +02:00
83cec97e83 Fix bug in instruments/readers/structured_file_reader.py. The pandas to_dict call returns a list of dicts, so we need to handle each item separately using a loop. 2025-06-07 19:14:53 +02:00
f640205b12 Add new file reader instruments/readers/structured_file_reader.py, and update registry.py and yaml 2025-06-07 18:15:41 +02:00
e80c19ef61 Update src/hdf5_writer.py to consider data lineage metadata in data ingestion process 2025-06-07 15:31:13 +02:00
a95fc1fc6a Modified output_file_directory attribute in yaml files to ../output_files/ 2025-06-07 15:30:19 +02:00
87462211a9 Update instruments/readers/nasa_ames_reader.py to handle dirty text entries. Dirty entries of time variables that cannot be properly processed are set to NaT 2025-05-27 10:12:20 +02:00
6851f03dbd Restore instruments/readers/nasa_ames_reader.py to previous version. 2025-05-26 19:39:21 +02:00
bd74f8310c Record missing values for each variable according to EBAS value convention 2025-05-21 13:53:18 +02:00
e4b2a4cd5a Split header into three parts and detect variables; variable descriptions are added to the attribute dictionary 2025-05-21 09:19:16 +02:00
a22532d08d Register new file reader in the reader registry system. 2025-05-14 13:51:28 +02:00
ad4339a76b Added new file reader dictionary pair for NASA Ames files. This is a first version that may change. 2025-05-14 13:50:08 +02:00
773b6a6fbe Fix import statement in pipelines/data_integration.py 2025-03-14 10:12:57 +01:00
9276f060b0 WIP: Update contributing and acknowledgement sections. 2025-03-11 14:03:22 +01:00
32abd4cd56 Implemented hdf5_file_reader.py and updated register.yaml and hdf5_writer.py. This replaces previous function __copy_file_in_group(). 2025-02-25 12:25:15 +01:00
5f9f09d288 Merge branch 'feature/DB_for_FileReader_Repo' into 'main'
Restructuring of file reader system to process multi-instrument data folders.

See merge request 5505-public/dima!3
2025-02-25 10:48:59 +01:00
295b43a89a Merge branch 'main' into 'feature/DB_for_FileReader_Repo'
# Conflicts:
#   instruments/filereader_registry.py
#   pipelines/data_integration.py
#   src/hdf5_writer.py
2025-02-25 10:41:02 +01:00
064b8b3a62 Update import statements in pipelines/data_integration.py. from instruments.readers import ... -> from instruments import ... 2025-02-25 09:21:52 +01:00
db4bb0ef03 Implemented create_hdf5_from_filesystem_new() using the new instrument readers' command line interface and subprocesses. This lets collaborators extend file reading capabilities without requiring changes to file_registry.py; only additions to folders and registry.yaml are needed. 2025-02-24 18:48:03 +01:00
92a2560ed7 Update all file readers with a command line interface so we can run them as a subprocess. Also added registry.yaml to decouple code from user-based instrument adaptations or extensions. 2025-02-24 17:27:12 +01:00
9511377883 Merge branch 'main' of https://gitlab.psi.ch/5505-public/dima 2025-02-22 18:02:45 +01:00
1e67745fa4 Fix import for filereader_registry.py after moving it from instruments/readers/ one level above. 2025-02-22 17:59:00 +01:00
6ebc699a43 Moved filereader_registry.py outside readers folder. 2025-02-22 17:53:19 +01:00
bb48cfa0cd Moved filereader_registry.py outside readers folder. 2025-02-22 17:51:56 +01:00
821d314cb6 Wrap import statements in try/except to enable explicit import of submodules and avoid conflicts with the parent project. 2025-02-22 17:10:53 +01:00
8ce6f588dc Implement data_lineage_metadata.json detection and then use it to annotate associated file. 2025-02-10 15:56:34 +01:00
68a9928c39 Enable boolean-type columns from a pandas DataFrame to be converted into a NumPy structured array 2025-02-10 15:52:17 +01:00