475 Commits

SHA1 Message Date
9bb1d4204d Change output directory to data/ for all descriptors 2025-06-20 11:45:25 +02:00
a1b9fc1cc9 Refactor step 1 in notebook to facilitate usage of campaign descriptors 2025-06-20 10:37:01 +02:00
80d95841e1 Rename yaml files in input_files/ as campaign descriptors for consistency with idear project. 2025-06-20 10:24:53 +02:00
9aa7c0ece8 Clean import statements 2025-06-19 20:49:36 +02:00
ce403da6b0 Modify utils/g5505_utils.py. Implement handling unicode character errors. 2025-06-19 20:49:14 +02:00
c02549f013 Add processing_Script and processing_data and actris_level to output metadata 2025-06-19 20:41:17 +02:00
a10cdf2fc5 Refactor instruments/readers/g5505_text_reader.py, some code abstracted as functions to improve readability. 2025-06-19 20:40:14 +02:00
d5fa2b6c71 Implement skipping in convert_attrdict_to_np_structured_array(attr_value: dict) when dictionary values are not scalar. This ensures compatible values are transferred while the rest are simply discarded. 2025-06-10 16:03:01 +02:00
671c5a5c57 Fix bug while reading yaml file from utils/g5505_utils.py 2025-06-10 14:38:29 +02:00
b31fff1da0 Add exclude paths set through yaml file 2025-06-10 11:08:14 +02:00
7b8a266057 Fix bug in instruments/readers/structured_file_reader.py. pd.to_dict returns a list of dicts, so we need to handle each item separately using a loop. 2025-06-07 19:14:53 +02:00
98ce166e2a Add new file reader instruments/readers/structured_file_reader.py, and update registry.py and yaml 2025-06-07 18:15:41 +02:00
33d1f20d38 Update src/hdf5_writer.py to consider data lineage metadata in data ingestion process 2025-06-07 15:31:13 +02:00
63e32403ba Modified output_file_directory attribute in yaml files as ../output_files/ 2025-06-07 15:30:19 +02:00
edabcb57f8 Update instruments/readers/nasa_ames_reader.py to handle dirty text entries. Dirty entries of time variables that cannot be properly processed are set to NaT. 2025-05-27 10:12:20 +02:00
76f8b194c4 Restore instruments/readers/nasa_ames_reader.py to previous version. 2025-05-26 19:39:21 +02:00
d0c27d4414 Record missing values for each variable according to EBAS value convention 2025-05-21 13:53:18 +02:00
8b30fe5815 Split header in three parts and detect variables and variable descriptions added to attribute dictionary 2025-05-21 09:19:16 +02:00
974260f177 Register new file reader in the reader registry system. 2025-05-14 13:51:28 +02:00
ea1011a9ea Added new filereader dictionary pair for nasames files. This is a first version that may change. 2025-05-14 13:50:08 +02:00
d59967fcc4 Fix import statement in pipelines.data_integration.py 2025-03-14 10:12:57 +01:00
40f17818f2 WIP: Update contributing and acknowledgement sections. 2025-03-11 14:03:22 +01:00
6b43c95a8d Implemented hdf5_file_reader.py and updated register.yaml and hdf5_writer.py. This replaces previous function __copy_file_in_group(). 2025-02-25 12:25:15 +01:00
109be49f31 Merge branch 'feature/DB_for_FileReader_Repo' into 'main'
Restructuring of file reader system to process multi-instrument data folders.

See merge request 5505-public/dima!3
2025-02-25 10:48:59 +01:00
14b738818c Merge branch 'main' into 'feature/DB_for_FileReader_Repo'
# Conflicts:
#   instruments/filereader_registry.py
#   pipelines/data_integration.py
#   src/hdf5_writer.py
2025-02-25 10:41:02 +01:00
4f438f86fe Update import statements in pipelines/data_integration.py. from instruments.readers import ... -> from instruments import ... 2025-02-25 09:21:52 +01:00
68344964ac Implemented create_hdf5_from_filesystem_new() using new instrument readers cml interface and subprocesses. This facilitates extension of file reading capabilities by collaborators without requiring changes to file_registry.py. Only additions in folders and registry.yaml. 2025-02-24 18:48:03 +01:00
e5fdc6fa31 Update all file readers with command line interface so we can run them as a subprocess. Added also registry.yaml to decouple code from user-based instrument adaptations or extensions. 2025-02-24 17:27:12 +01:00
2cdd6925af Merge branch 'main' of https://gitlab.psi.ch/5505-public/dima 2025-02-22 18:02:45 +01:00
bc1d65d469 Fix import for filereader_registry.py after moving it from instruments/readers/ one level above. 2025-02-22 17:59:00 +01:00
85d4e39299 Moved filereader_registry.py outside readers folder. 2025-02-22 17:53:19 +01:00
02e926e003 Moved filereader_registry.py outside readers folder. 2025-02-22 17:51:56 +01:00
81be6b54c8 Change import statements with try except to enable explicit import of submodules from import to avoid conflicts with parent project. 2025-02-22 17:10:53 +01:00
df0aca97df Implement data_lineage_metadata.json detection and then use it to annotate associated file. 2025-02-10 15:56:34 +01:00
b8900cab67 Enable boolean type columns from pandas DataFrame to be suitably converted into numpy structured array 2025-02-10 15:52:17 +01:00
7906387271 Make file reader selection case insensitive by using ext.lower() and update config_text_reader.py to point to renamed dictionary. 2025-02-08 19:45:16 +01:00
cbf468f5ac remove instruments/dictionaries/ICAD_NO2.yaml. Its dict terms are now in ICAD.yaml. 2025-02-08 19:23:37 +01:00
131704dcf2 Add dict terms from ICAD_NO2.yaml 2025-02-08 19:22:27 +01:00
33aabf45fa Combine dictionaries of ICAD_HONO.yaml and ICAD_NO2.yaml into ICAD.yaml 2025-02-08 19:21:17 +01:00
3e6f6bc46e Remove skip directory condition when directory keywords are empty. Here, all paths to files should be considered. 2025-02-07 16:37:01 +01:00
1a843ee2c6 Fix reader txt/csv default behavior. 2025-02-07 16:25:45 +01:00
46ca26a983 Enable instrumentFolder of form <instFolder>/<category>/ to be transferred without flattening 2025-02-07 16:24:21 +01:00
36780d1a63 Add try except block to trigger errors for invalid group names. 2025-02-06 16:07:45 +01:00
5943c60216 Add constraint to match only path/to/keyword1/keyword2/files containing a composite keyword keyword1/keyword2. 2025-02-06 15:34:38 +01:00
58386ca10b Add property to extracted dataset as dataframe. Now time column is of datetime type to facilitate downstream processing. 2025-02-04 17:23:32 +01:00
d89aebd861 Implement method in hdf5 manager to infer datetime column in dataset 2025-02-04 17:13:01 +01:00
e358d4ab64 Synch with remote repo 2025-02-03 10:31:48 +01:00
5e3f75d66b Fix typo in html text. 2025-01-27 13:53:59 +01:00
a3a1b8506c Update readme.md and set_up_env.sh 2025-01-27 13:29:29 +01:00
1b2184d8e1 Update unload operation to remove reference and fix logic error in dataset metadata extraction. 2025-01-24 10:28:43 +01:00