Commit Graph

  • 5f6d0e4f2b Decorate readers to capture data lineage using record_data_lineage from src.meta_ops main florez_j 2025-09-20 11:02:09 +02:00
  • e96ecfa951 Implement record_data_lineage.py to be used as a parameterized decorator. This is to simplify provenance tracking on newly added file readers. florez_j 2025-09-19 19:01:08 +02:00
  • 8daa57c396 Merge branch 'main' of https://gitea.psi.ch/5505-public/dima florez_j 2025-09-19 18:56:27 +02:00
  • 11b9e35526 Rename variable actris_level --> data_level florez_j 2025-09-19 18:55:31 +02:00
  • 2a9e39b9ca update changelog for version v1.2.0 v1.2.0 florez_j 2025-06-29 08:51:32 +02:00
  • 1d2f311b1f Update README.md with conda forge instructions. florez_j 2025-06-29 08:47:29 +02:00
  • d43ead5f6c Merge branch 'main' of https://gitea.psi.ch/5505-public/dima florez_j 2025-06-29 07:57:46 +02:00
  • 978101f9c2 Refactor README to use Miniforge and conda-forge for env setup; remove unreliable shell script instructions florez_j 2025-06-29 07:53:20 +02:00
  • c40d138563 Create new version v1.1.0. Data integration pipeline now does disk space check and skips data transfer if destination files already exist. v1.1.0 florez_j 2025-06-28 16:21:26 +02:00
  • d6bb20ae7d Update pipelines/data_integration.py with new functionality. copy subtree and create hdf5 function now checks whether there is already a copy of the src directory to avoid replacing files and verifies there is enough free storage space to initiate data transfer and subsequent ingestion into hdf5. Juan Felipe Flórez Ospina 2025-06-28 14:45:06 +02:00
  • 0115745433 Add initial changelog for v1.0.0 v1.0.0 florez_j 2025-06-26 10:10:22 +02:00
  • eb853ee948 Fix bug instruments/readers/g5505_text_reader.py. The fallback format does not contain desired_format key, leading to a key error florez_j 2025-06-25 16:54:03 +02:00
  • 6c908e6686 Update src/hdf5_writer.py to record unflattened path from original folder florez_j 2025-06-25 14:11:56 +02:00
  • dacafb6234 Update src/hdf5_ops.py to allow for replicates after flattening directory structures. florez_j 2025-06-25 14:11:02 +02:00
  • ee8540e04e Fix typos in instruments/readers/g5505_text_reader.py florez_j 2025-06-25 14:09:13 +02:00
  • e6df345578 Modify instruments/readers/g5505_text_reader.py to include new instrument CEDOAS, which produces multi-format files. The updated file dependencies. florez_j 2025-06-25 12:00:55 +02:00
  • 334335387e Update to notebooks/demo_data_integration.ipynb. Step description now includes information about set up with network_mount env variable florez_j 2025-06-22 12:16:26 +02:00
  • cbcebd998a Fix typos in README.md and complemented some information about network drives. florez_j 2025-06-22 12:15:34 +02:00
  • e851131269 Append new functions to utils/g5505_utils.py. This searches for .env file in root directory florez_j 2025-06-22 12:13:14 +02:00
  • f3ff32e049 Update to pipelines/data_integration.py. Added feature to use environment variable MOUNT_DRIVE defined in .env file. florez_j 2025-06-22 12:11:48 +02:00
  • be7cf0ba12 Re-add sanitized config files florez_j 2025-06-22 10:45:44 +02:00
  • 630189c5d7 Update input_files/campaignDescriptor3_NG.yaml input directory due to changes in source directory florez_j 2025-06-22 10:42:20 +02:00
  • e6b8b60258 update gitignore florez_j 2025-06-21 20:28:49 +02:00
  • a70b012da5 Simplify output dir and file naming florez_j 2025-06-20 11:47:19 +02:00
  • cc702ee17d Change output directory to data/ for all descriptors florez_j 2025-06-20 11:45:25 +02:00
  • 177bcee400 Refactor step 1 in notebook to facilitate usage of campaign descriptors florez_j 2025-06-20 10:37:01 +02:00
  • b610b4e337 Rename yaml files in input_files/ as campaign descriptors for consistency with idear project. florez_j 2025-06-20 10:24:53 +02:00
  • f4ddd36ef2 Clean import statements florez_j 2025-06-19 20:49:36 +02:00
  • 8e6ee49188 Modify utils/g5505_utils.py. Implement handling unicode character errors. florez_j 2025-06-19 20:49:14 +02:00
  • 617a923fb6 Add processing_Script and processing_data and actris_level to output metadata florez_j 2025-06-19 20:41:17 +02:00
  • b96c04fc01 Refactor instruments/readers/g5505_text_reader.py, some code abstracted as functions to improve readability. florez_j 2025-06-19 20:40:14 +02:00
  • f555f7f199 Implement skipping in convert_attrdict_to_np_structured_array(attr_value: dict) when dictionary values are not scalar. This ensures compatible values are transferred while the rest are simply discarded. florez_j 2025-06-10 16:03:01 +02:00
  • 7d710c1e62 Fix bug while reading yaml file from utils/g5505_utils.py florez_j 2025-06-10 14:38:29 +02:00
  • ab897018d9 Add exclude paths set through yaml file florez_j 2025-06-10 11:08:14 +02:00
  • 83cec97e83 Fix bug instruments/readers/structured_file_reader.py. pd.to_dict return a list of dicts so we need to handle each item separately using a loop. florez_j 2025-06-07 19:14:53 +02:00
  • f640205b12 Add new file reader instruments/readers/structured_file_reader.py, and update registry.py and yaml florez_j 2025-06-07 18:15:41 +02:00
  • e80c19ef61 Update src/hdf5_writer.py to consider data lineage metadata in data ingestion process florez_j 2025-06-07 15:31:13 +02:00
  • a95fc1fc6a Modified output_file_directory attribute in yaml files as ../output_files/ florez_j 2025-06-07 15:30:19 +02:00
  • 87462211a9 Update instruments/readers/nasa_ames_reader.py to handle dirty text entries. Dirty entries of time variables that cannot be properly processed are sent to nat florez_j 2025-05-27 10:12:20 +02:00
  • 6851f03dbd Restore instruments/readers/nasa_ames_reader.py to previous version. florez_j 2025-05-26 19:39:21 +02:00
  • bd74f8310c Record missing values for each variable according to EBAS value convention florez_j 2025-05-21 13:53:18 +02:00
  • e4b2a4cd5a Split header in three parts and detect variables and variable descriptions added to attribute dictionary florez_j 2025-05-21 09:19:16 +02:00
  • a22532d08d Register new file reader in the reader registry system. florez_j 2025-05-14 13:51:28 +02:00
  • ad4339a76b Added new filereader dictionary pair for nasames files. This is a first version that may change. florez_j 2025-05-14 13:50:08 +02:00
  • 773b6a6fbe Fix import statement in pipelines.data_integration.py florez_j 2025-03-14 10:12:57 +01:00
  • 9276f060b0 WIP: Update contributing and acknowledgement sections. florez_j 2025-03-11 14:03:22 +01:00
  • 32abd4cd56 Implemented hdf5_file_reader.py and updated register.yaml and hdf5_writer.py. This replaces previous function __copy_file_in_group(). florez_j 2025-02-25 12:25:15 +01:00
  • 5f9f09d288 Merge branch 'feature/DB_for_FileReader_Repo' into 'main' florez_j 2025-02-25 10:48:59 +01:00
  • 295b43a89a Merge branch 'main' into 'feature/DB_for_FileReader_Repo' florez_j 2025-02-25 10:41:02 +01:00
  • 064b8b3a62 Update import statements in pipelines/data_integration.py. from instruments.readers import ... -> from instruments import ... florez_j 2025-02-25 09:21:52 +01:00
  • db4bb0ef03 Implemented create_hdf5_from_filesystem_new() using new instrument readers cml interface and subprocesses. This facilitates extension of file reading capabilities by collaborators without requiring changes to file_registry.py. Only additions in folders and registry.yaml. florez_j 2025-02-24 18:48:03 +01:00
  • 92a2560ed7 Update all file readers with command line interface so we can run them as a subprocess. Added also registry.yaml to decouple code from user-based instrument adaptations or extensions. florez_j 2025-02-24 17:27:12 +01:00
  • 9511377883 Merge branch 'main' of https://gitlab.psi.ch/5505-public/dima florez_j 2025-02-22 18:02:45 +01:00
  • 1e67745fa4 Fix import for filereader_registry.py after moving it from instruments/readers/ one level above. florez_j 2025-02-22 17:59:00 +01:00
  • 6ebc699a43 Moved filereader_registry.py outside readers folder. florez_j 2025-02-22 17:53:19 +01:00
  • bb48cfa0cd Moved filereader_registry.py outside readers folder. florez_j 2025-02-22 17:51:56 +01:00
  • 821d314cb6 Change import statements with try except to enable explicit import of submodules from import to avoid conflicts with parent project. florez_j 2025-02-22 17:10:53 +01:00
  • 8ce6f588dc Implement data_lineage_metadata.json detection and then use it to annotate associated file. florez_j 2025-02-10 15:56:34 +01:00
  • 68a9928c39 Enable boolean type columns from pandas DataFrame to be suitably converted into numpy structured array florez_j 2025-02-10 15:52:17 +01:00
  • c28286a626 Make file reader selection case insensitive by using ext.lower() and update config_text_reader.py to point to renamed dictionary. florez_j 2025-02-08 19:45:16 +01:00
  • 0b29e2ec68 remove instruments/dictionaries/ICAD_NO2.yaml. Its dict terms are now in ICAD.yaml. florez_j 2025-02-08 19:23:37 +01:00
  • 609bb0b859 Add dict terms from ICAD_NO2.yaml florez_j 2025-02-08 19:22:27 +01:00
  • ef7fe70bf0 Combine dictionaries of ICAD_HONO.yaml and ICAD_NO2.yaml into ICAD.yaml florez_j 2025-02-08 19:21:17 +01:00
  • b58e205f9f Remove skip directory condition when directory keywords are empty. Here, all paths to files should be considered. florez_j 2025-02-07 16:37:01 +01:00
  • f986edd4a5 Fix reader txt/csv default behavior. florez_j 2025-02-07 16:25:45 +01:00
  • 0d26777732 Enable instrumentFolder of form <instFolder>/<category>/ to be transferred without flattening florez_j 2025-02-07 16:24:21 +01:00
  • b374de60f3 Add try except block to trigger errors for invalid group names. florez_j 2025-02-06 16:07:45 +01:00
  • 2f72177410 Add constraint to match only path/to/keyword1/keyword2/files containing a composite keyword keyword1/keyword2. florez_j 2025-02-06 15:34:38 +01:00
  • 5d0ab4603f Add property to extracted dataset as dataframe. Now time column is of datetime type to facilitate downstream processing. florez_j 2025-02-04 17:23:32 +01:00
  • 6fae139360 Implement method in hdf5 manager to infer datetime column in dataset florez_j 2025-02-04 17:13:01 +01:00
  • 32bba4239a Synch with remote repo florez_j 2025-02-03 10:31:48 +01:00
  • a3ccff4079 Fix typo in html text. florez_j 2025-01-27 13:53:59 +01:00
  • 6653add80c Update readme.md and set_up_env.sh florez_j 2025-01-27 13:29:29 +01:00
  • ef66d8f1c2 Update unload operation to remove reference and fix logic error to dataset metadata extraction. florez_j 2025-01-24 10:28:43 +01:00
  • 1ae607f73b Add validation step to yaml file validation to ensure list type and a minimum length for the 'instrument_datafolder' keyword. florez_j 2025-01-22 15:55:21 +01:00
  • 5b06548d88 Fix typo on extension items, extensions need to include a dot .json and .yaml. florez_j 2025-01-21 09:30:49 +01:00
  • 45132b42ce Add json and yaml extensions to admissible file extension lists. florez_j 2025-01-21 08:57:38 +01:00
  • dd3ebcfe6d Updated to cleared jupyter notebooks florez_j 2025-01-14 14:46:43 +01:00
  • 4f85ca1ad6 Added comments to explain configuration parameters/or variables. florez_j 2025-01-14 14:25:53 +01:00
  • 2bd2e89134 Add directory tree structure description. florez_j 2024-12-04 17:20:35 +01:00
  • 600899dca2 Update .gitignore with output_files/ florez_j 2024-12-04 16:53:57 +01:00
  • 4a91785efb Add .gitkeep and keep this folder empty. it is only to be used for local processing florez_j 2024-12-04 16:52:50 +01:00
  • 49dff5b87b Update readme with getting started section florez_j 2024-12-04 16:24:14 +01:00
  • 3e37854445 Solved binary incompatibility issue of generated environment by conda installing h5py and numpy from conda-forge or default channels. florez_j 2024-12-04 16:15:42 +01:00
  • 112b88e31f Updated bash script and yml env file to set up python interpreter. florez_j 2024-12-04 13:52:35 +01:00
  • d422067223 Update to readme.md florez_j 2024-12-03 13:55:45 +01:00
  • 47c9bd8e3d Update readme with key features of the repo. florez_j 2024-12-03 13:50:53 +01:00
  • 31d8af6aef Updated README.md with software architecture figure florez_j 2024-12-02 17:28:22 +01:00
  • b2455b2456 Updated README.md with software architecture figure florez_j 2024-12-02 17:24:48 +01:00
  • 6bc89ebff0 Updated README florez_j 2024-12-02 17:22:52 +01:00
  • 1a941bbb6e Updated figure name. florez_j 2024-12-02 17:08:36 +01:00
  • c57bb62ebd Updated ci runner pipeline for gitlab page florez_j 2024-12-02 16:31:49 +01:00
  • f3bb82a937 Updated documentation and built doc website florez_j 2024-12-02 16:31:03 +01:00
  • 23e0ced190 Relocated to visualization module florez_j 2024-12-02 15:39:41 +01:00
  • fc561a6068 Add __init__.py florez_j 2024-12-02 15:36:03 +01:00
  • c373c18062 Moved hdf5_lib.py to visualization folder florez_j 2024-12-02 15:34:44 +01:00
  • f2df2ced66 Removed no longer useful notebook florez_j 2024-12-02 15:32:57 +01:00
  • 4797c8e894 Added env file specification and bash script for env setup florez_j 2024-12-02 15:10:21 +01:00
  • 2e52109bee removed review folder. This is now supposed to be created for review of experimental campaign data objects metadata. florez_j 2024-12-02 14:32:34 +01:00
  • dccf64ef30 Configure GitLab Pages florez_j 2024-11-26 13:43:09 +01:00