Commit Graph

238 Commits

Author SHA1 Message Date
florez_j e7ed6145f0 Implemented a data extraction module to access data from an hdf5 file in the form of dataframes. 2024-06-11 10:38:04 +02:00
florez_j a410bde23e Removed data table split into categorical and numerical variables; numbering is now only introduced to disambiguate repeated columns. 2024-06-10 16:18:51 +02:00
florez_j 1ec7ad76ff Removed additional numbering from some instrument specifications. These are now only added if the column names are ambiguous. 2024-06-10 16:14:13 +02:00
florez_j 726e9b3503 Fixed bug in the case where data_integration_mode = 'collection'. 2024-06-07 16:45:00 +02:00
florez_j dba5bc9ea7 Updated instrument names from ICAD/HONO and ICAD/NO2 to HONO and NO2. 2024-06-07 16:41:41 +02:00
florez_j 197ad0288a Updated file reader and data integration with datastart and dataend properties. 2024-06-04 13:37:20 +02:00
florez_j 9dcc757acc renamed folder src/instrument_descriptions/ to src/intruments/ and moved text_data_sources.yaml in there. 2024-06-04 10:54:09 +02:00
florez_j a6ddb24eeb Added .strip to column names to remove unwanted characters (\r|\t|\n) and included units description to timestamps. 2024-06-04 09:57:37 +02:00
florez_j fa2990527e Simplified and documented parse_attribute function. 2024-06-04 09:51:12 +02:00
florez_j 014bd14fcd Modified temperature units from °C to Celcius for simpler string encoding. It seems the ascii codec cannot encode such a character. 2024-06-04 09:44:09 +02:00
florez_j 385267a98f Updated treemap visualization to select only root metadata, which is of string type. 2024-06-03 14:17:42 +02:00
florez_j 560481610c Updated root metadata display in treemaps 2024-06-02 16:43:54 +02:00
florez_j c74b6c1a91 Updated instrument attributes with datetime_format and desired_format. 2024-06-02 16:14:30 +02:00
florez_j 1054367f12 Modified annotate_root_dir function. 2024-06-02 16:02:48 +02:00
florez_j d335836a7d Updated reader to standardize timestamps to a desired format when possible. The desired format is set in text_data_sources.yaml. 2024-06-02 15:59:01 +02:00
florez_j 69f3857936 Implemented functions for data extraction from hdf5 files. 2024-05-31 12:39:10 +02:00
florez_j e6de1ff55d Incorporated jupyter notebook of simple example metadata annotation workflow. 2024-05-30 12:24:12 +02:00
florez_j 4de7834a91 Updated readme file 2024-05-30 12:21:17 +02:00
florez_j 76bffc6afe Updated notebook documentation and included an example metadata annotation notebook. 2024-05-30 12:20:34 +02:00
florez_j a0318681be Removed html file no longer useful. 2024-05-30 12:18:28 +02:00
florez_j 922bb3ca64 Updated YAML config file parsing logic to account for changes in config file description. 2024-05-30 12:16:54 +02:00
florez_j 7f423ccc6f Decomposed experiment_data into experiment_startdate and experiment_enddate. 2024-05-30 12:15:49 +02:00
florez_j 3a9aede909 Made third_update_hdf5_file_with_review more modular by separating data update and git operations, resulting in new functions that can be reused in less restrictive metadata annotation contexts. 2024-05-29 15:26:48 +02:00
florez_j ef7c6c9efb Implemented a git operations module for automated git ops, based on subprocess. 2024-05-29 15:17:09 +02:00
florez_j 146981379f Updated readme file. 2024-05-29 11:24:46 +02:00
florez_j 71f284f709 Updated readme file 2024-05-29 11:23:33 +02:00
florez_j 4ffd790059 Updated project name in configuration file 2024-05-28 15:06:25 +02:00
florez_j dad5e082f1 Changed ordering of data integration config files so that they align with our experimental campaign hierarchy. 2024-05-28 14:43:32 +02:00
florez_j a86fc97605 Refactored due to updates in the file reader function. 2024-05-28 14:41:34 +02:00
florez_j 3de6abce50 Added an option to activate or deactivate data copying before reading the input file. This avoids redundant copying when we are already working on file copies. 2024-05-28 14:40:14 +02:00
florez_j fd1c6461bb Updated some of the raname_as metadata for all instruments so that it is more machine readable and can perhaps be used as an alternative to the original name in future releases. 2024-05-28 14:37:43 +02:00
florez_j 804ea52583 Modified function to return list of paths when config_file.yaml integration mode = experimental step. 2024-05-28 11:29:32 +02:00
florez_j f6a46168ec Improved parsing from HDF5 attr dict to yaml compatible dict. Now we can parse HDF5 compound attributes (structured np arrays). 2024-05-28 11:27:44 +02:00
florez_j 41c7660be3 Enhanced data transfer progress visualization and logging 2024-05-28 08:59:29 +02:00
florez_j 08d58557df Fixed bug that did not allow analythical_methods composite keywords (e.g., ICAD/HONO) to be matched in instrument configurations. 2024-05-28 08:57:57 +02:00
florez_j 3270ce5ed7 Implemented reader file compatibility check. 2024-05-27 18:22:16 +02:00
florez_j 2911416431 Improved modularity of hdf5_file creation by creating a function that copies the input directory file and applies directory, files, and extensions constraints before regular directory to hdf5 transfer. See def copy_directory_with_contraints(input_dir_path, output_dir_path, select_dir_keywords, select_file_keywords, allowed_file_extensions): 2024-05-27 18:15:08 +02:00
florez_j 24a2d5d37e Refactored list to array conversion using metadata_rewiew_lib 2024-05-26 15:04:07 +02:00
florez_j 77afbbbf8f Added function to convert list of strings into a np.array of bytes. This is useful to create list-valued attributes in HDF5. 2024-05-26 14:56:36 +02:00
florez_j 88572b44b1 Fixed buggy statement. import datetime ... followed by datetime.now() was fixed as datetime.datetime.now(). 2024-05-26 12:26:54 +02:00
florez_j 37071945f5 Removed hdf5 file creation redundancy by creating a helper function create_HDF5_file(date_str,select_file_keywords), which handles variations in date_str and keywords. 2024-05-26 12:24:15 +02:00
florez_j 4dc09339b5 Replaced lambda function with regular function and f-string for better readability and debugging 2024-05-26 11:39:40 +02:00
florez_j b7f9bfe149 Replaced print statement with logging and a raised exception for better error handling and management 2024-05-26 11:34:20 +02:00
florez_j ac37235072 Added function setup_logging to configure logger to record logs in specified output directory. 2024-05-26 11:19:54 +02:00
florez_j 8f1a82c00d updated env file 2024-05-24 15:55:49 +02:00
florez_j c7051bfe69 Updated readme and reader to ignore ascii character errors 2024-05-24 15:55:15 +02:00
florez_j 9329f39deb Deleted output no longer returned in data integration pipeline 2024-05-24 14:55:08 +02:00
florez_j b5ed1cb826 Updated readme file 2024-05-24 11:56:30 +02:00
florez_j 005e855e48 Updated configuration file organization and workflow description. 2024-05-24 11:15:05 +02:00
florez_j 784cb1eb62 Commented out openia python module. 2024-05-24 10:54:15 +02:00