Commit Graph

100 Commits

Author SHA1 Message Date
florez_j 922bb3ca64 Updated YAML config file parsing logic to account for changes in config file description. 2024-05-30 12:16:54 +02:00
florez_j 3a9aede909 Made function third_update_hdf5_file_with_review more modular by separating data update and git operations, resulting in new functions that can be reused in less restrictive metadata annotation contexts. 2024-05-29 15:26:48 +02:00
florez_j ef7c6c9efb Implemented a git operations module for automated git ops, based on subprocess. 2024-05-29 15:17:09 +02:00
florez_j a86fc97605 Refactored due to updates in the file reader function. 2024-05-28 14:41:34 +02:00
florez_j 3de6abce50 Added the feature to activate or deactivate data copying before reading the input file. This avoids redundant copying when we are already working on file copies. 2024-05-28 14:40:14 +02:00
florez_j fd1c6461bb Updated some of the rename_as metadata for all instruments so that it is more machine readable and can perhaps be used as an alternative to the original name in future releases. 2024-05-28 14:37:43 +02:00
florez_j 804ea52583 Modified function to return list of paths when config_file.yaml integration mode = experimental step. 2024-05-28 11:29:32 +02:00
florez_j f6a46168ec Improved parsing from HDF5 attr dict to yaml compatible dict. Now we can parse HDF5 compound attributes (structured np arrays). 2024-05-28 11:27:44 +02:00
florez_j 41c7660be3 Enhanced data transfer progress visualization and logging 2024-05-28 08:59:29 +02:00
florez_j 08d58557df Fixed bug that did not allow analythical_methods composite keywords (e.g., ICAD/HONO) to be matched in instrument configurations. 2024-05-28 08:57:57 +02:00
florez_j 3270ce5ed7 Implemented reader file compatibility check. 2024-05-27 18:22:16 +02:00
florez_j 2911416431 Improved modularity of hdf5_file creation by creating a function that copies the input directory and applies directory, file, and extension constraints before the regular directory to hdf5 transfer. See def copy_directory_with_contraints(input_dir_path, output_dir_path, select_dir_keywords, select_file_keywords, allowed_file_extensions): 2024-05-27 18:15:08 +02:00
florez_j 24a2d5d37e Refactored list to array conversion using metadata_rewiew_lib 2024-05-26 15:04:07 +02:00
florez_j 77afbbbf8f Added function to convert list of strings into a np.array of bytes. This is useful to create list-valued attributes in HDF5. 2024-05-26 14:56:36 +02:00
florez_j 88572b44b1 Fixed buggy statement. import datetime ... followed by datetime.now() was fixed as datetime.datetime.now(). 2024-05-26 12:26:54 +02:00
florez_j 37071945f5 Removed hdf5 file creation redundancy by creating a helper function create_HDF5_file(date_str,select_file_keywords), which handles variations in date_str and keywords. 2024-05-26 12:24:15 +02:00
florez_j 4dc09339b5 Replaced lambda function with regular function and f-string for better readability and debugging 2024-05-26 11:39:40 +02:00
florez_j b7f9bfe149 Replaced print statement with logging and a raised exception for better error handling and management 2024-05-26 11:34:20 +02:00
florez_j ac37235072 Added function setup_logging to configure logger to record logs in specified output directory. 2024-05-26 11:19:54 +02:00
florez_j c7051bfe69 Updated readme and reader to ignore ascii character errors 2024-05-24 15:55:15 +02:00
florez_j 9329f39deb Deleted output no longer returned in data integration pipeline 2024-05-24 14:55:08 +02:00
florez_j d000a8348f Added bottom level instrument metadata descriptions such as units and description. 2024-05-24 09:50:25 +02:00
florez_j 8d4f4e68c7 Removed yaml file output from data integration file. The creation of this file is being outsourced to the data store repo 2024-05-24 09:32:30 +02:00
florez_j 88de88c316 Removed creation of yaml file subsequent to data integration. This can cause misalignment with data store. I think the yaml snapshot of a hdf5 file should therefore be outsourced there. 2024-05-24 09:30:24 +02:00
florez_j 1537633b1a Made a few optimizations to code and documentation. Expressions relying on list comprehensions were simplified with generator expressions, e.g., any([keyword in filename for keyword in select_file_keywords]) was simplified to any(keyword in filename for keyword in select_file_keywords). 2024-05-24 09:06:07 +02:00
florez_j d574ac382d Replaced attribute table_header in Lopap configuration file with a shorter version which is consistent across more files. Some of the headers might change. 2024-05-24 08:55:36 +02:00
florez_j 63b683e4aa Optimized and included df to np structured array conversion: replaced loop plus append with list comprehension; replaced pd df column concatenation based on row-wise concatenation with the df.agg() method, which uses column-wise concatenation. 2024-05-23 22:20:19 +02:00
florez_j bd458c6cd0 Optimized and included df to np structured array conversion: replaced loop plus append with list comprehension; replaced pd df column concatenation based on row-wise concatenation with the df.agg() method, which uses column-wise concatenation. 2024-05-23 22:18:37 +02:00
florez_j a45fb4476b Replaced commented lines by accurate comments 2024-05-22 20:15:17 +02:00
florez_j 7367da84b9 Simplified code by updating HDF5 attributes using .update() dict method (inherited from dict type). 2024-05-22 20:11:54 +02:00
florez_j 1729cd40fa Added feature to interpret links to descriptions in the yaml instrument configuration file and add them at the dataset level as attributes. 2024-05-09 19:17:08 +02:00
florez_j 1429c56916 Added links to descriptions and units of table variables or columns. These can be used as attributes of datasets from tabular data 2024-05-09 19:15:20 +02:00
florez_j f49120102d Included timestamp specification, which indicates column names in a list that contain datetime information. 2024-04-30 14:51:58 +02:00
florez_j 553c3fe946 Incorporated feature to merge date and time data which may originally be in separate columns in text source files. This is specified in the text source specification yaml file 2024-04-30 14:50:33 +02:00
florez_j 493be88f49 Removed unnecessary pygit dependency and associated function that relied on it. 2024-04-26 13:15:33 +02:00
florez_j 4d91e59279 Included new delete attribute and restart review features. 2024-04-26 13:08:27 +02:00
florez_j 14ae29bf3c Corrected parsing problem from hdf5 to yaml attribute. Single element arrays are now represented as a scalar as opposed to a list with a single element. 2024-04-26 12:54:41 +02:00
florez_j 3c6440977f Implemented delete attribute feature and corrected yaml display of compound attributes 2024-04-25 08:58:29 +02:00
florez_j 4663ed10e2 Fixed branch naming problem 2024-04-24 17:21:44 +02:00
florez_j cf2431d4c3 Included created_at() function 2024-04-24 17:16:59 +02:00
florez_j be02ad01ed Removed problematic lines, which depended on soon to be removed dependency config_file.py 2024-04-24 17:14:13 +02:00
florez_j 6260e7da3c Redefined name of review branch as review:<review's name> 2024-04-24 17:09:38 +02:00
florez_j a9146e5afc Implemented delete attribute feature for review and simplified code. 2024-04-24 17:02:18 +02:00
florez_j ceb8a34ee0 Commented out unneeded python import statements 2024-04-23 13:23:13 +02:00
florez_j 8876d5af4f Example data integration configuration files 2024-04-23 12:03:24 +02:00
florez_j a12cd80355 Implemented function that takes yaml config files specifying data integration output 2024-04-23 11:10:13 +02:00
florez_j b233dc094d yaml instrument configuration file for text data 2024-04-23 11:07:49 +02:00
florez_j d3ec0bd473 Included additional directory path validation based on dir keywords 2024-04-23 11:05:20 +02:00
florez_j 9d9e9dcfe5 Added lines to parse instrument reader properties from yaml file. 2024-04-23 11:02:10 +02:00
florez_j 074d2e3954 Removed config_file output file naming and instead user now inputs desired output filename. Also added input argument to introduce root level metadata. 2024-04-18 19:14:06 +02:00
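
Two of the fixes described above (commits 88572b44b1 and 1537633b1a) can be illustrated with a minimal sketch. The function names current_timestamp and matches_any_keyword are hypothetical and do not come from the repository; only the expressions quoted in the commit messages are taken from the log.

```python
import datetime


def current_timestamp():
    # With a plain `import datetime`, the name `datetime` is the module,
    # so `datetime.now()` raises AttributeError. The class is reached
    # as `datetime.datetime` (the fix described in commit 88572b44b1).
    return datetime.datetime.now()


def matches_any_keyword(filename, select_file_keywords):
    # A generator expression lets any() stop at the first match without
    # building an intermediate list (the simplification in commit 1537633b1a).
    return any(keyword in filename for keyword in select_file_keywords)
```

For example, matches_any_keyword("ICAD_2024.txt", ["ICAD", "Lopap"]) evaluates to True because the first keyword is a substring of the filename.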