73beb83278Moved parse_attribute() from ..review_lib.py into ...utils.py and propagated the refactored changes to the respective modules.Florez Ospina Juan Felipe2024-07-10 11:32:00 +02:00
2ce925735dModified the returned datetime output to a format without colons, which could be problematic in filenames.Florez Ospina Juan Felipe2024-07-10 09:47:56 +02:00
0a0b4ac41dMoved a few functions from ...reader.py and hdf5_lib.py into ..utils.py, and refactored accordingly.Florez Ospina Juan Felipe2024-07-10 09:19:30 +02:00
0c74c52e09Removed smogchamber reader because its functionality is now integrated into g5505_file_reader.py.Florez Ospina Juan Felipe2024-07-09 16:13:01 +02:00
cb7d914908Cleaned code and modified def create_hdf5_file_from_dataframe to create the group hierarchy implicitly from the path rather than recursively.Florez Ospina Juan Felipe2024-07-08 15:24:48 +02:00
92eca4d79eMoved remaining git operations in metadata_review_lib.py to git_ops.py and refactored accordingly.Florez Ospina Juan Felipe2024-07-05 15:46:20 +02:00
cedfe614e7Implemented an input argument to enable appending information to existing attributes, whose values must be either strings or lists.Florez Ospina Juan Felipe2024-06-20 15:32:33 +02:00
106795ae59Added a few lines to detect the existence of the file and change the file mode from 'w' to 'a' based on that information.Florez Ospina Juan Felipe2024-06-20 09:03:47 +02:00
498a51cbc6Updated function to add project level metadata at the root group of the hdf5 file.Florez Ospina Juan Felipe2024-06-19 18:31:11 +02:00
06c5c6d84bAdded a method to the MetadataHarvester class to collect project level metadata.Florez Ospina Juan Felipe2024-06-19 18:30:02 +02:00
a6868d985dFixed bug regarding datetime to str column conversion in dataframe by using .map(str) (element-wise operation) as opposed to .apply(str).Florez Ospina Juan Felipe2024-06-18 09:21:46 +02:00
c68e800967Incorporated dataframe_to_np_structured_array(df: pd.DataFrame) from another module.Florez Ospina Juan Felipe2024-06-16 18:39:30 +02:00
e4de4edf28Incorporated dataframe_to_np_structured_array(df: pd.DataFrame) from another module.Florez Ospina Juan Felipe2024-06-16 18:26:12 +02:00
2d4ecec806Moved dataframe_to_np_structured_array(df: pd.DataFrame) to src/g5505_utils.py. This is a more generic function that can be used more broadly across modules.Florez Ospina Juan Felipe2024-06-16 18:25:08 +02:00
0fb14b7c6cDeveloped a metadata harvesting object to facilitate metadata collection throughout the code.Florez Ospina Juan Felipe2024-06-13 15:47:02 +02:00
f43d86e729Modified a few variable values in yaml files so that they are within expected values.Florez Ospina Juan Felipe2024-06-13 15:45:39 +02:00
9ab9aa49c4Abstracted a code snippet from def create_hdf5_file_from_filesystem_path(..) as transfer_file_dict_to_hdf5() so that it can be reused.Florez Ospina Juan Felipe2024-06-13 15:44:01 +02:00
e7ed6145f0Implemented a data extraction module to access data from an hdf5 file in the form of dataframes.Florez Ospina Juan Felipe2024-06-11 10:38:04 +02:00
a410bde23eRemoved the data table split into categorical and numerical variables; numbering is now only introduced to disambiguate repeated columns.Florez Ospina Juan Felipe2024-06-10 16:18:51 +02:00
1ec7ad76ffRemoved additional numbering from some instrument specifications. These are now only added if the column names are ambiguous.Florez Ospina Juan Felipe2024-06-10 16:14:13 +02:00
197ad0288aUpdated file reader and data integration with datastart and dataend properties.Florez Ospina Juan Felipe2024-06-04 13:37:20 +02:00
9dcc757accRenamed folder src/instrument_descriptions/ to src/intruments/ and moved text_data_sources.yaml there.Florez Ospina Juan Felipe2024-06-04 10:54:09 +02:00
a6ddb24eebAdded .strip to column names to remove unwanted characters (\r|\t|\n) and included units description to timestamps.Florez Ospina Juan Felipe2024-06-04 09:57:37 +02:00
014bd14fcdModified temperature units from °C to Celsius for simpler string encoding. It seems the ascii codec cannot encode such a character.Florez Ospina Juan Felipe2024-06-04 09:44:09 +02:00
385267a98fUpdated treemap visualization to select only root metadata, which is of string type.Florez Ospina Juan Felipe2024-06-03 14:17:42 +02:00
d335836a7dUpdated reader to standardize timestamps to a desired format when possible. The desired format is set in text_data_sources.yaml.Florez Ospina Juan Felipe2024-06-02 15:59:01 +02:00
3a9aede909Made def third_update_hdf5_file_with_review more modular by separating data update and git operations, resulting in new functions that can be reused in less restrictive metadata annotation contexts.Florez Ospina Juan Felipe2024-05-29 15:26:48 +02:00
ef7c6c9efbImplemented a git operations module for automated git ops, based on subprocess.Florez Ospina Juan Felipe2024-05-29 15:17:09 +02:00
dad5e082f1Changed ordering of data integration config files so that they align with our experimental campaign hierarchy.Florez Ospina Juan Felipe2024-05-28 14:43:32 +02:00
3de6abce50Added a feature to activate or deactivate data copying before reading the input file. This avoids redundant copying when we are already working on file copies.Florez Ospina Juan Felipe2024-05-28 14:40:14 +02:00
fd1c6461bbUpdated some of the rename_as metadata for all instruments so that it is more machine readable and can perhaps be used as an alternative to the original name in future releases.Florez Ospina Juan Felipe2024-05-28 14:37:43 +02:00
804ea52583Modified function to return list of paths when config_file.yaml integration mode = experimental step.Florez Ospina Juan Felipe2024-05-28 11:29:32 +02:00
f6a46168ecImproved parsing from HDF5 attr dict to yaml compatible dict. Now we can parse HDF5 compound attributes (structured np arrays).Florez Ospina Juan Felipe2024-05-28 11:27:44 +02:00
08d58557dfFixed bug that did not allow analythical_methods composite keywords (e.g., ICAD/HONO) to be matched in instrument configurations.Florez Ospina Juan Felipe2024-05-28 08:57:57 +02:00
2911416431Improved modularity of hdf5_file creation by creating a function that copies the input directory and applies directory, file, and extension constraints before the regular directory to hdf5 transfer. See def copy_directory_with_contraints(input_dir_path, output_dir_path, select_dir_keywords, select_file_keywords, allowed_file_extensions):Florez Ospina Juan Felipe2024-05-27 18:15:08 +02:00
77afbbbf8fAdded function to convert list of strings into a np.array of bytes. This is useful to create list-valued attributes in HDF5.Florez Ospina Juan Felipe2024-05-26 14:56:36 +02:00
88572b44b1Fixed buggy statement. import datetime ... followed by datetime.now() was fixed as datetime.datetime.now().Florez Ospina Juan Felipe2024-05-26 12:26:54 +02:00
37071945f5Removed hdf5 file creation redundancy by creating a helper function create_HDF5_file(date_str,select_file_keywords), which handles variations in date_str and keywords.Florez Ospina Juan Felipe2024-05-26 12:24:15 +02:00
4dc09339b5Replaced lambda function with a regular function and an f-string for better readability and debugging.Florez Ospina Juan Felipe2024-05-26 11:39:40 +02:00
b7f9bfe149Replaced print statement with logging and a raised exception for better error handling and management.Florez Ospina Juan Felipe2024-05-26 11:34:20 +02:00
ac37235072Added function setup_logging to configure logger to record logs in specified output directory.Florez Ospina Juan Felipe2024-05-26 11:19:54 +02:00
d000a8348fAdded bottom level instrument metadata descriptions such as units and description.Florez Ospina Juan Felipe2024-05-24 09:50:25 +02:00
8d4f4e68c7Removed yaml file output from data integration file. The creation of this file is being outsourced to the data store repo.Florez Ospina Juan Felipe2024-05-24 09:32:30 +02:00
88de88c316Removed creation of yaml file subsequent to data integration. This can cause misalignment with data store. I think the yaml snapshot of a hdf5 file should therefore be outsourced there.Florez Ospina Juan Felipe2024-05-24 09:30:24 +02:00
1537633b1aMade a few optimizations to code and documentation. Expressions relying on list comprehensions were simplified with generator expressions, e.g., any([keyword in filename for keyword in select_file_keywords]) was simplified to any(keyword in filename for keyword in select_file_keywords).Florez Ospina Juan Felipe2024-05-24 09:06:07 +02:00
d574ac382dReplaced attribute table_header in Lopap configuration file with a shorter version which is consistent across more files. Some of the headers might change.Florez Ospina Juan Felipe2024-05-24 08:55:36 +02:00
63b683e4aaOptimized and included df to np structured array conversion: replaced loop plus append with list comprehension; replaced pd df column concatenation based on row-wise concatenation with the df.agg() method, which uses column-wise concatenation.Florez Ospina Juan Felipe2024-05-23 22:20:19 +02:00
bd458c6cd0Optimized and included df to np structured array conversion: replaced loop plus append with list comprehension; replaced pd df column concatenation based on row-wise concatenation with the df.agg() method, which uses column-wise concatenation.Florez Ospina Juan Felipe2024-05-23 22:18:37 +02:00
7367da84b9Simplified code by updating HDF5 attributes using .update() dict method (inherited from dict type).Florez Ospina Juan Felipe2024-05-22 20:11:54 +02:00
1729cd40faAdded feature to interpret links to descriptions in the yaml instrument configuration file and add them at the dataset level as attributes.Florez Ospina Juan Felipe2024-05-09 19:17:08 +02:00
1429c56916Added links to descriptions and units of table variables or columns. These can be used as attributes of datasets from tabular data.Florez Ospina Juan Felipe2024-05-09 19:15:20 +02:00
f49120102dIncluded timestamp specification, which indicates column names in a list that contain datetime information.Florez Ospina Juan Felipe2024-04-30 14:51:58 +02:00
553c3fe946Incorporated feature to merge date and time data, which may originally be in separate columns in text source files. This is specified in the text source specification yaml file.Florez Ospina Juan Felipe2024-04-30 14:50:33 +02:00