230 Commits

Author SHA1 Message Date
9329f39deb Deleted output no longer returned in data integration pipeline 2024-05-24 14:55:08 +02:00
d000a8348f Added bottom level instrument metadata descriptions such as units and description. 2024-05-24 09:50:25 +02:00
8d4f4e68c7 Removed yaml file output from data integration file. The creation of this file is being outsourced to the data store repo 2024-05-24 09:32:30 +02:00
88de88c316 Removed creation of yaml file subsequent to data integration. This can cause misalignment with data store. I think the yaml snapshot of a hdf5 file should therefore be outsourced there. 2024-05-24 09:30:24 +02:00
1537633b1a Made a few optimizations to code and documentation. Expressions relying on list comprehensions were simplified with generator expressions, e.g., any([keyword in filename for keyword in select_file_keywords]) was simplified to any(keyword in filename for keyword in select_file_keywords). 2024-05-24 09:06:07 +02:00
d574ac382d Replaced attribute table_header in Lopap configuration file with a shorter version which is consistent across more files. Some of the headers might change. 2024-05-24 08:55:36 +02:00
63b683e4aa Optimized and included df to np structured array conversion: replaced loop plus append with list comprehension; replaced pd df column concatenation based on row-wise concatenation with df.aggr() method that uses column-wise concatenation. 2024-05-23 22:20:19 +02:00
bd458c6cd0 Optimized and included df to np structured array conversion: replaced loop plus append with list comprehension; replaced pd df column concatenation based on row-wise concatenation with df.aggr() method that uses column-wise concatenation. 2024-05-23 22:18:37 +02:00
a45fb4476b Replaced commented lines by accurate comments 2024-05-22 20:15:17 +02:00
7367da84b9 Simplified code by updating HDF5 attributes using .update() dict method (inherited from dict type). 2024-05-22 20:11:54 +02:00
1729cd40fa Added feature to interpret links to descriptions in the yaml instrument configuration file and add them at the dataset level as attributes. 2024-05-09 19:17:08 +02:00
1429c56916 Added link to descriptions and units of table variables/or columns. These can be used as attributes of datasets from tabular data 2024-05-09 19:15:20 +02:00
f49120102d Included timestamp specification, which indicates column names in a list that contain datetime information. 2024-04-30 14:51:58 +02:00
553c3fe946 Incorporated feature to merge date and time data which may originally be in separate columns in text source files. This is specified in the text source specification yaml file 2024-04-30 14:50:33 +02:00
493be88f49 Removed unnecessary pygit dependency and associated function that relied on it. 2024-04-26 13:15:33 +02:00
4d91e59279 Included new delete attribute and restart review features. 2024-04-26 13:08:27 +02:00
14ae29bf3c Corrected parsing problem from hdf5 to yaml attribute. Single element arrays are now represented as a scalar as opposed to a list with a single element. 2024-04-26 12:54:41 +02:00
3c6440977f Implemented delete attribute feature and corrected yaml display of compound attributes 2024-04-25 08:58:29 +02:00
4663ed10e2 Fixed branch naming problem 2024-04-24 17:21:44 +02:00
cf2431d4c3 Included created_at() function 2024-04-24 17:16:59 +02:00
be02ad01ed Removed problematic lines, which depended on soon to be removed dependency config_file.py 2024-04-24 17:14:13 +02:00
6260e7da3c Redefined name of review branch as review:<review's name> 2024-04-24 17:09:38 +02:00
a9146e5afc Implemented delete attribute feature for review and simplified code. 2024-04-24 17:02:18 +02:00
ceb8a34ee0 Commented out unneeded python import statements 2024-04-23 13:23:13 +02:00
8876d5af4f Example data integration configuration files 2024-04-23 12:03:24 +02:00
a12cd80355 Implemented function that takes yaml config files specifying data integration output 2024-04-23 11:10:13 +02:00
b233dc094d yaml instrument configuration file for text data 2024-04-23 11:07:49 +02:00
d3ec0bd473 Included additional directory path validation based on dir keywords 2024-04-23 11:05:20 +02:00
9d9e9dcfe5 Added lines to parse instrument reader properties from yaml file. 2024-04-23 11:02:10 +02:00
074d2e3954 Removed config_file output file naming and instead user now inputs desired output filename. Also added input argument to introduce root level metadata. 2024-04-18 19:14:06 +02:00
1ed37920c2 Replaced git commands in terms of subprocess.run 2024-04-17 15:26:45 +02:00
a1c88fdb5a Added lines to flatten (shorten) original directory paths in the resulting hdf5 file. 2024-04-17 15:20:26 +02:00
8005b60579 Included a boolean input argument hdf5_upload to deactivate hdf5 upload for testing. 2024-04-07 17:09:01 +02:00
edd1bbf5be Added an import and treemap to png statements, but for some reason they did not work and took forever to run. So, I left the lines but for now commented them out. 2024-04-07 16:55:37 +02:00
89e94a1b2b Renamed forth_submit_.. function to last_submit .. 2024-04-05 17:21:18 +02:00
5e70d9158b Deleted function third_complete_metadata_review() because forth_complete_metadata_review() is the same. Also, modified a substring of their name from complete to submit and submit to save for clarity. Usually submission is the last step of a review process. 2024-04-05 17:10:34 +02:00
d68dc98070 Implemented some safeguards that enable only commits of untracked metadata review files 2024-04-04 14:20:13 +02:00
dd1f1245e3 Refactored comment lines. 2024-04-04 12:58:17 +02:00
2d5fecfb34 Removed git checkout statements, to avoid conflicting changes of .ipynb files. 2024-04-04 12:56:37 +02:00
fa4fe691d0 Refactored a few git statements in terms of subprocess.run 2024-04-04 11:02:24 +02:00
f9b31c06fd Reimplemented file filtering: first, file extension constraints are imposed, and then file keyword constraints. 2024-04-03 13:49:16 +02:00
9cde013be0 Modified node values as the number of children of each group. When nodes are datasets, their value is 1. 2024-04-02 18:48:50 +02:00
9071120e50 Refactored code to read .dat and .txt files in binary mode first rb, then the prespecified encoding is used to decode the lines. This is to have more control over the decoding process and be able to better spot possible encoding errors. 2024-04-02 18:35:04 +02:00
f351f102b7 Commented out a print statement. 2024-04-02 18:31:58 +02:00
39cae66936 Implemented two important changes. 1) filename of output file is not passed as input but it is automatically computed based on an input config_param dict. 2) input filenames in file system path are now filtered on an initial walk through the directory tree. This is to use stored path filenames for pruning directory tree, later on. 2024-04-02 17:33:58 +02:00
9c70fd643f Refactored code in terms of subprocess for git functionality. 2024-03-28 19:38:12 +01:00
6fb5253d21 Corrected a few bugs; deletion of useless buggy line and configuration of text reader with latin-1 encoding for a few cases. 2024-03-28 18:20:57 +01:00
bbff419313 Removed strange bug when reading .TXT smps files. Specified latin-1 encoding and relaxed error detection to ignore. 2024-03-28 17:43:26 +01:00
06429e6def Generalized workflow functions to consider reviewer attributes such as initials and type e.g., data-owner and metadata-reviewer. 2024-03-28 16:11:01 +01:00
37fd603943 Completed first version of metadata_review_lib.py. Still need to test and correct possible bugs. 2024-03-28 13:59:47 +01:00