263 Commits

SHA1 Message Date
0417ac6deb Modified hdf5 file path whose metadata is to be reviewed. 2024-04-04 09:31:19 +02:00
dee204010d Initialized metadata review process. 2024-04-04 09:23:55 +02:00
72e37ed277 Implemented jupyter notebooks for metadata review workflow excecution. 2024-04-04 09:18:36 +02:00
719e9d6672 Repurposed the role of the config_file.py. Now it only provides functions to select the file_readers based on group id and produce a created_at timestamp. 2024-04-03 13:55:54 +02:00
5cd19979b6 Implemented first approach to data integration workflow 2024-04-03 13:51:21 +02:00
f9b31c06fd Reimplemented file filtering, first file extension contraints are imposed and then file keyword contraints. 2024-04-03 13:49:16 +02:00
9cde013be0 Modified node values as the number of children of each group. When nodes are datasets, their value is 1. 2024-04-02 18:48:50 +02:00
9071120e50 Refactored code to read .dat and .txt files in binary mode first rb, then the prespecified encoding is used to decode the lines. This is to have more control over the decoding process and be able to better spot possible encoding errors. 2024-04-02 18:35:04 +02:00
f351f102b7 Commented out a print statement. 2024-04-02 18:31:58 +02:00
39cae66936 Implemented a two important changes. 1. filename of output file is not passed as input but it is automatically computed based on an input config_param dict. 2) input filenames in file system path are now filtered on an initial walk through the directory tree. This is to use stored path filenames for prunning directory tree, later on. 2024-04-02 17:33:58 +02:00
9c70fd643f Refactored code in terms of subprocess for git functionality. 2024-03-28 19:38:12 +01:00
0fcdc4ad2e Refactored code in terms of subprocess for git functionlity 2024-03-28 19:36:50 +01:00
accb271d83 Submitted metadata review. 2024-03-28 19:33:52 +01:00
8991dfd6df Initialized metadata review process. 2024-03-28 19:32:30 +01:00
c552752468 Submitted metadata review. 2024-03-28 19:31:19 +01:00
5d54ac99cd Submitted metadata review. 2024-03-28 19:27:32 +01:00
1df8faedf6 Submitted metadata review. 2024-03-28 19:24:23 +01:00
366e01fa4c Initialized metadata review process. 2024-03-28 19:11:46 +01:00
2a7d1eeb89 Initialized metadata review process. 2024-03-28 19:11:46 +01:00
e08119e9b6 Initialized metadata review process. 2024-03-28 19:09:51 +01:00
c3048f3083 Submitted metadata review. 2024-03-28 18:28:58 +01:00
7581b26b7c Submitted metadata review. 2024-03-28 18:28:58 +01:00
a6a911e9d0 Initialized metadata review process. 2024-03-28 18:28:58 +01:00
942485ffc1 Modified code to select usecases based on integer number. 2024-03-28 18:24:44 +01:00
2b568ff05a Implemented jupyter notebook to run data integration workflow. Tested all usecases defined in config. So far so good. 2024-03-28 18:22:40 +01:00
6fb5253d21 Corrected a few bugs; deletion of useless buggy line and configuration of text reader with latin-1 encoding for a few cases. 2024-03-28 18:20:57 +01:00
bbff419313 Removed strange bug when reading .TXT smps files. Specified latin-1 encoding and relaxed error detection to ignore. 2024-03-28 17:43:26 +01:00
06429e6def Generalized workflow functions to consider reviewer attributes such as initials and type e.g., data-owner and metadata-reviewer. 2024-03-28 16:11:01 +01:00
37fd603943 Completed first version of metadata_review_lib.py. Still need to test and correct possible bugs. 2024-03-28 13:59:47 +01:00
f0af30f7e8 Deleted metadata_review_workflow.py and turned it into a jupyter notebook. 2024-03-28 13:10:42 +01:00
438ac4d24d Included lines for setting up author and commiter in pygit2 commit function 2024-03-27 14:24:33 +01:00
54e30ef9ec Implemented git add and commit for second metadata review step, and create it function to checkout branches. 2024-03-27 14:23:16 +01:00
6aa98b71b3 Implemented git add and commit for second metadata review step, and create it function to checkout branches. 2024-03-27 14:22:25 +01:00
56010f58ad Submitted metadata review. 2024-03-27 13:55:39 +01:00
819474f678 Initialized metadata review process. 2024-03-27 13:55:39 +01:00
383c5377c1 Initialized metadata review process. 2024-03-27 13:51:17 +01:00
270825e9dc Initialized metadata review process. 2024-03-27 11:35:53 +01:00
06283e286f Initialized metadata review process. 2024-03-26 17:22:42 +01:00
2aac145379 Removed buggy statement, which was expected to detect recently created review files 2024-03-26 16:34:38 +01:00
1a89e1af66 Implemented script to run metadata review workflow 2024-03-26 16:25:44 +01:00
302b7dbfa5 Implemented metadata review library 2024-03-26 16:21:02 +01:00
1f2bb419fe Save commit 2024-03-26 16:20:04 +01:00
f37ba4705a Included .h5 files for now, but they should be enable later on through git LFS. 2024-03-26 16:18:48 +01:00
a727e38db4 Implemented hdf5_vis.py, which is a hdf5 visualization library to obtain treemap and yaml representations of hdf5 files. 2024-03-26 16:14:40 +01:00
a58bf4f019 Refactored import dependencies. 2024-03-26 13:57:19 +01:00
e934ae65d6 Relocated from src/ 2024-03-25 08:52:13 +01:00
1b9963d44d Moved to input_files/ 2024-03-25 08:51:34 +01:00
1bf1f60beb Added lines to treat string attributes as fixed-length strings, which are represented as bytes that need to be decoded with utf-8. There are a few advantages, and hdf5 reader provide more precise behavior than variable length strings 2024-03-22 17:28:47 +01:00
13cb6395aa Restructured the way table_preamble attribute is represented. Now it is a list of strings as opposed to a multilinear string with special characters like \n. This is to avoid parsing problems in the yalm files. 2024-03-22 17:26:30 +01:00
fff935f551 Included optional argument in make_copy function and commented out a few lines that increase dataset storage complexity. 2024-03-21 17:16:14 +01:00