52 Commits

Author SHA1 Message Date
02a7c4d834 Relocated a few functions from src/hdf5_lib.py into src/hdf5_ops.py, deleted others, and saved a copy of the previous version as src/hdf5_lib_part2.py 2024-09-26 15:13:31 +02:00
8f9e2fc594 Moved is_structured_array() and to_serializable_dtype() to utils, renamed a few functions and propagated changes to dependent modules. 2024-09-26 14:03:11 +02:00
dd8fc1a906 Made the path_to_input_dir argument more robust by ensuring it is always defined using forward slashes and then normalized to the OS convention. Improved the dry run = True mode of the copy directory function. 2024-09-25 15:12:19 +02:00
7b3b453db1 Major update. Removed the file filtering option and the outputname input argument. The output name is now path_to_input_dir + .h5. By default, the hdf5 writer preserves second-level subdirectories and flattens the rest. Directory filtering is outsourced to copy_dir_with_constraints from utils. 2024-09-16 16:35:09 +02:00
6d91c043f8 Renamed parameter 'input_file_system_path' to 'path_to_input_directory' for clarity. 2024-09-16 14:24:55 +02:00
926dc9208a Modified to use filereader_registry.py. 2024-08-23 16:10:23 +02:00
b499ef2845 Integrated copy h5 file into group functionality, imported from g5505_file_reader 2024-08-23 15:47:04 +02:00
17dd1f1864 Modified import statements to account for reader module's relocation. 2024-08-23 13:27:26 +02:00
9d917226af Moved get_parent_relationships func into hdf5_vis.py and cleaned up unused import statements 2024-08-22 09:50:26 +02:00
99fb2de6d8 Moved ext_to_reader_dict to g5505_file_reader.py and switched reader selection to g5505_reader.select_file_reader(hdf5_file_path). 2024-08-07 16:30:36 +02:00
6c50625002 Added attribution insertion order tracking at the root level and reorganized a few import statements. 2024-07-17 08:41:40 +02:00
085ddda0b2 Made edits to documentation 2024-07-11 13:42:38 +02:00
586dcef621 Moved parse_attribute() from ..review_lib.py into ...utils.py and propagated the (refactored) changes to the respective modules. 2024-07-10 11:32:00 +02:00
3d8b46cf05 Moved a few functions from ...reader.py and hdf5_lib.py into ..utils.py, and refactored accordingly. 2024-07-10 09:19:30 +02:00
aa69faa995 Removed unused code. 2024-07-08 15:29:13 +02:00
2992f0a645 Cleaned code and modified def create_hdf5_file_from_dataframe to create the group hierarchy implicitly from the path rather than recursively. 2024-07-08 15:24:48 +02:00
57ee91df7d Merge branch 'main' of https://gitlab.psi.ch/5505/dima 2024-07-02 16:50:08 +02:00
926a0f9e08 Updated documentation. 2024-07-02 16:49:48 +02:00
b21ccbddf0 Renamed script_name to processing_file. 2024-07-01 16:17:25 +02:00
0cc6cf0785 Added a few lines to detect the existence of the file and change the file mode from 'w' to 'a' based on that information. 2024-06-20 09:03:47 +02:00
210379a2b4 Updated function to add project level metadata at the root group of the hdf5 file. 2024-06-19 18:31:11 +02:00
2113a17e40 Added code to parse dict attributes. 2024-06-18 14:42:51 +02:00
60f4497711 Fixed bug in datetime-to-str column conversion in dataframe by using .map(str) (element-wise operation) as opposed to .apply(str) 2024-06-18 09:21:46 +02:00
2ea9269f75 Replaced applymap with .apply because the former is being deprecated 2024-06-17 13:47:54 +02:00
86a811e6aa Created function to save dataframes with annotations in hdf5 format 2024-06-17 13:36:05 +02:00
622661d4d3 Abstracted a code snippet from def create_hdf5_file_from_filesystem_path(..) as transfer_file_dict_to_hdf5() so that it can be reusable. 2024-06-13 15:44:01 +02:00
2b2874cfdc Modified annotate_root_dir function. 2024-06-02 16:02:48 +02:00
82754e26b0 Refactored due to updates in the file reader function. 2024-05-28 14:41:34 +02:00
2fe2ac2efa Enhanced data transfer progress visualization and logging 2024-05-28 08:59:29 +02:00
33fec9bd59 Improved modularity of hdf5_file creation by creating a function that copies the input directory files and applies directory, file, and extension constraints before the regular directory-to-hdf5 transfer. See def copy_directory_with_contraints(input_dir_path, output_dir_path, select_dir_keywords, select_file_keywords, allowed_file_extensions): 2024-05-27 18:15:08 +02:00
292708e745 Removed creation of yaml file subsequent to data integration. This can cause misalignment with data store. I think the yaml snapshot of a hdf5 file should therefore be outsourced there. 2024-05-24 09:30:24 +02:00
c4f12eaa84 Made a few optimizations to code and documentation. Expressions relying on list comprehensions were simplified with generator expressions, e.g., any([keyword in filename for keyword in select_file_keywords]) was simplified to any(keyword in filename for keyword in select_file_keywords). 2024-05-24 09:06:07 +02:00
e4b9487575 Replaced commented lines by accurate comments 2024-05-22 20:15:17 +02:00
7c1c0bf33c Simplified code by updating HDF5 attributes using .update() dict method (inherited from dict type). 2024-05-22 20:11:54 +02:00
be02ad01ed Removed problematic lines, which depended on soon to be removed dependency config_file.py 2024-04-24 17:14:13 +02:00
ceb8a34ee0 Commented out unneeded Python import statements 2024-04-23 13:23:13 +02:00
d3ec0bd473 Included additional directory path validation based on dir keywords 2024-04-23 11:05:20 +02:00
074d2e3954 Removed config_file-based output file naming; the user now provides the desired output filename instead. Also added an input argument to introduce root-level metadata. 2024-04-18 19:14:06 +02:00
a1c88fdb5a Added lines to flatten (shorten) original directory paths in the resulting hdf5 file. 2024-04-17 15:20:26 +02:00
f9b31c06fd Reimplemented file filtering: file extension constraints are imposed first, then file keyword constraints. 2024-04-03 13:49:16 +02:00
9cde013be0 Set node values to the number of children of each group. When nodes are datasets, their value is 1. 2024-04-02 18:48:50 +02:00
39cae66936 Implemented two important changes. 1) The filename of the output file is no longer passed as input; it is automatically computed from an input config_param dict. 2) Input filenames in the file system path are now filtered on an initial walk through the directory tree, so that the stored path filenames can be used later on for pruning the directory tree. 2024-04-02 17:33:58 +02:00
a58bf4f019 Refactored import dependencies. 2024-03-26 13:57:19 +01:00
1bf1f60beb Added lines to treat string attributes as fixed-length strings, which are represented as bytes that need to be decoded with utf-8. There are a few advantages, and hdf5 readers provide more precise behavior than with variable-length strings 2024-03-22 17:28:47 +01:00
e389ffbefe Relocated def display_group_hierarchy_on_a_treemap(filename: str) to hdf5_vis.py 2024-03-21 16:27:54 +01:00
b886066133 Simplified code and corrected buggy if statement. Included input verification steps and OS path normalization. 2024-03-19 11:11:05 +01:00
79b7428b9f Cleaned up code by removing commented lines and so on. 2024-02-21 10:47:12 +01:00
1a4294e0c2 Modified to receive a unified dictionary structure and transform it into an equivalent group, dataset, and attribute structure. 2024-02-16 16:52:21 +01:00
e7bdee21da Refactored to interact with config_file.py, which sets available file readers 2024-02-15 15:59:42 +01:00
337a1947fe Reverted a few minor refactoring changes. 2024-02-15 10:10:10 +01:00