57 Commits

SHA1 Message Date
fe96134383 Fixed bug in HDF5DataOpsManager.append_dataset() and added 'creation_date' metadata attribute when instrument (groups) are created. 2024-10-09 16:06:44 +02:00
c321a17943 Fixed bug that caused the input_path normalization operation to damage Windows network drive paths. Basically, os.path.normpath(path_to_input_directory).strip(os.sep) was replaced by os.path.normpath(path_to_input_directory).rstrip(os.sep) 2024-10-07 16:16:12 +02:00
89e9dd9ab1 Fixed bugs in update_file() method and create_hdf5_file_from_filesystem_path() 2024-10-03 09:32:25 +02:00
aad0a7c3fb Added file opening mode as input parameter. Now, mode can only take values in ['w','r+'] 2024-10-02 13:54:59 +02:00
4d48e84e50 Made two helper functions private by adding the prefix __ 2024-10-01 09:31:41 +02:00
85b0e5ab74 Performed a few function relocations and deletions from src/hdf5_lib.py into src/hdf5_ops.py and made a copy of previous version as src/hdf5_lib_part2.py 2024-09-26 15:13:31 +02:00
a92660049f Moved is_structured_array() and to_serializable_dtype() to utils, renamed a few functions and propagated changes to dependent modules. 2024-09-26 14:03:11 +02:00
1e1499c28a Robustified definition of path_to_input_dir arg or parameter by ensuring it is always defined using forward slashes and then normalized to the OS specification. Improved dry run = True of copy directory func. 2024-09-25 15:12:19 +02:00
d63f522588 Major update. Removed file filtering option and outputname input arg. The output name is now the same as the path_to_input_dir + .h5. By default, the hdf5 writer preserves second level subdirectories and the rest are flattened. Dir filtering is outsourced to copy_dir_with_constraints from utils. 2024-09-16 16:35:09 +02:00
7a9f7a8c59 Renamed parameter 'input_file_system_path' to 'path_to_input_directory' for clarity. 2024-09-16 14:24:55 +02:00
e4b04b4484 Modified to use filereader_registry.py. 2024-08-23 16:10:23 +02:00
d985115125 Integrated copy h5 file into group functionality, imported from g5505_file_reader 2024-08-23 15:47:04 +02:00
18165eca1a Modified import statements to account for reader module's relocation. 2024-08-23 13:27:26 +02:00
d7fc38abd9 Moved get_parent_relationships func into hdf5_vis.py and cleaned up unused import statements 2024-08-22 09:50:26 +02:00
ae1e3bfc23 Moved ext_to_reader_dict to g5505_file_reader.py and replaced reader selection based on g5505_reader.select_file_reader(hdf5_file_path). 2024-08-07 16:30:36 +02:00
a06e28291c Added attribute insertion order tracking at the root level and reorganized a few import statements. 2024-07-17 08:41:40 +02:00
2ebe5f3220 Made edits to documentation 2024-07-11 13:42:38 +02:00
73beb83278 Moved parse_attribute() from ..review_lib.py into ...utils.py and backpropagated (refactored) changes to respective modules. 2024-07-10 11:32:00 +02:00
0a0b4ac41d Moved a few functions from ...reader.py and hdf5_lib.py into ..utils.py, and refactored accordingly. 2024-07-10 09:19:30 +02:00
afc6c93823 Removed unused code. 2024-07-08 15:29:13 +02:00
cb7d914908 Cleaned code and modified def create_hdf5_file_from_dataframe to create group hierarchy implicitly from path rather than recursively. 2024-07-08 15:24:48 +02:00
77386432f8 Merge branch 'main' of https://gitlab.psi.ch/5505/dima 2024-07-02 16:50:08 +02:00
177a5aa2a1 Updated documentation. 2024-07-02 16:49:48 +02:00
c074e45892 Renamed script_name to processing_file. 2024-07-01 16:17:25 +02:00
106795ae59 Added a few lines to detect the existence of the file and change the file mode from 'w' to 'a' based on that information. 2024-06-20 09:03:47 +02:00
498a51cbc6 Updated function to add project level metadata at the root group of the hdf5 file. 2024-06-19 18:31:11 +02:00
04558e7785 Added code to parse dict attributes. 2024-06-18 14:42:51 +02:00
a6868d985d Fixed bug regarding datetime to str column conversion in dataframe by using .map(str) (element-wise operation) as opposed to .apply(str) 2024-06-18 09:21:46 +02:00
b66dc11a62 Replaced applymap with .apply because the former is being deprecated 2024-06-17 13:47:54 +02:00
ed1641af55 Created function to save dataframes with annotations in hdf5 format 2024-06-17 13:36:05 +02:00
9ab9aa49c4 Abstracted a code snippet from def create_hdf5_file_from_filesystem_path(..) as transfer_file_dict_to_hdf5() so that it can be reusable. 2024-06-13 15:44:01 +02:00
1054367f12 Modified annotate_root_dir function. 2024-06-02 16:02:48 +02:00
a86fc97605 Refactored due to updates in the file reader function. 2024-05-28 14:41:34 +02:00
41c7660be3 Enhanced data transfer progress visualization and logging 2024-05-28 08:59:29 +02:00
2911416431 Improved modularity of hdf5_file creation by creating a function that copies the input directory file and applies directory, files, and extensions constraints before regular directory to hdf5 transfer. See def copy_directory_with_contraints(input_dir_path, output_dir_path, select_dir_keywords, select_file_keywords, allowed_file_extensions): 2024-05-27 18:15:08 +02:00
88de88c316 Removed creation of yaml file subsequent to data integration. This can cause misalignment with data store. I think the yaml snapshot of a hdf5 file should therefore be outsourced there. 2024-05-24 09:30:24 +02:00
1537633b1a Made a few optimizations to code and documentation. Expressions relying on list comprehensions were simplified with generator expressions, e.g., any([keyword in filename for keyword in select_file_keywords]) was simplified to any(keyword in filename for keyword in select_file_keywords). 2024-05-24 09:06:07 +02:00
a45fb4476b Replaced commented lines by accurate comments 2024-05-22 20:15:17 +02:00
7367da84b9 Simplified code by updating HDF5 attributes using .update() dict method (inherited from dict type). 2024-05-22 20:11:54 +02:00
be02ad01ed Removed problematic lines, which depended on soon to be removed dependency config_file.py 2024-04-24 17:14:13 +02:00
ceb8a34ee0 Commented out unneeded Python import statements 2024-04-23 13:23:13 +02:00
d3ec0bd473 Included additional directory path validation based on dir keywords 2024-04-23 11:05:20 +02:00
074d2e3954 Removed config_file output file naming and instead user now inputs desired output filename. Also added input argument to introduce root level metadata. 2024-04-18 19:14:06 +02:00
a1c88fdb5a Added lines to flatten (shorten) original directory paths in the resulting hdf5 file. 2024-04-17 15:20:26 +02:00
f9b31c06fd Reimplemented file filtering: first, file extension constraints are imposed, and then file keyword constraints. 2024-04-03 13:49:16 +02:00
9cde013be0 Modified node values as the number of children of each group. When nodes are datasets, their value is 1. 2024-04-02 18:48:50 +02:00
39cae66936 Implemented two important changes. 1) The filename of the output file is no longer passed as input; it is automatically computed based on an input config_param dict. 2) Input filenames in the file system path are now filtered on an initial walk through the directory tree. This is to use stored path filenames for pruning the directory tree later on. 2024-04-02 17:33:58 +02:00
a58bf4f019 Refactored import dependencies. 2024-03-26 13:57:19 +01:00
1bf1f60beb Added lines to treat string attributes as fixed-length strings, which are represented as bytes that need to be decoded with utf-8. There are a few advantages, and HDF5 readers provide more precise behavior than with variable-length strings 2024-03-22 17:28:47 +01:00
e389ffbefe Relocated def display_group_hierarchy_on_a_treemap(filename: str) to hdf5_vis.py 2024-03-21 16:27:54 +01:00
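
Note on commit c321a17943: the change from .strip(os.sep) to .rstrip(os.sep) can be illustrated with a minimal sketch. The UNC path below is a hypothetical example, not taken from the repository, and ntpath is used instead of os.path only so that Windows path rules apply on any operating system.

```python
import ntpath  # Windows path semantics, importable on any OS

# Hypothetical network drive (UNC) path with a trailing separator.
raw_path = "\\\\server\\share\\data\\"

# normpath keeps the leading double backslash and drops the trailing one.
normalized = ntpath.normpath(raw_path)

# strip() removes separators from BOTH ends, so the UNC prefix is lost.
damaged = normalized.strip(ntpath.sep)

# rstrip() removes separators from the right end only, so the prefix survives.
preserved = normalized.rstrip(ntpath.sep)

print(damaged)
print(preserved)
```

With strip(), the leading separators that mark a network location are removed and the result no longer refers to the share; rstrip() leaves them untouched.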