Commit Graph

231 Commits

Author SHA1 Message Date
florez_j e96ecfa951 Implement record_data_lineage.py to be used as a parameterized decorator. This is to simplify provenance tracking on newly added file readers. 2025-09-19 19:01:08 +02:00
florez_j 6c908e6686 Update src/hdf5_writer.py to record unflattened path from original folder 2025-06-25 14:11:56 +02:00
florez_j dacafb6234 Update src/hdf5_ops.py to allow for replicates after flattening directory structures. 2025-06-25 14:11:02 +02:00
florez_j f4ddd36ef2 Clean import statements 2025-06-19 20:49:36 +02:00
florez_j e80c19ef61 Update src/hdf5_writer.py to consider data lineage metadata in data ingestion process 2025-06-07 15:31:13 +02:00
florez_j 32abd4cd56 Implemented hdf5_file_reader.py and updated register.yaml and hdf5_writer.py. This replaces previous function __copy_file_in_group(). 2025-02-25 12:25:15 +01:00
florez_j 295b43a89a Merge branch 'main' into 'feature/DB_for_FileReader_Repo'. Conflicts: instruments/filereader_registry.py, pipelines/data_integration.py, src/hdf5_writer.py 2025-02-25 10:41:02 +01:00
florez_j db4bb0ef03 Implemented create_hdf5_from_filesystem_new() using new instrument readers cml interface and subprocesses. This facilitates extension of file reading capabilities by collaborators without requiring changes to file_registry.py. Only additions in folders and registry.yaml. 2025-02-24 18:48:03 +01:00
florez_j 9511377883 Merge branch 'main' of https://gitlab.psi.ch/5505-public/dima 2025-02-22 18:02:45 +01:00
florez_j 1e67745fa4 Fix import for filereader_registry.py after moving it from instruments/readers/ one level above. 2025-02-22 17:59:00 +01:00
florez_j 821d314cb6 Wrap import statements in try-except to enable explicit import of submodules and avoid conflicts with parent project. 2025-02-22 17:10:53 +01:00
florez_j 8ce6f588dc Implement data_lineage_metadata.json detection and then use it to annotate associated file. 2025-02-10 15:56:34 +01:00
florez_j 0d26777732 Enable instrumentFolder of form <instFolder>/<category>/ to be transferred without flattening 2025-02-07 16:24:21 +01:00
florez_j b374de60f3 Add try except block to trigger errors for invalid group names. 2025-02-06 16:07:45 +01:00
florez_j 5d0ab4603f Add property to extracted dataset as dataframe. Now time column is of datetime type to facilitate downstream processing. 2025-02-04 17:23:32 +01:00
florez_j 6fae139360 Implement method in hdf5 manager to infer datetime column in dataset 2025-02-04 17:13:01 +01:00
florez_j 32bba4239a Synch with remote repo 2025-02-03 10:31:48 +01:00
florez_j ef66d8f1c2 Update unload operation to remove reference and fix logic error in dataset metadata extraction. 2025-01-24 10:28:43 +01:00
florez_j 23e0ced190 Relocated to visualization module 2024-12-02 15:39:41 +01:00
florez_j c373c18062 Moved hdf5_lib.py to visualization folder 2024-12-02 15:34:44 +01:00
florez_j 11ca454b94 Removed because some of the functionalities have been outsourced to other modules src/hdf5_ops.py and src/hdf5_writer.py 2024-11-26 11:55:06 +01:00
florez_j 5c61e2391a Update to DIMA package path resolution from file. 2024-11-24 19:45:18 +01:00
florez_j 1174ffc8b8 Commented out metadata info about group members for a given group. This is to simplify yaml or json representation of the metadata. 2024-11-24 15:57:54 +01:00
florez_j 967be876d1 Moved func create_hdf5_file_from_dataframe() from hdf5_lib_part2 into hdf5_writer.py 2024-11-24 11:30:08 +01:00
florez_j 0330773f08 Moved read_mtable_as_dataframe(filename) to src/hdf5_ops.py 2024-11-24 11:03:44 +01:00
florez_j b24d33ab15 Check whether h5 file being written exists. If so, we do not overwrite it because it may be undergoing refinement, changes or updates, for archiving, sharing, or publishing. 2024-11-24 10:38:13 +01:00
florez_j 6701bc06ad Added read_mtable_as_dataframe(filename) back so that jupyter notebook can use it to demonstrate some functionality 2024-11-23 16:31:29 +01:00
florez_j 1be4b8493a Improved progress description stdout 2024-11-10 18:21:00 +01:00
florez_j e2fec03d4a Included cli commands update and serialize to simplify running metadata revision pipeline. 2024-10-29 07:56:43 +01:00
florez_j cc96672245 Moved git related operations from pipelines/ to src/git_ops.py 2024-10-28 16:30:34 +01:00
florez_j 69b73c26b0 Corrected import statements due to dependency name changes 2024-10-17 16:52:42 +02:00
florez_j 7c60193aa6 Renamed module: src/hdf5_lib.py -> src/hdf5_writer.py 2024-10-17 10:53:51 +02:00
florez_j 44073e3816 Replaced read_dataset_from_hdf5file(hdf5_file_path, dataset_path) with HDF5DataOpsManager.extract_dataset_as_dataframe(self,dataset_name) 2024-10-17 10:46:19 +02:00
florez_j 2a9d69c757 Robustified metadata and dataset extraction methods by requiring explicit load of file obj before their use. Renamed a few functions and fixed types in print statements. 2024-10-10 11:28:23 +02:00
florez_j 6be3b31247 Renamed open_file() --> load_file_obj() and close_file() --> unload_file_obj() to focus more on the management operations on the files than actual file handling operations. 2024-10-10 10:47:44 +02:00
florez_j 568f747a69 Robustified metadata revision methods with error detection conditions and try-except statements. Metadata revision methods now do not have the capability of opening the file. 2024-10-10 10:39:10 +02:00
florez_j fe96134383 Fixed bug in HDF5DataOpsManager.append_dataset() and added 'creation_date' metadata attribute when instrument (groups) are created. 2024-10-09 16:06:44 +02:00
florez_j c321a17943 Fixed bug that caused the input_path normalization operation to damage Windows network drive paths: os.path.normpath(path_to_input_directory).strip(os.sep) was replaced by os.path.normpath(path_to_input_directory).rstrip(os.sep) 2024-10-07 16:16:12 +02:00
florez_j 89e9dd9ab1 Fixed bugs in update_file() method and create_hdf5_file_from_filesystem_path() 2024-10-03 09:32:25 +02:00
florez_j 9b5d777a5b Added .update_file() method, which enables complementary data structure updates to existing file with same name as append_dir's head. 2024-10-02 14:38:35 +02:00
florez_j aad0a7c3fb Added file opening mode as input parameter. Now, mode can only take values in ['w','r+'] 2024-10-02 13:54:59 +02:00
florez_j 4420f81642 Removed construct_attributes_dict(attrs_obj) and replaced by {key: utils.to_serializable_dtype(val) for key, val in obj.attrs.items()} 2024-10-01 10:42:20 +02:00
florez_j 4d48e84e50 Made two helper functions private by adding the prefix __ 2024-10-01 09:31:41 +02:00
florez_j 8cd4b7d925 Deleted annotate_root_dir(filename,annotation_dict: dict), and outsourced functionality to HDF5DataOpsManager.append_metadata() or .update_metadata() at obj_name = '/' 2024-10-01 09:19:14 +02:00
florez_j 6f5d4adcee Implemented metadata append, rename, delete, and update operations on the hdf5 manager object and refactored metadata update script based on yaml file to use said operations. 2024-09-30 16:32:39 +02:00
florez_j 96dad0bfb1 Renamed to_yaml() as serialize_metadata() and introduce input parameter output_format, which allows yaml or json. 2024-09-26 16:23:09 +02:00
florez_j 85b0e5ab74 Performed a few function relocations and deletions from src/hdf5_lib.py into src/hdf5_ops.py and made a copy of previous version as src/hdf5_lib_part2.py 2024-09-26 15:13:31 +02:00
florez_j a92660049f Moved is_structured_array() and to_serializable_dtype() to utils, renamed a few functions and propagated changes to dependent modules. 2024-09-26 14:03:11 +02:00
florez_j a57e46d89c Renamed take_yml_snapshot_of_hdf5_file func as to_yaml func 2024-09-25 16:49:44 +02:00
florez_j 7b221599d8 Moved take_yml_snapshot_of_hdf5_file func and associated helper functions from hdf5_vis.py into hdf5_ops.py 2024-09-25 16:42:44 +02:00
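
Note on commit c321a17943 (strip vs. rstrip): the sketch below illustrates why the change described in that commit message matters for Windows network drive (UNC) paths. It is not code from the repository. The function names and the example path are hypothetical, and ntpath is used in place of os.path only so that Windows path rules apply on any platform.

```python
import ntpath  # Windows flavour of os.path; importable on every platform


def normalize_with_strip(path: str) -> str:
    """Behaviour before the fix: strip separators from BOTH ends."""
    return ntpath.normpath(path).strip(ntpath.sep)


def normalize_with_rstrip(path: str) -> str:
    """Behaviour after the fix: strip separators from the right end only."""
    return ntpath.normpath(path).rstrip(ntpath.sep)


# Hypothetical UNC path with a trailing separator.
# (A raw string cannot end in a backslash, hence the concatenation.)
unc_path = r"\\server\share\campaign_data" + "\\"

broken = normalize_with_strip(unc_path)
fixed = normalize_with_rstrip(unc_path)

print(broken)  # leading "\\" is gone, so this no longer names a network share
print(fixed)   # leading "\\" is preserved
```

With strip(), the two leading backslashes that mark a UNC path are removed along with any trailing separator, so the result is read as a relative path. rstrip() touches only the right-hand end, which is the behaviour the commit message describes.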