6701bc06ad | Added read_mtable_as_dataframe(filename) back so that the Jupyter notebook can use it to demonstrate some functionality | Florez Ospina Juan Felipe | 2024-11-23 16:31:29 +01:00
fd92bce802 | Implemented a sanitize dataframe function to deal with 'O' (object) dtype columns, which may hold numbers or strings detected as string types. It is applied before converting the dataframe into a structured numpy array. | Florez Ospina Juan Felipe | 2024-11-23 16:28:49 +01:00
8d17bf267c | Major code refactoring and simplifications to enhance modularity. Included a command line interface. | Florez Ospina Juan Felipe | 2024-11-01 09:52:41 +01:00
e2fec03d4a | Included CLI commands update and serialize to simplify running the metadata revision pipeline. | Florez Ospina Juan Felipe | 2024-10-29 07:56:43 +01:00
3f7a089a28 | Fixed bug: to_serializable_dtype() did not correctly identify the dtype of entries in arrays with object dtype | Florez Ospina Juan Felipe | 2024-10-28 18:49:22 +01:00
44073e3816 | Replaced read_dataset_from_hdf5file(hdf5_file_path, dataset_path) with HDF5DataOpsManager.extract_dataset_as_dataframe(self,dataset_name) | Florez Ospina Juan Felipe | 2024-10-17 10:46:19 +02:00
f1b2c64f66 | Fixed bug that occurred when a file reader was not available. The file reader registry now returns a reader that maps input to None. | Florez Ospina Juan Felipe | 2024-10-14 16:03:03 +02:00
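The fallback behavior described in f1b2c64f66 can be sketched as follows. This is only an illustration of the idea, not the repository's code: `READERS`, `read_text`, and `get_reader` are hypothetical names, and keying the registry by file extension is an assumption.

```python
from typing import Any, Callable, Dict, Optional

Reader = Callable[[str], Optional[Dict[str, Any]]]


def read_text(path: str) -> Dict[str, Any]:
    """Hypothetical reader used only for this illustration."""
    return {"name": path, "kind": "text"}


# Hypothetical registry mapping file extensions to reader functions.
READERS: Dict[str, Reader] = {".txt": read_text}


def get_reader(extension: str) -> Reader:
    """Return the registered reader, or a fallback that maps any input to None."""
    return READERS.get(extension, lambda path: None)


# A known extension yields a real reader; an unknown one yields the None-reader,
# so callers can always call the result without checking for a missing reader.
print(get_reader(".txt")("notes.txt"))
print(get_reader(".xyz")("data.xyz"))
```

With this design the caller only has to check whether the returned value is None, instead of handling a missing-reader error.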
2a330fcf92 | Added 'filename_format' attribute to YAML schema. Its value is a string of comma-separated keys taken from the attributes available in the YAML file. | Florez Ospina Juan Felipe | 2024-10-14 16:01:24 +02:00
2a9d69c757 | Made metadata and dataset extraction methods more robust by requiring an explicit load of the file obj before their use. Renamed a few functions and fixed types in print statements. | Florez Ospina Juan Felipe | 2024-10-10 11:28:23 +02:00
6be3b31247 | Renamed open_file() --> load_file_obj() and close_file() --> unload_file_obj() to focus more on the management operations on the files than on actual file handling operations. | Florez Ospina Juan Felipe | 2024-10-10 10:47:44 +02:00
568f747a69 | Made metadata revision methods more robust with error detection conditions and try-except statements. Metadata revision methods can no longer open the file themselves. | Florez Ospina Juan Felipe | 2024-10-10 10:39:10 +02:00
31c9db98ca | Changed the datetime output format of the created_at() function to '%Y-%m-%d %H:%M:%S.%f' | Florez Ospina Juan Felipe | 2024-10-09 16:07:40 +02:00
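For reference, the format string in 31c9db98ca produces timestamps with microsecond precision. The sketch below only exercises the format string with the standard library; it does not reproduce created_at() itself, and the timestamp is a made-up value.

```python
from datetime import datetime

# Format string quoted in the commit message.
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S.%f'

# Fixed, made-up timestamp so the result is reproducible.
example = datetime(2024, 10, 9, 16, 7, 40, 123456)
formatted = example.strftime(DATETIME_FORMAT)
print(formatted)  # 2024-10-09 16:07:40.123456

# The same format string parses the text back into an equal datetime.
parsed = datetime.strptime(formatted, DATETIME_FORMAT)
```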
fe96134383 | Fixed bug in HDF5DataOpsManager.append_dataset() and added 'creation_date' metadata attribute when instrument (groups) are created. | Florez Ospina Juan Felipe | 2024-10-09 16:06:44 +02:00
9a3bf77f37 | Created file reader for acsm tofware files, updated registry and updated yaml file with instrument specific terms and reader config params. | Florez Ospina Juan Felipe | 2024-10-07 16:18:14 +02:00
c321a17943 | Fixed bug that caused the input_path normalization operation to damage Windows network drive paths. Specifically, os.path.normpath(path_to_input_directory).strip(os.sep) was replaced by os.path.normpath(path_to_input_directory).rstrip(os.sep) | Florez Ospina Juan Felipe | 2024-10-07 16:16:12 +02:00
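The difference behind c321a17943 can be reproduced with a minimal sketch. It uses `ntpath` (the Windows flavor of `os.path`) so that it behaves the same on any platform; the network path below is a made-up example, not taken from the repository.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

# Hypothetical network drive (UNC) path with a trailing separator.
path_to_input_directory = "\\\\server\\share\\campaign\\"

normalized = ntpath.normpath(path_to_input_directory)

# strip() removes separators from BOTH ends, which destroys the leading
# double backslash that marks a UNC path.
stripped = normalized.strip(ntpath.sep)

# rstrip() removes only trailing separators, so the UNC prefix survives.
rstripped = normalized.rstrip(ntpath.sep)

print(stripped)   # server\share\campaign
print(rstripped)  # \\server\share\campaign
```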
dc7f156367 | Updated README.md with guide for instrument-dependent file reader extensions and updated TODO.md with pending tasks. | Florez Ospina Juan Felipe | 2024-10-03 11:31:51 +02:00
098a79531c | Added new instrument (flagging app) file reading capabilities. It includes two files: a flag_reader.py that converts flag.json files produced by the app into a standard intermediate representation, and a yaml file with instrument-dependent description terms. Last, we modified the filereader_registry.py to find the new instrument file reader. | Florez Ospina Juan Felipe | 2024-10-03 09:07:06 +02:00
9b5d777a5b | Added .update_file() method, which enables complementary data structure updates to an existing file with the same name as append_dir's head. | Florez Ospina Juan Felipe | 2024-10-02 14:38:35 +02:00
aad0a7c3fb | Added file opening mode as input parameter. Now, mode can only take values in ['w','r+'] | Florez Ospina Juan Felipe | 2024-10-02 13:54:59 +02:00
4420f81642 | Removed construct_attributes_dict(attrs_obj) and replaced it with {key: utils.to_serializable_dtype(val) for key, val in obj.attrs.items()} | Florez Ospina Juan Felipe | 2024-10-01 10:42:20 +02:00
8cd4b7d925 | Deleted annotate_root_dir(filename,annotation_dict: dict), and outsourced functionality to HDF5DataOpsManager.append_metadata() or .update_metadata() at obj_name = '/' | Florez Ospina Juan Felipe | 2024-10-01 09:19:14 +02:00
6f5d4adcee | Implemented metadata append, rename, delete, and update operations on the hdf5 manager object and refactored metadata update script based on yaml file to use said operations. | Florez Ospina Juan Felipe | 2024-09-30 16:32:39 +02:00
96dad0bfb1 | Renamed to_yaml() as serialize_metadata() and introduced input parameter output_format, which allows yaml or json. | Florez Ospina Juan Felipe | 2024-09-26 16:23:09 +02:00
85b0e5ab74 | Performed a few function relocations and deletions from src/hdf5_lib.py into src/hdf5_ops.py and made a copy of previous version as src/hdf5_lib_part2.py | Florez Ospina Juan Felipe | 2024-09-26 15:13:31 +02:00
a92660049f | Moved is_structured_array() and to_serializable_dtype() to utils, renamed a few functions and propagated changes to dependent modules. | Florez Ospina Juan Felipe | 2024-09-26 14:03:11 +02:00
7b221599d8 | Moved take_yml_snapshot_of_hdf5_file func and associated helper functions from hdf5_vis.py into hdf5_ops.py | Florez Ospina Juan Felipe | 2024-09-25 16:42:44 +02:00
1e93a2c552 | Moved take_yml_snapshot_of_hdf5_file func and associated helper functions from hdf5_vis.py into hdf5_ops.py | Florez Ospina Juan Felipe | 2024-09-25 16:40:16 +02:00
df2f7b3de6 | Abstracted reusable steps in integration_sources as dima_pipeline.py and added functionality to make a collection of hdf5 files, where each represents a single experiment of a campaign. | Florez Ospina Juan Felipe | 2024-09-25 15:23:23 +02:00
1e1499c28a | Made the definition of the path_to_input_dir arg or parameter more robust by ensuring it is always defined using forward slashes and then normalized to the OS specification. Improved the dry run = True mode of the copy directory func. | Florez Ospina Juan Felipe | 2024-09-25 15:12:19 +02:00
0dbec94374 | Fixed instrument_dir estimation to be bottom up, i.e., based on path to file. Otherwise, it does not work when dima is used as a submodule | Florez Ospina Juan Felipe | 2024-09-19 15:47:11 +02:00
2dd033bcb3 | Refactored code into functions to parse and validate the yaml config file and to perform the specified data integration task using a pipeline-like software structure. | Florez Ospina Juan Felipe | 2024-09-17 15:28:11 +02:00
d63f522588 | Major update. Removed the file filtering option and the outputname input arg. The output name is now the same as the path_to_input_dir + .h5. By default, the hdf5 writer preserves second-level subdirectories and the rest are flattened. Directory filtering is outsourced to copy_dir_with_constraints from utils. | Florez Ospina Juan Felipe | 2024-09-16 16:35:09 +02:00
9c641c0dae | Restructured a bit to include the default case of copying an input directory without any constraints. Also added a dry_run input argument that returns a path-to-files dict representation of the output directory without making an actual copy. Useful when the input directory is already safe to work with directly | Florez Ospina Juan Felipe | 2024-09-16 15:38:30 +02:00
7a9f7a8c59 | Renamed parameter 'input_file_system_path' to 'path_to_input_directory' for clarity. | Florez Ospina Juan Felipe | 2024-09-16 14:24:55 +02:00
9789d312f9 | Removed and split into instruments/readers/filereader_registry.py, instruments/readers/g5505_text_reader.py, and instruments/readers/xps_ibw_reader.py | Florez Ospina Juan Felipe | 2024-08-23 16:09:04 +02:00
d866c8f9f9 | Split instruments/readers/g5505_file_reader.py into a fileregistry.py and independent file readers. This is to improve instrument modularity and ease new additions | Florez Ospina Juan Felipe | 2024-08-23 16:06:44 +02:00
d985115125 | Integrated copy h5 file into group functionality, imported from g5505_file_reader | Florez Ospina Juan Felipe | 2024-08-23 15:47:04 +02:00
a4b7c6a8b0 | Moved copy_file_in_group() into hdf5_lib.py because it does not really play the same role as the file readers | Florez Ospina Juan Felipe | 2024-08-23 15:45:32 +02:00
b5c200d588 | Moved all yaml files with dictionary terms for each instrument to dictionaries folder | Florez Ospina Juan Felipe | 2024-08-23 14:32:23 +02:00
a0f44a1f4b | Moved src/g5505_file_reader.py -> instruments/readers/g5505_file_reader.py to increase modularity with respect to new instrument additions. | Florez Ospina Juan Felipe | 2024-08-23 10:11:29 +02:00
d7fc38abd9 | Moved get_parent_relationships func into hdf5_vis.py and cleaned up unused import statements | Florez Ospina Juan Felipe | 2024-08-22 09:50:26 +02:00
05d1133e32 | Moved get_parent_child_relationships() func from hdf5_lib.py into hdf5_vis.py to avoid circular dependency between the lower level and higher level module. Thus also removed the src.hdf5_lib.py import statement. | Florez Ospina Juan Felipe | 2024-08-22 09:47:57 +02:00
bb250e9940 | Implemented method to reformat a given column in a datatable holding datetime info into a desired datetime format. During data integration this will serve to normalize datetime formats across data tables | Florez Ospina Juan Felipe | 2024-08-16 08:08:28 +02:00
062a688f47 | Added method to reformat columns containing datetime byte strings into a desired datetime formatted object | Florez Ospina Juan Felipe | 2024-08-14 16:22:28 +02:00
c876e925a7 | Modified code to point to new instrument folders location. Also, upgraded code to accept either a user specified location or the default location | Florez Ospina Juan Felipe | 2024-08-12 13:40:01 +02:00
8f7f14ab68 | Removed time stamp configuration attributes from ACSM_TOFWARE, because it can be messy for a configuration file. | Florez Ospina Juan Felipe | 2024-08-08 11:24:41 +02:00
ae1e3bfc23 | Moved ext_to_reader_dict to g5505_file_reader.py and replaced reader selection with one based on g5505_reader.select_file_reader(hdf5_file_path). | Florez Ospina Juan Felipe | 2024-08-07 16:30:36 +02:00
4e669b3eee | Moved hdf5_file_path to file reader mapping and extension definitions to g5505_file_reader_module.py. Created functions to compute file_reader key from path to file in the hdf5 file and select the reader based on the key. This should enable more modular file reader selection. | Florez Ospina Juan Felipe | 2024-08-07 16:21:22 +02:00
3430627494 | Modified reader to output table_preamble as a dataset as opposed to attributes of the file. I believe this is better for readability of the metadata given that those preambles can sometimes contain large amounts of text. | Florez Ospina Juan Felipe | 2024-08-02 14:37:06 +02:00
a06e28291c | Added attribution insertion order tracking at the root level and reorganized a few import statements. | Florez Ospina Juan Felipe | 2024-07-17 08:41:40 +02:00
f04f5eaaf9 | Made column name to description assignment more robust; however, it may be a bit slower than before. | Florez Ospina Juan Felipe | 2024-07-10 13:31:47 +02:00