Data Structure Conversion

utils.g5505_utils.augment_with_filenumber(df)
utils.g5505_utils.augment_with_filetype(df)
utils.g5505_utils.convert_attrdict_to_np_structured_array(attr_value: dict)

Converts a dictionary of attributes into a numpy structured array for HDF5 compound type compatibility.

Each dictionary key is mapped to a field in the structured array. Every field uses a fixed-length byte-string data type (S) whose length is determined by the longest string representation of the values. If the dictionary is empty, the function returns 'missing'.

Parameters

attr_value : dict

Dictionary containing the attributes to be converted. Example:

attr_value = {'name': 'Temperature', 'unit': 'Celsius', 'value': 23.5, 'timestamp': '2023-09-26 10:00'}

Returns

new_attr_value : ndarray or str

NumPy structured array with UTF-8 encoded fields. Returns 'missing' if the input dictionary is empty.
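
The behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name attrdict_to_structured_array is hypothetical, and it assumes that every field shares one S length and that the array holds a single record.

```python
import numpy as np


def attrdict_to_structured_array(attr_value: dict):
    """Hypothetical sketch of the conversion described above."""
    # An empty dictionary has no fields to build, so signal it with a marker string.
    if not attr_value:
        return 'missing'

    # Encode the string representation of every value as UTF-8 bytes.
    encoded = {key: str(value).encode('utf-8') for key, value in attr_value.items()}

    # Assumption: all fields share one byte-string length, that of the longest value.
    max_len = max(len(value) for value in encoded.values())
    dtype = [(key, f'S{max_len}') for key in encoded]

    # One record whose fields are named after the dictionary keys.
    return np.array([tuple(encoded.values())], dtype=dtype)


attr_value = {'name': 'Temperature', 'unit': 'Celsius',
              'value': 23.5, 'timestamp': '2023-09-26 10:00'}
new_attr_value = attrdict_to_structured_array(attr_value)
print(new_attr_value.dtype.names)
```

Note that numeric values such as 23.5 end up stored as their string representation, so a reader of the HDF5 attribute has to convert them back explicitly.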

utils.g5505_utils.convert_dataframe_to_np_structured_array(df: DataFrame)
utils.g5505_utils.convert_string_to_bytes(input_list: list)

Converts a list of strings into a NumPy array of UTF-8 encoded byte strings.

Parameters

input_list (list) : list of string objects

Returns

input_array_bytes (ndarray): array of UTF-8 encoded byte-string entries.
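
A minimal sketch of such a conversion is shown below. The helper name strings_to_bytes_array is hypothetical, and the actual function may size or encode its entries differently.

```python
import numpy as np


def strings_to_bytes_array(input_list: list) -> np.ndarray:
    """Hypothetical sketch: encode each string as UTF-8 and collect into an array."""
    encoded = [item.encode('utf-8') for item in input_list]
    # NumPy infers a fixed-length byte-string dtype wide enough for the longest entry.
    return np.array(encoded)


input_array_bytes = strings_to_bytes_array(['temperature', 'café'])
print(input_array_bytes.dtype)
```

Non-ASCII characters occupy more than one byte in UTF-8, so the byte length of an entry can exceed its character count.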

utils.g5505_utils.copy_directory_with_contraints(input_dir_path, output_dir_path, select_dir_keywords=None, select_file_keywords=None, allowed_file_extensions=None, dry_run=False)

Copies files from input_dir_path to output_dir_path based on specified constraints.

Parameters

input_dir_path (str): Path to the input directory.
output_dir_path (str): Path to the output directory.
select_dir_keywords (list, optional): Keywords for selecting directories.
select_file_keywords (list, optional): Keywords for selecting files.
allowed_file_extensions (list, optional): Allowed file extensions.

Returns

path_to_files_dict (dict): dictionary mapping directory paths to lists of copied file names satisfying the constraints.
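
The selection logic can be sketched roughly as below. This is an illustration under stated assumptions, not the library's code: the function name copy_with_constraints is hypothetical, keywords are matched as substrings, the returned dictionary is keyed by source directory, and dry_run is assumed to report what would be copied without writing anything.

```python
import os
import shutil


def copy_with_constraints(input_dir, output_dir, dir_keywords=None,
                          file_keywords=None, allowed_extensions=None,
                          dry_run=False):
    """Hypothetical sketch of a constrained directory copy."""
    path_to_files = {}
    for dirpath, _dirnames, filenames in os.walk(input_dir):
        rel_dir = os.path.relpath(dirpath, input_dir)
        # Assumption: a directory is selected if any keyword occurs in its relative path.
        if dir_keywords and not any(key in rel_dir for key in dir_keywords):
            continue

        selected = []
        for name in filenames:
            # Assumption: a file is selected if any keyword occurs in its name.
            if file_keywords and not any(key in name for key in file_keywords):
                continue
            extension = os.path.splitext(name)[1]
            if allowed_extensions and extension not in allowed_extensions:
                continue
            selected.append(name)
        if not selected:
            continue

        # Assumption: in dry-run mode the selection is reported but nothing is written.
        if not dry_run:
            target_dir = os.path.normpath(os.path.join(output_dir, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            for name in selected:
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(target_dir, name))
        path_to_files[dirpath] = sorted(selected)
    return path_to_files
```

Calling the sketch first with dry_run=True and inspecting the returned dictionary is a cheap way to check the constraints before any files are copied.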

utils.g5505_utils.created_at(datetime_format='%Y-%m-%d %H:%M:%S')
utils.g5505_utils.group_by_df_column(df, column_name: str)

df (pandas.DataFrame): DataFrame to be grouped.
column_name (str): Name of the column of df by which the grouping operation takes place.

utils.g5505_utils.infer_units(column_name)
utils.g5505_utils.is_callable_list(x: list)
utils.g5505_utils.is_str_list(x: list)
utils.g5505_utils.is_structured_array(attr_val)
utils.g5505_utils.make_file_copy(source_file_path, output_folder_name: str = 'tmp_files')
utils.g5505_utils.progressBar(count_value, total, suffix='')
utils.g5505_utils.sanitize_dataframe(df: DataFrame) -> DataFrame
utils.g5505_utils.setup_logging(log_dir, log_filename)

Sets up logging to a specified directory and file.

Parameters:

log_dir (str): Directory to save the log file.
log_filename (str): Name of the log file.
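
One way to set up file logging with these two parameters is sketched below. The function name setup_file_logging, the logger name, the log level, and the message format are all assumptions for illustration; the actual function may configure the root logger instead.

```python
import logging
import os


def setup_file_logging(log_dir, log_filename):
    """Hypothetical sketch: attach a file handler writing to log_dir/log_filename."""
    # Create the log directory if it does not exist yet.
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.join(log_dir, log_filename)

    # Assumption: a named logger at INFO level; the real function may differ.
    logger = logging.getLogger('g5505_example')
    logger.setLevel(logging.INFO)

    handler = logging.FileHandler(log_path, encoding='utf-8')
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger, handler
```

Returning the handler lets the caller close it and detach it from the logger once the run is finished.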

utils.g5505_utils.split_sample_col_into_sample_and_data_quality_cols(input_data: DataFrame)
utils.g5505_utils.to_serializable_dtype(value)

Transforms the dtype of value into a YAML/JSON-compatible dtype.

Parameters

value : _type_

_description_

Returns

_type_

_description_
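
Since the parameter and return descriptions above are unfilled, the following is only a sketch of what a YAML/JSON-compatible conversion commonly looks like. The helper name to_native and the set of handled types are assumptions, not the documented behaviour of to_serializable_dtype.

```python
import numpy as np


def to_native(value):
    """Hypothetical sketch: map NumPy and bytes values to plain Python types."""
    # Byte strings (including NumPy byte strings) become text.
    if isinstance(value, bytes):
        return value.decode('utf-8')
    # NumPy scalars become the matching Python scalar.
    if isinstance(value, np.generic):
        return value.item()
    # Arrays become (nested) lists, converted element by element.
    if isinstance(value, np.ndarray):
        return to_native(value.tolist())
    if isinstance(value, dict):
        return {str(key): to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    # Anything else is assumed to be serializable already.
    return value
```

The result of the sketch can be passed directly to json.dumps, which rejects most NumPy types in their original form.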