updated readme and reader to handle ignore ascii character errors

2024-05-24 15:55:15 +02:00
parent 9329f39deb
commit c7051bfe69
2 changed files with 22 additions and 4 deletions
--- a/README.md
+++ b/README.md
@ -20,11 +20,25 @@ Follow these steps to install and set up the project:
  ```
  conda env create -f environment.yml
  ```
-4. Activate the created environment by running the appropriate command based on your preferred Python IDE or terminal:
+4. Activate the created environment by running the following command:
+  ```
+  conda activate multiphase_chemistry_env
+  ```
+5. Once the enviroment is activated, register the associated kernel in jupyter by running:

-  * Jupyter Notebook/Lab: When starting a new notebook, select the `multiphase_chemistry_env` environment from the kernel options.
+  ```
+  python -m ipykernel install --user --name multiphase_chemistry_env --display-name "Python (multiphase_chemistry_env)"  
+  ```

-  * Visual Studio Code (VS Code): After opening your project in VS Code, click on the Python interpreter in the status bar and choose the `multiphase_chemistry_env` environment.
+
+6. Start a Jupyter Notebook by running the command:
+  ```
+  jupyter notebook
+  ```
+  and select the `multiphase_chemistry_env` environment from the kernel options.
+
+
+7. Otherwise, for Visual Studio Code (VS Code), after opening your project in VS Code, click on the Python interpreter in the status bar and choose the `multiphase_chemistry_env` environment.

 ## Data integration workflow

--- a/src/g5505_file_reader.py
+++ b/src/g5505_file_reader.py
@ -175,8 +175,12 @@ def read_txt_files_as_dict(filename : str ):

    if table_preamble:
        max_length = max(len(item) for item in table_preamble)
+        # Convert the strings to bytes with utf-8 encoding, specifying errors='ignore' to skip characters that cannot be encoded
+        table_preamble_bytes = [item.encode('utf-8', errors='ignore') for item in table_preamble]
+    
+
        utf8_type = h5py.string_dtype('utf-8', max_length)
-        header_dict["table_preamble"] = np.array(table_preamble,dtype=utf8_type) 
+        header_dict["table_preamble"] = np.array(table_preamble_bytes,dtype=utf8_type) 

   
    # TODO: it does not work with separator as none :(. fix for RGA