131 Commits

Author SHA1 Message Date
35f2876e62 Update config files 2025-11-21 15:00:26 +01:00
dfcb0a67fc Fix some values 2025-11-21 11:02:58 +01:00
29d4197f45 Add ipykernel and matplotlib as optional dependencies 2025-11-21 10:51:59 +01:00
e362624ee4 Fix data path and improve plot layout 2025-11-21 10:44:03 +01:00
faf255feee Update processed data with correct calibration 2025-11-21 10:36:02 +01:00
10e6a8b31e Add notebook for data visualization 2025-11-21 09:37:58 +01:00
6ab90b1a56 New handling of conversion from CSV/ZIP to Parquet via config file 2025-11-19 17:00:07 +01:00
2a5c4aec17 Remove redundant prints 2025-11-19 16:01:15 +01:00
8d9f5bf852 Ints are now inferred as floats to handle potential NA values 2025-11-19 15:43:58 +01:00
fc9cf7c861 Remove warnings for log10(0) in histogram calculation 2025-11-17 16:44:14 +01:00
5a52ca2fdd Add local option for converting csv to parquet 2025-11-17 16:07:36 +01:00
0717931c6d Update default path for config files 2025-11-17 15:50:06 +01:00
71db651ba4 fix: Fix version number to v2.0.0 2025-09-30 09:19:07 +02:00
cbe0e3c484 Rename calibration example notebook 2025-09-30 01:17:35 +02:00
b50c96ad57 Update version number 2025-09-30 01:01:03 +02:00
9f3f50b151 Update toml file to include bokeh for dask dashboard visualization 2025-09-30 00:59:39 +02:00
566b728db2 Update readme and documentation 2025-09-30 00:58:43 +02:00
ba9670b1df Remove old test data files 2025-09-30 00:56:17 +02:00
bea19f5ed8 Reorganize data for testing 2025-09-30 00:51:21 +02:00
ae2a0731c1 Silence error for log10(0) when expected 2025-09-30 00:33:01 +02:00
3431bc8a4d Change date format to string for compatibility with Windows systems 2025-09-30 00:31:59 +02:00
d46f3319f3 Remove toolkit_legacy.py and references to it 2025-09-30 00:27:40 +02:00
cde421edda cleanup config organization 2025-09-30 00:23:12 +02:00
40ba49a61f update gitignore 2025-09-29 11:12:38 +02:00
02e913f24d update gitignore 2025-09-29 11:10:19 +02:00
00464994f4 Update dask client definition with serializer and deserializer 2025-09-29 11:08:40 +02:00
c3f23a873a Update the delete partition function to delete also the general metadata to prevent errors when rewriting parquet files 2025-09-29 11:07:20 +02:00
8d4e24c29b Update the helper function to reflect changes in run_config to process time slices of a dataset 2025-09-29 11:05:34 +02:00
b97be2dff3 Remove the dask client restart at the beginning of each time chunk processing 2025-09-29 11:03:13 +02:00
da275cdc97 feat: adapt sp2xr_pipeline and helpers to run multiple times across the same dataset and different time slots but ensuring config settings are the same for the entire dataset 2025-09-29 11:00:14 +02:00
203bd9d740 feat: add to run_config.yaml number of processes for the dask cluster and option to select start and end dates for processing 2025-09-29 10:09:04 +02:00
af814498bf Update .gitignore 2025-09-29 09:56:11 +02:00
872d2c5ac4 fix: Add retry logic and consistent partitioning for distributed processing 2025-09-12 10:29:15 +02:00
d3a7448883 Improve conversion from original csv or zip files to parquet with more robust schema definition 2025-09-11 16:18:21 +02:00
0829f1908e fix: fix import in sp2xr_pipeline.py after the changes in the calibration modules 2025-09-11 14:54:07 +02:00
0a71ca614c feat: modernize all type 2025-09-11 14:49:48 +02:00
a2df98042c refactor: reorganize calibration modules and add type hints 2025-09-11 14:45:06 +02:00
755656f8c7 Update pyproject.toml 2025-09-11 12:36:21 +02:00
a0666be19f Remove example_processing_code.py 2025-09-11 12:01:39 +02:00
641871a567 test: add test for path extraction from file directory 2025-09-09 19:14:30 +02:00
6621236ea4 fix: correct handling of file path structures in different operating systems 2025-09-09 19:13:14 +02:00
f437b1c5fe feat: add the possibility to decide the saving partition schema between date or date/hour 2025-09-09 17:09:38 +02:00
29e2351341 Feat: user can now decide frequency of repartition of dask dataframes after being loaded (both hk and pbp) 2025-09-09 16:03:42 +02:00
e946d4ff94 Chore: remove the scattering of the mass, size, time delay bins across dask workers outside of the client definition 2025-09-09 15:07:47 +02:00
b377c36c28 Chore: cleanup old code 2025-09-09 15:03:44 +02:00
3a41fbf387 fix: fix bug that was leading to extremely large dask graphs and move all histogram calculation logic to the distributio.py module 2025-09-09 14:53:19 +02:00
b91380a6db fix: increase Dask worker wait time to prevent premature shutdown 2025-09-09 14:43:14 +02:00
0e932e9a70 test: add parquet files to use for testing 2025-09-09 14:25:13 +02:00
0268a5460c fix: fix parquet saving of distributions (specify engine, write metadata, ...) 2025-08-25 15:07:13 +02:00
21e14ae2f1 chore: update run config 2025-08-22 18:20:26 +02:00