Commit Graph

107 Commits

Author SHA1 Message Date
02e913f24d update gitignore 2025-09-29 11:10:19 +02:00
00464994f4 Update dask client definition with serializer and deserializer 2025-09-29 11:08:40 +02:00
c3f23a873a Update the delete partition function to also delete the general metadata to prevent errors when rewriting parquet files 2025-09-29 11:07:20 +02:00
8d4e24c29b Update the helper function to reflect changes in run_config to process time slices of a dataset 2025-09-29 11:05:34 +02:00
b97be2dff3 Remove the dask client restart at the beginning of each time chunk processing 2025-09-29 11:03:13 +02:00
da275cdc97 feat: adapt sp2xr_pipeline and helpers to run multiple times across the same dataset and different time slots but ensuring config settings are the same for the entire dataset 2025-09-29 11:00:14 +02:00
203bd9d740 feat: add to run_config.yaml number of processes for the dask cluster and option to select start and end dates for processing 2025-09-29 10:09:04 +02:00
af814498bf Update .gitignore 2025-09-29 09:56:11 +02:00
872d2c5ac4 fix: Add retry logic and consistent partitioning for distributed processing 2025-09-12 10:29:15 +02:00
d3a7448883 Improve conversion from original csv or zip files to parquet with more robust schema definition 2025-09-11 16:18:21 +02:00
0829f1908e fix: fix import in sp2xr_pipeline.py after the changes in the calibration modules 2025-09-11 14:54:07 +02:00
0a71ca614c feat: modernize all type 2025-09-11 14:49:48 +02:00
a2df98042c refactor: reorganize calibration modules and add type hints 2025-09-11 14:45:06 +02:00
755656f8c7 Update pyproject.toml 2025-09-11 12:36:21 +02:00
a0666be19f Remove example_processing_code.py 2025-09-11 12:01:39 +02:00
641871a567 test: add test for path extraction from file directory 2025-09-09 19:14:30 +02:00
6621236ea4 fix: correct handling of file path structures in different operating systems 2025-09-09 19:13:14 +02:00
f437b1c5fe feat: add the possibility to decide the saving partition schema between date or date/hour 2025-09-09 17:09:38 +02:00
29e2351341 Feat: user can now decide frequency of repartition of dask dataframes after being loaded (both hk and pbp) 2025-09-09 16:03:42 +02:00
e946d4ff94 Chore: remove the scattering of the mass, size, time delay bins across dask workers outside of the client definition 2025-09-09 15:07:47 +02:00
b377c36c28 Chore: cleanup old code 2025-09-09 15:03:44 +02:00
3a41fbf387 fix: fix bug that was leading to extremely large dask graphs and move all histogram calculation logic to the distributio.py module 2025-09-09 14:53:19 +02:00
b91380a6db fix: increase Dask worker wait time to prevent premature shutdown 2025-09-09 14:43:14 +02:00
0e932e9a70 test: add parquet files to use for testing 2025-09-09 14:25:13 +02:00
0268a5460c fix: fix parquet saving of distributions (specify engine, write metadata, ...) 2025-08-25 15:07:13 +02:00
21e14ae2f1 chore: update run config 2025-08-22 18:20:26 +02:00
c1243e3b1e feat: improve cluster shutdown and cleanup logic 2025-08-22 18:18:44 +02:00
8a0c1f3305 feat: add TMPDIR for temporary files in cluster jobs 2025-08-22 18:15:39 +02:00
7bd42c22a2 feat: add sbatch file to run via Slurm 2025-08-22 18:13:50 +02:00
55efbc74a2 Cleanup: remove wreck code from scripts/sp2xr_apply_calibration.py 2025-08-22 16:43:58 +02:00
3173a7c83b Cleanup: remove wreck code from src/sp2xr/resample_pbp_hk.py 2025-08-22 16:42:02 +02:00
2da9eb6089 Cleanup: remove wreck code from src/sp2xr/helpers.py 2025-08-22 16:40:26 +02:00
d2a0533a12 Cleanup: remove wreck code from src/sp2xr/schema.py 2025-08-22 16:38:06 +02:00
4696c2cbb9 Cleanup: Remove wreck code from sp2xr_pipeline 2025-08-22 16:36:08 +02:00
681de6c203 Fix: add missing import of cast_and_arrow function to sp2xr_pipeline 2025-08-22 16:31:03 +02:00
40bae0e5d2 Refactor: move function _cast_and_arrow to schema.py 2025-08-22 16:19:42 +02:00
2704128e8d Test notebook moved out of project 2025-08-22 16:11:18 +02:00
43c31728e0 Fix: add control to skip empty ddf when time chunk is empty 2025-08-22 16:06:05 +02:00
8947046049 Feature: remove saving of ddf_pbp_hk_dt files 2025-08-22 16:03:43 +02:00
547c7f3108 Fix: removed default values in parse_args to avoid unexpected behavior when passing run_config settings 2025-08-22 16:01:01 +02:00
9c384f5245 Fix: Removed default values in the load_and_resolve_config function to avoid unexpected behavior when run_config doesn't provide settings 2025-08-22 15:59:48 +02:00
f42a308474 Feature: processing is now divided in time chunks to reduce size of dask graph 2025-08-22 11:37:58 +02:00
d6b3f2028f Fix: bug fixed in conversion from BC mass to diam due to density units mismatch in the config file and default values 2025-08-22 10:48:08 +02:00
bbd21ba7b9 Chore: add test data folder to gitignore 2025-08-21 11:58:39 +02:00
bf0e663449 Test: add temporary notebook for tests 2025-08-21 11:57:51 +02:00
ebd14bcbae Fix: typo in the config reading was blocking calibration 2025-08-21 11:39:11 +02:00
d7f778d531 Fix: the histograms were adding lines with NaNs when the corresponding partition was completely empty. Now it is back to the old behavior and no index is added for completely empty partitions. 2025-08-21 11:37:23 +02:00
063b01e73f Test: PbP and HK parquet files added for testing 2025-08-20 19:08:28 +02:00
a2cc520ff2 feat: possibility to choose between running locally and via slurm cluster 2025-08-14 11:48:52 +02:00
18b8635147 chore: moved config files part 2 2025-08-14 10:42:44 +02:00