|
|
02e913f24d
|
update gitignore
|
2025-09-29 11:10:19 +02:00 |
|
|
|
00464994f4
|
Update dask client definition with serializer and deserializer
|
2025-09-29 11:08:40 +02:00 |
|
|
|
c3f23a873a
|
Update the delete partition function to also delete the general metadata, preventing errors when rewriting parquet files
|
2025-09-29 11:07:20 +02:00 |
|
|
|
8d4e24c29b
|
Update the helper function to reflect changes in run_config to process time slices of a dataset
|
2025-09-29 11:05:34 +02:00 |
|
|
|
b97be2dff3
|
Remove the dask client restart at the beginning of each time chunk processing
|
2025-09-29 11:03:13 +02:00 |
|
|
|
da275cdc97
|
feat: adapt sp2xr_pipeline and helpers to run multiple times over different time slots of the same dataset while ensuring config settings are the same for the entire dataset
|
2025-09-29 11:00:14 +02:00 |
|
|
|
203bd9d740
|
feat: add to run_config.yaml number of processes for the dask cluster and option to select start and end dates for processing
|
2025-09-29 10:09:04 +02:00 |
|
|
|
af814498bf
|
Update .gitignore
|
2025-09-29 09:56:11 +02:00 |
|
|
|
872d2c5ac4
|
fix: Add retry logic and consistent partitioning for distributed processing
|
2025-09-12 10:29:15 +02:00 |
|
|
|
d3a7448883
|
Improve conversion from original csv or zip files to parquet with more robust schema definition
|
2025-09-11 16:18:21 +02:00 |
|
|
|
0829f1908e
|
fix: fix import in sp2xr_pipeline.py after the changes in the calibration modules
|
2025-09-11 14:54:07 +02:00 |
|
|
|
0a71ca614c
|
feat: modernize all type
|
2025-09-11 14:49:48 +02:00 |
|
|
|
a2df98042c
|
refactor: reorganize calibration modules and add type hints
|
2025-09-11 14:45:06 +02:00 |
|
|
|
755656f8c7
|
Update pyproject.toml
|
2025-09-11 12:36:21 +02:00 |
|
|
|
a0666be19f
|
Remove example_processing_code.py
|
2025-09-11 12:01:39 +02:00 |
|
|
|
641871a567
|
test: add test for path extraction from file directory
|
2025-09-09 19:14:30 +02:00 |
|
|
|
6621236ea4
|
fix: correct handling of file path structures in different operating systems
|
2025-09-09 19:13:14 +02:00 |
|
|
|
f437b1c5fe
|
feat: add option to choose the saving partition schema, either date or date/hour
|
2025-09-09 17:09:38 +02:00 |
|
|
|
29e2351341
|
Feat: user can now choose the repartition frequency of dask dataframes after loading (both hk and pbp)
|
2025-09-09 16:03:42 +02:00 |
|
|
|
e946d4ff94
|
Chore: remove the scattering of the mass, size, time delay bins across dask workers outside of the client definition
|
2025-09-09 15:07:47 +02:00 |
|
|
|
b377c36c28
|
Chore: cleanup old code
|
2025-09-09 15:03:44 +02:00 |
|
|
|
3a41fbf387
|
fix: fix bug that was leading to extremely large dask graphs and move all histogram calculation logic to the distributio.py module
|
2025-09-09 14:53:19 +02:00 |
|
|
|
b91380a6db
|
fix: increase Dask worker wait time to prevent premature shutdown
|
2025-09-09 14:43:14 +02:00 |
|
|
|
0e932e9a70
|
test: add parquet files to use for testing
|
2025-09-09 14:25:13 +02:00 |
|
|
|
0268a5460c
|
fix: fix parquet saving of distributions (specify engine, write metadata, ...)
|
2025-08-25 15:07:13 +02:00 |
|
|
|
21e14ae2f1
|
chore: update run config
|
2025-08-22 18:20:26 +02:00 |
|
|
|
c1243e3b1e
|
feat: improve cluster shutdown and cleanup logic
|
2025-08-22 18:18:44 +02:00 |
|
|
|
8a0c1f3305
|
feat: add TMPDIR for temporary files in cluster jobs
|
2025-08-22 18:15:39 +02:00 |
|
|
|
7bd42c22a2
|
feat: add sbatch file to run via Slurm
|
2025-08-22 18:13:50 +02:00 |
|
|
|
55efbc74a2
|
Cleanup: remove wreck code from scripts/sp2xr_apply_calibration.py
|
2025-08-22 16:43:58 +02:00 |
|
|
|
3173a7c83b
|
Cleanup: remove wreck code from src/sp2xr/resample_pbp_hk.py
|
2025-08-22 16:42:02 +02:00 |
|
|
|
2da9eb6089
|
Cleanup: remove wreck code from src/sp2xr/helpers.py
|
2025-08-22 16:40:26 +02:00 |
|
|
|
d2a0533a12
|
Cleanup: remove wreck code from src/sp2xr/schema.py
|
2025-08-22 16:38:06 +02:00 |
|
|
|
4696c2cbb9
|
Cleanup: Remove wreck code from sp2xr_pipeline
|
2025-08-22 16:36:08 +02:00 |
|
|
|
681de6c203
|
Fix: add missing import of cast_and_arrow function to sp2xr_pipeline
|
2025-08-22 16:31:03 +02:00 |
|
|
|
40bae0e5d2
|
Refactor: move function _cast_and_arrow to schema.py
|
2025-08-22 16:19:42 +02:00 |
|
|
|
2704128e8d
|
Test notebook moved out of project
|
2025-08-22 16:11:18 +02:00 |
|
|
|
43c31728e0
|
Fix: add control to skip empty ddf when time chunk is empty
|
2025-08-22 16:06:05 +02:00 |
|
|
|
8947046049
|
Feature: remove saving of ddf_pbp_hk_dt files
|
2025-08-22 16:03:43 +02:00 |
|
|
|
547c7f3108
|
Fix: removed default values in parse_args to avoid unexpected behavior when passing run_config settings
|
2025-08-22 16:01:01 +02:00 |
|
|
|
9c384f5245
|
Fix: Removed default values in the load_and_resolve_config function to avoid unexpected behavior when run_config doesn't provide settings
|
2025-08-22 15:59:48 +02:00 |
|
|
|
f42a308474
|
Feature: processing is now divided in time chunks to reduce size of dask graph
|
2025-08-22 11:37:58 +02:00 |
|
|
|
d6b3f2028f
|
Fix: bug in conversion from BC mass to diameter caused by a density units mismatch between the config file and default values
|
2025-08-22 10:48:08 +02:00 |
|
|
|
bbd21ba7b9
|
Chore: add test data folder to gitignore
|
2025-08-21 11:58:39 +02:00 |
|
|
|
bf0e663449
|
Test: add temporary notebook for tests
|
2025-08-21 11:57:51 +02:00 |
|
|
|
ebd14bcbae
|
Fix: typo in the config reading was blocking calibration
|
2025-08-21 11:39:11 +02:00 |
|
|
|
d7f778d531
|
Fix: the histograms were adding lines with NaNs when the corresponding partition was completely empty. The old behavior is restored: no index is added for completely empty partitions.
|
2025-08-21 11:37:23 +02:00 |
|
|
|
063b01e73f
|
Test: PbP and HK parquet files added for testing
|
2025-08-20 19:08:28 +02:00 |
|
|
|
a2cc520ff2
|
feat: possibility to choose between running locally and via slurm cluster
|
2025-08-14 11:48:52 +02:00 |
|
|
|
18b8635147
|
chore: moved config files part 2
|
2025-08-14 10:42:44 +02:00 |
|