# SP2XR
This repository contains Python functions and template scripts to analyze SP2-XR data with the Python library Dask.
To be written
## Suggested structure for the data
A sketch of how these paths could be defined in Python is shown after the list.
- Campaign_name
  - **SP2XR_files**
    - 20200415
      (from here on it is the usual SP2XR structure; there is no need to unzip the files)
  - **SP2XR_pbp_parquet**
    Directory automatically generated by `read_csv_files_with_dask()`. It contains the pbp files converted to parquet and organized by date and hour; the single file names correspond to the original file names.
  - **SP2XR_hk_parquet**
    Directory automatically generated by `read_csv_files_with_dask()`. It contains the hk files converted to parquet and organized by date and hour; the single file names correspond to the original file names.
  - **SP2XR_pbp_processed**
    Directory automatically generated by `process_pbp_parquet`. It contains the processed pbp files (calibration applied, various distributions calculated, mixing state calculated, ...). By default the data are grouped to 1 s time resolution.
  - **SP2XR_pbp_processed_1min**
    Directory automatically generated by `resample_to_dt`. It contains files at the same processing level as SP2XR_pbp_processed but at the specified time resolution.
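The snippet below is only an illustration of the layout above; the base directory and the campaign name are placeholders to adapt to your own setup.

```python
from pathlib import Path

# Placeholder base directory: adapt to where your campaign data actually live.
base_dir = Path("/data/Campaign_name")

dir_sp2xr_files = base_dir / "SP2XR_files"                      # raw SP2-XR output
dir_pbp_parquet = base_dir / "SP2XR_pbp_parquet"                # pbp files as parquet
dir_hk_parquet = base_dir / "SP2XR_hk_parquet"                  # hk files as parquet
dir_pbp_processed = base_dir / "SP2XR_pbp_processed"            # processed pbp data (1 s)
dir_pbp_processed_1min = base_dir / "SP2XR_pbp_processed_1min"  # resampled processed data
```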
## Suggested structure for the analysis code
❔ Do you want to use git to track your analysis code?
- **Yes!**
  1. Create a repository for the analysis of your specific dataset (from here on this is referred to as the main repository).
  2. Add the SP2XR repository (this repository) to your main repository as a submodule.
  3. Copy the template file `processing_code.py` from the submodule to your main repository.
  4. Modify the template `processing_code.py` according to your needs (file paths, time resolution, calibration values, ...).
- **No, thanks.**
  1. Download this repository and place it in the same directory as your analysis scripts.
  2. Modify the template `processing_code.py` according to your needs (file paths, time resolution, calibration values, ...).

In either case `processing_code.py` needs to be able to import the SP2XR functions; a minimal import sketch is shown below.
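This sketch assumes the SP2XR checkout (submodule or plain copy) sits in a folder named `SP2XR` next to `processing_code.py`; both the folder name and the commented module name are placeholders, not the actual package layout of this repository.

```python
import sys
from pathlib import Path

# Make the SP2XR checkout importable from the analysis script.
# "SP2XR" is a placeholder folder name: use the name of your submodule or copy.
sys.path.append(str(Path(__file__).parent / "SP2XR"))

# from sp2xr_functions import read_csv_files_with_dask  # hypothetical module name
```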
## How to run `processing_code.py`
The code is organized in several blocks, one per processing step; this division makes it easier to troubleshoot problems in file reading/writing. The blocks correspond to the following actions (a sketch of how they map to function calls is given after the list):
1. Define paths and other variables.
2. Convert from .csv/.zip to parquet for
   a. PbP files
   b. HK files
3. Analyze the single-particle data (pbp data, not raw traces) at a user-defined time resolution. Operations performed in this block:
   - Read the specified config file (note: for the moment it is assumed that all the files processed in one go have the same config parameters).
   - Apply the scattering and incandescence calibration parameters.
   - Flag data according to config file parameters (e.g., `Incand Transit Time`, `Incand FWHM`, `Incand relPeak`, ...).
   - Flag data according to mixing state (see the dedicated section).
   - Resample the pbp data to the specified time resolution (1 s is the usual and suggested value). Some columns are summed (e.g., BC mass for the BC mass concentration) and some counted (e.g., BC mass for the BC number concentration, and the optical diameter of purely scattering particles for their number concentration).
   - Resample the flow columns in the hk data to the same time resolution.
   - Create a joint pbp_hk file at the specified time resolution.
   - Calculate distributions for the different flags. The time resolution is the same as for the computations above. Min, max and number of bins for time delay, [BC mass, BC number] and scattering number are defined by the user.
   - Merge pbp_hk and the distributions into one variable and save it as parquet files partitioned by date and hour.
4. Resample the pbp_processed data to another time resolution.
5. Convert from .sp2b to parquet. See the notes below for the analysis of the raw traces. This block can usually be skipped.
6. Process the sp2b.parquet files.
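As a rough orientation, blocks 2 to 4 map to the functions already mentioned in this README (`read_csv_files_with_dask()`, `process_pbp_parquet`, `resample_to_dt`). The sketch below only indicates where these calls would sit; the argument names are assumptions, not the actual signatures, so refer to the template `processing_code.py` for the real calls. The path variables are the ones from the path sketch further up.

```python
# Orientation sketch only: the commented calls use assumed argument names,
# not the real signatures; follow the template processing_code.py instead.
from dask.distributed import Client

# Optional: start a local Dask cluster; Dask otherwise uses its default scheduler.
client = Client()

# Block 2: raw .csv/.zip (PbP and HK) -> parquet, partitioned by date and hour
# read_csv_files_with_dask(dir_sp2xr_files, dir_pbp_parquet, dir_hk_parquet)

# Block 3: calibration, flagging, 1 s resampling, distributions, joint pbp_hk file
# process_pbp_parquet(dir_pbp_parquet, dir_hk_parquet, dir_pbp_processed, dt="1s")

# Block 4: resample the processed data to a coarser time resolution
# resample_to_dt(dir_pbp_processed, dir_pbp_processed_1min, dt="1min")
```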