# Data Processing Workflow

## Recommended Directory Structure

```
Campaign_name/
├── SP2XR_files/                 # Raw instrument files (.csv/.zip)
│   └── 20200415/                # Organized by date
├── SP2XR_pbp_parquet/           # Converted particle data
├── SP2XR_hk_parquet/            # Converted housekeeping data
└── SP2XR_pbp_processed_1min/    # Calibrated and processed data (user-defined resolution)
```

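The layout above can be prepared ahead of time with a few lines of Python. This is only a sketch: `campaign_paths` and its `create` flag are hypothetical helpers, not part of the package, and the `1min` suffix stands in for whatever resolution you choose.

```python
from pathlib import Path

# Folder names copied from the recommended layout; the "1min" suffix is user-defined.
SUBDIRS = (
    "SP2XR_files",
    "SP2XR_pbp_parquet",
    "SP2XR_hk_parquet",
    "SP2XR_pbp_processed_1min",
)


def campaign_paths(root, campaign_name, create=False):
    """Return a {folder name: Path} mapping, optionally creating the folders."""
    campaign = Path(root) / campaign_name
    paths = {name: campaign / name for name in SUBDIRS}
    if create:
        for path in paths.values():
            path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `campaign_paths("/data", "Campaign_name", create=True)` would build the tree once; later calls are harmless because existing folders are left untouched.
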
## Processing Steps

### 1. CSV to Parquet Conversion (`sp2xr_csv2parquet.py`)

- Convert raw CSV/ZIP files to Parquet format
- Separate particle-by-particle (PbP) and housekeeping (HK) data streams
- Apply data schema and column mapping transformations
- Organize by date and hour for efficient querying

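To make the stream separation and date/hour organization concrete, here is a minimal sketch. It is not the logic of `sp2xr_csv2parquet.py`: the file-name rule in `classify_stream` and the `date=.../hour=...` folder naming in `partition_dir` are assumptions chosen for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def classify_stream(filename):
    """Guess the data stream from the file name (hypothetical naming rule)."""
    lowered = filename.lower()
    if "pbp" in lowered:
        return "pbp"
    if "hk" in lowered:
        return "hk"
    raise ValueError(f"cannot classify {filename!r}")


def partition_dir(root, timestamp):
    """Return root/date=YYYY-MM-DD/hour=H for a timestamp (assumed layout)."""
    return (
        PurePosixPath(root)
        / f"date={timestamp:%Y-%m-%d}"
        / f"hour={timestamp.hour}"
    )
```

Partitioning by date and hour lets a reader open only the folders that overlap a requested time window instead of scanning the whole campaign.
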
### 2. Data Loading and Preparation (`sp2xr_pipeline.py`)

- Load PbP and HK Parquet files with time-based filtering
- Repartition data for optimal parallel processing
- Resample housekeeping data to a user-defined time resolution (e.g., 1 s, 60 s)

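The resampling step can be pictured as averaging every housekeeping reading that falls into the same time bin. The sketch below uses plain Python and timezone-aware timestamps; `floor_to_bin` and `resample_mean` are hypothetical names, and the averaging rule itself is an assumption about how the pipeline resamples.

```python
from collections import defaultdict
from datetime import datetime, timezone


def floor_to_bin(ts, dt_seconds):
    """Floor a timezone-aware datetime to the start of its dt-second bin."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % dt_seconds, tz=timezone.utc)


def resample_mean(records, dt_seconds):
    """Average (timestamp, value) pairs within each dt-second bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in records:
        key = floor_to_bin(ts, dt_seconds)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

With `dt_seconds=60`, readings at 12:00:00 and 12:00:30 are averaged into the 12:00 bin, while a reading at 12:01:10 lands in the 12:01 bin.
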
### 3. Calibration and Quality Control

- Apply scattering and incandescence calibrations to raw particle signals
- Convert signals to physical units (diameter in nm, mass in fg)
- Flag particles based on instrument quality control parameters
- Calculate mixing state classifications using the time-delay method
- Merge calibrated particle data with resampled flow measurements

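As a rough illustration of the calibration and flagging steps, the sketch below converts peak heights to physical units and attaches two flags. Everything specific here is an assumption: the polynomial form of both calibrations, the coefficient values a caller would pass, the valid mass range, and the time-lag threshold used to mark a particle as thickly coated. Real calibration curves come from your own instrument calibration.

```python
def apply_polynomial(signal, coefficients):
    """Evaluate c0 + c1*x + c2*x**2 + ... using Horner's scheme."""
    result = 0.0
    for coefficient in reversed(coefficients):
        result = result * signal + coefficient
    return result


def calibrate_particle(incand_peak, scatt_peak, time_lag,
                       incand_coeffs, scatt_coeffs,
                       mass_range_fg, lag_threshold):
    """Convert raw peak heights to physical units and flag the particle.

    All coefficients, ranges, and thresholds are caller-supplied placeholders.
    """
    bc_mass_fg = apply_polynomial(incand_peak, incand_coeffs)
    scatt_diam_nm = apply_polynomial(scatt_peak, scatt_coeffs)
    low, high = mass_range_fg
    return {
        "bc_mass_fg": bc_mass_fg,
        "scatt_diam_nm": scatt_diam_nm,
        # Quality flag: is the mass inside the calibrated range?
        "mass_in_range": low <= bc_mass_fg <= high,
        # Time-delay method: a long lag between the scattering and
        # incandescence peaks is read as a thick coating.
        "thickly_coated": time_lag >= lag_threshold,
    }
```

Keeping the flags as separate boolean fields, rather than dropping particles immediately, leaves the filtering decision to the later aggregation steps.
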
### 4. Time Aggregation and Summary Statistics

- Aggregate particle-by-particle data into time bins of width `dt`
- Calculate summary statistics (counts, means) for each time bin
- Join aggregated PbP data with resampled HK data

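The aggregation and join can be sketched as a group-by on the bin start time followed by a left join with the housekeeping table. The field names (`t`, `bc_mass_fg`, `flow`) and both helper functions are hypothetical; times are given in seconds for simplicity.

```python
from collections import defaultdict


def aggregate_particles(particles, dt_seconds):
    """Return {bin start: {"count", "mean_mass_fg"}} for dt-second bins."""
    masses = defaultdict(list)
    for particle in particles:
        bin_start = int(particle["t"] // dt_seconds) * dt_seconds
        masses[bin_start].append(particle["bc_mass_fg"])
    return {
        bin_start: {"count": len(vals), "mean_mass_fg": sum(vals) / len(vals)}
        for bin_start, vals in sorted(masses.items())
    }


def join_with_hk(aggregated, hk_by_bin):
    """Left-join aggregated particle rows with resampled HK values by bin start."""
    return {
        bin_start: {**row, "flow": hk_by_bin.get(bin_start)}
        for bin_start, row in aggregated.items()
    }
```

A bin with particles but no housekeeping record keeps its statistics and gets `flow=None`, which makes missing flow data easy to spot before concentrations are computed.
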
### 5. Bulk Concentrations (optional, if `conc: true`)

- Compute number and mass concentrations for different particle types
- Calculate size-resolved concentrations for different coating states
- Account for flow rate corrections and sampling efficiency

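A concentration is a per-bin total divided by the air volume sampled during that bin. The sketch below assumes the flow is given in cm³/min and that sampling efficiency is a single multiplicative factor; both are assumptions, and the function names are hypothetical.

```python
def sampled_volume_cm3(flow_cm3_per_min, dt_seconds):
    """Air volume sampled during one time bin."""
    return flow_cm3_per_min * dt_seconds / 60.0


def number_concentration(count, flow_cm3_per_min, dt_seconds, efficiency=1.0):
    """Particles per cm^3, corrected by an assumed sampling efficiency."""
    volume = sampled_volume_cm3(flow_cm3_per_min, dt_seconds)
    return count / (volume * efficiency)


def mass_concentration_ng_m3(total_mass_fg, flow_cm3_per_min, dt_seconds,
                             efficiency=1.0):
    """Mass concentration in ng/m^3 (numerically equal to fg/cm^3)."""
    volume = sampled_volume_cm3(flow_cm3_per_min, dt_seconds)
    return total_mass_fg / (volume * efficiency)
```

For example, 30 particles counted in 60 s at 120 cm³/min correspond to 120 cm³ of sampled air, so the number concentration is 0.25 cm⁻³.
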
### 6. Size and Mass Distributions (optional, if `*_hist: true`)

- Compute size distributions (dNdlogDsc) for scattering-only particles
- Compute mass distributions (dNdlogDmev, dMdlogDmev) for BC-containing particles
- Calculate time-lag distributions for mixing state analysis
- Support user-configurable bin edges and ranges

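The distributions above are normalized histograms: counts per bin divided by the sampled volume and by the bin width in log10 of diameter. The sketch below shows that normalization in plain Python, along with a mass-to-diameter conversion. The BC density of 1.8 g/cm³ is a commonly used value assumed here, not something this README specifies, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def mass_to_dmev_nm(mass_fg, density_g_cm3=1.8):
    """Mass-equivalent diameter in nm of a sphere with the given mass in fg."""
    volume_nm3 = mass_fg * 1e6 / density_g_cm3  # 1 fg / (1 g/cm^3) = 1e6 nm^3
    return (6.0 * volume_nm3 / math.pi) ** (1.0 / 3.0)


def dn_dlogd(diameters, edges, volume_cm3):
    """Return dN/dlogD per bin for diameters inside [edges[0], edges[-1])."""
    counts = [0] * (len(edges) - 1)
    for d in diameters:
        if edges[0] <= d < edges[-1]:
            counts[bisect_right(edges, d) - 1] += 1
    return [
        c / (volume_cm3 * (math.log10(hi) - math.log10(lo)))
        for c, lo, hi in zip(counts, edges, edges[1:])
    ]
```

A mass distribution follows the same pattern, with the per-bin particle mass summed in place of the particle count.
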
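## Known Limitations

1. **Sampling frequency**: The pipeline currently assumes that all BC-containing and BC-free particles are recorded in the PbP files. Data saved with a reduced sampling frequency (only a fraction of particles recorded) are not yet accounted for.
2. **Distribution calculations**: Histogram calculations may be incorrect when the time resolution is not 1 s. Process the data at 1 s resolution first, then resample to the desired resolution.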