# Configuration Guide

## Overview

SP2-XR uses configuration files to handle different instrument versions, data schemas, calibrations, and processing parameters.

## Configuration System Structure

### Primary Locations
- **`config/`** - Essential templates and examples
- **Auto-generated configs** - Created by the `sp2xr_generate_config.py` script

### Recommended Workflow
1. **Auto-generate** the data schema and instrument settings from your actual data
2. **Copy and customize** the pipeline template for your workflow
3. **Validate** with small datasets before full processing

### Configuration File Types

## 1. Configuration Generation Tools

**Features:**
- Automatically detects PbP and HK files (CSV, ZIP, Parquet)
- Analyzes actual data to determine column names and types
- Creates mapping templates for column standardization
- Creates a `.yaml` file with the instrument settings

### Auto-Generated Schemas

```bash
# Generate schema with instrument settings (recommended)
python scripts/sp2xr_generate_config.py /path/to/your/data \
    --mapping \
    --schema-output my_data_schema.yaml \
    --instrument-output my_instrument_settings.yaml
```

**Benefits of auto-generation:**
- Analyzes actual data files to detect correct column names and types
- Automatically finds and validates INI calibration files
- Creates instrument settings with source traceability
- Handles different SP2-XR instrument versions automatically

### Generated Schema Structure
The auto-generated schema includes:

```yaml
# Data type definitions (detected from your files)
pbp_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... all columns from your data

# Standard column names used by SP2-XR package
pbp_canonical_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... canonical column definitions

# Column mapping (your files → canonical names)
pbp_column_mapping:
  Time (sec): "Time (sec)"        # Exact match
  Scatter Size (nm): "Size_nm"    # Maps your column name
```
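
To illustrate how such a mapping could be applied, here is a minimal sketch that renames the keys of a single record to canonical names. The function `canonicalize_columns`, the sample record, and the `Extra` column are hypothetical and not part of the SP2-XR package. The sketch assumes that mapping keys are canonical names and mapping values are the column names found in your files, as in the example above.

```python
def canonicalize_columns(row, column_mapping):
    """Rename the keys of one record from file-specific to canonical names.

    Hypothetical helper for illustration; not part of the SP2-XR package.
    """
    # The mapping is stored as canonical name -> file column name,
    # so invert it to look up the canonical name of each file column.
    file_to_canonical = {
        file_col: canonical for canonical, file_col in column_mapping.items()
    }
    # Columns without a mapping entry keep their original name.
    return {file_to_canonical.get(name, name): value for name, value in row.items()}


pbp_column_mapping = {
    "Time (sec)": "Time (sec)",      # exact match
    "Scatter Size (nm)": "Size_nm",  # file column differs from canonical name
}

raw_row = {"Time (sec)": 12.5, "Size_nm": 210.0, "Extra": 1}
canonical_row = canonicalize_columns(raw_row, pbp_column_mapping)
print(canonical_row)
```

Leaving unmapped columns untouched is a design choice of this sketch; the package itself may drop or reject them.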

### Auto-Generated Instrument Settings
The generation script automatically converts INI calibration files to structured YAML format with:

- **Metadata**: Source file path, generation timestamp, traceability
- **Instrument parameters**: All settings from INI file
- **Signal saturation**

```yaml
metadata:
  source_ini_file: /full/path/to/calibration.ini
  generated_on: '2024-01-01T12:00:00'
  generated_by: sp2xr_generate_config.py

instrument_parameters:
  ScattTransitMin: 10.0
  IncTransitMin: 5.0
  # ... all INI parameters
```
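
The conversion idea can be sketched with the Python standard library alone. This is not the implementation used by `sp2xr_generate_config.py`: the function `ini_to_settings`, the `[Calibration]` section name, the `Label` key, and the choice to flatten all INI sections into one dictionary are assumptions made for illustration.

```python
import configparser
from datetime import datetime


def ini_to_settings(ini_text, source_path):
    """Build a metadata + instrument_parameters dictionary from INI text.

    Hypothetical sketch; the real generation script may behave differently.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case, e.g. "ScattTransitMin"
    parser.read_string(ini_text)

    parameters = {}
    for section in parser.sections():
        for key, raw_value in parser.items(section):
            try:
                parameters[key] = float(raw_value)
            except ValueError:
                parameters[key] = raw_value  # keep non-numeric values as text

    return {
        "metadata": {
            "source_ini_file": source_path,
            "generated_on": datetime.now().isoformat(timespec="seconds"),
            "generated_by": "ini_to_settings (illustrative sketch)",
        },
        "instrument_parameters": parameters,
    }


# Hypothetical INI content; real calibration files contain many more keys.
example_ini = """
[Calibration]
ScattTransitMin = 10.0
IncTransitMin = 5.0
Label = demo
"""

settings = ini_to_settings(example_ini, "/full/path/to/calibration.ini")
print(settings["instrument_parameters"])
```

Setting `optionxform = str` matters here because `configparser` lowercases option names by default, which would turn `ScattTransitMin` into `scatttransitmin`.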

## 2. Main Data Processing Configurations

### Comprehensive Template
Copy and customize the complete pipeline template:

```bash
# Copy the template
cp config/complete_example.yaml my_campaign_pipeline.yaml

# Edit for your specific needs
nano my_campaign_pipeline.yaml
```

### Pipeline Configuration Structure
The pipeline config includes all processing settings:

```yaml
# File paths
paths:
  input_pbp: /path/to/SP2XR_pbp_parquet
  input_hk: /path/to/SP2XR_hk_parquet
  output: /path/to/SP2XR_processed_output
  instrument_config: /path/to/instrument_settings.yaml

# Workflow settings
workflow:
  conc: true          # Calculate concentrations
  BC_hist: true       # BC mass distributions
  scatt_hist: true    # Scattering size distributions
  dt: 60              # Time resolution (seconds)

# Computing resources
cluster:
  use_local: false    # true for local, false for SLURM
  cores: 16           # CPU cores
  memory: 128GB       # Memory allocation

# Analysis parameters
histo:
  inc:                # BC mass histograms
    min_mass: 0.3
    max_mass: 400
    n_bins: 50
  scatt:              # Scattering histograms
    min_D: 100
    max_D: 500
    n_bins: 20

# Calibration parameters
calibration:
  incandescence:
    curve_type: "polynomial"
    parameters: [0.05, 2.047e-07]
  scattering:
    curve_type: "powerlaw"
    parameters: [17.22, 0.169, -1.494]
```

**Key sections to customize:**
- **`paths`** - Update all file and directory paths
- **`calibration`** - Use parameters from your instrument settings
- **`cluster`** - Match your computing environment
- **`workflow.dt`** - Set appropriate time resolution
- **`histo`** - Configure size/mass distribution bins
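
Before a full run, a customized pipeline config can be sanity-checked once it has been loaded into a dictionary (for example with a YAML parser). The sketch below is a hypothetical check, not part of the SP2-XR package: the function `find_config_problems` and the list of required sections are assumptions derived from the example template above, and the package itself may accept fewer sections or apply different rules.

```python
# Sections assumed to be required, based on the example template.
REQUIRED_SECTIONS = ("paths", "workflow", "cluster", "histo", "calibration")


def find_config_problems(config):
    """Return a list of human-readable problems found in a pipeline config dict."""
    problems = [
        f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in config
    ]

    # The time resolution must be a positive number (booleans are rejected).
    dt = config.get("workflow", {}).get("dt")
    if isinstance(dt, bool) or not isinstance(dt, (int, float)) or dt <= 0:
        problems.append("workflow.dt must be a positive number of seconds")

    # Each histogram needs a lower bound below its upper bound and at least one bin.
    histogram_keys = (("inc", "min_mass", "max_mass"), ("scatt", "min_D", "max_D"))
    for name, low_key, high_key in histogram_keys:
        hist = config.get("histo", {}).get(name, {})
        low, high, n_bins = hist.get(low_key), hist.get(high_key), hist.get("n_bins")
        if low is None or high is None or low >= high:
            problems.append(f"histo.{name}: {low_key} must be smaller than {high_key}")
        if isinstance(n_bins, bool) or not isinstance(n_bins, int) or n_bins < 1:
            problems.append(f"histo.{name}: n_bins must be a positive integer")
    return problems


# Example config with two deliberate mistakes: no calibration section,
# and swapped scattering bounds.
config = {
    "paths": {"input_pbp": "/path/to/SP2XR_pbp_parquet"},
    "workflow": {"dt": 60},
    "cluster": {"use_local": False},
    "histo": {
        "inc": {"min_mass": 0.3, "max_mass": 400, "n_bins": 50},
        "scatt": {"min_D": 500, "max_D": 100, "n_bins": 20},
    },
}

for problem in find_config_problems(config):
    print(problem)
```

Collecting all problems in a list, rather than raising on the first one, lets you fix a config in a single pass.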