Configuration Guide
Overview
SP2-XR uses configuration files to handle different instrument versions, data schemas, calibrations, and processing parameters.
Configuration System Structure
Primary Locations
- config/ - Essential templates and examples
- Auto-generated configs - Created by the sp2xr_generate_config.py script
Recommended Workflow
- Auto-generate data schema and instrument settings from your actual data
- Copy and customize pipeline template for your workflow
- Validate with small datasets before full processing
Configuration File Types
1. Configuration Generation Tools
Features:
- Automatically detects PbP and HK files (CSV, ZIP, Parquet)
- Analyzes actual data to determine column names and types
- Creates mapping templates for column standardization
- Creates a .yaml file with the instrument settings
Auto-Generated Schemas
# Generate schema with instrument settings (recommended)
python scripts/sp2xr_generate_config.py /path/to/your/data \
--mapping \
--schema-output my_data_schema.yaml \
--instrument-output my_instrument_settings.yaml
Benefits of auto-generation:
- Analyzes actual data files to detect correct column names and types
- Automatically finds and validates INI calibration files
- Creates instrument settings with source traceability
- Handles different SP2-XR instrument versions automatically
Generated Schema Structure
The auto-generated schema includes:
# Data type definitions (detected from your files)
pbp_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... all columns from your data
# Standard column names used by SP2-XR package
pbp_canonical_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... canonical column definitions
# Column mapping (your files → canonical names)
pbp_column_mapping:
  Time (sec): "Time (sec)"        # Exact match
  Scatter Size (nm): "Size_nm"    # Maps your column name
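As a rough illustration of how such a mapping could be applied, the sketch below renames the keys of each data row. It assumes that the keys of pbp_column_mapping are the canonical names and the values are the names found in your files, as the "Size_nm" example suggests; the helper rename_to_canonical is hypothetical and not part of the SP2-XR package.

```python
def rename_to_canonical(rows, column_mapping):
    """Rename row keys from file column names to canonical names."""
    # Invert the mapping: file column name -> canonical name.
    file_to_canonical = {src: canon for canon, src in column_mapping.items()}
    return [
        # Columns without a mapping entry keep their original name.
        {file_to_canonical.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


pbp_column_mapping = {
    "Time (sec)": "Time (sec)",        # exact match
    "Scatter Size (nm)": "Size_nm",    # file column differs from canonical
}

rows = [{"Time (sec)": 0.5, "Size_nm": 180.0}]
renamed = rename_to_canonical(rows, pbp_column_mapping)
```

Columns that do not appear in the mapping are passed through unchanged, which keeps the sketch safe to run on files with extra columns.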
Auto-Generated Instrument Settings
The generation script automatically converts INI calibration files to structured YAML format with:
- Metadata: Source file path, generation timestamp, traceability
- Instrument parameters: All settings from INI file
- Signal saturation
metadata:
  source_ini_file: /full/path/to/calibration.ini
  generated_on: '2024-01-01T12:00:00'
  generated_by: sp2xr_generate_config.py
instrument_parameters:
  ScattTransitMin: 10.0
  IncTransitMin: 5.0
  # ... all INI parameters
2. Main Data Processing Configurations
Comprehensive Template
Copy and customize the complete pipeline template:
# Copy the template
cp config/complete_example.yaml my_campaign_pipeline.yaml
# Edit for your specific needs
nano my_campaign_pipeline.yaml
Pipeline Configuration Structure
The pipeline config includes all processing settings:
# File paths
paths:
  input_pbp: /path/to/SP2XR_pbp_parquet
  input_hk: /path/to/SP2XR_hk_parquet
  output: /path/to/SP2XR_processed_output
  instrument_config: /path/to/instrument_settings.yaml
# Workflow settings
workflow:
  conc: true          # Calculate concentrations
  BC_hist: true       # BC mass distributions
  scatt_hist: true    # Scattering size distributions
  dt: 60              # Time resolution (seconds)
# Computing resources
cluster:
  use_local: false    # true for local, false for SLURM
  cores: 16           # CPU cores
  memory: 128GB       # Memory allocation
# Analysis parameters
histo:
  inc:                # BC mass histograms
    min_mass: 0.3
    max_mass: 400
    n_bins: 50
  scatt:              # Scattering histograms
    min_D: 100
    max_D: 500
    n_bins: 20
# Calibration parameters
calibration:
  incandescence:
    curve_type: "polynomial"
    parameters: [0.05, 2.047e-07]
  scattering:
    curve_type: "powerlaw"
    parameters: [17.22, 0.169, -1.494]
Key sections to customize:
- paths - Update all file and directory paths
- calibration - Use parameters from your instrument settings
- cluster - Match your computing environment
- workflow.dt - Set appropriate time resolution
- histo - Configure size/mass distribution bins
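Before running a full campaign, it can help to sanity-check a customized pipeline config. The sketch below is a hypothetical checker, not something shipped with SP2-XR: it works on an already-loaded dictionary (for example one produced by PyYAML's yaml.safe_load) and the specific rules it enforces are assumptions based on the sections shown above.

```python
REQUIRED_SECTIONS = ("paths", "workflow", "cluster", "histo", "calibration")


def check_pipeline_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in config
    ]

    dt = config.get("workflow", {}).get("dt")
    if not isinstance(dt, (int, float)) or dt <= 0:
        problems.append("workflow.dt must be a positive number of seconds")

    histo = config.get("histo", {})
    for name, lo_key, hi_key in (
        ("inc", "min_mass", "max_mass"),
        ("scatt", "min_D", "max_D"),
    ):
        section = histo.get(name, {})
        lo, hi = section.get(lo_key), section.get(hi_key)
        if lo is None or hi is None or lo >= hi:
            problems.append(f"histo.{name}: {lo_key} must be smaller than {hi_key}")
        n_bins = section.get("n_bins")
        if not isinstance(n_bins, int) or n_bins < 1:
            problems.append(f"histo.{name}: n_bins must be a positive integer")

    return problems


# Dictionary mirroring the example pipeline configuration.
config = {
    "paths": {"input_pbp": "/path/to/SP2XR_pbp_parquet"},
    "workflow": {"conc": True, "BC_hist": True, "scatt_hist": True, "dt": 60},
    "cluster": {"use_local": False, "cores": 16, "memory": "128GB"},
    "histo": {
        "inc": {"min_mass": 0.3, "max_mass": 400, "n_bins": 50},
        "scatt": {"min_D": 100, "max_D": 500, "n_bins": 20},
    },
    "calibration": {
        "incandescence": {"curve_type": "polynomial", "parameters": [0.05, 2.047e-07]},
        "scattering": {"curve_type": "powerlaw", "parameters": [17.22, 0.169, -1.494]},
    },
}
problems = check_pipeline_config(config)
```

Returning a list of messages rather than raising on the first error lets you see every issue in one pass while editing the file.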