Configuration Guide

Overview

SP2-XR uses configuration files to handle different instrument versions, data schemas, calibrations, and processing parameters.

Configuration System Structure

Primary Locations

config/ - Essential templates and examples
Auto-generated configs - Created by the scripts/sp2xr_generate_config.py script

Recommended Workflow

Auto-generate the data schema and instrument settings from your actual data
Copy the pipeline template and customize it for your workflow
Validate the configuration on a small dataset before full processing

Configuration File Types

1. Configuration Generation Tools

Features:

Automatically detects PbP and HK files (CSV, ZIP, Parquet)
Analyzes actual data to determine column names and types
Creates mapping templates for column standardization
Creates a .yaml file with the instrument settings

Auto-Generated Schemas

# Generate schema with instrument settings (recommended)
python scripts/sp2xr_generate_config.py /path/to/your/data \
  --mapping \
  --schema-output my_data_schema.yaml \
  --instrument-output my_instrument_settings.yaml

Benefits of auto-generation:

Analyzes actual data files to detect correct column names and types
Automatically finds and validates INI calibration files
Creates instrument settings with source traceability
Handles different SP2-XR instrument versions automatically

Generated Schema Structure

The auto-generated schema includes:

# Data type definitions (detected from your files)
pbp_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... all columns from your data

# Standard column names used by SP2-XR package
pbp_canonical_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... canonical column definitions

# Column mapping (your files → canonical names)
pbp_column_mapping:
  Time (sec): "Time (sec)"           # Exact match: file column already uses the canonical name
  Scatter Size (nm): "Size_nm"       # File column "Size_nm" is mapped to the canonical name
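
The mapping above is keyed by canonical name, with the column name found in your files as the value. A minimal sketch of how such a mapping could be applied to one record is shown below. The function name rename_to_canonical and the sample row are hypothetical and are not part of the SP2-XR package; the package's own renaming logic may differ.

```python
def rename_to_canonical(row, column_mapping):
    """Return a copy of `row` with file-specific column names replaced by canonical ones.

    `column_mapping` is assumed to follow the layout shown above:
    canonical name -> column name as it appears in the data file.
    """
    # Invert the mapping so we can look up canonical names by file column name.
    file_to_canonical = {
        file_name: canonical for canonical, file_name in column_mapping.items()
    }
    # Columns that are not listed in the mapping keep their original name.
    return {file_to_canonical.get(name, name): value for name, value in row.items()}


# Hypothetical mapping and record, mirroring the YAML example.
pbp_column_mapping = {
    "Time (sec)": "Time (sec)",
    "Scatter Size (nm)": "Size_nm",
}
raw_row = {"Time (sec)": 12.5, "Size_nm": 210.0, "Extra": 1}

canonical_row = rename_to_canonical(raw_row, pbp_column_mapping)
print(canonical_row)
```

Unmapped columns are passed through unchanged in this sketch, so nothing is silently dropped while the mapping is still being filled in.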

Auto-Generated Instrument Settings

The generation script automatically converts INI calibration files to structured YAML format with:

Metadata: Source file path, generation timestamp, traceability
Instrument parameters: All settings from INI file
Signal saturation

metadata:
  source_ini_file: /full/path/to/calibration.ini
  generated_on: '2024-01-01T12:00:00'
  generated_by: sp2xr_generate_config.py

instrument_parameters:
  ScattTransitMin: 10.0
  IncTransitMin: 5.0
  # ... all INI parameters
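
To illustrate the INI-to-settings conversion described above, here is a rough sketch that uses only the Python standard library. It is not the code of sp2xr_generate_config.py: the function name ini_to_settings, the [Calibration] section name, and the rule "convert to float when possible" are all assumptions made for this example.

```python
import configparser
from datetime import datetime


def ini_to_settings(ini_text, source_path):
    """Build a settings dictionary (metadata + parameters) from INI text."""
    # interpolation=None keeps literal '%' characters in values from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of parameter names
    parser.read_string(ini_text)

    parameters = {}
    for section in parser.sections():
        for key, raw_value in parser.items(section):
            # Assumption: numeric-looking values are stored as floats, the rest as text.
            try:
                parameters[key] = float(raw_value)
            except ValueError:
                parameters[key] = raw_value

    return {
        "metadata": {
            "source_ini_file": source_path,
            "generated_on": datetime.now().isoformat(timespec="seconds"),
            "generated_by": "ini_to_settings (illustrative sketch)",
        },
        "instrument_parameters": parameters,
    }


# Hypothetical INI content; the section name is invented for the example.
example_ini = """
[Calibration]
ScattTransitMin=10.0
IncTransitMin=5.0
"""

settings = ini_to_settings(example_ini, "/full/path/to/calibration.ini")
print(settings["instrument_parameters"])
```

The resulting dictionary has the same two top-level keys as the YAML above and could be written out with any YAML library.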

2. Main Data Processing Configurations

Comprehensive Template

Copy and customize the complete pipeline template:

# Copy the template
cp config/complete_example.yaml my_campaign_pipeline.yaml

# Edit for your specific needs
nano my_campaign_pipeline.yaml

Pipeline Configuration Structure

The pipeline config includes all processing settings:

# File paths
paths:
  input_pbp: /path/to/SP2XR_pbp_parquet
  input_hk: /path/to/SP2XR_hk_parquet
  output: /path/to/SP2XR_processed_output
  instrument_config: /path/to/instrument_settings.yaml

# Workflow settings
workflow:
  conc: true              # Calculate concentrations
  BC_hist: true           # BC mass distributions
  scatt_hist: true        # Scattering size distributions
  dt: 60                  # Time resolution (seconds)

# Computing resources
cluster:
  use_local: false        # true for local, false for SLURM
  cores: 16               # CPU cores
  memory: 128GB           # Memory allocation

# Analysis parameters
histo:
  inc:                    # BC mass histograms
    min_mass: 0.3
    max_mass: 400
    n_bins: 50
  scatt:                  # Scattering histograms
    min_D: 100
    max_D: 500
    n_bins: 20

# Calibration parameters
calibration:
  incandescence:
    curve_type: "polynomial"
    parameters: [0.05, 2.047e-07]
  scattering:
    curve_type: "powerlaw"
    parameters: [17.22, 0.169, -1.494]

Key sections to customize:

paths - Update all file and directory paths
calibration - Use parameters from your instrument settings
cluster - Match your computing environment
workflow.dt - Set appropriate time resolution
histo - Configure size/mass distribution bins
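
Before launching a full run, it can help to sanity-check the customized sections. The sketch below assumes the pipeline YAML has already been loaded into a plain dictionary (for example with a YAML parser) and applies a few illustrative checks. The function check_pipeline_config and its rules are hypothetical and are not the validation performed by the SP2-XR package.

```python
REQUIRED_SECTIONS = ("paths", "workflow", "cluster", "histo", "calibration")


def check_pipeline_config(config):
    """Return a list of problems found; an empty list means none were detected."""
    problems = [
        f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in config
    ]

    # Time resolution must be a positive number of seconds.
    dt = config.get("workflow", {}).get("dt")
    if not isinstance(dt, (int, float)) or dt <= 0:
        problems.append("workflow.dt must be a positive number of seconds")

    # Each histogram needs a valid range and at least one bin.
    limits = (("inc", "min_mass", "max_mass"), ("scatt", "min_D", "max_D"))
    for kind, low_key, high_key in limits:
        bins = config.get("histo", {}).get(kind, {})
        low, high = bins.get(low_key), bins.get(high_key)
        if low is None or high is None or low >= high:
            problems.append(f"histo.{kind}: {low_key} must be smaller than {high_key}")
        n_bins = bins.get("n_bins")
        if not isinstance(n_bins, int) or n_bins < 1:
            problems.append(f"histo.{kind}: n_bins must be a positive integer")

    return problems


# Dictionary mirroring the example pipeline configuration (paths shortened).
example_config = {
    "paths": {"input_pbp": "/path/to/SP2XR_pbp_parquet"},
    "workflow": {"conc": True, "BC_hist": True, "scatt_hist": True, "dt": 60},
    "cluster": {"use_local": False, "cores": 16, "memory": "128GB"},
    "histo": {
        "inc": {"min_mass": 0.3, "max_mass": 400, "n_bins": 50},
        "scatt": {"min_D": 100, "max_D": 500, "n_bins": 20},
    },
    "calibration": {},
}

print(check_pipeline_config(example_config))
```

Returning a list of messages rather than raising on the first error lets all problems in a hand-edited file be reported in one pass.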
