Configuration Guide

Overview

SP2-XR uses configuration files to handle different instrument versions, data schemas, calibrations, and processing parameters.

Configuration System Structure

Primary Locations

  • config/ - Essential templates and examples
  • Auto-generated configs - Created by the sp2xr_generate_config.py script

Typical workflow:

  1. Auto-generate the data schema and instrument settings from your actual data
  2. Copy and customize the pipeline template for your workflow
  3. Validate with small datasets before full processing

Configuration File Types

1. Configuration Generation Tools

Features:

  • Automatically detects PbP and HK files (CSV, ZIP, Parquet)
  • Analyzes actual data to determine column names and types
  • Creates mapping templates for column standardization
  • Creates a .yaml file with the instrument settings

Auto-Generated Schemas

# Generate schema with instrument settings (recommended)
python scripts/sp2xr_generate_config.py /path/to/your/data \
  --mapping \
  --schema-output my_data_schema.yaml \
  --instrument-output my_instrument_settings.yaml

Benefits of auto-generation:

  • Analyzes actual data files to detect correct column names and types
  • Automatically finds and validates INI calibration files
  • Creates instrument settings with source traceability
  • Handles different SP2-XR instrument versions automatically

Generated Schema Structure

The auto-generated schema includes:

# Data type definitions (detected from your files)
pbp_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... all columns from your data

# Standard column names used by SP2-XR package
pbp_canonical_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... canonical column definitions

# Column mapping (your files → canonical names)
pbp_column_mapping:
  Time (sec): "Time (sec)"           # Exact match
  Scatter Size (nm): "Size_nm"       # Maps your column name
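
To make the mapping direction concrete, here is a minimal sketch of how such a mapping could be applied. It assumes, based on the comments in the example above, that each key is the canonical name and each value is the column name found in your files. The function `rename_to_canonical` and the sample row are hypothetical and not part of the SP2-XR package.

```python
# Sketch only: the mapping direction (canonical name -> name in your files)
# is inferred from the comments in the schema example, not from the package.
pbp_column_mapping = {
    "Time (sec)": "Time (sec)",        # exact match
    "Scatter Size (nm)": "Size_nm",    # file uses a different name
}


def rename_to_canonical(row, column_mapping):
    """Return a copy of `row` with file column names replaced by canonical ones.

    Columns that are not listed in the mapping are kept unchanged.
    """
    # Invert the mapping so we can look up by the name found in the file.
    file_to_canonical = {
        file_name: canonical for canonical, file_name in column_mapping.items()
    }
    return {file_to_canonical.get(name, name): value for name, value in row.items()}


# Hypothetical row as it might be read from a raw PbP file.
raw_row = {"Time (sec)": 12.5, "Size_nm": 210.0, "Extra": 1}
canonical_row = rename_to_canonical(raw_row, pbp_column_mapping)
print(canonical_row)
```

Keeping unmapped columns untouched is a design choice for this sketch; a stricter variant could raise an error when a canonical column has no counterpart in the file.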

Auto-Generated Instrument Settings

The generation script automatically converts INI calibration files to a structured YAML format containing:

  • Metadata: Source file path, generation timestamp, traceability
  • Instrument parameters: All settings from INI file
  • Signal saturation

metadata:
  source_ini_file: /full/path/to/calibration.ini
  generated_on: '2024-01-01T12:00:00'
  generated_by: sp2xr_generate_config.py

instrument_parameters:
  ScattTransitMin: 10.0
  IncTransitMin: 5.0
  # ... all INI parameters
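
The INI-to-YAML conversion can be pictured with the following sketch, which uses only the Python standard library and builds a dictionary with the same `metadata` / `instrument_parameters` layout. This is an illustration under stated assumptions, not the logic of sp2xr_generate_config.py: the function `ini_to_settings`, the `[Settings]` section name, and the rule of converting numeric strings to floats are all hypothetical.

```python
import configparser
from datetime import datetime


def ini_to_settings(ini_text, source_path):
    """Build a dict with the metadata / instrument_parameters layout shown above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case, e.g. "ScattTransitMin"
    parser.read_string(ini_text)

    parameters = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            # Assumption: numeric-looking values are stored as floats,
            # everything else is kept as a string.
            try:
                parameters[key] = float(value)
            except ValueError:
                parameters[key] = value

    return {
        "metadata": {
            "source_ini_file": source_path,
            "generated_on": datetime.now().isoformat(timespec="seconds"),
            "generated_by": "example sketch (not sp2xr_generate_config.py)",
        },
        "instrument_parameters": parameters,
    }


# Hypothetical INI content; the section name is invented for illustration.
example_ini = """
[Settings]
ScattTransitMin = 10.0
IncTransitMin = 5.0
"""
settings = ini_to_settings(example_ini, "/full/path/to/calibration.ini")
print(settings["instrument_parameters"])
```

Writing the resulting dictionary to a .yaml file would require a YAML library, which is left out here to keep the sketch dependency-free.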

2. Main Data Processing Configurations

Comprehensive Template

Copy and customize the complete pipeline template:

# Copy the template
cp config/complete_example.yaml my_campaign_pipeline.yaml

# Edit for your specific needs
nano my_campaign_pipeline.yaml

Pipeline Configuration Structure

The pipeline config includes all processing settings:

# File paths
paths:
  input_pbp: /path/to/SP2XR_pbp_parquet
  input_hk: /path/to/SP2XR_hk_parquet
  output: /path/to/SP2XR_processed_output
  instrument_config: /path/to/instrument_settings.yaml

# Workflow settings
workflow:
  conc: true              # Calculate concentrations
  BC_hist: true           # BC mass distributions
  scatt_hist: true        # Scattering size distributions
  dt: 60                  # Time resolution (seconds)

# Computing resources
cluster:
  use_local: false        # true for local, false for SLURM
  cores: 16               # CPU cores
  memory: 128GB           # Memory allocation

# Analysis parameters
histo:
  inc:                    # BC mass histograms
    min_mass: 0.3
    max_mass: 400
    n_bins: 50
  scatt:                  # Scattering histograms
    min_D: 100
    max_D: 500
    n_bins: 20

# Calibration parameters
calibration:
  incandescence:
    curve_type: "polynomial"
    parameters: [0.05, 2.047e-07]
  scattering:
    curve_type: "powerlaw"
    parameters: [17.22, 0.169, -1.494]

Key sections to customize:

  • paths - Update all file and directory paths
  • calibration - Use parameters from your instrument settings
  • cluster - Match your computing environment
  • workflow.dt - Set appropriate time resolution
  • histo - Configure size/mass distribution bins
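
Before running on a full dataset, it can help to sanity-check a customized pipeline config. The sketch below assumes the YAML has already been loaded into a Python dictionary with the section and key names shown in the example above. The function `check_pipeline_config` and its rules (positive `dt`, lower bound below upper bound, at least one bin) are hypothetical checks written for illustration, not validation performed by the SP2-XR package.

```python
REQUIRED_SECTIONS = ("paths", "workflow", "cluster", "histo", "calibration")


def check_pipeline_config(config):
    """Return a list of problems found; an empty list means none were detected."""
    problems = [
        f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in config
    ]

    # Time resolution must be a positive number of seconds.
    dt = config.get("workflow", {}).get("dt")
    if dt is not None and dt <= 0:
        problems.append("workflow.dt must be positive")

    # Each histogram needs a valid range and at least one bin.
    histo = config.get("histo", {})
    for kind, low_key, high_key in (
        ("inc", "min_mass", "max_mass"),
        ("scatt", "min_D", "max_D"),
    ):
        bins = histo.get(kind)
        if bins is None:
            continue
        low, high = bins.get(low_key), bins.get(high_key)
        if low is None or high is None or low >= high:
            problems.append(f"histo.{kind}: {low_key} must be below {high_key}")
        if bins.get("n_bins", 0) < 1:
            problems.append(f"histo.{kind}: n_bins must be at least 1")
    return problems


# Values copied from the example pipeline config above (abbreviated).
example_config = {
    "paths": {"input_pbp": "/path/to/SP2XR_pbp_parquet"},
    "workflow": {"conc": True, "dt": 60},
    "cluster": {"use_local": False, "cores": 16, "memory": "128GB"},
    "histo": {
        "inc": {"min_mass": 0.3, "max_mass": 400, "n_bins": 50},
        "scatt": {"min_D": 100, "max_D": 500, "n_bins": 20},
    },
    "calibration": {},
}
print(check_pipeline_config(example_config))  # prints []
```

Returning a list of messages rather than raising on the first error lets you fix several config mistakes in one pass.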