# Configuration Guide

## Overview

SP2-XR uses configuration files to handle different instrument versions, data schemas, calibrations, and processing parameters.

## Configuration System Structure

### Primary Locations

- **`config/`** - Essential templates and examples
- **Auto-generated configs** - Created by `sp2xr_generate_config.py` script

### Recommended Workflow

1. **Auto-generate** data schema and instrument settings from your actual data
2. **Copy and customize** pipeline template for your workflow
3. **Validate** with small datasets before full processing

### Configuration File Types

## 1. Configuration Generation Tools

**Features:**

- Automatically detects PbP and HK files (CSV, ZIP, Parquet)
- Analyzes actual data to determine column names and types
- Creates mapping templates for column standardization
- Creates a .yaml file with the instrument settings

### Auto-Generated Schemas

```bash
# Generate schema with instrument settings (recommended)
python scripts/sp2xr_generate_config.py /path/to/your/data \
    --mapping \
    --schema-output my_data_schema.yaml \
    --instrument-output my_instrument_settings.yaml
```

**Benefits of auto-generation:**

- Analyzes actual data files to detect correct column names and types
- Automatically finds and validates INI calibration files
- Creates instrument settings with source traceability
- Handles different SP2-XR instrument versions automatically

### Generated Schema Structure

The auto-generated schema includes:

```yaml
# Data type definitions (detected from your files)
pbp_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... all columns from your data

# Standard column names used by SP2-XR package
pbp_canonical_schema:
  Time (sec): float
  Scatter Size (nm): float
  # ... canonical column definitions

# Column mapping (your files → canonical names)
pbp_column_mapping:
  Time (sec): "Time (sec)"        # Exact match
  Scatter Size (nm): "Size_nm"    # Maps your column name
```

### Auto-Generated Instrument Settings

The generation script automatically converts INI calibration files to structured YAML format with:

- **Metadata**: Source file path, generation timestamp, traceability
- **Instrument parameters**: All settings from INI file
- **Signal saturation**

```yaml
metadata:
  source_ini_file: /full/path/to/calibration.ini
  generated_on: '2024-01-01T12:00:00'
  generated_by: sp2xr_generate_config.py

instrument_parameters:
  ScattTransitMin: 10.0
  IncTransitMin: 5.0
  # ... all INI parameters
```

## 2. Main Data Processing Configurations

### Comprehensive Template

Copy and customize the complete pipeline template:

```bash
# Copy the template
cp config/complete_example.yaml my_campaign_pipeline.yaml

# Edit for your specific needs
nano my_campaign_pipeline.yaml
```

### Pipeline Configuration Structure

The pipeline config includes all processing settings:

```yaml
# File paths
paths:
  input_pbp: /path/to/SP2XR_pbp_parquet
  input_hk: /path/to/SP2XR_hk_parquet
  output: /path/to/SP2XR_processed_output
  instrument_config: /path/to/instrument_settings.yaml

# Workflow settings
workflow:
  conc: true          # Calculate concentrations
  BC_hist: true       # BC mass distributions
  scatt_hist: true    # Scattering size distributions
  dt: 60              # Time resolution (seconds)

# Computing resources
cluster:
  use_local: false    # true for local, false for SLURM
  cores: 16           # CPU cores
  memory: 128GB       # Memory allocation

# Analysis parameters
histo:
  inc:                # BC mass histograms
    min_mass: 0.3
    max_mass: 400
    n_bins: 50
  scatt:              # Scattering histograms
    min_D: 100
    max_D: 500
    n_bins: 20

# Calibration parameters
calibration:
  incandescence:
    curve_type: "polynomial"
    parameters: [0.05, 2.047e-07]
  scattering:
    curve_type: "powerlaw"
    parameters: [17.22, 0.169, -1.494]
```

**Key sections to customize:**

- **`paths`** - Update all file and directory paths
- **`calibration`** - Use parameters from your instrument settings
- **`cluster`** - Match your computing environment
- **`workflow.dt`** - Set appropriate time resolution
- **`histo`** - Configure size/mass distribution bins
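
Before launching a full run, it can help to sanity-check a customized pipeline config. The following is a minimal sketch and not part of the SP2-XR package: the helper `check_pipeline_config`, the constants, and the rules it applies (positive `dt`, lower histogram bound below upper bound, positive integer `n_bins`) are assumptions made for illustration. It expects an already-parsed dictionary, for example the result of `yaml.safe_load` from PyYAML.

```python
from typing import Any, List

# Section and key names follow the pipeline template in this guide.
# The validation rules are illustrative assumptions, not SP2-XR behavior.
REQUIRED_SECTIONS = ("paths", "workflow", "cluster", "histo", "calibration")
REQUIRED_PATHS = ("input_pbp", "input_hk", "output", "instrument_config")
HISTO_BOUNDS = {"inc": ("min_mass", "max_mass"), "scatt": ("min_D", "max_D")}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _section(cfg: dict, name: str) -> dict:
    # Return the named sub-dictionary, or an empty dict if it is absent.
    value = cfg.get(name)
    return value if isinstance(value, dict) else {}


def check_pipeline_config(cfg: dict) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name in REQUIRED_SECTIONS:
        if not isinstance(cfg.get(name), dict):
            problems.append(f"missing section: {name}")

    paths = _section(cfg, "paths")
    for key in REQUIRED_PATHS:
        if not paths.get(key):
            problems.append(f"paths.{key} is not set")

    dt = _section(cfg, "workflow").get("dt")
    if not _is_number(dt) or dt <= 0:
        problems.append("workflow.dt must be a positive number of seconds")

    histo = _section(cfg, "histo")
    for name, (lo_key, hi_key) in HISTO_BOUNDS.items():
        block = _section(histo, name)
        lo, hi, n_bins = block.get(lo_key), block.get(hi_key), block.get("n_bins")
        if not (_is_number(lo) and _is_number(hi) and lo < hi):
            problems.append(f"histo.{name}: {lo_key} must be below {hi_key}")
        if not (_is_number(n_bins) and isinstance(n_bins, int) and n_bins > 0):
            problems.append(f"histo.{name}.n_bins must be a positive integer")
    return problems


# Dictionary mirroring the template values shown in this guide.
EXAMPLE_CONFIG = {
    "paths": {
        "input_pbp": "/path/to/SP2XR_pbp_parquet",
        "input_hk": "/path/to/SP2XR_hk_parquet",
        "output": "/path/to/SP2XR_processed_output",
        "instrument_config": "/path/to/instrument_settings.yaml",
    },
    "workflow": {"conc": True, "BC_hist": True, "scatt_hist": True, "dt": 60},
    "cluster": {"use_local": False, "cores": 16, "memory": "128GB"},
    "histo": {
        "inc": {"min_mass": 0.3, "max_mass": 400, "n_bins": 50},
        "scatt": {"min_D": 100, "max_D": 500, "n_bins": 20},
    },
    "calibration": {
        "incandescence": {"curve_type": "polynomial", "parameters": [0.05, 2.047e-07]},
        "scattering": {"curve_type": "powerlaw", "parameters": [17.22, 0.169, -1.494]},
    },
}

if __name__ == "__main__":
    for problem in check_pipeline_config(EXAMPLE_CONFIG):
        print(problem)
```

Returning a list of messages rather than raising on the first error lets you fix every issue in one editing pass. A check like this only covers the config structure; it does not replace a trial run on a small dataset.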