# Scripts Reference

## Available Scripts

### `scripts/sp2xr_pipeline.py`

Main processing pipeline that orchestrates the complete data analysis workflow.

**Usage:**

```bash
python scripts/sp2xr_pipeline.py --config path/to/config.yaml [options]
```

**Command-line Options:**

- `--config` - Path to YAML configuration file (required)
- `--set KEY=VALUE` - Override config values using dot notation (e.g., `--set dt=60`)
- Additional cluster configuration options (see `--help`)

**Features:**

- Distributed processing with Dask (local or SLURM cluster)
- Automatic time chunking and partition management
- Calibration application and quality flagging
- Distribution calculations and time resampling
- Concentration calculations
- Output partitioned by date and hour
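As an illustration of how `--set KEY=VALUE` overrides work, the sketch below applies a dot-notation key to a nested configuration dictionary. The helper name and config layout are hypothetical, not the pipeline's actual internals:

```python
# Hypothetical sketch of a dot-notation config override; the helper name
# and config layout are illustrative, not the pipeline's actual internals.
import yaml


def apply_override(config: dict, key: str, value: str) -> None:
    """Set a (possibly nested) config entry from a dot-notation key."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = yaml.safe_load(value)  # "60" -> 60, "true" -> True


with open("config.yaml") as fh:
    config = yaml.safe_load(fh)
apply_override(config, "dt", "60")  # equivalent to --set dt=60
```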
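Because the output is partitioned by date and hour, downstream analysis can read only the partitions it needs. A minimal sketch, assuming Hive-style partition columns named `date` and `hour` (the actual partition names may differ):

```python
# Read back a date/hour-partitioned Parquet dataset with Dask, pruning
# partitions at read time. The partition column names are assumptions.
import dask.dataframe as dd

df = dd.read_parquet(
    "/path/to/pipeline/output",
    filters=[("date", "==", "2024-06-01")],
)
print(df.head())
```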
### `scripts/sp2xr_csv2parquet.py`

Batch conversion of raw CSV/ZIP files to Parquet format with support for both local and cluster processing.

**Usage:**

```bash
# Local processing (automatic resource detection)
python scripts/sp2xr_csv2parquet.py \
    --source /path/to/csv/files \
    --target /path/to/parquet/output \
    --config config.yaml \
    --filter "PbP" \
    --local

# SLURM cluster processing
python scripts/sp2xr_csv2parquet.py \
    --source /path/to/csv/files \
    --target /path/to/parquet/output \
    --config config.yaml \
    --filter "PbP" \
    --cores 32 --memory 64GB --partition general

# Process housekeeping files with custom chunk size
python scripts/sp2xr_csv2parquet.py \
    --source /path/to/csv/files \
    --target /path/to/parquet/output \
    --config config.yaml \
    --filter "hk" \
    --chunk 50 \
    --local
```

**Features:**

- Supports both local and SLURM cluster execution
- Automatic resource detection for local processing
- Configurable batch processing with chunking
- Progress tracking and error handling
- Graceful shutdown with signal handling

### `scripts/sp2xr_apply_calibration.py`

Apply calibration parameters to particle data using Dask + SLURM.

**Usage:**

```bash
python scripts/sp2xr_apply_calibration.py \
    --input /path/to/data.parquet \
    --config /path/to/config.yaml \
    --output /path/to/calibrated.parquet \
    [--cores 32] [--memory 64GB] [--walltime 02:00:00] [--partition daily]
```

**Options:**

- `--input` - Input Parquet file or directory (required)
- `--config` - YAML calibration configuration (required)
- `--output` - Output directory for calibrated Parquet dataset (required)
- `--cores` - Cores per SLURM job (default: 32)
- `--memory` - Memory per job (default: 64GB)
- `--walltime` - Wall-time limit (default: 02:00:00)
- `--partition` - SLURM partition (default: daily)

**Features:**

- Parallel processing with Dask + SLURM cluster
- Automatic scaling and resource management
- Partitioned output by date and hour

### `scripts/sp2xr_generate_config.py`

Generate schema configuration files by automatically detecting SP2XR files in a directory.

**Usage:**

```bash
# Generate basic schema config from current directory
python scripts/sp2xr_generate_config.py .

# Generate config from a specific directory
python scripts/sp2xr_generate_config.py /path/to/sp2xr/data

# Generate config with column mapping support (for non-standard column names)
python scripts/sp2xr_generate_config.py /path/to/data --mapping

# Specify custom schema and instrument settings output filenames
python scripts/sp2xr_generate_config.py /path/to/data \
    --schema-output my_schema.yaml \
    --instrument-output my_settings.yaml

# Generate mapping config with custom output names
python scripts/sp2xr_generate_config.py /path/to/data --mapping \
    --schema-output campaign_schema.yaml \
    --instrument-output campaign_settings.yaml

# Use specific files instead of auto-detection
python scripts/sp2xr_generate_config.py . --pbp-file data.pbp.csv --hk-file data.hk.csv
```

**Options:**

- `directory` - Directory containing SP2XR files (PbP and HK files)
- `--schema-output`, `-s` - Output filename for the data schema config (default: `config_schema.yaml`)
- `--instrument-output`, `-i` - Output filename for the instrument settings config (default: `{schema_output}_instrument_settings.yaml`)
- `--mapping`, `-m` - Generate config with column mapping support (creates canonical column mappings)
- `--pbp-file` - Use a specific PbP file instead of auto-detection
- `--hk-file` - Use a specific HK file instead of auto-detection

**Features:**

- Automatic file detection (searches recursively for PbP and HK files)
- Schema inference from CSV/ZIP/Parquet files
- Column mapping support for non-standard column names
- Automatic INI file detection and conversion to YAML
- Validates INI file consistency across multiple files

### `scripts/sp2xr_ini2yaml.py`

Convert legacy INI calibration files to YAML format.

**Usage:**

```bash
python scripts/sp2xr_ini2yaml.py input.ini output.yaml
```

**Arguments:**

- `ini` - Input .ini calibration file
- `yaml` - Output .yaml file path

**Features:**

- Converts SP2-XR instrument .ini files to editable YAML format
- Preserves all calibration parameters and settings

### `calibration_workflow.ipynb`

Interactive Jupyter notebook for determining instrument-specific calibration coefficients.

**Purpose:**

- Analyze calibration standards (PSL spheres, Aquadag, etc.) to derive scattering and incandescence calibration curves
- Iterative process with visualization for quality control
- Generate calibration parameters for use in configuration files

**Workflow:**

1. Load calibration standard measurements
2. Plot raw signals vs. known particle properties
3. Fit calibration curves (polynomial, power-law, etc.)
4. Export calibration coefficients to YAML configuration

### `scripts/run_sp2xr_pipeline.sbatch`

SLURM batch job script for running the pipeline on HPC systems.

**Features:**

- Configurable resource allocation
- Automatic scratch directory management
- Module loading and environment activation
- Error and output logging
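## Example Sketches

The sketches below are rough approximations of what the scripts above do internally; helper names, column layouts, YAML keys, and partition names are assumptions, not the package's actual API.

For `scripts/sp2xr_csv2parquet.py`, each conversion step is, at its core, a CSV read followed by a Parquet write:

```python
# Minimal sketch of one CSV -> Parquet conversion step. Real handling of
# ZIP archives, schemas, and chunked batches lives in the script itself.
import pandas as pd

df = pd.read_csv("data.pbp.csv")  # pandas also reads .zip archives directly
df.to_parquet("data.pbp.parquet", engine="pyarrow", index=False)
```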
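For `scripts/sp2xr_apply_calibration.py`, the `--cores`, `--memory`, `--walltime`, and `--partition` options map naturally onto a `dask_jobqueue.SLURMCluster`. A minimal sketch; the script's actual cluster setup may differ:

```python
# Sketch of a Dask + SLURM cluster mirroring the script's CLI defaults.
# The script's actual cluster configuration may differ.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=32,             # --cores
    memory="64GB",        # --memory
    walltime="02:00:00",  # --walltime
    queue="daily",        # --partition
)
cluster.scale(jobs=4)  # request four SLURM jobs
client = Client(cluster)
```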
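For `scripts/sp2xr_generate_config.py`, schema inference amounts to sampling a file and recording column names and dtypes. A hypothetical sketch (the YAML key and structure are assumptions):

```python
# Hypothetical sketch of schema inference: sample a CSV, then dump the
# inferred column -> dtype mapping to YAML. The top-level key is assumed;
# the real script also handles ZIP/Parquet inputs and column mapping.
import pandas as pd
import yaml

sample = pd.read_csv("data.pbp.csv", nrows=100)
schema = {col: str(dtype) for col, dtype in sample.dtypes.items()}
with open("config_schema.yaml", "w") as fh:
    yaml.safe_dump({"pbp": schema}, fh)
```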
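For `scripts/sp2xr_ini2yaml.py`, the conversion is essentially `configparser` in, PyYAML out. A minimal sketch; the script may additionally coerce value types and validate settings:

```python
# Minimal INI -> YAML conversion using the standard library plus PyYAML.
# The actual script may coerce value types and preserve more structure.
import configparser
import yaml

ini = configparser.ConfigParser()
ini.read("input.ini")
data = {section: dict(ini[section]) for section in ini.sections()}
with open("output.yaml", "w") as fh:
    yaml.safe_dump(data, fh)
```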
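For `calibration_workflow.ipynb`, step 3 of the workflow (fitting calibration curves) can be as simple as a power-law fit of known particle size against measured signal. The numbers below are made-up placeholders, not real calibration data:

```python
# Illustrative power-law calibration fit: known PSL diameters vs. peak
# scattering signal. All numbers are made-up placeholders.
import numpy as np

known_diameter_nm = np.array([150.0, 200.0, 300.0, 400.0])
peak_signal = np.array([1.2e3, 3.1e3, 1.1e4, 2.9e4])

# A linear fit in log-log space is equivalent to a power law.
coeffs = np.polyfit(np.log(peak_signal), np.log(known_diameter_nm), deg=1)


def signal_to_diameter_nm(signal):
    """Map a peak signal to a diameter via the fitted power law."""
    return np.exp(np.polyval(coeffs, np.log(signal)))


print(signal_to_diameter_nm(5.0e3))
```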