Scripts Reference
Available Scripts
scripts/sp2xr_pipeline.py
Main processing pipeline that orchestrates the complete data analysis workflow.
Usage:
python scripts/sp2xr_pipeline.py --config path/to/config.yaml [options]
Command-line Options:
- --config - Path to YAML configuration file (required)
- --set KEY=VALUE - Override config values using dot notation (e.g., --set dt=60; see the sketch after the feature list)
- Additional cluster configuration options (see --help)
Features:
- Distributed processing with Dask (local or SLURM cluster)
- Automatic time chunking and partition management
- Calibration application and quality flagging
- Distribution calculations and time resampling
- Concentration calculations
- Output partitioned by date and hour
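For reference, a --set override amounts to writing a value into the loaded config dict along a dotted key path. Below is a minimal sketch of that behaviour, assuming hypothetical keys dt and cluster.cores; it is illustrative, not the pipeline's actual internals:

```python
import copy

def apply_override(config: dict, dotted_key: str, value):
    """Set a dotted key such as 'cluster.cores' inside a nested dict."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated

# Roughly what `--set dt=60 --set cluster.cores=32` would do to a loaded config:
config = {"dt": 1, "cluster": {"cores": 8}}
config = apply_override(config, "dt", 60)
config = apply_override(config, "cluster.cores", 32)
print(config)  # {'dt': 60, 'cluster': {'cores': 32}}
```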
scripts/sp2xr_csv2parquet.py
Batch conversion of raw CSV/ZIP files to Parquet format with support for both local and cluster processing.
Usage:
# Local processing (automatic resource detection)
python scripts/sp2xr_csv2parquet.py \
--source /path/to/csv/files \
--target /path/to/parquet/output \
--config config.yaml \
--filter "PbP" \
--local
# SLURM cluster processing
python scripts/sp2xr_csv2parquet.py \
--source /path/to/csv/files \
--target /path/to/parquet/output \
--config config.yaml \
--filter "PbP" \
--cores 32 --memory 64GB --partition general
# Process housekeeping files with custom chunk size
python scripts/sp2xr_csv2parquet.py \
--source /path/to/csv/files \
--target /path/to/parquet/output \
--config config.yaml \
--filter "hk" \
--chunk 50 \
--local
Features:
- Supports both local and SLURM cluster execution
- Automatic resource detection for local processing
- Configurable batch processing with chunking
- Progress tracking and error handling
- Graceful shutdown with signal handling
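Conceptually, each batch item boils down to reading a CSV (pandas also reads zipped CSVs transparently) and writing Parquet. A minimal serial sketch of that step, without the chunking and cluster machinery — paths are placeholders, and the real reader applies the schema config:

```python
from pathlib import Path

import pandas as pd  # writing Parquet requires pyarrow or fastparquet

source = Path("/path/to/csv/files")
target = Path("/path/to/parquet/output")
target.mkdir(parents=True, exist_ok=True)

# Mirror `--filter "PbP"`: only convert files whose name contains the tag.
for csv_path in sorted(source.rglob("*.csv")):
    if "PbP" not in csv_path.name:
        continue
    df = pd.read_csv(csv_path)
    df.to_parquet(target / f"{csv_path.stem}.parquet")
```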
scripts/sp2xr_apply_calibration.py
Apply calibration parameters to particle data using Dask + SLURM.
Usage:
python scripts/sp2xr_apply_calibration.py \
--input /path/to/data.parquet \
--config /path/to/config.yaml \
--output /path/to/calibrated.parquet \
[--cores 32] [--memory 64GB] [--walltime 02:00:00] [--partition daily]
Options:
- --input - Input Parquet file or directory (required)
- --config - YAML calibration configuration (required)
- --output - Output directory for calibrated Parquet dataset (required)
- --cores - Cores per SLURM job (default: 32)
- --memory - Memory per job (default: 64GB)
- --walltime - Wall-time limit (default: 02:00:00)
- --partition - SLURM partition (default: daily)
Features:
- Parallel processing with Dask + SLURM cluster
- Automatic scaling and resource management
- Partitioned output by date and hour
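In outline, applying a calibration means evaluating a fitted curve over a signal column and writing the result back out partitioned. A hedged Dask sketch of that idea — the column names, the quadratic form, and the coefficients are placeholders standing in for values from the YAML config:

```python
import dask.dataframe as dd

# Hypothetical quadratic scattering calibration: diameter = a*x**2 + b*x + c.
a, b, c = 1.2e-9, 3.4e-5, 0.0  # placeholder coefficients

ddf = dd.read_parquet("/path/to/data.parquet")
peak = ddf["scatter_peak"]  # placeholder signal column
ddf["opt_diam"] = a * peak**2 + b * peak + c

# Partitioned output by date and hour (assumes those columns exist in the data)
ddf.to_parquet("/path/to/calibrated.parquet", partition_on=["date", "hour"])
```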
scripts/sp2xr_generate_config.py
Generate schema configuration files by automatically detecting SP2XR files in a directory.
Usage:
# Generate basic schema config from current directory
python scripts/sp2xr_generate_config.py .
# Generate config from specific directory
python scripts/sp2xr_generate_config.py /path/to/sp2xr/data
# Generate config with column mapping support (for non-standard column names)
python scripts/sp2xr_generate_config.py /path/to/data --mapping
# Specify custom schema and instrument settings output filenames
python scripts/sp2xr_generate_config.py /path/to/data \
--schema-output my_schema.yaml \
--instrument-output my_settings.yaml
# Generate mapping config with custom output names
python scripts/sp2xr_generate_config.py /path/to/data --mapping \
--schema-output campaign_schema.yaml \
--instrument-output campaign_settings.yaml
# Use specific files instead of auto-detection
python scripts/sp2xr_generate_config.py . --pbp-file data.pbp.csv --hk-file data.hk.csv
Options:
- directory - Directory containing SP2XR files (PbP and HK files)
- --schema-output, -s - Output filename for the data schema config (default: config_schema.yaml)
- --instrument-output, -i - Output filename for the instrument settings config (default: {schema_output}_instrument_settings.yaml)
- --mapping, -m - Generate a config with column mapping support (creates canonical column mappings)
- --pbp-file - Use a specific PbP file instead of auto-detection
- --hk-file - Use a specific HK file instead of auto-detection
Features:
- Automatic file detection (searches recursively for PbP and HK files)
- Schema inference from CSV/ZIP/Parquet files
- Column mapping support for non-standard column names
- Automatic INI file detection and conversion to YAML
- Validates INI file consistency across multiple files
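At its core, schema inference is dtype sampling. A rough sketch of the idea — the filename and the emitted YAML layout are illustrative, and the generated configs contain more structure than this:

```python
import pandas as pd
import yaml

# Sample the first rows of a PbP file and record column -> dtype pairs.
sample = pd.read_csv("data.pbp.csv", nrows=1000)  # placeholder filename
schema = {col: str(dtype) for col, dtype in sample.dtypes.items()}

with open("config_schema.yaml", "w") as fh:
    yaml.safe_dump({"pbp": {"columns": schema}}, fh, sort_keys=False)
```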
scripts/sp2xr_ini2yaml.py
Convert legacy INI calibration files to YAML format.
Usage:
python scripts/sp2xr_ini2yaml.py input.ini output.yaml
Arguments:
- ini - Input .ini calibration file
- yaml - Output .yaml file path
Features:
- Converts SP2-XR instrument .ini files to editable YAML format
- Preserves all calibration parameters and settings
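For a sense of what the conversion involves, a minimal sketch using the standard library's configparser; note that configparser reads every value as a string, so the real script may additionally coerce numeric types:

```python
import configparser

import yaml

def ini_to_yaml(ini_path: str, yaml_path: str) -> None:
    """Convert an INI file into a section -> {key: value} YAML document."""
    parser = configparser.ConfigParser()
    parser.read(ini_path)
    data = {section: dict(parser[section]) for section in parser.sections()}
    with open(yaml_path, "w") as fh:
        yaml.safe_dump(data, fh, sort_keys=False)

ini_to_yaml("input.ini", "output.yaml")
```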
calibration_workflow.ipynb
Interactive Jupyter notebook for determining instrument-specific calibration coefficients.
Purpose:
- Analyze calibration standards (PSL spheres, Aquadag, etc.) to derive scattering and incandescence calibration curves
- Iterative process with visualization for quality control
- Generate calibration parameters for use in configuration files
Workflow:
- Load calibration standard measurements
- Plot raw signals vs. known particle properties
- Fit calibration curves (polynomial, power-law, etc.)
- Export calibration coefficients to YAML configuration
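To make the curve-fitting step concrete, here is a minimal sketch of a power-law fit with SciPy. The calibration points and the functional form are placeholders; the notebook's actual fits depend on the standard being analyzed:

```python
import numpy as np
from scipy.optimize import curve_fit

# Placeholder calibration points: peak signal vs. known particle diameter (nm)
signal = np.array([120.0, 450.0, 980.0, 2100.0])
diameter = np.array([150.0, 240.0, 320.0, 450.0])

def power_law(x, a, b):
    return a * np.power(x, b)

(a, b), _ = curve_fit(power_law, signal, diameter, p0=(1.0, 0.5))
print(f"calibration: diameter = {a:.4g} * signal**{b:.4g}")
```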
scripts/run_sp2xr_pipeline.sbatch
SLURM batch job script for running the pipeline on HPC systems.
Features:
- Configurable resource allocation
- Automatic scratch directory management
- Module loading and environment activation
- Error and output logging
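A rough sketch of what such a script typically contains; the job name, partition, module names, and paths below are site-specific placeholders rather than the shipped script:

```bash
#!/bin/bash
#SBATCH --job-name=sp2xr_pipeline
#SBATCH --partition=general        # placeholder partition
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --time=02:00:00
#SBATCH --output=logs/%x_%j.out    # stdout log per job
#SBATCH --error=logs/%x_%j.err     # stderr log per job

set -euo pipefail

# Scratch directory management (removed again on exit)
SCRATCH_DIR="${SCRATCH:-/tmp}/sp2xr_${SLURM_JOB_ID}"
mkdir -p "$SCRATCH_DIR"
trap 'rm -rf "$SCRATCH_DIR"' EXIT

# Module loading and environment activation (site-specific)
module load anaconda
source activate sp2xr

python scripts/sp2xr_pipeline.py --config config.yaml
```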