# Scripts Reference

## Available Scripts

### `scripts/sp2xr_pipeline.py`

Main processing pipeline that orchestrates the complete data analysis workflow.

**Usage:**

```bash
python scripts/sp2xr_pipeline.py --config path/to/config.yaml [options]
```

**Command-line Options:**

- `--config` - Path to YAML configuration file (required)
- `--set KEY=VALUE` - Override config values using dot notation (e.g., `--set dt=60`)
- Additional cluster configuration options (see `--help`)

**Features:**

- Distributed processing with Dask (local or SLURM cluster)
- Automatic time chunking and partition management
- Calibration application and quality flagging
- Distribution calculations and time resampling
- Concentration calculations
- Output partitioned by date and hour

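The `--set KEY=VALUE` override can be pictured as a dotted-path update into the loaded config dictionary. A minimal sketch of the idea (the `apply_override` helper and the config keys are illustrative, not the script's actual code):

```python
import ast

def apply_override(config: dict, expr: str) -> dict:
    """Apply one KEY=VALUE override; KEY may use dot notation."""
    key, _, raw = expr.partition("=")
    try:
        value = ast.literal_eval(raw)  # "60" -> 60, "1.5" -> 1.5
    except (ValueError, SyntaxError):
        value = raw                    # keep non-literal strings as-is
    node = config
    *parents, leaf = key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config

config = {"dt": 1, "cluster": {"cores": 8}}
apply_override(config, "dt=60")            # --set dt=60
apply_override(config, "cluster.cores=32")  # --set cluster.cores=32
```
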
### `scripts/sp2xr_csv2parquet.py`

Batch conversion of raw CSV/ZIP files to Parquet format with support for both local and cluster processing.

**Usage:**

```bash
# Local processing (automatic resource detection)
python scripts/sp2xr_csv2parquet.py \
    --source /path/to/csv/files \
    --target /path/to/parquet/output \
    --config config.yaml \
    --filter "PbP" \
    --local

# SLURM cluster processing
python scripts/sp2xr_csv2parquet.py \
    --source /path/to/csv/files \
    --target /path/to/parquet/output \
    --config config.yaml \
    --filter "PbP" \
    --cores 32 --memory 64GB --partition general

# Process housekeeping files with custom chunk size
python scripts/sp2xr_csv2parquet.py \
    --source /path/to/csv/files \
    --target /path/to/parquet/output \
    --config config.yaml \
    --filter "hk" \
    --chunk 50 \
    --local
```

**Features:**

- Supports both local and SLURM cluster execution
- Automatic resource detection for local processing
- Configurable batch processing with chunking
- Progress tracking and error handling
- Graceful shutdown with signal handling

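The batching behind `--chunk` can be sketched as slicing the discovered file list into fixed-size groups that are converted one batch at a time (`chunked` is an illustrative helper, not the script's actual code):

```python
def chunked(items: list, size: int):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical housekeeping file list, batched as with --chunk 50
files = [f"hk_{i:03d}.csv" for i in range(120)]
batches = list(chunked(files, 50))  # batch sizes: 50, 50, 20
```
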
### `scripts/sp2xr_apply_calibration.py`

Apply calibration parameters to particle data using Dask + SLURM.

**Usage:**

```bash
python scripts/sp2xr_apply_calibration.py \
    --input /path/to/data.parquet \
    --config /path/to/config.yaml \
    --output /path/to/calibrated.parquet \
    [--cores 32] [--memory 64GB] [--walltime 02:00:00] [--partition daily]
```

**Options:**

- `--input` - Input Parquet file or directory (required)
- `--config` - YAML calibration configuration (required)
- `--output` - Output directory for calibrated Parquet dataset (required)
- `--cores` - Cores per SLURM job (default: 32)
- `--memory` - Memory per job (default: 64GB)
- `--walltime` - Wall-time limit (default: 02:00:00)
- `--partition` - SLURM partition (default: daily)

**Features:**

- Parallel processing with Dask + SLURM cluster
- Automatic scaling and resource management
- Partitioned output by date and hour

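The "partitioned by date and hour" output layout can be pictured as deriving a partition directory from each record's timestamp. The Hive-style `key=value` path below is an assumption about the exact layout, shown only to illustrate the idea:

```python
from datetime import datetime

def partition_path(ts: datetime) -> str:
    """Partition directory for a record timestamp (assumed Hive-style layout)."""
    return f"date={ts:%Y-%m-%d}/hour={ts:%H}"

partition_path(datetime(2024, 3, 15, 14, 30))  # 'date=2024-03-15/hour=14'
```
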
### `scripts/sp2xr_generate_config.py`

Generate schema configuration files by automatically detecting SP2XR files in a directory.

**Usage:**

```bash
# Generate basic schema config from current directory
python scripts/sp2xr_generate_config.py .

# Generate config from specific directory
python scripts/sp2xr_generate_config.py /path/to/sp2xr/data

# Generate config with column mapping support (for non-standard column names)
python scripts/sp2xr_generate_config.py /path/to/data --mapping

# Specify custom schema and instrument settings output filenames
python scripts/sp2xr_generate_config.py /path/to/data \
    --schema-output my_schema.yaml \
    --instrument-output my_settings.yaml

# Generate mapping config with custom output names
python scripts/sp2xr_generate_config.py /path/to/data --mapping \
    --schema-output campaign_schema.yaml \
    --instrument-output campaign_settings.yaml

# Use specific files instead of auto-detection
python scripts/sp2xr_generate_config.py . --pbp-file data.pbp.csv --hk-file data.hk.csv
```

**Options:**

- `directory` - Directory containing SP2XR files (PbP and HK files)
- `--schema-output`, `-s` - Output filename for data schema config (default: `config_schema.yaml`)
- `--instrument-output`, `-i` - Output filename for instrument settings config (default: `{schema_output}_instrument_settings.yaml`)
- `--mapping`, `-m` - Generate config with column mapping support (creates canonical column mappings)
- `--pbp-file` - Use a specific PbP file instead of auto-detection
- `--hk-file` - Use a specific HK file instead of auto-detection

**Features:**

- Automatic file detection (searches recursively for PbP and HK files)
- Schema inference from CSV/ZIP/Parquet files
- Column mapping support for non-standard column names
- Automatic INI file detection and conversion to YAML
- Validates INI file consistency across multiple files

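The recursive auto-detection can be sketched with a simple glob over the data directory. The `*.pbp.csv` / `*.hk.csv` name patterns are assumed from the usage examples above; the real script's matching rules may differ:

```python
import tempfile
from pathlib import Path

def find_sp2xr_files(directory) -> dict:
    """Recursively collect PbP and HK files (name patterns are an assumption)."""
    root = Path(directory)
    return {
        "pbp": sorted(root.rglob("*.pbp.csv")),
        "hk": sorted(root.rglob("*.hk.csv")),
    }

# Demonstrate on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "day1").mkdir()
    (Path(tmp) / "day1" / "data.pbp.csv").touch()
    (Path(tmp) / "data.hk.csv").touch()
    found = find_sp2xr_files(tmp)  # one PbP file, one HK file
```
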
### `scripts/sp2xr_ini2yaml.py`

Convert legacy INI calibration files to YAML format.

**Usage:**

```bash
python scripts/sp2xr_ini2yaml.py input.ini output.yaml
```

**Arguments:**

- `ini` - Input .ini calibration file
- `yaml` - Output .yaml file path

**Features:**

- Converts SP2-XR instrument .ini files to editable YAML format
- Preserves all calibration parameters and settings

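Conceptually, the conversion parses the INI sections into a nested mapping and dumps it as YAML. A minimal sketch with the standard library's `configparser` (the section and key names below are made up, and the real script may handle types differently):

```python
import configparser

def ini_to_dict(text: str) -> dict:
    """Parse INI text into a plain nested dict (values stay strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

ini_text = """\
[Incandescence]
slope = 1.23
intercept = 0.04
"""
params = ini_to_dict(ini_text)
# yaml.safe_dump(params) would then produce the editable YAML output
```
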
### `calibration_workflow.ipynb`

Interactive Jupyter notebook for determining instrument-specific calibration coefficients.

**Purpose:**

- Analyze calibration standards (PSL spheres, Aquadag, etc.) to derive scattering and incandescence calibration curves
- Iterative process with visualization for quality control
- Generate calibration parameters for use in configuration files

**Workflow:**

1. Load calibration standard measurements
2. Plot raw signals vs. known particle properties
3. Fit calibration curves (polynomial, power-law, etc.)
4. Export calibration coefficients to YAML configuration

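The curve-fitting step can be illustrated with the simplest case, an ordinary least-squares line; the notebook's actual fits may be higher-order polynomials or power laws, and the data points below are made up for illustration:

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return slope, my - slope * mx

# Hypothetical calibration-standard data: raw signal vs. known diameter (nm)
signal = [1.0, 2.0, 3.0, 4.0]
diameter = [100.0, 200.0, 300.0, 400.0]
slope, intercept = linear_fit(signal, diameter)  # slope 100.0, intercept 0.0
```

The fitted `slope` and `intercept` are the kind of coefficients exported to the YAML configuration in the final step.
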
### `scripts/run_sp2xr_pipeline.sbatch`

SLURM batch job script for running the pipeline on HPC systems.

**Features:**

- Configurable resource allocation
- Automatic scratch directory management
- Module loading and environment activation
- Error and output logging