# API Reference
## Core Modules
### `sp2xr.io`
Input/output operations for data conversion and processing.
**Functions:**
- `csv_to_parquet(source_dir, target_dir, config_file, filter_pattern)` - Convert CSV/ZIP files to Parquet format
- `process_sp2xr_file(file_path, config_file, output_dir)` - Process individual SP2XR data files
- `read_sp2xr_csv(file_path, schema, **kwargs)` - Read SP2XR CSV/ZIP files with schema validation
- `load_matching_hk_file(pbp_file, hk_dir, hk_schema)` - Load housekeeping file matching a PbP file
- `enrich_sp2xr_dataframe(df, filename)` - Add time-derived and metadata columns (date, hour, filename)
- `save_sp2xr_parquet(df, output_dir, partition_cols)` - Save DataFrame to partitioned Parquet format
---
### `sp2xr.calibration`
Calibration of single-particle data: the full calibration workflow with quality flags, calibration curves, and mass/diameter conversions.
**Calibration Functions:**
- `calibrate_single_particle(ddf, instr_config, run_config)` - Complete calibration workflow with quality flags
- `calibrate_particle_data(df, config)` - Apply scattering and incandescence calibrations to particle data
- `apply_calibration(df, config)` - Legacy wrapper for backward compatibility
- `apply_inc_calibration(df, calib_params)` - Apply incandescence calibration (BC mass calculation)
- `apply_scatt_calibration(df, calib_params)` - Apply scattering calibration (optical diameter calculation)
**Calibration Curve Functions:**
- `polynomial(x, coeffs)` - Generic polynomial curve: `a0 + a1*x + a2*x^2 + ...`
- `powerlaw(x, a, b)` - Generic power-law calibration: `a * x^b`
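
The two curve shapes above can be sketched in plain Python. This is only an illustration of the formulas in the descriptions (`a0 + a1*x + a2*x^2 + ...` and `a * x^b`). The names `poly_curve` and `powerlaw_curve`, the coefficient ordering (constant term first), and the numeric values are assumptions made for this sketch, not the `sp2xr.calibration` implementation.

```python
def poly_curve(x, coeffs):
    """Evaluate a0 + a1*x + a2*x**2 + ... with Horner's scheme.

    Assumption: `coeffs` is ordered from the constant term upward.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def powerlaw_curve(x, a, b):
    """Evaluate a * x**b."""
    return a * x ** b


# Illustrative coefficients only, not real calibration values.
poly_value = poly_curve(2.0, [1.0, 3.0, 0.5])   # 1 + 3*2 + 0.5*4 = 9.0
power_value = powerlaw_curve(4.0, 2.0, 0.5)     # 2 * sqrt(4) = 4.0
print(poly_value, power_value)
```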
**Mass/Diameter Conversion:**
- `BC_mass_to_diam(mass_fg, material='fullerene')` - Convert BC mass (fg) to mass-equivalent diameter (nm)
- `BC_diam_to_mass(diam_nm, material='fullerene')` - Convert BC diameter (nm) to mass (fg)
- `mass2meqDiam(mass, rho_eff)` - Calculate mass-equivalent diameter from mass and effective density
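
As a rough illustration of the mass-equivalent diameter idea, the sketch below treats a particle as a sphere of constant density. The helper names, the units (fg, nm, g/cm^3), and the constant density of 1.8 g/cm^3 are assumptions for this example. The material-specific functions listed above may instead use a size-dependent effective density.

```python
import math


def mass_to_meq_diam_nm(mass_fg, rho_g_cm3):
    """Diameter (nm) of a sphere with the given mass (fg) and density (g/cm^3).

    Hypothetical helper. Unit conversion: 1 fg / (1 g/cm^3) = 1e-15 cm^3 = 1e6 nm^3.
    """
    volume_nm3 = mass_fg * 1e6 / rho_g_cm3
    return (6.0 * volume_nm3 / math.pi) ** (1.0 / 3.0)


def meq_diam_to_mass_fg(diam_nm, rho_g_cm3):
    """Inverse of mass_to_meq_diam_nm under the same assumptions."""
    volume_nm3 = math.pi / 6.0 * diam_nm ** 3
    return volume_nm3 * rho_g_cm3 / 1e6


diam = mass_to_meq_diam_nm(1.0, 1.8)
print(round(diam))                              # about 102 nm for 1 fg
print(round(meq_diam_to_mass_fg(diam, 1.8), 6)) # round trip back to 1.0 fg
```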
**Distribution Conversion:**
- `dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho)` - Convert number distribution to mass distribution
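
The number-to-mass conversion is commonly written per bin as dM/dlogDp = dN/dlogDp * (pi/6) * rho * Dp^3. The sketch below applies that relation in plain Python. The function name `number_to_mass_dist`, the units (cm^-3, nm, g/cm^3, result in ug/m^3), and the input values are assumptions; the library function may use different units or array types.

```python
import math


def number_to_mass_dist(dn_dlogdp, dp_nm, rho_g_cm3):
    """Convert dN/dlogDp (cm^-3) to dM/dlogDp (ug/m^3), bin by bin.

    Hypothetical helper. The factor 1e-9 combines nm^3 to cm^3 (1e-21),
    g to ug (1e6), and cm^-3 to m^-3 (1e6).
    """
    factor = math.pi / 6.0 * rho_g_cm3 * 1e-9
    return [n * factor * d ** 3 for n, d in zip(dn_dlogdp, dp_nm)]


# Two illustrative bins: 1000 cm^-3 at 100 nm and 500 cm^-3 at 200 nm.
dm = number_to_mass_dist([1000.0, 500.0], [100.0, 200.0], 1.8)
print([round(v, 3) for v in dm])
```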
---
### `sp2xr.calibration_constants`
Spline coefficient functions for BC calibration materials.
**Core Functions:**
- `SP2_calibCurveSpline(x, spline_coeffs)` - Spline interpolation for density calculations
- `SP2_calibCurveSplineCheck(spline_coeffs)` - Validate spline coefficient dimensions
**Aquadag Calibration Functions:**
- `Aquadag_RhoVsLogMspline_Var25()` - Aquadag density vs. log(mass) spline coefficients (Var25)
- `Aquadag_RhoVsLogMspline_Var8()` - Aquadag density vs. log(mass) spline coefficients (Var8)
- `Aquadag_RhoVsLogDspline_Var25()` - Aquadag density vs. log(diameter) spline (Var25)
- `Aquadag_RhoVsLogDspline_Var8()` - Aquadag density vs. log(diameter) spline (Var8)
- `Aquadag_RhoVsLogDspline_Var8_old()` - Legacy Aquadag diameter-to-density spline
**Fullerene Calibration Functions:**
- `Fullerene_RhoVsLogMspline_Var2()` - Fullerene density vs. log(mass) spline (Var2)
- `Fullerene_RhoVsLogMspline_Var5()` - Fullerene density vs. log(mass) spline (Var5)
- `Fullerene_RhoVsLogMspline_Var8()` - Fullerene density vs. log(mass) spline (Var8)
- `Fullerene_RhoVsLogMspline_Var8_old()` - Legacy Fullerene mass-to-density spline
- `Fullerene_RhoVsLogDspline_Var2()` - Fullerene diameter-to-density spline (Var2)
- `Fullerene_RhoVsLogDspline_Var5()` - Fullerene diameter-to-density spline (Var5)
- `Fullerene_RhoVsLogDspline_Var8()` - Fullerene diameter-to-density spline (Var8)
- `Fullerene_RhoVsLogDspline_Var8_old()` - Legacy Fullerene diameter-to-density spline
**Glassy Carbon Calibration Functions:**
- `GlassyCarbonAlpha_Mass2Diam()` - Glassy carbon mass-to-diameter conversion coefficients
- `GlassyCarbonAlpha_Diam2Mass()` - Glassy carbon diameter-to-mass conversion coefficients
---
### `sp2xr.flag_single_particle_data`
Quality control flagging based on instrument parameters.
**Functions:**
- `define_flags(df, flag_config)` - Define quality control flags (transit time, FWHM, peak intensity, etc.)
- `add_thin_thick_flags(df, lag_threshold)` - Add thin/thick coating classification flags based on lag time
**Constants:**
- `FLAG_COLS` - List of flag column names used in quality filtering
---
### `sp2xr.distribution`
Size/mass distribution calculations.
**Main Functions:**
- `process_histograms(ddf, config, inc_bins, inc_ctrs, scatt_bins, scatt_ctrs, timelag_bins, timelag_ctrs, chunk_start, client)` - Calculate size/mass distributions (main workflow function)
- `process_hist_and_dist_partition(partition, dt_s, inc_mass_bin_lims, scatt_bin_lims, timelag_bins_lims, calc_dist, flow_dt)` - Process histogram and distribution for a single partition
**Bin Utilities:**
- `make_bin_arrays(min_val, max_val, n_bins, log_scale=True)` - Create bin arrays for histograms
- `bin_lims_to_ctrs(bin_lims)` - Convert bin limits to bin centers
- `bin_ctrs_to_lims(bin_ctrs)` - Convert bin centers to bin limits
- `get_dlogp(bin_lims)` - Calculate log bin widths (for dN/dlogDp calculations)
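
To show how bin limits, bin centers, and log widths relate, here is a small standalone sketch. The helper names are hypothetical, and the choices made here (base-10 log spacing, geometric-mean centers) are assumptions that may not match what `sp2xr.distribution` does.

```python
import math


def make_log_bins(min_val, max_val, n_bins):
    """Return n_bins + 1 logarithmically spaced bin limits."""
    lo = math.log10(min_val)
    step = (math.log10(max_val) - lo) / n_bins
    return [10 ** (lo + i * step) for i in range(n_bins + 1)]


def lims_to_ctrs(bin_lims):
    """Geometric mean of adjacent limits (the midpoint in log space)."""
    return [math.sqrt(a * b) for a, b in zip(bin_lims[:-1], bin_lims[1:])]


def dlog_widths(bin_lims):
    """log10 width of each bin, as used when normalizing to dN/dlogDp."""
    return [math.log10(b / a) for a, b in zip(bin_lims[:-1], bin_lims[1:])]


lims = make_log_bins(10.0, 1000.0, 4)   # 4 bins spanning two decades
ctrs = lims_to_ctrs(lims)
widths = dlog_widths(lims)
print(len(lims), len(ctrs))             # 5 limits, 4 centers
print([round(w, 3) for w in widths])    # each bin is 0.5 decades wide
```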
**Distribution Calculations:**
- `calculate_histogram(series, bins, dt_s, flow_sccm)` - Calculate histogram from series with time/flow normalization
- `counts2numConc(counts, dlogDp, dt_s, flow_sccm)` - Convert counts to number concentration (dN/dlogDp)
- `dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho)` - Convert number distribution to mass distribution
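
The normalization from raw counts to dN/dlogDp can be sketched as counts divided by the sampled volume and by the log bin width. The function name `counts_to_number_conc` and the treatment of the flow (sccm taken as cm^3 per minute, constant over the interval) are assumptions for illustration, not a description of the library internals.

```python
def counts_to_number_conc(counts, dlogdp, dt_s, flow_sccm):
    """Return dN/dlogDp in cm^-3 for each bin.

    Hypothetical helper. Sampled volume (cm^3) = flow (cm^3/min) / 60 * dt_s.
    """
    volume_cm3 = flow_sccm / 60.0 * dt_s
    return [c / volume_cm3 / w for c, w in zip(counts, dlogdp)]


# 60 s at 120 cm^3/min samples 120 cm^3 of air.
conc = counts_to_number_conc([120, 60], [0.5, 0.5], dt_s=60, flow_sccm=120.0)
print(conc)  # [2.0, 1.0]
```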
**Metadata:**
- `make_hist_meta(bin_ctrs, dt_index)` - Create metadata DataFrame for histogram output
---
### `sp2xr.resample_pbp_hk`
Time resampling functions for data aggregation.
**Functions:**
- `build_dt_summary(pdf, dt_s)` - Resample single-particle data to time bins (aggregate particle counts)
- `resample_hk_partition(pdf, dt)` - Partition-wise resampling of housekeeping data to specified time resolution
- `join_pbp_with_flow(ddf_pbp, flow_series, config)` - Join particle data with flow measurements
- `aggregate_dt(ddf_pbp_dt, ddf_hk_dt, config)` - Aggregate PbP and HK data at specified time resolution
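
The core idea behind resampling particle-by-particle data to time bins is to floor each event time to its bin start and count events per bin. The sketch below shows this with plain Python on timestamps given in seconds. The function name and input format are hypothetical; the functions above work on DataFrame partitions and aggregate more than counts.

```python
from collections import Counter


def count_per_time_bin(times_s, dt_s):
    """Count events per dt_s-second bin; keys are bin start times in seconds."""
    counts = Counter(int(t // dt_s) * dt_s for t in times_s)
    return dict(sorted(counts.items()))


# Six illustrative particle arrival times, binned at 1 s resolution.
summary = count_per_time_bin([0.2, 0.9, 1.5, 3.1, 3.2, 3.9], dt_s=1)
print(summary)  # {0: 2, 1: 1, 3: 3}; the empty bin starting at 2 s is absent
```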
---
### `sp2xr.concentrations`
Concentration calculations for different particle types.
**Functions:**
- `add_concentrations(df, dt)` - Add BC mass concentration, scattering mass concentration, and number concentration columns
---
### `sp2xr.helpers`
Utility functions for file handling, argument parsing, cluster initialization, and configuration management.
#### Configuration Management
- `load_and_resolve_config(args)` - Load and merge YAML configuration with command-line arguments
- `load_yaml_cfg(path)` - Load YAML configuration file
- `parse_args()` - Parse command-line arguments for pipeline scripts
- `apply_sets(config, set_args)` - Apply command-line overrides (--set) to configuration
- `get(config, key_path, default=None)` - Get nested configuration value using dot notation
- `choose(cli_val, config, key_path, default)` - Choose between CLI argument and config value
- `validate_config_compatibility(config)` - Validate configuration compatibility and consistency
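
Dot-notation lookup and `--set`-style overrides on a nested configuration can be sketched as follows. The functions `get_nested` and `set_nested` and the configuration keys are hypothetical stand-ins written for this example; they are not the `sp2xr.helpers` implementations.

```python
def get_nested(config, key_path, default=None):
    """Walk a nested dict using a dot-separated key path."""
    node = config
    for key in key_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_nested(config, key_path, value):
    """Set a nested value, creating intermediate dicts as needed."""
    *parents, last = key_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


# Illustrative configuration; the keys are made up for this sketch.
cfg = {"paths": {"input": "/data/raw"}, "dt": 60}
print(get_nested(cfg, "paths.input"))          # /data/raw
print(get_nested(cfg, "paths.output", "n/a"))  # n/a
set_nested(cfg, "cluster.cores", 8)
print(get_nested(cfg, "cluster.cores"))        # 8
```

Returning the default when an intermediate key is missing keeps optional settings easy to read without nested `try` blocks.
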
#### Cluster/Dask Management
- `initialize_cluster(config)` - Initialize Dask cluster (local or SLURM)
- `make_slurm_cluster(config)` - Create SLURM cluster with specified resources
- `make_local_cluster(config)` - Create local Dask cluster with auto-detected resources
#### File Operations
- `find_files(directory, pattern)` - Recursively find files matching pattern
- `find_matching_hk_file(pbp_file, hk_dir)` - Find housekeeping file matching a PbP file
- `extract_base_filename(file_path)` - Extract base filename from SP2XR file path
- `extract_sp2xr_filename_parts(filename)` - Extract filename components (timestamp, instrument ID, etc.)
#### Time/Partition Management
- `extract_partitioned_datetimes(parquet_path)` - Extract timestamps from partitioned Parquet paths
- `get_time_chunks_from_range(start, end, freq)` - Generate time chunk tuples for processing
- `delete_partition_if_exists(output_path, partition_values)` - Delete specific partition directory
- `floor_index_to_dt(df, dt_s)` - Replace the DatetimeIndex with values floored to the time step `dt_s` (in seconds)
- `calculate_delta_sec(time1, time2)` - Calculate time difference in seconds
- `extract_datetime(filename)` - Extract datetime from SP2XR filename
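
Splitting a time range into processing chunks and flooring timestamps to a time step can be illustrated with the standard library alone. The helper names, the use of `timedelta` for the chunk size, and flooring relative to midnight are assumptions for this sketch; the functions above may accept frequency strings and operate on DataFrame indexes.

```python
from datetime import datetime, timedelta


def time_chunks(start, end, step):
    """Yield (chunk_start, chunk_end) tuples covering [start, end)."""
    current = start
    while current < end:
        nxt = min(current + step, end)
        yield current, nxt
        current = nxt


def floor_to_dt(ts, dt_s):
    """Floor a datetime to a multiple of dt_s seconds since midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % dt_s)


chunk_list = list(
    time_chunks(datetime(2024, 1, 1), datetime(2024, 1, 1, 5), timedelta(hours=2))
)
print(len(chunk_list))  # 3 chunks: 00-02, 02-04, 04-05
print(floor_to_dt(datetime(2024, 1, 1, 12, 0, 47), 10))  # 2024-01-01 12:00:40
```

The last chunk is clipped to `end`, so a range that is not a multiple of the step is still covered exactly once.
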
#### INI/YAML Conversion
- `read_xr_ini_file(ini_path)` - Read SP2-XR .ini calibration file
- `find_and_validate_ini_files(directory)` - Find and validate .ini files (ensure consistency)
- `export_xr_ini_to_yaml(ini_path, yaml_path)` - Convert .ini file to YAML format
- `export_xr_ini_to_yaml_with_source(ini_path, yaml_path)` - Convert .ini to YAML with source metadata
#### Utilities
- `chunks(lst, n)` - Yield successive n-sized chunks from list
- `partition_rowcount(ddf)` - Count total rows in Dask DataFrame
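
Yielding successive n-sized chunks from a list is a common pattern, for example to process files in batches. A minimal sketch follows; the name `iter_chunks` and the file names are made up for this example and this is not necessarily how `chunks` is implemented.

```python
def iter_chunks(lst, n):
    """Yield successive n-sized chunks from lst; the last may be shorter."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Illustrative file names, batched two at a time.
batches = list(iter_chunks(["a.zip", "b.zip", "c.zip", "d.zip", "e.zip"], 2))
print(batches)  # [['a.zip', 'b.zip'], ['c.zip', 'd.zip'], ['e.zip']]
```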
---
### `sp2xr.schema`
Data schema definitions and type enforcement for SP2XR data streams.
**Constants:**
- `CANONICAL_DTYPES` - Dictionary mapping column names to canonical data types
- `DEFAULT_FLOAT` - Default float dtype for numeric columns
**Functions:**
- `enforce_schema(df)` - Cast DataFrame columns to canonical data types
- `cast_and_arrow(df)` - Cast to canonical dtypes and convert to PyArrow backend