API Reference

Core Modules

sp2xr.io

Input/output operations for data conversion and processing.

Functions:

  • csv_to_parquet(source_dir, target_dir, config_file, filter_pattern) - Convert CSV/ZIP files to Parquet format
  • process_sp2xr_file(file_path, config_file, output_dir) - Process individual SP2XR data files
  • read_sp2xr_csv(file_path, schema, **kwargs) - Read SP2XR CSV/ZIP files with schema validation
  • load_matching_hk_file(pbp_file, hk_dir, hk_schema) - Load housekeeping file matching a PbP file
  • enrich_sp2xr_dataframe(df, filename) - Add time-derived and metadata columns (date, hour, filename)
  • save_sp2xr_parquet(df, output_dir, partition_cols) - Save DataFrame to partitioned Parquet format
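
As an illustration of the kind of enrichment described for `enrich_sp2xr_dataframe`, the sketch below adds date, hour, and source-file columns to a DataFrame with a `DatetimeIndex`. It is not the package's implementation: the helper name `add_time_columns` and the column names `date`, `hour`, and `file` are assumptions made for this example.

```python
import pandas as pd


def add_time_columns(df, filename):
    """Hypothetical helper: add date, hour, and source-file columns.

    Assumes `df` has a DatetimeIndex; the column names are illustrative.
    """
    out = df.copy()
    out["date"] = out.index.normalize()  # midnight of each timestamp
    out["hour"] = out.index.hour         # hour of day, 0..23
    out["file"] = filename               # provenance of each row
    return out


example = pd.DataFrame(
    {"signal": [1.0, 2.0]},
    index=pd.to_datetime(["2024-03-05 14:30:00", "2024-03-05 15:10:00"]),
)
enriched = add_time_columns(example, "example_file.csv")
```

Columns such as `date` and `hour` are natural candidates for the `partition_cols` argument of `save_sp2xr_parquet`, although which columns the package actually partitions on is not stated here.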

sp2xr.calibration

Calibration workflow for single-particle data, including quality flags, calibration curves, and mass/diameter conversions.

Calibration Functions:

  • calibrate_single_particle(ddf, instr_config, run_config) - Complete calibration workflow with quality flags
  • calibrate_particle_data(df, config) - Apply scattering and incandescence calibrations to particle data
  • apply_calibration(df, config) - Legacy wrapper for backward compatibility
  • apply_inc_calibration(df, calib_params) - Apply incandescence calibration (BC mass calculation)
  • apply_scatt_calibration(df, calib_params) - Apply scattering calibration (optical diameter calculation)

Calibration Curve Functions:

  • polynomial(x, coeffs) - Generic polynomial curve: a0 + a1*x + a2*x^2 + ...
  • powerlaw(x, a, b) - Generic power-law calibration: a * x^b
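
The two curve forms above can be sketched with NumPy as follows. This is a stand-alone illustration of the formulas, not the package's code: the names `eval_polynomial` and `eval_powerlaw` are hypothetical, and the ascending coefficient order (a0 first) is assumed from the formula as written.

```python
import numpy as np


def eval_polynomial(x, coeffs):
    """Evaluate a0 + a1*x + a2*x**2 + ... (coefficients in ascending order)."""
    x = np.asarray(x, dtype=float)
    # polyval from numpy.polynomial expects ascending-order coefficients.
    return np.polynomial.polynomial.polyval(x, coeffs)


def eval_powerlaw(x, a, b):
    """Evaluate a * x**b."""
    return a * np.asarray(x, dtype=float) ** b


poly_value = eval_polynomial(2.0, [1.0, 2.0, 3.0])  # 1 + 2*2 + 3*4 = 17
power_value = eval_powerlaw(4.0, 2.0, 0.5)          # 2 * sqrt(4) = 4
```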

Mass/Diameter Conversion:

  • BC_mass_to_diam(mass_fg, material='fullerene') - Convert BC mass (fg) to mass-equivalent diameter (nm)
  • BC_diam_to_mass(diam_nm, material='fullerene') - Convert BC diameter (nm) to mass (fg)
  • mass2meqDiam(mass, rho_eff) - Calculate mass-equivalent diameter from mass and effective density
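
The mass-equivalent diameter of a sphere follows from m = ρ · (π/6) · d³. The sketch below applies that relation with a constant density, which is a simplification: the material-specific functions above presumably rely on the density splines in `sp2xr.calibration_constants`. The helper names and the units (fg, nm, kg/m³) are assumptions for this example.

```python
import math


def mass_to_meq_diam_nm(mass_fg, rho_kg_m3):
    """Mass-equivalent diameter (nm) of a sphere of given mass (fg) and density."""
    mass_kg = mass_fg * 1e-18                  # 1 fg = 1e-18 kg
    volume_m3 = mass_kg / rho_kg_m3
    diam_m = (6.0 * volume_m3 / math.pi) ** (1.0 / 3.0)
    return diam_m * 1e9                        # m -> nm


def meq_diam_to_mass_fg(diam_nm, rho_kg_m3):
    """Inverse relation: mass (fg) of a sphere of given diameter (nm)."""
    diam_m = diam_nm * 1e-9
    volume_m3 = math.pi / 6.0 * diam_m ** 3
    return volume_m3 * rho_kg_m3 * 1e18        # kg -> fg


# 1 fg at an assumed density of 1800 kg/m^3 gives roughly 102 nm.
diam_nm = mass_to_meq_diam_nm(1.0, 1800.0)
```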

Distribution Conversion:

  • dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho) - Convert number distribution to mass distribution

sp2xr.calibration_constants

Spline coefficient functions for BC calibration materials.

Core Functions:

  • SP2_calibCurveSpline(x, spline_coeffs) - Spline interpolation for density calculations
  • SP2_calibCurveSplineCheck(spline_coeffs) - Validate spline coefficient dimensions

Aquadag Calibration Functions:

  • Aquadag_RhoVsLogMspline_Var25() - Aquadag density vs. log(mass) spline coefficients (Var25)
  • Aquadag_RhoVsLogMspline_Var8() - Aquadag density vs. log(mass) spline coefficients (Var8)
  • Aquadag_RhoVsLogDspline_Var25() - Aquadag density vs. log(diameter) spline coefficients (Var25)
  • Aquadag_RhoVsLogDspline_Var8() - Aquadag density vs. log(diameter) spline coefficients (Var8)
  • Aquadag_RhoVsLogDspline_Var8_old() - Legacy Aquadag density vs. log(diameter) spline coefficients

Fullerene Calibration Functions:

  • Fullerene_RhoVsLogMspline_Var2() - Fullerene density vs. log(mass) spline coefficients (Var2)
  • Fullerene_RhoVsLogMspline_Var5() - Fullerene density vs. log(mass) spline coefficients (Var5)
  • Fullerene_RhoVsLogMspline_Var8() - Fullerene density vs. log(mass) spline coefficients (Var8)
  • Fullerene_RhoVsLogMspline_Var8_old() - Legacy Fullerene density vs. log(mass) spline coefficients
  • Fullerene_RhoVsLogDspline_Var2() - Fullerene density vs. log(diameter) spline coefficients (Var2)
  • Fullerene_RhoVsLogDspline_Var5() - Fullerene density vs. log(diameter) spline coefficients (Var5)
  • Fullerene_RhoVsLogDspline_Var8() - Fullerene density vs. log(diameter) spline coefficients (Var8)
  • Fullerene_RhoVsLogDspline_Var8_old() - Legacy Fullerene density vs. log(diameter) spline coefficients

Glassy Carbon Calibration Functions:

  • GlassyCarbonAlpha_Mass2Diam() - Glassy carbon mass-to-diameter conversion coefficients
  • GlassyCarbonAlpha_Diam2Mass() - Glassy carbon diameter-to-mass conversion coefficients

sp2xr.flag_single_particle_data

Quality control flagging based on instrument parameters.

Functions:

  • define_flags(df, flag_config) - Define quality control flags (transit time, FWHM, peak intensity, etc.)
  • add_thin_thick_flags(df, lag_threshold) - Add thin/thick coating classification flags based on lag time

Constants:

  • FLAG_COLS - List of flag column names used in quality filtering

sp2xr.distribution

Size/mass distribution calculations.

Main Functions:

  • process_histograms(ddf, config, inc_bins, inc_ctrs, scatt_bins, scatt_ctrs, timelag_bins, timelag_ctrs, chunk_start, client) - Calculate size/mass distributions (main workflow function)
  • process_hist_and_dist_partition(partition, dt_s, inc_mass_bin_lims, scatt_bin_lims, timelag_bins_lims, calc_dist, flow_dt) - Process histogram and distribution for a single partition

Bin Utilities:

  • make_bin_arrays(min_val, max_val, n_bins, log_scale=True) - Create bin arrays for histograms
  • bin_lims_to_ctrs(bin_lims) - Convert bin limits to bin centers
  • bin_ctrs_to_lims(bin_ctrs) - Convert bin centers to bin limits
  • get_dlogp(bin_lims) - Calculate log bin widths (for dN/dlogDp calculations)
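
To make the relationship between bin limits, bin centers, and log bin widths concrete, here is a minimal NumPy sketch. It assumes log-spaced bins with `n_bins + 1` limits and geometric-mean centers. The package may define centers differently, and the function names are hypothetical.

```python
import numpy as np


def log_bin_lims(min_val, max_val, n_bins):
    """Return n_bins + 1 logarithmically spaced bin limits."""
    return np.logspace(np.log10(min_val), np.log10(max_val), n_bins + 1)


def lims_to_ctrs(bin_lims):
    """Geometric-mean center of each pair of adjacent limits (an assumption)."""
    bin_lims = np.asarray(bin_lims, dtype=float)
    return np.sqrt(bin_lims[:-1] * bin_lims[1:])


def dlog_widths(bin_lims):
    """Width of each bin in log10 space, as used for dN/dlogDp."""
    return np.diff(np.log10(np.asarray(bin_lims, dtype=float)))


lims = log_bin_lims(10.0, 1000.0, 2)   # limits at 10, 100, 1000
ctrs = lims_to_ctrs(lims)              # geometric centers of the two bins
widths = dlog_widths(lims)             # one decade per bin
```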

Distribution Calculations:

  • calculate_histogram(series, bins, dt_s, flow_sccm) - Calculate histogram from series with time/flow normalization
  • counts2numConc(counts, dlogDp, dt_s, flow_sccm) - Convert counts to number concentration (dN/dlogDp)
  • dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho) - Convert number distribution to mass distribution
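
The normalization behind these conversions can be sketched as below. The sketch assumes that the flow in sccm is treated as cm³ per minute, that number concentrations are in cm⁻³, that density is in kg/m³, and that the mass distribution is reported in µg/m³. None of these unit conventions are confirmed by this reference, and the function names are hypothetical.

```python
import numpy as np


def counts_to_dNdlogDp(counts, dlogDp, dt_s, flow_sccm):
    """Counts per bin -> dN/dlogDp in cm^-3 (assumed units)."""
    sampled_cm3 = flow_sccm / 60.0 * dt_s   # volume sampled during dt_s
    counts = np.asarray(counts, dtype=float)
    return counts / (sampled_cm3 * np.asarray(dlogDp, dtype=float))


def number_to_mass_dist(dNdlogDp, dp_nm, rho_kg_m3):
    """dN/dlogDp (cm^-3) -> dM/dlogDp (ug/m^3) for spherical particles."""
    dp_m = np.asarray(dp_nm, dtype=float) * 1e-9
    particle_mass_ug = rho_kg_m3 * np.pi / 6.0 * dp_m ** 3 * 1e9  # kg -> ug
    return np.asarray(dNdlogDp, dtype=float) * 1e6 * particle_mass_ug  # cm^-3 -> m^-3


# 120 counts in 1 s at 120 sccm (2 cm^3 sampled), bin width 0.1 in log10 space.
dN = counts_to_dNdlogDp([120], [0.1], dt_s=1.0, flow_sccm=120.0)
dM = number_to_mass_dist(dN, dp_nm=[100.0], rho_kg_m3=1800.0)
```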

Metadata:

  • make_hist_meta(bin_ctrs, dt_index) - Create metadata DataFrame for histogram output

sp2xr.resample_pbp_hk

Time resampling functions for data aggregation.

Functions:

  • build_dt_summary(pdf, dt_s) - Resample single-particle data to time bins (aggregate particle counts)
  • resample_hk_partition(pdf, dt) - Partition-wise resampling of housekeeping data to specified time resolution
  • join_pbp_with_flow(ddf_pbp, flow_series, config) - Join particle data with flow measurements
  • aggregate_dt(ddf_pbp_dt, ddf_hk_dt, config) - Aggregate PbP and HK data at specified time resolution
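
Resampling particle-by-particle data to fixed time bins can be illustrated with plain pandas. The sketch below counts particles and sums a mass column per bin. It is not the implementation of `build_dt_summary`; the helper name and the column name `bc_mass_fg` are assumptions.

```python
import pandas as pd


def summarize_per_bin(pdf, dt_s, mass_col="bc_mass_fg"):
    """Hypothetical helper: particle count and summed mass per dt_s-second bin.

    Assumes `pdf` has a DatetimeIndex with one row per particle.
    """
    grouped = pdf.resample(f"{dt_s}s")
    return pd.DataFrame(
        {
            "particle_count": grouped.size(),
            "total_mass_fg": grouped[mass_col].sum(),
        }
    )


particles = pd.DataFrame(
    {"bc_mass_fg": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(
        [
            "2024-01-01 00:00:00.2",
            "2024-01-01 00:00:00.7",
            "2024-01-01 00:00:01.1",
        ]
    ),
)
summary = summarize_per_bin(particles, dt_s=1)
```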

sp2xr.concentrations

Concentration calculations for different particle types.

Functions:

  • add_concentrations(df, dt) - Add BC mass concentration, scattering mass concentration, and number concentration columns

sp2xr.helpers

Utility functions for file handling, argument parsing, cluster initialization, and configuration management.

Configuration Management

  • load_and_resolve_config(args) - Load and merge YAML configuration with command-line arguments
  • load_yaml_cfg(path) - Load YAML configuration file
  • parse_args() - Parse command-line arguments for pipeline scripts
  • apply_sets(config, set_args) - Apply command-line overrides (--set) to configuration
  • get(config, key_path, default=None) - Get nested configuration value using dot notation
  • choose(cli_val, config, key_path, default) - Choose between CLI argument and config value
  • validate_config_compatibility(config) - Validate configuration compatibility and consistency
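
Dot-notation lookup, as described for `get`, can be sketched with a short loop over the key path. The function below, `get_nested`, is a hypothetical stand-in written for this example, and the configuration keys are invented. It shows one plausible behavior (return the default when any key along the path is missing), which may differ from the package's.

```python
def get_nested(config, key_path, default=None):
    """Look up a value in nested dicts using a dotted path such as 'a.b.c'."""
    node = config
    for key in key_path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


example_config = {"cluster": {"type": "local", "workers": 4}}  # invented keys

workers = get_nested(example_config, "cluster.workers")             # 4
missing = get_nested(example_config, "cluster.memory", default="8GB")
```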

Cluster/Dask Management

  • initialize_cluster(config) - Initialize Dask cluster (local or SLURM)
  • make_slurm_cluster(config) - Create SLURM cluster with specified resources
  • make_local_cluster(config) - Create local Dask cluster with auto-detected resources

File Operations

  • find_files(directory, pattern) - Recursively find files matching pattern
  • find_matching_hk_file(pbp_file, hk_dir) - Find housekeeping file matching a PbP file
  • extract_base_filename(file_path) - Extract base filename from SP2XR file path
  • extract_sp2xr_filename_parts(filename) - Extract filename components (timestamp, instrument ID, etc.)

Time/Partition Management

  • extract_partitioned_datetimes(parquet_path) - Extract timestamps from partitioned Parquet paths
  • get_time_chunks_from_range(start, end, freq) - Generate time chunk tuples for processing
  • delete_partition_if_exists(output_path, partition_values) - Delete specific partition directory
  • floor_index_to_dt(df, dt_s) - Replace the DatetimeIndex with timestamps floored to the dt_s time step (in seconds)
  • calculate_delta_sec(time1, time2) - Calculate time difference in seconds
  • extract_datetime(filename) - Extract datetime from SP2XR filename
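
Two of the ideas in this group, splitting a time range into chunks and flooring timestamps to a time step, can be sketched with pandas. The helper names below are hypothetical, and the chunk convention (consecutive start/end pairs) is an assumption rather than the documented behavior of `get_time_chunks_from_range`.

```python
import pandas as pd


def time_chunks(start, end, freq):
    """Return (chunk_start, chunk_end) pairs covering start..end in steps of freq."""
    edges = pd.date_range(start=start, end=end, freq=freq)
    return list(zip(edges[:-1], edges[1:]))


def floor_index(df, dt_s):
    """Return a copy of df whose DatetimeIndex is floored to dt_s seconds."""
    out = df.copy()
    out.index = out.index.floor(f"{dt_s}s")
    return out


hourly = time_chunks("2024-01-01 00:00", "2024-01-01 03:00", "1h")  # 3 chunks

raw = pd.DataFrame(
    {"x": [1]}, index=pd.DatetimeIndex(["2024-01-01 00:00:07.5"])
)
floored = floor_index(raw, 5)  # index becomes 2024-01-01 00:00:05
```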

INI/YAML Conversion

  • read_xr_ini_file(ini_path) - Read SP2-XR .ini calibration file
  • find_and_validate_ini_files(directory) - Find and validate .ini files (ensure consistency)
  • export_xr_ini_to_yaml(ini_path, yaml_path) - Convert .ini file to YAML format
  • export_xr_ini_to_yaml_with_source(ini_path, yaml_path) - Convert .ini to YAML with source metadata

Utilities

  • chunks(lst, n) - Yield successive n-sized chunks from list
  • partition_rowcount(ddf) - Count total rows in Dask DataFrame
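
Yielding successive n-sized chunks from a list is a common pattern. A minimal generator along these lines is shown below, under the hypothetical name `iter_chunks` to make clear it is an illustration rather than the package's code.

```python
def iter_chunks(lst, n):
    """Yield successive n-sized slices of lst; the last one may be shorter."""
    for start in range(0, len(lst), n):
        yield lst[start:start + n]


batches = list(iter_chunks([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```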

sp2xr.schema

Data schema definitions and type enforcement for SP2XR data streams.

Constants:

  • CANONICAL_DTYPES - Dictionary mapping column names to canonical data types
  • DEFAULT_FLOAT - Default float dtype for numeric columns

Functions:

  • enforce_schema(df) - Cast DataFrame columns to canonical data types
  • cast_and_arrow(df) - Cast to canonical dtypes and convert to PyArrow backend
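
Schema enforcement of this kind usually amounts to casting each known column to a fixed dtype. The sketch below shows the idea with pandas. The mapping `EXAMPLE_DTYPES`, its column names, and the helper `enforce_dtypes` are invented for this example and do not reflect the contents of `CANONICAL_DTYPES`.

```python
import pandas as pd

# Hypothetical mapping; the real CANONICAL_DTYPES is not reproduced here.
EXAMPLE_DTYPES = {"particle_id": "int64", "peak_height": "float64"}


def enforce_dtypes(df, dtypes):
    """Cast the columns of df that appear in `dtypes`; leave others untouched."""
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    return df.astype(present)


raw_frame = pd.DataFrame(
    {"particle_id": [1.0, 2.0], "peak_height": [10, 20], "note": ["a", "b"]}
)
typed_frame = enforce_dtypes(raw_frame, EXAMPLE_DTYPES)
```

Casting only the columns that are present keeps the helper usable on data streams that carry a subset of the schema. Whether `enforce_schema` behaves this way is an assumption.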