API Reference
Core Modules
sp2xr.io
Input/output operations for data conversion and processing.
Functions:
csv_to_parquet(source_dir, target_dir, config_file, filter_pattern) - Convert CSV/ZIP files to Parquet format
process_sp2xr_file(file_path, config_file, output_dir) - Process individual SP2XR data files
read_sp2xr_csv(file_path, schema, **kwargs) - Read SP2XR CSV/ZIP files with schema validation
load_matching_hk_file(pbp_file, hk_dir, hk_schema) - Load housekeeping file matching a PbP file
enrich_sp2xr_dataframe(df, filename) - Add time-derived and metadata columns (date, hour, filename)
save_sp2xr_parquet(df, output_dir, partition_cols) - Save DataFrame to partitioned Parquet format
sp2xr.calibration
Complete calibration workflow with flags and processing.
Calibration Functions:
calibrate_single_particle(ddf, instr_config, run_config) - Complete calibration workflow with quality flags
calibrate_particle_data(df, config) - Apply scattering and incandescence calibrations to particle data
apply_calibration(df, config) - Legacy wrapper for backward compatibility
apply_inc_calibration(df, calib_params) - Apply incandescence calibration (BC mass calculation)
apply_scatt_calibration(df, calib_params) - Apply scattering calibration (optical diameter calculation)
Calibration Curve Functions:
polynomial(x, coeffs) - Generic polynomial curve: a0 + a1*x + a2*x^2 + ...
powerlaw(x, a, b) - Generic power-law calibration: a * x^b
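As a rough illustration of the two curve forms named above, the following standalone sketch evaluates them in plain Python. It is not the sp2xr implementation: the names `polynomial_sketch` and `powerlaw_sketch` are hypothetical, and the coefficient order (lowest degree first) is assumed from the formula a0 + a1*x + a2*x^2 + ...

```python
def polynomial_sketch(x, coeffs):
    """Evaluate a0 + a1*x + a2*x**2 + ... (coeffs assumed lowest degree first)."""
    return sum(c * x**i for i, c in enumerate(coeffs))


def powerlaw_sketch(x, a, b):
    """Evaluate a * x**b."""
    return a * x**b


# Example calls: 1 + 2*2 + 3*2**2, and 2 * 4**0.5
poly_value = polynomial_sketch(2.0, [1.0, 2.0, 3.0])
power_value = powerlaw_sketch(4.0, 2.0, 0.5)
```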
Mass/Diameter Conversion:
BC_mass_to_diam(mass_fg, material='fullerene') - Convert BC mass (fg) to mass-equivalent diameter (nm)
BC_diam_to_mass(diam_nm, material='fullerene') - Convert BC diameter (nm) to mass (fg)
mass2meqDiam(mass, rho_eff) - Calculate mass-equivalent diameter from mass and effective density
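The mass-equivalent diameter is the diameter of a sphere with the same mass at a given density, d = (6m / (pi * rho))^(1/3). The sketch below shows that relation and its inverse under stated assumptions: mass in fg, density in kg/m^3 and constant, diameter in nm. The function names are hypothetical, and the `material` argument above suggests the package may use a material-specific density parameterization rather than a constant.

```python
import math

FG_TO_KG = 1e-18  # one femtogram expressed in kilograms
M_TO_NM = 1e9     # metres to nanometres


def mass_to_meq_diam_nm(mass_fg, rho_kg_m3):
    """Diameter (nm) of a sphere with the given mass (fg) and density (kg/m^3)."""
    volume_m3 = mass_fg * FG_TO_KG / rho_kg_m3
    return (6.0 * volume_m3 / math.pi) ** (1.0 / 3.0) * M_TO_NM


def meq_diam_to_mass_fg(diam_nm, rho_kg_m3):
    """Mass (fg) of a sphere with the given diameter (nm) and density (kg/m^3)."""
    diam_m = diam_nm / M_TO_NM
    return math.pi / 6.0 * diam_m**3 * rho_kg_m3 / FG_TO_KG


# Illustrative density only; 1800 kg/m^3 is an assumed value, not taken from sp2xr.
diam_nm = mass_to_meq_diam_nm(1.0, 1800.0)
mass_fg = meq_diam_to_mass_fg(diam_nm, 1800.0)  # round trip back to about 1 fg
```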
Distribution Conversion:
dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho) - Convert number distribution to mass distribution
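A number distribution is commonly converted to a mass distribution by multiplying each bin by the mass of a spherical particle at the bin diameter, dM/dlogDp = dN/dlogDp * (pi/6) * rho * Dp^3. The sketch below assumes spherical particles, `dp_nm` in nm, density in kg/m^3, and per-particle mass in fg; the name `number_to_mass_dist` and the unit choices are assumptions, not the package's documented behaviour.

```python
import math


def number_to_mass_dist(dn_dlogdp, dp_nm, rho_kg_m3):
    """Scale each number-distribution bin by the spherical particle mass (fg)."""
    result = []
    for n, dp in zip(dn_dlogdp, dp_nm):
        dp_m = dp * 1e-9                                   # nm -> m
        particle_mass_kg = math.pi / 6.0 * dp_m**3 * rho_kg_m3
        result.append(n * particle_mass_kg * 1e18)         # kg -> fg
    return result


# Two bins at 100 nm and 200 nm; the density value is illustrative only.
dm_dlogdp = number_to_mass_dist([10.0, 5.0], [100.0, 200.0], 1800.0)
```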
sp2xr.calibration_constants
Spline coefficient functions for BC calibration materials.
Core Functions:
SP2_calibCurveSpline(x, spline_coeffs) - Spline interpolation for density calculations
SP2_calibCurveSplineCheck(spline_coeffs) - Validate spline coefficient dimensions
Aquadag Calibration Functions:
Aquadag_RhoVsLogMspline_Var25() - Aquadag density vs. log(mass) spline coefficients (Var25)
Aquadag_RhoVsLogMspline_Var8() - Aquadag density vs. log(mass) spline coefficients (Var8)
Aquadag_RhoVsLogDspline_Var25() - Aquadag density vs. log(diameter) spline (Var25)
Aquadag_RhoVsLogDspline_Var8() - Aquadag density vs. log(diameter) spline (Var8)
Aquadag_RhoVsLogDspline_Var8_old() - Legacy Aquadag diameter-to-density spline
Fullerene Calibration Functions:
Fullerene_RhoVsLogMspline_Var2() - Fullerene density vs. log(mass) spline (Var2)
Fullerene_RhoVsLogMspline_Var5() - Fullerene density vs. log(mass) spline (Var5)
Fullerene_RhoVsLogMspline_Var8() - Fullerene density vs. log(mass) spline (Var8)
Fullerene_RhoVsLogMspline_Var8_old() - Legacy Fullerene mass-to-density spline
Fullerene_RhoVsLogDspline_Var2() - Fullerene diameter-to-density spline (Var2)
Fullerene_RhoVsLogDspline_Var5() - Fullerene diameter-to-density spline (Var5)
Fullerene_RhoVsLogDspline_Var8() - Fullerene diameter-to-density spline (Var8)
Fullerene_RhoVsLogDspline_Var8_old() - Legacy Fullerene diameter-to-density spline
Glassy Carbon Calibration Functions:
GlassyCarbonAlpha_Mass2Diam() - Glassy carbon mass-to-diameter conversion coefficients
GlassyCarbonAlpha_Diam2Mass() - Glassy carbon diameter-to-mass conversion coefficients
sp2xr.flag_single_particle_data
Quality control flagging based on instrument parameters.
Functions:
define_flags(df, flag_config) - Define quality control flags (transit time, FWHM, peak intensity, etc.)
add_thin_thick_flags(df, lag_threshold) - Add thin/thick coating classification flags based on lag time
Constants:
FLAG_COLS - List of flag column names used in quality filtering
sp2xr.distribution
Size/mass distribution calculations.
Main Functions:
process_histograms(ddf, config, inc_bins, inc_ctrs, scatt_bins, scatt_ctrs, timelag_bins, timelag_ctrs, chunk_start, client) - Calculate size/mass distributions (main workflow function)
process_hist_and_dist_partition(partition, dt_s, inc_mass_bin_lims, scatt_bin_lims, timelag_bins_lims, calc_dist, flow_dt) - Process histogram and distribution for a single partition
Bin Utilities:
make_bin_arrays(min_val, max_val, n_bins, log_scale=True) - Create bin arrays for histograms
bin_lims_to_ctrs(bin_lims) - Convert bin limits to bin centers
bin_ctrs_to_lims(bin_ctrs) - Convert bin centers to bin limits
get_dlogp(bin_lims) - Calculate log bin widths (for dN/dlogDp calculations)
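To make the relationship between bin limits, bin centers, and log bin widths concrete, here is a minimal standalone sketch for logarithmically spaced bins. The function names are hypothetical, and two choices are assumptions rather than documented package behaviour: centers are taken as the geometric mean of adjacent limits, and widths use log10.

```python
import math


def log_bin_edges(min_val, max_val, n_bins):
    """Return n_bins + 1 edges equally spaced in log10 between min_val and max_val."""
    lo, hi = math.log10(min_val), math.log10(max_val)
    step = (hi - lo) / n_bins
    return [10 ** (lo + i * step) for i in range(n_bins + 1)]


def edges_to_centers(edges):
    """Geometric mean of each pair of adjacent edges (assumed convention)."""
    return [math.sqrt(a * b) for a, b in zip(edges[:-1], edges[1:])]


def log_widths(edges):
    """Width of each bin in log10 space, the dlogDp used in dN/dlogDp."""
    return [math.log10(b / a) for a, b in zip(edges[:-1], edges[1:])]


edges = log_bin_edges(10.0, 1000.0, 2)   # one bin per decade
centers = edges_to_centers(edges)
widths = log_widths(edges)
```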
Distribution Calculations:
calculate_histogram(series, bins, dt_s, flow_sccm) - Calculate histogram from series with time/flow normalization
counts2numConc(counts, dlogDp, dt_s, flow_sccm) - Convert counts to number concentration (dN/dlogDp)
dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho) - Convert number distribution to mass distribution
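The time/flow normalization mentioned above can be sketched as dividing the counts in each bin by the sampled air volume and by the log bin width. This is an illustration under assumptions, not the package's code: `flow_sccm` is treated as a plain volumetric flow in cm^3 per minute (no standard-to-ambient correction), `dt_s` is the sampling interval in seconds, and the name `counts_to_dn_dlogdp` is hypothetical.

```python
def counts_to_dn_dlogdp(counts, dlogdp, dt_s, flow_sccm):
    """Convert per-bin counts to dN/dlogDp in particles per cm^3."""
    sampled_volume_cm3 = flow_sccm / 60.0 * dt_s   # cm^3/min -> cm^3 over dt_s
    return [c / (sampled_volume_cm3 * w) for c, w in zip(counts, dlogdp)]


# 100 counts in a bin of width 0.1, sampled for 60 s at 120 cm^3/min
dn_dlogdp = counts_to_dn_dlogdp([100], [0.1], 60.0, 120.0)
```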
Metadata:
make_hist_meta(bin_ctrs, dt_index) - Create metadata DataFrame for histogram output
sp2xr.resample_pbp_hk
Time resampling functions for data aggregation.
Functions:
build_dt_summary(pdf, dt_s) - Resample single-particle data to time bins (aggregate particle counts)
resample_hk_partition(pdf, dt) - Partition-wise resampling of housekeeping data to specified time resolution
join_pbp_with_flow(ddf_pbp, flow_series, config) - Join particle data with flow measurements
aggregate_dt(ddf_pbp_dt, ddf_hk_dt, config) - Aggregate PbP and HK data at specified time resolution
sp2xr.concentrations
Concentration calculations for different particle types.
Functions:
add_concentrations(df, dt) - Add BC mass concentration, scattering mass concentration, and number concentration columns
sp2xr.helpers
Utility functions for file handling, argument parsing, cluster initialization, and configuration management.
Configuration Management
load_and_resolve_config(args) - Load and merge YAML configuration with command-line arguments
load_yaml_cfg(path) - Load YAML configuration file
parse_args() - Parse command-line arguments for pipeline scripts
apply_sets(config, set_args) - Apply command-line overrides (--set) to configuration
get(config, key_path, default=None) - Get nested configuration value using dot notation
choose(cli_val, config, key_path, default) - Choose between CLI argument and config value
validate_config_compatibility(config) - Validate configuration compatibility and consistency
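Dot-notation lookup and command-line overrides can be pictured with the following standalone sketch on plain dictionaries. All names here are hypothetical, and the `key.path=value` override syntax and the choice to keep override values as strings are assumptions; the actual `--set` format and type handling in sp2xr may differ.

```python
def get_nested(config, key_path, default=None):
    """Walk a nested dict along a dotted path, returning default if any key is missing."""
    node = config
    for key in key_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_nested(config, key_path, value):
    """Set a value along a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = key_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_overrides(config, set_args):
    """Apply overrides given as 'key.path=value' strings (assumed syntax)."""
    for item in set_args:
        key_path, _, raw_value = item.partition("=")
        set_nested(config, key_path, raw_value)   # value kept as a string
    return config


cfg = {"paths": {"input": "data/raw"}}
apply_overrides(cfg, ["workflow.dt=60"])
dt_value = get_nested(cfg, "workflow.dt")
```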
Cluster/Dask Management
initialize_cluster(config) - Initialize Dask cluster (local or SLURM)
make_slurm_cluster(config) - Create SLURM cluster with specified resources
make_local_cluster(config) - Create local Dask cluster with auto-detected resources
File Operations
find_files(directory, pattern) - Recursively find files matching pattern
find_matching_hk_file(pbp_file, hk_dir) - Find housekeeping file matching a PbP file
extract_base_filename(file_path) - Extract base filename from SP2XR file path
extract_sp2xr_filename_parts(filename) - Extract filename components (timestamp, instrument ID, etc.)
Time/Partition Management
extract_partitioned_datetimes(parquet_path) - Extract timestamps from partitioned Parquet paths
get_time_chunks_from_range(start, end, freq) - Generate time chunk tuples for processing
delete_partition_if_exists(output_path, partition_values) - Delete specific partition directory
floor_index_to_dt(df, dt_s) - Replace DatetimeIndex with lower-second floored values
calculate_delta_sec(time1, time2) - Calculate time difference in seconds
extract_datetime(filename) - Extract datetime from SP2XR filename
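Two of the ideas above, flooring timestamps to a fixed resolution and splitting a time range into chunks, can be sketched with the standard library alone. The names `floor_to_dt` and `time_chunks` are hypothetical; flooring relative to midnight and returning half-open `(start, end)` tuples are assumptions about conventions, not documented sp2xr behaviour.

```python
from datetime import datetime, timedelta


def floor_to_dt(ts, dt_s):
    """Floor a datetime to a multiple of dt_s seconds, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_s = (ts - midnight).total_seconds()
    return midnight + timedelta(seconds=(elapsed_s // dt_s) * dt_s)


def time_chunks(start, end, step):
    """Split [start, end) into consecutive (chunk_start, chunk_end) tuples."""
    result = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + step, end)   # last chunk may be shorter
        result.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return result


floored = floor_to_dt(datetime(2024, 1, 1, 12, 0, 37, 500000), 10)
hourly = time_chunks(datetime(2024, 1, 1), datetime(2024, 1, 1, 2, 30), timedelta(hours=1))
```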
INI/YAML Conversion
read_xr_ini_file(ini_path) - Read SP2-XR .ini calibration file
find_and_validate_ini_files(directory) - Find and validate .ini files (ensure consistency)
export_xr_ini_to_yaml(ini_path, yaml_path) - Convert .ini file to YAML format
export_xr_ini_to_yaml_with_source(ini_path, yaml_path) - Convert .ini to YAML with source metadata
Utilities
chunks(lst, n) - Yield successive n-sized chunks from list
partition_rowcount(ddf) - Count total rows in Dask DataFrame
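Yielding successive n-sized chunks from a list is a common generator pattern. A minimal sketch follows; the name `chunks_sketch` is hypothetical and the code is not copied from the package.

```python
def chunks_sketch(lst, n):
    """Yield consecutive slices of lst with at most n items each."""
    for start in range(0, len(lst), n):
        yield lst[start:start + n]


batches = list(chunks_sketch([1, 2, 3, 4, 5], 2))
```

The final chunk is shorter when the list length is not a multiple of n.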
sp2xr.schema
Data schema definitions and type enforcement for SP2XR data streams.
Constants:
CANONICAL_DTYPES - Dictionary mapping column names to canonical data types
DEFAULT_FLOAT - Default float dtype for numeric columns
Functions:
enforce_schema(df) - Cast DataFrame columns to canonical data types
cast_and_arrow(df) - Cast to canonical dtypes and convert to PyArrow backend