# API Reference

## Core Modules

### `sp2xr.io`

Input/output operations for data conversion and processing.

**Functions:**

- `csv_to_parquet(source_dir, target_dir, config_file, filter_pattern)` - Convert CSV/ZIP files to Parquet format
- `process_sp2xr_file(file_path, config_file, output_dir)` - Process individual SP2XR data files
- `read_sp2xr_csv(file_path, schema, **kwargs)` - Read SP2XR CSV/ZIP files with schema validation
- `load_matching_hk_file(pbp_file, hk_dir, hk_schema)` - Load the housekeeping (HK) file matching a particle-by-particle (PbP) file
- `enrich_sp2xr_dataframe(df, filename)` - Add time-derived and metadata columns (date, hour, filename)
- `save_sp2xr_parquet(df, output_dir, partition_cols)` - Save a DataFrame to partitioned Parquet format

---

### `sp2xr.calibration`

Calibration workflow for particle data, including quality flags.

**Calibration Functions:**

- `calibrate_single_particle(ddf, instr_config, run_config)` - Complete calibration workflow with quality flags
- `calibrate_particle_data(df, config)` - Apply scattering and incandescence calibrations to particle data
- `apply_calibration(df, config)` - Legacy wrapper for backward compatibility
- `apply_inc_calibration(df, calib_params)` - Apply incandescence calibration (BC mass calculation)
- `apply_scatt_calibration(df, calib_params)` - Apply scattering calibration (optical diameter calculation)

**Calibration Curve Functions:**

- `polynomial(x, coeffs)` - Generic polynomial curve: `a0 + a1*x + a2*x^2 + ...`
- `powerlaw(x, a, b)` - Generic power-law calibration: `a * x^b`

**Mass/Diameter Conversion:**

- `BC_mass_to_diam(mass_fg, material='fullerene')` - Convert BC mass (fg) to mass-equivalent diameter (nm)
- `BC_diam_to_mass(diam_nm, material='fullerene')` - Convert BC diameter (nm) to mass (fg)
- `mass2meqDiam(mass, rho_eff)` - Calculate mass-equivalent diameter from mass and effective density

**Distribution Conversion:**

- `dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho)` - Convert number distribution to mass distribution

---

### `sp2xr.calibration_constants`

Spline coefficient functions for BC calibration materials.

**Core Functions:**

- `SP2_calibCurveSpline(x, spline_coeffs)` - Spline interpolation for density calculations
- `SP2_calibCurveSplineCheck(spline_coeffs)` - Validate spline coefficient dimensions

**Aquadag Calibration Functions:**

- `Aquadag_RhoVsLogMspline_Var25()` - Aquadag density vs. log(mass) spline coefficients (Var25)
- `Aquadag_RhoVsLogMspline_Var8()` - Aquadag density vs. log(mass) spline coefficients (Var8)
- `Aquadag_RhoVsLogDspline_Var25()` - Aquadag density vs. log(diameter) spline coefficients (Var25)
- `Aquadag_RhoVsLogDspline_Var8()` - Aquadag density vs. log(diameter) spline coefficients (Var8)
- `Aquadag_RhoVsLogDspline_Var8_old()` - Legacy Aquadag density vs. log(diameter) spline coefficients (Var8)

**Fullerene Calibration Functions:**

- `Fullerene_RhoVsLogMspline_Var2()` - Fullerene density vs. log(mass) spline coefficients (Var2)
- `Fullerene_RhoVsLogMspline_Var5()` - Fullerene density vs. log(mass) spline coefficients (Var5)
- `Fullerene_RhoVsLogMspline_Var8()` - Fullerene density vs. log(mass) spline coefficients (Var8)
- `Fullerene_RhoVsLogMspline_Var8_old()` - Legacy Fullerene density vs. log(mass) spline coefficients (Var8)
- `Fullerene_RhoVsLogDspline_Var2()` - Fullerene density vs. log(diameter) spline coefficients (Var2)
- `Fullerene_RhoVsLogDspline_Var5()` - Fullerene density vs. log(diameter) spline coefficients (Var5)
- `Fullerene_RhoVsLogDspline_Var8()` - Fullerene density vs. log(diameter) spline coefficients (Var8)
- `Fullerene_RhoVsLogDspline_Var8_old()` - Legacy Fullerene density vs. log(diameter) spline coefficients (Var8)

**Glassy Carbon Calibration Functions:**

- `GlassyCarbonAlpha_Mass2Diam()` - Glassy carbon mass-to-diameter conversion coefficients
- `GlassyCarbonAlpha_Diam2Mass()` - Glassy carbon diameter-to-mass conversion coefficients

---

### `sp2xr.flag_single_particle_data`

Quality control flagging based on instrument parameters.

**Functions:**

- `define_flags(df, flag_config)` - Define quality control flags (transit time, FWHM, peak intensity, etc.)
- `add_thin_thick_flags(df, lag_threshold)` - Add thin/thick coating classification flags based on lag time

**Constants:**

- `FLAG_COLS` - List of flag column names used in quality filtering

---

### `sp2xr.distribution`

Size/mass distribution calculations.

**Main Functions:**

- `process_histograms(ddf, config, inc_bins, inc_ctrs, scatt_bins, scatt_ctrs, timelag_bins, timelag_ctrs, chunk_start, client)` - Calculate size/mass distributions (main workflow function)
- `process_hist_and_dist_partition(partition, dt_s, inc_mass_bin_lims, scatt_bin_lims, timelag_bins_lims, calc_dist, flow_dt)` - Process histograms and distributions for a single partition

**Bin Utilities:**

- `make_bin_arrays(min_val, max_val, n_bins, log_scale=True)` - Create bin arrays for histograms
- `bin_lims_to_ctrs(bin_lims)` - Convert bin limits to bin centers
- `bin_ctrs_to_lims(bin_ctrs)` - Convert bin centers to bin limits
- `get_dlogp(bin_lims)` - Calculate logarithmic bin widths (for dN/dlogDp calculations)

**Distribution Calculations:**

- `calculate_histogram(series, bins, dt_s, flow_sccm)` - Calculate a histogram from a series with time/flow normalization
- `counts2numConc(counts, dlogDp, dt_s, flow_sccm)` - Convert counts to number concentration (dN/dlogDp)
- `dNdlogDp_to_dMdlogDp(dNdlogDp, dp_nm, rho)` - Convert number distribution to mass distribution

**Metadata:**

- `make_hist_meta(bin_ctrs, dt_index)` - Create a metadata DataFrame for histogram output

---

### `sp2xr.resample_pbp_hk`

Time resampling functions for data aggregation.
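To make "time resampling" concrete, here is a minimal pandas sketch. It assumes particle-by-particle data with one row per detected particle and a `DatetimeIndex`, and housekeeping data with numeric columns. `count_particles_per_bin` and `mean_hk_per_bin` are hypothetical helpers written for this illustration. They only approximate the idea behind `build_dt_summary` and `resample_hk_partition`, whose actual behavior may differ.

```python
import pandas as pd


def count_particles_per_bin(pbp: pd.DataFrame, dt_s: int) -> pd.Series:
    """Hypothetical helper: number of particle rows in each dt_s-second bin.

    Assumes one row per detected particle and a DatetimeIndex.
    This is an illustration, not the sp2xr implementation.
    """
    # Floor each timestamp to the start (left edge) of its dt_s-second bin.
    bin_starts = pbp.index.floor(f"{dt_s}s")
    # Count rows per bin; bins with no particles do not appear in the result.
    return pbp.groupby(bin_starts).size().rename("n_particles")


def mean_hk_per_bin(hk: pd.DataFrame, dt_s: int) -> pd.DataFrame:
    """Hypothetical helper: average numeric housekeeping channels per dt_s-second bin."""
    # Left-closed, left-labelled bins, so labels line up with count_particles_per_bin.
    return hk.resample(f"{dt_s}s", closed="left", label="left").mean()
```

In this sketch, flooring plus `groupby` returns only bins that contain particles. `resample` produces a regular time grid in which empty bins appear as `NaN` means. Which behavior suits a given aggregation step is a design choice.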
**Functions:**

- `build_dt_summary(pdf, dt_s)` - Resample single-particle data to time bins (aggregate particle counts)
- `resample_hk_partition(pdf, dt)` - Resample a partition of housekeeping data to the specified time resolution
- `join_pbp_with_flow(ddf_pbp, flow_series, config)` - Join particle data with flow measurements
- `aggregate_dt(ddf_pbp_dt, ddf_hk_dt, config)` - Aggregate PbP and HK data at the specified time resolution

---

### `sp2xr.concentrations`

Concentration calculations for different particle types.

**Functions:**

- `add_concentrations(df, dt)` - Add BC mass concentration, scattering mass concentration, and number concentration columns

---

### `sp2xr.helpers`

Utility functions for file handling, argument parsing, cluster initialization, and configuration management.

#### Configuration Management

- `load_and_resolve_config(args)` - Load YAML configuration and merge it with command-line arguments
- `load_yaml_cfg(path)` - Load a YAML configuration file
- `parse_args()` - Parse command-line arguments for pipeline scripts
- `apply_sets(config, set_args)` - Apply command-line overrides (`--set`) to the configuration
- `get(config, key_path, default=None)` - Get a nested configuration value using dot notation
- `choose(cli_val, config, key_path, default)` - Choose between a CLI argument and a config value
- `validate_config_compatibility(config)` - Validate configuration compatibility and consistency

#### Cluster/Dask Management

- `initialize_cluster(config)` - Initialize a Dask cluster (local or SLURM)
- `make_slurm_cluster(config)` - Create a SLURM cluster with the specified resources
- `make_local_cluster(config)` - Create a local Dask cluster with auto-detected resources

#### File Operations

- `find_files(directory, pattern)` - Recursively find files matching a pattern
- `find_matching_hk_file(pbp_file, hk_dir)` - Find the housekeeping file matching a PbP file
- `extract_base_filename(file_path)` - Extract the base filename from an SP2XR file path
- `extract_sp2xr_filename_parts(filename)` - Extract filename components (timestamp, instrument ID, etc.)

#### Time/Partition Management

- `extract_partitioned_datetimes(parquet_path)` - Extract timestamps from partitioned Parquet paths
- `get_time_chunks_from_range(start, end, freq)` - Generate time chunk tuples for processing
- `delete_partition_if_exists(output_path, partition_values)` - Delete a specific partition directory if it exists
- `floor_index_to_dt(df, dt_s)` - Replace the DatetimeIndex with timestamps floored to `dt_s`-second resolution
- `calculate_delta_sec(time1, time2)` - Calculate the time difference in seconds
- `extract_datetime(filename)` - Extract the datetime from an SP2XR filename

#### INI/YAML Conversion

- `read_xr_ini_file(ini_path)` - Read an SP2-XR `.ini` calibration file
- `find_and_validate_ini_files(directory)` - Find `.ini` files and validate that they are consistent
- `export_xr_ini_to_yaml(ini_path, yaml_path)` - Convert an `.ini` file to YAML format
- `export_xr_ini_to_yaml_with_source(ini_path, yaml_path)` - Convert an `.ini` file to YAML with source metadata

#### Utilities

- `chunks(lst, n)` - Yield successive n-sized chunks from a list
- `partition_rowcount(ddf)` - Count total rows in a Dask DataFrame

---

### `sp2xr.schema`

Data schema definitions and type enforcement for SP2XR data streams.

**Constants:**

- `CANONICAL_DTYPES` - Dictionary mapping column names to canonical data types
- `DEFAULT_FLOAT` - Default float dtype for numeric columns

**Functions:**

- `enforce_schema(df)` - Cast DataFrame columns to canonical data types
- `cast_and_arrow(df)` - Cast to canonical dtypes and convert to the PyArrow backend
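The following is a minimal sketch of dictionary-driven dtype enforcement with pandas. It assumes the canonical mapping is a plain `{column: dtype}` dict. `EXAMPLE_CANONICAL_DTYPES`, `FALLBACK_FLOAT`, and `enforce_schema_sketch` are hypothetical names. The real `CANONICAL_DTYPES`, `DEFAULT_FLOAT`, and `enforce_schema` may use different columns or dtypes, and may handle missing columns differently.

```python
import pandas as pd

# Assumption: stand-in for DEFAULT_FLOAT; the value used by sp2xr.schema may differ.
FALLBACK_FLOAT = "float64"

# Hypothetical canonical mapping, for illustration only.
EXAMPLE_CANONICAL_DTYPES = {
    "particle_id": "int64",
    "inc_peak": FALLBACK_FLOAT,
    "instrument_id": "string",
}


def enforce_schema_sketch(df: pd.DataFrame, dtypes: dict = EXAMPLE_CANONICAL_DTYPES) -> pd.DataFrame:
    """Cast the columns listed in `dtypes` that are present in `df`.

    Columns not in the mapping keep their dtype. Mapped columns that are
    absent from `df` are skipped rather than created.
    """
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    # DataFrame.astype with a dict casts only the named columns and returns a new frame.
    return df.astype(present)
```

Skipping absent columns instead of raising keeps the sketch tolerant of partial inputs. A stricter variant could raise when a required column is missing.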