# Conversion of SP2-XR *PbP* and *HK* .csv/.zip files to .parquet

## Overview

Different SP2-XR instrument versions may use different column names in their CSV/ZIP data files. To ensure consistent data processing, the SP2-XR package now supports automatic column name standardization during CSV to Parquet conversion.

## How It Works

1. **Input**: Instrument-specific CSV/ZIP files with their original column names
2. **Mapping**: A configuration file maps your column names to canonical (standard) column names
3. **Output**: Parquet files with standardized column names for consistent downstream processing

## Usage

### Step 1: Generate Enhanced Config Template

Use `generate_config.py` to create a configuration template:

```python
from meta_files.generate_config import generate_mapping_template

# Generate config with column mappings
generate_mapping_template("your_pbp_file.csv", "your_hk_file.csv", "config_with_mapping.yaml")
```

### Step 2: Customize Column Mappings

Open the generated `config_with_mapping.yaml` and update the column mappings:

```yaml
pbp_column_mapping:
  # Canonical name -> Your file's column name
  Time (sec): "Time_Seconds"       # Replace with your actual column name
  Particle Flags: "Flags"          # Replace with your actual column name
  Incand Mass (fg): "Mass_fg"      # Replace with your actual column name
  # ... etc

hk_column_mapping:
  Time Stamp: "Timestamp"          # Replace with your actual column name
  Time (sec): "Time_Seconds"       # Replace with your actual column name
  # ... etc
```

### Step 3: Convert CSV to Parquet with Mapping

Use your customized config with the CSV to Parquet conversion:

```bash
python scripts/sp2xr_csv2parquet.py --source /path/to/csv --target /path/to/parquet --config config_with_mapping.yaml
```

## Configuration File Structure

The enhanced config file contains several sections:

- **`pbp_schema`** / **`hk_schema`**: Data types for your input files
- **`pbp_canonical_schema`** / **`hk_canonical_schema`**: Standard column schemas used by SP2-XR processing
- **`pbp_column_mapping`** / **`hk_column_mapping`**: Maps canonical names to your file's column names

## Column Mapping Rules

1. **Exact matches**: If your column names exactly match canonical names, they're automatically mapped
2. **Custom mapping**: Replace placeholder values with your actual column names
3. **Missing columns**: Set to `null` or remove the line if your data doesn't have that column
4. **Extra columns**: Unmapped columns in your files are preserved as-is
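
To make the four rules concrete, here is a minimal sketch in plain Python. It is an illustration only, not the package's actual implementation: `build_rename_map` and `standardize_columns` are hypothetical helpers, and the sketch assumes the mapping section has already been loaded from YAML into a dict of canonical name -> file column name, with YAML `null` arriving as `None`. It also assumes that a mapped name absent from the file is simply skipped.

```python
def build_rename_map(file_columns, column_mapping):
    """Return {file column name: canonical name} for columns that need renaming.

    Hypothetical helper. `column_mapping` mirrors the config layout:
    canonical name -> column name in your file (or None for YAML null).
    """
    present = set(file_columns)
    rename = {}
    for canonical, source in column_mapping.items():
        if source is None:
            # Rule 3: the data does not have this column.
            continue
        if source == canonical:
            # Rule 1: exact match, nothing to rename.
            continue
        if source in present:
            # Rule 2: custom mapping from the file's name to the canonical name.
            rename[source] = canonical
        # Assumption: a mapped name that is not in the file is ignored.
    return rename


def standardize_columns(file_columns, column_mapping):
    """Apply the rename map; unmapped columns keep their names (rule 4)."""
    rename = build_rename_map(file_columns, column_mapping)
    return [rename.get(col, col) for col in file_columns]


# Example mapping, as it might look after loading the YAML config.
mapping = {
    "Time (sec)": "Time_Seconds",        # custom mapping
    "Particle Flags": "Particle Flags",  # exact match
    "Incand Mass (fg)": None,            # missing column
}
columns = ["Time_Seconds", "Particle Flags", "Extra Diagnostic"]

print(standardize_columns(columns, mapping))
# → ['Time (sec)', 'Particle Flags', 'Extra Diagnostic']
```

Building the rename map as file name -> canonical name (the inverse of the config layout) keeps the config readable from the canonical side while letting the rename step look up each incoming column directly.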