# Conversion of SP2-XR *PbP* and *HK* .csv/.zip files to .parquet

## Overview
Different SP2-XR instrument versions may use different column names in their CSV/ZIP data files. To keep data processing consistent, the SP2-XR package now supports automatic column name standardization during CSV-to-Parquet conversion.

## How It Works
1. **Input**: Instrument-specific CSV/ZIP files with their original column names
2. **Mapping**: A configuration file pairs each canonical (standard) column name with the corresponding column name in your files
3. **Output**: Parquet files with standardized column names for consistent downstream processing
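
The mapping step amounts to a header rename. The sketch below is a minimal illustration using only the Python standard library, not the package's actual implementation. `standardize_header` is a hypothetical helper, the mapping is written canonical name -> file column name as in the config file, and writing the Parquet file is left out.

```python
import csv
import io


def standardize_header(csv_text, column_mapping):
    """Return the CSV rows as dicts keyed by canonical column names.

    column_mapping is {canonical name: column name in the file}.
    Hypothetical helper, for illustration only.
    """
    # Invert the mapping so lookups go from the file's name to the canonical one.
    file_to_canonical = {
        file_name: canonical
        for canonical, file_name in column_mapping.items()
        if file_name is not None
    }
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns without a mapping entry keep their original name.
    return [
        {file_to_canonical.get(name, name): value for name, value in row.items()}
        for row in reader
    ]


rows = standardize_header(
    "Time_Seconds,Flags\n0.5,1\n",
    {"Time (sec)": "Time_Seconds", "Particle Flags": "Flags"},
)
print(rows)
```

Values stay as strings here because the `csv` module does not infer types. In the real conversion, data types are described by the schema sections of the config file.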

## Usage

### Step 1: Generate Enhanced Config Template
Use `generate_config.py` to create a configuration template:

```python
from meta_files.generate_config import generate_mapping_template

# Generate config with column mappings
generate_mapping_template("your_pbp_file.csv", "your_hk_file.csv", "config_with_mapping.yaml")
```

### Step 2: Customize Column Mappings
Open the generated `config_with_mapping.yaml` and update the column mappings:

```yaml
pbp_column_mapping:
  # Canonical name -> Your file's column name
  Time (sec): "Time_Seconds"  # Replace with your actual column name
  Particle Flags: "Flags"     # Replace with your actual column name
  Incand Mass (fg): "Mass_fg" # Replace with your actual column name
  # ... etc
  
hk_column_mapping:
  Time Stamp: "Timestamp"     # Replace with your actual column name
  Time (sec): "Time_Seconds"  # Replace with your actual column name
  # ... etc
```

### Step 3: Convert CSV to Parquet with Mapping
Pass your customized config to the conversion script with `--config`:

```bash
python scripts/sp2xr_csv2parquet.py --source /path/to/csv --target /path/to/parquet --config config_with_mapping.yaml
```

## Configuration File Structure

The enhanced config file contains several sections:

- **`pbp_schema`** / **`hk_schema`**: Data types of the columns in your input files
- **`pbp_canonical_schema`** / **`hk_canonical_schema`**: Standard column schemas used by SP2-XR processing
- **`pbp_column_mapping`** / **`hk_column_mapping`**: Maps canonical names to your file's column names
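
Before running a conversion, it can help to confirm that all six sections are present. The following is a small sketch under stated assumptions: the config has already been loaded into a plain dict (for example with `yaml.safe_load` from PyYAML), and `missing_sections` is a hypothetical helper, not a function provided by the package. The section contents are left empty because they depend on your files.

```python
# Section names as listed above.
EXPECTED_SECTIONS = (
    "pbp_schema",
    "hk_schema",
    "pbp_canonical_schema",
    "hk_canonical_schema",
    "pbp_column_mapping",
    "hk_column_mapping",
)


def missing_sections(config):
    """Return the expected section names that are absent from the config dict."""
    return [name for name in EXPECTED_SECTIONS if name not in config]


# Stand-in for a loaded config; real sections would hold schemas and mappings.
config = {
    "pbp_schema": {},
    "hk_schema": {},
    "pbp_canonical_schema": {},
    "pbp_column_mapping": {},
}

print(missing_sections(config))
```

An empty list means every section is present. This only checks section names, not whether their contents are valid.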

## Column Mapping Rules

1. **Exact matches**: If a column name in your file exactly matches the canonical name, it is mapped automatically
2. **Custom mapping**: Replace each placeholder value with the actual column name from your file
3. **Missing columns**: Set the mapping value to `null`, or remove the line, if your data doesn't have that column
4. **Extra columns**: Unmapped columns in your files are preserved as-is
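
The four rules can be sketched as a small function. This is an illustration of the rules as described here, not the package's code. `build_rename_map` and the column `Extra Diagnostic` are hypothetical, and skipping a mapped name that does not occur in the file is an assumption of this sketch.

```python
def build_rename_map(file_columns, column_mapping):
    """Return {file column name: canonical name} for columns that need renaming.

    column_mapping is {canonical name: file column name or None}.
    """
    rename = {}
    for canonical, file_name in column_mapping.items():
        if file_name is None:
            # Rule 3: the data has no such column, nothing to rename.
            continue
        if file_name not in file_columns:
            # Assumption of this sketch: ignore mapped names absent from the file.
            continue
        if file_name != canonical:
            # Rule 2: custom mapping. Rule 1 (exact match) needs no rename.
            rename[file_name] = canonical
    return rename


file_columns = ["Time_Seconds", "Particle Flags", "Extra Diagnostic"]
mapping = {
    "Time (sec)": "Time_Seconds",        # rule 2: custom mapping
    "Particle Flags": "Particle Flags",  # rule 1: exact match
    "Incand Mass (fg)": None,            # rule 3: missing column
}

rename = build_rename_map(file_columns, mapping)
# Rule 4: columns with no mapping entry keep their original name.
standardized = [rename.get(name, name) for name in file_columns]
print(standardized)
```

Keeping the result as a file-name-to-canonical dict means it can be passed directly to a rename call in whichever table library performs the conversion.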