
# jfjoch_process

`jfjoch_process` is the **offline** crystallographic data-analysis tool of Jungfraujoch.
It takes an existing HDF5 dataset, runs the full analysis pipeline — spot finding, indexing,
geometry refinement, Bragg integration and (optionally) scaling and merging — and writes the
results to a `_process.h5` file, plus reflection files (`.mtz`/`.cif`/`.hkl`) when merging is
requested.

It runs the *same* analysis code as the online and interactive tools, just driven from the
command line over a file rather than a live detector stream.

> **Note.** `jfjoch_process` is under very active development. This page describes the tool and
> its options at a high level; the authoritative, always-current list of options is the program's
> own usage message — run `jfjoch_process` with no arguments.

## Where it fits among the three analysis tools

| Tool | Mode | Driven by | Output |
| --- | --- | --- | --- |
| [`jfjoch_broker`](JFJOCH_BROKER.md) | Online, real-time streaming analysis on FPGA + GPU | HTTP/REST + ZeroMQ | Live results and statistics, images streamed to [`jfjoch_writer`](JFJOCH_WRITER.md) |
| [`jfjoch_viewer`](JFJOCH_VIEWER.md) | Interactive, on-screen exploration | Qt desktop application | Displayed on screen (results not saved to disk) |
| **`jfjoch_process`** | **Offline batch processing of a stored dataset** | **Command-line interface** | **`_process.h5`, and `.mtz`/`.cif`/`.hkl` when merging** |

Use `jfjoch_process` to re-analyse data after acquisition, to experiment with processing
parameters, or to produce merged intensities for downstream structure solution.

## Hardware

As with the rest of Jungfraujoch, **serious performance requires an NVIDIA GPU**. The CUDA build
provides the GPU fast-feedback indexer (`ffbidx`) and the GPU FFT indexer (`fft`); without CUDA
only the CPU `fftw` indexer is available. Spot finding, integration and scaling run on the CPU; their
throughput increases with the thread count (`-N`, default 1).

## Input and output

**Input** is a single Jungfraujoch HDF5 master file (NXmx-based). If the dataset already contains
stored spot lists, two-pass rotation indexing can reuse them instead of re-running spot finding on
the first pass.

**Output** (controlled by `-o, --output-prefix`, default `output`):

- `<prefix>_process.h5` — NXmx-compliant HDF5 with derived metadata (spots, indexing,
  integration, azimuthal integration, per-image statistics). See
  [HDF5 / NeXus data format](HDF5.md) for the layout.
- When merging (`-M`, or whenever a `--reference-mtz` is supplied), the merged reflections are
  written as `<prefix>.mtz` (default), or `<prefix>.cif` / `<prefix>.hkl` depending on
  `--scaling-output`. No-reference scaling additionally emits per-iteration `<prefix>_iterN_scale.dat`.

Merged statistics (⟨I/σ⟩, CC1/2, completeness, …), the error model and timing are printed to the
console.
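
The naming scheme above can be sketched as a small shell helper that lists the files to expect after a run. This is only an illustration of the rules stated in this section: `expected_outputs` is a hypothetical helper, not part of Jungfraujoch, and it ignores the per-iteration `_iterN_scale.dat` files.

```shell
# Hypothetical helper (not shipped with Jungfraujoch): print the output files
# expected for a given prefix and, if merging was requested, reflection format.
expected_outputs() {
    prefix="${1:-output}"   # same default as -o / --output-prefix
    format="${2:-}"         # empty: no merging requested
    echo "${prefix}_process.h5"
    case "$format" in
        "")          ;;     # no merged reflection file
        mtz|cif|hkl) echo "${prefix}.${format}" ;;
        *)           echo "unknown reflection format: $format" >&2; return 1 ;;
    esac
}

expected_outputs lyso_rot mtz
```

For the rotation quick-start command below, this lists `lyso_rot_process.h5` and `lyso_rot.mtz`.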

## Re-scaling and re-merging (`jfjoch_scale`)

The companion tool `jfjoch_scale` re-scales and merges the *already-integrated* reflections stored
in one or more `_process.h5` files, without re-running spot finding or integration. Use it to
re-merge quickly with a different space group, partiality model, resolution limit or reference MTZ,
or to combine several processed runs into one set of merged intensities.

## Quick start

### Rotation data

Two-pass rotation indexing, rotation partiality, scale and merge in space group 96:

```
jfjoch_process rotation_master.h5 \
    -o lyso_rot -N 16 \
    -R -S 96 \
    -M -P rot
```

`-R` runs the two-pass rotation indexer (index the sweep once, then process every frame against
that lattice); `-P rot` selects the rotation partiality model; `-M` scales and merges. For strong
rotation data the de-novo FFT indexer often indexes more frames: add `-X fft`, and do not pass `-C`,
so that the cell is found from scratch.

### Still / serial data

Known-cell indexing of independent stills with the GPU fast-feedback indexer, then merge against a
reference structure:

```
jfjoch_process serial_master.h5 \
    -o lyso_serial -N 16 \
    -X ffbidx -C 79,79,38,90,90,90 -S 96 \
    --spot-sigma 4 \
    -M -z reference.mtz -r pixelrefine \
    --scaling-high-resolution 1.8
```

`ffbidx` requires a known cell (`-C`) and is the indexer of choice for sparse serial stills.
`-r pixelrefine` selects the experimental reference-driven still integrator (needs
`--reference-mtz`). For weak serial data, tightening spot finding by raising `--spot-sigma` from its
default of 3.0 to 4 typically raises the indexing rate substantially.
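
To see how much the spot-finding threshold matters for a given dataset, one option is a small scan over `--spot-sigma` on a subset of images. The sketch below uses only options documented on this page. The file name, the sigma values, the image range `0`-`999` and the `sigma_scan` function are illustrative assumptions. It prints the commands by default, so it can be inspected without the tool installed.

```shell
# Dry run by default: each command is printed instead of executed.
# Export RUN="" to actually launch jfjoch_process.
RUN="${RUN-echo}"

# Hypothetical helper: run the serial quick-start indexing settings once per
# spot-sigma value, restricted to a subset of images to keep the scan fast.
sigma_scan() {
    master="$1"   # path to the HDF5 master file
    for sigma in 3 4 5; do
        # One output prefix per setting so the runs do not overwrite each other.
        $RUN jfjoch_process "$master" \
            -o "scan_sigma${sigma}" -N 16 \
            -s 0 -e 999 \
            -X ffbidx -C 79,79,38,90,90,90 -S 96 \
            --spot-sigma "$sigma"
    done
}

sigma_scan serial_master.h5
```

Each run writes its own `scan_sigmaN_process.h5`, so the results can be compared afterwards.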

## Command-line options

General:

| Option | Description |
| --- | --- |
| `-o, --output-prefix <txt>` | Output file prefix (default: `output`) |
| `-N, --threads <num>` | Number of worker threads (default: 1) |
| `-s, --start-image <num>` | First image to process (default: 0) |
| `-e, --end-image <num>` | Last image to process (default: all) |
| `-t, --stride <num>` | Process every *n*-th image (default: 1) |
| `-v, --verbose` | Verbose output |

Spot finding:

| Option | Description |
| --- | --- |
| `--spot-sigma <num>` | Noise sigma level for spot finding (default: 3.0) |
| `--spot-threshold <num>` | Photon-count threshold for spot finding (default: 10) |
| `--spot-high-resolution <num>` | High-resolution limit for spot finding, Å (default: 1.5) |
| `--max-spots <num>` | Maximum spot count (default: 250) |

Indexing:

| Option | Description |
| --- | --- |
| `-X, --indexing-algorithm <txt>` | Indexing algorithm: `FFBIDX` \| `FFT` \| `FFTW` \| `Auto` \| `None` |
| `-C, --unit-cell <cell>` | Reference unit cell `"a,b,c,alpha,beta,gamma"` (required by `ffbidx`) |
| `-S, --space-group <num>` | Space group number (used for indexing and scaling) |
| `-r, --refine <txt>` | Geometry refinement: `none` \| `orientation` \| `beam_and_lattice` (default) \| `pixelrefine` |
| `-R, --two-pass-rotation[=num]` | Two-pass offline rotation indexing (optional image count, default 30) |
| `--single-pass-rotation[=num]` | Online-like single-pass rotation indexing (optional min angular range, deg) |
| `--redo-rotation-spots` | Redo spot finding for the two-pass rotation first pass |
| `--force-rotation-lattice <vec>` | Force rotation lattice (9 floats, Å), skipping the first pass |

Indexer choice in brief: `ffbidx` (GPU) refines toward a **known cell** and is best for sparse
serial stills; `fft` (GPU) / `fftw` (CPU) index **de novo** and suit strong rotation data. See the
[CPU/GPU data-analysis reference](CPU_DATA_ANALYSIS.md) for the algorithms.
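
The guidance above can be restated as a tiny decision helper. This is a simplified sketch of the rule of thumb on this page, not a description of what `-X Auto` does internally; `choose_indexer` and its `yes`/`no` arguments are hypothetical.

```shell
# Hypothetical helper: pick a value for -X from three yes/no style facts.
#   $1  yes|no            CUDA build running on an NVIDIA GPU?
#   $2  yes|no            reference unit cell available for -C?
#   $3  stills|rotation   kind of dataset
choose_indexer() {
    have_gpu="$1"
    have_cell="$2"
    kind="$3"
    if [ "$have_gpu" != yes ]; then
        echo fftw      # only CPU indexer available without CUDA
    elif [ "$have_cell" = yes ] && [ "$kind" = stills ]; then
        echo ffbidx    # known-cell GPU indexer, suited to sparse serial stills
    else
        echo fft       # de-novo GPU indexer, suited to strong rotation data
    fi
}

choose_indexer yes yes stills
choose_indexer yes no rotation
choose_indexer no yes stills
```

The result can be passed straight to the tool, for example `-X "$(choose_indexer yes yes stills)"`.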

Scaling and merging:

| Option | Description |
| --- | --- |
| `-M, --scale-merge` | Scale and merge |
| `-P, --partiality <txt>` | Partiality model: `fixed` (default) \| `rot` \| `unity` |
| `-A, --anomalous` | Anomalous mode (keep Friedel pairs separate) |
| `-B, --refine-bfactor` | Refine a per-image B-factor |
| `-w, --wedge[=num]` | Refine the per-image rotation wedge (optional starting value) |
| `--scaling-high-resolution <num>` | High-resolution limit for scaling, Å (default: no limit) |
| `--min-partiality <num>` | Minimum partiality to accept a reflection (default: 0.02) |
| `--reject-outliers <num>` | Per-observation outlier rejection, N σ from the per-reflection median (default: off) |
| `--reject-delta-cchalf <num>` | Drop images with ΔCC1/2 below mean − N·stddev (default: off) |
| `--min-image-cc <num>` | Per-image CC limit, percent (default: no limit) |
| `--scaling-iterations <num>` | Scaling iterations with no reference data (default: 3) |
| `--scaling-output <txt>` | Reflection output format: `mtz` (default) \| `cif` \| `txt` |
| `-z, --reference-mtz <file>` | Reference MTZ (enables reference-driven scaling) |
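
As an illustration of the `--reject-delta-cchalf` criterion (drop images whose ΔCC1/2 lies below mean − N·stddev), the sketch below applies that rule to a plain list of numbers. It is not the tool's implementation: `flag_low_delta_cchalf` is a hypothetical helper, the input values are made up, and the use of the population standard deviation is an assumption.

```shell
# Hypothetical helper: read one delta-CC1/2 value per line on stdin and print
# the input line numbers of those below mean - N * stddev.
# Assumption: population standard deviation (divide by the number of values).
flag_low_delta_cchalf() {
    awk -v n="$1" '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            sd = sqrt(ss / NR)
            cut = mean - n * sd
            for (i = 1; i <= NR; i++) if (v[i] < cut) print i
        }'
}

# Made-up values: four similar images and one clear outlier (fifth line).
printf '%s\n' 0.1 0.1 0.1 0.1 -0.9 | flag_low_delta_cchalf 1.5
```

With these values the mean is −0.1 and the standard deviation 0.4, so the cut-off for N = 1.5 is −0.7 and only the fifth value falls below it.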

Pixel refinement (experimental; select with `-r pixelrefine`, requires `--reference-mtz`):

| Option | Description |
| --- | --- |
| `--bandwidth <num>` | Relative X-ray bandwidth FWHM (e.g. `0.01` for a 1% DMM); default from file or 0 (monochromatic) |
| `--integration-radius <r>` | Signal-box radius `r1`, or `r1,r2,r3` (px). One value ⇒ `r2=r1+2`, `r3=r1+4` |
| `--profile-multiplier <num>` | Scale the measured tangential profile width (default: 6; XDS-style generous aperture) |
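
The single-value shorthand for `--integration-radius` can be expanded by hand as shown below. This sketch only restates the rule from the table (`r2=r1+2`, `r3=r1+4`); `expand_radius` is a hypothetical helper, not part of the tool.

```shell
# Hypothetical helper: expand an --integration-radius argument to r1,r2,r3.
# A single value r1 becomes r1,r1+2,r1+4; a comma-separated list is kept as is.
expand_radius() {
    case "$1" in
        *,*) echo "$1" ;;
        *)   awk -v r1="$1" 'BEGIN { printf "%g,%g,%g\n", r1, r1 + 2, r1 + 4 }' ;;
    esac
}

expand_radius 3
expand_radius 4,7,10
```

So `--integration-radius 3` is equivalent to `--integration-radius 3,5,7`.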