# Automatic Processing tool (ap)
The Automatic Processing (ap) tool, designed to streamline data processing and logbook management during beamtime, is a vital component for SwissFEL experiments. This tool seamlessly integrates with [sf-daq](https://github.com/paulscherrerinstitute/sf_daq_broker) output files, automating tasks like indexing via crystfel and logbook population in Google Spreadsheets.
# Table of Contents
* [Usage](#usage)
  * [Before beamtime](#usage1)
  * [During beamtime](#usage2)
  * [After beamtime](#usage3)
* [Configuration files](#config)
* [Google Authentication](#google-api)
* [Installation](#installation-from-source)
## Description
The Automatic Processing tool continuously monitors and processes files generated by sf-daq. It automatically executes tasks like indexing (utilizing [crystfel](https://www.desy.de/~twhite/crystfel/)) and populates a designated Google Spreadsheet with relevant experiment parameters. This simplifies data processing and documentation, enhancing the efficiency of beamtime operations.
## Pre-Installed Package (PSI)
The Automatic Processing tool is installed in **/sf/jungfrau/applications/ap**, and it is recommended to use it from this directory (all examples below use this software path).
To activate the corresponding conda environment, use the following commands:
```
$ source /sf/jungfrau/applications/miniconda3/etc/profile.d/conda.sh
$ conda activate ap
```
## Usage
### Before beamtime
Before the beamtime starts, follow these steps:
1. Run the **prepare.sh** script from the res/ directory:
```bash
cd p12345/res
/sf/jungfrau/applications/ap/scripts/prepare.sh
```
2. Modify the configuration files:
* BEAM_ENERGY.txt
* DETECTOR_DISTANCE.txt
* env_setup.sh
* run_index.sh
3. Create a .geom file containing the crystfel geometry description of the detector.
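For illustration, a crystfel geometry file consists of entries like the following (placeholder values for a single panel; the real multi-panel JUNGFRAU geometry is much longer and is typically provided by the beamline):
```
; illustrative fragment only, not a working JUNGFRAU geometry
clen = 0.09369            ; sample-detector distance in metres
photon_energy = 11330     ; beam energy in eV
res = 13333.3             ; 1/pixel size (75 um pixels)
adu_per_photon = 1.0

panel0/min_fs = 0
panel0/max_fs = 1023
panel0/min_ss = 0
panel0/max_ss = 511
panel0/corner_x = -512.0
panel0/corner_y = -256.0
panel0/fs = x
panel0/ss = y
```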
4. Prepare cell files for the exposed proteins in the ap_config/CELL directory.
```bash
$ ls res/ap_config/CELL
lyso.cell hewl.cell
```
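For reference, a crystfel .cell file has this form (approximate tetragonal lysozyme values, shown only as a placeholder):
```
CrystFEL unit cell file version 1.0

lattice_type = tetragonal
centering = P
unique_axis = c
a = 79.1 A
b = 79.1 A
c = 38.0 A
al = 90.0 deg
be = 90.0 deg
ga = 90.0 deg
```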
**HINT**: To automate indexing in an alternative space group, provide the alternative space group settings in a corresponding .cell_alternative file.
5. Create an empty Google Spreadsheet and corresponding [credentials files](#google-api).
```
ls res/ap_config/credentials*json
credentials.json credentials-1.json credentials-2.json credentials-3.json
```
6. Grant the necessary access to the spreadsheet for the service accounts. To find the e-mails of the service accounts:
```
grep client_email credentials*json
```
7. Edit the env_setup.sh file and assign the URL of the Google Spreadsheet (https://...) to the LOGBOOK variable.
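The resulting line in env_setup.sh should look roughly like this (URL shortened here for illustration):
```bash
LOGBOOK="https://docs.google.com/spreadsheets/d/..."
```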
8. Setup/prepare spreadsheet for automatic filling:
```
. ./env_setup.sh
python /sf/jungfrau/applications/ap/ap/update-spreadsheet.py --setup --url ${LOGBOOK}
```
### During beamtime
1. Check that the detector parameters in the sf-daq request contain:
* adc_to_energy : True
* save_dap_results : True
* crystfel_lists_laser : True
* geometry: False
Optional (to reduce file sizes):
* compression: True
* factor: 0.25
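For illustration, these settings correspond to a detector section of the sf-daq request of roughly this shape (the exact request structure depends on how sf-daq is invoked; the detector name here is just an example):
```
"detectors": {
    "JF17T16V01": {
        "adc_to_energy": true,
        "save_dap_results": true,
        "crystfel_lists_laser": true,
        "geometry": false,
        "compression": true,
        "factor": 0.25
    }
}
```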
2. Log in to the SwissFEL online computing infrastructure with your personal PSI account:
```bash
ssh psi_account@sf-l-001
cd /sf/alvra/data/p12345/res/ap_config
```
3. Start the automatic processing tool execution:
```bash
/sf/jungfrau/applications/ap/scripts/ap.sh
```
**HINT**: It's recommended to start this process in a screen or tmux session.
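For example, with screen:
```bash
screen -S ap                                   # named session, survives logout
/sf/jungfrau/applications/ap/scripts/ap.sh     # start the tool inside it
# detach with Ctrl-A D; re-attach later with: screen -r ap
```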
4. Stop the automatic processing tool:
* if it runs under your account: press Ctrl-C in the corresponding session
* if it runs under another account, use:
```
touch /sf/alvra/data/p12345/res/ap_config/STOP
```
### Possible actions during beamtime
#### Changes in configuration files
Changes to configuration files can be made at any time, and new processing jobs will automatically consider these updated values.
#### Re-processing of already processed runs
If re-running indexing becomes necessary due to new configuration parameters or updated files:
1. Ensure that previous indexing jobs for these runs are finished (check the CURRENT_JOBS.txt file in the config directory or use the "squeue" command).
2. Identify the **unique_acquisition_run_number**s associated with the specific scan (e.g., scan number 206).
3. Remove the corresponding files for these runs from the output/ directory to initiate re-indexing.
Example:
```
# Scan number 206 (data in the raw/run0206*/ directory) needs to be
# re-indexed; the scan contains 24 steps, so the corresponding
# unique_acquisition_run_numbers are 4048-4071. To find them:
grep unique_acquisition_run_number raw/run0206*/meta/acq*.json
# or look at the logbook: unique_acquisition_run_number is the first
# column of the spreadsheet.
#
# Check that no jobs with these numbers/names are running, by looking
# at the CURRENT_JOBS.txt file or at squeue.
#
# Then remove the res/ap_config/output/run*.index* files of runs
# 4048-4071 to re-run indexing for that scan.
```
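A sketch of the removal step for this example (the run*&lt;number&gt;* file-name pattern follows the note above; list the matched files first to double-check):
```bash
for n in $(seq 4048 4071); do
    rm res/ap_config/output/run*${n}*.index*
done
```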
#### Pausing indexing
To pause indexing due to unknown processing parameters:
* Create a semaphore file named NO_INDEXING in the config directory.
```bash
touch res/ap_config/NO_INDEXING
```
Once this file is removed, the tool will resume processing of all runs that have not yet been indexed.
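To resume indexing:
```bash
rm res/ap_config/NO_INDEXING
```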
### After beamtime
Upon completing the beamtime activities, follow these steps:
1. Stop the Automatic Processing:
* Once all runs of this beamtime are processed, terminate the automatic processing executable with Ctrl-C or by creating a STOP file in the running directory
2. Remove Credentials Files and Revoke API Keys:
* Remove all credentials*json files.
* Revoke the API keys associated with the Automatic Processing in the [Google Developer Console](https://console.developers.google.com/).
## Configuration files
### BEAM_ENERGY.txt
This file should contain beam energy values in electronvolts (eV). It must contain one line with the DEFAULT value; specific beam energy values can additionally be defined for individual runs (scans).
Example:
```
DEFAULT 11330.0
run9876 11001.2
run9870 12015.1
```
For runs 9876 and 9870, photon beam energies of 11001.2 and 12015.1 eV will be used, respectively. For any other run, 11330.0 eV will be applied as the default value.
### DETECTOR_DISTANCE.txt
This file should contain the detector distance (from sample to detector) in meters. The format is similar to the BEAM_ENERGY.txt file.
Example:
```
DEFAULT 0.09369
run9988 0.09212
run9977 0.09413
```
For runs 9988 and 9977, detector distances of 9.212 cm and 9.413 cm will be used, respectively. For all other runs, the default value of 9.369 cm will be applied.
### env_setup.sh
This file should be manually filled during the [preparation step](#usage1) with proper values for:
* Beamline name (e.g., alvra or bernina)
* Pgroup name (e.g., p12345)
* DETECTOR_NAME (e.g., JF17T16V01) used in the experiment
* THRESHOLD_INDEXING (modifiable in the run_index.sh file; see below)
* LOGBOOK (URL to the Google Spreadsheet used for automatic filling)
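A sketch of filled-in values (all values here are placeholders; keep the variable names used in the shipped env_setup.sh, of which only DETECTOR_NAME, THRESHOLD_INDEXING and LOGBOOK are quoted in this README):
```bash
BEAMLINE=alvra                      # beamline name
PGROUP=p12345                       # pgroup name
DETECTOR_NAME=JF17T16V01            # detector used in the experiment
THRESHOLD_INDEXING=10               # see run_index.sh below
LOGBOOK="https://docs.google.com/spreadsheets/d/..."
```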
### run_index.sh
This file contains indexing parameters utilized by crystfel.
**HINT**: If multiple proteins are used during the experiment, different indexing parameters can be defined for each of them. If a `run_index.<sample>.sh` file is present (with `<sample>` matching the name of the corresponding cell file), its parameters are used for that protein sample; otherwise, the run_index.sh parameters are applied as the default.
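For illustration, the crystfel call configured by such a file has roughly this shape (a sketch with made-up file names and parameter values, not the exact contents of the shipped run_index.sh):
```bash
indexamajig -i run004048.lst -g detector.geom -p lyso.cell \
    -o run004048.stream -j 32 \
    --indexing=xgandalf --peaks=peakfinder8 \
    --threshold=${THRESHOLD_INDEXING} --min-snr=5
```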
## Google Authentication
To enable automatic filling of the Google Spreadsheet, follow these steps:
* [enable API access for a project](https://docs.gspread.org/en/v5.10.0/oauth2.html#enable-api-access-for-a-project)
* [create service accounts](https://docs.gspread.org/en/v5.10.0/oauth2.html#for-bots-using-service-account) (steps 1-4)
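Each downloaded credentials*json file is a standard Google service-account key; the client_email field used for sharing the spreadsheet looks like this (fragment with placeholder values):
```
{
  "type": "service_account",
  "project_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...",
  "client_email": "ap-service@your-project.iam.gserviceaccount.com",
  ...
}
```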
## Installation from source
The Automatic Processing tool can also be installed from scratch. Place it in a location that is accessible from the online computing nodes and readable by everyone who intends to use the ap tool.
Steps for Installation:
1. Clone the repository using HTTPS or SSH:
```bash
git clone https://gitlab.psi.ch/sf-daq/ap.git
# or via ssh with
git clone git@gitlab.psi.ch:sf-daq/ap.git
```
2. If setting up a new conda environment, ensure the installation of the following packages within that environment:
* gspread
* numpy
* matplotlib
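For example (a sketch; the conda channel and package versions are not pinned by this README):
```bash
conda create -n ap -c conda-forge python gspread numpy matplotlib
conda activate ap
```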
When installing from source, remember to make corresponding changes to the lines at the end of the env_setup.sh file.
## Roadmap
From 2018 to 2023, the Automatic Processing service was a vital component of Serial Femtosecond Crystallography (SFX) experiments conducted across various SwissFEL beamlines, including Alvra, Bernina (and SwissMX), and Cristallina. During this period, the authors of the code actively managed the tool, facilitating rapid changes and seamless integration with other experiment components. This collaborative effort ensured the continuous refinement and adaptation of the tool to meet the specific needs of users.
Significant strides were made in 2023 to split the tool into configuration and executable components. Notably, during a beamtime in June at Cristallina, the configuration part was entirely managed by beamline personnel. Subsequently, in July, successful tests were conducted at Alvra, where the executable part was overseen by the beamtime personnel. These achievements paved the way for contemplating a migration of this service into a more comprehensive tool.
Initially designed for SFX experiments due to their demanding nature, the Automatic Processing tool has demonstrated adaptability and potential for expansion to accommodate other experiment types at SwissFEL. The tool's versatility and robustness lay the groundwork for its potential application in diverse experimental setups beyond SFX.
## Authors and reference
The Automatic Processing tool was created in 2018 by Karol Nass and Dmitry Ozerov.
The automatic processing pipeline is described in "*Nass, K. et al. Pink-beam serial femtosecond crystallography for accurate structure-factor determination at an X-ray free-electron laser. IUCrJ 8, 905–920 (2021).*" Please cite this paper if the automatic pipeline was helpful during your beamtime.