# Automatic Processing tool
Runs on files produced by [sf-daq](https://github.com/paulscherrerinstitute/sf_daq_broker)
# Table of Contents
* [Installation](#installation)
* [Configuration files](#config)
* [Google Authentication](#google-api)
* [Usage](#usage)
* [Before beamtime](#usage1)
* [During beamtime](#usage2)
* [start/stop](#usage2_start)
* [changes in configuration files](#usage2_config)
* [data re-processing](#usage2_reprocess)
* [pausing indexing](#usage2_pause)
* [After beamtime](#usage3)
## Description
The Automatic Processing tool checks for new files/runs produced by sf-daq, automatically runs a workload on them (currently indexing with CrystFEL), and fills a logbook (google spreadsheet) with daq parameters from sf-daq and with the processing results.
## Installation<a name="installation"></a>
### Pre-installed software (recommended)
The Automatic Processing tool is installed in /sf/jungfrau/applications/ap, and it is recommended to use it from that location (all examples below use this path).
### Installation from source
Requirements: git and conda (conda is used to provide gspread).
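A minimal sketch of an installation from source, reusing the sf-dap environment name used in the examples below; the repository URL is an assumption and may differ:
> $ git clone https://github.com/paulscherrerinstitute/ap.git
>
> $ conda create -n sf-dap -c conda-forge python=3.9 gspread
>
> $ conda activate sf-dap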
## Configuration files<a name="config"></a>
A description of the configuration files, together with their format and usage hints, will be added here.
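As a purely hypothetical illustration (the actual file format, values and units are not documented here), the numeric configuration files are assumed to hold a single value each:
> $ cat BEAM_ENERGY.txt
>
> 11.330
>
> $ cat DETECTOR_DISTANCE.txt
>
> 0.095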
## Google Authentication (optional)<a name="google-api"></a>
ap can automatically fill a google spreadsheet with different information. This is done using the google-api, so api-keys have to be created and granted access to the corresponding spreadsheet (logbook). To create the keys, a few steps need to be done first:
- [enable API access for a project](https://docs.gspread.org/en/v5.10.0/oauth2.html#enable-api-access-for-a-project)
- [create (*hint* - do several for same project) service accounts](https://docs.gspread.org/en/v5.10.0/oauth2.html#for-bots-using-service-account) (steps 1-4)
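Each service-account key downloaded from the Google console is a JSON file; the client_email field inside it is the address that later needs write access to the spreadsheet. The address below is a placeholder:
> $ grep client_email credentials.json
>
> "client_email": "ap-bot@my-project.iam.gserviceaccount.com",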
## Usage<a name="usage"></a>
### Before beamtime<a name="usage1"></a>
* make a directory in the res/ space of the corresponding pgroup and fill it with configuration files (the name **ap** is used as the directory name in the examples below, but any name can be chosen):
> $ mkdir p12345/res/ap
>
> $ cd p12345/res/ap
>
> $ /sf/jungfrau/applications/ap/scripts/prepare.sh
* make the corresponding changes in the configuration files (see the [Configuration files](#config) section):
* BEAM_ENERGY.txt
* DETECTOR_DISTANCE.txt
* env_setup.sh
* run_index.sh
* create a file <DETECTOR_NAME>.geom (DETECTOR_NAME is the variable defined by you in the env_setup.sh file) containing the CrystFEL geometry for the corresponding detector (example: the JF17T16V01.geom file for the Cristallina-MX instrument)
* put the cell files of the protein(s) to be exposed during the beamtime into the ap/CELL directory (the file format must be readable by CrystFEL; a minimal example is shown after this list). The cell files need to be named <cell_name>.cell.
> $ ls res/ap/CELL
>
> lyso.cell hewl.cell
* (optional, only if automatic filling of the logbook is requested)
* create (an empty) google spreadsheet
* create several distinct credentials files (see the [Google Authentication](#google-api) section for how to create service accounts if not done before) and store them in the config directory. It is important to have one file named credentials.json and a few more (3 is enough) named credentials-1.json, credentials-2.json, ...:
> $ ls res/ap/credentials*json
>
>credentials.json credentials-1.json credentials-2.json credentials-3.json
***RECOMMENDATION*** - generate new credentials files for each beamtime so that experiment information is not exposed
* give write access to the google spreadsheet to the service accounts (recommended) or give full editor access to everyone who knows the URL of the logbook (quicker, but not recommended). To find the e-mail addresses of the service accounts:
> $ grep client_email credentials*json
* set up/prepare the spreadsheet for automatic filling:
> $ source /sf/jungfrau/applications/miniconda3/etc/profile.d/conda.sh
>
> $ conda activate sf-dap
>
> $ python /sf/jungfrau/applications/ap/ap/update-spreadsheet.py --setup --log URL_TO_GOOGLE_SPREADSHEET(https://...)
* edit the env_setup.sh file and set the LOGBOOK variable to URL_TO_GOOGLE_SPREADSHEET (see the sketch below)
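A minimal example of a CrystFEL unit-cell file for the CELL step above; the lattice parameters are placeholders for a tetragonal lysozyme-like cell:
> $ cat /sf/alvra/data/p12345/res/ap/CELL/lyso.cell
>
> CrystFEL unit cell file version 1.0
>
> lattice_type = tetragonal
>
> centering = P
>
> unique_axis = c
>
> a = 79.0 A
>
> b = 79.0 A
>
> c = 38.0 A
>
> al = 90.0 deg
>
> be = 90.0 deg
>
> ga = 90.0 deg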
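For the last step, a sketch of the corresponding line in env_setup.sh, assuming the variable is set as a plain exported shell variable (the URL is a placeholder):
> $ grep LOGBOOK env_setup.sh
>
> export LOGBOOK="https://docs.google.com/spreadsheets/d/..."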
### During Beamtime<a name="usage2"></a>
#### start/stop automatic processing tool:<a name="usage2_start"></a>
* log in to the SwissFEL online computing infrastructure with your personal PSI account:
> $ ssh psi_account@sf-l-001
* go to the directory with configuration files:
> $ cd /sf/alvra/data/p12345/res/ap
* start automatic processing tool execution
> $ /sf/jungfrau/applications/ap/scripts/ap.sh
***HINT*** - it is best to start this process in a screen or tmux session, so that you can re-connect to it remotely (see the sketch after this subsection)
* stop the automatic processing tool:
* if running from your account: press Ctrl-C in the corresponding session
* if running under another account: put a file named STOP inside the configuration directory
> $ touch /sf/alvra/data/p12345/res/ap/STOP
(if such a file is present inside the directory, a new instance of the automatic processing tool will not start, so remove the file before re-starting the tool)
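As mentioned in the hint above, a minimal sketch of starting the tool inside a screen session (the session name ap is arbitrary):
> $ ssh psi_account@sf-l-001
>
> $ screen -S ap
>
> $ cd /sf/alvra/data/p12345/res/ap
>
> $ /sf/jungfrau/applications/ap/scripts/ap.sh
>
> detach with Ctrl-a d, re-attach later with: screen -r ap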
#### changes in configuration files <a name="usage2_config"></a>
Changes can be made at any time; new processing jobs will pick up the new values.
#### re-processing of already processed runs <a name="usage2_reprocess"></a>
If indexing needs to be re-run (new config parameters, new geometry file, etc.), first make sure that the previous indexing jobs for these runs have finished (check the CURRENT_JOBS.txt file in the config directory or run "squeue"). If they have finished, remove the files corresponding to the runs (please note that the run number is the **unique_acquisition_run_number**, not the scan number) from the output directory. Example (a command sketch follows the example below):
> scan number 206 (raw/run0206*/ directory with data) needs to be re-indexed. Scan contains 24 steps.
> the corresponding **unique_acquisition_run_number** values are 4048-4071
>
> $ grep unique_acquisition_run_number raw/run0206*/meta/acq*.json
>
> or look at the logbook; **unique_acquisition_run_number** is the first column of the spreadsheet
>
> check that no jobs with these numbers/names are running, by looking at the CURRENT_JOBS.txt file or *squeue*
>
> remove ap/output/run*4048-4071*.index* files to re-run indexing for that scan
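A sketch of the removal step for this example, assuming the output files follow the run*<number>*.index* pattern mentioned above (adjust the path to your config directory):
> $ cd /sf/alvra/data/p12345/res/ap
>
> $ for r in $(seq 4048 4071); do rm -f output/run*${r}*.index*; done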
#### pausing indexing<a name="usage2_pause"></a>
If some processing parameters are not yet known (detector distance, geometry file (beam center), cell file not yet available, ...), it is possible to pause processing (i.e. not start indexing jobs) by putting a semaphore file named NO_INDEXING in the config directory:
> $ touch res/ap/NO_INDEXING
Once this file is removed, all runs that have not yet been indexed will be processed.
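To resume indexing, remove the semaphore file:
> $ rm res/ap/NO_INDEXING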
### After Beamtime<a name="usage3"></a>
* stop the automatic processing executable (Ctrl-C the ap.py process) once all runs for this beamtime have been processed
* remove the credentials*json files (see the sketch below) and revoke the api-keys in the [Google Developer Console](https://console.developers.google.com/) (go to "Service Accounts", for each account click "Actions: ...", choose "Manage Keys", then remove the key)
* revoke the write access to the google spreadsheet for the service accounts used by ap
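A sketch of the credentials cleanup from the list above (paths follow the earlier examples):
> $ cd /sf/alvra/data/p12345/res/ap
>
> $ rm credentials*json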
## Roadmap
This service was used for all SFX experiments at SwissFEL (Alvra, Bernina (both SwissMX and pure Bernina) and Cristallina) in the years 2018-2023 and was run by the authors of the code, which allowed fast changes and integration with other components as well as successful tuning of the product to users' needs. In 2023, successful steps were made to split the tool into config and executable parts: the June beamtimes at Cristallina were run with the config part fully under the control of the beamline people, and in July the executable part was tested running under the control of the Alvra beamline people. This opens the possibility of starting the migration of this service into a tool.
Until now the Automatic Processing tool has been used only for SFX experiments, since they are the most demanding for this tool, but extending it to other types of experiments at SwissFEL is certainly possible.
## Authors and acknowledgment
Automatic Processing tool was made in 2018 by Karol Nass and Dmitry Ozerov.