Update README.md

ozerov_d
2023-09-05 12:38:49 +02:00
parent c790457eec
commit 7ee27417d8


@@ -26,48 +26,48 @@ Automatic Processing tool checks for the new files/runs produced by sf-daq and r
The Automatic Processing tool is installed in /sf/jungfrau/applications/ap and it's recommended to use it from that location (all examples below assume that location).
The installed conda environment can be activated with:
```
$ source /sf/jungfrau/applications/miniconda3/etc/profile.d/conda.sh
$ conda activate ap
```
### Installation from source
The Automatic Processing tool can also be installed from scratch with:
```
$ git clone https://gitlab.psi.ch/sf-daq/ap.git # or via ssh with
# git clone git@gitlab.psi.ch:sf-daq/ap.git
$ conda create -n ap gspread
$ conda activate ap
```
In case of installation from source (i.e. a different location of the code and the conda environment), change the corresponding lines in scripts/env_setup.sh.
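A minimal sketch of the kind of lines to adapt; the paths below are placeholders for your actual clone and conda locations, not the defaults shipped with the tool:
```
# placeholder paths - point these at your own checkout and conda installation
source /path/to/your/miniconda3/etc/profile.d/conda.sh
conda activate ap
```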
## Configuration files<a name="config"></a>
### BEAM_ENERGY.txt
This file should contain beam energy values (in eV). There must be one line with the default value, and it's possible to define beam energy values different from the default for specific runs (scans). Example:
```
$ cat BEAM_ENERGY.txt
DEFAULT 11330.0
run9876 11001.2
run9870 12015.1
```
(for runs 9876 and 9870, photon beam energies of 11001.2 eV and 12015.1 eV will be used; for any other run, the default of 11330.0 eV)
### DETECTOR_DISTANCE.txt
This file should contain the detector distance (from sample to detector) in meters. The format is the same as for BEAM_ENERGY.txt, for example:
```
$ cat DETECTOR_DISTANCE.txt
DEFAULT 0.09369
run9988 0.09212
run9977 0.09413
```
(for runs 9988 and 9977, 9.212 cm and 9.413 cm will be used as the detector distance; for all other runs the default value of 9.369 cm will be used)
### env_setup.sh
During the preparation [step](#usage1) this file should be filled (manually) with the proper values for the beamline name (alvra, bernina, ...), the pgroup name (p12345), the DETECTOR_NAME (e.g. JF17T16V01) used in the experiment, THRESHOLD_INDEXING (can be changed/adapted in the run_index.sh file, see later) and LOGBOOK (the URL of the google spreadsheet which will be used for automatic filling).
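For illustration, a filled env_setup.sh might look like the sketch below; apart from DETECTOR_NAME, THRESHOLD_INDEXING and LOGBOOK, which are named above, the variable names and values are placeholders, so follow the template produced by prepare.sh:
```
# illustrative values only - adapt to your beamtime
BEAMLINE=alvra
PGROUP=p12345
DETECTOR_NAME=JF17T16V01
THRESHOLD_INDEXING=10        # placeholder value, see run_index.sh
LOGBOOK=https://docs.google.com/spreadsheets/d/...
```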
### run_index.sh
@@ -85,14 +85,12 @@ this file contains indexing parameters used by crystfel.
### Before beamtime<a name="usage1"></a>
* make a directory in the res/ space of the corresponding pgroup and populate it with configuration files using the **prepare.sh** script (the name **ap** is used as the directory name in the examples below, but any name can be chosen):
```
$ mkdir p12345/res/ap
$ cd p12345/res/ap
$ /sf/jungfrau/applications/ap/scripts/prepare.sh
```
* make corresponding changes in the configuration files (see section [Configuration files](#config)):
  * BEAM_ENERGY.txt
@@ -106,38 +104,39 @@ this file contains indexing parameters used by crystfel.
* create a file <DETECTOR_NAME>.geom (DETECTOR_NAME is the variable defined by you in the env_setup.sh file) with the crystfel geometry file for the corresponding detector (example: the JF17T16V01.geom file for the CrystallinaMX instrument)
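For orientation, a minimal fragment of a CrystFEL geometry file is sketched below; the values and the single-panel layout are illustrative placeholders, not the real JF17T16V01 geometry:
```
; illustrative fragment - a real JUNGFRAU geometry defines many panels
clen = 0.09369           ; sample-detector distance in metres
photon_energy = 11330    ; beam energy in eV
res = 13333.3            ; 1/pixel size (75 um pixels)
p0/min_fs = 0
p0/max_fs = 1023
p0/min_ss = 0
p0/max_ss = 511
p0/corner_x = -512
p0/corner_y = -256
p0/fs = x
p0/ss = y
```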
* put in the ap/CELL directory the cell files of the protein which will be exposed during the beamtime (the format of the files should be readable by crystfel). The name of the cell files needs to be <cell_name>.cell:
```
$ ls res/ap/CELL
lyso.cell hewl.cell
```
**HINT** - in case there are several space groups in which a protein can be indexed, it's possible to automatically run indexing in the *alternative* space group. To do this, provide the alternative space group settings in the file <cell_name>.cell_alternative. Example:
```
$ ls res/ap/CELL
lyso.cell chim.cell chim.cell_alternative
```
runs with <cell_name>=lyso will be indexed using the lyso.cell file, while for <cell_name>=chim indexing will be done twice, using the chim.cell and chim.cell_alternative files (and the results of both indexings will be filled into the logbook)
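For illustration, a minimal <cell_name>.cell file in CrystFEL's unit cell format could look as follows; the tetragonal lysozyme-like parameters are placeholder values, not reference data:
```
CrystFEL unit cell file version 1.0

lattice_type = tetragonal
centering = P
unique_axis = c

a = 79.1 A
b = 79.1 A
c = 38.2 A
al = 90.0 deg
be = 90.0 deg
ga = 90.0 deg
```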
* create (an empty) google spreadsheet
* create (several distinct) credentials files (see section [google authentication](#google-api) for how to create service accounts and keys if not done before) and store them in the config directory (it's important to have a file named credentials.json and a few (3 is enough) named credentials-1.json, credentials-2.json, ...):
```
$ ls res/ap/credentials*json
credentials.json credentials-1.json credentials-2.json credentials-3.json
```
***RECOMMENDATION*** - use/generate new credentials files for each beamtime so as not to expose experiment information
* give write access to the google spreadsheet to the service accounts (recommended) or give full editor access to everyone who knows the URL of the logbook (quicker, but not recommended). To find the e-mails of the service accounts:
```
$ grep client_email credentials*json
```
* edit the env_setup.sh file to fill the URL of the google spreadsheet (https://...) into the LOGBOOK variable
* setup/prepare the spreadsheet for automatic filling:
```
$ . ./env_setup.sh
$ python /sf/jungfrau/applications/ap/ap/update-spreadsheet.py --setup --url ${LOGBOOK}
```
@@ -157,21 +156,25 @@ this file contains indexing parameters used by crystfel.
* geometry: False (that's the usual choice; module-to-module adjustment is then made with the crystfel geometry file. The choice of the value should be aligned with the geometry file used)
* login to swissfel online computing infrastructure with your personal PSI account:
```
$ ssh psi_account@sf-l-001
```
* go to the directory with configuration files (prepared in the [Before Beamtime](#usage1) step):
```
$ cd /sf/alvra/data/p12345/res/ap
```
* start automatic processing tool execution:
```
$ /sf/jungfrau/applications/ap/scripts/ap.sh
```
***HINT*** - it's best to start this process in a screen or tmux session, to be able to re-connect to it remotely
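One possible way to do this with screen (the session name **ap** is an arbitrary choice):
```
$ screen -S ap                                 # start a named session
$ /sf/jungfrau/applications/ap/scripts/ap.sh   # run the tool inside it
```
Detach with Ctrl-A D and re-attach later with `screen -r ap`.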
* stop automatic processing tool:
  * if running from your account: Ctrl-C in the corresponding session
  * if running by another account: put a file STOP inside the configuration directory
```
$ touch /sf/alvra/data/p12345/res/ap/STOP
```
(if such a file is present inside the directory, a new automatic processing tool will not start, so remove the file before re-starting the tool)
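For example, to clear the semaphore before a re-start (same path as in the example above):
```
$ rm /sf/alvra/data/p12345/res/ap/STOP
```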
#### changes in configuration files <a name="usage2_config"></a>
@@ -179,22 +182,21 @@ can be done at any time and new processing jobs will take new values
#### re-processing of already processed runs <a name="usage2_reprocess"></a>
in case indexing needs to be re-run (new config parameters, new geometry file etc.) - first make sure that the previous indexing jobs for these runs are finished (check the CURRENT_JOBS.txt file in the config directory or run "squeue"). If they are finished, remove the files corresponding to the runs (please note that the run number is the **unique_acquisition_run_number**, not the scan number) from the output directory. Example:
```
scan number 206 (raw/run0206*/ directory with data) needs to be re-indexed. Scan contains 24 steps.
corresponding **unique_acquisition_run_number** are 4048-4071

$ grep unique_acquisition_run_number raw/run0206*/meta/acq*.json

or look at the logbook, **unique_acquisition_run_number** is the first column of the spreadsheet

check that there are no jobs with such numbers/names running, looking at the CURRENT_JOBS.txt file or *squeue*

remove the res/ap/output/run*4048-4071*.index* files to re-run indexing for that scan
```
#### pausing indexing<a name="usage2_pause"></a>
in case of unknown processing parameters (detector distance, geometry file (beam center), not yet known cell file, ...), it's possible to pause (i.e. not start indexing jobs) by putting a semaphore file NO_INDEXING in the config directory:
```
$ touch res/ap/NO_INDEXING
```
once this file is removed, all not-yet-indexed runs will be processed by the tool
### After Beamtime<a name="usage3"></a>