fix: align docs with major new backend changes (#18)
All checks were successful
Build and deploy documentation / build-docs (push) Successful in 6s
Build and deploy documentation / deploy-docs (push) Successful in 7s

This commit was merged in pull request #18.
2026-02-19 09:46:51 +01:00
parent f57138a7b0
commit 1f40eb1334

@@ -42,13 +42,6 @@ main steps in the lifecycle of the data management:
- Publishing of datasets
- Retention of datasets
Note: as of today (June 2021) the services can only be used from
within the PSI intranet with the exception of the published data,
which is by definition publicly available. Although the service itself
can be used from any operating system, the command line and
GUI tools currently offered are available only for Linux and Windows
platforms.
## The Concept of Datasets
For the following it is useful to have a better understanding of the
@@ -127,6 +120,25 @@ first. Installation is described in the appendix Installation of Tools
## Ingest
### Important Update since January 2025
The SciCat stack has gone through a major upgrade, thus the command
line syntax has changed.
The separate executables (like datasetIngestor, datasetRetriever...)
were combined into one scicat-cli executable, with each executable's
features available as commands given as the first parameter to this executable.
These commands bear the same names as the former executables.
In general, a command that used to be called as
./[COMMAND] [flags] is now called as ./scicat-cli [COMMAND] [flags].
Furthermore, single-hyphen, multi-letter flags are no longer supported,
as they went against common convention. In practical terms, both -[long_flag_name]
and --[long_flag_name] used to be accepted, but now only --[long_flag_name] is.
Backward-compatible scripts are available in the [GitHub repo](https://github.com/paulscherrerinstitute/scicat-cli?tab=readme-ov-file#backwards-compatibility-with-v2).
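The following hypothetical Bash sketch illustrates both rules. The function name `to_scicat_cli` is made up for this example and is not part of scicat-cli. It rewrites a legacy command line into the new syntax in two ways: it prefixes `scicat-cli`, and it turns single-hyphen, multi-letter flags into double-hyphen ones. For day-to-day use, the backward-compatible scripts in the repository remain the supported route.

```sh
#!/usr/bin/env bash
# Hypothetical helper (not shipped with scicat-cli): print the new-style
# equivalent of a legacy call such as "./datasetIngestor -ingest metadata.json".
to_scicat_cli() {
  local cmd="${1##*/}"   # strip a leading path such as "./"
  shift
  local out=("scicat-cli" "$cmd")
  # Single hyphen followed by two or more letters (optionally "=value")
  # was a long flag in the old syntax; it now needs a double hyphen.
  local re='^-[A-Za-z][A-Za-z]+(=.*)?$'
  local arg
  for arg in "$@"; do
    if [[ $arg =~ $re ]]; then
      arg="-$arg"
    fi
    out+=("$arg")
  done
  printf '%s\n' "${out[*]}"
}

# Example:
# to_scicat_cli datasetRetriever -retrieve -chksum /data/dest
# prints: scicat-cli datasetRetriever --retrieve --chksum /data/dest
```

Arguments that already use a double hyphen, positional arguments and flag values are passed through unchanged.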
### Important Update since April 14th 2022
For all commandline tools, like the datasetIngestor, datasetRetriever
@@ -237,7 +249,7 @@ real life example from Bio department:
For manual creation of this file there are various helper tools
available. One option is to use the ScicatEditor
<https://bliven_s.gitpages.psi.ch/SciCatEditor/> for creating these
<https://www.scicatproject.org/SciCatEditor/> for creating these
metadata files. This is a browser-based tool specifically for
ingesting PSI data. Using the tool avoids syntax errors and provides
templates for common data sets and options. The finished JSON file can
@@ -257,7 +269,7 @@ Linux type notation is used. For the changes which apply to Windows
see the separate section below)
```sh
datasetIngestor metadata.json
scicat-cli datasetIngestor metadata.json
```
It will ask for your PSI credentials and then print some info
@@ -268,7 +280,7 @@ already provided in the metadata.json file. If there are no errors,
proceed to the real ingestion:
```sh
datasetIngestor --ingest metadata.json
scicat-cli datasetIngestor --ingest metadata.json
```
For particularly important datasets, you may also want to use the
@@ -286,7 +298,7 @@ workstations/PCs are likely to fall in this category.
There are more options for this command, just type
```sh
datasetIngestor
scicat-cli datasetIngestor
```
to see a list of available options. In particular you can define
@@ -303,7 +315,7 @@ For Windows you need execute the corresponding commands inside a
powershell and use the binary files ending in .exe, e.g.
```sh
datasetIngestor.exe -token SCICAT-TOKEN -user username:password -copy metadata.json
scicat-cli.exe datasetIngestor --token SCICAT-TOKEN --user username:password --copy metadata.json
```
For Windows systems you can only use personal accounts and the data is
@@ -358,7 +370,7 @@ Triggering the copy to tape can be done in 3 ways. Either you do it
automatically as part of the ingestion
```sh
datasetIngestor --ingest --autoarchive metadata.json
scicat-cli datasetIngestor --ingest --autoarchive metadata.json
```
In this case directly after ingestion a job is created to copy the
@@ -379,31 +391,14 @@ data is stored.
A third option is to use the command-line tool datasetArchiver.
```console
datasetArchiver [options] (ownerGroup | space separated list of datasetIds)
scicat-cli datasetArchiver [options] (ownerGroup | space separated list of datasetIds)
```
You must choose either an ownerGroup, in which case all archivable datasets
of this ownerGroup that are not yet archived will be archived,
or a (space-separated) list of datasetIds, in which case all archivable datasets
in this list that are not yet archived will be archived.
List of options:
-devenv
Use development environment instead or production
-localenv
Use local environment (local) instead or production
-noninteractive
Defines if no questions will be asked, just do it - make sure you know what you are doing
-tapecopies int
Number of tapecopies to be used for archiving (default 1)
-testenv
Use test environment (qa) instead or production
-token string
Defines optional API token instead of username:password
-user string
Defines optional username and password
```
## Retrieve
Here we describe the retrieval via the command line tools. A retrieve
@@ -429,39 +424,22 @@ minutes (e.g. for 1GB) up to days (e.g for 100TB)
For the second step you can use the **datasetRetriever** command, which
uses the rsync protocol to copy the data to your destination.
```console
Tool to retrieve datasets from the intermediate cache server of the tape archive
to the destination path on your local system.
Run script with 1 argument:
datasetRetriever [options] local-destination-path
```console
scicat-cli datasetRetriever [options] local-destination-path
```
By default, all available datasets on the retrieve server will be fetched.
Use option -dataset or -ownerGroup to restrict the datasets which should be fetched.
-chksum
Switch on optional chksum verification step (default no checksum tests)
-dataset string
Defines single dataset to retrieve (default all available datasets)
-devenv
Use development environment (default is to use production system)
-ownergroup string
Defines to fetch only datasets of the specified ownerGroup (default is to fetch all available datasets)
-retrieve
Defines if this command is meant to actually copy data to the local system (default nothing is done)
-testenv
Use test environment (qa) (default is to use production system)
-token string
Defines optional API token instead of username:password
-user string
Defines optional username and password (default is to prompt for username and password)
```
Use the option --dataset or --ownergroup to restrict the datasets to be fetched.
To let the program check which data is available on the cache server
and whether the catalog knows about these datasets, you can use:
```console
datasetRetriever my-local-destination-folder
scicat-cli datasetRetriever my-local-destination-folder
======Checking for available datasets on archive cache server ebarema4in.psi.ch:
@@ -477,7 +455,7 @@ If you want you can skip the previous step and
directly trigger the file copy by adding the --retrieve flag:
```sh
datasetRetriever -retrieve <local destinationFolder>
scicat-cli datasetRetriever --retrieve <local destinationFolder>
```
This will copy the files into the destinationFolder using the original
@@ -489,19 +467,19 @@ Optionally you can also verify the consistency of the copied data by
using the `--chksum` flag
```sh
datasetRetriever -retrieve -chksum <local destinationFolder>
scicat-cli datasetRetriever --retrieve --chksum <local destinationFolder>
```
If you only want to retrieve a single dataset, do the following:
```sh
datasetRetriever -retrieve -dataset <datasetId> <local destinationFolder>
scicat-cli datasetRetriever --retrieve --dataset <datasetId> <local destinationFolder>
```
If you want to retrieve all datasets of a given **ownerGroup**, do the following:
```sh
datasetRetriever -retrieve -ownergroup <group> <local destinationFolder>
scicat-cli datasetRetriever --retrieve --ownergroup <group> <local destinationFolder>
```
#### Expert commands
@@ -559,7 +537,7 @@ easiest to get such an API token is to sign it at
button. This will bring you to the user settings page, from where you
can copy the token with a click on the corresponding copy button.
### General considerations
<!-- ### General considerations
`SciCat` is a GUI-based tool designed to make initial
ingests easy. It is especially useful for ingesting data that cannot
@@ -591,7 +569,7 @@ On the SLS beamline consoles the software is also pre-installed in the
/work/sls/bin folder, which is part of the standard PATH variable.
If you are not working on the Ra cluster you can download the
software on Linux:
software on Linux, Windows or Mac.
```sh
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/SciCat;chmod +x ./SciCat
@@ -642,7 +620,7 @@ the desired datasets and clicking on "Save."
### Settings
Additional settings, such as the default value for certain fields, can be modified in the settings panel (button
on the lower left corner).
on the lower left corner). -->
## Publish
@@ -816,42 +794,22 @@ module load datacatalog
If you do not have access to PSI modules (for instance, when archiving
from Ubuntu systems), then you can install the datacatalog software
yourself. These tools require 64-bit linux.
yourself. Linux, Mac and Windows versions are available.
I suggest storing the SciCat scripts in ~/bin so that they can be
easily accessed.
```sh
mkdir -p ~/bin
cd ~/bin
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetIngestor
chmod +x ./datasetIngestor
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetRetriever
chmod +x ./datasetRetriever
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/SciCat
chmod +x ./SciCat
```
To download and install the binaries, please follow these steps:
When the scripts are updated you will be prompted to re-run some of
the above commands to get the latest version.
1. Go to the [GitHub releases page](https://github.com/paulscherrerinstitute/scicat-cli/releases)
You can call the ingestion scripts using the full path
(~/bin/datasetIngestor) or else add ~/bin to your unix PATH. To do so,
add the following line to your ~/.bashrc file:
2. Choose the release of interest (the latest release is recommended)
```sh
export PATH="$HOME/bin:$PATH"
```
3. Download the file from the Assets section of the chosen release, making sure to select the one that matches your OS
#### Installation on Windows Systems
4. Decompress the asset
On Windows the executables can be downloaded from the following URL,
just enter the address in abrowser and download the file
```sh
https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/datasetIngestor.exe
https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/SciCatGUI_Win10.zip
```
5. Open the folder and run the required executable (grant execute permissions if needed)
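Steps 3 to 5 can also be scripted. The sketch below assumes GitHub's standard `releases/download/<tag>/<asset>` URL layout. The helper name `scicat_release_url` is hypothetical. The tag and asset name in the usage lines are placeholders, so take the real values from the releases page for your OS.

```sh
#!/usr/bin/env bash
# Hypothetical helper: build the download URL of a scicat-cli release asset,
# assuming GitHub's standard "releases/download/<tag>/<asset>" layout.
scicat_release_url() {
  local tag="$1" asset="$2"
  printf 'https://github.com/paulscherrerinstitute/scicat-cli/releases/download/%s/%s\n' \
    "$tag" "$asset"
}

# Usage (TAG and ASSET are placeholders, not real release names):
# TAG="<release-tag>"
# ASSET="<asset-for-your-os>.tar.gz"
# curl -LO "$(scicat_release_url "$TAG" "$ASSET")"   # step 3: download
# tar -xzf "$ASSET"                                   # step 4: decompress (use unzip for .zip assets)
# chmod +x ./scicat-cli                               # step 5: grant execute permission
```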
#### Online work stations in beamline hutches
@@ -1239,7 +1197,7 @@ chosen for the same quantity:
and the folders will be scanned for files
```sh
datasetIngestor metadata.json [filelisting.txt | 'folderlisting.txt']
scicat-cli datasetIngestor metadata.json [filelisting.txt | 'folderlisting.txt']
```
You will be prompted for your username and password.
@@ -1249,7 +1207,7 @@ chosen for the same quantity:
catalog
```sh
datasetIngestor --ingest metadata.json [filelisting.txt | 'folderlisting.txt']
scicat-cli datasetIngestor --ingest metadata.json [filelisting.txt | 'folderlisting.txt']
```
When the job is finished, all needed metadata will be ingested into the
@@ -1289,31 +1247,11 @@ chosen for the same quantity:
Then you run the datasetIngestor program, usually under a
beamline-specific account. In order to run fully automatically, all potential
questions asked interactively by the program must be pre-answered
through a set of command line options:
through a set of command line options. The general syntax is shown below;
run `scicat-cli datasetIngestor` without arguments to see all available options:
```console
datasetIngestor [options] metadata-file [filelisting-file|'folderlisting.txt']
-allowexistingsource
Defines if existing sourceFolders can be reused
-autoarchive
Option to create archive job automatically after ingestion
-copy
Defines if files should be copied from your local system to a central server before ingest.
-devenv
Use development environment instead of production environment (developers only)
-ingest
Defines if this command is meant to actually ingest data
-linkfiles string
Define what to do with symbolic links: (keep|delete|keepInternalOnly) (default "keepInternalOnly")
-noninteractive
If set no questions will be asked and the default settings for all undefined flags will be assumed
-tapecopies int
Number of tapecopies to be used for archiving (default 1)
-testenv
Use test environment (qa) instead of production environment
-user string
Defines optional username:password string
scicat-cli datasetIngestor [options] metadata-file [filelisting-file|'folderlisting.txt']
```
- here is a typical example from the MX beamline at SLS
@@ -1321,11 +1259,11 @@ chosen for the same quantity:
metadata.json
```sh
datasetIngestor -ingest \
-linkfiles keepInternalOnly \
-allowexistingsource \
-user slsmx:XXXXXXXX \
-noninteractive \
scicat-cli datasetIngestor --ingest \
--linkfiles keepInternalOnly \
--allowexistingsource \
--user slsmx:XXXXXXXX \
--noninteractive \
metadata.json
```
@@ -1366,7 +1304,7 @@ Otherwise just follow the description in the section "Manual ingest
using datasetIngestor program" and use the option --copy, e.g.
```sh
datasetIngestor -autoarchive -copy -ingest metadata.json
scicat-cli datasetIngestor --autoarchive --copy --ingest metadata.json
```
This command will copy the data to a central rsync server, from where
@@ -1494,13 +1432,10 @@ following curl command:
```sh
# for "functional" accounts
curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/api/v3/Users/login'
# for normal user accounts
curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/auth/msad'
curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/api/v3/auth/login'
# reply if successful:
{"id":"NQhe3...","ttl":1209600,"created":"2019-01-22T07:03:21.422Z","userId":"5a745bde4d12b30008020843"}
{"access_token":"NQhe3...", "id":"NQhe3...","created":"2019-01-22T07:03:21.422Z","userId":"5a745bde4d12b30008020843","expires_in":604800, "ttl":604800,...}
```
The "id" field contains the access token (in the newer reply format the same value is also returned as "access_token"), which you copy into the corresponding field at the top of the explorer page.
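If you want to reuse the token in a script instead of copying it by hand, you can extract it from the login reply. The following is a minimal sketch that uses only `sed`. It assumes the reply is a single-line JSON object with an "access_token" field, as in the sample reply. The function name `extract_access_token` is hypothetical. If `jq` is installed, `jq -r .access_token` is the more robust choice.

```sh
#!/usr/bin/env sh
# Hypothetical helper: read a login reply (single-line JSON) on stdin and
# print the value of its "access_token" field. This is a plain-text match,
# not a JSON parser, so it fails on multi-line replies or escaped quotes.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (same endpoint and placeholders as the curl example):
# TOKEN=$(curl -s -X POST --header 'Content-Type: application/json' \
#   -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' \
#   'https://dacat-qa.psi.ch/api/v3/auth/login' | extract_access_token)
```

If the reply has no "access_token" field, the function prints nothing, so an empty `$TOKEN` indicates a failed login or a different reply format.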
@@ -1553,7 +1488,7 @@ use the command datasetGetProposal, which returns the proposal
information for a given ownerGroup
```sh
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetGetProposal;chmod +x ./datasetGetProposal
scicat-cli datasetGetProposal
```
### Link to Group specific descriptions