fix: align docs with major new backend changes #18
@@ -42,13 +42,6 @@ main steps in the lifecycle of the data management:
|
||||
- Publishing of datasets
|
||||
- Retention of datasets
|
||||
|
||||
Note: as of today (June 2021) the services can only be used from
within the PSI intranet, with the exception of the published data,
which is by definition publicly available. Although the service itself
can be used from any operating system, the command line and
GUI tools currently offered are available only for Linux and Windows
platforms.
|
||||
|
||||
## The Concept of Datasets
|
||||
|
||||
For the following it is useful to have a better understanding of the
|
||||
@@ -127,6 +120,25 @@ first. Installation is described in the appendix Installation of Tools
|
||||
|
||||
## Ingest
|
||||
|
||||
### Important Update since January 2025
|
||||
|
||||
The SciCat stack has gone through a major upgrade, and as a result the
command line syntax has changed.

The separate executables (such as datasetIngestor and datasetRetriever)
were combined into a single scicat-cli executable. Each former executable's
features are now available as a command, given as the first parameter to
scicat-cli.

These commands bear the same names as the former executables.
In general, if you previously called
./[COMMAND] [flags], you now call ./scicat-cli [COMMAND] [flags].

Furthermore, single-hyphen, multi-letter flags are no longer supported,
as they went against general convention. In practical terms, -[long_flag_name]
and --[long_flag_name] were both accepted before, but now only the latter is accepted.
|
||||
|
||||
There are backward-compatible scripts in the [GitHub repo](https://github.com/paulscherrerinstitute/scicat-cli?tab=readme-ov-file#backwards-compatibility-with-v2).
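
To illustrate the syntax change, here is a minimal sketch of a shell function that rewrites a legacy invocation into the new form. The function name `to_new_syntax` is hypothetical and not part of scicat-cli; it only prints the converted command line and does not run anything. It assumes that every single-hyphen argument of three or more characters is a long flag, so a value such as `-10` would be rewritten as well.

```sh
# Hypothetical helper: print the scicat-cli form of a legacy invocation.
# It only rewrites the text; it does not execute the command.
to_new_syntax() {
    out="scicat-cli $1"          # the former executable name becomes the command
    shift
    for arg in "$@"; do
        case "$arg" in
            --*) ;;                   # already a double-hyphen flag, keep as-is
            -??*) arg="-$arg" ;;      # single hyphen, multi-letter: add a hyphen
        esac
        out="$out $arg"
    done
    printf '%s\n' "$out"
}

to_new_syntax datasetIngestor -ingest -autoarchive metadata.json
# prints: scicat-cli datasetIngestor --ingest --autoarchive metadata.json
```

For real scripts, the backward-compatible scripts linked above are the safer choice; this sketch is only meant to show the two rules side by side.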
|
||||
|
||||
### Important Update since April 14th 2022
|
||||
|
||||
For all commandline tools, like the datasetIngestor, datasetRetriever
|
||||
@@ -237,7 +249,7 @@ real life example from Bio department:
|
||||
|
||||
For manual creation of this file there are various helper tools
|
||||
available. One option is to use the ScicatEditor
|
||||
<https://bliven_s.gitpages.psi.ch/SciCatEditor/> for creating these
|
||||
<https://www.scicatproject.org/SciCatEditor/> for creating these
|
||||
metadata files. This is a browser-based tool specifically for
|
||||
ingesting PSI data. Using the tool avoids syntax errors and provides
|
||||
templates for common data sets and options. The finished JSON file can
|
||||
@@ -257,7 +269,7 @@ Linux type notation is used. For the changes which apply to Windows
|
||||
see the separate section below)
|
||||
|
||||
```sh
|
||||
datasetIngestor metadata.json
|
||||
scicat-cli datasetIngestor metadata.json
|
||||
```
|
||||
|
||||
It will ask for your PSI credentials and then print some info
|
||||
@@ -268,7 +280,7 @@ already provided in the metadata.json file. If there are no errors,
|
||||
proceed to the real ingestion:
|
||||
|
||||
```sh
|
||||
datasetIngestor --ingest metadata.json
|
||||
scicat-cli datasetIngestor --ingest metadata.json
|
||||
```
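
The two steps above (a test run, then the real ingestion) can be chained in a small wrapper. This is a sketch under stated assumptions: the helper name `check_then_ingest` and the `SCICAT_CLI` variable are hypothetical, and it assumes that the test run exits with a non-zero status when it finds errors, which this document does not confirm.

```sh
# SCICAT_CLI is a hypothetical override so the sketch can be tried with a
# stand-in command (for example SCICAT_CLI=echo); it defaults to scicat-cli.
SCICAT_CLI="${SCICAT_CLI:-scicat-cli}"

# check_then_ingest METADATA_FILE   (hypothetical helper name)
check_then_ingest() {
    metadata="$1"
    # Step 1: test run without --ingest, nothing is ingested yet.
    # Assumption: a non-zero exit status signals errors.
    $SCICAT_CLI datasetIngestor "$metadata" || return 1
    # Step 2: the real ingestion.
    $SCICAT_CLI datasetIngestor --ingest "$metadata"
}
```

Calling `check_then_ingest metadata.json` would then stop after the first step if the test run fails. Reading the printed information of the test run yourself remains advisable, since not every problem may be reflected in the exit status.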
|
||||
|
||||
For particularly important datasets, you may also want to use the
|
||||
@@ -286,7 +298,7 @@ workstations/PCs are likely to fall in this category.
|
||||
There are more options for this command, just type
|
||||
|
||||
```sh
|
||||
datasetIngestor
|
||||
scicat-cli datasetIngestor
|
||||
```
|
||||
|
||||
to see a list of available options. In particular you can define
|
||||
@@ -303,7 +315,7 @@ For Windows you need execute the corresponding commands inside a
|
||||
PowerShell and use the binary files ending in .exe, e.g.
|
||||
|
||||
```sh
|
||||
datasetIngestor.exe -token SCICAT-TOKEN -user username:password -copy metadata.json
|
||||
scicat-cli.exe datasetIngestor --token SCICAT-TOKEN --user username:password --copy metadata.json
|
||||
```
|
||||
|
||||
For Windows systems you can only use personal accounts and the data is
|
||||
@@ -358,7 +370,7 @@ Triggering the copy to tape can be done in 3 ways. Either you do it
|
||||
automatically as part of the ingestion
|
||||
|
||||
```sh
|
||||
datasetIngestor --ingest --autoarchive metadata.json
|
||||
scicat-cli datasetIngestor --ingest --autoarchive metadata.json
|
||||
```
|
||||
|
||||
In this case directly after ingestion a job is created to copy the
|
||||
@@ -379,31 +391,14 @@ data is stored.
|
||||
A third option is to use the command line tool datasetArchiver.
|
||||
|
||||
```console
|
||||
datasetArchiver [options] (ownerGroup | space separated list of datasetIds)
|
||||
scicat-cli datasetArchiver [options] (ownerGroup | space separated list of datasetIds)
|
||||
```
|
||||
|
||||
You must choose either an ownerGroup, in which case all archivable datasets
|
||||
of this ownerGroup not yet archived will be archived.
|
||||
Or you choose a (list of) datasetIds, in which case all archivable datasets
|
||||
of this list not yet archived will be archived.
|
||||
|
||||
List of options:

  -devenv
        Use development environment instead or production
  -localenv
        Use local environment (local) instead or production
  -noninteractive
        Defines if no questions will be asked, just do it - make sure you know what you are doing
  -tapecopies int
        Number of tapecopies to be used for archiving (default 1)
  -testenv
        Use test environment (qa) instead or production
  -token string
        Defines optional API token instead of username:password
  -user string
        Defines optional username and password
|
||||
```
|
||||
|
||||
## Retrieve
|
||||
|
||||
Here we describe the retrieval via the command line tools. A retrieve
|
||||
@@ -429,39 +424,22 @@ minutes (e.g. for 1GB) up to days (e.g for 100TB)
|
||||
For the second step you can use the **datasetRetriever** command, which
|
||||
uses the rsync protocol to copy the data to your destination.
|
||||
|
||||
```console
|
||||
Tool to retrieve datasets from the intermediate cache server of the tape archive
|
||||
to the destination path on your local system.
|
||||
Run script with 1 argument:
|
||||
|
||||
datasetRetriever [options] local-destination-path
|
||||
```console
|
||||
scicat-cli datasetRetriever [options] local-destination-path
|
||||
```
|
||||
|
||||
By default, all available datasets on the retrieve server will be fetched.
|
||||
Use option -dataset or -ownerGroup to restrict the datasets which should be fetched.
|
||||
|
||||
  -chksum
        Switch on optional chksum verification step (default no checksum tests)
  -dataset string
        Defines single dataset to retrieve (default all available datasets)
  -devenv
        Use development environment (default is to use production system)
  -ownergroup string
        Defines to fetch only datasets of the specified ownerGroup (default is to fetch all available datasets)
  -retrieve
        Defines if this command is meant to actually copy data to the local system (default nothing is done)
  -testenv
        Use test environment (qa) (default is to use production system)
  -token string
        Defines optional API token instead of username:password
  -user string
        Defines optional username and password (default is to prompt for username and password)
|
||||
```
|
||||
Use the option --dataset or --ownergroup to restrict which datasets are fetched.
|
||||
|
||||
To check which data is available on the cache server
and whether the catalog knows about these datasets, you can use:
|
||||
|
||||
```console
|
||||
datasetRetriever my-local-destination-folder
|
||||
scicat-cli datasetRetriever my-local-destination-folder
|
||||
|
||||
======Checking for available datasets on archive cache server ebarema4in.psi.ch:
|
||||
|
||||
@@ -477,7 +455,7 @@ If you want you can skip the previous step and
|
||||
directly trigger the file copy by adding the -retrieve flag:
|
||||
|
||||
```sh
|
||||
datasetRetriever -retrieve <local destinationFolder>
|
||||
scicat-cli datasetRetriever --retrieve <local destinationFolder>
|
||||
```
|
||||
|
||||
This will copy the files into the destinationFolder using the original
|
||||
@@ -489,19 +467,19 @@ Optionally you can also verify the consistency of the copied data by
|
||||
using the `--chksum` flag
|
||||
|
||||
```sh
|
||||
datasetRetriever -retrieve -chksum <local destinationFolder>
|
||||
scicat-cli datasetRetriever --retrieve --chksum <local destinationFolder>
|
||||
```
|
||||
|
||||
If you just want to retrieve a single dataset do the following:
|
||||
|
||||
```sh
|
||||
datasetRetriever -retrieve -dataset <datasetId> <local destinationFolder>
|
||||
scicat-cli datasetRetriever --retrieve --dataset <datasetId> <local destinationFolder>
|
||||
```
|
||||
|
||||
If you want to retrieve all datasets of a given **ownerGroup** do the following:
|
||||
|
||||
```sh
|
||||
datasetRetriever -retrieve -ownergroup <group> <local destinationFolder>
|
||||
scicat-cli datasetRetriever --retrieve --ownergroup <group> <local destinationFolder>
|
||||
```
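
The retrieval variants above can be combined in one small wrapper. The following is a sketch only: the helper name `retrieve_verified` and the `SCICAT_CLI` variable are hypothetical, and it assumes that `--chksum` can be combined with `--dataset`, which the examples above show only separately.

```sh
# SCICAT_CLI is a hypothetical override so the sketch can be tried with a
# stand-in command (for example SCICAT_CLI=echo); it defaults to scicat-cli.
SCICAT_CLI="${SCICAT_CLI:-scicat-cli}"

# retrieve_verified DESTINATION [DATASET_ID]   (hypothetical helper name)
# Copies the data and verifies checksums; restricts the copy to one
# dataset when a dataset ID is given.
retrieve_verified() {
    dest="$1"
    dataset="${2:-}"
    if [ -n "$dataset" ]; then
        $SCICAT_CLI datasetRetriever --retrieve --chksum --dataset "$dataset" "$dest"
    else
        $SCICAT_CLI datasetRetriever --retrieve --chksum "$dest"
    fi
}
```

With this in place, `retrieve_verified my-local-destination-folder` would fetch everything available on the retrieve server, while adding a dataset ID as second argument limits the copy to that dataset.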
|
||||
|
||||
#### Expert commands
|
||||
@@ -559,7 +537,7 @@ easiest to get such an API token is to sign it at
|
||||
button. This will bring you to the user settings page, from where you
|
||||
can copy the token with a click on the corresponding copy button.
|
||||
|
||||
### General considerations
|
||||
<!-- ### General considerations
|
||||
|
||||
`SciCat` is a GUI based tool designed to make initial
|
||||
ingests easy. It is especially useful, to ingest data, which can not
|
||||
@@ -591,7 +569,7 @@ On the SLS beamline consoles the software is also pre-installed in the
|
||||
/work/sls/bin folder, which is part of the standard PATH variable.
|
||||
|
||||
If you are not working on the Ra cluster you can download the
|
||||
software on Linux:
|
||||
software on Linux, Windows or Mac.
|
||||
|
||||
```sh
|
||||
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/SciCat;chmod +x ./SciCat
|
||||
@@ -642,7 +620,7 @@ the desired datasets and clicking on "Save."
|
||||
### Settings
|
||||
|
||||
Additional settings, such as the default value for certain fields, can be modified in the settings panel (button
|
||||
on the lower left corner).
|
||||
on the lower left corner). -->
|
||||
|
||||
## Publish
|
||||
|
||||
@@ -816,42 +794,22 @@ module load datacatalog
|
||||
|
||||
If you do not have access to PSI modules (for instance, when archiving
|
||||
from Ubuntu systems), then you can install the datacatalog software
|
||||
yourself. These tools require 64-bit linux.
|
||||
yourself. Linux, Mac and Windows versions are available.
|
||||
|
||||
I suggest storing the SciCat scripts in ~/bin so that they can be
|
||||
easily accessed.
|
||||
|
||||
```sh
|
||||
mkdir -p ~/bin
|
||||
cd ~/bin
|
||||
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetIngestor
|
||||
chmod +x ./datasetIngestor
|
||||
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetRetriever
|
||||
chmod +x ./datasetRetriever
|
||||
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/SciCat
|
||||
chmod +x ./SciCat
|
||||
```
|
||||
To download and install the binaries, please follow these steps:
|
||||
|
||||
When the scripts are updated you will be prompted to re-run some of
|
||||
the above commands to get the latest version.
|
||||
1. Go to the [GitHub releases page](https://github.com/paulscherrerinstitute/scicat-cli/releases)
|
||||
|
||||
You can call the ingestion scripts using the full path
|
||||
(~/bin/datasetIngestor) or else add ~/bin to your unix PATH. To do so,
|
||||
add the following line to your ~/.bashrc file:
|
||||
2. Choose the release of interest (the latest release is recommended)
|
||||
|
||||
```sh
|
||||
export PATH="$HOME/bin:$PATH"
|
||||
```
|
||||
3. Download the file from the Assets of the chosen release, making sure to select the one compatible with your OS
|
||||
|
||||
#### Installation on Windows Systems
|
||||
4. Decompress the asset
|
||||
|
||||
On Windows the executables can be downloaded from the following URL,
|
||||
just enter the address in a browser and download the file
|
||||
|
||||
```sh
|
||||
https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/datasetIngestor.exe
|
||||
https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/SciCatGUI_Win10.zip
|
||||
```
|
||||
5. Open the folder and run the required executable (grant execute permissions if needed)
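
After unpacking, it can help to confirm that the shell actually finds the executable. The sketch below assumes the binary is called `scicat-cli`, as in the examples in this document; the name inside a release asset may differ. The helper name `have_command` is hypothetical.

```sh
# have_command NAME: succeed if NAME can be found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command scicat-cli; then
    echo "scicat-cli found at $(command -v scicat-cli)"
else
    echo "scicat-cli is not on the PATH; call it with its full path or extend PATH"
fi
```

If the second message appears, either call the executable with its full path or add the folder that contains it to your PATH variable.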
|
||||
|
||||
#### Online work stations in beamline hutches
|
||||
|
||||
@@ -1239,7 +1197,7 @@ chosen for the same quantity:
|
||||
and the folders will be scanned for files
|
||||
|
||||
```sh
|
||||
datasetIngestor metadata.json [filelisting.txt | 'folderlisting.txt']
|
||||
scicat-cli datasetIngestor metadata.json [filelisting.txt | 'folderlisting.txt']
|
||||
```
|
||||
|
||||
You will be prompted for your username and password.
|
||||
@@ -1249,7 +1207,7 @@ chosen for the same quantity:
|
||||
catalog
|
||||
|
||||
```sh
|
||||
datasetIngestor --ingest metadata.json [filelisting.txt | 'folderlisting.txt']
|
||||
scicat-cli datasetIngestor --ingest metadata.json [filelisting.txt | 'folderlisting.txt']
|
||||
```
|
||||
|
||||
When the job is finished, all needed metadata will be ingested into the
|
||||
@@ -1289,31 +1247,11 @@ chosen for the same quantity:
|
||||
Then you run the datasetIngestor program, usually under a beamline-specific
account. To run fully automatically, all questions that the program
would otherwise ask interactively must be pre-answered
|
||||
through a set of command line options:
|
||||
through a set of command line options. The command below shows all
|
||||
available options:
|
||||
|
||||
```console
|
||||
datasetIngestor [options] metadata-file [filelisting-file|'folderlisting.txt']
|
||||
|
||||
  -allowexistingsource
        Defines if existing sourceFolders can be reused
  -autoarchive
        Option to create archive job automatically after ingestion
  -copy
        Defines if files should be copied from your local system to a central server before ingest.
  -devenv
        Use development environment instead of production environment (developers only)
  -ingest
        Defines if this command is meant to actually ingest data
  -linkfiles string
        Define what to do with symbolic links: (keep|delete|keepInternalOnly) (default "keepInternalOnly")
  -noninteractive
        If set no questions will be asked and the default settings for all undefined flags will be assumed
  -tapecopies int
        Number of tapecopies to be used for archiving (default 1)
  -testenv
        Use test environment (qa) instead of production environment
  -user string
        Defines optional username:password string
|
||||
scicat-cli datasetIngestor [options] metadata-file [filelisting-file|'folderlisting.txt']
|
||||
```
|
||||
|
||||
- here is a typical example from the MX beamline at SLS
|
||||
@@ -1321,11 +1259,11 @@ chosen for the same quantity:
|
||||
metadata.json
|
||||
|
||||
```sh
|
||||
datasetIngestor -ingest \
|
||||
-linkfiles keepInternalOnly \
|
||||
-allowexistingsource \
|
||||
-user slsmx:XXXXXXXX \
|
||||
-noninteractive \
|
||||
scicat-cli datasetIngestor --ingest \
|
||||
--linkfiles keepInternalOnly \
|
||||
--allowexistingsource \
|
||||
--user slsmx:XXXXXXXX \
|
||||
--noninteractive \
|
||||
metadata.json
|
||||
```
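
For scripted use, the credentials can be kept out of the script file itself. The following sketch reads them from an environment variable; the names `SCICAT_USER`, `SCICAT_CLI` and `auto_ingest` are hypothetical and not defined by scicat-cli. Note that the value is still passed on the command line, as in the example above.

```sh
# SCICAT_CLI is a hypothetical override so the sketch can be tried with a
# stand-in command (for example SCICAT_CLI=echo); it defaults to scicat-cli.
SCICAT_CLI="${SCICAT_CLI:-scicat-cli}"

# auto_ingest METADATA_FILE   (hypothetical helper name)
# Expects SCICAT_USER to hold "username:password" (hypothetical variable).
auto_ingest() {
    if [ -z "${SCICAT_USER:-}" ]; then
        echo "SCICAT_USER is not set" >&2
        return 1
    fi
    $SCICAT_CLI datasetIngestor --ingest \
        --linkfiles keepInternalOnly \
        --allowexistingsource \
        --user "$SCICAT_USER" \
        --noninteractive \
        "$1"
}
```

A beamline script would then set `SCICAT_USER` once in its environment and call `auto_ingest metadata.json` for each metadata file.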
|
||||
|
||||
@@ -1366,7 +1304,7 @@ Otherwise just follow the description in the section "Manual ingest
|
||||
using datasetIngestor program" and use the option --copy, e.g.
|
||||
|
||||
```sh
|
||||
datasetIngestor -autoarchive -copy -ingest metadata.json
|
||||
scicat-cli datasetIngestor --autoarchive --copy --ingest metadata.json
|
||||
```
|
||||
|
||||
This command will copy the data to a central rsync server, from where
|
||||
@@ -1494,13 +1432,10 @@ following curl command:
|
||||
|
||||
```sh
|
||||
# for "functional" accounts
|
||||
curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/api/v3/Users/login'
|
||||
|
||||
# for normal user accounts
|
||||
curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/auth/msad'
|
||||
curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/api/v3/auth/login'
|
||||
|
||||
# reply if successful:
|
||||
{"id":"NQhe3...","ttl":1209600,"created":"2019-01-22T07:03:21.422Z","userId":"5a745bde4d12b30008020843"}
|
||||
{"access_token":"NQhe3...", "id":"NQhe3...","created":"2019-01-22T07:03:21.422Z","userId":"5a745bde4d12b30008020843","expires_in":604800, "ttl":604800,...}
|
||||
```
|
||||
|
||||
The "id" field contains the access token (the new reply also returns it as "access_token"), which you copy into the corresponding field at the top of the explorer page.
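
If you prefer to pick the token out of the reply in a script, a simple pattern match is enough for a quick test. This is a sketch that assumes the reply contains an "access_token" field with no whitespace between the key and its value, as in the sample reply; the sample string below is shortened from that reply. A real JSON parser is more robust than this pattern.

```sh
# Sample reply, shortened from the example reply; a real reply has more fields.
reply='{"access_token":"NQhe3...","id":"NQhe3...","expires_in":604800}'

# Print the value of the "access_token" field.
token=$(printf '%s' "$reply" | sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p')
echo "$token"
```

In practice the `reply` variable would hold the output of the curl command instead of the sample string.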
|
||||
@@ -1553,7 +1488,7 @@ use the command datasetGetProposal, which returns the proposal
|
||||
information for a given ownerGroup
|
||||
|
||||
```sh
|
||||
/usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetGetProposal;chmod +x ./datasetGetProposal
|
||||
scicat-cli datasetGetProposal
|
||||
```
|
||||
|
||||
### Link to Group specific descriptions
|
||||
|
||||