---
title: Ingestor Manual
---

# Overview and Concepts

PSI offers a Data Catalog Service for annotated long-term data storage,
retrieval and publishing. The annotation information, i.e. the metadata,
is stored in a central database to allow fast queries on the
data. The raw data itself is stored on the Petabyte Archive at the
Swiss National Supercomputing Centre (CSCS). The Data Catalog and
Archive are designed to be suitable for:

- Raw data generated by PSI instruments or simulations
- Derived data produced by processing the raw input data
- Data required to reproduce PSI research and publications, e.g. FAIR data

All data added to the data catalog must either be unclassified
or have a classification level of "normal".
You are not allowed to add any personal or private data. You are not
allowed to use the data catalog as a backup system. Data must come
from scientific activities pursued at PSI. If data from external
partner institutes is to be stored, this requires a dedicated
contract signed by the management.

The service is based on the catalog system SciCat, documented at
<https://scicatproject.github.io/> and
<https://github.com/SciCatProject/>. SciCat is an open source system that
allows you to ingest and retrieve datasets in different ways, matching the
requirements of the respective use cases. The use cases differ in the
level of automation provided.

Data is always stored in terms of `datasets`, which you can think of as a
collection of files combined with administrative and scientific metadata.

This manual describes how you can use these services by following the
main steps in the data management lifecycle:

- Definition and ingestion of metadata
- Archiving of the datasets
- Retrieving of datasets
- Publishing of datasets
- Retention of datasets

Note: as of today (June 2021) the services can only be used from
within the PSI intranet, with the exception of the published data,
which is by definition publicly available. Although the service itself
can be used from any operating system, the command line and
GUI tools currently offered are available only for Linux and Windows
platforms.

# The Concept of Datasets<a id="sec-2" name="sec-2"></a>

For the following it is useful to have a better understanding of the
concept of a dataset. A dataset is a logical grouping of potentially
many files. It is up to the scientist to define datasets from the
files. When defining datasets, take the following conditions into
account:

- a dataset is the smallest unit for adding metadata
- a dataset is the smallest unit for data handling (archiving and retrieval)
- a dataset is the smallest unit for publication (DOI assignment)

Therefore you need to find a compromise between putting too few or too
many files into a single dataset.

`Ingestion` of datasets means that you make data known to the data
catalog by providing both metadata about the dataset and the file
listing comprising the dataset. For each dataset a persistent
identifier (PID) is automatically created.

It is important to note that the data catalog is a "passive" system in
the sense that it has to be told when new data arrives. The data catalog
has no direct access to the file systems containing the actual
files. In contrast, the **datasetIngestor** program is run from systems that
have access to the data files.

Datasets always belong to a so-called ownerGroup. Only members of
this group have access to the data, unless the dataset is
published. At PSI there are two types of ownerGroups:

- pgroups, starting with the letter "p". They are used for experimental
  data linked to a proposal system. They are managed by the digital
  user office DUO
- a-groups, starting with "a-", for any other data to be archived

Once data is contained in the data catalog, this information is
considered to be stored permanently. However, after a retention period
the connected raw data files may actually be deleted. In this case the
dataset is **marked** as deleted in the data catalog, but the data
catalog entry persists, in agreement with the FAIR principles.

Warning: you should not modify the files which make up your dataset
after the dataset has been ingested into the data catalog. This means that you
should ingest the data only if you are sure that no further
modifications of the files will take place. The subsequent archive job
only handles the files which existed at ingest time; if it detects
changes, it returns an error message and does not archive the data at all.

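
The naming convention for the two ownerGroup types described in this section can be illustrated with a few lines of Python. This is only a sketch: the function name `classify_owner_group` and the returned labels are made up for this example, and only the two prefixes ("p" and "a-") are taken from this manual. Real group names may be subject to further rules that are not checked here.

```python
def classify_owner_group(name: str) -> str:
    """Return 'a-group', 'pgroup' or 'unknown' based only on the name prefix."""
    # Prefix rules as stated in this manual; anything beyond the prefix
    # (length, digits, existence of the group) is NOT verified here.
    if name.startswith("a-"):
        return "a-group"
    if name.startswith("p"):
        return "pgroup"
    return "unknown"


print(classify_owner_group("p16623"))   # pgroup
print(classify_owner_group("a-12345"))  # a-group
```

Checking "a-" before "p" keeps the two cases unambiguous even if a group name after the "a-" prefix happens to contain the letter "p".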
# Getting started<a id="sec-3" name="sec-3"></a>

You will need a PSI account, and this account needs to be a member of
so-called `p-groups`, which are managed by the PSI digital user office
proposal system and are usually linked to a principal investigator
(PI). This is required to define the authorization to the data,
i.e. who is allowed to see which datasets.

In addition to these so-called `user accounts` there are a couple of
pre-defined `functional accounts` which are used for automated
processes. In particular, each beamline has one such functional
account, e.g. called sls-tomcat, which can be used for automated
ingestion of new data or to query all data generated at a given
beamline. These accounts are only defined in the data catalog system
and are given to the respective beamline managers.

If your data cannot be linked to this proposal system you can still
use the services, but you may need to ask for the creation of a
so-called `a-group` and become a member of this group. You can order
an `a-group` via ServiceNow under `PSI Service Catalog` -> `IT` ->
`Identity & Access Management` -> `Order Group / Project drive`. Under
`Account Type` choose `Archive Group / Project Drive`. You will be
asked about the group members. The group owner is not automatically
added to the group members.

To use some of the software you may need to install it
first. Installation is described in the appendix Installation of Tools.

# Ingest<a id="sec-4" name="sec-4"></a>

## Important Update since April 14th 2022<a id="sec-4-1" name="sec-4-1"></a>

For all command line tools, such as datasetIngestor and datasetRetriever,
you **have** to use the `--token` option with a predefined API token
SCICAT-TOKEN when using your own user account. Specifying
username/password is not possible for normal users (this limitation
is caused by the switch to a new authentication protocol). The
easiest way to get such an API token is to sign in at
<https://discovery.psi.ch> using the "Login with PSI account"
button. This will bring you to the user settings page, from where
you can copy the token with a click on the corresponding copy
button.

For functional accounts, such as beamline accounts, you can
however continue to use username/password authentication instead.

## Definition of input files<a id="sec-4-2" name="sec-4-2"></a>

First you need to specify the location of the files that you want to
have stored as one dataset. A typical example would be all the files
taken during a measurement, a scan etc., or all output data from an
analysis of raw data files. In the simplest case it is sufficient to
define only one location, i.e. the **sourceFolder**, which should
contain all the files (and only those files) that make up the
dataset. In a more general case you can also specify an explicit list
of files and/or directories that you want to have assembled into a
dataset. See the datasetIngestor command options for details. The
appendix has a Recommended file structure for raw datasets on
disk. Please take note of the limitations of a dataset, as
defined in the appendix Dataset limitations.

## Definition of metadata<a id="sec-4-3" name="sec-4-3"></a>

There are two types of metadata which need to be provided:

- administrative metadata: specifies when and where the data was taken,
  who the owner is, etc. There are both mandatory and optional fields,
  and the fields depend on the type of the dataset
  (generic/raw/derived), see Section 11.4
  below. The most important metadata field for ownership is the value
  of the "ownerGroup" field, which defines a group name whose members
  have access to the data.
- scientific metadata: this depends on the scientific discipline and
  can be defined in a flexible way by the respective research group. It is
  up to the research groups to define the format(s) of their data that
  they want to support, ideally on an international level. See also
  the section About Scientific Values and Units.

Therefore the next step in ingesting your data into the catalog is to
prepare a file describing what data you have. This file is called
metadata.json and can be created with any text editor. It can in
principle be saved anywhere, but keeping it with your archived data is
recommended.

Here is a minimalistic example of the file metadata.json for raw data:

    {
        "creationLocation": "/PSI/SLS/TOMCAT",
        "sourceFolder": "/data/p16/p16623/June2020",
        "type": "raw",
        "ownerGroup": "p16623"
    }

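
Before running the ingestion tools it can be handy to check a hand-written metadata.json for JSON syntax errors and for obviously missing fields. The following Python sketch does this for the four keys used in the minimal example above. The helper name `missing_keys` is hypothetical, and the key set is an assumption derived from that example only; the authoritative list of mandatory fields is defined by the data catalog and by the dataset type.

```python
import json

# Keys used in the minimal raw-data example above. This set is an assumption
# for illustration; the catalog itself defines which fields are mandatory.
MINIMAL_KEYS = {"creationLocation", "sourceFolder", "type", "ownerGroup"}


def missing_keys(metadata_text: str) -> list[str]:
    """Parse metadata.json content and return the minimal keys that are absent."""
    metadata = json.loads(metadata_text)  # raises json.JSONDecodeError on syntax errors
    return sorted(MINIMAL_KEYS - set(metadata))


example = """
{
    "creationLocation": "/PSI/SLS/TOMCAT",
    "sourceFolder": "/data/p16/p16623/June2020",
    "type": "raw"
}
"""
print(missing_keys(example))  # ['ownerGroup']
```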
In the Appendix Use Case Examples you will find many more examples of
metadata.json files, both for raw and derived data. Here is a more
realistic example from the Bio department:

    {
        "principalInvestigator": "albrecht.gessler@psi.ch",
        "creationLocation": "/PSI/EMF/JEOL2200FS",
        "dataFormat": "TIFF+LZW Image Stack",
        "sourceFolder": "/gpfs/group/LBR/pXXX/myimages",
        "datasetName": "myimages",
        "owner": "Wilhelm Tell",
        "ownerEmail": "wilhelm.tell@psi.ch",
        "type": "raw",
        "description": "EM micrographs of amygdalin",
        "ownerGroup": "a-12345",
        "scientificMetadata": {
            "sample": {
                "name": "Amygdalin beta-glucosidase 1",
                "uniprot": "P29259",
                "species": "Apple"
            },
            "dataCollection": {
                "date": "2018-08-01"
            },
            "microscopeParameters": {
                "pixel size": {
                    "value": 0.885,
                    "unit": "A"
                },
                "voltage": {
                    "value": 200,
                    "unit": "kV"
                },
                "dosePerFrame": {
                    "value": 1.277,
                    "unit": "e/A2"
                }
            }
        }
    }

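
In the example above, physical quantities in the scientific metadata are written as pairs of `value` and `unit`. If such entries are generated by a script rather than typed by hand, a small helper keeps the shape consistent. The sketch below is an illustration only: the helper name `quantity` is made up, and the numbers and unit strings are simply copied from the example.

```python
import json


def quantity(value: float, unit: str) -> dict:
    """Return a value/unit pair in the shape used by the example above."""
    return {"value": value, "unit": unit}


# Values and unit strings copied from the example; not a recommended unit list.
microscope_parameters = {
    "pixel size": quantity(0.885, "A"),
    "voltage": quantity(200, "kV"),
    "dosePerFrame": quantity(1.277, "e/A2"),
}

# Serialize for pasting into the "scientificMetadata" section of metadata.json.
print(json.dumps({"microscopeParameters": microscope_parameters}, indent=4))
```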
For manual creation of this file there are various helper tools
available. One option is to use the ScicatEditor
<https://bliven_s.gitpages.psi.ch/SciCatEditor/> for creating these
metadata files. This is a browser-based tool specifically for
ingesting PSI data. Using the tool avoids syntax errors and provides
templates for common data sets and options. The finished JSON file can
then be downloaded or copied into a text editor.

Another option for datasets on ra or merlin is to use the SciCat
graphical interface from NoMachine. This provides a graphical
interface for selecting data to archive. This is particularly useful
for data associated with a DUO experiment and p-group. Type `SciCat` to
get started after loading the datacatalog module. The GUI also
replaces the command-line ingestion described below.

After preparing your metadata.json file, run the following steps to
ingest the data. First, perform a "dry-run" that will check the
metadata for errors. (Please note that in the following only the
Linux notation is used. For the changes which apply to Windows
see the separate section below.)

    datasetIngestor metadata.json

It will ask for your PSI credentials and then print some information
about the data to be ingested. This command will scan the files, make
checks and extract further metadata information from the files and
from the DUO system, unless the corresponding metadata fields are
already provided in the metadata.json file. If there are no errors,
proceed to the real ingestion:

    datasetIngestor --ingest metadata.json

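
When the dry run and the real ingestion are driven from a script, the two invocations differ only in the `--ingest` flag. The following Python sketch assembles both argument lists without executing anything; passing such a list to `subprocess.run` would be one way to start the tool. The function name `ingestor_command` is hypothetical, and only the options `--token` and `--ingest` mentioned in this manual are used.

```python
import shlex


def ingestor_command(metadata_file, token=None, ingest=False):
    """Assemble a datasetIngestor argument list; nothing is executed here."""
    command = ["datasetIngestor"]
    if token is not None:
        command += ["--token", token]
    if ingest:
        # Without --ingest the tool only performs the dry run described above.
        command.append("--ingest")
    command.append(metadata_file)
    return command


# "SCICAT-TOKEN" is the placeholder used in this manual, not a real token.
dry_run = ingestor_command("metadata.json", token="SCICAT-TOKEN")
real_run = ingestor_command("metadata.json", token="SCICAT-TOKEN", ingest=True)
print(shlex.join(dry_run))   # datasetIngestor --token SCICAT-TOKEN metadata.json
print(shlex.join(real_run))  # datasetIngestor --token SCICAT-TOKEN --ingest metadata.json
```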
For particularly important datasets, you may also want to use the
parameter `--tapecopies 2` to store redundant copies of the data.
To give some numbers, 0.2–0.4% of the tapes get damaged, so
there is a small chance that archiving with only one copy will result in lost
data. Keep in mind that archival with redundancy
doubles the cost, which is billed to the responsible department.

You may be asked whether you want to copy the data first to a central
system. This step is needed for all files which do not reside on
one of the central file servers at PSI. In particular, local (Windows)
workstations/PCs are likely to fall into this category.

There are more options for this command, just type

    datasetIngestor

to see a list of available options. In particular, you can define an
explicit list of files to be combined into a dataset, which can come
from many different folders, by providing a filelisting.txt file
containing this information in addition to the metadata.json file. The
section Using the datasetIngestor Tool in the Appendix has more details.

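
To put the tape damage figures quoted for `--tapecopies 2` into perspective, the following Python sketch compares one and two copies. It rests on two assumptions that are not statements from this manual: that the quoted 0.2–0.4% can be read as the probability of losing a single copy, and that the two copies fail independently of each other. The function name `loss_probability` is made up for this illustration.

```python
def loss_probability(per_copy_loss: float, copies: int) -> float:
    """Probability that all copies are lost, assuming independent failures."""
    return per_copy_loss ** copies


# 0.002 and 0.004 correspond to the 0.2% and 0.4% quoted above.
for p in (0.002, 0.004):
    single = loss_probability(p, 1)
    double = loss_probability(p, 2)
    print(f"per-copy loss {p:.1%}: one copy {single:.4%}, two copies {double:.6%}")
```

Under these assumptions a second copy reduces the risk from a few in a thousand to between 4 and 16 in a million, which is the trade-off against the doubled archiving cost.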
## Special notes for the decentral use case<a id="sec-4-4" name="sec-4-4"></a>

### For Windows<a id="sec-4-4-1" name="sec-4-4-1"></a>

On Windows you need to execute the corresponding commands inside a
PowerShell and use the binary files ending in .exe, e.g.

    datasetIngestor.exe -token SCICAT-TOKEN -user username:password -copy metadata.json

On Windows systems you can only use personal accounts, and the data is
always handled as the `decentral` case, i.e. the data will first be copied
from the Windows machine to a central file server via scp.
Therefore you need to specify all of the above parameters
-token, -user and -copy.

Please also note the syntax that has to be used for the definition of
the sourceFolder inside the metadata.json file. It has to be in the
following form:

    "sourceFolder": "/C/Somefolder/etc",

i.e. with **forward slashes** and **no colon** ":" after the drive letter
(as in "C:" in this case).

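
The sourceFolder notation required on Windows can be derived mechanically from a normal Windows path. The following Python sketch shows one way to do the conversion with the standard library. The function name `to_source_folder` is hypothetical, and the sketch only handles paths that start with a drive letter.

```python
from pathlib import PureWindowsPath


def to_source_folder(windows_path: str) -> str:
    r"""Convert e.g. C:\Somefolder\etc to /C/Somefolder/etc."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # 'C:' for a path with a drive letter
    if len(drive) != 2 or drive[1] != ":":
        # Network shares and relative paths are outside the scope of this sketch.
        raise ValueError(f"no drive letter in {windows_path!r}")
    # path.parts[0] is the drive/root part; the remaining parts are folder names.
    return "/" + "/".join([drive[0], *path.parts[1:]])


print(to_source_folder(r"C:\Somefolder\etc"))  # /C/Somefolder/etc
```

`PureWindowsPath` is used so that the conversion behaves the same on any operating system, which is convenient when metadata.json files are prepared on a machine other than the Windows PC holding the data.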
### For Linux<a id="sec-4-4-2" name="sec-4-4-2"></a>

You must have a valid Kerberos ticket in order to be able to copy the
data to the intermediate storage server. You can use the kinit command
to get this ticket.

## Summary of the different use cases<a id="sec-4-5" name="sec-4-5"></a>

The following table summarizes the different use cases:

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">OS</th>
<th scope="col" class="left">sourceLocation</th>
<th scope="col" class="left">Account-Type</th>
<th scope="col" class="left">Needed parameters</th>
<th scope="col" class="left">Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Linux</td>
<td class="left">central</td>
<td class="left">User</td>
<td class="left">token</td>
<td class="left">Fetch token via Web GUI discovery.psi.ch</td>
</tr>
<tr>
<td class="left">Linux</td>
<td class="left">central</td>
<td class="left">Functional</td>
<td class="left">username/pw</td>
<td class="left">The tool fetches token from API server</td>
</tr>
<tr>
<td class="left">Linux</td>
<td class="left">anywhere/decentral</td>
<td class="left">User</td>
<td class="left">token + Kerberos ticket</td>
<td class="left">Token for API, Kerberos ticket for copying data</td>
</tr>
<tr>
<td class="left">Linux</td>
<td class="left">anywhere/decentral</td>
<td class="left">Functional</td>
<td class="left">not supported</td>
<td class="left">Functional accounts not existing on ssh server</td>
</tr>
<tr>
<td class="left">Windows</td>
<td class="left">central</td>
<td class="left">User</td>
<td class="left">(token)</td>
<td class="left">Needs mounting of Windows FS to Arema</td>
</tr>
<tr>
<td class="left">Windows</td>
<td class="left">central</td>
<td class="left">Functional</td>
<td class="left">(username/pw)</td>
<td class="left">ditto</td>
</tr>
<tr>
<td class="left">Windows</td>
<td class="left">anywhere/decentral</td>
<td class="left">User</td>
<td class="left">token + username/pw</td>
<td class="left">Token for API, username/pw for copying data</td>
</tr>
<tr>
<td class="left">Windows</td>
<td class="left">anywhere/decentral</td>
<td class="left">Functional</td>
<td class="left">not supported</td>
<td class="left">Functional accounts not existing on ssh server</td>
</tr>
</tbody>
</table>

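
For scripts that have to decide up front which credentials to ask for, the use-case table can be encoded as a simple lookup. The Python sketch below mirrors the rows of the table; the names `NEEDED` and `needed_parameters` and the shortened location label "decentral" (for "anywhere/decentral") are choices made for this illustration and are not part of any SciCat tool.

```python
# (OS, source location, account type) -> what the table lists as needed.
# None marks the combinations the table calls "not supported".
NEEDED = {
    ("Linux", "central", "User"): "token",
    ("Linux", "central", "Functional"): "username/pw",
    ("Linux", "decentral", "User"): "token + Kerberos ticket",
    ("Linux", "decentral", "Functional"): None,
    ("Windows", "central", "User"): "(token)",
    ("Windows", "central", "Functional"): "(username/pw)",
    ("Windows", "decentral", "User"): "token + username/pw",
    ("Windows", "decentral", "Functional"): None,
}


def needed_parameters(os_name: str, location: str, account_type: str) -> str:
    """Look up the needed parameters; raise ValueError for unusable combinations."""
    key = (os_name, location, account_type)
    if key not in NEEDED:
        raise ValueError(f"unknown combination: {key}")
    needed = NEEDED[key]
    if needed is None:
        raise ValueError(f"not supported: {key}")
    return needed


print(needed_parameters("Linux", "decentral", "User"))  # token + Kerberos ticket
```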
# Archive<a id="sec-5" name="sec-5"></a>

If there are no errors, your data has been accepted into the data
catalog! From now on, no changes should be made to the ingested
data. This is important, since the next step is for the system to copy
all the data to the CSCS Petabyte archive. Writing to tape is slow, so
this process may take some time, and it will fail if any
modifications are detected.

Triggering the copy to tape can be done in three ways. The first is to do it
automatically as part of the ingestion:

    datasetIngestor --ingest --autoarchive metadata.json

In this case a job to copy the data to tape is created directly after
ingestion. Your dataset should now be in the queue. Check the data
catalog: <https://discovery.psi.ch>. Your job should have status
'WorkInProgress'. You will receive an email when the ingestion is
complete.

The second method is to use <https://discovery.psi.ch> to interactively
start the archive job: click on the "Archivable" button. You should
see the newly ingested datasets. Select all the datasets you want to
have archived and click 'Archive'. You should see the status change
from 'datasetCreated' to 'scheduleArchiveJob'. This indicates that the
data is in the process of being transferred to CSCS. After some time
the dataset's status will change to 'datasetOnArchive', indicating that the
data is stored.

A third option is to use the command line tool datasetArchiver:

    datasetArchiver [options] (ownerGroup | space separated list of datasetIds)

You must choose either an ownerGroup, in which case all archivable datasets
of this ownerGroup that are not yet archived will be archived,
or a (list of) datasetIds, in which case all archivable datasets
in this list that are not yet archived will be archived.

List of options:

    -devenv
          Use development environment instead of production
    -localenv
          Use local environment (local) instead of production
    -noninteractive
          Defines if no questions will be asked, just do it - make sure you know what you are doing
    -tapecopies int
          Number of tapecopies to be used for archiving (default 1)
    -testenv
          Use test environment (qa) instead of production
    -token string
          Defines optional API token instead of username:password
    -user string
          Defines optional username and password

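
The rule that datasetArchiver takes either one ownerGroup or a list of datasetIds, but not both, is easy to get wrong in wrapper scripts. The following Python sketch enforces it while assembling the argument list. The function name `archiver_command` is hypothetical, only the `-token` and `-tapecopies` options from the list above are used, and nothing is executed.

```python
import shlex


def archiver_command(owner_group=None, dataset_ids=(), tapecopies=1, token=None):
    """Assemble a datasetArchiver argument list; nothing is executed here."""
    # Exactly one of the two selection modes must be used.
    if (owner_group is None) == (not dataset_ids):
        raise ValueError("give either an ownerGroup or a list of datasetIds")
    command = ["datasetArchiver"]
    if token is not None:
        command += ["-token", token]
    if tapecopies != 1:
        # 1 is the documented default, so the option is only added otherwise.
        command += ["-tapecopies", str(tapecopies)]
    command += [owner_group] if owner_group is not None else list(dataset_ids)
    return command


# "SCICAT-TOKEN" and "p16623" are the placeholders used elsewhere in this manual.
print(shlex.join(archiver_command(owner_group="p16623", tapecopies=2, token="SCICAT-TOKEN")))
```

Keeping the command as a list (rather than one string) avoids quoting problems if it is later handed to `subprocess.run`.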
# Retrieve<a id="sec-6" name="sec-6"></a>

Here we describe the retrieval via the command line tools. A retrieve
process via a desktop GUI application is described in the section SciCatArchiver GUI.

Retrieving is a two-step process: first the data is copied from tape to a
central retrieve server. From there the data needs to be copied to the
final destination system of your choice.

## First Step<a id="sec-6-1" name="sec-6-1"></a>

For the first step, log in to <https://discovery.psi.ch>, find the
datasets you want to retrieve and select all "Retrievable" datasets
by clicking the corresponding button. Finally click the retrieve
button. This will create a retrieve job. Once it is finished you will
get an email. Depending on the size of your datasets this may take
minutes (e.g. for 1 GB) up to days (e.g. for 100 TB).

## Second Step (for Linux)<a id="sec-6-2" name="sec-6-2"></a>

### Standard commands<a id="sec-6-2-1" name="sec-6-2-1"></a>

For the second step you can use the **datasetRetriever** command, which
uses the rsync protocol to copy the data to your destination.

    Tool to retrieve datasets from the intermediate cache server of the tape archive
    to the destination path on your local system.
    Run script with 1 argument:

    datasetRetriever [options] local-destination-path

    Per default all available datasets on the retrieve server will be fetched.
    Use option -dataset or -ownergroup to restrict the datasets which should be fetched.

      -chksum
            Switch on optional chksum verification step (default no checksum tests)
      -dataset string
            Defines single dataset to retrieve (default all available datasets)
      -devenv
            Use development environment (default is to use production system)
      -ownergroup string
            Defines to fetch only datasets of the specified ownerGroup (default is to fetch all available datasets)
      -retrieve
            Defines if this command is meant to actually copy data to the local system (default nothing is done)
      -testenv
            Use test environment (qa) (default is to use production system)
      -token string
            Defines optional API token instead of username:password
      -user string
            Defines optional username and password (default is to prompt for username and password)

To let the program check which data is available on the cache server
and whether the catalog knows about these datasets, you can use:

    datasetRetriever my-local-destination-folder

    ======Checking for available datasets on archive cache server ebarema4in.psi.ch:

    Dataset ID                              Size[MB]  Owner   SourceFolder
    ===================================================================
    0f6fe8b3-d3f1-4cfb-a1af-0464c901a24f    1895      p16371  /sls/MX/Data10/e16371/20171017_E2/cbfs/2017-10-17_22-28-30_Na108_thau7_100degs_dtz60_f_500_Hz_Eth0_6200_eV
    58f2037e-3f9b-4e08-8963-c70c3d29c068    1896      p16371  /sls/MX/Data10/e16371/20171017_E2/cbfs/2017-10-17_21-41-02_cca385a_lyso8_100degs_f_500_Hz_Eth0_6200_eV
    cf8e5b25-9c76-49a7-80d9-fd38a71e0ef8    3782      p16371  /sls/MX/Data10/e16371/20171017_E2/cbfs/2017-10-18_10-15-41_na108_thau6_50degs_lowdose_pos1_f_500_Hz_Eth0_6200_eV
    df1c7a17-2caa-41ee-af6e-c3cf4452af17    1893      p16371  /sls/MX/Data10/e16371/20171017_E2/cbfs/2017-10-17_20-58-34_cca385a_lyso3_100degs_f_500_Hz_Eth0_6200_eV

If you want, you can skip the previous step and
directly trigger the file copy by adding the -retrieve flag:

    datasetRetriever -retrieve <local destinationFolder>

This will copy the files into the destinationFolder using the original
sourceFolder path beneath the destinationFolder. This is especially
useful if you want to retrieve many datasets which you expect to
appear in the same folder structure as originally.

Optionally you can also verify the consistency of the copied data by
using the -chksum flag:

    datasetRetriever -retrieve -chksum <local destinationFolder>

If you just want to retrieve a single dataset do the following:

    datasetRetriever -retrieve -dataset <datasetId> <local destinationFolder>

If you want to retrieve all datasets of a given **ownerGroup** do the following:

    datasetRetriever -retrieve -ownergroup <group> <local destinationFolder>

### Expert commands<a id="sec-6-2-2" name="sec-6-2-2"></a>

If you prefer to have more control over the file transfer you are free
to type your own rsync commands, e.g. to simply list the folders available
in the retrieve cache do:

    rsync -e ssh --list-only pb-retrieve.psi.ch:retrieve/

To actually copy the data over use:

    rsync -e ssh -av pb-retrieve.psi.ch:retrieve/{shortDatasetId} your-destination-target/

In this case the shortDatasetId is the dataset id without the PSI
prefix, e.g. for dataset PID
20.500.11935/08bc2944-e09e-48da-894d-0c5c47977553 the shortDatasetId
is 08bc2944-e09e-48da-894d-0c5c47977553.

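
Deriving the shortDatasetId from a full PID and filling it into the rsync call can be scripted. The following Python sketch shows one possible way: the function names `short_dataset_id` and `rsync_command` are made up for this example, the host and path are the ones quoted above, and the command is only assembled, not executed.

```python
import shlex


def short_dataset_id(pid: str) -> str:
    """Drop everything up to and including the first '/' of a PID."""
    prefix, sep, short_id = pid.partition("/")
    # A PID given without a prefix is returned unchanged.
    return short_id if sep else pid


def rsync_command(pid: str, destination: str) -> list[str]:
    """Build the expert rsync call shown above; nothing is executed here."""
    source = f"pb-retrieve.psi.ch:retrieve/{short_dataset_id(pid)}"
    return ["rsync", "-e", "ssh", "-av", source, destination]


pid = "20.500.11935/08bc2944-e09e-48da-894d-0c5c47977553"
print(short_dataset_id(pid))  # 08bc2944-e09e-48da-894d-0c5c47977553
print(shlex.join(rsync_command(pid, "your-destination-target/")))
```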
## Second Step (for Windows)<a id="sec-6-3" name="sec-6-3"></a>

On Windows the second step uses the sftp
protocol instead. Any sftp client for Windows, e.g. FileZilla,
can be used to retrieve the data to your local Windows PC. The
following connection information must be provided, taking the command
line client access via PowerShell as an example:

    # for the production system
    sftp -P 4222 your-username@pb-retrieve.psi.ch
    # or for the test system
    sftp -P 4222 your-username@pbt-retrieve.psi.ch

After the connection is established you can copy files recursively,
e.g. using the "get -r \*" command. With the FileZilla GUI you can
achieve the same via drag and drop operations.

# Ingest, Archive and Retrieve with Qt desktop application SciCat<a id="sec-7" name="sec-7"></a>

## Important Update since April 14th 2022<a id="sec-7-1" name="sec-7-1"></a>

You currently first need to get a token before you can use SciCat. The
easiest way to get such an API token is to sign in at
<https://discovery.psi.ch> using the "Login with PSI account"
button. This will bring you to the user settings page, from where you
can copy the token with a click on the corresponding copy button.

## General considerations<a id="sec-7-2" name="sec-7-2"></a>

`SciCat` is a GUI-based tool designed to make initial
ingests easy. It is especially useful for ingesting data which cannot
be ingested automatically. It is therefore designed in particular to
assist you when archiving derived datasets. Often, the archival of
derived data cannot be scheduled in advance, nor does it follow a
strict file structure. The `SciCat` GUI can help you to ingest such
datasets more easily. The ingestion of raw datasets is also
supported. Additionally, the tool allows for the convenient
retrieval of datasets.

## Getting started<a id="sec-7-3" name="sec-7-3"></a>

Currently, `SciCat` is supported on PSI-hosted **Linux** and **Windows**
systems and is accessible on the Ra cluster as part of the datacatalog
module: just type

    module load datacatalog

Then the software can be started with

    SciCat

On the SLS beamline consoles the software is also pre-installed in the
/work/sls/bin folder, which is part of the standard PATH variable.

If you are not working on the Ra cluster you can download the
software on Linux:

    /usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/SciCat;chmod +x ./SciCat

On Windows the executable can be downloaded from

    https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/SciCatGUI_Win10.zip

To start the GUI, unzip the directory and execute SciCat.exe.

## Login and permissions<a id="sec-7-4" name="sec-7-4"></a>

After starting the GUI, you will be asked for a username and password. Please
enter your PSI credentials. Functional accounts are not supported.

## Pgroup selection<a id="sec-7-5" name="sec-7-5"></a>

The first step is always to select the pgroup. If there is no proposal assigned to
this account, you will have to specify the information about the PI manually.

![img](./screenshots/pgroup_selection.png)

## Archiving<a id="sec-7-6" name="sec-7-6"></a>

After selecting the files, you will be presented with a metadata editor, where you can modify
the general information, such as dataset name, description etc. Please make
sure that you select the correct data type (raw or derived). As a general rule of thumb, it is
a derived dataset if you can specify a raw dataset as input. If you want to ingest a derived dataset,
you can specify the corresponding raw datasets on the "Input datasets" tab.
To edit scientific metadata, switch to the "Scientific metadata" tab.

## Retrieval<a id="sec-7-7" name="sec-7-7"></a>

Retrieving successfully archived datasets from SciCat is a two-step process. First you will have to
retrieve to an intermediate server. Once the data is there, you will be notified by email.
The final step is to copy the data to the final destination on your machine.
Both steps can be steered from within the GUI.

On the retrieve page, all datasets of your pgroup are listed. If the data has been archived successfully,
the cell in column "retrievable" is set to "true". To retrieve the data to the intermediate file server,
select the datasets that you want to retrieve and click on "Retrieve". After the retrieval, the column
"retrieved" is set to "true". You are now able to start copying the data to your local machine by selecting
the desired datasets and clicking on "Save".

## Settings<a id="sec-7-8" name="sec-7-8"></a>

Additional settings, such as the default values for certain fields, can be modified in the settings panel (button
in the lower left corner).

# Publish<a id="sec-8" name="sec-8"></a>
|
||
|
||
As part of a publication workflow datasets must become citable via a
|
||
digital object identifier (DOI). This assignment is done as part of
|
||
the publication workflow described below. The publication then can
|
||
link to these published datasets using this DOI. The DOIs can link to
|
||
both raw and/or derived datasets. The published data and therefore the
|
||
DOI ususally refers to a **set** of Datasets, thus avoiding the need to
|
||
list potentially thousands of individual dataset identifiers in a
|
||
journal publication.
|
||
|
||
You publish data in the following way: go to <https://discovery.psi.ch> ,
|
||
login and select all the datasets, that you want to publish under a
|
||
new DOI.
|
||
|
||

|
||
|
||
Then you add these datasest a a "shopping cart" by using the "add to
|
||
Cart" button. You can repeat this often as needed. Once finished with
|
||
the selection you can "check out" the cart (click on the cart in the
|
||
top bar) and pick the "Publish" action.
|
||
|
||

|
||
|
||
This opens a form
|
||
with prefilled information derived from the connected proposal
|
||
data. This data can then be edited by the user and finally saved.
|
||
|
||

|
||
|
||
This defines the data as to be published and makes it known to the
data catalog, but the corresponding DOI is not yet made globally
available. For this last step to happen, someone with access to this
newly generated published data definition (e.g. the person defining
the published data or the PI) has to click the "register"
button. This triggers the global publication of the DOI. The links
on <https://doi.org> are usually updated within one day, so wait one day
before following these links or searching for the DOI via the DOI
resolver.

All published data definitions are then openly available via the
so-called "Landing Pages", which are hosted on <https://doi.psi.ch>.

The file data itself becomes available via the normal data export
system of the Ra cluster, which however requires a PSI account. If you
want to make the file data anonymously available, you currently need to send a
corresponding request to stephan.egli@psi.ch. This process is
planned to be automated in the future.

For now all publications are triggered explicitly by a scientist,
whenever necessary. In the future, an automated publication
after the embargo period (default 3 years after data taking) will
additionally be implemented (details to be defined).

# Cleanup and Retention<a id="sec-9" name="sec-9"></a>

This part is not yet defined.

# Troubleshooting<a id="sec-10" name="sec-10"></a>

## Locale error message<a id="sec-10-1" name="sec-10-1"></a>

If you get error messages like the following (so far this has only happened
on Mac computers)

    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
    ....

then you need to prevent the Mac ssh client from sending the
`LC_CTYPE` variable. Just follow the description in:
<https://www.cyberciti.biz/faq/os-x-terminal-bash-warning-setlocale-lc_ctype-cannot-change-locale/>

## Invalid certificate messages<a id="sec-10-2" name="sec-10-2"></a>

The following message can be safely ignored:

    key_cert_check_authority: invalid certificate
    Certificate invalid: name is not a listed principal

It indicates that no Kerberos token was provided for authentication.
You can avoid the warning by first running kinit (PSI Linux systems).

## Long Running copy commands<a id="sec-10-3" name="sec-10-3"></a>

For decentral ingestion cases, the copy step is indicated by a message
'Running [/usr/bin/rsync -e ssh -avxz …'. This
step is expected to take a long time and may appear to have hung. You can check
which files have been successfully transferred using rsync:

    rsync --list-only user_n@pb-archive.psi.ch:archive/UID/PATH/

where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and
PATH is the absolute path to your data. Note that rsync creates
directories first and that the transfer order is not alphabetical in
some cases, but it should be possible to see whether any data has
been transferred.

## Kerberos tickets<a id="sec-10-4" name="sec-10-4"></a>

As a normal user you should have a valid Kerberos ticket. On
the centrally provided Linux machines this is usually the case
automatically. You can verify that a ticket exists with the "klist"
command. If no valid ticket is returned, you have to get one using
the "kinit" command. (Note: beamline accounts do not need this.)

    klist
    # if no ticket is listed, get one by
    kinit

||
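
The check above can also be scripted. The following is a minimal sketch, assuming a `klist` that supports the silent option `-s` (reporting ticket validity through its exit status, as the MIT Kerberos client does). The function name `ensure_ticket` is made up for this illustration and is not part of the datacatalog tools.

```shell
# Sketch: make sure a Kerberos ticket exists before starting a long job.
# "klist -s" prints nothing and signals a valid ticket through its exit status.
ensure_ticket() {
    if klist -s; then
        echo "valid ticket found"
    else
        echo "no valid ticket, running kinit"
        kinit
    fi
}
```

Calling `ensure_ticket` at the top of an ingestion script asks for a password only when no valid ticket is cached.
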
|
||
## Instructions to set ACLs in AFS<a id="sec-10-5" name="sec-10-5"></a>

In the AFS file system the user has to permit access to the
sourceFolder by setting read and lookup ACL permissions for the AFS
group “pb-archive”. The easiest way to achieve this is to run the following
script with the sourceFolder as an argument:

    /afs/psi.ch/service/bin/pb_setacl.sh sourceFolder

This script must be run by a person who has the rights to modify the
access rights in AFS.

# Appendix<a id="sec-11" name="sec-11"></a>

## Installation of Tools<a id="sec-11-1" name="sec-11-1"></a>

### Access to the SciCat GUI<a id="sec-11-1-1" name="sec-11-1-1"></a>

To access the SciCat web-based user interface no software
needs to be installed; simply use your browser to go to
<https://discovery.psi.ch>.

### Loading datacatalog tools on Clusters<a id="sec-11-1-2" name="sec-11-1-2"></a>

The latest datacatalog software is maintained in the PSI module system
on the main clusters (Ra, Merlin). To access it from PSI Linux
systems, run the following command:

    module load datacatalog

### (Non-standard Linux systems) Installing datacatalog tools<a id="sec-11-1-3" name="sec-11-1-3"></a>

If you do not have access to PSI modules (for instance, when archiving
from Ubuntu systems), then you can install the datacatalog software
yourself. These tools require 64-bit Linux.

It is recommended to store the SciCat scripts in ~/bin so that they can be
easily accessed.

    mkdir -p ~/bin
    cd ~/bin
    /usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetIngestor
    chmod +x ./datasetIngestor
    /usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetRetriever
    chmod +x ./datasetRetriever
    /usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/SciCat
    chmod +x ./SciCat

When the scripts are updated you will be prompted to re-run some of
the above commands to get the latest version.

You can call the ingestion scripts using the full path
(~/bin/datasetIngestor) or else add ~/bin to your Unix PATH. To do so,
add the following line to your ~/.bashrc file:

    export PATH="$HOME/bin:$PATH"

### Installation on Windows Systems<a id="sec-11-1-4" name="sec-11-1-4"></a>

On Windows the executables can be downloaded from the following URLs;
just enter the address in a browser and download the file:

    https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/datasetIngestor.exe
    https://gitlab.psi.ch/scicat/tools/-/blob/master/windows/SciCatGUI_Win10.zip

### Online work stations in beamline hutches<a id="sec-11-1-5" name="sec-11-1-5"></a>

The command line tools are pre-installed in /work/sls/bin. No further
action is needed.

## Dataset limitations<a id="sec-11-2" name="sec-11-2"></a>

### Size limitations<a id="sec-11-2-1" name="sec-11-2-1"></a>

- a single dataset should currently not have more than 400k files
- a single dataset should not be larger than 50 TB
- recommended size of a single dataset: between 1 GB and 1 TB

||
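
Before ingesting, it can be useful to compare a folder against these limits. The following is only a sketch: the function name `check_dataset_limits` is made up for this illustration, and it reads "50 TB" as 50 * 10^12 bytes, which is an assumption. The size is taken from `du -sk`, i.e. disk usage in 1024-byte blocks, so it is an approximation of the summed file sizes.

```shell
# Sketch: compare a folder against the dataset limits listed above.
# Usage: check_dataset_limits DIR [MAX_FILES] [MAX_KB]
check_dataset_limits() {
    dir=$1
    max_files=${2:-400000}
    # 50 TB read as 50 * 10^12 bytes, expressed in 1024-byte blocks (assumption)
    max_kb=${3:-48828125000}
    # Count regular files; names containing newlines would be miscounted
    n_files=$(( $(find "$dir" -type f | wc -l) ))
    # Disk usage of the whole tree in 1024-byte blocks
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    echo "files=$n_files size_kb=$used_kb"
    # The exit status is 0 only if both limits are respected
    [ "$n_files" -le "$max_files" ] && [ "$used_kb" -le "$max_kb" ]
}
```

For example, `check_dataset_limits /path/to/sourceFolder || echo "consider splitting this dataset"` prints the counts and warns when a limit is exceeded.
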
|
||
### SourceFolder and file names limitations<a id="sec-11-2-2" name="sec-11-2-2"></a>

The sourceFolder metadata and the names of the files can contain the following special characters:

- \%
- \#
- \-
- \+
- \.
- \:
- \=
- \@
- \_

Any other special characters are not guaranteed to work.

||
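
A quick way to spot problematic names before ingestion is a pattern check. This is a minimal sketch meant to be saved in a script file. It assumes that letters, digits and the path separator "/" are allowed in addition to the special characters listed above; the function name `has_only_safe_chars` is made up for this illustration.

```shell
# Sketch: succeed only if a name consists of characters from the documented set.
# Assumption: letters, digits and "/" are allowed besides % # - + . : = @ _
has_only_safe_chars() {
    case $1 in
        *[!A-Za-z0-9%#+.:=@_/-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

A possible use is `find "$sourceFolder" | while IFS= read -r p; do has_only_safe_chars "$p" || echo "check: $p"; done`, which lists every path that contains a character outside this set.
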
|
||
## Recommended file structure for raw datasets<a id="sec-11-3" name="sec-11-3"></a>

One recommended way of structuring your data on disk is the following:

    e12345   <--- user's group e-account, linked to a DUO proposal

     - sampleName  <-- contains measurement for a given sample
        - datasetfolder1 <-- name can be anything
          ... in here all the files, and only the files
          ... which make up a measurement
        - datasetfolder2 <-- name can be anything
          ... ditto
        - etc...
        - derived-dataset1 (optional, for online processed data
                            name should contain "derived")
          ... in here all the files and only the files
          ... which make up the derived data
        - derived-dataset2
          ... ditto

     - nextSampleName...

    e12375   <--- next user's group e-account

## Metadata Field Definitions<a id="sec-11-4" name="sec-11-4"></a>

The following tables define the mandatory and optional fields of the
administrative metadata that has to be provided (status June
2021). All fields marked "m" are mandatory, the rest are optional. Some
fields are filled automatically if possible, see comments. For the
most recent status see
<https://scicatproject.github.io/api-documentation/> and follow the link
called "Model" for the respective data model (e.g. Dataset), visible
e.g. inside the GET API call section. Alternatively, see the model definitions
in the SciCat backend, i.e. the JSON files in
<https://github.com/SciCatProject/catamel/tree/develop/common/models>

All "Date" fields must follow the date/time format defined in RFC
3339, section 5.6, see <https://www.ietf.org/rfc/rfc3339.txt>

||
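
As an illustration of the date format, the following sketch prints the current time as an RFC 3339 timestamp in UTC using the standard `date` command. The function name `rfc3339_now` is made up for this example. The fractional seconds that appear in some examples in this manual are optional in RFC 3339 and are omitted here.

```shell
# Sketch: print the current time as an RFC 3339 timestamp in UTC,
# suitable for "Date" fields such as creationTime.
rfc3339_now() {
    date -u +"%Y-%m-%dT%H:%M:%SZ"
}
```

The output has the form `YYYY-MM-DDThh:mm:ssZ` and can be written into the metadata.json file by a preparation script.
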
|
||
### Metadata field definitions for datasets of type "base"<a id="sec-11-4-1" name="sec-11-4-1"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">

<colgroup>
<col class="left" />

<col class="left" />

<col class="left" />

<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">field</th>
<th scope="col" class="left">type</th>
<th scope="col" class="left">must</th>
<th scope="col" class="left">comment</th>
</tr>
</thead>

<tbody>
<tr>
<td class="left">pid</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">filled by API automatically, do <b>not</b> provide this</td>
</tr>

<tr>
<td class="left">owner</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">filled by datasetIngestor if missing</td>
</tr>

<tr>
<td class="left">ownerEmail</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">filled by datasetIngestor if missing</td>
</tr>

<tr>
<td class="left">orcidOfOwner</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">contactEmail</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">filled by datasetIngestor if missing</td>
</tr>

<tr>
<td class="left">datasetName</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">set to "tail" of sourceFolder path if missing</td>
</tr>

<tr>
<td class="left">sourceFolder</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">size</td>
<td class="left">number</td>
<td class="left">&#xa0;</td>
<td class="left">autofilled when OrigDataBlock created</td>
</tr>

<tr>
<td class="left">packedSize</td>
<td class="left">number</td>
<td class="left">&#xa0;</td>
<td class="left">autofilled when DataBlock created</td>
</tr>

<tr>
<td class="left">creationTime</td>
<td class="left">date</td>
<td class="left">m</td>
<td class="left">filled by API if missing</td>
</tr>

<tr>
<td class="left">type</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">(raw, derived…)</td>
</tr>

<tr>
<td class="left">validationStatus</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">keywords</td>
<td class="left">Array[string]</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">description</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">classification</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">filled by API or datasetIngestor if missing</td>
</tr>

<tr>
<td class="left">license</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">filled by datasetIngestor if missing (CC BY-SA 4.0)</td>
</tr>

<tr>
<td class="left">version</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">autofilled by API</td>
</tr>

<tr>
<td class="left">doi</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">filled as part of publication workflow</td>
</tr>

<tr>
<td class="left">isPublished</td>
<td class="left">boolean</td>
<td class="left">&#xa0;</td>
<td class="left">filled by datasetIngestor if missing (false)</td>
</tr>

<tr>
<td class="left">ownerGroup</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">must be filled explicitly</td>
</tr>

<tr>
<td class="left">accessGroups</td>
<td class="left">Array[string]</td>
<td class="left">&#xa0;</td>
<td class="left">filled by datasetIngestor to beamline specific group</td>
</tr>

<tr>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
<td class="left">derived from creationLocation</td>
</tr>

<tr>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
<td class="left">e.g. /PSI/SLS/TOMCAT -> accessGroups=["slstomcat"]</td>
</tr>
</tbody>
</table>

### Additional fields for type="raw"<a id="sec-11-4-2" name="sec-11-4-2"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">

<colgroup>
<col class="left" />

<col class="left" />

<col class="left" />

<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">field</th>
<th scope="col" class="left">type</th>
<th scope="col" class="left">must</th>
<th scope="col" class="left">comment</th>
</tr>
</thead>

<tbody>
<tr>
<td class="left">principalInvestigator</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">filled by datasetIngestor if missing (proposal must exist)</td>
</tr>

<tr>
<td class="left">endTime</td>
<td class="left">date</td>
<td class="left">&#xa0;</td>
<td class="left">filled by datasetIngestor if missing</td>
</tr>

<tr>
<td class="left">creationLocation</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">see known Instrument list below</td>
</tr>

<tr>
<td class="left">dataFormat</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">scientificMetadata</td>
<td class="left">object</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">proposalId</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">filled by API automatically if missing</td>
</tr>
</tbody>
</table>

### Additional fields for type="derived"<a id="sec-11-4-3" name="sec-11-4-3"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">

<colgroup>
<col class="left" />

<col class="left" />

<col class="left" />

<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">field</th>
<th scope="col" class="left">type</th>
<th scope="col" class="left">must</th>
<th scope="col" class="left">comment</th>
</tr>
</thead>

<tbody>
<tr>
<td class="left">investigator</td>
<td class="left">string</td>
<td class="left">m</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">inputDatasets</td>
<td class="left">Array[string]</td>
<td class="left">m</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">usedSoftware</td>
<td class="left">Array[string]</td>
<td class="left">m</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">jobParameters</td>
<td class="left">object</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">jobLogData</td>
<td class="left">string</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>

<tr>
<td class="left">scientificMetadata</td>
<td class="left">object</td>
<td class="left">&#xa0;</td>
<td class="left">&#xa0;</td>
</tr>
</tbody>
</table>

## About Scientific Values and Units<a id="sec-11-5" name="sec-11-5"></a>

It is strongly recommended that physical quantities are stored in the
following format (the field names are just examples; the structure
with the two fields "value" and "unit" is what matters here):

    "scientificMetadata": {
        ...
        "beamlineParameters": {
            "Ring current": {
                "value": 402.246,
                "unit": "mA"
            },
            "Beam energy": {
                "value": 22595,
                "unit": "eV"
            }
        }
        ....
    }

In the future the data catalog will automatically add
two additional fields "valueSI" and "unitSI" with the corresponding
SI units for such quantities. The rationale for this is to support value queries in a
reliable manner across datasets with potentially different units
chosen for the same quantity:

    "scientificMetadata": {
        ...
        "beamlineParameters": {
            "Ring current": {
                "value": 402.246,
                "unit": "mA",
                "valueSI": 0.402246,
                "unitSI": "A"
            },
            "Beam energy": {
                "value": 22595,
                "unit": "eV",
                "valueSI": 3.6201179E-15,
                "unitSI": "J"
            }
        }
        ....
    }

||
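
The SI values in this example follow from plain unit conversions. As a worked illustration, the sketch below reproduces them with `awk`. The function names are made up for this example and do not describe how the data catalog performs the conversion; the factor 1 eV = 1.602176634e-19 J is the exact value from the 2019 SI redefinition.

```shell
# Sketch: convert the two example quantities to SI units.
milliampere_to_ampere() {
    # 1 mA = 0.001 A
    awk -v v="$1" 'BEGIN { printf "%.6g\n", v / 1000 }'
}

electronvolt_to_joule() {
    # 1 eV = 1.602176634e-19 J
    awk -v v="$1" 'BEGIN { printf "%.5g\n", v * 1.602176634e-19 }'
}
```

Calling `milliampere_to_ampere 402.246` gives 0.402246, and `electronvolt_to_joule 22595` gives 3.6201e-15, which agrees with the "valueSI" entries above to the printed precision.
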
|
||
## Use Case Examples<a id="sec-11-6" name="sec-11-6"></a>

### Use Case: Manual ingest using datasetIngestor program<a id="sec-11-6-1" name="sec-11-6-1"></a>

1.  Overview

    Data owners may want to define datasets in an ad hoc manner
    in order to allow subsequent archiving of the data. The
    most important use cases are

    - raw data from a beamline
    - derived data created by a scientist
    - archiving of historic data
    - archiving of data stored on local (decentral) file storage systems

    For this purpose a command line client **datasetIngestor** is provided,
    which allows you to

    - ingest the metadata and files
    - optionally copy the data to a central cache file server

    The necessary steps to use this tool are described below.

2.  Preparation of the metadata

    You need to create a file metadata.json defining at least the
    administrative metadata.

3.  Example of minimal json file for raw data:

        {
            "creationLocation": "/PSI/SLS/TOMCAT",
            "sourceFolder": "/scratch/devops",
            "type": "raw",
            "ownerGroup":"p16623"
        }

4.  Example for raw data including scientific metadata

        {
            "principalInvestigator": "egon.meier@psi.ch",
            "creationLocation": "/PSI/SLS/TOMCAT",
            "dataFormat": "Tomcat pre HDF5 format 2017",
            "sourceFolder": "/sls/X02DA/data/e12345/Data10/disk3/817b_B2_",
            "owner": "Egon Meier",
            "ownerEmail": "egon.meier@psi.ch",
            "type": "raw",
            "description": "Add a short description here for this dataset ...",
            "ownerGroup": "p12345",
            "scientificMetadata": {
                "beamlineParameters": {
                    "Monostripe": "Ru/C",
                    "Ring current": {
                        "value": 0.402246,
                        "unit": "A"
                    },
                    "Beam energy": {
                        "value": 22595,
                        "unit": "eV"
                    }
                },
                "detectorParameters": {
                    "Objective": 20,
                    "Scintillator": "LAG 20um",
                    "Exposure time": {
                        "value": 0.4,
                        "unit": "s"
                    }
                },
                "scanParameters": {
                    "Number of projections": 1801,
                    "Rot Y min position": {
                        "value": 0,
                        "unit": "deg"
                    },
                    "Inner scan flag": 0,
                    "File Prefix": "817b_B2_",
                    "Sample In": {
                        "value": 0,
                        "unit": "m"
                    },
                    "Number of darks": 10,
                    "Rot Y max position": {
                        "value": 180,
                        "unit": "deg"
                    },
                    "Angular step": {
                        "value": 0.1,
                        "unit": "deg"
                    },
                    "Number of flats": 120,
                    "Sample Out": {
                        "value": -0.005,
                        "unit": "m"
                    },
                    "Flat frequency": 0,
                    "Number of inter-flats": 0
                }
            }
        }

5.  Example of minimal json file for derived data:

        { "sourceFolder" : "/data/test/myExampleData",
          "type" : "derived",
          "ownerGroup": "p12345",
          "investigator":"federika.marone@psi.ch",
          "inputDatasets": ["/data/test/input1.dat",
                            "20.500.11935/000031f3-0675-4d30-b5ca-b9c674bcf027"],
          "usedSoftware": ["https://gitlab.psi.ch/MyAnalysisRepo/tomcatScripts/commit/60629a1cbef493a26aac626602ba8f1a6c9e14d2"]
        }

    - owner and contactEmail will be filled automatically
    - important: in case you ingest derived datasets with a **beamline
      account**, such as slstomcat (instead of a personal account), you **have** to add the beamline account
      to the accessGroups field like this:

          { "sourceFolder" : "/data/test/myExampleData",
            "type" : "derived",
            "ownerGroup": "p12345",
            "accessGroups":["slstomcat"],
            "investigator":"federika.marone@psi.ch",
            "inputDatasets": ["/data/test/input1.dat",
                              "20.500.11935/000031f3-0675-4d30-b5ca-b9c674bcf027"],
            "usedSoftware": ["https://gitlab.psi.ch/MyAnalysisRepo/tomcatScripts/commit/60629a1cbef493a26aac626602ba8f1a6c9e14d2"]
          }

    1.  Extended derived example

            {
                "sourceFolder": "/some/folder/containing/the/derived/data",
                "owner": "Thomas Meier",
                "ownerEmail": "thomas.meier@psi.ch",
                "contactEmail": "eugen.mueller@psi.ch",
                "type": "derived",
                "ownerGroup": "p13268",
                "creationTime": "2011-09-14T12:08:25.000Z",
                "investigator": "thomas.meier@psi.ch",
                "inputDatasets": [
                    "20.500.11935/000031f3-0675-4d30-b5ca-b9c674bcf027",
                    "20.500.11935/000031f3-0675-4d30-b5ca-b9c674bcf028"
                ],
                "usedSoftware": ["https://gitlab.psi.ch/MyAnalysisRepo/tomcatScripts/commit/60629a1cbef493a26aac626602ba8f1a6c9e14d2"]
            }

6.  Optionally: preparation of a file listing file

    **Please note**: The following is only needed if you do not want to
    store all files in a sourceFolder, but just a **subset**. In this case
    you can specify an explicit list of files and directories. Only the
    files specified in this list will be stored as part of the
    dataset. Directories in this list are
    descended recursively and all data contained in them is included.
    Here is an example of a filelisting.txt file. All entries in this
    text file are path names **relative** to the sourceFolder specified in
    the metadata.json file.

    Example of filelisting.txt

        datafile1
        datafile2
        specialStuff/logfile1.log
        allFilesInThisDirectory

7.  Optionally: for multiple datasets to be created

    If you have many sourceFolders containing data, each to be turned into
    a dataset, then the easiest method is to define a 'folderlisting.txt'
    file (the file must have exactly this name). This is a useful option
    to archive large amounts of "historic" data.

    Each line in this file is the absolute path to a sourceFolder. In
    this case it is assumed that the metadata.json file is valid for all
    datasets and that **all** files inside the sourceFolder are part of the
    dataset (i.e. you can **not** combine the filelisting.txt option with the
    folderlisting.txt option).

    Example of folderlisting.txt

        /some/folder/containing/the/data/raw/sample1
        /some/folder/containing/the/data/raw/sample2
        /some/folder/containing/the/data/derived

8.  Starting the ingest

    Run the following command in a terminal as a first test to check that
    everything is okay. This is a so-called "dry run": nothing will
    actually be stored, but the consistency of the data will be checked
    and the folders will be scanned for files.

        datasetIngestor metadata.json [filelisting.txt | 'folderlisting.txt']

    You will be prompted for your username and password.

    If everything looks as expected, repeat the command with
    the `--ingest` flag to actually store the dataset(s) in the data
    catalog:

        datasetIngestor --ingest metadata.json [filelisting.txt | 'folderlisting.txt']

    When the job is finished all needed metadata will be ingested into the
    data catalog (and for decentral data the data will be copied to the
    central cache file server).

    In addition you have the option to directly trigger the archiving of
    the data to tape by adding the `--autoarchive` flag. Do this only if you
    are sure that this data is worth archiving.

||
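
The two-step procedure above (dry run first, real ingest second) can be wrapped in a small helper. This is only a sketch under one important assumption that this manual does not confirm: that `datasetIngestor` exits with a non-zero status when the dry run finds a problem. The function name `ingest_after_dry_run` is made up for this illustration, and the program may still ask its interactive questions in both steps.

```shell
# Sketch: run the dry run first and start the real ingest only if it succeeded.
# Assumption: datasetIngestor exits with a non-zero status when the dry run fails.
ingest_after_dry_run() {
    if datasetIngestor "$@"; then
        datasetIngestor --ingest "$@"
    else
        echo "dry run failed, nothing was ingested" >&2
        return 1
    fi
}
```

It would be called with the same arguments as the dry run, for example `ingest_after_dry_run metadata.json folderlisting.txt`.
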
|
||
### Use Case: Automated ingest of raw datasets from beamline or instruments<a id="sec-11-6-2" name="sec-11-6-2"></a>

1.  Using the datasetIngestor Tool

    This method usually requires a fully automatic ingestion procedure,
    since data is produced at regular times and in a predictable way.

    For each beamline this automation is set up together with the experts
    from the data catalog group and potentially with help from the
    controls/detector-integration groups. Please contact
    scicatarchivemanager@psi.ch to get in touch.

    The recommended method is to define preparation scripts, which
    automatically produce the files metadata.json and optionally
    filelisting.txt or folderlisting.txt (for multiple datasets) as you
    would do in the manual case described in the previous section.
    Examples of such scripts can be provided by the data catalog team;
    please contact scicatingestor@psi.ch for further help. The effort to
    implement such a system depends very much on the availability of the
    metadata as well as on the effort to convert the existing metadata to
    the data catalog format inside the converter processes. If the
    metadata is already available in some form in a file, the
    work needed per instrument is estimated at roughly 1-2 person-weeks,
    including test runs etc. But the effort may also be considerably
    smaller or larger in some cases.

    Then you run the datasetIngestor program, usually under a
    beamline-specific account. In order to run fully automatically, all potential
    questions asked interactively by the program must be pre-answered
    through a set of command line options:

        datasetIngestor [options] metadata-file [filelisting-file|'folderlisting.txt']

          -allowexistingsource
                Defines if existing sourceFolders can be reused
          -autoarchive
                Option to create archive job automatically after ingestion
          -copy
                Defines if files should be copied from your local system to a central server before ingest.
          -devenv
                Use development environment instead of production environment (developers only)
          -ingest
                Defines if this command is meant to actually ingest data
          -linkfiles string
                Define what to do with symbolic links: (keep|delete|keepInternalOnly) (default "keepInternalOnly")
          -noninteractive
                If set no questions will be asked and the default settings for all undefined flags will be assumed
          -tapecopies int
                Number of tapecopies to be used for archiving (default 1)
          -testenv
                Use test environment (qa) instead of production environment
          -user string
                Defines optional username:password string

    - Here is a typical example, using the MX beamline at SLS
      and ingesting a single dataset with metadata defined in
      metadata.json:

          datasetIngestor -ingest \
            -linkfiles keepInternalOnly \
            -allowexistingsource \
            -user slsmx:XXXXXXXX \
            -noninteractive \
            metadata.json

    This command must be called by the respective data acquisition system
    at a proper time, i.e. after all the files from the measurement run
    have been written to disk and all metadata has become available (often
    this metadata is collected by the controls system).

2.  HDF5 Files

    If the raw data exists in the form of HDF5 files, there is a good chance
    that the metadata can be extracted from the HDF5 files' metadata. In
    such a case the metadata extraction must be done as part of the
    beamline preparation scripts. Examples of such HDF5 extraction scripts
    exist and can be the basis of a beamline-specific solution; again,
    please contact scicatingestor@psi.ch. These scripts will mostly need
    minimal adjustments for each beamline, mainly specifying the filter
    conditions defining which of the metadata in the HDF5 file are to be
    considered metadata for the data catalog.

    Very often the whole dataset will consist of only one HDF5 file, which
    also simplifies the filelisting definition.

||
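
As an illustration of what a small part of such a preparation script might look like, the following sketch writes a folderlisting.txt containing the absolute path of every direct subfolder of a base directory. It is not one of the scripts provided by the data catalog team; the function name `make_folderlisting` is made up for this example, and it assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU and BSD versions do).

```shell
# Sketch: write one absolute path per line for every direct subfolder of DIR.
# The output file is written to the current directory and must be called
# folderlisting.txt, as required by datasetIngestor.
make_folderlisting() {
    # Resolve the argument to an absolute path; give up if it does not exist
    base=$(cd "$1" && pwd) || return 1
    find "$base" -mindepth 1 -maxdepth 1 -type d | sort > folderlisting.txt
}
```

After `make_folderlisting /path/to/raw`, the resulting file can be passed to datasetIngestor together with a metadata.json that is valid for all listed folders.
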
|
||
### Use Case: Ingest datasets stored on decentral systems<a id="sec-11-6-3" name="sec-11-6-3"></a>

These are data that you want to have archived for some reason, but that are
not available on central file systems. Data from the old PSI archive
system fall into this category, as do data from local PCs, laptops or
instruments. If this data is not assigned to a p-group (given via the
DUO digital user office, usually linked to a proposal) then you must
assign this data to an a-group. The allocation of an "a-group" for
this kind of data must be done beforehand by a tool currently in
preparation at AIT. The "a-group" defines the ownership of, and
therefore the access to, the data by listing a number of users inside the
group.

Otherwise just follow the description in the section "Manual ingest
using datasetIngestor program" and use the option -copy, e.g.

    datasetIngestor -autoarchive -copy -ingest metadata.json

This command copies the data to a central rsync server, from where
the archive system can then copy the files to tape. In this case
(option -autoarchive) the copy to archive tapes happens automatically.

Recent versions of the datasetIngestor program detect
automatically whether your data lies on central or decentral systems. In
the latter case the program will, after confirmation by the user, copy the
data automatically to the rsync cache server, even if the copy flag is
not provided.

### Use Case: Ingest datasets from simulations/model calculations<a id="sec-11-6-4" name="sec-11-6-4"></a>

These can be treated like datasets of type "base" or "raw". In the
latter case specify the field "creationLocation" as the name of the
server or cluster which produced the simulation files. Otherwise the
procedure is identical to the previous use case.

## Policy settings and email notifications<a id="sec-11-7" name="sec-11-7"></a>
|
||
|
||
The archiving process can further be configured via **policy**
|
||
parameters, e.g. if you require a second tape copy for very
|
||
precious data. Also the details about the notification settings by
|
||
email for both archive and retrieve jobs can be set here. You reach
|
||
the menu to set the policy values via the submenu `Policies`
|
||
in the dropdown menu to the top right of the GUI.
|
||
|
||
Emails are automatically sent at the start of every archive and
|
||
retrieve jobs as well as when the job finishes. The email is sent to
|
||
the person creating the jobs. In addition it is sent the list of
|
||
emails defined in the policy settings. Per default this list is empty
|
||
but can be extended by you. In the policy one can also switch off the
|
||
email notification. However emails about error conditions (which can
|
||
be either user caused or system caused) can not be switched off. Such
|
||
error messages are always sent to the user as well as the archive
|
||
administrators.
|
||
|
||
For user caused errors the user has to take action to repair the
|
||
situation. Typically error cases are, that the user has moved or
|
||
removed part or all of the files before archiving them. System errors
|
||
on the other hand have their reason inside the catalog and archive
|
||
system (e.g. a network connection problem or similar) and will be
|
||
taken care of by the archive managers. In such a case the user
|
||
creating the job will be informed manually, when the problem is fixed
|
||
again.
|
||
|
||
Policy parameters can be defined at site level or at ownerGroup
|
||
level. For each ownerGroup at least one manager must be defined
|
||
(e.g. a principal investigator (PI) via the linked proposal
|
||
information) in the policy model (field "manager") . Only the manager
|
||
can change the policy settings at ownerGroup level, but all group
|
||
mebers can see them.
|
||
|
||
Changes to this policy settings only effect future dataset creation
|
||
and archiving

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Parameter</th>
<th scope="col" class="left">Allowed Values</th>
<th scope="col" class="left">Default</th>
<th scope="col" class="left">Level</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">policyPublicationShiftInYears</td>
<td class="left">small positive integer, e.g. 3</td>
<td class="left">3</td>
<td class="left">Site (ro)</td>
</tr>
<tr>
<td class="left">policyRetentionShiftInYears</td>
<td class="left">small positive integer, e.g. 10</td>
<td class="left">10</td>
<td class="left">Site (ro)</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="left">autoArchive</td>
<td class="left">true/false</td>
<td class="left">false</td>
<td class="left">ownerGroup</td>
</tr>
<tr>
<td class="left">tapeRedundancy</td>
<td class="left">low/medium/(high)</td>
<td class="left">low</td>
<td class="left">ownerGroup</td>
</tr>
<tr>
<td class="left">archiveEmailNotification</td>
<td class="left">true/false</td>
<td class="left">false</td>
<td class="left">ownerGroup</td>
</tr>
<tr>
<td class="left">archiveEmailsToBeNotified</td>
<td class="left">Array of additional emails</td>
<td class="left">[]</td>
<td class="left">ownerGroup</td>
</tr>
<tr>
<td class="left">retrieveEmailNotification</td>
<td class="left">true/false</td>
<td class="left">false</td>
<td class="left">ownerGroup</td>
</tr>
<tr>
<td class="left">retrieveEmailsToBeNotified</td>
<td class="left">Array of additional emails</td>
<td class="left">[]</td>
<td class="left">ownerGroup</td>
</tr>
<tr>
<td class="left">(archiveDelayInDays)</td>
<td class="left">small positive integer, e.g. 7</td>
<td class="left">0</td>
<td class="left">ownerGroup</td>
</tr>
</tbody>
</table>

The job initiator always gets an email unless email notification is disabled.
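
The notification rules described in this section can be summarised in a few lines of Python. This is only an illustrative sketch of the rules as stated in this manual, not the code used by the catalog: the function name `notification_recipients` and its arguments are hypothetical, and only the policy parameter names are taken from the table above.

```python
def notification_recipients(policy, job_type, initiator, is_error=False, admins=()):
    """Return the addresses to notify for one job event (sketch only).

    policy    -- dict using the parameter names from the policy table
    job_type  -- "archive" or "retrieve"
    initiator -- email address of the person who created the job
    admins    -- addresses of the archive administrators (assumed input)
    """
    if job_type not in ("archive", "retrieve"):
        raise ValueError(f"unknown job type: {job_type}")

    if is_error:
        # Error emails cannot be switched off: they always go to the
        # user and to the archive administrators.
        return [initiator, *admins]

    # Regular start/finish emails are only sent if notification is enabled.
    if not policy.get(f"{job_type}EmailNotification", False):
        return []

    # The initiator plus the additional addresses from the policy,
    # in order and without duplicates.
    extra = policy.get(f"{job_type}EmailsToBeNotified", [])
    return list(dict.fromkeys([initiator, *extra]))
```

For example, with `archiveEmailNotification` set to true and one additional address in `archiveEmailsToBeNotified`, an archive job notifies the initiator and that additional address.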

## Analyzing Metadata Statistics<a id="sec-11-8" name="sec-11-8"></a>

Note: This service is currently (summer 2021) out of order due to the
missing JupyterHub environment.

### Overview<a id="sec-11-8-1" name="sec-11-8-1"></a>

It is possible to analyze the information about datasets, jobs etc.,
e.g. for statistical purposes. A JupyterHub-based solution was chosen
as the tool for doing this analysis in a flexible and
interactive manner. This means you can use Jupyter notebooks to query
the data catalog via the API and analyze the results in
terms of tables and graphs. Example notebooks are provided.

### Getting started<a id="sec-11-8-2" name="sec-11-8-2"></a>

Follow this link and log in with your PSI account:
<https://jupyterhub.apps.ocp4a.psi.ch/>. The initial start of the
Jupyter environment takes some time (about 40 seconds), but
subsequent starts are much faster. You will then see a "bootstrap"
notebook which you can execute to populate your Jupyter home directory
with the example notebooks.

The example notebooks require you to log in to the data catalog API
server. Here you can again use your personal account, which gives you
access to all data for which you have read access (i.e. for which you
are a member of the associated p-group). Beamline managers can also use
the beamline accounts here in order to get the statistics relevant for
the whole beamline. You can then open the example notebooks,
e.g. datasetAnalyzer.ipynb, run them and look at the resulting tables and
graphs. Afterwards you can optionally adapt the notebooks to your
needs.

Please note that this service is currently only available as a pilot
with **no guaranteed availability**. This also means that you should
make **regular backups of your own notebooks** which you may develop
using this tool. For this you can e.g. simply download the notebook
and copy it to a place for which a backup exists, like your home
directory.

## Access to the API (for script developers)<a id="sec-11-9" name="sec-11-9"></a>

The data catalog can also be accessed directly via a REST API. An
API "Explorer" allows you to test such API calls
conveniently. The explorer can be found at
<https://dacat-qa.psi.ch/explorer>. The explorer works with a test
database which is separate from the production database and contains
other data.

For most of the API calls you will need an access token first. You
create such an access token by logging in to the data catalog via one of the
following curl commands:

    # for "functional" accounts
    curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/api/v3/Users/login'

    # for normal user accounts
    curl -X POST --header 'Content-Type: application/json' -d '{"username":"YOUR-LOGIN","password":"YOUR-PASSWORD"}' 'https://dacat-qa.psi.ch/auth/msad'

    # reply if successful:
    {"id":"NQhe3...","ttl":1209600,"created":"2019-01-22T07:03:21.422Z","userId":"5a745bde4d12b30008020843"}

The "id" field contains the access token, which you copy into the corresponding field at the top of the explorer page.

Afterwards you can test the full API. Once you have found the right API call,
you can apply it to the production system by replacing
"dacat-qa" with "dacat" and retrieving a new access token from the
production system.
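
For script developers who prefer Python over curl, the same login request can be sketched with the standard library only. The two login paths and the JSON body are taken from the curl commands above; the function names (`login_url`, `extract_token`, `login`) are hypothetical. The sketch also assumes that the reply has the form of the example reply shown above, with the token in the "id" field; if the reply of an endpoint looks different, `extract_token` has to be adapted.

```python
import json
import urllib.request

BASE_URL = "https://dacat-qa.psi.ch"  # test system; replace "dacat-qa" with "dacat" for production


def login_url(functional_account, base_url=BASE_URL):
    """Return the login URL for functional or normal user accounts."""
    path = "/api/v3/Users/login" if functional_account else "/auth/msad"
    return base_url + path


def extract_token(reply_text):
    """Extract the access token from a reply of the form shown in this manual."""
    reply = json.loads(reply_text)
    return reply["id"]


def login(username, password, functional_account=False, base_url=BASE_URL):
    """Send the login request and return the access token."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    request = urllib.request.Request(
        login_url(functional_account, base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_token(response.read().decode("utf-8"))
```

A call such as `login("YOUR-LOGIN", "YOUR-PASSWORD")` then returns the token that is otherwise copied by hand from the curl output.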

## Using datasetIngestor inside wrapper scripts (for developers)<a id="sec-11-10" name="sec-11-10"></a>

The command datasetIngestor returns with a return code of zero if
the command was executed successfully. If the program
fails for some reason, the return code is one. Typical examples of
failures are files that cannot be found or accessed. Another
possibility is that the catalog system is not available,
e.g. during scheduled maintenance periods. All output describing the
reason for the failure is written to STDERR. Please have a look at
this output to understand the reason for the failure. If
you need help, please contact scicatingestor@psi.ch.

Please note: it is the task of the wrapper script to test
the return code and to repeat the command once all conditions for
a successful execution are fulfilled.

If the ingest finishes successfully, the persistent
identifiers (PIDs) of the resulting dataset(s) are written to STDOUT,
one line per dataset.
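
A wrapper along these lines could look as follows in Python. This is a minimal sketch under several assumptions: the function name `run_ingest` and its parameters are hypothetical, STDOUT is assumed to contain nothing but the PID lines, and the simple "wait and retry" loop stands in for whatever check a real wrapper needs in order to decide that the conditions for a successful execution are fulfilled again.

```python
import subprocess
import sys
import time


def run_ingest(command, max_attempts=3, wait_seconds=60):
    """Run an ingest command and return the dataset PIDs printed on STDOUT.

    command is the full argument list, e.g.
    ["datasetIngestor", "-autoarchive", "-copy", "-ingest", "metadata.json"].
    The command is repeated after a non-zero return code, up to max_attempts times.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            # One PID per line; ignore empty lines.
            return [line.strip() for line in result.stdout.splitlines() if line.strip()]

        # The reason for the failure is reported on STDERR; pass it on.
        print(f"attempt {attempt} failed with return code {result.returncode}",
              file=sys.stderr)
        print(result.stderr, file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(wait_seconds)

    raise RuntimeError(f"command failed after {max_attempts} attempts")
```

The returned list can be stored by the calling script, for example to start archive or retrieve jobs for these datasets later.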

## Ingestion of datasets which should never be published<a id="sec-11-11" name="sec-11-11"></a>

For datasets which should never be published you should add the
following fields to your metadata.json file at ingest time:

    "datasetlifecycle": {
        "publishable": false,
        "dateOfPublishing": "2099-12-31T00:00:00.000Z",
        "archiveRetentionTime": "2099-12-31T00:00:00.000Z"
    }

This moves the time of publication to a date in the far future
(the end of 2099 in this case).
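
If metadata files are generated by a script, the fields can be added programmatically. The following Python sketch only illustrates this; the function names are hypothetical, and the field values are the ones given in this section.

```python
import json

# Values taken from this section of the manual.
NEVER_PUBLISH = {
    "publishable": False,
    "dateOfPublishing": "2099-12-31T00:00:00.000Z",
    "archiveRetentionTime": "2099-12-31T00:00:00.000Z",
}


def mark_never_published(metadata):
    """Return a copy of the metadata dict with the datasetlifecycle fields set."""
    updated = dict(metadata)
    # Keep any datasetlifecycle fields that are already present.
    lifecycle = dict(updated.get("datasetlifecycle", {}))
    lifecycle.update(NEVER_PUBLISH)
    updated["datasetlifecycle"] = lifecycle
    return updated


def update_metadata_file(path="metadata.json"):
    """Read a metadata file, add the fields and write it back."""
    with open(path) as infile:
        metadata = json.load(infile)
    with open(path, "w") as outfile:
        json.dump(mark_never_published(metadata), outfile, indent=2)
```

The input dictionary is not modified, so the function can be applied safely to metadata that is reused for other datasets.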

## Retrieving proposal information<a id="sec-11-12" name="sec-11-12"></a>

In case you need information about the principal investigator, you can
use the command datasetGetProposal, which returns the proposal
information for a given ownerGroup. You can download it as follows:

    /usr/bin/curl -O https://gitlab.psi.ch/scicat/tools/raw/master/linux/datasetGetProposal;chmod +x ./datasetGetProposal

## Link to Group specific descriptions<a id="sec-11-13" name="sec-11-13"></a>

- BIO department: <https://intranet.psi.ch/BIO/ComputingDataCatalog>

## List of known creationLocation for raw data<a id="sec-11-14" name="sec-11-14"></a>

The following values for the creationLocation should be used for the
respective beamlines. They are derived from the identifiers used
inside the digital user office DUO.

### SLS<a id="sec-11-14-1" name="sec-11-14-1"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Beamline</th>
<th scope="col" class="left">creationLocation</th>
<th scope="col" class="left">Ingest Account</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Adress-RIXS</td>
<td class="left">/PSI/SLS/ADRESS-RIXS</td>
<td class="left">slsadress-rixs</td>
</tr>
<tr>
<td class="left">Adress-SX-ARPES</td>
<td class="left">/PSI/SLS/ADRESS-SX-ARPES</td>
<td class="left">slsadress-sx-arpes</td>
</tr>
<tr>
<td class="left">cSAXS</td>
<td class="left">/PSI/SLS/CSAXS</td>
<td class="left">slscsaxs</td>
</tr>
<tr>
<td class="left">Micro-XAS</td>
<td class="left">/PSI/SLS/MICRO-XAS</td>
<td class="left">slsmicro-xas</td>
</tr>
<tr>
<td class="left">Micro-XAS-Femto</td>
<td class="left">/PSI/SLS/MICRO-XAS-FEMTO</td>
<td class="left">slsmicro-xas-femto</td>
</tr>
<tr>
<td class="left">MS-Powder</td>
<td class="left">/PSI/SLS/MS-POWDER</td>
<td class="left">slsms-powder</td>
</tr>
<tr>
<td class="left">MS-Surf-Diffr</td>
<td class="left">/PSI/SLS/MS-SURF-DIFFR</td>
<td class="left">slsms-surf-diffr</td>
</tr>
<tr>
<td class="left">Nano-XAS</td>
<td class="left">/PSI/SLS/NANOXAS</td>
<td class="left">slsnanoxas</td>
</tr>
<tr>
<td class="left">Pearl</td>
<td class="left">/PSI/SLS/PEARL</td>
<td class="left">slspearl</td>
</tr>
<tr>
<td class="left">Phoenix</td>
<td class="left">/PSI/SLS/PHOENIX</td>
<td class="left">slsphoenix</td>
</tr>
<tr>
<td class="left">Pollux</td>
<td class="left">/PSI/SLS/POLLUX</td>
<td class="left">slspollux</td>
</tr>
<tr>
<td class="left">MX (PX,PXII,PXIII)</td>
<td class="left">/PSI/SLS/MX</td>
<td class="left">slsmx</td>
</tr>
<tr>
<td class="left">SIM</td>
<td class="left">/PSI/SLS/SIM</td>
<td class="left">slssim</td>
</tr>
<tr>
<td class="left">Sis-Cophee</td>
<td class="left">/PSI/SLS/SIS-COPHEE</td>
<td class="left">slssis-cophee</td>
</tr>
<tr>
<td class="left">Sis-Hrpes</td>
<td class="left">/PSI/SLS/SIS-HRPES</td>
<td class="left">slssis-hrpes</td>
</tr>
<tr>
<td class="left">Super-XAS</td>
<td class="left">/PSI/SLS/SUPER-XAS</td>
<td class="left">slssuper-xas</td>
</tr>
<tr>
<td class="left">Tomcat</td>
<td class="left">/PSI/SLS/TOMCAT</td>
<td class="left">slstomcat</td>
</tr>
<tr>
<td class="left">VUV</td>
<td class="left">/PSI/SLS/VUV</td>
<td class="left">slsvuv</td>
</tr>
<tr>
<td class="left">XIL-II</td>
<td class="left">/PSI/SLS/XIL-II</td>
<td class="left">slsxil-ii</td>
</tr>
<tr>
<td class="left">Xtreme</td>
<td class="left">/PSI/SLS/XTREME</td>
<td class="left">slsxtreme</td>
</tr>
</tbody>
</table>

The connected email distribution lists are {ingestAccount}@psi.ch

### SwissFEL<a id="sec-11-14-2" name="sec-11-14-2"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Beamline</th>
<th scope="col" class="left">creationLocation</th>
<th scope="col" class="left">Ingest Account</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Alvra</td>
<td class="left">/PSI/SWISSFEL/ARAMIS-ALVRA</td>
<td class="left">swissfelaramis-alvra</td>
</tr>
<tr>
<td class="left">Bernina</td>
<td class="left">/PSI/SWISSFEL/ARAMIS-BERNINA</td>
<td class="left">swissfelaramis-bernina</td>
</tr>
<tr>
<td class="left">Cristallina</td>
<td class="left">/PSI/SWISSFEL/ARAMIS-CRISTALLINA</td>
<td class="left">swissfelaramis-cristallina</td>
</tr>
<tr>
<td class="left">Furka</td>
<td class="left">/PSI/SWISSFEL/ATHOS-FURKA</td>
<td class="left">swissfelathos-furka</td>
</tr>
<tr>
<td class="left">Maloja</td>
<td class="left">/PSI/SWISSFEL/ATHOS-MALOJA</td>
<td class="left">swissfelathos-maloja</td>
</tr>
</tbody>
</table>

The connected email distribution lists are {ingestAccount}@psi.ch

### SINQ<a id="sec-11-14-3" name="sec-11-14-3"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Instrument</th>
<th scope="col" class="left">creationLocation</th>
<th scope="col" class="left">Ingest Account</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">AMOR</td>
<td class="left">/PSI/SINQ/AMOR</td>
<td class="left">sinqamor</td>
</tr>
<tr>
<td class="left">DMC</td>
<td class="left">/PSI/SINQ/DMC</td>
<td class="left">sinqdmc</td>
</tr>
<tr>
<td class="left">EIGER</td>
<td class="left">/PSI/SINQ/EIGER</td>
<td class="left">sinqeiger</td>
</tr>
<tr>
<td class="left">FOCUS</td>
<td class="left">/PSI/SINQ/FOCUS</td>
<td class="left">sinqfocus</td>
</tr>
<tr>
<td class="left">HRPT</td>
<td class="left">/PSI/SINQ/HRPT</td>
<td class="left">sinqhrpt</td>
</tr>
<tr>
<td class="left">ICON</td>
<td class="left">/PSI/SINQ/ICON</td>
<td class="left">sinqicon</td>
</tr>
<tr>
<td class="left">Morpheus</td>
<td class="left">/PSI/SINQ/MORPHEUS</td>
<td class="left">sinqmorpheus</td>
</tr>
<tr>
<td class="left">NARZISS</td>
<td class="left">/PSI/SINQ/NARZISS</td>
<td class="left">sinqnarziss</td>
</tr>
<tr>
<td class="left">NEUTRA</td>
<td class="left">/PSI/SINQ/NEUTRA</td>
<td class="left">sinqneutra</td>
</tr>
<tr>
<td class="left">POLDI</td>
<td class="left">/PSI/SINQ/POLDI</td>
<td class="left">sinqpoldi</td>
</tr>
<tr>
<td class="left">RITA-II</td>
<td class="left">/PSI/SINQ/RITA-II</td>
<td class="left">sinqrita-ii</td>
</tr>
<tr>
<td class="left">SANS-I</td>
<td class="left">/PSI/SINQ/SANS-I</td>
<td class="left">sinqsans-i</td>
</tr>
<tr>
<td class="left">SANS-II</td>
<td class="left">/PSI/SINQ/SANS-II</td>
<td class="left">sinqsans-ii</td>
</tr>
<tr>
<td class="left">TASP</td>
<td class="left">/PSI/SINQ/TASP</td>
<td class="left">sinqtasp</td>
</tr>
<tr>
<td class="left">ZEBRA</td>
<td class="left">/PSI/SINQ/ZEBRA</td>
<td class="left">sinqzebra</td>
</tr>
</tbody>
</table>

The connected email distribution lists are {ingestAccount}@psi.ch

### SmuS<a id="sec-11-14-4" name="sec-11-14-4"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Instrument</th>
<th scope="col" class="left">creationLocation</th>
<th scope="col" class="left">Ingest Account</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Dolly</td>
<td class="left">/PSI/SMUS/DOLLY</td>
<td class="left">smusdolly</td>
</tr>
<tr>
<td class="left">GPD</td>
<td class="left">/PSI/SMUS/GPD</td>
<td class="left">smusgpd</td>
</tr>
<tr>
<td class="left">GPS</td>
<td class="left">/PSI/SMUS/GPS</td>
<td class="left">smusgps</td>
</tr>
<tr>
<td class="left">HAL-9500</td>
<td class="left">/PSI/SMUS/HAL-9500</td>
<td class="left">smushal-9500</td>
</tr>
<tr>
<td class="left">LEM</td>
<td class="left">/PSI/SMUS/LEM</td>
<td class="left">smuslem</td>
</tr>
<tr>
<td class="left">FLAME</td>
<td class="left">/PSI/SMUS/FLAME</td>
<td class="left">smusflame</td>
</tr>
</tbody>
</table>

The connected email distribution lists are {ingestAccount}@psi.ch
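
In all tables of this chapter the ingest account appears to follow from the creationLocation by dropping the leading "/PSI/", joining facility and instrument name, and converting the result to lower case. The following Python sketch encodes this observed pattern; it is not a documented rule, the function names are hypothetical, and the tables remain the authoritative source.

```python
def ingest_account(creation_location):
    """Derive the ingest account from a creationLocation such as "/PSI/SLS/TOMCAT"."""
    parts = creation_location.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "PSI":
        raise ValueError(f"unexpected creationLocation: {creation_location}")
    facility, instrument = parts[1], parts[2]
    # e.g. "SLS" + "TOMCAT" becomes "slstomcat"
    return (facility + instrument).lower()


def distribution_list(creation_location):
    """Return the connected email distribution list, {ingestAccount}@psi.ch."""
    return ingest_account(creation_location) + "@psi.ch"
```

For instance, `distribution_list("/PSI/SINQ/RITA-II")` yields `sinqrita-ii@psi.ch`, which matches the corresponding table row.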

# Update History of Ingest Manual<a id="sec-12" name="sec-12"></a>

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Date</th>
<th scope="col" class="left">Updates</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">10. September 2018</td>
<td class="left">Initial Release</td>
</tr>
<tr>
<td class="left">6. October 2018</td>
<td class="left">Added warning section to not modify data after ingest</td>
</tr>
<tr>
<td class="left">10. October 2018</td>
<td class="left">ownerGroup field must be defined explicitly</td>
</tr>
<tr>
<td class="left">28. October 2018</td>
<td class="left">Added section on datasetRetriever tool</td>
</tr>
<tr>
<td class="left">20. November 2018</td>
<td class="left">Remove ssh key handling description (use Kerberos)</td>
</tr>
<tr>
<td class="left">3. December 2018</td>
<td class="left">Restructure archive step, add autoarchive flag</td>
</tr>
<tr>
<td class="left">17. January 2019</td>
<td class="left">Update on automatically filled values, more options for datasetIngestor</td>
</tr>
<tr>
<td class="left">22. January 2019</td>
<td class="left">Added description for API access for script developers, 2 new commands: datasetArchiver and datasetGetProposal</td>
</tr>
<tr>
<td class="left">22. February 2019</td>
<td class="left">Added known beamlines/instruments (creationLocation) value list</td>
</tr>
<tr>
<td class="left">24. February 2019</td>
<td class="left">datasetIngestor use cases for automated ingests using beamline accounts</td>
</tr>
<tr>
<td class="left">23. April 2019</td>
<td class="left">Added AFS infos and available central storage, need for Kerberos tickets</td>
</tr>
<tr>
<td class="left">23. April 2019</td>
<td class="left">Availability of commands on RA cluster via pmodules</td>
</tr>
<tr>
<td class="left">3. May 2019</td>
<td class="left">Added size limitation infos</td>
</tr>
<tr>
<td class="left">9. May 2019</td>
<td class="left">Added hints for accessGroups definition for derived data; added infos about email notifications</td>
</tr>
<tr>
<td class="left">10. May 2019</td>
<td class="left">Added ownerGroup filtered retrieve option, decentral case auto detect</td>
</tr>
<tr>
<td class="left">7. June 2019</td>
<td class="left">Feedback from Manuel added</td>
</tr>
<tr>
<td class="left">21. Oct 2019</td>
<td class="left">New version of CLI tools to deal with edge cases (blanks in sourcefolder, dangling links, ingest for other person, need for kerberos ticket as user)</td>
</tr>
<tr>
<td class="left">14. November 2019</td>
<td class="left">Restructuring of manual, new CLI tools, auto kinit login, progress indicators, chksum test updated</td>
</tr>
<tr>
<td class="left">20. January 2020</td>
<td class="left">Auto fill principalInvestigator if missing</td>
</tr>
<tr>
<td class="left">3. March 2020</td>
<td class="left">Added Jupyter notebook analysis section</td>
</tr>
<tr>
<td class="left">5. March 2020</td>
<td class="left">Add hint for datasets not to be published</td>
</tr>
<tr>
<td class="left">19. March 2020</td>
<td class="left">Added hint that analysis Jupyter tool is in pilot phase only</td>
</tr>
<tr>
<td class="left">19. March 2020</td>
<td class="left">Added recommendation concerning unit handling for physical quantities</td>
</tr>
<tr>
<td class="left">9. July 2020</td>
<td class="left">Added GUI tool SciCatArchiver (developer: Klaus Wakonig)</td>
</tr>
<tr>
<td class="left">11. July 2020</td>
<td class="left">Installation of SciCatArchiver on non-Ra system</td>
</tr>
<tr>
<td class="left">14. July 2020</td>
<td class="left">Added publication workflow and recommended file structure chapter</td>
</tr>
<tr>
<td class="left">16. July 2020</td>
<td class="left">Updated SciCat GUI deployment information</td>
</tr>
<tr>
<td class="left">31. July 2020</td>
<td class="left">New deploy location, + policy parameters, new recommended file structure</td>
</tr>
<tr>
<td class="left">27. August 2020</td>
<td class="left">Added Windows Support information</td>
</tr>
<tr>
<td class="left">10. Sept 2020</td>
<td class="left">Corrected example JSON syntax in one location</td>
</tr>
<tr>
<td class="left">23. November 2020</td>
<td class="left">Corrected instructions for using the SciCat GUI on Windows 10</td>
</tr>
<tr>
<td class="left">19. February 2020</td>
<td class="left">Added info about proposalId link</td>
</tr>
<tr>
<td class="left">24. June 2021</td>
<td class="left">Major restructuring of full document for easier readability</td>
</tr>
<tr>
<td class="left">9. Dec 2021</td>
<td class="left">Corrected spelling of value/units convention</td>
</tr>
<tr>
<td class="left">23. April 2022</td>
<td class="left">Added hint to use -token option for CLI and SciCat GUI as normal user</td>
</tr>
<tr>
<td class="left">2. Dec 2022</td>
<td class="left">Extended ingest use cases description of needed parameters Win+Linux</td>
</tr>
<tr>
<td class="left">21. Dec 2023</td>
<td class="left">Include redundancy risks and costs and file names limitations</td>
</tr>
</tbody>
</table>