[Show Example]: Sample ingestion output (datasetIngestor 1.1.11)
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
2019/11/06 11:04:43 Latest version: 1.1.11
2019/11/06 11:04:43 Your version of this program is up-to-date
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
2019/11/06 11:04:43 Your username:
user_n
2019/11/06 11:04:48 Your password:
2019/11/06 11:04:52 User authenticated: XXX
2019/11/06 11:04:52 User is member in following a or p groups: XXX
2019/11/06 11:04:52 OwnerGroup information a-XXX verified successfully.
2019/11/06 11:04:52 contactEmail field added: XXX
2019/11/06 11:04:52 Scanning files in dataset /data/project/bio/myproject/archive
2019/11/06 11:04:52 No explicit filelistingPath defined - full folder /data/project/bio/myproject/archive is used.
2019/11/06 11:04:52 Source Folder: /data/project/bio/myproject/archive at /data/project/bio/myproject/archive
2019/11/06 11:04:57 The dataset contains 100000 files with a total size of 50000000000 bytes.
2019/11/06 11:04:57 creationTime field added: 2019-07-29 18:47:08 +0200 CEST
2019/11/06 11:04:57 endTime field added: 2019-11-06 10:52:17.256033 +0100 CET
2019/11/06 11:04:57 license field added: CC BY-SA 4.0
2019/11/06 11:04:57 isPublished field added: false
2019/11/06 11:04:57 classification field added: IN=medium,AV=low,CO=low
2019/11/06 11:04:57 Updated metadata object:
{
"accessGroups": [
"XXX"
],
"classification": "IN=medium,AV=low,CO=low",
"contactEmail": "XXX",
"creationLocation": "XXX",
"creationTime": "2019-07-29T18:47:08+02:00",
"dataFormat": "XXX",
"description": "XXX",
"endTime": "2019-11-06T10:52:17.256033+01:00",
"isPublished": false,
"license": "CC BY-SA 4.0",
"owner": "XXX",
"ownerEmail": "XXX",
"ownerGroup": "a-XXX",
"principalInvestigator": "XXX",
"scientificMetadata": {
...
},
"sourceFolder": "/data/project/bio/myproject/archive",
"type": "raw"
}
2019/11/06 11:04:57 Running [/usr/bin/ssh -l user_n pb-archive.psi.ch test -d /data/project/bio/myproject/archive].
key_cert_check_authority: invalid certificate
Certificate invalid: name is not a listed principal
user_n@pb-archive.psi.ch's password:
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
The data must first be copied to a rsync cache server.
2019/11/06 11:05:04 Do you want to continue (Y/n)?
Y
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012
2019/11/06 11:05:09 The dataset contains 108057 files.
2019/11/06 11:05:10 Created file block 0 from file 0 to 1000 with total size of 413229990 bytes
2019/11/06 11:05:10 Created file block 1 from file 1000 to 2000 with total size of 416024000 bytes
2019/11/06 11:05:10 Created file block 2 from file 2000 to 3000 with total size of 416024000 bytes
2019/11/06 11:05:10 Created file block 3 from file 3000 to 4000 with total size of 416024000 bytes
...
2019/11/06 11:05:26 Created file block 105 from file 105000 to 106000 with total size of 416024000 bytes
2019/11/06 11:05:27 Created file block 106 from file 106000 to 107000 with total size of 416024000 bytes
2019/11/06 11:05:27 Created file block 107 from file 107000 to 108000 with total size of 850195143 bytes
2019/11/06 11:05:27 Created file block 108 from file 108000 to 108057 with total size of 151904903 bytes
2019/11/06 11:05:27 short dataset id: 0a9fe316-c9e7-4cc5-8856-e1346dd31e31
2019/11/06 11:05:27 Running [/usr/bin/rsync -e ssh -avxz /data/project/bio/myproject/archive/ user_n@pb-archive.psi.ch:archive
/0a9fe316-c9e7-4cc5-8856-e1346dd31e31/data/project/bio/myproject/archive].
key_cert_check_authority: invalid certificate
Certificate invalid: name is not a listed principal
user_n@pb-archive.psi.ch's password:
Permission denied, please try again.
user_n@pb-archive.psi.ch's password:
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
/usr/libexec/test_acl.sh: line 30: /tmp/tmpacl.txt: Permission denied
...
2019/11/06 12:05:08 Successfully updated {"pid":"12.345.67890/12345678-1234-1234-1234-123456789012",...}
2019/11/06 12:05:08 Submitting Archive Job for the ingested datasets.
2019/11/06 12:05:08 Job response Status: okay
2019/11/06 12:05:08 A confirmation email will be sent to XXX
12.345.67890/12345678-1234-1234-1234-123456789012
### Publishing
After datasets are are ingested they can be assigned a public DOI. This can be included in publications and will make the datasets on http://doi.psi.ch.
For instructions on this, please read the ['Publish' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-8).
### Retrieving data
Retrieving data from the archive is also initiated through the Data Catalog. Please read the ['Retrieve' section in the ingest manual](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html#sec-6).
## Further Information
* [PSI Data Catalog](https://discovery.psi.ch)
* [Full Documentation](https://scicatproject.github.io/documentation/Ingestor/ingestManual.html)
* [Published Datasets (doi.psi.ch)](https://doi.psi.ch)
* Data Catalog [PSI page](https://www.psi.ch/photon-science-data-services/data-catalog-and-archive)
* Data catalog [SciCat Software](https://scicatproject.github.io/)
* [FAIR](https://www.nature.com/articles/sdata201618) definition and [SNF Research Policy](http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx#FAIR%20Data%20Principles%20for%20Research%20Data%20Management)
* [Petabyte Archive at CSCS](https://www.cscs.ch/fileadmin/user_upload/contents_publications/annual_reports/AR2017_Online.pdf)