initial formatting changes complete
@@ -41,20 +41,21 @@ Archiving can be done from any node accessible by the users (usually from the lo
Below are the main steps for using the Data Catalog.

* Ingest the dataset into the Data Catalog. This makes the data known to the Data Catalog system at PSI:
    * Prepare a metadata file describing the dataset
    * Run the **`datasetIngestor`** script
    * If necessary, the script will copy the data to the PSI archive servers
        * Usually this is necessary when archiving from directories other than **`/data/user`** or **`/data/project`**. It is also necessary when the Merlin export server (**`merlin-archive.psi.ch`**) is down for any reason.
* Archive the dataset:
    * Visit [https://discovery.psi.ch](https://discovery.psi.ch)
    * Click **`Archive`** for the dataset
    * The system will now copy the data to the PetaByte Archive at CSCS
* Retrieve data from the catalog:
    * Find the dataset on [https://discovery.psi.ch](https://discovery.psi.ch) and click **`Retrieve`**
    * Wait for the data to be copied to the PSI retrieval system
    * Run the **`datasetRetriever`** script

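The ingest step above can be sketched as a small shell wrapper. This is only an illustration: the function name `ingest_dataset` is hypothetical and not part of the Data Catalog tools, and the flags are copied from the example `datasetIngestor` run shown in this guide, so they may need adjusting (for instance, dropping `-copy` when no copy is required).

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not an official tool): wraps the ingest step.
ingest_dataset() {
    local metadata="${1:-metadata.json}"

    # The metadata file must have been prepared beforehand.
    if [ ! -f "$metadata" ]; then
        echo "metadata file not found: $metadata" >&2
        return 1
    fi

    # The ingestor must be installed and reachable through PATH.
    if ! command -v datasetIngestor >/dev/null 2>&1; then
        echo "datasetIngestor not found on PATH" >&2
        return 1
    fi

    # Flags copied from the example run in this guide; adjust to your case.
    datasetIngestor -copy -autoarchive -allowexistingsource -ingest "$metadata"
}
```

Calling `ingest_dataset metadata.json` from the dataset directory would then fail early with a readable message if the metadata file or the tool is missing, instead of starting the ingestion.
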

Since large datasets may take a long time to transfer, some steps are
designed to happen in the background. The discovery website can be used to
@@ -246,7 +247,7 @@ step will take a long time and may appear to have hung. You can check what files
where UID is the dataset ID (12345678-1234-1234-1234-123456789012) and PATH is the absolute path to your data. Note that rsync creates directories first and that the transfer order is not always alphabetical, but it should be possible to see whether any data has been transferred.

* There is currently a limit on the number of files per dataset (technically, the limit comes from the total length of all file paths). It is recommended to break up datasets into 300'000 files or fewer.
* If it is not possible or desirable to split data between multiple datasets, an alternative workaround is to package the files into a tarball. For datasets which are already compressed, omit the `-z` option for a considerable speedup:

```bash
# -c creates the archive; without -z the data is not recompressed
tar -cf [output].tar [srcdir]
@@ -266,7 +267,6 @@ step will take a long time and may appear to have hung. You can check what files
/data/project/bio/myproject/archive $ datasetIngestor -copy -autoarchive -allowexistingsource -ingest metadata.json
2019/11/06 11:04:43 Latest version: 1.1.11
2019/11/06 11:04:43 Your version of this program is up-to-date
2019/11/06 11:04:43 You are about to add a dataset to the === production === data catalog environment...
2019/11/06 11:04:43 Your username:
@@ -316,7 +316,6 @@ user_n@pb-archive.psi.ch's password:
2019/11/06 11:05:04 The source folder /data/project/bio/myproject/archive is not centrally available (decentral use case).
The data must first be copied to a rsync cache server.
2019/11/06 11:05:04 Do you want to continue (Y/n)?
Y
2019/11/06 11:05:09 Created dataset with id 12.345.67890/12345678-1234-1234-1234-123456789012