Document common statuses

This commit is contained in:
Spencer Bliven
2019-07-29 15:47:53 +02:00
parent b7b52fbdce
commit 51400e382f


@@ -170,3 +170,30 @@ The following template should be used by any user submitting jobs to GPU nodes:
##SBATCH --ntasks-per-node=44 # Uncomment and specify number of tasks per node
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
```
## Job status
The status of submitted jobs can be checked with the `squeue` command:
```
~ $ squeue -u bliven_s
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         134507729       gpu test_scr bliven_s PD       0:00      3 (AssocGrpNodeLimit)
         134507768   general test_scr bliven_s PD       0:00     19 (AssocGrpCpuLimit)
         134507729       gpu test_scr bliven_s PD       0:00      3 (Resources)
         134506301       gpu test_scr bliven_s PD       0:00      1 (Priority)
         134506288       gpu test_scr bliven_s  R       9:16      1 merlin-g-008
```
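When many jobs are queued, it can help to tally them by status instead of reading the full listing. The following is a minimal sketch, assuming bash and the standard `squeue` options `--noheader` and `--format='%R'` (`%R` prints the reason for pending jobs and the node list otherwise). `count_statuses` is a hypothetical helper name, not a Slurm command:

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally jobs by their NODELIST(REASON) value.
# Input: one value per line, as printed by
#   squeue -u "$USER" --noheader --format='%R'
# Pending jobs show a reason in parentheses; any other value is a node
# list, which this sketch lumps together as "running".
count_statuses() {
  awk '{ key = ($0 ~ /^\(/) ? $0 : "running"; n[key]++ }
       END { for (k in n) printf "%d %s\n", n[k], k }' | sort -rn
}
```

Usage: `squeue -u "$USER" --noheader --format='%R' | count_statuses` prints one `count status` line per distinct status, most common first.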
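The `ST` column shows the job state (`R` for running, `PD` for pending). Common values of the `NODELIST(REASON)` column: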
- *merlin-\** Running on the listed node(s)
- *(Priority)* Waiting in the queue behind higher-priority jobs
- *(Resources)* At the head of the queue, waiting for enough nodes to become free
- *(AssocGrpCpuLimit), (AssocGrpNodeLimit)* The job would exceed the per-user limit
on the number of CPUs/nodes in use at the same time. Use `scancel` to remove the
job and resubmit it with fewer resources, or wait for your other jobs to finish.
- *(PartitionNodeLimit)* The job requests more nodes than this partition allows.
Run `scancel` and resubmit to a different partition (`-p`) or with fewer
resources.
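The status list can also be kept at hand as a small lookup function. This is a hypothetical helper (`explain_reason` is not part of Slurm) that knows only the statuses described in the list and prints a generic hint for any other reason code:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a short hint for a NODELIST(REASON) value.
# Covers only the common statuses documented here; other parenthesized
# reasons get a generic message, and anything else is treated as a node list.
# Example (requires Slurm):
#   squeue -u "$USER" --noheader --format='%i %R' |
#     while read -r id reason; do echo "$id: $(explain_reason "$reason")"; done
explain_reason() {
  case "$1" in
    "(Priority)")
      echo "waiting in the queue behind higher-priority jobs" ;;
    "(Resources)")
      echo "at the head of the queue, waiting for enough nodes to become free" ;;
    "(AssocGrpCpuLimit)"|"(AssocGrpNodeLimit)")
      echo "would exceed the per-user CPU/node limit: scancel and resubmit with fewer resources, or wait" ;;
    "(PartitionNodeLimit)")
      echo "exceeds the partition's node limit: scancel and resubmit to another partition (-p) or with fewer resources" ;;
    "("*")")
      echo "other pending reason: see the squeue man page" ;;
    *)
      echo "running on $1" ;;
  esac
}
```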