Document common statuses
@ -170,3 +170,30 @@ The following template should be used by any user submitting jobs to GPU nodes:
##SBATCH --ntasks-per-node=44 # Uncomment and specify number of tasks per node
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
```

## Job status

The status of submitted jobs can be checked with the `squeue` command:

```
~ $ squeue -u bliven_s
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
134507729       gpu test_scr bliven_s PD       0:00      3 (AssocGrpNodeLimit)
134507768   general test_scr bliven_s PD       0:00     19 (AssocGrpCpuLimit)
134507729       gpu test_scr bliven_s PD       0:00      3 (Resources)
134506301       gpu test_scr bliven_s PD       0:00      1 (Priority)
134506288       gpu test_scr bliven_s  R       9:16      1 merlin-g-008
```

Common statuses:

- *merlin-\** Running on the specified host
- *(Priority)* Waiting in the queue behind higher-priority jobs
- *(Resources)* At the head of the queue, waiting for machines to become available
- *(AssocGrpCpuLimit), (AssocGrpNodeLimit)* Job would exceed per-user limits on
  the number of CPUs/nodes in use simultaneously. Use `scancel` to remove the job
  and resubmit with fewer resources, or else wait for your other jobs to finish.
- *(PartitionNodeLimit)* Job requests more nodes than this partition can provide.
  Run `scancel` and resubmit to a different partition (`-p`) or with fewer
  resources.
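
The list above can be turned into a quick per-job summary. The following is a minimal sketch, not an official tool: `explain_squeue` is a hypothetical helper name, and the hint texts are paraphrased from the list above. It assumes the input comes from `squeue -h -o "%i %t %R"`, which prints the job ID, the compact state, and the node list or pending reason, without a header line.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): reads lines of the form
#   <jobid> <state> <nodelist-or-(reason)>
# on stdin and prints one hint per job.
explain_squeue() {
    awk 'NF >= 3 {
        id = $1; st = $2
        # Drop the first two fields; the rest is the node list or reason,
        # which may itself contain spaces.
        $1 = ""; $2 = ""; sub(/^ +/, "")
        where = $0

        if (st == "R")
            hint = "running on " where
        else if (st != "PD")
            hint = "in state " st
        else if (where == "(Priority)")
            hint = "waiting in the queue"
        else if (where == "(Resources)")
            hint = "head of the queue, waiting for machines"
        else if (where == "(AssocGrpCpuLimit)" || where == "(AssocGrpNodeLimit)")
            hint = "over the CPU/node limit: scancel and resubmit smaller, or wait"
        else if (where == "(PartitionNodeLimit)")
            hint = "too large for this partition: scancel and resubmit"
        else
            hint = "pending: " where
        print id ": " hint
    }'
}

# Usage on a login node:
#   squeue -h -u "$USER" -o "%i %t %R" | explain_squeue
```

Reasons not covered in the list above are passed through unchanged, so the helper never hides information that `squeue` reports.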