Document common statuses
@@ -170,3 +170,30 @@ The following template should be used by any user submitting jobs to GPU nodes:
```
##SBATCH --ntasks-per-node=44 # Uncomment and specify number of tasks per node
##SBATCH --cpus-per-task=44 # Uncomment and specify the number of cores per task
```
## Job status
The status of submitted jobs can be checked with the `squeue` command:
```
~ $ squeue -u bliven_s
    JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
134507729       gpu test_scr bliven_s PD  0:00     3 (AssocGrpNodeLimit)
134507768   general test_scr bliven_s PD  0:00    19 (AssocGrpCpuLimit)
134507729       gpu test_scr bliven_s PD  0:00     3 (Resources)
134506301       gpu test_scr bliven_s PD  0:00     1 (Priority)
134506288       gpu test_scr bliven_s  R  9:16     1 merlin-g-008
```
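
When many jobs are queued, it can help to list only the pending ones together with their reason. The following is a minimal sketch, not part of the cluster tooling: `pending_reasons` is a hypothetical helper name, and it assumes the default `squeue` column layout shown above (state in the fifth column, reason in the last) and job names without spaces. It reads `squeue` output from standard input, so it can also be tried on saved output.

```shell
# Hypothetical helper: print "JOBID REASON" for every pending (PD) job.
# Assumes the default squeue columns shown above; the header line is skipped.
pending_reasons() {
    awk 'NR > 1 && $5 == "PD" { gsub(/[()]/, "", $NF); print $1, $NF }'
}

# Typical use on the cluster (not executed here):
#   squeue -u "$USER" | pending_reasons
```

Filtering on the `ST` column keeps the sketch independent of any particular `squeue` options; running jobs are dropped because their last column holds a host name rather than a reason.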
Common Statuses:
- *merlin-\** Running on the specified host
- *(Priority)* Waiting in the queue
- *(Resources)* At the head of the queue, waiting for machines to become available
- *(AssocGrpCpuLimit), (AssocGrpNodeLimit)* Job would exceed the per-user limit on
  the number of CPUs or nodes in use at the same time. Use `scancel` to remove the
  job and resubmit with fewer resources, or else wait for your other jobs to finish.
- *(PartitionNodeLimit)* Job requests more resources than are available on this
  partition. Run `scancel` and resubmit to a different partition (`-p`) or with
  fewer resources.
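
The list above can be condensed into a small lookup, which is handy in wrapper scripts. This is only a sketch: `explain_reason` is a hypothetical function name, the messages paraphrase the list, and reasons not documented here fall through to a generic message.

```shell
# Hypothetical helper: map a value from the NODELIST(REASON) column
# (parentheses already removed) to the advice given in the list above.
explain_reason() {
    case "$1" in
        merlin-*)
            echo "running on $1" ;;
        Priority)
            echo "waiting in the queue" ;;
        Resources)
            echo "at the head of the queue, waiting for machines" ;;
        AssocGrpCpuLimit|AssocGrpNodeLimit)
            echo "over the per-user limit: scancel and resubmit smaller, or wait" ;;
        PartitionNodeLimit)
            echo "too large for this partition: scancel and resubmit with -p or fewer resources" ;;
        *)
            echo "not covered in this list" ;;
    esac
}

# Example follow-up for a job that is too large (not executed here;
# the job ID is taken from the listing above, job.sh is a placeholder):
#   scancel 134507768
#   sbatch -p general job.sh
```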