initial formatting changes complete

This commit is contained in:
2026-01-06 16:40:15 +01:00
parent 173f822230
commit 5f759a629a
81 changed files with 806 additions and 1113 deletions

View File

@@ -166,20 +166,20 @@ sjstat
Scheduling pool data:
----------------------------------------------------------------------------------
Total Usable Free Node Time Other
Pool Memory Cpus Nodes Nodes Nodes Limit Limit traits
Total Usable Free Node Time Other
Pool Memory Cpus Nodes Nodes Nodes Limit Limit traits
----------------------------------------------------------------------------------
test 373502Mb 88 6 6 1 UNLIM 1-00:00:00
general* 373502Mb 88 66 66 8 50 7-00:00:00
daily 373502Mb 88 72 72 9 60 1-00:00:00
hourly 373502Mb 88 72 72 9 UNLIM 01:00:00
gpu 128000Mb 8 1 1 0 UNLIM 7-00:00:00
gpu 128000Mb 20 8 8 0 UNLIM 7-00:00:00
test 373502Mb 88 6 6 1 UNLIM 1-00:00:00
general* 373502Mb 88 66 66 8 50 7-00:00:00
daily 373502Mb 88 72 72 9 60 1-00:00:00
hourly 373502Mb 88 72 72 9 UNLIM 01:00:00
gpu 128000Mb 8 1 1 0 UNLIM 7-00:00:00
gpu 128000Mb 20 8 8 0 UNLIM 7-00:00:00
Running job data:
---------------------------------------------------------------------------------------------------
Time Time Time
JobID User Procs Pool Status Used Limit Started Master/Other
Time Time Time
JobID User Procs Pool Status Used Limit Started Master/Other
---------------------------------------------------------------------------------------------------
13433377 collu_g 1 gpu PD 0:00 24:00:00 N/A (Resources)
13433389 collu_g 20 gpu PD 0:00 24:00:00 N/A (Resources)
@@ -249,11 +249,10 @@ sview
!['sview' graphical user interface](../../images/slurm/sview.png)
## General Monitoring
The following pages contain basic monitoring for Slurm and computing nodes.
Currently, monitoring is based on Grafana + InfluxDB. In the future it will
The following pages contain basic monitoring for Slurm and computing nodes.
Currently, monitoring is based on Grafana + InfluxDB. In the future it will
be moved to a different service based on ElasticSearch + LogStash + Kibana.
In the meantime, the following monitoring pages are available in a best effort
@@ -262,17 +261,17 @@ support:
### Merlin6 Monitoring Pages
* Slurm monitoring:
* ***[Merlin6 Slurm Statistics - XDMOD](https://merlin-slurmmon01.psi.ch/)***
* [Merlin6 Slurm Live Status](https://hpc-monitor02.psi.ch/d/QNcbW1AZk/merlin6-slurm-live-status?orgId=1&refresh=10s)
* [Merlin6 Slurm Overview](https://hpc-monitor02.psi.ch/d/94UxWJ0Zz/merlin6-slurm-overview?orgId=1&refresh=10s)
* ***[Merlin6 Slurm Statistics - XDMOD](https://merlin-slurmmon01.psi.ch/)***
* [Merlin6 Slurm Live Status](https://hpc-monitor02.psi.ch/d/QNcbW1AZk/merlin6-slurm-live-status?orgId=1&refresh=10s)
* [Merlin6 Slurm Overview](https://hpc-monitor02.psi.ch/d/94UxWJ0Zz/merlin6-slurm-overview?orgId=1&refresh=10s)
* Nodes monitoring:
* [Merlin6 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/JmvLR8gZz/merlin6-computing-cpu-nodes?orgId=1&refresh=10s)
* [Merlin6 GPU Nodes Overview](https://hpc-monitor02.psi.ch/d/gOo1Z10Wk/merlin6-computing-gpu-nodes?orgId=1&refresh=10s)
* [Merlin6 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/JmvLR8gZz/merlin6-computing-cpu-nodes?orgId=1&refresh=10s)
* [Merlin6 GPU Nodes Overview](https://hpc-monitor02.psi.ch/d/gOo1Z10Wk/merlin6-computing-gpu-nodes?orgId=1&refresh=10s)
### Merlin5 Monitoring Pages
* Slurm monitoring:
* [Merlin5 Slurm Live Status](https://hpc-monitor02.psi.ch/d/o8msZJ0Zz/merlin5-slurm-live-status?orgId=1&refresh=10s)
* [Merlin5 Slurm Overview](https://hpc-monitor02.psi.ch/d/eWLEW1AWz/merlin5-slurm-overview?orgId=1&refresh=10s)
* [Merlin5 Slurm Live Status](https://hpc-monitor02.psi.ch/d/o8msZJ0Zz/merlin5-slurm-live-status?orgId=1&refresh=10s)
* [Merlin5 Slurm Overview](https://hpc-monitor02.psi.ch/d/eWLEW1AWz/merlin5-slurm-overview?orgId=1&refresh=10s)
* Nodes monitoring:
* [Merlin5 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/ejTyWJAWk/merlin5-computing-cpu-nodes?orgId=1&refresh=10s)
* [Merlin5 CPU Nodes Overview](https://hpc-monitor02.psi.ch/d/ejTyWJAWk/merlin5-computing-cpu-nodes?orgId=1&refresh=10s)