gitea-pages/admin-guide/troubleshooting/pcie_bus_error.md
2024-08-07 16:34:32 +02:00


# PCIe Bus Error
The kernel log may contain PCI Express bus errors like
```
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: AER: TLP Header: 34000000 e1000010 89148914 00000000
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: device [8086:464d] error status/mask=00100000/00010000
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: [20] UnsupReq (First)
```
or
```
Jan 31 09:29:29 pc16422.psi.ch kernel: pcieport 10000:e0:06.0: AER: can't find device of ID0030
Jan 31 09:29:29 pc16422.psi.ch kernel: pcieport 10000:e0:06.0: AER: Uncorrected (Non-Fatal) error received: 10000:00:06.0
```
These are AER (Advanced Error Reporting) messages from the PCIe chip.
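
To check whether a host is affected, the kernel log can be filtered for such messages. The following is a minimal sketch, not part of the original procedure: `filter_aer` is a hypothetical helper name, and the pattern only matches lines containing `AER:` or `PCIe Bus Error`, so follow-up lines such as the `error status/mask` line are not shown.

```shell
# Hypothetical helper: keep only AER-related lines from kernel log text on stdin.
filter_aer() {
    grep -E 'AER:|PCIe Bus Error'
}

# Usage on a live system (reads the kernel messages of the current boot):
#   journalctl -k --no-pager | filter_aer
```

If the filter prints nothing, no such messages were logged since the last boot.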
One thing you might try is disabling **Active State Power Management** (ASPM) in the kernel.
To do so, set the following in Hiera:
```yaml
base::enable_pcie_aspm: false
```
Then apply it with `puppet agent -t` and reboot.
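
After the reboot it can be useful to confirm that the setting took effect. The sketch below assumes that the Hiera key is implemented by adding the kernel parameter `pcie_aspm=off` to the boot command line; this guide does not confirm that, so check how `base::enable_pcie_aspm` is implemented if the result is unexpected. `aspm_disabled_in_cmdline` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given kernel command line contains pcie_aspm=off.
# Assumption: ASPM is disabled via this kernel parameter (not confirmed by this guide).
aspm_disabled_in_cmdline() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   aspm_disabled_in_cmdline "$(cat /proc/cmdline)" && echo "ASPM disabled" || echo "ASPM not disabled"
```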

AER: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/assembly_advanced-error-reporting_managing-monitoring-and-updating-the-kernel

Source: https://www.thomas-krenn.com/de/wiki/PCIe_Bus_Error_Status_00001100_beheben