# PCIe Bus Error

The kernel log may show PCI Express bus errors such as
```bash
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: AER: TLP Header: 34000000 e1000010 89148914 00000000
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: device [8086:464d] error status/mask=00100000/00010000
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: [20] UnsupReq (First)
```
or
```
Jan 31 09:29:29 pc16422.psi.ch kernel: pcieport 10000:e0:06.0: AER: can't find device of ID0030
Jan 31 09:29:29 pc16422.psi.ch kernel: pcieport 10000:e0:06.0: AER: Uncorrected (Non-Fatal) error received: 10000:00:06.0
```
These are AER (Advanced Error Reporting) messages from the PCIe chip.
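
To find out which PCIe ports are reporting errors, the kernel log can be filtered for these lines. The following is a minimal sketch, assuming log lines in the format shown above; the function name `aer_devices` is hypothetical and not part of any existing tool.

```bash
# Hypothetical helper: read kernel log lines on stdin and print the
# unique PCI addresses of ports that logged an AER or bus error line.
aer_devices() {
    # Keep only "pcieport <address>: AER ..." and "... PCIe Bus Error ..." lines,
    # then cut each line down to the PCI address and de-duplicate.
    grep -E 'pcieport [0-9a-f:.]+: (AER|PCIe Bus Error)' \
        | sed -E 's/.*pcieport ([0-9a-f:.]+): .*/\1/' \
        | sort -u
}
```

It could be used as `journalctl -k | aer_devices`. An address it prints can then be passed to `lspci -nn -s <address>` to identify the device, provided the installed `lspci` accepts the domain part of the address.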

One thing you might try is disabling **Active State Power Management** (ASPM) in the kernel.

To do so, set the following in Hiera:
```yaml
base::enable_pcie_aspm: false
```

Then apply it with `puppet agent -t` and reboot.
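
After the reboot it can be worth checking that the setting took effect. The following is a minimal sketch under one loudly stated assumption: that the Hiera key ends up as `pcie_aspm=off` on the kernel command line. This page does not say how `base::enable_pcie_aspm` is implemented, so treat that as a guess. The function name `aspm_disabled` is hypothetical.

```bash
# Hypothetical helper: succeed if the kernel command line contains
# pcie_aspm=off (ASSUMPTION: this is how the Hiera setting is applied).
# Checks the first argument if given, otherwise /proc/cmdline.
aspm_disabled() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    # Pad with spaces so the parameter matches only as a whole word.
    case " $cmdline " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `aspm_disabled && echo "ASPM is disabled"`. Running `journalctl -k -b | grep AER` afterwards shows whether the errors still occur in the current boot.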

AER: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/assembly_advanced-error-reporting_managing-monitoring-and-updating-the-kernel

Source: https://www.thomas-krenn.com/de/wiki/PCIe_Bus_Error_Status_00001100_beheben