gitea-pages/admin-guide/troubleshooting/pcie_bus_error.md
2024-08-07 16:34:32 +02:00

PCIe Bus Error

The kernel log may contain PCI Express bus errors like

Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: AER:   TLP Header:  34000000 e1000010 89148914 00000000
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0:   device [8086:464d] error status/mask=00100000/00010000
Oct 05 11:26:19 pc16209.psi.ch kernel: pcieport 10000:e0:06.0:      [20] UnsupReq          (First)

or

Jan 31 09:29:29 pc16422.psi.ch kernel: pcieport 10000:e0:06.0: AER: can't find device of ID0030
Jan 31 09:29:29 pc16422.psi.ch kernel: pcieport 10000:e0:06.0: AER: Uncorrected (Non-Fatal) error received: 10000:00:06.0

These are AER (Advanced Error Reporting) messages, which the kernel logs when a PCIe device or port reports an error.
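
To see which PCIe port is affected and how often, the AER lines can be counted per device address. The following is a minimal sketch, not an established tool: the function name `count_aer_by_device` is hypothetical, and it assumes the log lines have the `... kernel: <driver> <address>: ...` layout shown in the examples above.

```shell
#!/bin/sh
# Count AER-related kernel messages per PCIe device address.
# Reads log lines on stdin and prints "<count> <address>", most frequent first.
# Assumes the "... kernel: <driver> <address>: ..." layout of the examples above.
count_aer_by_device() {
    grep -E 'AER:|PCIe Bus Error' |
        sed -n 's/.*kernel: [^ ]* \([0-9a-fA-F:.]*\): .*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2}'
}

# Typical use on a live system (needs permission to read the kernel journal):
#   journalctl -k --no-pager | count_aer_by_device
```

A device address that dominates the output is a reasonable first candidate to look up with `lspci -s <address>`.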

One thing you might try is disabling Active State Power Management (ASPM) in the kernel.

To do so, set the following in Hiera:

base::enable_pcie_aspm: false

Then apply it with puppet agent -t and reboot.
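
After the reboot it can be worth checking that ASPM is really off. The sketch below assumes that the Hiera setting ends up as `pcie_aspm=off` on the kernel command line, which is the parameter the Thomas-Krenn article linked below describes. This guide does not confirm how `base::enable_pcie_aspm` is implemented, so treat that as an assumption. The helper name `cmdline_disables_aspm` is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) if the given kernel command line contains pcie_aspm=off.
# Assumption: the Hiera setting is implemented via this kernel parameter.
cmdline_disables_aspm() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a live system:
#   cmdline_disables_aspm "$(cat /proc/cmdline)" && echo "pcie_aspm=off is set"
#   dmesg | grep -i aspm    # the kernel usually logs a line when ASPM is disabled
```

If the parameter is missing after the reboot, check whether the Puppet run actually changed the boot loader configuration.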

AER: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/assembly_advanced-error-reporting_managing-monitoring-and-updating-the-kernel

Source: https://www.thomas-krenn.com/de/wiki/PCIe_Bus_Error_Status_00001100_beheben