[PATCH 0/6 v6] Make ELOG and GHES log and trace consistently

Fabio M. De Francesco posted 6 patches 3 months, 2 weeks ago
There is a newer version of this series
[PATCH 0/6 v6] Make ELOG and GHES log and trace consistently
Posted by Fabio M. De Francesco 3 months, 2 weeks ago
When Firmware First is enabled, the BIOS handles errors first and then
makes them available to the kernel via Common Platform Error Record
(CPER) sections (UEFI 2.10, Appendix N). Linux parses the CPER sections
via one of two similar paths: ELOG (the ACPI Extended Error Log driver,
acpi_extlog) or GHES (the APEI Generic Hardware Error Source driver).
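A minimal userspace sketch of the idea behind the series, assuming hypothetical names throughout: `cper_sec`, `handle_cper_section()`, `extlog_handle()` and `ghes_handle()` are illustrative, not kernel symbols, and real CPER sections are identified by GUIDs rather than by an enum. If both entry points delegate to one shared helper, they cannot drift apart in what they report.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical section kinds; real CPER sections are identified by GUID. */
enum cper_sec_kind {
	SEC_PCIE,
	SEC_CXL_PROT_ERR,
	SEC_NON_STANDARD,
};

/* Hypothetical, heavily trimmed view of a decoded CPER section. */
struct cper_sec {
	enum cper_sec_kind kind;
	int severity;
};

/* Single shared formatter: the only place the message text is defined. */
static int handle_cper_section(const struct cper_sec *sec,
			       char *out, size_t outlen)
{
	static const char *const names[] = {
		[SEC_PCIE]         = "PCIe",
		[SEC_CXL_PROT_ERR] = "CXL protocol",
		[SEC_NON_STANDARD] = "Non-standard",
	};

	return snprintf(out, outlen, "%s error, severity %d",
			names[sec->kind], sec->severity);
}

/* ELOG-like entry point: just delegates. */
static int extlog_handle(const struct cper_sec *sec, char *out, size_t len)
{
	return handle_cper_section(sec, out, len);
}

/* GHES-like entry point: delegates to the same helper. */
static int ghes_handle(const struct cper_sec *sec, char *out, size_t len)
{
	return handle_cper_section(sec, out, len);
}
```

In the kernel the shared piece would emit printk output and trace events rather than fill a buffer; the buffer here only makes the "same input, same output" property easy to check.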

Currently, ELOG and GHES are inconsistent both in how they print to the
kernel log and in how they report to userspace via trace events.

Make the two paths behave consistently with respect to logging and
tracing.

--- Changes for v6 ---

	- Rename the helper that copies the CPER CXL protocol error
	  information to the work struct (Dave)
	- Return -EOPNOTSUPP (instead of -EINVAL) from the two helpers if
	  ACPI_APEI_PCIEAER is not enabled (Dave)
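A hedged sketch of the return-value convention described above. The thread does not show the real helpers, so `cxl_prot_err_check()`, `struct cxl_prot_err` and the `APEI_PCIEAER_ENABLED` macro are hypothetical; the macro stands in for the kernel's `IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER)`, which is tested in ordinary C instead of with `#ifdef`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER): a
 * constant tested in ordinary C code, so the .c file needs no #ifdef and
 * both branches are always compiled. Set to 0 to model a kernel built
 * without that option.
 */
#define APEI_PCIEAER_ENABLED 1

/* Hypothetical, trimmed-down CXL protocol error record. */
struct cxl_prot_err {
	bool valid;
};

/* Hypothetical helper following the v6 return-value convention. */
static int cxl_prot_err_check(const struct cxl_prot_err *err)
{
	if (!APEI_PCIEAER_ENABLED)
		return -EOPNOTSUPP;	/* support not built in */
	if (!err || !err->valid)
		return -EINVAL;		/* malformed or missing record */
	return 0;
}
```

One plausible benefit of the distinct error code: a caller can treat -EOPNOTSUPP as "nothing to do in this build" and stay quiet, while still warning on -EINVAL.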

--- Changes for v5 ---

	- Add 3/6 to select ACPI_APEI_PCIEAER for GHES
	- Add 4,5/6 to move code common to ELOG and GHES out to new
	  helpers, and use them in 6/6 (Jonathan)
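A trimmed, illustrative sketch of the kind of change 3/6 makes in drivers/acpi/apei/Kconfig. The real ACPI_APEI_GHES entry carries other `depends on`/`select` lines not shown here, and the actual patch may differ.

```kconfig
config ACPI_APEI_GHES
	bool "APEI Generic Hardware Error Source"
	depends on ACPI_APEI
	select ACPI_APEI_PCIEAER
```

Kconfig's `select` forces the target symbol on without evaluating that symbol's own `depends on` lines, so whatever ACPI_APEI_PCIEAER itself depends on must also hold wherever GHES can be enabled; otherwise randconfig builds can hit unmet-dependency warnings.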

--- Changes for v4 ---

	- Rebase on top of recent changes to AER error logging and
	  drop the obsolete 2/4 (Sathyanarayanan)
	- Log with pr_warn_ratelimited() (Dave)
	- Collect tags
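A simplified userspace model of interval/burst rate limiting, loosely in the spirit of the ratelimit state behind pr_warn_ratelimited() (whose defaults are DEFAULT_RATELIMIT_INTERVAL, 5 * HZ, and DEFAULT_RATELIMIT_BURST, 10). `struct rl_state` and `rl_allow()` are hypothetical, and times are plain integers rather than jiffies. The point is that an error storm cannot flood the kernel log.

```c
#include <stdbool.h>

/* Hypothetical, simplified rate-limit state (not the kernel's struct). */
struct rl_state {
	long interval;	/* window length */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be printed at time 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a new one and reset counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With an interval of 10 and a burst of 2, calls at times 0 and 1 print, a call at 2 is suppressed, and a call at 10 opens a new window and prints again.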

--- Changes for v3 ---

    1/4, 2/4:
	- collect tags; no functional changes
    3/4:
	- Invert logic of checks (Yazen)
	- Select CONFIG_ACPI_APEI_PCIEAER (Yazen)
    4/4:
	- Check serial number only for CXL devices (Yazen)
	- Replace "invalid" with "unknown" in the output of a pr_err()
	  (Yazen)
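A hedged sketch of the v3 4/4 checks, reading "invert logic of checks" as early returns and requiring a serial number only for CXL devices. This is the kind of validity check that 4/6 later moves into a shared helper. The bit names, `enum agent_kind` and `prot_err_validate()` are hypothetical; the real field layout is defined in UEFI 2.10 Appendix N.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical validation bits (see UEFI 2.10 Appendix N for the real ones). */
#define VALID_AGENT_TYPE	(1u << 0)
#define VALID_AGENT_ADDRESS	(1u << 1)
#define VALID_SERIAL_NUMBER	(1u << 2)

/* Hypothetical agent kinds: only devices are expected to carry a serial. */
enum agent_kind {
	AGENT_CXL_DEVICE,
	AGENT_CXL_PORT,
};

/* Hypothetical, trimmed-down record header. */
struct prot_err_hdr {
	uint32_t valid_bits;
	enum agent_kind agent;
};

/* Early-return checks: reject as soon as a required field is missing. */
static int prot_err_validate(const struct prot_err_hdr *h)
{
	if (!(h->valid_bits & VALID_AGENT_TYPE))
		return -EINVAL;
	if (!(h->valid_bits & VALID_AGENT_ADDRESS))
		return -EINVAL;
	/* Require a serial number only when the agent is a device. */
	if (h->agent == AGENT_CXL_DEVICE &&
	    !(h->valid_bits & VALID_SERIAL_NUMBER))
		return -EINVAL;
	return 0;
}
```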
	
--- Changes for v2 ---

	- Add a patch to pass log levels to pci_print_aer() (Dan)
	- Add a patch to trace CPER CXL Protocol Errors
	- Rework commit messages (Dan)
	- Use log_non_standard_event() (Bjorn)
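A minimal sketch of letting the caller choose the log level, as the first v2 item does for pci_print_aer(). `print_aer_summary()` and the `LVL_*` macros are hypothetical. Kernel printk levels such as KERN_WARNING are string prefixes; syslog-style "<N>" markers stand in for them here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical level prefixes, using syslog priority numbers. */
#define LVL_ERR  "<3>"
#define LVL_WARN "<4>"

/* Caller chooses the level; the printer only formats. */
static int print_aer_summary(char *buf, size_t len, const char *level,
			     const char *dev, unsigned int status)
{
	return snprintf(buf, len, "%s%s: AER status 0x%08x",
			level, dev, status);
}
```

The design choice is that the caller knows the context (which path, corrected or uncorrectable error), so it is better placed than the shared printer to pick the severity.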

--- Changes for v1 ---

	- Drop the RFC prefix and restart from PATCH v1
	- Drop patch 3/3 because a discussion on it has not yet been
	  settled
	- Drop namespacing in export of pci_print_aer() (Dan)
	- Don't use '#ifdef' in *.c files (Dan)
	- Drop the reference on pdev once the operation is complete (Dan)
	- Don't log an error message if pdev is NULL (Dan)
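A hedged model of the last two v1 items. In the kernel, a lookup such as pci_get_domain_bus_and_slot() returns a referenced struct pci_dev that the caller releases with pci_dev_put(). The `fake_pdev` type and helpers below are hypothetical stand-ins with an explicit counter, so the get/use/put pattern and the silent NULL path can be checked.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev with an explicit refcount. */
struct fake_pdev {
	int refcount;
	int handled;
};

/* Like a pci_get_*() lookup: take a reference if the device exists. */
static struct fake_pdev *fake_pdev_get(struct fake_pdev *p)
{
	if (p)
		p->refcount++;
	return p;
}

/* Like pci_dev_put(): NULL-safe release of the reference. */
static void fake_pdev_put(struct fake_pdev *p)
{
	if (p)
		p->refcount--;
}

static int handle_pcie_error(struct fake_pdev *lookup_result)
{
	struct fake_pdev *pdev = fake_pdev_get(lookup_result);

	/* No device: nothing to report, and no error message either. */
	if (!pdev)
		return 0;

	pdev->handled++;	/* stand-in for logging/tracing the error */
	fake_pdev_put(pdev);	/* drop the reference once done */
	return 1;
}
```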

Fabio M. De Francesco (6):
  ACPI: extlog: Trace CPER Non-standard Section Body
  ACPI: extlog: Trace CPER PCI Express Error Section
  acpi/ghes: Make GHES select ACPI_APEI_PCIEAER
  acpi/ghes: Add helper for CPER CXL protocol errors validity checks
  acpi/ghes: Add helper to copy CPER CXL protocol error information to
    work struct
  ACPI: extlog: Trace CPER CXL Protocol Error Section

 drivers/acpi/Kconfig       |  1 +
 drivers/acpi/acpi_extlog.c | 60 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/apei/Kconfig  |  1 +
 drivers/acpi/apei/ghes.c   | 62 +++++++++++++++++++++++++-------------
 drivers/cxl/core/ras.c     |  6 ++++
 drivers/pci/pcie/aer.c     |  2 +-
 include/cxl/event.h        | 22 ++++++++++++++
 7 files changed, 132 insertions(+), 22 deletions(-)


base-commit: 552c50713f273b494ac6c77052032a49bc9255e2
-- 
2.51.0
Re: [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently
Posted by Rafael J. Wysocki 3 months, 1 week ago
On Thu, Oct 23, 2025 at 2:26 PM Fabio M. De Francesco
<fabio.m.de.francesco@linux.intel.com> wrote:
> Fabio M. De Francesco (6):
>   ACPI: extlog: Trace CPER Non-standard Section Body
>   ACPI: extlog: Trace CPER PCI Express Error Section
>   acpi/ghes: Make GHES select ACPI_APEI_PCIEAER
>   acpi/ghes: Add helper for CPER CXL protocol errors validity checks
>   acpi/ghes: Add helper to copy CPER CXL protocol error information to
>     work struct
>   ACPI: extlog: Trace CPER CXL Protocol Error Section

I need ACKs or equivalent for patches [3-5/6] from the designated APEI
reviewers.  Tony?
RE: [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently
Posted by Luck, Tony 3 months, 1 week ago
> I need ACKs or equivalent for patches [3-5/6] from the designated APEI
> reviewers.  Tony?

There's an LKP complaint against patch 3 (perhaps for a crazy randconfig, but an indication that Kconfig dependencies aren't right).

The APEI bits look ok to me. But I think 3-6 need some CXL acks too.

-Tony