[PATCH v6 0/6] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors

Smita Koralahalli posted 6 patches 11 months ago
There is a newer version of this series
[PATCH v6 0/6] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors
Posted by Smita Koralahalli 11 months ago
This patchset adds logging support for CXL CPER endpoint and port protocol
errors.

The first 3 patches update the existing codebase to support CXL CPER
Protocol error reporting.

The last 3 patches introduce recognizing and reporting CXL CPER Protocol
errors.

Link to v5:
https://lore.kernel.org/linux-cxl/20250114120427.149260-1-Smita.KoralahalliChannabasappa@amd.com

Changes in v5 -> v6:
[Dave, Jonathan, Ira]: Reviewed-by tags.
[Dave]: Check for cxlds before assigning fe.
Merge one of the patches (Port error trace logging) from Terry's Port
error handling.
Rename host -> parent.

Changes in v4 -> v5:
[Dave]: Reviewed-by tags.
[Jonathan]: Remove blank line.
[Jonathan, Ira]: Change CXL -> "CXL".
[Ira]: Fix build error for CONFIG_ACPI_APEI_PCIEAER.

Changes in v3 -> v4:
[Ira]: Use memcpy() for RAS Cap struct.
[Jonathan]: Commit description edits.
[Jonathan]: Use separate work registration functions for protocol and
component errors.
[Jonathan, Ira]: Replace flags with separate functions for port and
device errors.
[Jonathan]: Use goto for register and unregister calls.

Changes in v2 -> v3:
[Dan]: Define a new workqueue for CXL CPER Protocol errors and avoid
reusing the existing workqueue that handles CXL CPER events.
[Dan]: Update function and struct names.
[Ira]: Don't define common function get_cxl_devstate().
[Dan]: Use switch cases rather than defining array of structures.
[Dan]: Pass the entire cxl_cper_prot_err struct for CXL subsystem.
[Dan]: Use pr_err_ratelimited().
[Dan]: Use AER_ severities directly. Don't define CXL_ severities.
[Dan]: Limit either to Device ID or Agent Info check.
[Dan]: Validate size of RAS field matches expectations.

Changes in v1 -> v2:
[Jonathan]: Refactor code for trace support. Rename get_cxl_dev()
to get_cxl_devstate().
[Jonathan]: Cleanups for get_cxl_devstate().
[Alison, Jonathan]: Define array of structures for Device ID and Serial
number comparison.
[Dave]: p_err -> rec/p_rec.
[Jonathan]: Remove pr_warn.
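
Two of the review items above (v3: validate that the size of the RAS field
matches expectations; v4: use memcpy() for the RAS Cap struct) can be
illustrated with a small userspace sketch. This is not the code from the
series. The struct layouts, the names ras_cap_sketch, prot_err_hdr_sketch
and extract_ras_cap(), and the simplified two-field header are all
assumptions made for illustration. The idea is only that the declared
length is checked against the expected structure size before anything is
copied, and that the copy goes through memcpy() because the data sits at
an arbitrary byte offset in the record and may not be suitably aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a RAS capability register block. */
struct ras_cap_sketch {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t uncor_severity;
	uint32_t cor_status;
	uint32_t cor_mask;
	uint32_t cap_control;
	uint32_t header_log[16];
};

/* Hypothetical, simplified record header: only the length fields. */
struct prot_err_hdr_sketch {
	uint16_t dvsec_len;	/* bytes of DVSEC data after the header */
	uint16_t err_len;	/* bytes of RAS data after the DVSEC */
};

/*
 * Copy the RAS capability out of a raw record buffer.
 * Returns 0 on success, -1 if the buffer is too short or the declared
 * RAS length does not match the expected structure size.
 */
int extract_ras_cap(const uint8_t *rec, size_t rec_len,
		    struct ras_cap_sketch *out)
{
	struct prot_err_hdr_sketch hdr;

	if (rec_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, rec, sizeof(hdr));

	/* Reject records whose RAS field is not the size we expect. */
	if (hdr.err_len != sizeof(*out))
		return -1;

	/* Make sure header + DVSEC + RAS data all fit in the buffer. */
	if (rec_len - sizeof(hdr) < (size_t)hdr.dvsec_len + hdr.err_len)
		return -1;

	/* The source offset may be unaligned, so copy instead of casting. */
	memcpy(out, rec + sizeof(hdr) + hdr.dvsec_len, sizeof(*out));
	return 0;
}
```

A caller would pass the raw record bytes and their length, and only use
the output structure when the function returns 0.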

Smita Koralahalli (6):
  efi/cper, cxl: Prefix protocol error struct and function names with
    cxl_
  efi/cper, cxl: Make definitions and structures global
  efi/cper, cxl: Remove cper_cxl.h
  acpi/ghes, cper: Recognize and cache CXL Protocol errors
  acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
  cxl/pci: Add trace logging for CXL PCIe Port RAS errors

 drivers/acpi/apei/ghes.c        | 103 ++++++++++++++++++++++++++++++++
 drivers/cxl/core/pci.c          |  62 +++++++++++++++++++
 drivers/cxl/core/trace.h        |  47 +++++++++++++++
 drivers/cxl/cxlpci.h            |   9 +++
 drivers/cxl/pci.c               |  59 +++++++++++++++++-
 drivers/firmware/efi/cper.c     |   6 +-
 drivers/firmware/efi/cper_cxl.c |  39 +-----------
 drivers/firmware/efi/cper_cxl.h |  66 --------------------
 include/cxl/event.h             | 101 +++++++++++++++++++++++++++++++
 include/linux/cper.h            |   8 +++
 10 files changed, 394 insertions(+), 106 deletions(-)
 delete mode 100644 drivers/firmware/efi/cper_cxl.h

-- 
2.17.1
Re: [PATCH v6 0/6] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors
Posted by Dave Jiang 10 months, 2 weeks ago

On 1/23/25 1:44 AM, Smita Koralahalli wrote:
> This patchset adds logging support for CXL CPER endpoint and port protocol
> errors.

Hi Ard,
I'd like to apply this series to cxl/next. If the EFI bits look ok to you, can you please ack the relevant patches? Thank you!
 
Re: [PATCH v6 0/6] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors
Posted by Dave Jiang 10 months, 2 weeks ago

On 1/23/25 1:44 AM, Smita Koralahalli wrote:
> This patchset adds logging support for CXL CPER endpoint and port protocol
> errors.
> 
> The first 3 patches update the existing codebase to support CXL CPER
> Protocol error reporting.
> 
> The last 3 patches introduce recognizing and reporting CXL CPER Protocol
> errors.

Patches 1-4 applied to cxl-next. I fixed up Gregory's review tag. :)
Patches 5 and 6 need to address the comments raised by Dan.
