Testing OS kernel ACPI APEI CPER support is tricky, as it requires
special-purpose hardware and/or BIOS.
With QEMU, it becomes a lot easier, as it can be done via QMP.
This series adds support for injecting CPER records on ARM emulation.
The QEMU-side changes add a QAPI command able to do CPER error injection
on ARM, taking a raw data parameter, which makes it very flexible.
The final patch provides a script implementing support for
ARM Processor CPER error injection via QMP, according to the ACPI 6.x
and UEFI 2.9A/2.10 specs.
Injecting such errors can be done using the provided script:
$ ./scripts/ghes_inject.py arm
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-2621-g3de6991b870a"}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "ghes-cper", "arguments": {"cper": {"notification-type": [22, 61, 158, 225, 17, 188, 228, 17, 156, 170, 194, 5, 29, 93, 70, 176], "raw-data": [0, 0, 0, 0, 1, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0]}} }
{"return": {}}
This produces a simple CPER record, which is properly handled by the
Linux kernel:
[ 5876.041410] {18}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 5876.041775] {18}[Hardware Error]: event severity: recoverable
[ 5876.042023] {18}[Hardware Error]: Error 0, type: recoverable
[ 5876.042280] {18}[Hardware Error]: section_type: ARM processor error
[ 5876.042538] {18}[Hardware Error]: MIDR: 0x0000000000000000
[ 5876.042781] {18}[Hardware Error]: Error info structure 0:
[ 5876.043013] {18}[Hardware Error]: num errors: 2
[ 5876.043222] {18}[Hardware Error]: error_type: 0x02: cache error
[ 5876.043500] {18}[Hardware Error]: error_info: 0x0000000000000000
[ 5876.043800] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
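As a reading aid for the QMP command in this example, the 16 integers passed
as "notification-type" can be decoded as a GUID. The sketch below is not part
of the series: it assumes the array uses the usual UEFI mixed-endian GUID byte
order, and the helper name guid_from_le_bytes is made up for illustration.

```python
import uuid

# Bytes copied from the "notification-type" array in the QMP command above.
NOTIFICATION_TYPE = [22, 61, 158, 225, 17, 188, 228, 17,
                     156, 170, 194, 5, 29, 93, 70, 176]


def guid_from_le_bytes(values):
    """Interpret 16 integers as a GUID stored in UEFI (mixed-endian) order.

    Hypothetical helper, assuming the first three GUID fields are
    little-endian and the last eight bytes are kept in order.
    """
    if len(values) != 16:
        raise ValueError("a GUID needs exactly 16 bytes")
    return uuid.UUID(bytes_le=bytes(values))


guid = guid_from_le_bytes(NOTIFICATION_TYPE)
print(guid)  # e19e3d16-bc11-11e4-9caa-c2051d5d46b0
```

The decoded value corresponds to the ARM Processor Error Section GUID defined
in the UEFI CPER appendix, which is consistent with the "section_type: ARM
processor error" line in the kernel log.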
More complex use cases are also possible, such as:
$ ./scripts/ghes_inject.py arm --mpidr 0x444 --running --affinity 1 --error-info 12345678 --vendor 0x13,123,4,5,1 --ctx-array 0,1,2,3,4,5 -t cache tlb bus vendor tlb,vendor
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-2621-g3de6991b870a"}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "ghes-cper", "arguments": {"cper": {"notification-type": [22, 61, 158, 225, 17, 188, 228, 17, 156, 170, 194, 5, 29, 93, 70, 176], "raw-data": [7, 0, 0, 0, 5, 0, 1, 0, 13, 1, 0, 0, 1, 0, 0, 0, 68, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 4, 0, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 4, 0, 8, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 4, 0, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 0, 0, 20, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 0, 5, 0, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 123, 4, 5, 1]}} }
{"return": {}}
[ 5964.134325] {19}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 5964.134692] {19}[Hardware Error]: event severity: recoverable
[ 5964.134942] {19}[Hardware Error]: Error 0, type: recoverable
[ 5964.135200] {19}[Hardware Error]: section_type: ARM processor error
[ 5964.135466] {19}[Hardware Error]: MIDR: 0x0000000000000000
[ 5964.135700] {19}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000000000444
[ 5964.136025] {19}[Hardware Error]: error affinity level: 1
[ 5964.136255] {19}[Hardware Error]: running state: 0x1
[ 5964.136468] {19}[Hardware Error]: Power State Coordination Interface state: 0
[ 5964.136767] {19}[Hardware Error]: Error info structure 0:
[ 5964.137001] {19}[Hardware Error]: num errors: 2
[ 5964.137210] {19}[Hardware Error]: error_type: 0x02: cache error
[ 5964.137472] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.137737] {19}[Hardware Error]: Error info structure 1:
[ 5964.137976] {19}[Hardware Error]: num errors: 2
[ 5964.138192] {19}[Hardware Error]: error_type: 0x04: TLB error
[ 5964.138459] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.138727] {19}[Hardware Error]: Error info structure 2:
[ 5964.138967] {19}[Hardware Error]: num errors: 2
[ 5964.139185] {19}[Hardware Error]: error_type: 0x08: bus error
[ 5964.139451] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.139751] {19}[Hardware Error]: Error info structure 3:
[ 5964.139993] {19}[Hardware Error]: num errors: 2
[ 5964.140210] {19}[Hardware Error]: error_type: 0x10: micro-architectural error
[ 5964.140522] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.140790] {19}[Hardware Error]: Error info structure 4:
[ 5964.141030] {19}[Hardware Error]: num errors: 2
[ 5964.141261] {19}[Hardware Error]: error_type: 0x14: TLB error|micro-architectural error
[ 5964.141599] {19}[Hardware Error]: Context info structure 0:
[ 5964.141843] {19}[Hardware Error]: register context type: AArch64 EL1 context registers
[ 5964.142195] {19}[Hardware Error]: 00000000: 00000000 00000000 00000001 00000000
[ 5964.142534] {19}[Hardware Error]: 00000010: 00000002 00000000 00000003 00000000
[ 5964.142867] {19}[Hardware Error]: 00000020: 00000004 00000000 00000005 00000000
[ 5964.143193] {19}[Hardware Error]: 00000030: 00000000 00000000
[ 5964.143464] {19}[Hardware Error]: Vendor specific error info has 5 bytes:
[ 5964.143750] {19}[Hardware Error]: 00000000: 13 7b 04 05 01 .{...
[ 5964.144164] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
[ 5964.144483] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[ 5964.144793] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[ 5964.145099] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[ 5964.145454] [Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error
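To relate the raw-data bytes of this example to the fields printed by the
kernel, the fixed header of the ARM Processor Error section can be unpacked
with Python's struct module. This is a minimal sketch, not code from the
series: the 40-byte little-endian layout is assumed to mirror the UEFI ARM
Processor Error section (as in Linux's struct cper_sec_proc_arm), and the
field and helper names are illustrative only.

```python
import struct

# Assumed layout of the fixed header (little-endian, 40 bytes):
# u32 validation, u16 err_info_num, u16 context_info_num, u32 section_length,
# u8 affinity, 3 reserved bytes, u64 mpidr, u64 midr,
# u32 running_state, u32 psci_state.
ARM_HEADER = struct.Struct("<IHHIB3xQQII")

# Error type bits as reported in the kernel log above.
ARM_ERROR_TYPES = {
    0x02: "cache",
    0x04: "TLB",
    0x08: "bus",
    0x10: "micro-architectural",
}

# First 40 bytes of the raw-data array from the second QMP command.
HEADER_BYTES = bytes([
    7, 0, 0, 0,                 # validation bits
    5, 0,                       # number of error info structures
    1, 0,                       # number of context info structures
    13, 1, 0, 0,                # section length
    1, 0, 0, 0,                 # affinity level + reserved
    68, 4, 0, 0, 0, 0, 0, 0,    # MPIDR
    0, 0, 0, 0, 0, 0, 0, 0,     # MIDR
    1, 0, 0, 0,                 # running state
    0, 0, 0, 0,                 # PSCI state
])


def decode_arm_header(raw):
    """Unpack the fixed header into a dict (hypothetical helper)."""
    names = ("validation", "err_info_num", "context_info_num",
             "section_length", "affinity", "mpidr", "midr",
             "running_state", "psci_state")
    return dict(zip(names, ARM_HEADER.unpack_from(raw)))


def error_type_names(mask):
    """List the names of the error type bits set in mask."""
    return [name for bit, name in ARM_ERROR_TYPES.items() if mask & bit]


header = decode_arm_header(HEADER_BYTES)
print(hex(header["mpidr"]), header["section_length"])
print(error_type_names(0x14))
```

Under these assumptions, the header decodes to MPIDR 0x444, affinity level 1,
running state 1, five error info structures and one context structure, matching
the log. The section length of 269 bytes is also consistent with the payload:
40 (header) + 5 * 32 (error info) + 8 + 56 (context header and registers) +
5 (vendor bytes).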
---
v4:
- CPER generation moved to happen outside QEMU;
- One patch adding support for mpidr query was removed.
v3:
- patch 1 cleanups with some comment changes and adding another place where
the poweroff GPIO define should be used. No changes on other patches (except
due to conflict resolution).
v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple of cleanups in patch 2, removing unneeded else clauses.
Jonathan Cameron (1):
acpi/ghes: Support GPIO error source
Mauro Carvalho Chehab (6):
arm/virt: place power button pin number on a define
acpi/generic_event_device: add an APEI error device
arm/virt: Wire up GPIO error source for ACPI / GHES
qapi/ghes-cper: add an interface to do generic CPER error injection
acpi/ghes: add support for generic error injection via QAPI
scripts/ghes_inject: add a script to generate GHES error inject
MAINTAINERS | 8 +
hw/acpi/Kconfig | 5 +
hw/acpi/generic_event_device.c | 17 +
hw/acpi/ghes.c | 178 ++++++-
hw/acpi/ghes_cper.c | 53 ++
hw/acpi/meson.build | 2 +
hw/arm/Kconfig | 5 +
hw/arm/virt-acpi-build.c | 25 +-
hw/arm/virt.c | 33 +-
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/generic_event_device.h | 3 +
include/hw/acpi/ghes.h | 14 +-
include/hw/arm/virt.h | 5 +
qapi/ghes-cper.json | 54 ++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
scripts/ghes_inject.py | 673 +++++++++++++++++++++++++
17 files changed, 1048 insertions(+), 30 deletions(-)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 qapi/ghes-cper.json
create mode 100755 scripts/ghes_inject.py
--
2.45.2