This series is carved from that for memory error handling improvement
[1] based on the received comments, to improve the error object handling
in various aspects.

[1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html

Gavin Shan (5):
  acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
  acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
  target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
  acpi/ghes: Bail early on error from get_ghes_source_offsets()
  acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()

 hw/acpi/ghes-stub.c    |  6 +++---
 hw/acpi/ghes.c         | 45 ++++++++++++++++++------------------------
 include/hw/acpi/ghes.h |  6 +++---
 target/arm/kvm.c       | 10 +++-------
 4 files changed, 28 insertions(+), 39 deletions(-)

--
2.51.1
On Thu, 27 Nov 2025 10:44:30 +1000
Gavin Shan <gshan@redhat.com> wrote:
> This series is carved from that for memory error handling improvement
^^^ confusing
based on the above, I'm not sure if it depends on [1] and should be applied on top
or if it can be merged on its own
> [1] based on the received comments, to improve the error object handling
> in various aspects.
>
> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>
> Gavin Shan (5):
> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> acpi/ghes: Bail early on error from get_ghes_source_offsets()
> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>
> hw/acpi/ghes-stub.c | 6 +++---
> hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------
> include/hw/acpi/ghes.h | 6 +++---
> target/arm/kvm.c | 10 +++-------
> 4 files changed, 28 insertions(+), 39 deletions(-)
>
Hi Igor,

On 11/29/25 12:09 AM, Igor Mammedov wrote:
> On Thu, 27 Nov 2025 10:44:30 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> This series is carved from that for memory error handling improvement
> ^^^ confusing
> based on the above, I'm not sure if it depends on [1] and should be applied on top
> or if it can be merged on its own
>

The current series is standalone and is expected to be merged on its own.

For the (v4) series of memory error improvement [1], Jonathan wants to extend
the handlers in the guest kernel so that the granularity in the CPER record
will be used to isolate the corresponding memory address range. With this,
the patches in the (v4) series that send 16x continuous errors become useless.
However, the patches in the (v4) series that improve the Error (object) handling
are still useful. So I pulled the Error (object) handling patches from the
(v4) series to form this series.

>> [1] based on the received comments, to improve the error object handling
>> in various aspects.
>>
>> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>>

Thanks,
Gavin

>> Gavin Shan (5):
>>    acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
>>    acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
>>    target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
>>    acpi/ghes: Bail early on error from get_ghes_source_offsets()
>>    acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>>
>>   hw/acpi/ghes-stub.c    |  6 +++---
>>   hw/acpi/ghes.c         | 45 ++++++++++++++++++------------------------
>>   include/hw/acpi/ghes.h |  6 +++---
>>   target/arm/kvm.c       | 10 +++-------
>>   4 files changed, 28 insertions(+), 39 deletions(-)
>>
>
On Sat, 29 Nov 2025 11:21:55 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Hi Igor,
>
> On 11/29/25 12:09 AM, Igor Mammedov wrote:
> > On Thu, 27 Nov 2025 10:44:30 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >
> >> This series is carved from that for memory error handling improvement
> > ^^^ confusing
> > based on the above, I'm not sure if it depends on [1] and should be applied on top
> > or if it can be merged on its own
> >
>
> The current series is standalone and is expected to be merged on its own.
>
> For the (v4) series of memory error improvement [1], Jonathan wants to extend
> the handlers in the guest kernel so that the granularity in the CPER record
> will be used to isolate the corresponding memory address range. With this,
> the patches in the (v4) series that send 16x continuous errors become useless.
> However, the patches in the (v4) series that improve the Error (object) handling
> are still useful. So I pulled the Error (object) handling patches from the
> (v4) series to form this series.

OK, then I'll review this series and skip v4 for now.

>
> >> [1] based on the received comments, to improve the error object handling
> >> in various aspects.
> >>
> >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
> >>
>
> Thanks,
> Gavin
>
> >> Gavin Shan (5):
> >>    acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> >>    acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> >>    target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> >>    acpi/ghes: Bail early on error from get_ghes_source_offsets()
> >>    acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
> >>
> >>   hw/acpi/ghes-stub.c    |  6 +++---
> >>   hw/acpi/ghes.c         | 45 ++++++++++++++++++------------------------
> >>   include/hw/acpi/ghes.h |  6 +++---
> >>   target/arm/kvm.c       | 10 +++-------
> >>   4 files changed, 28 insertions(+), 39 deletions(-)
> >>
> >
>
On Thu, 27 Nov 2025 10:44:30 +1000
Gavin Shan <gshan@redhat.com> wrote:

> This series is carved from that for memory error handling improvement
> [1] based on the received comments, to improve the error object handling
> in various aspects.
>
> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>
> Gavin Shan (5):
>   acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
>   acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
>   target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
>   acpi/ghes: Bail early on error from get_ghes_source_offsets()
>   acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()

Patch series looks OK to my eyes.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

-

Btw, what setup are you using to test memory errors? It would be
nice to have it documented somewhere, maybe at
docs/specs/acpi_hest_ghes.rst.

Thanks,
Mauro
Hi Mauro,
On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> On Thu, 27 Nov 2025 10:44:30 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> This series is carved from that for memory error handling improvement
>> [1] based on the received comments, to improve the error object handling
>> in various aspects.
>>
>> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>>
>> Gavin Shan (5):
>> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
>> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
>> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
>> acpi/ghes: Bail early on error from get_ghes_source_offsets()
>> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>
> Patch series looks OK to my eyes.
>
> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
Thanks.
> -
>
> Btw, what setup are you using to test memory errors? It would be
> nice to have it documented somewhere, maybe at
> docs/specs/acpi_hest_ghes.rst.
>
I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
as it's for specifications. I'm sharing how this is tested here to make
the thread complete.
- Both host and guest have a 4KB page size
- Start the guest with the following command line
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
-cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
-m 4096M,slots=16,maxmem=128G \
-object memory-backend-ram,id=mem0,size=4096M \
-numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-L /home/gavin/sandbox/qemu.main/build/pc-bios \
-monitor none -serial mon:stdio -nographic \
-gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
-bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
-boot c \
-device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
-device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
-device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
: \
-device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
-drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
-device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
-netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
- Trigger 'victim -d' in the guest
guest$ ./victim -d
physical address of (0xffff8d9b7000) = 0x1251d6000
Hit any key to trigger error:
- Inject an error at the GPA ("test.c" is attached)
host$ ./test 0x1251d6000
- Press enter on the guest so that 'victim' continues its execution
[ 435.467481] EDAC MC0: 1 UE unknown on unknown memory ( page:0x1251d6 offset:0x0 grain:1 - APEI location: )
[ 435.467542] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 435.467543] {1}[Hardware Error]: event severity: recoverable
[ 435.467544] {1}[Hardware Error]: Error 0, type: recoverable
[ 435.467545] {1}[Hardware Error]: section_type: memory error
[ 435.467546] {1}[Hardware Error]: physical_address: 0x00000001251d6000
[ 435.467547] {1}[Hardware Error]: error_type: 0, unknown
[ 435.468380] Memory failure: 0x1251d6: recovery action for dirty LRU page: Recovered
Bus error (core dumped)
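
For readers following along: the address printed by 'victim' and the page
number in the EDAC line are related only by the page size. The sketch below
is not taken from 'victim' or "test.c". It assumes the documented
/proc/<pid>/pagemap entry layout (bit 63 = page present, bits 0-54 = page
frame number), which is one common way for a user-space tool to look up a
physical address, and all helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one /proc/<pid>/pagemap entry (Linux pagemap docs):
 * bit 63 = page present in RAM, bits 0-54 = page frame number (PFN). */
#define PAGEMAP_PRESENT  (1ULL << 63)
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)

/* Hypothetical helper: turn a pagemap entry plus the virtual address it
 * was looked up for into a physical address. Returns false when the page
 * is not present, in which case *phys is left untouched. */
static bool pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
                                  unsigned page_shift, uint64_t *phys)
{
    uint64_t page_mask = (1ULL << page_shift) - 1;

    if (!(entry & PAGEMAP_PRESENT)) {
        return false;
    }
    /* PFN selects the page, the low bits of vaddr select the byte in it. */
    *phys = ((entry & PAGEMAP_PFN_MASK) << page_shift) | (vaddr & page_mask);
    return true;
}

/* Hypothetical helper: page number as printed in "page:0x..." by EDAC. */
static uint64_t phys_to_pfn(uint64_t phys, unsigned page_shift)
{
    return phys >> page_shift;
}

/* Hypothetical helper: in-page offset as printed in "offset:0x...". */
static uint64_t phys_to_page_offset(uint64_t phys, unsigned page_shift)
{
    return phys & ((1ULL << page_shift) - 1);
}
```

With the 4KB pages used here (page_shift = 12), 0x1251d6000 splits into page
0x1251d6 and offset 0x0, which matches "page:0x1251d6 offset:0x0" in the log
above. Note that the kernel reports the PFN as zero to processes without
CAP_SYS_ADMIN, so a tool relying on pagemap has to run with that privilege.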
Thanks,
Gavin
> Thanks,
> Mauro
>
On Tue, 2 Dec 2025 00:13:06 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Hi Mauro,
>
> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> > On Thu, 27 Nov 2025 10:44:30 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >
> >> This series is carved from that for memory error handling improvement
> >> [1] based on the received comments, to improve the error object handling
> >> in various aspects.
> >>
> >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
> >>
> >> Gavin Shan (5):
> >>    acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> >>    acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> >>    target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> >>    acpi/ghes: Bail early on error from get_ghes_source_offsets()
> >>    acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
> >
> > Patch series looks OK to my eyes.
> >
> > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >
>
> Thanks.
>
> > -
> >
> > Btw, what setup are you using to test memory errors? It would be
> > nice to have it documented somewhere, maybe at
> > docs/specs/acpi_hest_ghes.rst.
> >
>
> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
> as it's for specifications.

Perhaps not, but it would be nice to have it documented somewhere,
either there or on the QEMU wiki.

> I'm sharing how this is tested here to make the thread complete.

Thanks!

>
> - Both host and guest have a 4KB page size
>
> - Start the guest with the following command line
>
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
> -m 4096M,slots=16,maxmem=128G \
> -object memory-backend-ram,id=mem0,size=4096M \
> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
> -monitor none -serial mon:stdio -nographic \
> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
> -boot c \
> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
> : \
> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
>
> - Trigger 'victim -d' in the guest

Hmm... where can I get victim from?

Regards,
Mauro
Hi Mauro,

On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote:
> On Tue, 2 Dec 2025 00:13:06 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
>>> On Thu, 27 Nov 2025 10:44:30 +1000
>>> Gavin Shan <gshan@redhat.com> wrote:

[...]

>>>
>>> Btw, what setup are you using to test memory errors? It would be
>>> nice to have it documented somewhere, maybe at
>>> docs/specs/acpi_hest_ghes.rst.
>>>
>>
>> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
>> as it's for specifications.
>
> Perhaps not, but it would be nice to have it documented somewhere,
> either there or on the QEMU wiki.
>

The QEMU wiki may be the best place for it. I have never updated the QEMU wiki.
Are there any guiding steps on how to do that?

>> I'm sharing how this is tested here to make the thread complete.
>
> Thanks!
>
>>
>> - Both host and guest have a 4KB page size
>>
>> - Start the guest with the following command line
>>
>> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
>> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
>> -m 4096M,slots=16,maxmem=128G \
>> -object memory-backend-ram,id=mem0,size=4096M \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
>> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
>> -monitor none -serial mon:stdio -nographic \
>> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
>> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
>> -boot c \
>> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
>> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
>> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
>> : \
>> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
>> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
>> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
>> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
>> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
>>
>> - Trigger 'victim -d' in the guest
>
> Hmm... where can I get victim from?
>

https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git

> Regards,
> Mauro
>

Thanks,
Gavin
On Mon, 1 Dec 2025 at 14:38, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Mauro,
>
> On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote:
> > On Tue, 2 Dec 2025 00:13:06 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> >>> On Thu, 27 Nov 2025 10:44:30 +1000
> >>> Gavin Shan <gshan@redhat.com> wrote:
>
> [...]
>
> >>>
> >>> Btw, what setup are you using to test memory errors? It would be
> >>> nice to have it documented somewhere, maybe at
> >>> docs/specs/acpi_hest_ghes.rst.
> >>>
> >>
> >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
> >> as it's for specifications.
> >
> > Perhaps not, but it would be nice to have it documented somewhere,
> > either there or on the QEMU wiki.
> >
>
> The QEMU wiki may be the best place for it. I have never updated the QEMU wiki.
> Are there any guiding steps on how to do that?

I think in general we should prefer to document things in
docs/ if we think users would want to know them. If it's
just a test setup then perhaps docs/devel, or if feasible
actually make it a test in tests/.

The wiki is largely unused except for the changelog
and planning docs. (In an ideal world we'd check for parts
of the wiki that still have useful-to-users up to date
information, and fold them into our manuals.)

thanks
-- PMM
On Tue, 2 Dec 2025 00:37:53 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Hi Mauro,
>
> On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote:
> > On Tue, 2 Dec 2025 00:13:06 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> >>> On Thu, 27 Nov 2025 10:44:30 +1000
> >>> Gavin Shan <gshan@redhat.com> wrote:
>
> [...]
>
> >>>
> >>> Btw, what setup are you using to test memory errors? It would be
> >>> nice to have it documented somewhere, maybe at
> >>> docs/specs/acpi_hest_ghes.rst.
> >>>
> >>
> >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
> >> as it's for specifications.
> >
> > Perhaps not, but it would be nice to have it documented somewhere,
> > either there or on the QEMU wiki.
> >
>
> The QEMU wiki may be the best place for it. I have never updated the QEMU wiki.
> Are there any guiding steps on how to do that?

Do you have an account already?

>
> >> I'm sharing how this is tested here to make the thread complete.
> >
> > Thanks!
> >
> >>
> >> - Both host and guest have a 4KB page size
> >>
> >> - Start the guest with the following command line
> >>
> >> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> >> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
> >> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
> >> -m 4096M,slots=16,maxmem=128G \
> >> -object memory-backend-ram,id=mem0,size=4096M \
> >> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
> >> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
> >> -monitor none -serial mon:stdio -nographic \
> >> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
> >> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
> >> -boot c \
> >> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
> >> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
> >> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
> >> : \
> >> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
> >> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
> >> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
> >> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
> >> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
> >>
> >> - Trigger 'victim -d' in the guest
> >
> > Hmm... where can I get victim from?
> >
>
> https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
>
> > Regards,
> > Mauro
> >
>
> Thanks,
> Gavin
>