[v2] Fix MCE handling on AMD hosts

[PATCH v2 0/2] Fix MCE handling on AMD hosts

Posted by John Allen 2 years, 6 months ago

When a guest process accesses memory that has been poisoned in response
to a deferred uncorrected MCE, an AMD host currently generates a SIGBUS
error that results in the entire guest being shut down. Ideally, only
the guest process that accessed the poisoned memory should be killed.

qemu has supported this on Intel hosts for a long time, but AMD hosts
need two changes. First, the SUCCOR CPUID bit must be exposed to
guests. Second, the MCE injection code must avoid Intel-specific
behavior when running on an AMD host.
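
For reference, AMD advertises SUCCOR in CPUID leaf 0x80000007, register
EBX, bit 1 (the RAS capabilities leaf; Linux names the same bit
X86_FEATURE_SUCCOR). The C sketch below only checks that bit. It assumes
GCC or Clang on x86 for the <cpuid.h> helper __get_cpuid(); the macro and
function names are hypothetical and this is not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper providing __get_cpuid() */
#endif

/* Hypothetical names: leaf 0x80000007 reports RAS capabilities on AMD,
 * and EBX bit 1 is SUCCOR. */
#define RAS_CAP_LEAF    0x80000007u
#define RAS_EBX_SUCCOR  (1u << 1)

/* Pure check on a raw EBX value, testable without real hardware. */
bool ebx_has_succor(uint32_t ebx)
{
    return (ebx & RAS_EBX_SUCCOR) != 0;
}

/* Query the CPU this code runs on (host or guest). Returns false when
 * the leaf is not implemented or when built for a non-x86 target. */
bool cpu_has_succor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the leaf is above the maximum
     * extended leaf reported by CPUID 0x80000000. */
    if (!__get_cpuid(RAS_CAP_LEAF, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return ebx_has_succor(ebx);
#else
    return false;
#endif
}
```

Run inside a guest, cpu_has_succor() can only return true if the
hypervisor exposes the bit in the guest's CPUID, which is what the first
patch is meant to do.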

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

John Allen (2):
  i386: Add support for SUCCOR feature
  i386: Fix MCE support for AMD hosts

 target/i386/cpu.c     | 18 +++++++++++++++++-
 target/i386/cpu.h     |  4 ++++
 target/i386/helper.c  |  4 ++++
 target/i386/kvm/kvm.c | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 7 deletions(-)

-- 
2.39.3

Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts

Posted by William Roche 2 years, 5 months ago

Hello John,

I was able to test your fixes and can confirm that BUS_MCEERR_AR
handling now works on AMD:

Before the fix, the VM panics with:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f89573ce000 and GUEST addr 0x10b5ce000 of type BUS_MCEERR_AR injected
[   83.562579] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 1: a000000000000000
[   83.562585] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[   83.562592] mce: [Hardware Error]: TSC 3d39402bdc
[   83.562593] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693515449 SOCKET 0 APIC 0 microcode 800126e
[   83.562596] mce: [Hardware Error]: Machine check: Uncorrected error without MCA Recovery
[   83.562597] Kernel panic - not syncing: Fatal local machine check
[   83.563401] Kernel Offset: disabled

With the fix, the same error injection doesn't kill the VM, but 
generates the following console messages:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fa430ab9000 and GUEST addr 0x118cb9000 of type BUS_MCEERR_AR injected
[  250.851996] Disabling lock debugging due to kernel taint
[  250.852928] mce: Uncorrected hardware memory error in user-access at 118cb9000
[  250.853261] Memory failure: 0x118cb9: Sending SIGBUS to mce_process_rea:1227 due to hardware memory corruption
[  250.854933] mce: [Hardware Error]: Machine check events logged
[  250.855800] Memory failure: 0x118cb9: recovery action for dirty LRU page: Recovered
[  250.856661] mce: [Hardware Error]: CPU 2: Machine Check Exception: 7 Bank 9: bc00000000000000
[  250.860552] mce: [Hardware Error]: RIP 33:<00007f56b9ecbee5>
[  250.861405] mce: [Hardware Error]: TSC 8c2c664410 ADDR 118cb9000 MISC 8c
[  250.862679] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693508937 SOCKET 0 APIC 2 microcode 800126e
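
To help read these reports: the Linux guest prints the MCG_STATUS value
after "Machine Check Exception:" and the bank's MCi_STATUS value after
"Bank N:". The C sketch below decodes both using the architectural bit
positions (MCG_STATUS: RIPV bit 0, EIPV bit 1, MCIP bit 2; MCi_STATUS:
VAL bit 63 down to PCC bit 57). It is an illustrative helper with
hypothetical names, and it ignores the lower, vendor-specific status
bits, where Intel and AMD differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct bit_name {
    uint64_t mask;
    const char *name;
};

/* MCG_STATUS: the value printed after "Machine Check Exception:". */
static const struct bit_name mcg_status_bits[] = {
    { 1ull << 2, "MCIP" },   /* machine check in progress */
    { 1ull << 1, "EIPV" },   /* error IP valid */
    { 1ull << 0, "RIPV" },   /* restart IP valid */
};

/* MCi_STATUS: the value printed after "Bank N:". Only the upper bits
 * shared by Intel and AMD are decoded. */
static const struct bit_name mci_status_bits[] = {
    { 1ull << 63, "VAL" },   /* register contents valid */
    { 1ull << 62, "OVER" },  /* error overflow */
    { 1ull << 61, "UC" },    /* uncorrected error */
    { 1ull << 60, "EN" },    /* error reporting enabled */
    { 1ull << 59, "MISCV" }, /* MCi_MISC valid */
    { 1ull << 58, "ADDRV" }, /* MCi_ADDR valid */
    { 1ull << 57, "PCC" },   /* processor context corrupt */
};

/* Write the names of the set bits, joined by '|', into buf. */
static void decode_bits(uint64_t value, const struct bit_name *table,
                        size_t n, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(value & table[i].mask)) {
            continue;
        }
        int w = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", table[i].name);
        if (w < 0 || (size_t)w >= len - used) {
            break;  /* stop on error or truncation */
        }
        used += (size_t)w;
    }
}

void decode_mcg_status(uint64_t v, char *buf, size_t len)
{
    decode_bits(v, mcg_status_bits,
                sizeof mcg_status_bits / sizeof mcg_status_bits[0], buf, len);
}

void decode_mci_status(uint64_t v, char *buf, size_t len)
{
    decode_bits(v, mci_status_bits,
                sizeof mci_status_bits / sizeof mci_status_bits[0], buf, len);
}
```

With these tables, "Machine Check Exception: 7" decodes to
MCIP|EIPV|RIPV and "Bank 9: bc00000000000000" to VAL|UC|EN|MISCV|ADDRV,
an uncorrected error with a valid address, consistent with the ADDR
field of the same report. The pre-fix value a000000000000000 decodes to
VAL|UC only.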


However, BUS_MCEERR_AO injection still kills the VM:

qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr 0x7f1d108e5000 and GUEST addr 0x114ae5000 of type BUS_MCEERR_AO injected
[  157.392905] mce: [Hardware Error]: CPU 0: Machine Check Exception: 7 Bank 9: bc00000000000000
[  157.392912] mce: [Hardware Error]: RIP 10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[  157.392919] mce: [Hardware Error]: TSC 60b92a54d0 ADDR 114ae5000 MISC 8c
[  157.392921] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693500765 SOCKET 0 APIC 0 microcode 800126e
[  157.392924] mce: [Hardware Error]: Machine check: Uncorrected unrecoverable error in kernel context
[  157.392925] Kernel panic - not syncing: Fatal local machine check
[  157.402582] Kernel Offset: disabled

As AMD guests can't currently handle BUS_MCEERR_AO MCE injection, I
think the fix is not complete: the 'AO' case must be handled too. The
simplest way is probably to filter it at the qemu level and inject only
the 'AR' case; this also lets qemu report that an 'AO' error was
ignored.
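
The filtering policy described above is small enough to model on its
own. The C sketch below is a hypothetical, self-contained model rather
than qemu code: MCE_CODE_AR and MCE_CODE_AO stand in for the
BUS_MCEERR_AR and BUS_MCEERR_AO si_code values, and the is_amd flag
stands in for qemu's IS_AMD_CPU(env) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the SIGBUS si_code values BUS_MCEERR_AR
 * (action required) and BUS_MCEERR_AO (action optional). */
enum mce_code { MCE_CODE_AR, MCE_CODE_AO };

/* Proposed policy: AR is always injected; AO is injected only when the
 * guest CPU is not AMD. */
bool should_inject(bool is_amd, enum mce_code code)
{
    return !is_amd || code != MCE_CODE_AO;
}

/* Build a log line saying whether the error was injected or ignored.
 * The wording is illustrative and omits the addresses. */
void format_report(bool is_amd, enum mce_code code, char *buf, size_t len)
{
    const char *type = code == MCE_CODE_AR ? "BUS_MCEERR_AR" : "BUS_MCEERR_AO";
    const char *outcome = should_inject(is_amd, code)
                              ? "injected" : "ignored on AMD guest";

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "Guest MCE Memory Error of type %s %s", type, outcome);
}
```

Keeping the decision on the host side means the guest kernel never
receives an 'AO' event it cannot handle, while qemu can still log that
the event was ignored.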

I would suggest adding a 3rd patch implementing this AMD-specific filter:


commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06
Author: William Roche <william.roche@oracle.com>
Date:   Thu Aug 31 18:54:57 2023 +0000

    i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

    AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
    as it panics the VM kernel. We filter this event and provide a
    warning message.

    Signed-off-by: William Roche <william.roche@oracle.com>

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9ca7187628..bd60d5697b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -606,6 +606,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
             mcg_status |= MCG_STATUS_RIPV;
         }
     } else {
+        if (code == BUS_MCEERR_AO) {
+            /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+            return;
+        }
         mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
     }

@@ -657,7 +661,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
             kvm_hwpoison_page_add(ram_addr);
-            kvm_mce_inject(cpu, paddr, code);
+            if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO)
+                kvm_mce_inject(cpu, paddr, code);

             /*
              * Use different logging severity based on error type.
@@ -670,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
                     addr, paddr, "BUS_MCEERR_AR");
             } else {
                  warn_report("Guest MCE Memory Error at QEMU addr %p and "
-                     "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
-                     addr, paddr, "BUS_MCEERR_AO");
+                     "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+                     addr, paddr, "BUS_MCEERR_AO",
+                     IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
             }

             return;
---


I hope this can help.

William.


On 7/26/23 22:41, John Allen wrote:
> In the event that a guest process attempts to access memory that has
> been poisoned in response to a deferred uncorrected MCE, an AMD system
> will currently generate a SIGBUS error which will result in the entire
> guest being shutdown. Ideally, we only want to kill the guest process
> that accessed poisoned memory in this case.
>
> This support has been included in qemu for Intel hosts for a long time,
> but there are a couple of changes needed for AMD hosts. First, we will
> need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
> the MCE injection code to avoid Intel specific behavior when we are
> running on an AMD host.
>
> v2:
>    - Add "succor" feature word.
>    - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
>
> John Allen (2):
>    i386: Add support for SUCCOR feature
>    i386: Fix MCE support for AMD hosts
>
>   target/i386/cpu.c     | 18 +++++++++++++++++-
>   target/i386/cpu.h     |  4 ++++
>   target/i386/helper.c  |  4 ++++
>   target/i386/kvm/kvm.c | 19 +++++++++++++------
>   4 files changed, 38 insertions(+), 7 deletions(-)
>

Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts

Posted by John Allen 2 years, 5 months ago

On Thu, Aug 31, 2023 at 11:40:08PM +0200, William Roche wrote:
> I would suggest to add a 3rd patch implementing this AMD specific filter:

Thanks, I think this will be a good solution for now while we can't
fully support AO errors. I will test it and include it in the next
version of the series.

Thanks,
John
