[PATCH v1 0/1] AMD VM crashing on deferred memory error injection

William Roche posted 1 patch 1 month, 2 weeks ago
There is a newer version of this series
arch/x86/kernel/cpu/mce/amd.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
From: William Roche <william.roche@oracle.com>

After the integration of the following commit:
	7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling

AMD QEMU VMs started to crash when handling deferred memory error
injection, with a trace like:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)

  amd_clear_bank+0x6e/0x70
  machine_check_poll+0x228/0x2e0
  ? __pfx_mce_timer_fn+0x10/0x10
  mce_timer_fn+0xb1/0x130
  ? __pfx_mce_timer_fn+0x10/0x10
  call_timer_fn+0x26/0x120
  __run_timers+0x202/0x290
  run_timer_softirq+0x49/0x100
  handle_softirqs+0xeb/0x2c0
  __irq_exit_rcu+0xda/0x100
  sysvec_apic_timer_interrupt+0x71/0x90
[...]
 Kernel panic - not syncing: MCA architectural violation!
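
For readers decoding the MSR address in the trace, here is a minimal sketch
of the arithmetic. It assumes the SMCA MSR layout as I recall it from
arch/x86/include/asm/msr-index.h (per-bank blocks of 0x10 registers starting
at 0xc0002000, with DESTAT at offset 0x8); the helper names are hypothetical
and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SMCA MSR layout, as recalled from msr-index.h. */
#define SMCA_MSR_BASE      0xc0002000u /* first SMCA register of bank 0 */
#define SMCA_BANK_STRIDE   0x10u       /* MSRs reserved per bank */
#define SMCA_DESTAT_OFFSET 0x8u        /* deferred error status register */

static_assert(SMCA_DESTAT_OFFSET < SMCA_BANK_STRIDE,
	      "register offset must fit inside one bank block");

/* Hypothetical helper: bank index an SMCA MSR address belongs to. */
static unsigned int smca_msr_to_bank(uint32_t msr)
{
	return (msr - SMCA_MSR_BASE) / SMCA_BANK_STRIDE;
}

/* Hypothetical helper: register offset inside that bank's block. */
static unsigned int smca_msr_to_offset(uint32_t msr)
{
	return (msr - SMCA_MSR_BASE) % SMCA_BANK_STRIDE;
}
```

Under that assumed layout, 0xc0002098 maps to bank 9, offset 0x8, which would
be the deferred error status register of bank 9.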

See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/

We identified a problem with accesses to SMCA-specific registers on
non-SMCA platforms such as a QEMU/KVM machine.
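
To illustrate the class of problem, here is a small user-space model. It is
only a sketch and not the code from the patch: struct fake_cpu,
fake_wrmsr_destat() and both clear_bank_*() functions are hypothetical
stand-ins, and the model assumes that writing an SMCA register on a non-SMCA
platform faults, as the trace above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_NUM_BANKS 32u

/* Hypothetical model of a CPU: SMCA support flag plus per-bank DESTAT. */
struct fake_cpu {
	bool has_smca;
	unsigned int faults;              /* counts simulated WRMSR faults */
	uint64_t destat[FAKE_NUM_BANKS];
};

/* Models a WRMSR to an SMCA-only register: faults when SMCA is absent. */
static bool fake_wrmsr_destat(struct fake_cpu *cpu, unsigned int bank,
			      uint64_t val)
{
	assert(bank < FAKE_NUM_BANKS);
	if (!cpu->has_smca) {
		cpu->faults++;
		return false;
	}
	cpu->destat[bank] = val;
	return true;
}

/* Problematic pattern: touches the SMCA register unconditionally. */
static void clear_bank_unguarded(struct fake_cpu *cpu, unsigned int bank)
{
	fake_wrmsr_destat(cpu, bank, 0);
}

/* Guarded pattern: skips the SMCA register when SMCA is not present. */
static void clear_bank_guarded(struct fake_cpu *cpu, unsigned int bank)
{
	if (!cpu->has_smca)
		return;
	fake_wrmsr_destat(cpu, bank, 0);
}
```

In this model, calling clear_bank_unguarded() on a CPU with has_smca set to
false records a fault, while clear_bank_guarded() returns without touching
the register.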

This patch is checkpatch.pl clean.
The memory error injection unit test works fine with it.

The commit introducing this error has also been integrated into the
stable tree, which is why I added the Cc: stable... entry.

Thanks in advance for your feedback.


William Roche (1):
  x86/mce: AMD deferred error handling crashes Qemu VMs

 arch/x86/kernel/cpu/mce/amd.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

-- 
2.47.3