Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
the hard work. NMI shootdown is a one-time thing; the handler leaves NMIs
blocked and enters halt. At best, a second (or third...) shootdown is an
expensive nop; at worst, it can hang the kernel and prevent kexec'ing into
a new kernel. For example, prior to the hardening of register_nmi_handler(), a
double shootdown resulted in a double list_add(), which is fatal when running
with CONFIG_BUG_ON_DATA_CORRUPTION=y.
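
To illustrate why a double list_add() is a problem, here is a minimal
userspace sketch of a circular doubly linked list. It is not the kernel's
list implementation: the type and helper names (struct list_node,
list_add_unchecked(), list_add_checked(), list_add_is_valid()) are
hypothetical, and the sanity check is only modeled on the kind of check that
list debugging performs before an insertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Returns true if inserting 'node' between 'prev' and 'next' is sane. */
static bool list_add_is_valid(const struct list_node *node,
			      const struct list_node *prev,
			      const struct list_node *next)
{
	return next->prev == prev && prev->next == next &&
	       node != prev && node != next;
}

/* Insert 'node' right after 'head' with no sanity checking at all. */
static void list_add_unchecked(struct list_node *node, struct list_node *head)
{
	struct list_node *next = head->next;

	assert(node && head);
	next->prev = node;
	node->next = next;
	node->prev = head;
	head->next = node;
}

/* Insert 'node' after 'head', refusing insertions that would corrupt the list. */
static bool list_add_checked(struct list_node *node, struct list_node *head)
{
	if (!list_add_is_valid(node, head, head->next))
		return false;
	list_add_unchecked(node, head);
	return true;
}
```

In this model, adding the same node a second time with the unchecked helper
leaves the node pointing at itself, so a walk of the list never returns to
the head. The checked helper rejects the second insertion instead; a kernel
configured to treat such corruption as fatal would stop at that point.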
With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
be reached after kdump_nmi_shootdown_cpus() (currently the only two users
of nmi_shootdown_cpus()).
To fix, move the disabling of virtualization into crash_nmi_callback(),
remove emergency_vmx_disable_all()'s callback, and do a shootdown for
emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
(obviously), and since I can't envision a use case for an NMI shootdown that
doesn't want to disable virtualization, doing that in the core handler means
emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs; it
doesn't care when that shootdown happened or what callback may have run.
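
The approach described above can be sketched as a small userspace model.
This is not the patch itself: every model_* name is hypothetical, the "NMI"
is just a loop over CPU indices, and the only identifier borrowed from this
series is the idea of a crash_ipi_issued flag with a NULL callback. It shows
the core handler always disabling virtualization, and the emergency path
requesting a shootdown that turns into a no-op if one has already happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

typedef void (*model_shootdown_fn)(int cpu);

/* Hypothetical model state; CPU 0 plays the role of the crashing CPU. */
static bool model_crash_ipi_issued;
static model_shootdown_fn model_callback;
static int model_nmi_rounds;
static int model_dump_saved;
static bool model_virt_enabled[MODEL_NR_CPUS] = { true, true, true, true };

/* Models the core NMI handler: run the optional callback, then always disable virt. */
static void model_crash_nmi_callback(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (model_callback)
		model_callback(cpu);
	model_virt_enabled[cpu] = false;
}

/* Models a one-time shootdown: a second request does nothing. */
static void model_nmi_shootdown_cpus(model_shootdown_fn callback)
{
	if (model_crash_ipi_issued)
		return;

	model_callback = callback;
	model_crash_ipi_issued = true;
	model_nmi_rounds++;

	/* "Send the NMI" to every CPU except the caller (CPU 0). */
	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++)
		model_crash_nmi_callback(cpu);
}

/* Stand-in for whatever per-CPU work the kdump path wants done. */
static void model_save_cpu_state(int cpu)
{
	(void)cpu;
	model_dump_saved++;
}

static void model_kdump_shootdown(void)
{
	model_nmi_shootdown_cpus(model_save_cpu_state);
}

/* The emergency path only needs _a_ shootdown to have occurred. */
static void model_emergency_disable_virt(void)
{
	model_virt_enabled[0] = false;	/* the calling CPU handles itself */
	model_nmi_shootdown_cpus(NULL);
}
```

Calling model_kdump_shootdown() followed by model_emergency_disable_virt()
sends exactly one round of "NMIs" in this model, and virtualization ends up
disabled on every CPU regardless of which path triggered the shootdown.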
Patch 2 is a related bug fix found while exploring ideas for patch 1.
Patch 3 is a cleanup to try to prevent future "fixed VMX but not SVM"
style bugs.
Guilherme and Vitaly, I dropped your Tested-by and Reviewed-by tags
since the relevant patches changed a decent amount.
v2:
  - Use a NULL handler and crash_ipi_issued instead of a magic nop
    handler. [tglx]
  - Add comments to call out that modifying the existing handler
    once the NMI is sent may cause explosions.
  - Add a patch to clean up cpu_emergency_vmxoff().
v1: https://lore.kernel.org/all/20220511234332.3654455-1-seanjc@google.com
Sean Christopherson (3):
  x86/crash: Disable virt in core NMI crash handler to avoid double
    shootdown
  x86/reboot: Disable virtualization in an emergency if SVM is supported
  x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
arch/x86/include/asm/reboot.h | 1 +
arch/x86/include/asm/virtext.h | 14 +-----
arch/x86/kernel/crash.c | 16 +-----
arch/x86/kernel/reboot.c | 89 +++++++++++++++++++++++++---------
4 files changed, 69 insertions(+), 51 deletions(-)
base-commit: a7fed5c0431dbfa707037848830f980e0f93cfb3
--
2.36.0.550.gb090851708-goog

On 17/05/2022 21:16, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work.
> [...]

Hi folks, monthly ping! Any news on this fix series? Just checked, still
applies cleanly.

Thanks,
Guilherme

On 17/05/2022 21:16, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work.
> [...]

Hey folks, is there any news about this series? I saw a ping from Sean some
time ago, so this is a re-ping, heh.

Thanks,
Guilherme

On Wed, May 18, 2022, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work. NMI shootdown is a one-time thing; the handler leaves NMIs
> blocked and enters halt. At best, a second (or third...) shootdown is an
> expensive nop, at worst it can hang the kernel and prevent kexec'ing into
> a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
> double shootdown resulted in a double list_add(), which is fatal when running
> with CONFIG_BUG_ON_DATA_CORRUPTION=y.

...

> Sean Christopherson (3):
>   x86/crash: Disable virt in core NMI crash handler to avoid double
>     shootdown
>   x86/reboot: Disable virtualization in an emergency if SVM is supported
>   x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
>
>  arch/x86/include/asm/reboot.h  |  1 +
>  arch/x86/include/asm/virtext.h | 14 +-----
>  arch/x86/kernel/crash.c        | 16 +-----
>  arch/x86/kernel/reboot.c       | 89 +++++++++++++++++++++++++---------
>  4 files changed, 69 insertions(+), 51 deletions(-)

Ping! Still applies cleanly.