[PATCH v13 24/35] x86/fred: Add a NMI entry stub for FRED

Xin Li posted 35 patches 2 years ago
[PATCH v13 24/35] x86/fred: Add a NMI entry stub for FRED
Posted by Xin Li 2 years ago
From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/nmi.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955ab69fe..56350d839e44 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -35,6 +35,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/microcode.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -651,6 +652,33 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+	irqentry_state_t irq_state;
+
+	if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
+		return;
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
-- 
2.43.0
Re: [PATCH v13 24/35] x86/fred: Add a NMI entry stub for FRED
Posted by H. Peter Anvin 2 years ago
So we have recently discovered an overlooked interaction with VT-x. 
Immediately before VMENTER and after VMEXIT, CR2 is live with the 
*guest* CR2. Regardless of if the guest uses FRED or not, this is guest 
state and SHOULD NOT be corrupted. Furthermore, host state MUST NOT leak 
into the guest.

NMIs are blocked on VMEXIT if the cause was an NMI, but not for other 
reasons, so a NMI coming in during this window that then #PFs could 
corrupt the guest CR2.

Intel is exploring ways to close this hole, but for schedule reasons, it 
will not be available at the same time as the first implementation of 
FRED. Therefore, as a workaround, it turns out that the FRED NMI stub 
*will*, unfortunately, have to save and restore CR2 after all when (at 
least) Intel KVM is in use.

Note that this is airtight: it does add a performance penalty to the NMI 
path (two read CR2 in the common case of no #PF), but there is no gap 
during which a bad CR2 value could be introduced in the guest, no matter 
in which sequence the events happen.

In theory the performance penalty could be further reduced by 
conditionalizing this on the NMI happening in the critical region in the 
KVM code, but it seems to be pretty far from necessary to me.

This obviously was an unfortunate oversight on our part, but the 
workaround is simple and doesn't affect any non-NMI paths.

	-hpa

On 12/5/23 02:50, Xin Li wrote:
> +
> +	if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
> +		return;
> +

This is cut & paste from elsewhere in the NMI code, but I believe the 
IS_ENABLED() is unnecessary (not to mention ugly): smp_processor_id() 
should always return zero on UP, and arch_cpu_is_offline() reduces to 
!(cpu == 0), so this is a statically true condition on UP.

	-hpa
RE: [PATCH v13 24/35] x86/fred: Add a NMI entry stub for FRED
Posted by Li, Xin3 2 years ago
> So we have recently discovered an overlooked interaction with VT-x.
> Immediately before VMENTER and after VMEXIT, CR2 is live with the
> *guest* CR2. Regardless of if the guest uses FRED or not, this is guest
> state and SHOULD NOT be corrupted. Furthermore, host state MUST NOT leak
> into the guest.
> 
> NMIs are blocked on VMEXIT if the cause was an NMI, but not for other
> reasons, so a NMI coming in during this window that then #PFs could
> corrupt the guest CR2.

I add a comment to vmx_vcpu_enter_exit() in
https://lore.kernel.org/kvm/20231108183003.5981-1-xin3.li@intel.com/T/#m29616c02befc04305085b1cbac64df916364626a
for this.

> 
> Intel is exploring ways to close this hole, but for schedule reasons, it
> will not be available at the same time as the first implementation of
> FRED. Therefore, as a workaround, it turns out that the FRED NMI stub
> *will*, unfortunately, have to save and restore CR2 after all when (at
> least) Intel KVM is in use.
> 
> Note that this is airtight: it does add a performance penalty to the NMI
> path (two read CR2 in the common case of no #PF), but there is no gap
> during which a bad CR2 value could be introduced in the guest, no matter
> in which sequence the events happen.
> 
> In theory the performance penalty could be further reduced by
> conditionalizing this on the NMI happening in the critical region in the
> KVM code, but it seems to be pretty far from necessary to me.

We should keep the following code in the FRED NMI handler, right?

{
...
	this_cpu_write(nmi_cr2, read_cr2());
...
	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
		write_cr2(this_cpu_read(nmi_cr2));
...
}

> This obviously was an unfortunate oversight on our part, but the
> workaround is simple and doesn't affect any non-NMI paths.
> 
> > +
> > +	if (IS_ENABLED(CONFIG_SMP) &&
> arch_cpu_is_offline(smp_processor_id()))
> > +		return;
> > +
> 
> This is cut & paste from elsewhere in the NMI code, but I believe the
> IS_ENABLED() is unnecessary (not to mention ugly): smp_processor_id()
> should always return zero on UP, and arch_cpu_is_offline() reduces to
> !(cpu == 0), so this is a statically true condition on UP.

Ah, good point!
[PATCH v13A 24/35] x86/fred: Add a NMI entry stub for FRED
Posted by Xin Li 2 years ago
From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v13:
* Save and restore %cr2 upon entering and leaving the FRED NMI handler
  (H. Peter Anvin).
* Remove an unnecessary check "IS_ENABLED(CONFIG_SMP)" (H. Peter Anvin).
* Sync a microcode change to the IDT NMI handler from 8f849ff63bcbc to
  the FRED NMI handler.
---
 arch/x86/kernel/nmi.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955ab69fe..1dd8838e5583 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -35,6 +35,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/microcode.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -651,6 +652,41 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+	irqentry_state_t irq_state;
+
+	if (arch_cpu_is_offline(smp_processor_id())) {
+		if (microcode_nmi_handler_enabled())
+			microcode_offline_nmi_handler();
+		return;
+	}
+
+	this_cpu_write(nmi_cr2, read_cr2());
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
-- 
2.43.0
[tip: x86/fred] x86/fred: Add a NMI entry stub for FRED
Posted by tip-bot2 for H. Peter Anvin (Intel) 1 year, 10 months ago
The following commit has been merged into the x86/fred branch of tip:

Commit-ID:     f8b8ee45f82b681606d288bcec89c9071b4079fc
Gitweb:        https://git.kernel.org/tip/f8b8ee45f82b681606d288bcec89c9071b4079fc
Author:        H. Peter Anvin (Intel) <hpa@zytor.com>
AuthorDate:    Fri, 15 Dec 2023 22:31:39 -08:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Wed, 31 Jan 2024 22:02:20 +01:00

x86/fred: Add a NMI entry stub for FRED

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231216063139.25567-1-xin3.li@intel.com
---
 arch/x86/kernel/nmi.c | 42 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955a..3130a66 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -35,6 +35,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/microcode.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -651,6 +652,47 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+	irqentry_state_t irq_state;
+
+	if (arch_cpu_is_offline(smp_processor_id())) {
+		if (microcode_nmi_handler_enabled())
+			microcode_offline_nmi_handler();
+		return;
+	}
+
+	/*
+	 * Save CR2 for eventual restore to cover the case where the NMI
+	 * hits the VMENTER/VMEXIT region where guest CR2 is life. This
+	 * prevents guest state corruption in case that the NMI handler
+	 * takes a page fault.
+	 */
+	this_cpu_write(nmi_cr2, read_cr2());
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
[tip: x86/fred] x86/fred: Add a NMI entry stub for FRED
Posted by tip-bot2 for H. Peter Anvin (Intel) 1 year, 10 months ago
The following commit has been merged into the x86/fred branch of tip:

Commit-ID:     3e91abaa567300fd48a0fac4c9aaedd30fa2f3f9
Gitweb:        https://git.kernel.org/tip/3e91abaa567300fd48a0fac4c9aaedd30fa2f3f9
Author:        H. Peter Anvin (Intel) <hpa@zytor.com>
AuthorDate:    Fri, 15 Dec 2023 22:31:39 -08:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Tue, 30 Jan 2024 18:20:35 +01:00

x86/fred: Add a NMI entry stub for FRED

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231216063139.25567-1-xin3.li@intel.com

---
 arch/x86/kernel/nmi.c | 42 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955a..3130a66 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -35,6 +35,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/microcode.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -651,6 +652,47 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+	irqentry_state_t irq_state;
+
+	if (arch_cpu_is_offline(smp_processor_id())) {
+		if (microcode_nmi_handler_enabled())
+			microcode_offline_nmi_handler();
+		return;
+	}
+
+	/*
+	 * Save CR2 for eventual restore to cover the case where the NMI
+	 * hits the VMENTER/VMEXIT region where guest CR2 is life. This
+	 * prevents guest state corruption in case that the NMI handler
+	 * takes a page fault.
+	 */
+	this_cpu_write(nmi_cr2, read_cr2());
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
[tip: x86/fred] x86/fred: Add a NMI entry stub for FRED
Posted by tip-bot2 for H. Peter Anvin (Intel) 1 year, 10 months ago
The following commit has been merged into the x86/fred branch of tip:

Commit-ID:     994f75798425ca0865b54dcc7bbe13bba5e4cfe4
Gitweb:        https://git.kernel.org/tip/994f75798425ca0865b54dcc7bbe13bba5e4cfe4
Author:        H. Peter Anvin (Intel) <hpa@zytor.com>
AuthorDate:    Fri, 15 Dec 2023 22:31:39 -08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 25 Jan 2024 19:10:32 +01:00

x86/fred: Add a NMI entry stub for FRED

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231216063139.25567-1-xin3.li@intel.com

---
 arch/x86/kernel/nmi.c | 42 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955a..3130a66 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -35,6 +35,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/microcode.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -651,6 +652,47 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+	irqentry_state_t irq_state;
+
+	if (arch_cpu_is_offline(smp_processor_id())) {
+		if (microcode_nmi_handler_enabled())
+			microcode_offline_nmi_handler();
+		return;
+	}
+
+	/*
+	 * Save CR2 for eventual restore to cover the case where the NMI
+	 * hits the VMENTER/VMEXIT region where guest CR2 is life. This
+	 * prevents guest state corruption in case that the NMI handler
+	 * takes a page fault.
+	 */
+	this_cpu_write(nmi_cr2, read_cr2());
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;