x86/svm: Support vNMI on capable hardware

[PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Andrew Cooper 2 weeks, 1 day ago

From: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>

Starting with Zen4, AMD CPUs can virtualise NMIs for a guest.  On older
hardware, determining when an NMI is safe to deliver is a challenge and Xen
does not handle all corner cases correctly.

With vNMI, there is an enablement bit and two new bits of state in the VMCB; a
pending bit, and a blocked bit.  These directly map to the CPU state for
handling NMIs, and are maintained by hardware during the running of the vCPU.

When vNMI is enabled, have svm_{get,set}set_interrupt_shadow() work in terms
of the vnmi_blocking bit rather than the IRET intercept.  This allows an
emulated IRET instruction to re-enable NMIs.

When injecting a new NMI, simply set the vnmi_pending bit; hardware will
deliver the NMI to the guest at the next suitable juncture.

One complication is that, when delivering a second NMI before the first has
completed, the mix between common HVM logic and SVM specific logic will try to
open an NMI window, malfunctioning as it does so.  When vNMI is enabled, short
circuit this to not consider NMIs blocked.

Signed-off-by: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <jbeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Teddy Astie <teddy.astie@vates.tech>
CC: Jason Andryuk <jason.andryuk@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>

For 4.22.  This is somewhat overdue and makes a concrete improvement to NMI
handling on recent AMD hardware.

v6:
 * Plumb through svm_{get,set}set_interrupt_shadow() so that emulated IRET
   works, as requested several times during review of earlier revisions.
 * Expand the commit message

The !vNMI case is even more broken than I'd realised.  Besides the "what if
the IRET faults?" problem, svm_enable_intr_window() basically ignores the NMI
case and simply re-enters the VM.  This causes the pending NMI to only be
injected next time there is a VMExit.
---
 xen/arch/x86/hvm/svm/intr.c | 19 +++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c  | 23 +++++++++++++++++------
 xen/arch/x86/hvm/svm/vmcb.c |  2 ++
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c
index 6453a46b8508..cf0621d2f628 100644
--- a/xen/arch/x86/hvm/svm/intr.c
+++ b/xen/arch/x86/hvm/svm/intr.c
@@ -33,6 +33,12 @@ static void svm_inject_nmi(struct vcpu *v)
     u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
     intinfo_t event;
 
+    if ( vmcb->_vintr.fields.vnmi_enable )
+    {
+        vmcb->_vintr.fields.vnmi_pending = true;
+        return;
+    }
+
     event.raw = 0;
     event.v = true;
     event.type = X86_ET_NMI;
@@ -142,6 +148,19 @@ void asmlinkage svm_intr_assist(void)
             return;
 
         intblk = hvm_interrupt_blocked(v, intack);
+
+        /*
+         * When vNMI is active, NMIs can be injected by setting vnmi_pending
+         * and hardware will deliver them at the next appropriate opportunity.
+         * Consider them not blocked, to avoid trying to open an NMI Window.
+         *
+         * Correctness here relies on the fact that all vNMI capable hardware
+         * has vGIF, and vGIF is always activated when appropriate.
+         */
+        if ( intblk == hvm_intblk_nmi_iret &&
+             vmcb->_vintr.fields.vnmi_enable )
+            intblk = hvm_intblk_none;
+
         if ( intblk == hvm_intblk_svm_gif )
         {
             ASSERT(nestedhvm_enabled(v->domain));
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index f49d2ebbfdd5..49fcdd906cf8 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -499,7 +499,9 @@ static unsigned cf_check int svm_get_interrupt_shadow(struct vcpu *v)
     if ( vmcb->int_stat.intr_shadow )
         intr_shadow |= HVM_INTR_SHADOW_MOV_SS | HVM_INTR_SHADOW_STI;
 
-    if ( vmcb_get_general1_intercepts(vmcb) & GENERAL1_INTERCEPT_IRET )
+    if ( vmcb->_vintr.fields.vnmi_enable
+         ? vmcb->_vintr.fields.vnmi_blocking
+         : (vmcb_get_general1_intercepts(vmcb) & GENERAL1_INTERCEPT_IRET) )
         intr_shadow |= HVM_INTR_SHADOW_NMI;
 
     return intr_shadow;
@@ -509,15 +511,23 @@ static void cf_check svm_set_interrupt_shadow(
     struct vcpu *v, unsigned int intr_shadow)
 {
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
-    u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
+    bool block_nmi = intr_shadow & HVM_INTR_SHADOW_NMI;
 
     vmcb->int_stat.intr_shadow =
         !!(intr_shadow & (HVM_INTR_SHADOW_MOV_SS|HVM_INTR_SHADOW_STI));
 
-    general1_intercepts &= ~GENERAL1_INTERCEPT_IRET;
-    if ( intr_shadow & HVM_INTR_SHADOW_NMI )
-        general1_intercepts |= GENERAL1_INTERCEPT_IRET;
-    vmcb_set_general1_intercepts(vmcb, general1_intercepts);
+    if ( vmcb->_vintr.fields.vnmi_enable )
+        vmcb->_vintr.fields.vnmi_blocking = block_nmi;
+    else
+    {
+        uint32_t gen1 = vmcb_get_general1_intercepts(vmcb);
+
+        gen1 &= ~GENERAL1_INTERCEPT_IRET;
+        if ( block_nmi )
+            gen1 |= GENERAL1_INTERCEPT_IRET;
+
+        vmcb_set_general1_intercepts(vmcb, gen1);
+    }
 }
 
 static int cf_check svm_guest_x86_mode(struct vcpu *v)
@@ -2460,6 +2470,7 @@ const struct hvm_function_table * __init start_svm(void)
     P(cpu_has_tsc_ratio, "TSC Rate MSR");
     P(cpu_has_svm_sss, "NPT Supervisor Shadow Stack");
     P(cpu_has_svm_spec_ctrl, "MSR_SPEC_CTRL virtualisation");
+    P(cpu_has_svm_vnmi, "Virtual NMI");
     P(cpu_has_svm_bus_lock, "Bus Lock Filter");
 #undef P
 
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 514e530cbda7..975a1eaef806 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -185,6 +185,8 @@ static int construct_vmcb(struct vcpu *v)
     if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
         v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
 
+    vmcb->_vintr.fields.vnmi_enable = cpu_has_svm_vnmi;
+
     return 0;
 }
 
-- 
2.39.5

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Jan Beulich 1 week, 1 day ago

On 14.05.2026 19:56, Andrew Cooper wrote:
> From: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
> 
> Starting with Zen4, AMD CPUs can virtualise NMIs for a guest.  On older
> hardware, determining when an NMI is safe to deliver is a challenge and Xen
> does not handle all corner cases correctly.
> 
> With vNMI, there is an enablement bit and two new bits of state in the VMCB; a
> pending bit, and a blocked bit.  These directly map to the CPU state for
> handling NMIs, and are maintained by hardware during the running of the vCPU.
> 
> When vNMI is enabled, have svm_{get,set}set_interrupt_shadow() work in terms
> of the vnmi_blocking bit rather than the IRET intercept.  This allows an
> emulated IRET instruction to re-enable NMIs.
> 
> When injecting a new NMI, simply set the vnmi_pending bit; hardware will
> deliver the NMI to the guest at the next suitable juncture.
> 
> One complication is that, when delivering a second NMI before the first has
> completed, the mix between common HVM logic and SVM specific logic will try to
> open an NMI window, malfunctioning as it does so.  When vNMI is enabled, short
> circuit this to not consider NMIs blocked.
> 
> Signed-off-by: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Teddy Astie <teddy.astie@vates.tech>
> CC: Jason Andryuk <jason.andryuk@amd.com>
> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> 
> For 4.22.  This is somewhat overdue and makes a concrete improvement to NMI
> handling on recent AMD hardware.

In particular with this remark in mind - should I perhaps pull this over onto
4.21 as well? Or are there dependencies I'm overlooking?

Jan

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Andrew Cooper 1 week, 1 day ago

On 21/05/2026 4:51 pm, Jan Beulich wrote:
> On 14.05.2026 19:56, Andrew Cooper wrote:
>> From: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
>>
>> Starting with Zen4, AMD CPUs can virtualise NMIs for a guest.  On older
>> hardware, determining when an NMI is safe to deliver is a challenge and Xen
>> does not handle all corner cases correctly.
>>
>> With vNMI, there is an enablement bit and two new bits of state in the VMCB; a
>> pending bit, and a blocked bit.  These directly map to the CPU state for
>> handling NMIs, and are maintained by hardware during the running of the vCPU.
>>
>> When vNMI is enabled, have svm_{get,set}set_interrupt_shadow() work in terms
>> of the vnmi_blocking bit rather than the IRET intercept.  This allows an
>> emulated IRET instruction to re-enable NMIs.
>>
>> When injecting a new NMI, simply set the vnmi_pending bit; hardware will
>> deliver the NMI to the guest at the next suitable juncture.
>>
>> One complication is that, when delivering a second NMI before the first has
>> completed, the mix between common HVM logic and SVM specific logic will try to
>> open an NMI window, malfunctioning as it does so.  When vNMI is enabled, short
>> circuit this to not consider NMIs blocked.
>>
>> Signed-off-by: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <jbeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Teddy Astie <teddy.astie@vates.tech>
>> CC: Jason Andryuk <jason.andryuk@amd.com>
>> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>
>> For 4.22.  This is somewhat overdue and makes a concrete improvement to NMI
>> handling on recent AMD hardware.
> In particular with this remark in mind - should I perhaps pull this over onto
> 4.21 as well? Or are there dependencies I'm overlooking?

There's this patch, and a prior patch from a while ago adding the vNMI
CPUID enumeration and fields.  There are no other dependencies that I'm
aware of.

I was intending to backport this to 4.21 in XenServer in due course.

vNMI will be a hard dependency for supporting FRED in guests on AMD, but
I doubt we'll be wanting to backport this as a feature.

~Andrew

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Oleksii Kurochko 1 week, 1 day ago


On 5/14/26 7:56 PM, Andrew Cooper wrote:
> From: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
> 
> Starting with Zen4, AMD CPUs can virtualise NMIs for a guest.  On older
> hardware, determining when an NMI is safe to deliver is a challenge and Xen
> does not handle all corner cases correctly.
> 
> With vNMI, there is an enablement bit and two new bits of state in the VMCB; a
> pending bit, and a blocked bit.  These directly map to the CPU state for
> handling NMIs, and are maintained by hardware during the running of the vCPU.
> 
> When vNMI is enabled, have svm_{get,set}set_interrupt_shadow() work in terms
> of the vnmi_blocking bit rather than the IRET intercept.  This allows an
> emulated IRET instruction to re-enable NMIs.
> 
> When injecting a new NMI, simply set the vnmi_pending bit; hardware will
> deliver the NMI to the guest at the next suitable juncture.
> 
> One complication is that, when delivering a second NMI before the first has
> completed, the mix between common HVM logic and SVM specific logic will try to
> open an NMI window, malfunctioning as it does so.  When vNMI is enabled, short
> circuit this to not consider NMIs blocked.
> 
> Signed-off-by: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Teddy Astie <teddy.astie@vates.tech>
> CC: Jason Andryuk <jason.andryuk@amd.com>
> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> 
> For 4.22.  This is somewhat overdue and makes a concrete improvement to NMI
> handling on recent AMD hardware.
> 
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

~ Oleksii

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Jan Beulich 1 week, 5 days ago

On 14.05.2026 19:56, Andrew Cooper wrote:
> --- a/xen/arch/x86/hvm/svm/intr.c
> +++ b/xen/arch/x86/hvm/svm/intr.c
> @@ -33,6 +33,12 @@ static void svm_inject_nmi(struct vcpu *v)
>      u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
>      intinfo_t event;
>  
> +    if ( vmcb->_vintr.fields.vnmi_enable )
> +    {
> +        vmcb->_vintr.fields.vnmi_pending = true;
> +        return;
> +    }

How does all of this work during migration to a vNMI-incapable host? The
hw feature is used ...

> --- a/xen/arch/x86/hvm/svm/vmcb.c
> +++ b/xen/arch/x86/hvm/svm/vmcb.c
> @@ -185,6 +185,8 @@ static int construct_vmcb(struct vcpu *v)
>      if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
>          v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
>  
> +    vmcb->_vintr.fields.vnmi_enable = cpu_has_svm_vnmi;

... unconditionally when available (i.e. the feature not being there
won't prevent the migration), yet the vnmi_{pend,block}ing fields are
lost during migration (aiui). Are building on the fact that all of this
state is already getting lost while migrating?

Jan

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Andrew Cooper 1 week, 4 days ago

On 18/05/2026 8:53 am, Jan Beulich wrote:
> On 14.05.2026 19:56, Andrew Cooper wrote:
>> --- a/xen/arch/x86/hvm/svm/intr.c
>> +++ b/xen/arch/x86/hvm/svm/intr.c
>> @@ -33,6 +33,12 @@ static void svm_inject_nmi(struct vcpu *v)
>>      u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
>>      intinfo_t event;
>>  
>> +    if ( vmcb->_vintr.fields.vnmi_enable )
>> +    {
>> +        vmcb->_vintr.fields.vnmi_pending = true;
>> +        return;
>> +    }
> How does all of this work during migration to a vNMI-incapable host? The
> hw feature is used ...
>
>> --- a/xen/arch/x86/hvm/svm/vmcb.c
>> +++ b/xen/arch/x86/hvm/svm/vmcb.c
>> @@ -185,6 +185,8 @@ static int construct_vmcb(struct vcpu *v)
>>      if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
>>          v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
>>  
>> +    vmcb->_vintr.fields.vnmi_enable = cpu_has_svm_vnmi;
> ... unconditionally when available (i.e. the feature not being there
> won't prevent the migration), yet the vnmi_{pend,block}ing fields are
> lost during migration (aiui). Are building on the fact that all of this
> state is already getting lost while migrating?

I can't quite parse the final sentence, but yes; migration has always
lost the NMI state.

This goes largely unnoticed because guests don't issue their final
suspend from NMI context.

~Andrew

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Jan Beulich 1 week, 2 days ago

On 18.05.2026 13:12, Andrew Cooper wrote:
> On 18/05/2026 8:53 am, Jan Beulich wrote:
>> On 14.05.2026 19:56, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/hvm/svm/intr.c
>>> +++ b/xen/arch/x86/hvm/svm/intr.c
>>> @@ -33,6 +33,12 @@ static void svm_inject_nmi(struct vcpu *v)
>>>      u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
>>>      intinfo_t event;
>>>  
>>> +    if ( vmcb->_vintr.fields.vnmi_enable )
>>> +    {
>>> +        vmcb->_vintr.fields.vnmi_pending = true;
>>> +        return;
>>> +    }
>> How does all of this work during migration to a vNMI-incapable host? The
>> hw feature is used ...
>>
>>> --- a/xen/arch/x86/hvm/svm/vmcb.c
>>> +++ b/xen/arch/x86/hvm/svm/vmcb.c
>>> @@ -185,6 +185,8 @@ static int construct_vmcb(struct vcpu *v)
>>>      if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
>>>          v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
>>>  
>>> +    vmcb->_vintr.fields.vnmi_enable = cpu_has_svm_vnmi;
>> ... unconditionally when available (i.e. the feature not being there
>> won't prevent the migration), yet the vnmi_{pend,block}ing fields are
>> lost during migration (aiui). Are building on the fact that all of this
>> state is already getting lost while migrating?
> 
> I can't quite parse the final sentence, but yes; migration has always
> lost the NMI state.

In which case:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

> This goes largely unnoticed because guests don't issue their final
> suspend from NMI context.
> 
> ~Andrew

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Teddy Astie 2 weeks, 1 day ago

Le 14/05/2026 à 19:59, Andrew Cooper a écrit :
> From: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
> 
> Starting with Zen4, AMD CPUs can virtualise NMIs for a guest.  On older
> hardware, determining when an NMI is safe to deliver is a challenge and Xen
> does not handle all corner cases correctly.
> 
> With vNMI, there is an enablement bit and two new bits of state in the VMCB; a
> pending bit, and a blocked bit.  These directly map to the CPU state for
> handling NMIs, and are maintained by hardware during the running of the vCPU.
> 
> When vNMI is enabled, have svm_{get,set}set_interrupt_shadow() work in terms
> of the vnmi_blocking bit rather than the IRET intercept.  This allows an
> emulated IRET instruction to re-enable NMIs.
> 
> When injecting a new NMI, simply set the vnmi_pending bit; hardware will
> deliver the NMI to the guest at the next suitable juncture.
> 
> One complication is that, when delivering a second NMI before the first has
> completed, the mix between common HVM logic and SVM specific logic will try to
> open an NMI window, malfunctioning as it does so.  When vNMI is enabled, short
> circuit this to not consider NMIs blocked.
> 
> Signed-off-by: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Teddy Astie <teddy.astie@vates.tech>
> CC: Jason Andryuk <jason.andryuk@amd.com>
> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> 
> For 4.22.  This is somewhat overdue and makes a concrete improvement to NMI
> handling on recent AMD hardware.
> 
> v6:
>   * Plumb through svm_{get,set}set_interrupt_shadow() so that emulated IRET
>     works, as requested several times during review of earlier revisions.
>   * Expand the commit message
> 
> The !vNMI case is even more broken than I'd realised.  Besides the "what if
> the IRET faults?" problem, svm_enable_intr_window() basically ignores the NMI
> case and simply re-enters the VM.  This causes the pending NMI to only be
> injected next time there is a VMExit.

Does that happens often in practice ?

> ---
>   xen/arch/x86/hvm/svm/intr.c | 19 +++++++++++++++++++
>   xen/arch/x86/hvm/svm/svm.c  | 23 +++++++++++++++++------
>   xen/arch/x86/hvm/svm/vmcb.c |  2 ++
>   3 files changed, 38 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c
> index 6453a46b8508..cf0621d2f628 100644
> --- a/xen/arch/x86/hvm/svm/intr.c
> +++ b/xen/arch/x86/hvm/svm/intr.c
> @@ -33,6 +33,12 @@ static void svm_inject_nmi(struct vcpu *v)
>       u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
>       intinfo_t event;
>   
> +    if ( vmcb->_vintr.fields.vnmi_enable )
> +    {
> +        vmcb->_vintr.fields.vnmi_pending = true;
> +        return;
> +    }
> +
>       event.raw = 0;
>       event.v = true;
>       event.type = X86_ET_NMI;
> @@ -142,6 +148,19 @@ void asmlinkage svm_intr_assist(void)
>               return;
>   
>           intblk = hvm_interrupt_blocked(v, intack);
> +
> +        /*
> +         * When vNMI is active, NMIs can be injected by setting vnmi_pending
> +         * and hardware will deliver them at the next appropriate opportunity.
> +         * Consider them not blocked, to avoid trying to open an NMI Window.
> +         *
> +         * Correctness here relies on the fact that all vNMI capable hardware
> +         * has vGIF, and vGIF is always activated when appropriate.
> +         */
> +        if ( intblk == hvm_intblk_nmi_iret &&
> +             vmcb->_vintr.fields.vnmi_enable )
> +            intblk = hvm_intblk_none;
> +
>           if ( intblk == hvm_intblk_svm_gif )
>           {
>               ASSERT(nestedhvm_enabled(v->domain));
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index f49d2ebbfdd5..49fcdd906cf8 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -499,7 +499,9 @@ static unsigned cf_check int svm_get_interrupt_shadow(struct vcpu *v)
>       if ( vmcb->int_stat.intr_shadow )
>           intr_shadow |= HVM_INTR_SHADOW_MOV_SS | HVM_INTR_SHADOW_STI;
>   
> -    if ( vmcb_get_general1_intercepts(vmcb) & GENERAL1_INTERCEPT_IRET )
> +    if ( vmcb->_vintr.fields.vnmi_enable
> +         ? vmcb->_vintr.fields.vnmi_blocking
> +         : (vmcb_get_general1_intercepts(vmcb) & GENERAL1_INTERCEPT_IRET) )
>           intr_shadow |= HVM_INTR_SHADOW_NMI;
>   
>       return intr_shadow;
> @@ -509,15 +511,23 @@ static void cf_check svm_set_interrupt_shadow(
>       struct vcpu *v, unsigned int intr_shadow)
>   {
>       struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
> -    u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb);
> +    bool block_nmi = intr_shadow & HVM_INTR_SHADOW_NMI;
>   
>       vmcb->int_stat.intr_shadow =
>           !!(intr_shadow & (HVM_INTR_SHADOW_MOV_SS|HVM_INTR_SHADOW_STI));
>   
> -    general1_intercepts &= ~GENERAL1_INTERCEPT_IRET;
> -    if ( intr_shadow & HVM_INTR_SHADOW_NMI )
> -        general1_intercepts |= GENERAL1_INTERCEPT_IRET;
> -    vmcb_set_general1_intercepts(vmcb, general1_intercepts);
> +    if ( vmcb->_vintr.fields.vnmi_enable )
> +        vmcb->_vintr.fields.vnmi_blocking = block_nmi;
> +    else
> +    {
> +        uint32_t gen1 = vmcb_get_general1_intercepts(vmcb);
> +
> +        gen1 &= ~GENERAL1_INTERCEPT_IRET;
> +        if ( block_nmi )
> +            gen1 |= GENERAL1_INTERCEPT_IRET;
> +
> +        vmcb_set_general1_intercepts(vmcb, gen1);
> +    }
>   }
>   
>   static int cf_check svm_guest_x86_mode(struct vcpu *v)
> @@ -2460,6 +2470,7 @@ const struct hvm_function_table * __init start_svm(void)
>       P(cpu_has_tsc_ratio, "TSC Rate MSR");
>       P(cpu_has_svm_sss, "NPT Supervisor Shadow Stack");
>       P(cpu_has_svm_spec_ctrl, "MSR_SPEC_CTRL virtualisation");
> +    P(cpu_has_svm_vnmi, "Virtual NMI");
>       P(cpu_has_svm_bus_lock, "Bus Lock Filter");
>   #undef P
>   
> diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
> index 514e530cbda7..975a1eaef806 100644
> --- a/xen/arch/x86/hvm/svm/vmcb.c
> +++ b/xen/arch/x86/hvm/svm/vmcb.c
> @@ -185,6 +185,8 @@ static int construct_vmcb(struct vcpu *v)
>       if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
>           v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
>   
> +    vmcb->_vintr.fields.vnmi_enable = cpu_has_svm_vnmi;
> +
>       return 0;
>   }
>   

Reviewed-by: Teddy Astie <teddy.astie@vates.tech>

Teddy

Re: [PATCH for-4.22 v6] x86/svm: Support vNMI on capable hardware

Posted by Andrew Cooper 2 weeks, 1 day ago

On 15/05/2026 10:20 am, Teddy Astie wrote:
> Le 14/05/2026 à 19:59, Andrew Cooper a écrit :
>> From: Abdelkareem Abdelsaamad <abdelkareem.abdelsaamad@citrix.com>
>>
>> Starting with Zen4, AMD CPUs can virtualise NMIs for a guest.  On older
>> hardware, determining when an NMI is safe to deliver is a challenge
>> and Xen
>> does not handle all corner cases correctly.
>>
>> With vNMI, there is an enablement bit and two new bits of state in
>> the VMCB; a
>> pending bit, and a blocked bit.  These directly map to the CPU state for
>> handling NMIs, and are maintained by hardware during the running of
>> the vCPU.
>>
>> When vNMI is enabled, have svm_{get,set}set_interrupt_shadow() work
>> in terms
>> of the vnmi_blocking bit rather than the IRET intercept.  This allows an
>> emulated IRET instruction to re-enable NMIs.
>>
>> When injecting a new NMI, simply set the vnmi_pending bit; hardware will
>> deliver the NMI to the guest at the next suitable juncture.
>>
>> One complication is that, when delivering a second NMI before the
>> first has
>> completed, the mix between common HVM logic and SVM specific logic
>> will try to
>> open an NMI window, malfunctioning as it does so.  When vNMI is
>> enabled, short
>> circuit this to not consider NMIs blocked.
>>
>> Signed-off-by: Abdelkareem Abdelsaamad
>> <abdelkareem.abdelsaamad@citrix.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <jbeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Teddy Astie <teddy.astie@vates.tech>
>> CC: Jason Andryuk <jason.andryuk@amd.com>
>> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>
>> For 4.22.  This is somewhat overdue and makes a concrete improvement
>> to NMI
>> handling on recent AMD hardware.
>>
>> v6:
>>   * Plumb through svm_{get,set}set_interrupt_shadow() so that
>> emulated IRET
>>     works, as requested several times during review of earlier
>> revisions.
>>   * Expand the commit message
>>
>> The !vNMI case is even more broken than I'd realised.  Besides the
>> "what if
>> the IRET faults?" problem, svm_enable_intr_window() basically ignores
>> the NMI
>> case and simply re-enters the VM.  This causes the pending NMI to
>> only be
>> injected next time there is a VMExit.
>
> Does that happens often in practice ?

Which?

VMs don't tend to make as much use of NMIs as native does.

IRET faulting is mostly relegated to misbehaving userspace.  E.g. one
thread uses SYSCALL_modify_ldt to invalidate the %ss/%cs that another
thread was running on, at which point the next reload of that segment
(generally the next IRET) will fault.  But to attack this, userspace
needs to hit the IRET of the NMI handler with this race, and avoid the
IRET of all other interrupts and exceptions.  Perf counters is the
typically the only way userspace has to influence this, and we don't
have PMU available to guests by default.

Then we get into differences between Intel and AMD.  Intel unblocks NMIs
even if the IRET faults.  AMD unblocks NMIs only in the IRET completes.

With Xen's current SVM code and without vNMI, IRETs get Intel-like
behaviour WRT faulting, and already pending NMIs get delayed.  Software
needs to cope with the former, and the latter (while far from ideal) is
generally indistinguishable from the NMI being slightly later than it
was in practice.

> Reviewed-by: Teddy Astie <teddy.astie@vates.tech>

Thanks.

~Andrew