[PATCH v2 2/2] x86/vlapic: Drop vlapic->esr_lock

Posted by Andrew Cooper 6 days, 3 hours ago
The exact behaviour of LVTERR interrupt generation is implementation
specific.

 * Newer Intel CPUs generate an interrupt when pending_esr becomes
   nonzero.

 * Older Intel and all AMD CPUs generate an interrupt when any
   individual bit in pending_esr becomes nonzero.

Neither vendor documents their behaviour very well.  Xen implements
the per-bit behaviour and has done since support was added.

Importantly, the per-bit behaviour can be expressed using the atomic
operations available in the x86 architecture, whereas the
former (interrupt only on pending_esr becoming nonzero) cannot.

With vlapic->hw.pending_esr held outside of the main regs page, it's
much easier to use atomic operations.

Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
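
As an illustrative sketch (plain C, with a GCC builtin standing in for
Xen's test_and_set_bit(); names and details here are illustrative, not
the real code), the per-bit semantic needs only a single atomic RMW,
because the old value of the bit being set is exactly the information
required:

    #include <stdbool.h>

    /* Set err_bit in *pending_esr; report whether it went 0 -> 1. */
    static bool esr_report(unsigned long *pending_esr, unsigned int err_bit)
    {
        unsigned long mask = 1UL << err_bit;

        /* One atomic read-modify-write: set the bit, learn its old value. */
        return !(__atomic_fetch_or(pending_esr, mask, __ATOMIC_SEQ_CST) & mask);
    }

x86's LOCK BTS provides exactly this (old bit value returned in CF),
and reading-and-clearing pending_esr on a write to APIC_ESR is a
single XCHG.  The whole-register semantic would instead need to know
whether the entire word was previously zero while setting one bit,
which no single x86 RMW instruction reports.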

The only interesting change is that vlapic_error() now takes a single
bit rather than a mask, but this is fine for all current callers and
foreseeable changes.
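
For example (a sketch only, mirroring the hunks below; this assumes
the usual APIC_ESR_RECVILL definition of 0x40, i.e. bit 6):

    /* Before: callers passed a mask. */
    vlapic_error(vlapic, APIC_ESR_RECVILL);        /* 0x40  */

    /* After: callers pass a bit number, via ilog2(). */
    vlapic_error(vlapic, ilog2(APIC_ESR_RECVILL)); /* bit 6 */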

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Confirmed by Intel, AMD, and 3rd party sources.

https://sandpile.org/x86/apic.htm has been updated to note this
behaviour.  Neither vendor has indicated any enthusiasm for clarifying
the behaviour in their docs.

v2:
 * Rewrite the commit message from scratch.
---
 xen/arch/x86/hvm/vlapic.c             | 39 ++++++++++-----------------
 xen/arch/x86/include/asm/hvm/vlapic.h |  1 -
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 98394ed26a52..82b6d12e99d4 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -102,14 +102,16 @@ static int vlapic_find_highest_irr(struct vlapic *vlapic)
     return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
 }
 
-static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
+static void vlapic_error(struct vlapic *vlapic, unsigned int err_bit)
 {
-    unsigned long flags;
-    uint32_t esr;
-
-    spin_lock_irqsave(&vlapic->esr_lock, flags);
-    esr = vlapic->hw.pending_esr;
-    if ( (esr & errmask) != errmask )
+    /*
+     * Whether LVTERR is delivered on a per-bit basis, or only on
+     * pending_esr becoming nonzero is implementation specific.
+     *
+     * Xen implements the per-bit behaviour as it can be expressed
+     * locklessly.
+     */
+    if ( !test_and_set_bit(err_bit, &vlapic->hw.pending_esr) )
     {
         uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
         bool inj = false;
@@ -124,15 +126,12 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
             if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
                  inj = true;
             else
-                 errmask |= APIC_ESR_RECVILL;
+                set_bit(ilog2(APIC_ESR_RECVILL), &vlapic->hw.pending_esr);
         }
 
-        vlapic->hw.pending_esr |= errmask;
-
         if ( inj )
             vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
     }
-    spin_unlock_irqrestore(&vlapic->esr_lock, flags);
 }
 
 bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec)
@@ -153,7 +152,7 @@ void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
 
     if ( unlikely(vec < 16) )
     {
-        vlapic_error(vlapic, APIC_ESR_RECVILL);
+        vlapic_error(vlapic, ilog2(APIC_ESR_RECVILL));
         return;
     }
 
@@ -525,7 +524,7 @@ void vlapic_ipi(
             vlapic_domain(vlapic), vlapic, short_hand, dest, dest_mode);
 
         if ( unlikely((icr_low & APIC_VECTOR_MASK) < 16) )
-            vlapic_error(vlapic, APIC_ESR_SENDILL);
+            vlapic_error(vlapic, ilog2(APIC_ESR_SENDILL));
         else if ( target )
             vlapic_accept_irq(vlapic_vcpu(target), icr_low);
         break;
@@ -534,7 +533,7 @@ void vlapic_ipi(
     case APIC_DM_FIXED:
         if ( unlikely((icr_low & APIC_VECTOR_MASK) < 16) )
         {
-            vlapic_error(vlapic, APIC_ESR_SENDILL);
+            vlapic_error(vlapic, ilog2(APIC_ESR_SENDILL));
             break;
         }
         /* fall through */
@@ -803,17 +802,9 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
         break;
 
     case APIC_ESR:
-    {
-        unsigned long flags;
-
-        spin_lock_irqsave(&vlapic->esr_lock, flags);
-        val = vlapic->hw.pending_esr;
-        vlapic->hw.pending_esr = 0;
-        spin_unlock_irqrestore(&vlapic->esr_lock, flags);
-
+        val = xchg(&vlapic->hw.pending_esr, 0);
         vlapic_set_reg(vlapic, APIC_ESR, val);
         break;
-    }
 
     case APIC_TASKPRI:
         vlapic_set_reg(vlapic, APIC_TASKPRI, val & 0xff);
@@ -1716,8 +1707,6 @@ int vlapic_init(struct vcpu *v)
 
     vlapic_reset(vlapic);
 
-    spin_lock_init(&vlapic->esr_lock);
-
     tasklet_init(&vlapic->init_sipi.tasklet, vlapic_init_sipi_action, v);
 
     if ( v->vcpu_id == 0 )
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 2c4ff94ae7a8..c38855119836 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -69,7 +69,6 @@ struct vlapic {
         bool                 hw, regs;
         uint32_t             id, ldr;
     }                        loaded;
-    spinlock_t               esr_lock;
     struct periodic_time     pt;
     s_time_t                 timer_last_update;
     struct page_info         *regs_page;
-- 
2.34.1


Re: [PATCH v2 2/2] x86/vlapic: Drop vlapic->esr_lock
Posted by Jan Beulich 4 days, 8 hours ago
On 03.03.2025 19:53, Andrew Cooper wrote:
> The exact behaviour of LVTERR interrupt generation is implementation
> specific.
> 
>  * Newer Intel CPUs generate an interrupt when pending_esr becomes
>    nonzero.
> 
>  * Older Intel and all AMD CPUs generate an interrupt when any
>    individual bit in pending_esr becomes nonzero.
> 
> Neither vendor documents their behaviour very well.  Xen implements
> the per-bit behaviour and has done since support was added.
> 
> Importantly, the per-bit behaviour can be expressed using the atomic
> operations available in the x86 architecture, whereas the
> former (interrupt only on pending_esr becoming nonzero) cannot.
> 
> With vlapic->hw.pending_esr held outside of the main regs page, it's
> much easier to use atomic operations.
> 
> Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
> 
> The only interesting change is that vlapic_error() now takes a single
> bit rather than a mask, but this is fine for all current callers and
> foreseeable changes.
> 
> No practical change.

From a guest perspective that is.

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> @@ -124,15 +126,12 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>              if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
>                   inj = true;

I wouldn't, btw, mind if you also corrected this indentation screw-up of
mine along with you doing so ...

>              else
> -                 errmask |= APIC_ESR_RECVILL;
> +                set_bit(ilog2(APIC_ESR_RECVILL), &vlapic->hw.pending_esr);

... here.

Jan
Re: [PATCH v2 2/2] x86/vlapic: Drop vlapic->esr_lock
Posted by Andrew Cooper 2 days, 21 hours ago
On 05/03/2025 1:56 pm, Jan Beulich wrote:
> On 03.03.2025 19:53, Andrew Cooper wrote:
>> The exact behaviour of LVTERR interrupt generation is implementation
>> specific.
>>
>>  * Newer Intel CPUs generate an interrupt when pending_esr becomes
>>    nonzero.
>>
>>  * Older Intel and all AMD CPUs generate an interrupt when any
>>    individual bit in pending_esr becomes nonzero.
>>
>> Neither vendor documents their behaviour very well.  Xen implements
>> the per-bit behaviour and has done since support was added.
>>
>> Importantly, the per-bit behaviour can be expressed using the atomic
>> operations available in the x86 architecture, whereas the
>> former (interrupt only on pending_esr becoming nonzero) cannot.
>>
>> With vlapic->hw.pending_esr held outside of the main regs page, it's
>> much easier to use atomic operations.
>>
>> Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
>>
>> The only interesting change is that vlapic_error() now takes a single
>> bit rather than a mask, but this is fine for all current callers and
>> foreseeable changes.
>>
>> No practical change.
> From a guest perspective that is.
>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>
>> @@ -124,15 +126,12 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>              if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
>>                   inj = true;
> I wouldn't, btw, mind if you also corrected this indentation screw-up of
> mine along with you doing so ...
>
>>              else
>> -                 errmask |= APIC_ESR_RECVILL;
>> +                set_bit(ilog2(APIC_ESR_RECVILL), &vlapic->hw.pending_esr);
> ... here.

Both fixed.

~Andrew