[v2] x86/irq: fixes for CPU hot{,un}plug

[PATCH v2 4/7] x86/irq: restrict CPU movement in set_desc_affinity()

Posted by Roger Pau Monne 1 year, 8 months ago

If external interrupts are using logical mode it's possible to have an overlap
between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS).  If
that's the case avoid assigning a new vector and just move the interrupt to a
member of ->arch.cpu_mask that overlaps with the provided mask and is online.

While there also add an extra assert to ensure the mask containing the possible
destinations is not empty before calling cpu_mask_to_apicid(), as at that point
having an empty mask is not expected.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/irq.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 263e502bc0f6..306e7756112f 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -837,19 +837,38 @@ void cf_check irq_complete_move(struct irq_desc *desc)
 
 unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
-    int ret;
-    unsigned long flags;
     cpumask_t dest_mask;
 
     if ( mask && !cpumask_intersects(mask, &cpu_online_map) )
         return BAD_APICID;
 
-    spin_lock_irqsave(&vector_lock, flags);
-    ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
-    spin_unlock_irqrestore(&vector_lock, flags);
+    /*
+     * mask input set can contain CPUs that are not online.  To decide whether
+     * the interrupt needs to be migrated restrict the input mask to the CPUs
+     * that are online.
+     */
+    if ( mask )
+        cpumask_and(&dest_mask, mask, &cpu_online_map);
+    else
+        cpumask_copy(&dest_mask, TARGET_CPUS);
 
-    if ( ret < 0 )
-        return BAD_APICID;
+    /*
+     * Only move the interrupt if there are no CPUs left in ->arch.cpu_mask
+     * that can handle it, otherwise just shuffle it around ->arch.cpu_mask
+     * to an available destination.
+     */
+    if ( !cpumask_intersects(desc->arch.cpu_mask, &dest_mask) )
+    {
+        int ret;
+        unsigned long flags;
+
+        spin_lock_irqsave(&vector_lock, flags);
+        ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
+        spin_unlock_irqrestore(&vector_lock, flags);
+
+        if ( ret < 0 )
+            return BAD_APICID;
+    }
 
     if ( mask )
     {
@@ -862,6 +881,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
         cpumask_copy(&dest_mask, desc->arch.cpu_mask);
     }
     cpumask_and(&dest_mask, &dest_mask, &cpu_online_map);
+    ASSERT(!cpumask_empty(&dest_mask));
 
     return cpu_mask_to_apicid(&dest_mask);
 }
-- 
2.44.0

Re: [PATCH v2 4/7] x86/irq: restrict CPU movement in set_desc_affinity()

Posted by Jan Beulich 1 year, 8 months ago

On 10.06.2024 16:20, Roger Pau Monne wrote:
> If external interrupts are using logical mode it's possible to have an overlap
> between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS).  If
> that's the case avoid assigning a new vector and just move the interrupt to a
> member of ->arch.cpu_mask that overlaps with the provided mask and is online.

What I'm kind of missing here is an explanation of why what _assign_irq_vector()
does to avoid unnecessary migration (very first conditional there) isn't
sufficient.

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -837,19 +837,38 @@ void cf_check irq_complete_move(struct irq_desc *desc)
>  
>  unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
>  {
> -    int ret;
> -    unsigned long flags;
>      cpumask_t dest_mask;
>  
>      if ( mask && !cpumask_intersects(mask, &cpu_online_map) )
>          return BAD_APICID;
>  
> -    spin_lock_irqsave(&vector_lock, flags);
> -    ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
> -    spin_unlock_irqrestore(&vector_lock, flags);
> +    /*
> +     * mask input set can contain CPUs that are not online.  To decide whether
> +     * the interrupt needs to be migrated restrict the input mask to the CPUs
> +     * that are online.
> +     */
> +    if ( mask )
> +        cpumask_and(&dest_mask, mask, &cpu_online_map);
> +    else
> +        cpumask_copy(&dest_mask, TARGET_CPUS);

Why once &cpu_online_map and once TARGET_CPUS?

> -    if ( ret < 0 )
> -        return BAD_APICID;
> +    /*
> +     * Only move the interrupt if there are no CPUs left in ->arch.cpu_mask
> +     * that can handle it, otherwise just shuffle it around ->arch.cpu_mask
> +     * to an available destination.
> +     */

"an available destination" (singular) gives the impression that it's only
ever going to be a single CPU. Yet cpu_mask_to_apicid_flat() and
cpu_mask_to_apicid_x2apic_cluster() can produce sets of multiple CPUs.
Therefore maybe "an available destination / the (sub)set of available
destinations"? Or as that's getting longish "(an) available destination(s)"?

> +    if ( !cpumask_intersects(desc->arch.cpu_mask, &dest_mask) )
> +    {
> +        int ret;
> +        unsigned long flags;
> +
> +        spin_lock_irqsave(&vector_lock, flags);
> +        ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);

Why not pass dest_mask here, as you now calculate that up front? The
function will skip offline CPUs anyway.

> @@ -862,6 +881,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
>          cpumask_copy(&dest_mask, desc->arch.cpu_mask);
>      }
>      cpumask_and(&dest_mask, &dest_mask, &cpu_online_map);
> +    ASSERT(!cpumask_empty(&dest_mask));
>  
>      return cpu_mask_to_apicid(&dest_mask);

I wonder whether the assertion wouldn't better live in cpu_mask_to_apicid()
itself (the macro, not the backing functions).

Jan

Re: [PATCH v2 4/7] x86/irq: restrict CPU movement in set_desc_affinity()

Posted by Roger Pau Monné 1 year, 8 months ago

On Tue, Jun 11, 2024 at 12:20:33PM +0200, Jan Beulich wrote:
> On 10.06.2024 16:20, Roger Pau Monne wrote:
> > If external interrupts are using logical mode it's possible to have an overlap
> > between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS).  If
> > that's the case avoid assigning a new vector and just move the interrupt to a
> > member of ->arch.cpu_mask that overlaps with the provided mask and is online.
> 
> What I'm kind of missing here is an explanation of why what _assign_irq_vector()
> does to avoid unnecessary migration (very first conditional there) isn't
> sufficient.

Somehow I looked at that and think it wasn't enough, but now I cannot
figure out why, so it might be just fine, and this patch is not
needed.  Let me test again and get back to you, for the time being
ignore this patch.

Thanks, Roger.