[Xen-devel] [PATCH 0/9] x86: IRQ management adjustments
Posted by Jan Beulich 4 years, 11 months ago
First and foremost this series is trying to deal with CPU offlining
issues, which have become more prominent with the recently
added SMT enable/disable operation in xen-hptool. Later patches
in the series then carry out more or less unrelated changes
(hopefully improvements) noticed while looking at various pieces
of involved code.

The first patch introduces an ASSERT() which I've observed to
trigger every once in a while. I'm still trying to find the cause of
this, hence the RFC for that one patch.

1: x86/IRQ: deal with move-in-progress state in fixup_irqs()
2: x86/IRQ: deal with move cleanup count state in fixup_irqs()
3: x86/IRQ: improve dump_irqs()
4: x86/IRQ: desc->affinity should strictly represent the requested value
5: x86/IRQ: fix locking around vector management
6: x86/IRQ: reduce unused space in struct arch_irq_desc
7: x86/IRQ: drop redundant cpumask_empty() from move_masked_irq()
8: x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
9: x86/IO-APIC: drop an unused variable from setup_IO_APIC_irqs()

Jan



[Xen-devel] [PATCH RFC 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 11 months ago
The flag being set may prevent affinity changes, as these often imply
assignment of a new vector. When there's no possible destination left
for the IRQ, the clearing of the flag needs to happen right from
fixup_irqs().

Additionally _assign_irq_vector() needs to avoid setting the flag when
there's no online CPU left in what gets put into ->arch.old_cpu_mask.
The old vector can be released right away in this case.

Also extend the log message about broken affinity to include the new
affinity as well, making it possible to notice affinity changes that
didn't actually take effect. Swap the if/else-if order there at the
same time to reduce the number of conditions checked.

Finally, replace two open-coded instances of the vector release logic
with the new helper function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: I've seen the new ASSERT() in irq_move_cleanup_interrupt() trigger.
     I'm pretty sure that this assertion triggering means something else
     is wrong, and has been even prior to this change (adding the
     assertion without any of the other changes here should be valid in
     my understanding).

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
     xfree(action);
 }
 
+static void release_old_vec(struct irq_desc *desc)
+{
+    unsigned int vector = desc->arch.old_vector;
+
+    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
+    cpumask_clear(desc->arch.old_cpu_mask);
+
+    if ( desc->arch.used_vectors )
+    {
+        ASSERT(test_bit(vector, desc->arch.used_vectors));
+        clear_bit(vector, desc->arch.used_vectors);
+    }
+}
+
 static void __clear_irq_vector(int irq)
 {
     int cpu, vector, old_vector;
@@ -285,14 +299,7 @@ static void __clear_irq_vector(int irq)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
 
-    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-    cpumask_clear(desc->arch.old_cpu_mask);
-
-    if ( desc->arch.used_vectors )
-    {
-        ASSERT(test_bit(old_vector, desc->arch.used_vectors));
-        clear_bit(old_vector, desc->arch.used_vectors);
-    }
+    release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
 }
@@ -517,12 +524,21 @@ next:
         /* Found one! */
         current_vector = vector;
         current_offset = offset;
-        if (old_vector > 0) {
-            desc->arch.move_in_progress = 1;
-            cpumask_copy(desc->arch.old_cpu_mask, desc->arch.cpu_mask);
+
+        if ( old_vector > 0 )
+        {
+            cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
+                        &cpu_online_map);
             desc->arch.old_vector = desc->arch.vector;
+            if ( !cpumask_empty(desc->arch.old_cpu_mask) )
+                desc->arch.move_in_progress = 1;
+            else
+                /* This can happen while offlining a CPU. */
+                release_old_vec(desc);
         }
+
         trace_irq_mask(TRC_HW_IRQ_ASSIGN_VECTOR, irq, vector, &tmp_mask);
+
         for_each_cpu(new_cpu, &tmp_mask)
             per_cpu(vector_irq, new_cpu)[vector] = irq;
         desc->arch.vector = vector;
@@ -691,14 +707,8 @@ void irq_move_cleanup_interrupt(struct c
 
         if ( desc->arch.move_cleanup_count == 0 )
         {
-            desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-            cpumask_clear(desc->arch.old_cpu_mask);
-
-            if ( desc->arch.used_vectors )
-            {
-                ASSERT(test_bit(vector, desc->arch.used_vectors));
-                clear_bit(vector, desc->arch.used_vectors);
-            }
+            ASSERT(vector == desc->arch.old_vector);
+            release_old_vec(desc);
         }
 unlock:
         spin_unlock(&desc->lock);
@@ -2391,6 +2401,24 @@ void fixup_irqs(const cpumask_t *mask, b
             continue;
         }
 
+        /*
+         * In order for the affinity adjustment below to be successful, we
+         * need __assign_irq_vector() to succeed. This in particular means
+         * clearing desc->arch.move_in_progress if this would otherwise
+         * prevent the function from succeeding. Since there's no way for the
+         * flag to get cleared anymore when there's no possible destination
+         * left (the only possibility then would be the IRQs enabled window
+         * after this loop), there's then also no race with us doing it here.
+         *
+         * Therefore the logic here and there need to remain in sync.
+         */
+        if ( desc->arch.move_in_progress &&
+             !cpumask_intersects(mask, desc->arch.cpu_mask) )
+        {
+            release_old_vec(desc);
+            desc->arch.move_in_progress = 0;
+        }
+
         cpumask_and(&affinity, &affinity, mask);
         if ( cpumask_empty(&affinity) )
         {
@@ -2409,15 +2437,18 @@ void fixup_irqs(const cpumask_t *mask, b
         if ( desc->handler->enable )
             desc->handler->enable(desc);
 
+        cpumask_copy(&affinity, desc->affinity);
+
         spin_unlock(&desc->lock);
 
         if ( !verbose )
             continue;
 
-        if ( break_affinity && set_affinity )
-            printk("Broke affinity for irq %i\n", irq);
-        else if ( !set_affinity )
-            printk("Cannot set affinity for irq %i\n", irq);
+        if ( !set_affinity )
+            printk("Cannot set affinity for IRQ%u\n", irq);
+        else if ( break_affinity )
+            printk("Broke affinity for IRQ%u, new: %*pb\n",
+                   irq, nr_cpu_ids, &affinity);
     }
 
     /* That doesn't seem sufficient.  Give it 1ms. */




Re: [Xen-devel] [PATCH RFC 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 11 months ago
>>> On 29.04.19 at 13:22, <JBeulich@suse.com> wrote:
> RFC: I've seen the new ASSERT() in irq_move_cleanup_interrupt() trigger.
>      I'm pretty sure that this assertion triggering means something else
>      is wrong, and has been even prior to this change (adding the
>      assertion without any of the other changes here should be valid in
>      my understanding).

So I think what is missing is updating of vector_irq ...

> @@ -2391,6 +2401,24 @@ void fixup_irqs(const cpumask_t *mask, b
>              continue;
>          }
>  
> +        /*
> +         * In order for the affinity adjustment below to be successful, we
> +         * need __assign_irq_vector() to succeed. This in particular means
> +         * clearing desc->arch.move_in_progress if this would otherwise
> +         * prevent the function from succeeding. Since there's no way for the
> +         * flag to get cleared anymore when there's no possible destination
> +         * left (the only possibility then would be the IRQs enabled window
> +         * after this loop), there's then also no race with us doing it here.
> +         *
> +         * Therefore the logic here and there need to remain in sync.
> +         */
> +        if ( desc->arch.move_in_progress &&
> +             !cpumask_intersects(mask, desc->arch.cpu_mask) )
> +        {
> +            release_old_vec(desc);
> +            desc->arch.move_in_progress = 0;
> +        }

... here and in the somewhat similar logic patch 2 inserts a few lines
up. I'm about to try this out, but given how rarely I've seen the
problem this will take a while to feel confident (if, of course, it helps
in the first place).

Jan



Re: [Xen-devel] [PATCH RFC 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 11 months ago
>>> On 29.04.19 at 14:55, <JBeulich@suse.com> wrote:
>>>> On 29.04.19 at 13:22, <JBeulich@suse.com> wrote:
>> RFC: I've seen the new ASSERT() in irq_move_cleanup_interrupt() trigger.
>>      I'm pretty sure that this assertion triggering means something else
>>      is wrong, and has been even prior to this change (adding the
>>      assertion without any of the other changes here should be valid in
>>      my understanding).
> 
> So I think what is missing is updating of vector_irq ...
> 
>> @@ -2391,6 +2401,24 @@ void fixup_irqs(const cpumask_t *mask, b
>>              continue;
>>          }
>>  
>> +        /*
>> +         * In order for the affinity adjustment below to be successful, we
>> +         * need __assign_irq_vector() to succeed. This in particular means
>> +         * clearing desc->arch.move_in_progress if this would otherwise
>> +         * prevent the function from succeeding. Since there's no way for the
>> +         * flag to get cleared anymore when there's no possible destination
>> +         * left (the only possibility then would be the IRQs enabled window
>> +         * after this loop), there's then also no race with us doing it here.
>> +         *
>> +         * Therefore the logic here and there need to remain in sync.
>> +         */
>> +        if ( desc->arch.move_in_progress &&
>> +             !cpumask_intersects(mask, desc->arch.cpu_mask) )
>> +        {
>> +            release_old_vec(desc);
>> +            desc->arch.move_in_progress = 0;
>> +        }
> 
> ... here and in the somewhat similar logic patch 2 inserts a few lines
> up. I'm about to try this out, but given how rarely I've seen the
> problem this will take a while to feel confident (if, of course, it helps
> in the first place).

Actually no, the 2nd patch doesn't need any change - the code
added there only deals with CPUs already marked offline.

Jan



[Xen-devel] [PATCH v1b 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 11 months ago
The flag being set may prevent affinity changes, as these often imply
assignment of a new vector. When there's no possible destination left
for the IRQ, the clearing of the flag needs to happen right from
fixup_irqs().

Additionally _assign_irq_vector() needs to avoid setting the flag when
there's no online CPU left in what gets put into ->arch.old_cpu_mask.
The old vector can be released right away in this case.

Also extend the log message about broken affinity to include the new
affinity as well, making it possible to notice affinity changes that
didn't actually take effect. Swap the if/else-if order there at the
same time to reduce the number of conditions checked.

Finally, replace two open-coded instances of the vector release logic
with the new helper function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Also update vector_irq[] in the code added to fixup_irqs().

--- unstable.orig/xen/arch/x86/irq.c	2019-04-29 17:34:16.726542659 +0200
+++ unstable/xen/arch/x86/irq.c	2019-04-29 15:05:39.000000000 +0200
@@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
     xfree(action);
 }
 
+static void release_old_vec(struct irq_desc *desc)
+{
+    unsigned int vector = desc->arch.old_vector;
+
+    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
+    cpumask_clear(desc->arch.old_cpu_mask);
+
+    if ( desc->arch.used_vectors )
+    {
+        ASSERT(test_bit(vector, desc->arch.used_vectors));
+        clear_bit(vector, desc->arch.used_vectors);
+    }
+}
+
 static void __clear_irq_vector(int irq)
 {
     int cpu, vector, old_vector;
@@ -285,14 +299,7 @@ static void __clear_irq_vector(int irq)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
 
-    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-    cpumask_clear(desc->arch.old_cpu_mask);
-
-    if ( desc->arch.used_vectors )
-    {
-        ASSERT(test_bit(old_vector, desc->arch.used_vectors));
-        clear_bit(old_vector, desc->arch.used_vectors);
-    }
+    release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
 }
@@ -517,12 +524,21 @@ next:
         /* Found one! */
         current_vector = vector;
         current_offset = offset;
-        if (old_vector > 0) {
-            desc->arch.move_in_progress = 1;
-            cpumask_copy(desc->arch.old_cpu_mask, desc->arch.cpu_mask);
+
+        if ( old_vector > 0 )
+        {
+            cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
+                        &cpu_online_map);
             desc->arch.old_vector = desc->arch.vector;
+            if ( !cpumask_empty(desc->arch.old_cpu_mask) )
+                desc->arch.move_in_progress = 1;
+            else
+                /* This can happen while offlining a CPU. */
+                release_old_vec(desc);
         }
+
         trace_irq_mask(TRC_HW_IRQ_ASSIGN_VECTOR, irq, vector, &tmp_mask);
+
         for_each_cpu(new_cpu, &tmp_mask)
             per_cpu(vector_irq, new_cpu)[vector] = irq;
         desc->arch.vector = vector;
@@ -691,14 +707,8 @@ void irq_move_cleanup_interrupt(struct c
 
         if ( desc->arch.move_cleanup_count == 0 )
         {
-            desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-            cpumask_clear(desc->arch.old_cpu_mask);
-
-            if ( desc->arch.used_vectors )
-            {
-                ASSERT(test_bit(vector, desc->arch.used_vectors));
-                clear_bit(vector, desc->arch.used_vectors);
-            }
+            ASSERT(vector == desc->arch.old_vector);
+            release_old_vec(desc);
         }
 unlock:
         spin_unlock(&desc->lock);
@@ -2391,6 +2401,33 @@ void fixup_irqs(const cpumask_t *mask, b
             continue;
         }
 
+        /*
+         * In order for the affinity adjustment below to be successful, we
+         * need __assign_irq_vector() to succeed. This in particular means
+         * clearing desc->arch.move_in_progress if this would otherwise
+         * prevent the function from succeeding. Since there's no way for the
+         * flag to get cleared anymore when there's no possible destination
+         * left (the only possibility then would be the IRQs enabled window
+         * after this loop), there's then also no race with us doing it here.
+         *
+         * Therefore the logic here and there need to remain in sync.
+         */
+        if ( desc->arch.move_in_progress &&
+             !cpumask_intersects(mask, desc->arch.cpu_mask) )
+        {
+            unsigned int cpu;
+
+            cpumask_and(&affinity, desc->arch.old_cpu_mask, &cpu_online_map);
+
+            spin_lock(&vector_lock);
+            for_each_cpu(cpu, &affinity)
+                per_cpu(vector_irq, cpu)[desc->arch.old_vector] = ~irq;
+            spin_unlock(&vector_lock);
+
+            release_old_vec(desc);
+            desc->arch.move_in_progress = 0;
+        }
+
         cpumask_and(&affinity, &affinity, mask);
         if ( cpumask_empty(&affinity) )
         {
@@ -2409,15 +2446,18 @@ void fixup_irqs(const cpumask_t *mask, b
         if ( desc->handler->enable )
             desc->handler->enable(desc);
 
+        cpumask_copy(&affinity, desc->affinity);
+
         spin_unlock(&desc->lock);
 
         if ( !verbose )
             continue;
 
-        if ( break_affinity && set_affinity )
-            printk("Broke affinity for irq %i\n", irq);
-        else if ( !set_affinity )
-            printk("Cannot set affinity for irq %i\n", irq);
+        if ( !set_affinity )
+            printk("Cannot set affinity for IRQ%u\n", irq);
+        else if ( break_affinity )
+            printk("Broke affinity for IRQ%u, new: %*pb\n",
+                   irq, nr_cpu_ids, &affinity);
     }
 
     /* That doesn't seem sufficient.  Give it 1ms. */




Re: [Xen-devel] [PATCH v1b 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 09:40:14AM -0600, Jan Beulich wrote:
> The flag being set may prevent affinity changes, as these often imply
> assignment of a new vector. When there's no possible destination left
> for the IRQ, the clearing of the flag needs to happen right from
> fixup_irqs().
> 
> Additionally _assign_irq_vector() needs to avoid setting the flag when
> there's no online CPU left in what gets put into ->arch.old_cpu_mask.
> The old vector can be released right away in this case.
> 
> Also extend the log message about broken affinity to include the new
> affinity as well, making it possible to notice affinity changes that
> didn't actually take effect. Swap the if/else-if order there at the
> same time to reduce the number of conditions checked.
> 
> Finally, replace two open-coded instances of the vector release logic
> with the new helper function.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Also update vector_irq[] in the code added to fixup_irqs().
> 
> --- unstable.orig/xen/arch/x86/irq.c	2019-04-29 17:34:16.726542659 +0200
> +++ unstable/xen/arch/x86/irq.c	2019-04-29 15:05:39.000000000 +0200
> @@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
>      xfree(action);
>  }
>  
> +static void release_old_vec(struct irq_desc *desc)
> +{
> +    unsigned int vector = desc->arch.old_vector;
> +
> +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> +    cpumask_clear(desc->arch.old_cpu_mask);
> +
> +    if ( desc->arch.used_vectors )

Wouldn't it be better to clean the bitmap when vector !=
IRQ_VECTOR_UNASSIGNED?

I haven't checked all the callers, but I don't think it's valid to
call release_old_vec with desc->arch.old_vector ==
IRQ_VECTOR_UNASSIGNED, in which case I would add an ASSERT.

> +    {
> +        ASSERT(test_bit(vector, desc->arch.used_vectors));
> +        clear_bit(vector, desc->arch.used_vectors);
> +    }
> +}
> +
>  static void __clear_irq_vector(int irq)
>  {
>      int cpu, vector, old_vector;
> @@ -285,14 +299,7 @@ static void __clear_irq_vector(int irq)

Kind of unrelated, but I think the check at the top of
__clear_irq_vector should be:

BUG_ON(desc->arch.vector == IRQ_VECTOR_UNASSIGNED);

Rather than the current:

BUG_ON(!desc->arch.vector);

There's a lot of logic that would go extremely wrong if vector is -1.

>          per_cpu(vector_irq, cpu)[old_vector] = ~irq;
>      }
>  
> -    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> -    cpumask_clear(desc->arch.old_cpu_mask);
> -
> -    if ( desc->arch.used_vectors )
> -    {
> -        ASSERT(test_bit(old_vector, desc->arch.used_vectors));
> -        clear_bit(old_vector, desc->arch.used_vectors);
> -    }
> +    release_old_vec(desc);
>  
>      desc->arch.move_in_progress = 0;

While there it might be nice to convert move_in_progress to a boolean.

>  }
> @@ -517,12 +524,21 @@ next:
>          /* Found one! */
>          current_vector = vector;
>          current_offset = offset;
> -        if (old_vector > 0) {
> -            desc->arch.move_in_progress = 1;
> -            cpumask_copy(desc->arch.old_cpu_mask, desc->arch.cpu_mask);
> +
> +        if ( old_vector > 0 )
> +        {
> +            cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
> +                        &cpu_online_map);
>              desc->arch.old_vector = desc->arch.vector;
> +            if ( !cpumask_empty(desc->arch.old_cpu_mask) )
> +                desc->arch.move_in_progress = 1;
> +            else
> +                /* This can happen while offlining a CPU. */
> +                release_old_vec(desc);
>          }
> +
>          trace_irq_mask(TRC_HW_IRQ_ASSIGN_VECTOR, irq, vector, &tmp_mask);
> +
>          for_each_cpu(new_cpu, &tmp_mask)
>              per_cpu(vector_irq, new_cpu)[vector] = irq;
>          desc->arch.vector = vector;
> @@ -691,14 +707,8 @@ void irq_move_cleanup_interrupt(struct c
>  
>          if ( desc->arch.move_cleanup_count == 0 )
>          {
> -            desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> -            cpumask_clear(desc->arch.old_cpu_mask);
> -
> -            if ( desc->arch.used_vectors )
> -            {
> -                ASSERT(test_bit(vector, desc->arch.used_vectors));
> -                clear_bit(vector, desc->arch.used_vectors);
> -            }
> +            ASSERT(vector == desc->arch.old_vector);
> +            release_old_vec(desc);
>          }
>  unlock:
>          spin_unlock(&desc->lock);
> @@ -2391,6 +2401,33 @@ void fixup_irqs(const cpumask_t *mask, b
>              continue;
>          }
>  
> +        /*
> +         * In order for the affinity adjustment below to be successful, we
> +         * need __assign_irq_vector() to succeed. This in particular means
> +         * clearing desc->arch.move_in_progress if this would otherwise
> +         * prevent the function from succeeding. Since there's no way for the
> +         * flag to get cleared anymore when there's no possible destination
> +         * left (the only possibility then would be the IRQs enabled window
> +         * after this loop), there's then also no race with us doing it here.
> +         *
> +         * Therefore the logic here and there need to remain in sync.
> +         */
> +        if ( desc->arch.move_in_progress &&
> +             !cpumask_intersects(mask, desc->arch.cpu_mask) )
> +        {
> +            unsigned int cpu;
> +
> +            cpumask_and(&affinity, desc->arch.old_cpu_mask, &cpu_online_map);
> +
> +            spin_lock(&vector_lock);
> +            for_each_cpu(cpu, &affinity)
> +                per_cpu(vector_irq, cpu)[desc->arch.old_vector] = ~irq;
> +            spin_unlock(&vector_lock);
> +
> +            release_old_vec(desc);
> +            desc->arch.move_in_progress = 0;
> +        }
> +
>          cpumask_and(&affinity, &affinity, mask);
>          if ( cpumask_empty(&affinity) )
>          {
> @@ -2409,15 +2446,18 @@ void fixup_irqs(const cpumask_t *mask, b
>          if ( desc->handler->enable )
>              desc->handler->enable(desc);
>  
> +        cpumask_copy(&affinity, desc->affinity);
> +
>          spin_unlock(&desc->lock);
>  
>          if ( !verbose )
>              continue;
>  
> -        if ( break_affinity && set_affinity )
> -            printk("Broke affinity for irq %i\n", irq);
> -        else if ( !set_affinity )
> -            printk("Cannot set affinity for irq %i\n", irq);
> +        if ( !set_affinity )
> +            printk("Cannot set affinity for IRQ%u\n", irq);
> +        else if ( break_affinity )
> +            printk("Broke affinity for IRQ%u, new: %*pb\n",
> +                   irq, nr_cpu_ids, &affinity);

I guess it's fine to have those without rate-limiting because
fixup_irqs is only called for admin-triggered actions, so there's no
risk of console flooding.

Thanks, Roger.

Re: [Xen-devel] [PATCH v1b 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 03.05.19 at 11:19, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 09:40:14AM -0600, Jan Beulich wrote:
>> --- unstable.orig/xen/arch/x86/irq.c	
>> +++ unstable/xen/arch/x86/irq.c
>> @@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
>>      xfree(action);
>>  }
>>  
>> +static void release_old_vec(struct irq_desc *desc)
>> +{
>> +    unsigned int vector = desc->arch.old_vector;
>> +
>> +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
>> +    cpumask_clear(desc->arch.old_cpu_mask);
>> +
>> +    if ( desc->arch.used_vectors )
> 
> Wouldn't it be better to clean the bitmap when vector !=
> IRQ_VECTOR_UNASSIGNED?

No code path does / should call into here without the need to
actually release the previous vector.

> I haven't checked all the callers, but I don't think it's valid to
> call release_old_vec with desc->arch.old_vector ==
> IRQ_VECTOR_UNASSIGNED, in which case I would add an ASSERT.

Well, yes, I probably could. However, as much as I'm in
favor of ASSERT()s, I don't think it makes sense to ASSERT()
basically every bit of expected state. In the end there would
otherwise be more ASSERT()s than actual code.

>> +    {
>> +        ASSERT(test_bit(vector, desc->arch.used_vectors));
>> +        clear_bit(vector, desc->arch.used_vectors);
>> +    }
>> +}
>> +
>>  static void __clear_irq_vector(int irq)
>>  {
>>      int cpu, vector, old_vector;
>> @@ -285,14 +299,7 @@ static void __clear_irq_vector(int irq)
> 
> Kind of unrelated, but I think the check at the top of
> __clear_irq_vector should be:
> 
> BUG_ON(desc->arch.vector == IRQ_VECTOR_UNASSIGNED);
> 
> Rather than the current:
> 
> BUG_ON(!desc->arch.vector);
> 
> There's a lot of logic that would go extremely wrong if vector is -1.

Yes indeed. Do you want to send a patch, or should I add
one at the end of this series?

>>          per_cpu(vector_irq, cpu)[old_vector] = ~irq;
>>      }
>>  
>> -    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
>> -    cpumask_clear(desc->arch.old_cpu_mask);
>> -
>> -    if ( desc->arch.used_vectors )
>> -    {
>> -        ASSERT(test_bit(old_vector, desc->arch.used_vectors));
>> -        clear_bit(old_vector, desc->arch.used_vectors);
>> -    }
>> +    release_old_vec(desc);
>>  
>>      desc->arch.move_in_progress = 0;
> 
> While there it might be nice to convert move_in_progress to a boolean.

This would grow the patch quite a bit I think, so I prefer not to.

>> @@ -2409,15 +2446,18 @@ void fixup_irqs(const cpumask_t *mask, b
>>          if ( desc->handler->enable )
>>              desc->handler->enable(desc);
>>  
>> +        cpumask_copy(&affinity, desc->affinity);
>> +
>>          spin_unlock(&desc->lock);
>>  
>>          if ( !verbose )
>>              continue;
>>  
>> -        if ( break_affinity && set_affinity )
>> -            printk("Broke affinity for irq %i\n", irq);
>> -        else if ( !set_affinity )
>> -            printk("Cannot set affinity for irq %i\n", irq);
>> +        if ( !set_affinity )
>> +            printk("Cannot set affinity for IRQ%u\n", irq);
>> +        else if ( break_affinity )
>> +            printk("Broke affinity for IRQ%u, new: %*pb\n",
>> +                   irq, nr_cpu_ids, &affinity);
> 
> I guess it's fine to have those without rate-limiting because
> fixup_irqs is only called for admin-triggered actions, so there's no
> risk of console flooding.

Right, plus I'd rather not hide any of these messages: Them
being there was already a good indication that something
_might_ be going wrong. If we got to the point where we're
fully confident in the code, then we could think about lowering
their log level, or rate limiting them.

Jan



Re: [Xen-devel] [PATCH v1b 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 03.05.19 at 16:10, <JBeulich@suse.com> wrote:
>>>> On 03.05.19 at 11:19, <roger.pau@citrix.com> wrote:
>> On Mon, Apr 29, 2019 at 09:40:14AM -0600, Jan Beulich wrote:
>>> --- unstable.orig/xen/arch/x86/irq.c	
>>> +++ unstable/xen/arch/x86/irq.c
>>> @@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
>>>      xfree(action);
>>>  }
>>>  
>>> +static void release_old_vec(struct irq_desc *desc)
>>> +{
>>> +    unsigned int vector = desc->arch.old_vector;
>>> +
>>> +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
>>> +    cpumask_clear(desc->arch.old_cpu_mask);
>>> +
>>> +    if ( desc->arch.used_vectors )
>> 
>> Wouldn't it be better to clean the bitmap when vector !=
>> IRQ_VECTOR_UNASSIGNED?
> 
> No code path does / should call into here without the need to
> actually release the previous vector.
> 
>> I haven't checked all the callers, but I don't think it's valid to
>> call release_old_vec with desc->arch.old_vector ==
>> IRQ_VECTOR_UNASSIGNED, in which case I would add an ASSERT.
> 
> Well, yes, I probably could. However, as much as I'm in
> favor of ASSERT()s, I don't think it makes sense to ASSERT()
> basically every bit of expected state. In the end there would
> otherwise be more ASSERT()s than actual code.

Actually, upon second thought - let me add this, but then in an
even more strict form: Certain very low and very high numbered
vectors are illegal here as well, and we may then be able to use
the same validation helper elsewhere (in particular also for the
check that you've found to be wrong in _clear_irq_vector()).
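
(For illustration only: a minimal sketch of what such a stricter check
could look like, reusing the release_old_vec() shape quoted above. The
helper name and the choice of FIRST_DYNAMIC_VECTOR/LAST_HIPRIORITY_VECTOR
as the bounds are assumptions here, not taken from the eventual patch:)

static bool valid_irq_vector(unsigned int vector)
{
    /* Reject the very low and very high numbered vectors mentioned above. */
    return vector >= FIRST_DYNAMIC_VECTOR && vector <= LAST_HIPRIORITY_VECTOR;
}

static void release_old_vec(struct irq_desc *desc)
{
    unsigned int vector = desc->arch.old_vector;

    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
    cpumask_clear(desc->arch.old_cpu_mask);

    /* Stricter than only checking against IRQ_VECTOR_UNASSIGNED. */
    ASSERT(valid_irq_vector(vector));

    if ( desc->arch.used_vectors )
    {
        ASSERT(test_bit(vector, desc->arch.used_vectors));
        clear_bit(vector, desc->arch.used_vectors);
    }
}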

Jan



Re: [Xen-devel] [PATCH v1b 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, May 06, 2019 at 01:15:59AM -0600, Jan Beulich wrote:
> >>> On 03.05.19 at 16:10, <JBeulich@suse.com> wrote:
> >>>> On 03.05.19 at 11:19, <roger.pau@citrix.com> wrote:
> >> On Mon, Apr 29, 2019 at 09:40:14AM -0600, Jan Beulich wrote:
> >>> --- unstable.orig/xen/arch/x86/irq.c	
> >>> +++ unstable/xen/arch/x86/irq.c
> >>> @@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
> >>>      xfree(action);
> >>>  }
> >>>  
> >>> +static void release_old_vec(struct irq_desc *desc)
> >>> +{
> >>> +    unsigned int vector = desc->arch.old_vector;
> >>> +
> >>> +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> >>> +    cpumask_clear(desc->arch.old_cpu_mask);
> >>> +
> >>> +    if ( desc->arch.used_vectors )
> >> 
> >> Wouldn't it be better to clean the bitmap when vector !=
> >> IRQ_VECTOR_UNASSIGNED?
> > 
> > No code path does / should call into here without the need to
> > actually release the previous vector.
> > 
> >> I haven't checked all the callers, but I don't think it's valid to
> >> call release_old_vec with desc->arch.old_vector ==
> >> IRQ_VECTOR_UNASSIGNED, in which case I would add an ASSERT.
> > 
> > Well, yes, I probably could. However, as much as I'm in
> > favor of ASSERT()s, I don't think it makes sense to ASSERT()
> > basically every bit of expected state. In the end there would
> > otherwise be more ASSERT()s than actual code.
> 
> Actually, upon second thought - let me add this, but then in an
> even more strict form: Certain very low and very high numbered
> vectors are illegal here as well, and we may then be able to use
> the same validation helper elsewhere (in particular also for the
> check that you've found to be wrong in _clear_irq_vector()).

Thanks, that LGTM.

Roger.

Re: [Xen-devel] [PATCH v1b 1/9] x86/IRQ: deal with move-in-progress state in fixup_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 06.05.19 at 16:28, <roger.pau@citrix.com> wrote:
> On Mon, May 06, 2019 at 01:15:59AM -0600, Jan Beulich wrote:
>> >>> On 03.05.19 at 16:10, <JBeulich@suse.com> wrote:
>> >>>> On 03.05.19 at 11:19, <roger.pau@citrix.com> wrote:
>> >> On Mon, Apr 29, 2019 at 09:40:14AM -0600, Jan Beulich wrote:
>> >>> --- unstable.orig/xen/arch/x86/irq.c	
>> >>> +++ unstable/xen/arch/x86/irq.c
>> >>> @@ -242,6 +242,20 @@ void destroy_irq(unsigned int irq)
>> >>>      xfree(action);
>> >>>  }
>> >>>  
>> >>> +static void release_old_vec(struct irq_desc *desc)
>> >>> +{
>> >>> +    unsigned int vector = desc->arch.old_vector;
>> >>> +
>> >>> +    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
>> >>> +    cpumask_clear(desc->arch.old_cpu_mask);
>> >>> +
>> >>> +    if ( desc->arch.used_vectors )
>> >> 
>> >> Wouldn't it be better to clean the bitmap when vector !=
>> >> IRQ_VECTOR_UNASSIGNED?
>> > 
>> > No code path does / should call into here without the need to
>> > actually release the previous vector.
>> > 
>> >> I haven't checked all the callers, but I don't think it's valid to
>> >> call release_old_vec with desc->arch.old_vector ==
>> >> IRQ_VECTOR_UNASSIGNED, in which case I would add an ASSERT.
>> > 
>> > Well, yes, I probably could. However, as much as I'm in
>> > favor of ASSERT()s, I don't think it makes sense to ASSERT()
>> > basically every bit of expected state. In the end there would
>> > otherwise be more ASSERT()s than actual code.
>> 
>> Actually, upon second thought - let me add this, but then in an
>> even more strict form: Certain very low and very high numbered
>> vectors are illegal here as well, and we may then be able to use
>> the same validation helper elsewhere (in particular also for the
>> check that you've found to be wrong in _clear_irq_vector()).
> 
> Thanks, that LGTM.

And FTR - it _does_ trigger. I'm still struggling to explain why.
The only place where ->arch.move_in_progress gets set is
in _assign_irq_vector(), and the check I've put there for
debugging purposes doesn't trigger, i.e. the vectors put there
into ->arch.old_vector are valid.

Jan



[Xen-devel] [PATCH 2/9] x86/IRQ: deal with move cleanup count state in fixup_irqs()
Posted by Jan Beulich 4 years, 11 months ago
The cleanup IPI may get sent immediately before a CPU gets removed from
the online map. In such a case the IPI would get handled on the CPU
being offlined no earlier than in the interrupts disabled window after
fixup_irqs()' main loop. This is too late, however, because a possible
affinity change may incur the need for vector assignment, which will
fail when the IRQ's move cleanup count is still non-zero.

To fix this
- record the set of CPUs the cleanup IPI actually gets sent to alongside
  setting their count,
- adjust the count in fixup_irqs(), accounting for all CPUs that the
  cleanup IPI was sent to, but that are no longer online,
- bail early from the cleanup IPI handler when the CPU is no longer
  online, to prevent double accounting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: The proper recording of the IPI destinations actually makes the
     move_cleanup_count field redundant. Do we want to drop it, at the
     price of a few more CPU-mask operations?

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -658,6 +658,9 @@ void irq_move_cleanup_interrupt(struct c
     ack_APIC_irq();
 
     me = smp_processor_id();
+    if ( !cpu_online(me) )
+        return;
+
     for ( vector = FIRST_DYNAMIC_VECTOR;
           vector <= LAST_HIPRIORITY_VECTOR; vector++)
     {
@@ -717,11 +720,14 @@ unlock:
 
 static void send_cleanup_vector(struct irq_desc *desc)
 {
-    cpumask_t cleanup_mask;
+    cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
+                &cpu_online_map);
+    desc->arch.move_cleanup_count = cpumask_weight(desc->arch.old_cpu_mask);
 
-    cpumask_and(&cleanup_mask, desc->arch.old_cpu_mask, &cpu_online_map);
-    desc->arch.move_cleanup_count = cpumask_weight(&cleanup_mask);
-    send_IPI_mask(&cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+    if ( desc->arch.move_cleanup_count )
+        send_IPI_mask(desc->arch.old_cpu_mask, IRQ_MOVE_CLEANUP_VECTOR);
+    else
+        release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
 }
@@ -2394,6 +2400,16 @@ void fixup_irqs(const cpumask_t *mask, b
              vector <= LAST_HIPRIORITY_VECTOR )
             cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
 
+        if ( desc->arch.move_cleanup_count )
+        {
+            /* The cleanup IPI may have got sent while we were still online. */
+            cpumask_andnot(&affinity, desc->arch.old_cpu_mask,
+                           &cpu_online_map);
+            desc->arch.move_cleanup_count -= cpumask_weight(&affinity);
+            if ( !desc->arch.move_cleanup_count )
+                release_old_vec(desc);
+        }
+
         cpumask_copy(&affinity, desc->affinity);
         if ( !desc->action || cpumask_subset(&affinity, mask) )
         {





Re: [Xen-devel] [PATCH 2/9] x86/IRQ: deal with move cleanup count state in fixup_irqs()
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
> The cleanup IPI may get sent immediately before a CPU gets removed from
> the online map. In such a case the IPI would get handled on the CPU
> being offlined no earlier than in the interrupts disabled window after
> fixup_irqs()' main loop. This is too late, however, because a possible
> affinity change may incur the need for vector assignment, which will
> fail when the IRQ's move cleanup count is still non-zero.
> 
> To fix this
> - record the set of CPUs the cleanup IPI actually gets sent to alongside
>   setting their count,
> - adjust the count in fixup_irqs(), accounting for all CPUs that the
>   cleanup IPI was sent to, but that are no longer online,
> - bail early from the cleanup IPI handler when the CPU is no longer
>   online, to prevent double accounting.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just as a note, this whole interrupt migration business seems
extremely complex, and I wonder whether Xen really needs it, or what
exactly its performance gain is compared to simpler solutions. I
understand this is just fixes, but IMO it's making the logic even more
complex.

Maybe it would be simpler to have the interrupts hard-bound to pCPUs
and instead have a soft-affinity on the guest vCPUs that are assigned
as the destination?

> ---
> TBD: The proper recording of the IPI destinations actually makes the
>      move_cleanup_count field redundant. Do we want to drop it, at the
>      price of a few more CPU-mask operations?

AFAICT this is not a hot path, so I would remove the
move_cleanup_count field and just weight the cpu bitmap when needed.

Thanks, Roger.

Re: [Xen-devel] [PATCH 2/9] x86/IRQ: deal with move cleanup count state in fixup_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
>> ---
>> TBD: The proper recording of the IPI destinations actually makes the
>>      move_cleanup_count field redundant. Do we want to drop it, at the
>>      price of a few more CPU-mask operations?
> 
> AFAICT this is not a hot path, so I would remove the
> move_cleanup_count field and just weight the cpu bitmap when needed.

FTR: It's not fully redundant - the patch removing it that I had
added was actually the reason for seeing the ASSERT() trigger
that you did ask to add in patch 1. The reason is that the field
serves as a flag for irq_move_cleanup_interrupt() whether to
act on an IRQ in the first place. Without it, the function will
prematurely clean up the vector, thus confusing subsequent
code trying to do the cleanup when it's actually due.
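
(Abridged, illustrative sketch of that gating, not a verbatim quote of
irq_move_cleanup_interrupt(); it only shows where the field acts as the
"is any cleanup pending for this IRQ at all?" flag:)

        spin_lock(&desc->lock);

        /* Without the count acting as a flag, cleanup would happen too early. */
        if ( !desc->arch.move_cleanup_count )
            goto unlock;

        /*
         * Only past this point does the per-CPU vector_irq[] slot get
         * cleared and the count decremented; release_old_vec() then runs
         * once the count has dropped to zero.
         */

unlock:
        spin_unlock(&desc->lock);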

Jan



Re: [Xen-devel] [PATCH 2/9] x86/IRQ: deal with move cleanup count state in fixup_irqs()
Posted by Roger Pau Monné 4 years, 10 months ago
On Tue, May 07, 2019 at 01:28:36AM -0600, Jan Beulich wrote:
> >>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
> > On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
> >> ---
> >> TBD: The proper recording of the IPI destinations actually makes the
> >>      move_cleanup_count field redundant. Do we want to drop it, at the
> >>      price of a few more CPU-mask operations?
> > 
> > AFAICT this is not a hot path, so I would remove the
> > move_cleanup_count field and just weight the cpu bitmap when needed.
> 
> FTR: It's not fully redundant - the patch removing it that I had
> added was actually the reason for seeing the ASSERT() trigger
> that you did ask to add in patch 1. The reason is that the field
> serves as a flag for irq_move_cleanup_interrupt() whether to
> act on an IRQ in the first place. Without it, the function will
> prematurely clean up the vector, thus confusing subsequent
> code trying to do the cleanup when it's actually due.

So weighing desc->arch.old_cpu_mask is not equivalent to
move_cleanup_count?

Thanks, Roger.

Re: [Xen-devel] [PATCH 2/9] x86/IRQ: deal with move cleanup count state in fixup_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 07.05.19 at 10:12, <roger.pau@citrix.com> wrote:
> On Tue, May 07, 2019 at 01:28:36AM -0600, Jan Beulich wrote:
>> >>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
>> > On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
>> >> ---
>> >> TBD: The proper recording of the IPI destinations actually makes the
>> >>      move_cleanup_count field redundant. Do we want to drop it, at the
>> >>      price of a few more CPU-mask operations?
>> > 
>> > AFAICT this is not a hot path, so I would remove the
>> > move_cleanup_count field and just weight the cpu bitmap when needed.
>> 
>> FTR: It's not fully redundant - the patch removing it that I had
>> added was actually the reason for seeing the ASSERT() trigger
>> that you did ask to add in patch 1. The reason is that the field
>> serves as a flag for irq_move_cleanup_interrupt() whether to
>> act on an IRQ in the first place. Without it, the function will
>> prematurely clean up the vector, thus confusing subsequent
>> code trying to do the cleanup when it's actually due.
> 
> So weighing desc->arch.old_cpu_mask is not equivalent to
> move_cleanup_count?

Not exactly, no: While the field gets set from the cpumask_weight()
result, it matters _when_ that happens. Prior to that point, what bits
are set in the mask is of no interest to irq_move_cleanup_interrupt().

Jan



Re: [Xen-devel] [PATCH 2/9] x86/IRQ: deal with move cleanup count state in fixup_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
>> The cleanup IPI may get sent immediately before a CPU gets removed from
>> the online map. In such a case the IPI would get handled on the CPU
>> being offlined no earlier than in the interrupts disabled window after
>> fixup_irqs()' main loop. This is too late, however, because a possible
>> affinity change may incur the need for vector assignment, which will
>> fail when the IRQ's move cleanup count is still non-zero.
>> 
>> To fix this
>> - record the set of CPUs the cleanup IPI actually gets sent to alongside
>>   setting their count,
>> - adjust the count in fixup_irqs(), accounting for all CPUs that the
>>   cleanup IPI was sent to, but that are no longer online,
>> - bail early from the cleanup IPI handler when the CPU is no longer
>>   online, to prevent double accounting.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> Just as a note, this whole interrupt migration business seems
> extremely complex, and I wonder whether Xen really needs it, or what
> exactly its performance gain is compared to simpler solutions.

What simpler solutions do you have in mind? IRQ affinities
tracking their assigned-vCPU ones was added largely to avoid
high rate interrupts always arriving on a CPU other than the one
where the actual handling will take place. Arguably this may go
too far for low rate interrupts, but adding a respective heuristic
would rather further complicate handling.

> I understand this is just fixes, but IMO it's making the logic even more
> complex.
> 
> Maybe it would be simpler to have the interrupts hard-bound to pCPUs
> and instead have a soft-affinity on the guest vCPUs that are assigned
> as the destination?

How would one calculate the soft affinity of a vCPU that has
multiple IRQs (with at most partially overlapping affinities) to be
serviced by it?

>> ---
>> TBD: The proper recording of the IPI destinations actually makes the
>>      move_cleanup_count field redundant. Do we want to drop it, at the
>>      price of a few more CPU-mask operations?
> 
> AFAICT this is not a hot path, so I would remove the
> move_cleanup_count field and just weight the cpu bitmap when needed.

Added for v2 (pending successful testing).

Jan


[Xen-devel] [PATCH 3/9] x86/IRQ: improve dump_irqs()
Posted by Jan Beulich 4 years, 11 months ago
Don't log a stray trailing comma. Shorten a few fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2318,7 +2318,7 @@ static void dump_irqs(unsigned char key)
 
         spin_lock_irqsave(&desc->lock, flags);
 
-        printk("   IRQ:%4d affinity:%*pb vec:%02x type=%-15s status=%08x ",
+        printk("   IRQ:%4d aff:%*pb vec:%02x %-15s status=%03x ",
                irq, nr_cpu_ids, cpumask_bits(desc->affinity), desc->arch.vector,
                desc->handler->typename, desc->status);
 
@@ -2329,23 +2329,21 @@ static void dump_irqs(unsigned char key)
         {
             action = (irq_guest_action_t *)desc->action;
 
-            printk("in-flight=%d domain-list=", action->in_flight);
+            printk("in-flight=%d%c",
+                   action->in_flight, action->nr_guests ? ' ' : '\n');
 
-            for ( i = 0; i < action->nr_guests; i++ )
+            for ( i = 0; i < action->nr_guests; )
             {
-                d = action->guest[i];
+                d = action->guest[i++];
                 pirq = domain_irq_to_pirq(d, irq);
                 info = pirq_info(d, pirq);
-                printk("%u:%3d(%c%c%c)",
+                printk("d%d:%3d(%c%c%c)%c",
                        d->domain_id, pirq,
                        evtchn_port_is_pending(d, info->evtchn) ? 'P' : '-',
                        evtchn_port_is_masked(d, info->evtchn) ? 'M' : '-',
-                       (info->masked ? 'M' : '-'));
-                if ( i != action->nr_guests )
-                    printk(",");
+                       info->masked ? 'M' : '-',
+                       i < action->nr_guests ? ',' : '\n');
             }
-
-            printk("\n");
         }
         else if ( desc->action )
             printk("%ps()\n", desc->action->handler);





Re: [Xen-devel] [PATCH 3/9] x86/IRQ: improve dump_irqs()
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 05:23:49AM -0600, Jan Beulich wrote:
> Don't log a stray trailing comma. Shorten a few fields.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> -            for ( i = 0; i < action->nr_guests; i++ )
> +            for ( i = 0; i < action->nr_guests; )
>              {
> -                d = action->guest[i];
> +                d = action->guest[i++];

Per my taste I would leave the increment in the for, but it's just
taste.

Re: [Xen-devel] [PATCH 3/9] x86/IRQ: improve dump_irqs()
Posted by Jan Beulich 4 years, 10 months ago
>>> On 03.05.19 at 17:43, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:23:49AM -0600, Jan Beulich wrote:
>> Don't log a stray trailing comma. Shorten a few fields.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> -            for ( i = 0; i < action->nr_guests; i++ )
>> +            for ( i = 0; i < action->nr_guests; )
>>              {
>> -                d = action->guest[i];
>> +                d = action->guest[i++];
> 
> Per my taste I would leave the increment in the for, but it's just
> taste.

If it was just taste, I'd have left it there, but there is

                printk("d%d:%3d(%c%c%c)%c",
                       d->domain_id, pirq,
                       evtchn_port_is_pending(d, info->evtchn) ? 'P' : '-',
                       evtchn_port_is_masked(d, info->evtchn) ? 'M' : '-',
                       info->masked ? 'M' : '-',
                       i < action->nr_guests ? ',' : '\n');

which depends on the early increment (or else would need adding
" + 1" or " - 1" on one side of the < . In fact this change is the
"don't log a stray trailing comma" part.

Jan


[Xen-devel] [PATCH 4/9] x86/IRQ: desc->affinity should strictly represent the requested value
Posted by Jan Beulich 4 years, 11 months ago
desc->arch.cpu_mask reflects the actual set of target CPUs. Don't ever
fiddle with desc->affinity itself, except to store caller requested
values.

This renders both set_native_irq_info() uses (which weren't using proper
locking anyway) redundant - drop the function altogether.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1042,7 +1042,6 @@ static void __init setup_IO_APIC_irqs(vo
             SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
             spin_lock_irqsave(&ioapic_lock, flags);
             __ioapic_write_entry(apic, pin, 0, entry);
-            set_native_irq_info(irq, TARGET_CPUS);
             spin_unlock_irqrestore(&ioapic_lock, flags);
         }
     }
@@ -2251,7 +2250,6 @@ int io_apic_set_pci_routing (int ioapic,
 
     spin_lock_irqsave(&ioapic_lock, flags);
     __ioapic_write_entry(ioapic, pin, 0, entry);
-    set_native_irq_info(irq, TARGET_CPUS);
     spin_unlock(&ioapic_lock);
 
     spin_lock(&desc->lock);
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -572,11 +572,16 @@ int assign_irq_vector(int irq, const cpu
 
     spin_lock_irqsave(&vector_lock, flags);
     ret = __assign_irq_vector(irq, desc, mask ?: TARGET_CPUS);
-    if (!ret) {
+    if ( !ret )
+    {
         ret = desc->arch.vector;
-        cpumask_copy(desc->affinity, desc->arch.cpu_mask);
+        if ( mask )
+            cpumask_copy(desc->affinity, mask);
+        else
+            cpumask_setall(desc->affinity);
     }
     spin_unlock_irqrestore(&vector_lock, flags);
+
     return ret;
 }
 
@@ -2318,9 +2323,10 @@ static void dump_irqs(unsigned char key)
 
         spin_lock_irqsave(&desc->lock, flags);
 
-        printk("   IRQ:%4d aff:%*pb vec:%02x %-15s status=%03x ",
-               irq, nr_cpu_ids, cpumask_bits(desc->affinity), desc->arch.vector,
-               desc->handler->typename, desc->status);
+        printk("   IRQ:%4d aff:%*pb/%*pb vec:%02x %-15s status=%03x ",
+               irq, nr_cpu_ids, cpumask_bits(desc->affinity),
+               nr_cpu_ids, cpumask_bits(desc->arch.cpu_mask),
+               desc->arch.vector, desc->handler->typename, desc->status);
 
         if ( ssid )
             printk("Z=%-25s ", ssid);
@@ -2408,8 +2414,7 @@ void fixup_irqs(const cpumask_t *mask, b
                 release_old_vec(desc);
         }
 
-        cpumask_copy(&affinity, desc->affinity);
-        if ( !desc->action || cpumask_subset(&affinity, mask) )
+        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
         {
             spin_unlock(&desc->lock);
             continue;
@@ -2433,12 +2438,13 @@ void fixup_irqs(const cpumask_t *mask, b
             desc->arch.move_in_progress = 0;
         }
 
-        cpumask_and(&affinity, &affinity, mask);
-        if ( cpumask_empty(&affinity) )
+        if ( !cpumask_intersects(mask, desc->affinity) )
         {
             break_affinity = true;
-            cpumask_copy(&affinity, mask);
+            cpumask_setall(&affinity);
         }
+        else
+            cpumask_copy(&affinity, desc->affinity);
 
         if ( desc->handler->disable )
             desc->handler->disable(desc);
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -162,11 +162,6 @@ extern irq_desc_t *domain_spin_lock_irq_
 extern irq_desc_t *pirq_spin_lock_irq_desc(
     const struct pirq *, unsigned long *pflags);
 
-static inline void set_native_irq_info(unsigned int irq, const cpumask_t *mask)
-{
-    cpumask_copy(irq_to_desc(irq)->affinity, mask);
-}
-
 unsigned int set_desc_affinity(struct irq_desc *, const cpumask_t *);
 
 #ifndef arch_hwdom_irqs





Re: [Xen-devel] [PATCH 4/9] x86/IRQ: desc->affinity should strictly represent the requested value
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 05:24:39AM -0600, Jan Beulich wrote:
> desc->arch.cpu_mask reflects the actual set of target CPUs. Don't ever
> fiddle with desc->affinity itself, except to store caller requested
> values.
> 
> This renders both set_native_irq_info() uses (which weren't using proper
> locking anyway) redundant - drop the function altogether.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> 
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1042,7 +1042,6 @@ static void __init setup_IO_APIC_irqs(vo
>              SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
>              spin_lock_irqsave(&ioapic_lock, flags);
>              __ioapic_write_entry(apic, pin, 0, entry);
> -            set_native_irq_info(irq, TARGET_CPUS);
>              spin_unlock_irqrestore(&ioapic_lock, flags);
>          }
>      }
> @@ -2251,7 +2250,6 @@ int io_apic_set_pci_routing (int ioapic,
>  
>      spin_lock_irqsave(&ioapic_lock, flags);
>      __ioapic_write_entry(ioapic, pin, 0, entry);
> -    set_native_irq_info(irq, TARGET_CPUS);
>      spin_unlock(&ioapic_lock);
>  
>      spin_lock(&desc->lock);
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -572,11 +572,16 @@ int assign_irq_vector(int irq, const cpu
>  
>      spin_lock_irqsave(&vector_lock, flags);
>      ret = __assign_irq_vector(irq, desc, mask ?: TARGET_CPUS);
> -    if (!ret) {
> +    if ( !ret )
> +    {
>          ret = desc->arch.vector;
> -        cpumask_copy(desc->affinity, desc->arch.cpu_mask);
> +        if ( mask )
> +            cpumask_copy(desc->affinity, mask);
> +        else
> +            cpumask_setall(desc->affinity);

I guess it's fine to use setall instead of copying the cpu online map
here?

AFAICT __assign_irq_vector already filters offline CPUs from the
passed mask.

Thanks, Roger.

Re: [Xen-devel] [PATCH 4/9] x86/IRQ: desc->affinity should strictly represent the requested value
Posted by Jan Beulich 4 years, 10 months ago
>>> On 03.05.19 at 18:21, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:24:39AM -0600, Jan Beulich wrote:
>> desc->arch.cpu_mask reflects the actual set of target CPUs. Don't ever
>> fiddle with desc->affinity itself, except to store caller requested
>> values.
>> 
>> This renders both set_native_irq_info() uses (which weren't using proper
>> locking anyway) redundant - drop the function altogether.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -572,11 +572,16 @@ int assign_irq_vector(int irq, const cpu
>>  
>>      spin_lock_irqsave(&vector_lock, flags);
>>      ret = __assign_irq_vector(irq, desc, mask ?: TARGET_CPUS);
>> -    if (!ret) {
>> +    if ( !ret )
>> +    {
>>          ret = desc->arch.vector;
>> -        cpumask_copy(desc->affinity, desc->arch.cpu_mask);
>> +        if ( mask )
>> +            cpumask_copy(desc->affinity, mask);
>> +        else
>> +            cpumask_setall(desc->affinity);
> 
> I guess it's fine to use setall instead of copying the cpu online map
> here?

It's not only fine, it's actually one of the goals: This way you can set
affinities such that they won't need adjustment after bringing up
another CPU. I've added a sentence to this effect to the description.
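
(A minimal sketch, for illustration only, of the split being described -
not actual Xen code, and the simple cpumask_and() stands in for the real
vector assignment logic:)

    /* Requested affinity: store exactly what the caller asked for,
     * "all CPUs" when no mask was given. */
    if ( mask )
        cpumask_copy(desc->affinity, mask);
    else
        cpumask_setall(desc->affinity);

    /* Effective targets: filtered against the online map at assignment
     * time, so onlining another CPU later only widens what the stored
     * request already permits - desc->affinity needs no adjustment. */
    cpumask_and(desc->arch.cpu_mask, desc->affinity, &cpu_online_map);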

> AFAICT __assign_irq_vector already filters offline CPUs from the
> passed mask.

Indeed. And all other involved code should, too, by now. I think
there is at least one place left somewhere where the online map is
used for setting affinities, but I suppose this can be dealt with at
another time.

Jan


[Xen-devel] [PATCH 5/9] x86/IRQ: fix locking around vector management
Posted by Jan Beulich 4 years, 11 months ago
All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
fields, and hence ought to be called with the descriptor lock held in
addition to vector_lock. This is currently the case for only
set_desc_affinity() and destroy_irq(), which also clarifies what the
nesting behavior between the locks has to be. Reflect the new
expectation by having these functions all take a descriptor as
parameter instead of an interrupt number.

Drop one of the two leading underscores from all three functions at
the same time.

There's one case left where descriptors get manipulated with just
vector_lock held: setup_vector_irq() assumes its caller to acquire
vector_lock, and hence can't itself acquire the descriptor locks (wrong
lock order). I don't currently see how to address this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
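
(Sketch of the lock nesting the description refers to, matching the
pattern in the hunks below - the descriptor lock is the outer lock,
vector_lock the inner one:)

    unsigned long flags;

    spin_lock_irqsave(&desc->lock, flags);   /* outer: per-IRQ descriptor */
    spin_lock(&vector_lock);                 /* inner: global vector state */

    /* ... manipulate desc->arch.vector, per-CPU vector_irq[], ... */

    spin_unlock(&vector_lock);
    spin_unlock_irqrestore(&desc->lock, flags);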

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -27,6 +27,7 @@
 #include <public/physdev.h>
 
 static int parse_irq_vector_map_param(const char *s);
+static void _clear_irq_vector(struct irq_desc *desc);
 
 /* opt_noirqbalance: If true, software IRQ balancing/affinity is disabled. */
 bool __read_mostly opt_noirqbalance;
@@ -112,13 +113,12 @@ static void trace_irq_mask(u32 event, in
     trace_var(event, 1, sizeof(d), &d);
 }
 
-static int __init __bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
+static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
+                                   const cpumask_t *cpu_mask)
 {
     cpumask_t online_mask;
     int cpu;
-    struct irq_desc *desc = irq_to_desc(irq);
 
-    BUG_ON((unsigned)irq >= nr_irqs);
     BUG_ON((unsigned)vector >= NR_VECTORS);
 
     cpumask_and(&online_mask, cpu_mask, &cpu_online_map);
@@ -129,9 +129,9 @@ static int __init __bind_irq_vector(int
         return 0;
     if ( desc->arch.vector != IRQ_VECTOR_UNASSIGNED )
         return -EBUSY;
-    trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, irq, vector, &online_mask);
+    trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, desc->irq, vector, &online_mask);
     for_each_cpu(cpu, &online_mask)
-        per_cpu(vector_irq, cpu)[vector] = irq;
+        per_cpu(vector_irq, cpu)[vector] = desc->irq;
     desc->arch.vector = vector;
     cpumask_copy(desc->arch.cpu_mask, &online_mask);
     if ( desc->arch.used_vectors )
@@ -145,12 +145,18 @@ static int __init __bind_irq_vector(int
 
 int __init bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
 {
+    struct irq_desc *desc = irq_to_desc(irq);
     unsigned long flags;
     int ret;
 
-    spin_lock_irqsave(&vector_lock, flags);
-    ret = __bind_irq_vector(irq, vector, cpu_mask);
-    spin_unlock_irqrestore(&vector_lock, flags);
+    BUG_ON((unsigned)irq >= nr_irqs);
+
+    spin_lock_irqsave(&desc->lock, flags);
+    spin_lock(&vector_lock);
+    ret = _bind_irq_vector(desc, vector, cpu_mask);
+    spin_unlock(&vector_lock);
+    spin_unlock_irqrestore(&desc->lock, flags);
+
     return ret;
 }
 
@@ -235,7 +241,9 @@ void destroy_irq(unsigned int irq)
 
     spin_lock_irqsave(&desc->lock, flags);
     desc->handler = &no_irq_type;
-    clear_irq_vector(irq);
+    spin_lock(&vector_lock);
+    _clear_irq_vector(desc);
+    spin_unlock(&vector_lock);
     desc->arch.used_vectors = NULL;
     spin_unlock_irqrestore(&desc->lock, flags);
 
@@ -256,11 +264,11 @@ static void release_old_vec(struct irq_d
     }
 }
 
-static void __clear_irq_vector(int irq)
+static void _clear_irq_vector(struct irq_desc *desc)
 {
-    int cpu, vector, old_vector;
+    unsigned int cpu;
+    int vector, old_vector, irq = desc->irq;
     cpumask_t tmp_mask;
-    struct irq_desc *desc = irq_to_desc(irq);
 
     BUG_ON(!desc->arch.vector);
 
@@ -306,11 +314,14 @@ static void __clear_irq_vector(int irq)
 
 void clear_irq_vector(int irq)
 {
+    struct irq_desc *desc = irq_to_desc(irq);
     unsigned long flags;
 
-    spin_lock_irqsave(&vector_lock, flags);
-    __clear_irq_vector(irq);
-    spin_unlock_irqrestore(&vector_lock, flags);
+    spin_lock_irqsave(&desc->lock, flags);
+    spin_lock(&vector_lock);
+    _clear_irq_vector(desc);
+    spin_unlock(&vector_lock);
+    spin_unlock_irqrestore(&desc->lock, flags);
 }
 
 int irq_to_vector(int irq)
@@ -445,8 +456,7 @@ static vmask_t *irq_get_used_vector_mask
     return ret;
 }
 
-static int __assign_irq_vector(
-    int irq, struct irq_desc *desc, const cpumask_t *mask)
+static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
 {
     /*
      * NOTE! The local APIC isn't very good at handling
@@ -460,7 +470,8 @@ static int __assign_irq_vector(
      * 0x80, because int 0x80 is hm, kind of importantish. ;)
      */
     static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
-    int cpu, err, old_vector;
+    unsigned int cpu;
+    int err, old_vector, irq = desc->irq;
     cpumask_t tmp_mask;
     vmask_t *irq_used_vectors = NULL;
 
@@ -570,8 +581,12 @@ int assign_irq_vector(int irq, const cpu
     
     BUG_ON(irq >= nr_irqs || irq <0);
 
-    spin_lock_irqsave(&vector_lock, flags);
-    ret = __assign_irq_vector(irq, desc, mask ?: TARGET_CPUS);
+    spin_lock_irqsave(&desc->lock, flags);
+
+    spin_lock(&vector_lock);
+    ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
+    spin_unlock(&vector_lock);
+
     if ( !ret )
     {
         ret = desc->arch.vector;
@@ -580,7 +595,8 @@ int assign_irq_vector(int irq, const cpu
         else
             cpumask_setall(desc->affinity);
     }
-    spin_unlock_irqrestore(&vector_lock, flags);
+
+    spin_unlock_irqrestore(&desc->lock, flags);
 
     return ret;
 }
@@ -754,7 +770,6 @@ void irq_complete_move(struct irq_desc *
 
 unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
-    unsigned int irq;
     int ret;
     unsigned long flags;
     cpumask_t dest_mask;
@@ -762,10 +777,8 @@ unsigned int set_desc_affinity(struct ir
     if (!cpumask_intersects(mask, &cpu_online_map))
         return BAD_APICID;
 
-    irq = desc->irq;
-
     spin_lock_irqsave(&vector_lock, flags);
-    ret = __assign_irq_vector(irq, desc, mask);
+    ret = _assign_irq_vector(desc, mask);
     spin_unlock_irqrestore(&vector_lock, flags);
 
     if (ret < 0)




Re: [Xen-devel] [PATCH 5/9] x86/IRQ: fix locking around vector management
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 05:25:03AM -0600, Jan Beulich wrote:
> All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
> fields, and hence ought to be called with the descriptor lock held in
> addition to vector_lock. This is currently the case for only
> set_desc_affinity() and destroy_irq(), which also clarifies what the

AFAICT set_desc_affinity is called from set_ioapic_affinity_irq, which in
turn is called from setup_ioapic_dest without holding the desc lock.
Is this fine because that's only used at boot time?

> nesting behavior between the locks has to be. Reflect the new
> expectation by having these functions all take a descriptor as
> parameter instead of an interrupt number.
> 
> Drop one of the two leading underscores from all three functions at
> the same time.
> 
> There's one case left where descriptors get manipulated with just
> vector_lock held: setup_vector_irq() assumes its caller to acquire
> vector_lock, and hence can't itself acquire the descriptor locks (wrong
> lock order). I don't currently see how to address this.

Can you take the desc lock and vector lock for each irq in the second
loop of setup_vector_irq and remove the vector locking from the caller?

That might be inefficient, but it's just done for CPU initialization.

AFAICT the first loop of setup_vector_irq doesn't require any locking
since it's per-cpu initialization.

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Change looks good to me:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> 
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -27,6 +27,7 @@
>  #include <public/physdev.h>
>  
>  static int parse_irq_vector_map_param(const char *s);
> +static void _clear_irq_vector(struct irq_desc *desc);
>  
>  /* opt_noirqbalance: If true, software IRQ balancing/affinity is disabled. */
>  bool __read_mostly opt_noirqbalance;
> @@ -112,13 +113,12 @@ static void trace_irq_mask(u32 event, in
>      trace_var(event, 1, sizeof(d), &d);
>  }
>  
> -static int __init __bind_irq_vector(int irq, int vector, const cpumask_t *cpu_mask)
> +static int __init _bind_irq_vector(struct irq_desc *desc, int vector,

I wouldn't be opposed to adding ASSERTs here (and in
_{assign,bind,clear}_irq_vector, set_desc_affinity and destroy_irq)
to check for lock correctness.

Thanks, Roger.

Re: [Xen-devel] [PATCH 5/9] x86/IRQ: fix locking around vector management
Posted by Jan Beulich 4 years, 10 months ago
>>> On 06.05.19 at 13:48, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:25:03AM -0600, Jan Beulich wrote:
>> All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
>> fields, and hence ought to be called with the descriptor lock held in
>> addition to vector_lock. This is currently the case for only
>> set_desc_affinity() and destroy_irq(), which also clarifies what the
> 
> AFAICT set_desc_affinity is called from set_ioapic_affinity_irq which in
> turn is called from setup_ioapic_dest without holding the desc lock.
> Is this fine because that's only used a boot time?

No, this isn't fine, and it's also not only called at boot time. I
simply didn't spot this case of function re-use - I had come to
the conclusion that all calls to set_desc_affinity() would come
through the .set_affinity hook pointers (or happen sufficiently
early).

VT-d's adjust_irq_affinity() has a similar issue.

Being used only at boot time would be insufficient justification anyway. Not taking
locks can only be safe prior to bringing up APs; any later
skipping of locking would at least require additional justification.

>> nesting behavior between the locks has to be. Reflect the new
>> expectation by having these functions all take a descriptor as
>> parameter instead of an interrupt number.
>> 
>> Drop one of the two leading underscores from all three functions at
>> the same time.
>> 
>> There's one case left where descriptors get manipulated with just
>> vector_lock held: setup_vector_irq() assumes its caller to acquire
>> vector_lock, and hence can't itself acquire the descriptor locks (wrong
>> lock order). I don't currently see how to address this.
> 
> Can you take the desc lock and vector lock for each irq in the second
> loop of setup_vector_irq and remove the vector locking from the caller?
> 
> That might be inefficient, but it's just done for CPU initialization.
> 
> AFAICT the first loop of setup_vector_irq doesn't require any locking
> since it's per-cpu initialization.

It's not so much the first loop, afaict. It's the combined action of
calling this function and setting the online bit which needs the
lock held around it. I.e. the function setting bits in various
descriptors' CPU masks (and the tracking of the vector -> IRQ
relationships) has to be atomic (to the outside world) with the
setting of the CPU's bit in cpu_online_map.
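
(A sketch of the caller-side pattern being described - illustrative only,
not the literal smpboot code; the lock_vector_lock() / unlock_vector_lock()
wrapper names are assumed here:)

    /* AP bring-up: the vector -> IRQ propagation and the online-map
     * update must appear atomic to anyone else holding vector_lock. */
    lock_vector_lock();
    setup_vector_irq(cpu);                  /* fill this CPU's vector_irq[]
                                               and descriptor CPU masks */
    cpumask_set_cpu(cpu, &cpu_online_map);
    unlock_vector_lock();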

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Change looks good to me:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, but I'll not add this for now, given the further locking to
be added as per above.

Jan


[Xen-devel] [PATCH 6/9] x86/IRQ: reduce unused space in struct arch_irq_desc
Posted by Jan Beulich 4 years, 11 months ago
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -35,8 +35,8 @@ struct arch_irq_desc {
         cpumask_var_t cpu_mask;
         cpumask_var_t old_cpu_mask;
         cpumask_var_t pending_mask;
-        unsigned move_cleanup_count;
         vmask_t *used_vectors;
+        unsigned move_cleanup_count;
         u8 move_in_progress : 1;
         s8 used;
 };
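
(Presumably the saving comes from alignment padding on 64-bit builds - a
layout sketch, assuming an 8-byte-aligned pointer; this isn't stated in
the patch description:)

    /* Before: unsigned (4) + 4 bytes padding + pointer (8) + u8 + s8
     *         + 6 bytes trailing padding = 24 bytes.
     * After:  pointer (8) + unsigned (4) + u8 + s8
     *         + 2 bytes trailing padding = 16 bytes.
     */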



Re: [Xen-devel] [PATCH 6/9] x86/IRQ: reduce unused space in struct arch_irq_desc
Posted by Andrew Cooper 4 years, 11 months ago
On 29/04/2019 12:25, Jan Beulich wrote:
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH 7/9] x86/IRQ: drop redundant cpumask_empty() from move_masked_irq()
Posted by Jan Beulich 4 years, 11 months ago
The subsequent cpumask_intersects() covers the "empty" case quite fine.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -638,9 +638,6 @@ void move_masked_irq(struct irq_desc *de
     
     desc->status &= ~IRQ_MOVE_PENDING;
 
-    if (unlikely(cpumask_empty(pending_mask)))
-        return;
-
     if (!desc->handler->set_affinity)
         return;
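
(The reasoning, as a sketch rather than the literal remaining code: an
empty pending_mask cannot intersect anything, so the later check already
gives the removed early return's behaviour:)

    /* cpumask_intersects(empty, anything) is false, so with the
     * cpumask_empty() check gone the affinity update is still skipped. */
    if ( cpumask_intersects(pending_mask, &cpu_online_map) )
        desc->handler->set_affinity(desc, pending_mask);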
 



Re: [Xen-devel] [PATCH 7/9] x86/IRQ: drop redundant cpumask_empty() from move_masked_irq()
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 05:26:10AM -0600, Jan Beulich wrote:
> The subsequent cpumask_intersects() covers the "empty" case quite fine.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

[Xen-devel] [PATCH 8/9] x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
Posted by Jan Beulich 4 years, 11 months ago
Since the "Cannot set affinity ..." warning is a one time one, avoid
triggering it already at boot time when parking secondary threads and
the serial console uses a (still unconnected at that time) PCI IRQ.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2412,8 +2412,20 @@ void fixup_irqs(const cpumask_t *mask, b
         vector = irq_to_vector(irq);
         if ( vector >= FIRST_HIPRIORITY_VECTOR &&
              vector <= LAST_HIPRIORITY_VECTOR )
+        {
             cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
 
+            /*
+             * This can in particular happen when parking secondary threads
+             * during boot and when the serial console wants to use a PCI IRQ.
+             */
+            if ( desc->handler == &no_irq_type )
+            {
+                spin_unlock(&desc->lock);
+                continue;
+            }
+        }
+
         if ( desc->arch.move_cleanup_count )
         {
             /* The cleanup IPI may have got sent while we were still online. */




Re: [Xen-devel] [PATCH 8/9] x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, Apr 29, 2019 at 05:26:41AM -0600, Jan Beulich wrote:
> Since the "Cannot set affinity ..." warning is emitted only once, avoid
> triggering it already at boot time when parking secondary threads while
> the serial console uses a (still unconnected at that time) PCI IRQ.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -2412,8 +2412,20 @@ void fixup_irqs(const cpumask_t *mask, b
>          vector = irq_to_vector(irq);
>          if ( vector >= FIRST_HIPRIORITY_VECTOR &&
>               vector <= LAST_HIPRIORITY_VECTOR )
> +        {
>              cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
>  
> +            /*
> +             * This can in particular happen when parking secondary threads
> +             * during boot and when the serial console wants to use a PCI IRQ.
> +             */
> +            if ( desc->handler == &no_irq_type )

I found it weird that an irq has a vector assigned (in this case a
high-priority vector) but no irq type set.

Shouldn't the vector be assigned when the type is set?

> +            {
> +                spin_unlock(&desc->lock);
> +                continue;
> +            }
> +        }
> +
>          if ( desc->arch.move_cleanup_count )
>          {
>              /* The cleanup IPI may have got sent while we were still online. */

Thanks, Roger.

Re: [Xen-devel] [PATCH 8/9] x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
Posted by Jan Beulich 4 years, 10 months ago
>>> On 06.05.19 at 15:52, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:26:41AM -0600, Jan Beulich wrote:
>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -2412,8 +2412,20 @@ void fixup_irqs(const cpumask_t *mask, b
>>          vector = irq_to_vector(irq);
>>          if ( vector >= FIRST_HIPRIORITY_VECTOR &&
>>               vector <= LAST_HIPRIORITY_VECTOR )
>> +        {
>>              cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
>>  
>> +            /*
>> +             * This can in particular happen when parking secondary threads
>> +             * during boot and when the serial console wants to use a PCI IRQ.
>> +             */
>> +            if ( desc->handler == &no_irq_type )
> 
> I found it weird that a irq has a vector assigned (in this case a
> high-priority vector) but no irq type set.
> 
> Shouldn't the vector be assigned when the type is set?

In general I would agree, but the way the serial console IRQ
gets set up is different - see smp_intr_init(). When it's a PCI
IRQ (IO-APIC pin 16 or above), we'll know how to program
the IO-APIC RTE (edge/level, active high/low) only when
Dom0 boots, and hence we don't set ->handler early.

Jan



Re: [Xen-devel] [PATCH 8/9] x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
Posted by Roger Pau Monné 4 years, 10 months ago
On Mon, May 06, 2019 at 08:25:51AM -0600, Jan Beulich wrote:
> >>> On 06.05.19 at 15:52, <roger.pau@citrix.com> wrote:
> > On Mon, Apr 29, 2019 at 05:26:41AM -0600, Jan Beulich wrote:
> >> --- a/xen/arch/x86/irq.c
> >> +++ b/xen/arch/x86/irq.c
> >> @@ -2412,8 +2412,20 @@ void fixup_irqs(const cpumask_t *mask, b
> >>          vector = irq_to_vector(irq);
> >>          if ( vector >= FIRST_HIPRIORITY_VECTOR &&
> >>               vector <= LAST_HIPRIORITY_VECTOR )
> >> +        {
> >>              cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
> >>  
> >> +            /*
> >> +             * This can in particular happen when parking secondary threads
> >> +             * during boot and when the serial console wants to use a PCI IRQ.
> >> +             */
> >> +            if ( desc->handler == &no_irq_type )
> > 
> > I found it weird that an irq has a vector assigned (in this case a
> > high-priority vector) but no irq type set.
> > 
> > Shouldn't the vector be assigned when the type is set?
> 
> In general I would agree, but the way the serial console IRQ
> gets set up is different - see smp_intr_init(). When it's a PCI
> IRQ (IO-APIC pin 16 or above), we'll know how to program
> the IO-APIC RTE (edge/level, active high/low) only when
> Dom0 boots, and hence we don't set ->handler early.

Oh, OK. I guess assuming level-triggered, active-low unless dom0
provides a different configuration is not safe.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

[Xen-devel] [PATCH 9/9] x86/IO-APIC: drop an unused variable from setup_IO_APIC_irqs()
Posted by Jan Beulich 4 years, 11 months ago
Must be a left-over from earlier days.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -984,8 +984,6 @@ static void __init setup_IO_APIC_irqs(vo
 
     for (apic = 0; apic < nr_ioapics; apic++) {
         for (pin = 0; pin < nr_ioapic_entries[apic]; pin++) {
-            struct irq_desc *desc;
-
             /*
              * add it to the IO-APIC irq-routing table:
              */
@@ -1038,7 +1036,6 @@ static void __init setup_IO_APIC_irqs(vo
             if (platform_legacy_irq(irq))
                 disable_8259A_irq(irq_to_desc(irq));
 
-            desc = irq_to_desc(irq);
             SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
             spin_lock_irqsave(&ioapic_lock, flags);
             __ioapic_write_entry(apic, pin, 0, entry);



Re: [Xen-devel] [PATCH 9/9] x86/IO-APIC: drop an unused variable from setup_IO_APIC_irqs()
Posted by Andrew Cooper 4 years, 11 months ago
On 29/04/2019 12:27, Jan Beulich wrote:
> Must be a left-over from earlier days.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
