The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such a mask can have all bits
set, and hence is unlikely to be a subset of the input mask. For example, if
an interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's
not a fully set CPU mask would cause that interrupt to be shuffled around
unconditionally.
What fixup_irqs() cares about is evacuating interrupts from CPUs not set in
the input CPU mask, and for that purpose it should check whether the
interrupt is assigned to a CPU not present in the input mask.
Note that the shuffling done by fixup_irqs() is racy: the old destination is
not given a chance to drain any pending interrupts before the new
destination is programmed, which can lead to spurious 'No irq handler for
vector ...' messages. While the proposed change doesn't fix that issue,
limiting the amount of shuffling to only what is strictly necessary reduces
the chances of stray interrupts.
While there, also adjust the comment describing the purpose of fixup_irqs().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/include/asm/irq.h | 2 +-
 xen/arch/x86/irq.c             | 9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 413994d2133b..33dd7667137b 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -130,7 +130,7 @@ void free_domain_pirqs(struct domain *d);
 int map_domain_emuirq_pirq(struct domain *d, int pirq, int emuirq);
 int unmap_domain_pirq_emuirq(struct domain *d, int pirq);
 
-/* Reset irq affinities to match the given CPU mask. */
+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
 void fixup_irqs(const cpumask_t *mask, bool verbose);
 void fixup_eoi(void);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 3b951d81bd6d..223f813f6ceb 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
 }
 __initcall(setup_dump_irqs);
 
-/* Reset irq affinities to match the given CPU mask. */
+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
 void fixup_irqs(const cpumask_t *mask, bool verbose)
 {
     unsigned int irq;
@@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
             release_old_vec(desc);
         }
 
-        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
+        /*
+         * Avoid shuffling the interrupt around if it's assigned to a CPU set
+         * that's all covered by the requested affinity mask.
+         */
+        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
+        if ( !desc->action || cpumask_subset(affinity, mask) )
         {
             spin_unlock(&desc->lock);
             continue;
--
2.44.0
On 16.05.2024 15:22, Roger Pau Monne wrote:
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
>  }
>  __initcall(setup_dump_irqs);
>
> -/* Reset irq affinities to match the given CPU mask. */
> +/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
>  void fixup_irqs(const cpumask_t *mask, bool verbose)
>  {

Evacuating is one purpose. Updating affinity, if need be, is another. I've been wondering more than once though whether it is actually correct / necessary for ->affinity to be updated by the function. As it stands you don't remove the respective code, though.

> @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>              release_old_vec(desc);
>          }
>
> -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
> +        /*
> +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
> +         * that's all covered by the requested affinity mask.
> +         */
> +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
> +        if ( !desc->action || cpumask_subset(affinity, mask) )
>          {
>              spin_unlock(&desc->lock);
>              continue;

First my understanding of how the two CPU sets are used: ->affinity is merely a representation of where the IRQ is permitted to be handled. ->arch.cpu_mask describes all CPUs where the assigned vector is valid (and would thus need cleaning up when a new vector is assigned). Neither of the two needs to be a strict subset of the other. (It's not really clear whether ->arch.cpu_mask is [supposed to be] a subset of cpu_online_map.)

If that understanding of mine is correct, going from just ->arch.cpu_mask doesn't look quite right to me, as that may include CPUs not in ->affinity. As in: Things could be further limited, by also ANDing in ->affinity. At the same time your(?) and my variant suffer from cpumask_subset() possibly returning true despite the CPU the IRQ is presently being handled on being the one that we're in the process of bringing down. In that case we absolutely cannot skip the move.

(I'd like to note that there are only two possible input values of "mask" for the function. The case I think you've been looking into is when it's &cpu_online_map. In which case cpumask_subset() is going to always return true with your change in place, if I'm not mistaken. That seems to make your change questionable. Yet with that I guess I'm overlooking something.)

Plus there remains the question of whether updating ->affinity can indeed be skipped in more cases than it is right now (prior to your patch), or whether up front we may want to get rid of that updating (in line, I think, with earlier changes we did elsewhere) altogether.

Jan
On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
> On 16.05.2024 15:22, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
> >  }
> >  __initcall(setup_dump_irqs);
> >
> > -/* Reset irq affinities to match the given CPU mask. */
> > +/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
> >  void fixup_irqs(const cpumask_t *mask, bool verbose)
> >  {
>
> Evacuating is one purpose. Updating affinity, if need be, is another. I've been wondering more than once though whether it is actually correct / necessary for ->affinity to be updated by the function. As it stands you don't remove the respective code, though.

Yeah, I didn't want to get into updating ->affinity in this patch, so decided to leave that as-is.

Note however that if we shuffle the interrupt around we should update ->affinity, so that the new destination is part of ->affinity? Otherwise we could end up with the interrupt assigned to CPU(s) that are not part of the ->affinity mask. Maybe that's OK, TBH I'm not sure I understand the purpose of the ->affinity mask, hence why I've decided to leave it alone in this patch.

> > @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >              release_old_vec(desc);
> >          }
> >
> > -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
> > +        /*
> > +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
> > +         * that's all covered by the requested affinity mask.
> > +         */
> > +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
> > +        if ( !desc->action || cpumask_subset(affinity, mask) )
> >          {
> >              spin_unlock(&desc->lock);
> >              continue;
>
> First my understanding of how the two CPU sets are used: ->affinity is merely a representation of where the IRQ is permitted to be handled. ->arch.cpu_mask describes all CPUs where the assigned vector is valid (and would thus need cleaning up when a new vector is assigned). Neither of the two needs to be a strict subset of the other.

Oh, so it's allowed to have the interrupt target a CPU (->arch.cpu_mask) that's not set in the affinity mask?

> (It's not really clear whether ->arch.cpu_mask is [supposed to be] a subset of cpu_online_map.)

Not according to the description in arch_irq_desc:

    /*
     * Except for high priority interrupts @cpu_mask may have bits set for
     * offline CPUs.  Consumers need to be careful to mask this down to
     * online ones as necessary.  There is supposed to always be a non-
     * empty intersection with cpu_online_map.
     */

So ->arch.cpu_mask (according to the comment) need not strictly be a subset of cpu_online_map. Note this is only possible when using logical destination mode, so removing that would turn the destination field into an unsigned int that would point to a single CPU that must be present in cpu_online_map.

> If that understanding of mine is correct, going from just ->arch.cpu_mask doesn't look quite right to me, as that may include CPUs not in ->affinity. As in: Things could be further limited, by also ANDing in ->affinity.

Hm, my expectation would be that ->arch.cpu_mask is a subset of ->affinity, but even if it's not, what we care about in fixup_irqs() is what CPUs the interrupt targets, as we need to move the interrupt if the target set is not in the input mask set. I don't think ->affinity should be taken into account for that decision, it should be done based exclusively on which CPU(s) the interrupt targets (->arch.cpu_mask).

> At the same time your(?) and my variant suffer from cpumask_subset() possibly returning true despite the CPU the IRQ is presently being handled on being the one that we're in the process of bringing down.

No, I'm not sure I see how that can happen. The CPU we are bringing down will always be clear from the input CPU mask, and hence cpumask_subset(->arch.cpu_mask, mask) will only return true if all the set CPUs in ->arch.cpu_mask are also set in mask. IOW: none of the possible target destinations is a CPU to be removed.

> In that case we absolutely cannot skip the move. (I'd like to note that there are only two possible input values of "mask" for the function. The case I think you've been looking into is when it's &cpu_online_map.

Well, it's cpu_online_map which already has the CPU to be offlined cleared. See the call to cpumask_clear_cpu() ahead of calling fixup_irqs().

> In which case cpumask_subset() is going to always return true with your change in place, if I'm not mistaken. That seems to make your change questionable. Yet with that I guess I'm overlooking something.)

I might be wrong, but I think you are missing that the to be offlined CPU has been removed from cpu_online_map by the time it gets passed to fixup_irqs().

> Plus there remains the question of whether updating ->affinity can indeed be skipped in more cases than it is right now (prior to your patch), or whether up front we may want to get rid of that updating (in line, I think, with earlier changes we did elsewhere) altogether.

I'm not sure about ->affinity TBH, that's why I've opted to leave it alone for the time being. It doesn't seem like a very helpful field right now. I would expect it to be mostly static, and populated with the set of CPUs that are closer to (NUMA-wise) the device the interrupt belongs to, but I'm not seeing any of this. It seems to be set arbitrarily to the CPU that binds the interrupt.

Thanks, Roger.
On 16.05.2024 17:56, Roger Pau Monné wrote:
> On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
>> On 16.05.2024 15:22, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
>>>  }
>>>  __initcall(setup_dump_irqs);
>>>
>>> -/* Reset irq affinities to match the given CPU mask. */
>>> +/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
>>>  void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>  {
>>
>> Evacuating is one purpose. Updating affinity, if need be, is another. I've been wondering more than once though whether it is actually correct / necessary for ->affinity to be updated by the function. As it stands you don't remove the respective code, though.
>
> Yeah, I didn't want to get into updating ->affinity in this patch, so decided to leave that as-is.
>
> Note however that if we shuffle the interrupt around we should update ->affinity, so that the new destination is part of ->affinity?

I would put it differently: If we shuffle the IRQ around, we want to respect ->affinity if at all possible. Only if that's impossible (all CPUs in ->affinity offline) we may need to update ->affinity as well. Issue is that ...

> Otherwise we could end up with the interrupt assigned to CPU(s) that are not part of the ->affinity mask. Maybe that's OK, TBH I'm not sure I understand the purpose of the ->affinity mask, hence why I've decided to leave it alone in this patch.

..., as you say, it's not entirely clear what ->affinity's purpose is, and hence whether it might be okay(ish) to leave it without any intersection with online CPUs. If we were to permit that, I'm relatively sure though that then other code may need updating (it'll at least need auditing).

>>> @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>              release_old_vec(desc);
>>>          }
>>>
>>> -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
>>> +        /*
>>> +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
>>> +         * that's all covered by the requested affinity mask.
>>> +         */
>>> +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
>>> +        if ( !desc->action || cpumask_subset(affinity, mask) )
>>>          {
>>>              spin_unlock(&desc->lock);
>>>              continue;
>>
>> First my understanding of how the two CPU sets are used: ->affinity is merely a representation of where the IRQ is permitted to be handled. ->arch.cpu_mask describes all CPUs where the assigned vector is valid (and would thus need cleaning up when a new vector is assigned). Neither of the two needs to be a strict subset of the other.
>
> Oh, so it's allowed to have the interrupt target a CPU (->arch.cpu_mask) that's not set in the affinity mask?

To be honest I'm not quite sure whether it's "allowed" or merely "happens to".

Jan
On 16.05.2024 17:56, Roger Pau Monné wrote:
> On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
>> On 16.05.2024 15:22, Roger Pau Monne wrote:
>>> @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>              release_old_vec(desc);
>>>          }
>>>
>>> -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
>>> +        /*
>>> +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
>>> +         * that's all covered by the requested affinity mask.
>>> +         */
>>> +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
>>> +        if ( !desc->action || cpumask_subset(affinity, mask) )
>>>          {
>>>              spin_unlock(&desc->lock);
>>>              continue;
>> [...]
>> In which case cpumask_subset() is going to always return true with your change in place, if I'm not mistaken. That seems to make your change questionable. Yet with that I guess I'm overlooking something.)
>
> I might be wrong, but I think you are missing that the to be offlined CPU has been removed from cpu_online_map by the time it gets passed to fixup_irqs().

Just on this part (I'll need to take more time to reply to other parts): No, I've specifically paid attention to that fact. Yet for this particular observation of mine it doesn't matter. If mask == &cpu_online_map, then no matter what is in cpu_online_map

    cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);

will mask things down to a subset of cpu_online_map, and hence

    if ( !desc->action || cpumask_subset(affinity, mask) )

(effectively being

    if ( !desc->action || cpumask_subset(affinity, &cpu_online_map) )

) is nothing else than

    if ( !desc->action || true )

. Yet that doesn't feel quite right.

Jan
On Thu, May 16, 2024 at 06:04:22PM +0200, Jan Beulich wrote:
> On 16.05.2024 17:56, Roger Pau Monné wrote:
> > On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
> >> On 16.05.2024 15:22, Roger Pau Monne wrote:
> >>> @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >>>              release_old_vec(desc);
> >>>          }
> >>>
> >>> -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
> >>> +        /*
> >>> +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
> >>> +         * that's all covered by the requested affinity mask.
> >>> +         */
> >>> +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
> >>> +        if ( !desc->action || cpumask_subset(affinity, mask) )
> >>>          {
> >>>              spin_unlock(&desc->lock);
> >>>              continue;
> >> [...]
> >> In which case cpumask_subset() is going to always return true with your change in place, if I'm not mistaken. That seems to make your change questionable. Yet with that I guess I'm overlooking something.)
> >
> > I might be wrong, but I think you are missing that the to be offlined CPU has been removed from cpu_online_map by the time it gets passed to fixup_irqs().
>
> Just on this part (I'll need to take more time to reply to other parts): No, I've specifically paid attention to that fact. Yet for this particular observation of mine it doesn't matter. If mask == &cpu_online_map, then no matter what is in cpu_online_map
>
>     cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
>
> will mask things down to a subset of cpu_online_map, and hence
>
>     if ( !desc->action || cpumask_subset(affinity, mask) )
>
> (effectively being
>
>     if ( !desc->action || cpumask_subset(affinity, &cpu_online_map) )
>
> ) is nothing else than
>
>     if ( !desc->action || true )
>
> . Yet that doesn't feel quite right.

Oh, I get it now. Ideally we would use cpu_online_map with the to be removed CPU set, but that's complicated in this context.

For the purposes here we might as well avoid the AND of ->arch.cpu_mask with cpu_online_map and just check:

    if ( !desc->action || cpumask_subset(desc->arch.cpu_mask, mask) )

As even if ->arch.cpu_mask has non-online CPUs set aside from the to be offlined CPU, it would just mean that we might be shuffling more than strictly necessary. Note this will only be an issue with cluster mode; physical mode must always have a single online CPU set in ->arch.cpu_mask.

Thanks, Roger.
On 16.05.2024 18:23, Roger Pau Monné wrote:
> On Thu, May 16, 2024 at 06:04:22PM +0200, Jan Beulich wrote:
>> On 16.05.2024 17:56, Roger Pau Monné wrote:
>>> On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
>>>> On 16.05.2024 15:22, Roger Pau Monne wrote:
>>>>> @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>>>              release_old_vec(desc);
>>>>>          }
>>>>>
>>>>> -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
>>>>> +        /*
>>>>> +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
>>>>> +         * that's all covered by the requested affinity mask.
>>>>> +         */
>>>>> +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
>>>>> +        if ( !desc->action || cpumask_subset(affinity, mask) )
>>>>>          {
>>>>>              spin_unlock(&desc->lock);
>>>>>              continue;
>>>> [...]
>>>> In which case cpumask_subset() is going to always return true with your change in place, if I'm not mistaken. That seems to make your change questionable. Yet with that I guess I'm overlooking something.)
>>>
>>> I might be wrong, but I think you are missing that the to be offlined CPU has been removed from cpu_online_map by the time it gets passed to fixup_irqs().
>>
>> Just on this part (I'll need to take more time to reply to other parts): No, I've specifically paid attention to that fact. Yet for this particular observation of mine it doesn't matter. If mask == &cpu_online_map, then no matter what is in cpu_online_map
>>
>>     cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
>>
>> will mask things down to a subset of cpu_online_map, and hence
>>
>>     if ( !desc->action || cpumask_subset(affinity, mask) )
>>
>> (effectively being
>>
>>     if ( !desc->action || cpumask_subset(affinity, &cpu_online_map) )
>>
>> ) is nothing else than
>>
>>     if ( !desc->action || true )
>>
>> . Yet that doesn't feel quite right.
>
> Oh, I get it now. Ideally we would use cpu_online_map with the to be removed CPU set, but that's complicated in this context.
>
> For the purposes here we might as well avoid the AND of ->arch.cpu_mask with cpu_online_map and just check:
>
>     if ( !desc->action || cpumask_subset(desc->arch.cpu_mask, mask) )

Right, just that I wouldn't say "as well" - we simply may not mask with cpu_online_map, for the reason stated in the earlier reply.

However, I remain unconvinced that we can outright drop the check of ->affinity. While I doubt cpumask_subset() was correct before, if there's no intersection with cpu_online_map we still need to update ->affinity too, to avoid it becoming an "impossible" setting. So I continue to think that the logic as we have it right now may need splitting into two parts, one dealing with IRQ movement and the other with ->affinity.

> As even if ->arch.cpu_mask has non-online CPUs set aside from the to be offlined CPU, it would just mean that we might be shuffling more than strictly necessary.

Limiting the overall benefit of your change, but yes.

> Note this will only be an issue with cluster mode, physical mode must always have a single online CPU set in ->arch.cpu_mask.

Sure.

Jan