plic_set_affinity() only enables the interrupt for the first possible CPU in
the mask. The point is to prevent all CPUs from trying to claim an interrupt
when only one of them can succeed, leaving the other CPUs to waste clock
cycles for nothing.
However, there are two problems with that:
1. Users cannot enable the interrupt on multiple CPUs (for example, to
minimize interrupt latency).
2. Even if users do not touch SMP interrupt affinity, plic_set_affinity()
is still invoked once (in plic_irqdomain_map()). Thus, by default, only
CPU0 handles interrupts from the PLIC. That may overload CPU0.
Considering that this optimization is not strictly the best (it is a
tradeoff between CPU cycles and interrupt latency), it should not be forced
on users.
Rewrite plic_set_affinity() to enable the interrupt for all possible CPUs in
the mask.
Before:
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
10: 2538 2695 3080 2309 RISC-V INTC 5 Edge riscv-timer
12: 3 0 0 0 SiFive PLIC 111 Edge 17030000.power-controller
13: 1163 0 0 0 SiFive PLIC 25 Edge 13010000.spi
14: 60 0 0 0 SiFive PLIC 7 Edge end0
15: 0 0 0 0 SiFive PLIC 6 Edge end0
16: 0 0 0 0 SiFive PLIC 5 Edge end0
17: 0 0 0 0 SiFive PLIC 78 Edge end1
18: 0 0 0 0 SiFive PLIC 77 Edge end1
19: 0 0 0 0 SiFive PLIC 76 Edge end1
22: 796 0 0 0 SiFive PLIC 32 Edge ttyS0
23: 0 0 0 0 SiFive PLIC 38 Edge pl022
24: 9062 0 0 0 SiFive PLIC 75 Edge dw-mci
25: 0 0 0 0 SiFive PLIC 35 Edge 10030000.i2c
26: 0 0 0 0 SiFive PLIC 37 Edge 10050000.i2c
27: 1 0 0 0 SiFive PLIC 50 Edge 12050000.i2c
28: 0 0 0 0 SiFive PLIC 51 Edge 12060000.i2c
IPI0: 118 98 88 138 Rescheduling interrupts
IPI1: 2272 1910 3758 3200 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 IRQ work interrupts
IPI5: 0 0 0 0 Timer broadcast interrupts
After:
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
10: 2539 2734 2295 2552 RISC-V INTC 5 Edge riscv-timer
12: 2 1 0 0 SiFive PLIC 111 Edge 17030000.power-controller
13: 643 194 368 75 SiFive PLIC 25 Edge 13010000.spi
14: 6 22 19 27 SiFive PLIC 7 Edge end0
15: 0 0 0 0 SiFive PLIC 6 Edge end0
16: 0 0 0 0 SiFive PLIC 5 Edge end0
17: 0 0 0 0 SiFive PLIC 78 Edge end1
18: 0 0 0 0 SiFive PLIC 77 Edge end1
19: 0 0 0 0 SiFive PLIC 76 Edge end1
22: 158 254 226 207 SiFive PLIC 32 Edge ttyS0
23: 0 0 0 0 SiFive PLIC 38 Edge pl022
24: 2265 2250 1452 2024 SiFive PLIC 75 Edge dw-mci
25: 0 0 0 0 SiFive PLIC 35 Edge 10030000.i2c
26: 0 0 0 0 SiFive PLIC 37 Edge 10050000.i2c
27: 0 0 0 1 SiFive PLIC 50 Edge 12050000.i2c
28: 0 0 0 0 SiFive PLIC 51 Edge 12060000.i2c
IPI0: 92 118 116 120 Rescheduling interrupts
IPI1: 4135 2653 2170 3160 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 IRQ work interrupts
IPI5: 0 0 0 0 Timer broadcast interrupts
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: Anup Patel <anup@brainfault.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
---
drivers/irqchip/irq-sifive-plic.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 9e22f7e378f5..f30bdb94ceeb 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -163,20 +163,19 @@ static void plic_irq_eoi(struct irq_data *d)
 static int plic_set_affinity(struct irq_data *d,
 			     const struct cpumask *mask_val, bool force)
 {
-	unsigned int cpu;
 	struct plic_priv *priv = irq_data_get_irq_chip_data(d);
+	struct cpumask new_mask;
 
-	if (force)
-		cpu = cpumask_first_and(&priv->lmask, mask_val);
-	else
-		cpu = cpumask_first_and_and(&priv->lmask, mask_val, cpu_online_mask);
+	cpumask_and(&new_mask, mask_val, &priv->lmask);
+	if (!force)
+		cpumask_and(&new_mask, &new_mask, cpu_online_mask);
 
-	if (cpu >= nr_cpu_ids)
+	if (cpumask_empty(&new_mask))
 		return -EINVAL;
 
 	plic_irq_disable(d);
 
-	irq_data_update_effective_affinity(d, cpumask_of(cpu));
+	irq_data_update_effective_affinity(d, &new_mask);
 
 	if (!irqd_irq_disabled(d))
 		plic_irq_enable(d);
--
2.39.2
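[Editor's note: the mask computation in the rewritten plic_set_affinity() can be
sketched in plain userspace C, with a 64-bit word standing in for struct cpumask.
This is an illustrative model only, not driver code; the function name and types
are hypothetical.]

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the rewritten plic_set_affinity() mask logic.
 *
 * mask_val: affinity mask requested by the user
 * lmask:    CPUs reachable by this PLIC context (priv->lmask)
 * online:   cpu_online_mask
 *
 * Returns the effective affinity mask; 0 corresponds to the
 * cpumask_empty() check that makes the driver return -EINVAL.
 */
static uint64_t plic_effective_mask(uint64_t mask_val, uint64_t lmask,
				    uint64_t online, bool force)
{
	/* cpumask_and(&new_mask, mask_val, &priv->lmask) */
	uint64_t new_mask = mask_val & lmask;

	/* without "force", further restrict to online CPUs */
	if (!force)
		new_mask &= online;

	return new_mask;
}
```

Note how "force" skips the online-mask restriction, mirroring the two branches
the patch replaces.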
On Wed, Jul 3, 2024 at 12:57 PM Nam Cao <namcao@linutronix.de> wrote:
>
> plic_set_affinity() only enables interrupt for the first possible CPU in
> the mask. The point is to prevent all CPUs trying to claim an interrupt,
> but only one CPU succeeds and the other CPUs wasted some clock cycles for
> nothing.
>
> However, there are two problems with that:
> 1. Users cannot enable interrupt on multiple CPUs (for example, to minimize
> interrupt latency).
Well, you are assuming that multiple CPUs are always idle or available
to process interrupts. In other words, if the system is loaded, running
some workload on each CPU, then performance on multiple CPUs
will degrade, since multiple CPUs will wastefully try to claim the
interrupt.
In reality, we can't make such assumptions, and it is better to target a
particular CPU for processing interrupts (just like various other interrupt
controllers do). For balancing interrupt processing load, we have software
irq balancers running in user space (or kernel space) which do a
reasonably fine job of picking an appropriate CPU for interrupt processing.
> 2. Even if users do not touch SMP interrupt affinity, plic_set_affinity()
> is still invoked once (in plic_irqdomain_map()). Thus, by default, only
> CPU0 handles interrupts from PLIC. That may overload CPU0.
>
> Considering this optimization is not strictly the best (it is tradeoff
> between CPU cycles and interrupt latency), it should not be forced on
> users.
Randomly attempting to take an interrupt on multiple CPUs affects the
workload running on all such CPUs (see the comment above).
It's better to have a more predictable and targeted interrupt affinity,
so that software irq balancers can work effectively.
>
> Rewrite plic_set_affinity() to enable interrupt for all possible CPUs in
> the mask.
At least from my side, it is a NO to this approach.
Regards,
Anup
On Wed, Jul 03, 2024 at 05:28:23PM +0530, Anup Patel wrote:
> In reality, we can't make such assumptions and it is better to target a
> particular CPU for processing interrupts (just like various other interrupt
> controllers). For balancing interrupt processing load, we have software
> irq balancers running in user-space (or kernel space) which do a
> reasonably fine job of picking appropriate CPU for interrupt processing.

Then we should leave the job of distributing interrupts to those tools,
right? Not all use cases want minimally wasted CPU cycles. For example, if
a particular interrupt does not arrive very often, but when it does, it
needs to be handled fast; in this example, clearly enabling this interrupt
for all CPUs is superior.

But I am sure there are users who don't use something like irqbalance and
just let the system do the default behavior. So I see your point about not
wasting CPU cycles. How about we keep this patch, but also add a "default
policy" which evenly distributes the interrupts to individual CPUs (best
effort only)? Something like the untested patch below?

Best regards,
Nam

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index f30bdb94ceeb..953f375835b0 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -312,7 +312,7 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
 	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
 			    handle_fasteoi_irq, NULL, NULL);
 	irq_set_noprobe(irq);
-	irq_set_affinity(irq, &priv->lmask);
+	irq_set_affinity(irq, cpumask_of(cpumask_any_distribute(&priv->lmask)));
 	return 0;
 }
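[Editor's note: the suggestion above relies on cpumask_any_distribute(), which
keeps a rotor so successive calls return different CPUs from the mask, spreading
boot-time interrupt mappings instead of piling them onto the first CPU. A
simplified userspace sketch of that idea follows; the helper name is
hypothetical and a 64-bit word stands in for struct cpumask.]

```c
#include <stdint.h>

/*
 * Round-robin CPU selection from a mask, in the spirit of
 * cpumask_any_distribute(): each call resumes scanning just past the
 * CPU returned by the previous call, wrapping around.
 *
 * Returns the chosen CPU number, or -1 if the mask is empty.
 */
static int pick_distributed_cpu(uint64_t mask, int *rotor)
{
	if (!mask)
		return -1;	/* no CPU available */

	/* scan forward from the rotor position, wrapping around */
	for (int i = 0; i < 64; i++) {
		int cpu = (*rotor + 1 + i) % 64;

		if (mask & (1ULL << cpu)) {
			*rotor = cpu;
			return cpu;
		}
	}
	return -1;	/* unreachable for a non-empty mask */
}
```

With mask 0xB (CPUs 0, 1, 3), successive calls cycle 0, 1, 3, 0, ..., which is
the "best effort" even distribution being proposed.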
On Wed, Jul 3, 2024 at 6:03 PM Nam Cao <namcao@linutronix.de> wrote:
> Then we should leave the job of distributing interrupts to those tools,
> right? Not all use cases want minimally wasted CPU cycles. For example, if
> a particular interrupt does not arrive very often, but when it does, it
> needs to be handled fast; in this example, clearly enabling this interrupt
> for all CPUs is superior.

This is a very specific case which you are trying to optimize and in the
process hurting performance in many other cases. There are many high
speed IOs (network, storage, etc) where the rate of interrupts is high, so
for such IO your patch will degrade performance on multiple CPUs.

> But I am sure there are users who don't use something like irqbalance and
> just let the system do the default behavior. So I see your point about not
> wasting CPU cycles. How about we keep this patch, but also add a "default
> policy" which evenly distributes the interrupts to individual CPUs (best
> effort only)? Something like the untested patch below?

I would suggest dropping this patch; for the sake of distributing
interrupts at boot time, we can have the change below.

> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index f30bdb94ceeb..953f375835b0 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -312,7 +312,7 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
>  	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>  			    handle_fasteoi_irq, NULL, NULL);
>  	irq_set_noprobe(irq);
> -	irq_set_affinity(irq, &priv->lmask);
> +	irq_set_affinity(irq, cpumask_of(cpumask_any_distribute(&priv->lmask)));
>  	return 0;
> }

Regards,
Anup
On Wed, Jul 03, 2024 at 08:31:31PM +0530, Anup Patel wrote:
> This is a very specific case which you are trying to optimize and in the
> process hurting performance in many other cases. There are many high
> speed IOs (network, storage, etc) where rate of interrupt is high so for
> such IO your patch will degrade performance on multiple CPUs.

No, it wouldn't "hurt performance in many other cases". It would give
users the ability to do what they want, including hurting performance as
you said, or improving performance as I pointed out earlier.

I am willing to bet that most users don't ever touch this. But if they do,
they better know what they are doing. If they want to waste their CPU
cycles, so be it.

My point essentially is that the kernel shouldn't force any policy on
users. The only case where this makes sense is when the policy is
_strictly_ better than anything else, which is not true here. What the
driver should do is provide a "good enough for most" default, but still
let users decide what's best for them.

Side note: if I am not mistaken, the effective affinity mask exists for
hardware limitations of chips that cannot enable an interrupt for all CPUs
in the mask. The RISC-V PLIC, on the other hand, can enable interrupts for
any CPU, and therefore should do so.

Best regards,
Nam
On Sun, Jul 07 2024 at 16:34, Nam Cao wrote:
> On Wed, Jul 03, 2024 at 08:31:31PM +0530, Anup Patel wrote:
>> On Wed, Jul 3, 2024 at 6:03 PM Nam Cao <namcao@linutronix.de> wrote:
>> > Then we should leave the job of distributing interrupts to those tools,
>> > right? Not all use cases want minimally wasted CPU cycles. For example, if
>> > a particular interrupt does not arrive very often, but when it does, it
>> > needs to be handled fast; in this example, clearly enabling this interrupt
>> > for all CPUs is superior.
It's not really superior at all.
The problem is that a single interrupt is delivered to multiple CPUs at
once and there is no mechanism in the hardware to select one CPU from
the given set which can handle it quickly because it does not have
interrupts disabled. The spec is clear about this:
"The PLIC hardware only supports multicasting of interrupts, such that
all enabled targets will receive interrupt notifications for a given
active interrupt. Multicasting provides rapid response since the
fastest responder claims the interrupt, but can be wasteful in
high-interrupt-rate scenarios if multiple harts take a trap for an
interrupt that only one can successfully claim."
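[Editor's note: the wasteful multicast behavior the spec describes can be
modeled with a toy claim register. This is illustrative only, not driver code;
all names are hypothetical, and the atomic read-and-clear of real hardware is
modeled as a plain sequential read.]

```c
#include <stdint.h>

/*
 * Toy model of the PLIC claim protocol: reading the claim register
 * returns the pending interrupt ID to exactly one hart; every other
 * hart that trapped for the same multicast notification reads back 0
 * and has wasted the trap.
 */
struct toy_plic {
	uint32_t pending_id;	/* 0 means "nothing to claim" */
};

static uint32_t toy_claim(struct toy_plic *p)
{
	uint32_t id = p->pending_id;

	p->pending_id = 0;	/* claimed: later readers see 0 */
	return id;
}

/* Have n harts attempt a claim; count how many succeed. */
static int harts_claiming(struct toy_plic *p, int n)
{
	int winners = 0;

	for (int i = 0; i < n; i++)
		if (toy_claim(p) != 0)
			winners++;
	return winners;
}
```

However many harts trap, exactly one claim succeeds; the remaining traps are
the pure overhead the spec warns about in high-interrupt-rate scenarios.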
It's even worse. $N CPUs will in the worst case congest on the interrupt
descriptor lock and for edge type interrupts it will cause the interrupt
to be masked, marked pending and the handling CPU is forced to unmask
and run another handler invocation. That's just wrong.
Aside from that, this can cause the loss of cache and memory locality in
high-speed scenarios when the interrupt handler bounces around between
CPUs.
>> This is a very specific case which you are trying to optimize and in the
>> process hurting performance in many other cases. There are many high
>> speed IOs (network, storage, etc) where rate of interrupt is high so for
>> such IO your patch will degrade performance on multiple CPUs.
>
> No, it wouldn't "hurt performance in many other cases". It would give
> users the ability to do what they want, including hurting performance as
> you said, or improving performance as I pointed out earlier.
Kinda, but you make the bad case the default for very dubious benefits.
I grant you that the current implementation, which defaults everything to
CPU0, is suboptimal, but that's an orthogonal problem. E.g. x86 selects
the single target CPU from the mask by spreading the interrupts out
halfway evenly.
But if you really care about low latency, then you want to selectively
associate interrupts to particular CPUs and ensure that the associated
processing is bound to the same CPU.
> I am willing to bet that most users don't ever touch this. But if they do,
> they better know what they are doing. If they want to waste their CPU
> cycles, so be it.
That's not what your patch does. It defaults to bad behaviour.
> My point essentially is that the kernel shouldn't force any policy on
> users. The only case where this makes sense is when the policy is
> _strictly_ better than anything else, which is not true here. What the
> driver should do is provide a "good enough for most" default, but still
> let users decide what's best for them.
See what I explained above. Changing this to multi-CPU delivery is
not really good enough, and there is a valid technical reason not to do
that.
> Side note: if I am not mistaken, the effective affinity mask exists for
> hardware limitations of chips that cannot enable an interrupt for all CPUs
> in the mask. The RISC-V PLIC, on the other hand, can enable interrupts for
> any CPU, and therefore should do so.
It's not only hardware limitations that cause architectures to limit
the CPU mask to a single target. On x86 systems that support logical
destination or cluster mode, multi-CPU delivery was disabled even though
the 'multiple CPUs try to handle it' problem is mitigated in hardware. The
benefit is marginal for the common case and is not sufficient for the
low-latency requirements case.
Using a spreading algorithm for the default case will help for the
common case, but won't be sufficient for special latency sensitive
scenarios. Those are the scenarios where the user needs to take care and
set the affinities correctly.
Thanks,
tglx
On Wed, Jul 10, 2024 at 10:19:54PM +0200, Thomas Gleixner wrote:
> It's even worse. $N CPUs will in the worst case congest on the interrupt
> descriptor lock and for edge type interrupts it will cause the interrupt
> to be masked, marked pending and the handling CPU is forced to unmask
> and run another handler invocation. That's just wrong.

Hmm, I'm not sure this case can happen. CPUs do not touch the interrupt,
including taking the descriptor lock, unless they have claimed the
interrupt successfully, and only one CPU can claim it successfully. But it
doesn't matter; your other points are enough for me to drop this patch.

> Using a spreading algorithm for the default case will help for the
> common case, but won't be sufficient for special latency sensitive
> scenarios. Those are the scenarios where the user needs to take care and
> set the affinities correctly.

Thanks for the explanation, I didn't see all the angles on this. Let's
drop it then.

Best regards,
Nam