plic_set_affinity() only enables the interrupt for the first possible CPU in
the mask. The point is to prevent all CPUs from trying to claim an interrupt
when only one of them can succeed, leaving the other CPUs to waste clock
cycles for nothing.
However, there are two problems with that:
1. Users cannot enable the interrupt on multiple CPUs (for example, to
minimize interrupt latency).
2. Even if users do not touch SMP interrupt affinity, plic_set_affinity()
is still invoked once (in plic_irqdomain_map()). Thus, by default, only
CPU0 handles interrupts from the PLIC. That may overload CPU0.
Considering that this optimization is not strictly the best (it is a
tradeoff between CPU cycles and interrupt latency), it should not be forced
on users.
Rewrite plic_set_affinity() to enable the interrupt for all possible CPUs in
the mask.
Before:
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
10: 2538 2695 3080 2309 RISC-V INTC 5 Edge riscv-timer
12: 3 0 0 0 SiFive PLIC 111 Edge 17030000.power-controller
13: 1163 0 0 0 SiFive PLIC 25 Edge 13010000.spi
14: 60 0 0 0 SiFive PLIC 7 Edge end0
15: 0 0 0 0 SiFive PLIC 6 Edge end0
16: 0 0 0 0 SiFive PLIC 5 Edge end0
17: 0 0 0 0 SiFive PLIC 78 Edge end1
18: 0 0 0 0 SiFive PLIC 77 Edge end1
19: 0 0 0 0 SiFive PLIC 76 Edge end1
22: 796 0 0 0 SiFive PLIC 32 Edge ttyS0
23: 0 0 0 0 SiFive PLIC 38 Edge pl022
24: 9062 0 0 0 SiFive PLIC 75 Edge dw-mci
25: 0 0 0 0 SiFive PLIC 35 Edge 10030000.i2c
26: 0 0 0 0 SiFive PLIC 37 Edge 10050000.i2c
27: 1 0 0 0 SiFive PLIC 50 Edge 12050000.i2c
28: 0 0 0 0 SiFive PLIC 51 Edge 12060000.i2c
IPI0: 118 98 88 138 Rescheduling interrupts
IPI1: 2272 1910 3758 3200 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 IRQ work interrupts
IPI5: 0 0 0 0 Timer broadcast interrupts
After:
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
10: 2539 2734 2295 2552 RISC-V INTC 5 Edge riscv-timer
12: 2 1 0 0 SiFive PLIC 111 Edge 17030000.power-controller
13: 643 194 368 75 SiFive PLIC 25 Edge 13010000.spi
14: 6 22 19 27 SiFive PLIC 7 Edge end0
15: 0 0 0 0 SiFive PLIC 6 Edge end0
16: 0 0 0 0 SiFive PLIC 5 Edge end0
17: 0 0 0 0 SiFive PLIC 78 Edge end1
18: 0 0 0 0 SiFive PLIC 77 Edge end1
19: 0 0 0 0 SiFive PLIC 76 Edge end1
22: 158 254 226 207 SiFive PLIC 32 Edge ttyS0
23: 0 0 0 0 SiFive PLIC 38 Edge pl022
24: 2265 2250 1452 2024 SiFive PLIC 75 Edge dw-mci
25: 0 0 0 0 SiFive PLIC 35 Edge 10030000.i2c
26: 0 0 0 0 SiFive PLIC 37 Edge 10050000.i2c
27: 0 0 0 1 SiFive PLIC 50 Edge 12050000.i2c
28: 0 0 0 0 SiFive PLIC 51 Edge 12060000.i2c
IPI0: 92 118 116 120 Rescheduling interrupts
IPI1: 4135 2653 2170 3160 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 IRQ work interrupts
IPI5: 0 0 0 0 Timer broadcast interrupts
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: Anup Patel <anup@brainfault.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
---
drivers/irqchip/irq-sifive-plic.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 9e22f7e378f5..f30bdb94ceeb 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -163,20 +163,19 @@ static void plic_irq_eoi(struct irq_data *d)
 static int plic_set_affinity(struct irq_data *d,
 			     const struct cpumask *mask_val, bool force)
 {
-	unsigned int cpu;
 	struct plic_priv *priv = irq_data_get_irq_chip_data(d);
+	struct cpumask new_mask;
 
-	if (force)
-		cpu = cpumask_first_and(&priv->lmask, mask_val);
-	else
-		cpu = cpumask_first_and_and(&priv->lmask, mask_val, cpu_online_mask);
+	cpumask_and(&new_mask, mask_val, &priv->lmask);
+	if (!force)
+		cpumask_and(&new_mask, &new_mask, cpu_online_mask);
 
-	if (cpu >= nr_cpu_ids)
+	if (cpumask_empty(&new_mask))
 		return -EINVAL;
 
 	plic_irq_disable(d);
 
-	irq_data_update_effective_affinity(d, cpumask_of(cpu));
+	irq_data_update_effective_affinity(d, &new_mask);
 
 	if (!irqd_irq_disabled(d))
 		plic_irq_enable(d);
--
2.39.2
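[Editor's note: the mask computation in the rewritten plic_set_affinity() can be
sketched in plain userspace C, with a 64-bit word standing in for struct cpumask.
This is an illustrative model only, not driver code; the function name and types
are hypothetical.]

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the rewritten plic_set_affinity() mask logic.
 *
 * mask_val: affinity mask requested by the user
 * lmask:    CPUs reachable by this PLIC context (priv->lmask)
 * online:   cpu_online_mask
 *
 * Returns the effective affinity mask; 0 corresponds to the
 * cpumask_empty() check that makes the driver return -EINVAL.
 */
static uint64_t plic_effective_mask(uint64_t mask_val, uint64_t lmask,
				    uint64_t online, bool force)
{
	/* cpumask_and(&new_mask, mask_val, &priv->lmask) */
	uint64_t new_mask = mask_val & lmask;

	/* without "force", further restrict to online CPUs */
	if (!force)
		new_mask &= online;

	return new_mask;
}
```

Note how "force" skips the online-mask restriction, mirroring the two branches
the patch replaces.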
On Wed, Jul 3, 2024 at 12:57 PM Nam Cao <namcao@linutronix.de> wrote:
>
> plic_set_affinity() only enables interrupt for the first possible CPU in
> the mask. The point is to prevent all CPUs trying to claim an interrupt,
> but only one CPU succeeds and the other CPUs wasted some clock cycles for
> nothing.
>
> However, there are two problems with that:
> 1. Users cannot enable interrupt on multiple CPUs (for example, to minimize
> interrupt latency).
Well, you are assuming that multiple CPUs are always idle or available
to process interrupts. In other words, if the system is loaded, running
some workload on each CPU, then performance on multiple CPUs
will degrade, since multiple CPUs will wastefully try to claim the
interrupt.
In reality, we can't make such assumptions, and it is better to target a
particular CPU for processing interrupts (just like various other interrupt
controllers do). For balancing interrupt processing load, we have software
irq balancers running in user space (or kernel space) which do a
reasonably fine job of picking an appropriate CPU for interrupt processing.
> 2. Even if users do not touch SMP interrupt affinity, plic_set_affinity()
> is still invoked once (in plic_irqdomain_map()). Thus, by default, only
> CPU0 handles interrupts from PLIC. That may overload CPU0.
>
> Considering this optimization is not strictly the best (it is tradeoff
> between CPU cycles and interrupt latency), it should not be forced on
> users.
Randomly attempting to take an interrupt on multiple CPUs affects the
workload running on all such CPUs (see the comment above).
It's better to have a more predictable and targeted interrupt affinity,
so that software irq balancers can work effectively.
>
> Rewrite plic_set_affinity() to enable interrupt for all possible CPUs in
> the mask.
At least from my side, it is a NO to this approach.
Regards,
Anup
On Wed, Jul 03, 2024 at 05:28:23PM +0530, Anup Patel wrote:
> In reality, we can't make such assumptions and it is better to target a
> particular CPU for processing interrupts (just like various other interrupt
> controllers). For balancing interrupt processing load, we have software
> irq balancers running in user-space (or kernel space) which do a
> reasonably fine job of picking appropriate CPU for interrupt processing.

Then we should leave the job of distributing interrupts to those tools,
right? Not all use cases want minimally wasted CPU cycles. For example, if
a particular interrupt does not arrive very often, but when it does, it
needs to be handled fast; in this example, clearly enabling this interrupt
for all CPUs is superior.

But I am sure there are users who don't use something like irqbalance and
just let the system do the default behavior. So I see your point about not
wasting CPU cycles. How about we keep this patch, but also add a "default
policy" which evenly distributes the interrupts to individual CPUs (best
effort only)? Something like the untested patch below?

Best regards,
Nam

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index f30bdb94ceeb..953f375835b0 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -312,7 +312,7 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
 	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
 			    handle_fasteoi_irq, NULL, NULL);
 	irq_set_noprobe(irq);
-	irq_set_affinity(irq, &priv->lmask);
+	irq_set_affinity(irq, cpumask_of(cpumask_any_distribute(&priv->lmask)));
 	return 0;
 }
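[Editor's note: the suggestion above relies on cpumask_any_distribute(), which
keeps a rotor so successive calls return different CPUs from the mask, spreading
boot-time interrupt mappings instead of piling them onto the first CPU. A
simplified userspace sketch of that idea follows; the helper name is
hypothetical and a 64-bit word stands in for struct cpumask.]

```c
#include <stdint.h>

/*
 * Round-robin CPU selection from a mask, in the spirit of
 * cpumask_any_distribute(): each call resumes scanning just past the
 * CPU returned by the previous call, wrapping around.
 *
 * Returns the chosen CPU number, or -1 if the mask is empty.
 */
static int pick_distributed_cpu(uint64_t mask, int *rotor)
{
	if (!mask)
		return -1;	/* no CPU available */

	/* scan forward from the rotor position, wrapping around */
	for (int i = 0; i < 64; i++) {
		int cpu = (*rotor + 1 + i) % 64;

		if (mask & (1ULL << cpu)) {
			*rotor = cpu;
			return cpu;
		}
	}
	return -1;	/* unreachable for a non-empty mask */
}
```

With mask 0xB (CPUs 0, 1, 3), successive calls cycle 0, 1, 3, 0, ..., which is
the "best effort" even distribution being proposed.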
On Wed, Jul 3, 2024 at 6:03 PM Nam Cao <namcao@linutronix.de> wrote:
> Then we should leave the job of distributing interrupts to those tools,
> right? Not all use cases want minimally wasted CPU cycles. For example, if
> a particular interrupt does not arrive very often, but when it does, it
> needs to be handled fast; in this example, clearly enabling this interrupt
> for all CPUs is superior.

This is a very specific case which you are trying to optimize and in the
process hurting performance in many other cases. There are many high
speed IOs (network, storage, etc) where the rate of interrupts is high, so
for such IO your patch will degrade performance on multiple CPUs.

> But I am sure there are users who don't use something like irqbalance and
> just let the system do the default behavior. So I see your point about not
> wasting CPU cycles. How about we keep this patch, but also add a "default
> policy" which evenly distributes the interrupts to individual CPUs (best
> effort only)? Something like the untested patch below?

I would suggest dropping this patch; for the sake of distributing
interrupts at boot time, we can have the change below.

> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index f30bdb94ceeb..953f375835b0 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -312,7 +312,7 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
>  	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>  			    handle_fasteoi_irq, NULL, NULL);
>  	irq_set_noprobe(irq);
> -	irq_set_affinity(irq, &priv->lmask);
> +	irq_set_affinity(irq, cpumask_of(cpumask_any_distribute(&priv->lmask)));
>  	return 0;
> }

Regards,
Anup
On Wed, Jul 03, 2024 at 08:31:31PM +0530, Anup Patel wrote:
> This is a very specific case which you are trying to optimize and in the
> process hurting performance in many other cases. There are many high
> speed IOs (network, storage, etc) where rate of interrupt is high so for
> such IO your patch will degrade performance on multiple CPUs.

No, it wouldn't "hurt performance in many other cases". It would give
users the ability to do what they want, including hurting performance as
you said, or improving performance as I pointed out earlier.

I am willing to bet that most users don't ever touch this. But if they do,
they better know what they are doing. If they want to waste their CPU
cycles, so be it.

My point essentially is that the kernel shouldn't force any policy on
users. The only case where this makes sense is when the policy is
_strictly_ better than anything else, which is not true here. What the
driver should do is provide a "good enough for most" default, but still
let users decide what's best for them.

Side note: if I am not mistaken, the effective affinity mask exists for
hardware limitations of chips that cannot enable an interrupt for all CPUs
in the mask. The RISC-V PLIC, on the other hand, can enable interrupts for
any CPU, and therefore should do so.

Best regards,
Nam
On Sun, Jul 07 2024 at 16:34, Nam Cao wrote:
> On Wed, Jul 03, 2024 at 08:31:31PM +0530, Anup Patel wrote:
>> On Wed, Jul 3, 2024 at 6:03 PM Nam Cao <namcao@linutronix.de> wrote:
>> > Then we should leave the job of distributing interrupts to those tools,
>> > right? Not all use cases want minimally wasted CPU cycles. For example, if
>> > a particular interrupt does not arrive very often, but when it does, it
>> > needs to be handled fast; in this example, clearly enabling this interrupt
>> > for all CPUs is superior.
It's not really superior at all.
The problem is that a single interrupt is delivered to multiple CPUs at
once and there is no mechanism in the hardware to select one CPU from
the given set which can handle it quickly because it does not have
interrupts disabled. The spec is clear about this:
"The PLIC hardware only supports multicasting of interrupts, such that
all enabled targets will receive interrupt notifications for a given
active interrupt. Multicasting provides rapid response since the
fastest responder claims the interrupt, but can be wasteful in
high-interrupt-rate scenarios if multiple harts take a trap for an
interrupt that only one can successfully claim."
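[Editor's note: the wasteful multicast behavior the spec describes can be
modeled with a toy claim register. This is illustrative only, not driver code;
all names are hypothetical, and the atomic read-and-clear of real hardware is
modeled as a plain sequential read.]

```c
#include <stdint.h>

/*
 * Toy model of the PLIC claim protocol: reading the claim register
 * returns the pending interrupt ID to exactly one hart; every other
 * hart that trapped for the same multicast notification reads back 0
 * and has wasted the trap.
 */
struct toy_plic {
	uint32_t pending_id;	/* 0 means "nothing to claim" */
};

static uint32_t toy_claim(struct toy_plic *p)
{
	uint32_t id = p->pending_id;

	p->pending_id = 0;	/* claimed: later readers see 0 */
	return id;
}

/* Have n harts attempt a claim; count how many succeed. */
static int harts_claiming(struct toy_plic *p, int n)
{
	int winners = 0;

	for (int i = 0; i < n; i++)
		if (toy_claim(p) != 0)
			winners++;
	return winners;
}
```

However many harts trap, exactly one claim succeeds; the remaining traps are
the pure overhead the spec warns about in high-interrupt-rate scenarios.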
It's even worse. $N CPUs will in the worst case congest on the interrupt
descriptor lock and for edge type interrupts it will cause the interrupt
to be masked, marked pending and the handling CPU is forced to unmask
and run another handler invocation. That's just wrong.
Aside from that, this can cause the loss of cache and memory locality in
high-speed scenarios when the interrupt handler bounces around between
CPUs.
>> This is a very specific case which you are trying to optimize and in the
>> process hurting performance in many other cases. There are many high
>> speed IOs (network, storage, etc) where rate of interrupt is high so for
>> such IO your patch will degrade performance on multiple CPUs.
>
> No, it wouldn't "hurt performance in many other cases". It would give
> users the ability to do what they want, including hurting performance as
> you said, or improving performance as I pointed out earlier.
Kinda, but you make the bad case the default for very dubious benefits.
I grant you that the current implementation, which defaults everything to
CPU0, is suboptimal, but that's an orthogonal problem. E.g. x86 selects
the single target CPU from the mask by spreading the interrupts out
halfway evenly.
But if you really care about low latency, then you want to selectively
associate interrupts to particular CPUs and ensure that the associated
processing is bound to the same CPU.
> I am willing to bet that most users don't ever touch this. But if they do,
> they better know what they are doing. If they want to waste their CPU
> cycles, so be it.
That's not what your patch does. It defaults to bad behaviour.
> My point essentially is that the kernel shouldn't force any policy on
> users. The only case where this makes sense is when the policy is
> _strictly_ better than anything else, which is not true here. What the
> driver should do is provide a "good enough for most" default, but still
> let users decide what's best for them.
See what I explained above. Changing this to multi-CPU delivery is
not really good enough, and there is a valid technical reason not to do
that.
> Side note: if I am not mistaken, the effective affinity mask exists for
> hardware limitations of chips that cannot enable an interrupt for all CPUs
> in the mask. The RISC-V PLIC, on the other hand, can enable interrupts for
> any CPU, and therefore should do so.
It's not only hardware limitations that cause architectures to limit
the CPU mask to a single target. On x86 systems that support logical
destination or cluster mode, multi-CPU delivery was disabled even though
the 'multiple CPUs try to handle it' problem is mitigated in hardware. The
benefit is marginal for the common case and is not sufficient for the
low-latency requirements case.
Using a spreading algorithm for the default case will help for the
common case, but won't be sufficient for special latency sensitive
scenarios. Those are the scenarios where the user needs to take care and
set the affinities correctly.
Thanks,
tglx
On Wed, Jul 10, 2024 at 10:19:54PM +0200, Thomas Gleixner wrote:
> It's even worse. $N CPUs will in the worst case congest on the interrupt
> descriptor lock and for edge type interrupts it will cause the interrupt
> to be masked, marked pending and the handling CPU is forced to unmask
> and run another handler invocation. That's just wrong.

Hmm, I'm not sure this case can happen. CPUs do not touch the interrupt,
including taking the descriptor lock, unless they have claimed the
interrupt successfully, and only one CPU can claim it successfully. But it
doesn't matter; your other points are enough for me to drop this patch.

> Using a spreading algorithm for the default case will help for the
> common case, but won't be sufficient for special latency sensitive
> scenarios. Those are the scenarios where the user needs to take care and
> set the affinities correctly.

Thanks for the explanation, I didn't see all the angles on this. Let's
drop it then.

Best regards,
Nam