[PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()

Posted by Ulf Hansson 2 months, 2 weeks ago
To add support for keeping track of whether there may be a pending IPI
scheduled for a CPU or a group of CPUs, let's implement
cpus_has_pending_ipi() for arm64.

Note, the implementation is intentionally lightweight and doesn't use any
additional lock. This is good enough for cpuidle based decisions.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 arch/arm64/kernel/smp.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 68cea3a4a35c..dd1acfa91d44 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -55,6 +55,8 @@
 
 #include <trace/events/ipi.h>
 
+static DEFINE_PER_CPU(bool, pending_ipi);
+
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
@@ -1012,6 +1014,8 @@ static void do_handle_IPI(int ipinr)
 
 	if ((unsigned)ipinr < NR_IPI)
 		trace_ipi_exit(ipi_types[ipinr]);
+
+	per_cpu(pending_ipi, cpu) = false;
 }
 
 static irqreturn_t ipi_handler(int irq, void *data)
@@ -1024,10 +1028,26 @@ static irqreturn_t ipi_handler(int irq, void *data)
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 {
+	unsigned int cpu;
+
+	for_each_cpu(cpu, target)
+		per_cpu(pending_ipi, cpu) = true;
+
 	trace_ipi_raise(target, ipi_types[ipinr]);
 	arm64_send_ipi(target, ipinr);
 }
 
+bool cpus_has_pending_ipi(const struct cpumask *mask)
+{
+	unsigned int cpu;
+
+	for_each_cpu(cpu, mask) {
+		if (per_cpu(pending_ipi, cpu))
+			return true;
+	}
+	return false;
+}
+
 static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
 {
 	if (!system_uses_irq_prio_masking())
-- 
2.43.0
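The intended consumer of cpus_has_pending_ipi() is the cpuidle/PM-domain
path in patch 3 of the series (not shown here). A hypothetical call site,
gating a shared cluster idle state, might look like:

	/* Hypothetical caller, not part of this patch: skip a shared
	 * idle state when an IPI may already be in flight to any CPU
	 * in the domain (cluster_cpus is an assumed cpumask). */
	if (cpus_has_pending_ipi(cluster_cpus))
		return -EBUSY;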
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Thomas Gleixner 2 months ago
On Fri, Oct 03 2025 at 17:02, Ulf Hansson wrote:
> Note, the implementation is intentionally lightweight and doesn't use
> any

By some definition of lightweight.

>  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>  {
> +	unsigned int cpu;
> +
> +	for_each_cpu(cpu, target)
> +		per_cpu(pending_ipi, cpu) = true;

Iterating over a full cpumask on a big system is not necessarily
considered lightweight. And that comes on top of the loop in
smp_call_function_many_cond() plus the potential loop in
arm64_send_ipi()...

None of this is actually needed. If you want a lightweight racy check
whether there is an IPI en route to a set of CPUs then you can simply do
that in kernel/smp.c:

bool smp_pending_ipis_crystalball(mask)
{
	for_each_cpu(cpu, mask) {
		if (!llist_empty(per_cpu_ptr(&call_single_queue, cpu)))
			return true;
	}
	return false;
}

No?

Thanks,

        tglx
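For reference, a fully typed version of the sketch above, assuming it
lives in kernel/smp.c next to the existing call_single_queue per-CPU
llist, could look as follows (the function name comes from the sketch
and is not an existing kernel API):

bool smp_pending_ipis_crystalball(const struct cpumask *mask)
{
	unsigned int cpu;

	/* Racy by design: a queue may drain or fill right after we look. */
	for_each_cpu(cpu, mask) {
		if (!llist_empty(per_cpu_ptr(&call_single_queue, cpu)))
			return true;
	}
	return false;
}

Note that this only observes IPIs backed by call_single_queue (the
smp_call_function_*() family); it remains a best-effort hint.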
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Ulf Hansson 1 month, 4 weeks ago
+ Marc

On Fri, 17 Oct 2025 at 16:01, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Oct 03 2025 at 17:02, Ulf Hansson wrote:
> > Note, the implementation is intentionally lightweight and doesn't use
> > any
>
> By some definition of lightweight.
>
> >  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> >  {
> > +     unsigned int cpu;
> > +
> > +     for_each_cpu(cpu, target)
> > +             per_cpu(pending_ipi, cpu) = true;
>
> Iterating over a full cpumask on a big system is not necessarily
> considered lightweight. And that comes on top of the loop in
> smp_call_function_many_cond() plus the potential loop in
> arm64_send_ipi()...
>
> None of this is actually needed. If you want a lightweight racy check
> whether there is an IPI en route to a set of CPUs then you can simply do
> that in kernel/smp.c:
>
> bool smp_pending_ipis_crystalball(mask)
> {
>         for_each_cpu(cpu, mask) {
>                 if (!llist_empty(per_cpu_ptr(&call_single_queue, cpu)))
>                         return true;
>         }
>         return false;
> }
>
> No?

Indeed this is way better, thanks for your suggestion!

I have also tried this out and can confirm that it gives the same
improved results on the Dragonboard 410c!

I will submit a new version of the series and I will try to
incorporate all the valuable feedback I have received.

Thanks everyone and kind regards
Uffe
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Marc Zyngier 2 months, 1 week ago
On Fri, 03 Oct 2025 16:02:44 +0100,
Ulf Hansson <ulf.hansson@linaro.org> wrote:
> 
> To add support for keeping track of whether there may be a pending IPI
> scheduled for a CPU or a group of CPUs, let's implement
> cpus_has_pending_ipi() for arm64.
> 
> Note, the implementation is intentionally lightweight and doesn't use any
> additional lock. This is good enough for cpuidle based decisions.
> 
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  arch/arm64/kernel/smp.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 68cea3a4a35c..dd1acfa91d44 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -55,6 +55,8 @@
>  
>  #include <trace/events/ipi.h>
>  
> +static DEFINE_PER_CPU(bool, pending_ipi);
> +
>  /*
>   * as from 2.5, kernels no longer have an init_tasks structure
>   * so we need some other way of telling a new secondary core
> @@ -1012,6 +1014,8 @@ static void do_handle_IPI(int ipinr)
>  
>  	if ((unsigned)ipinr < NR_IPI)
>  		trace_ipi_exit(ipi_types[ipinr]);
> +
> +	per_cpu(pending_ipi, cpu) = false;
>  }
>  
>  static irqreturn_t ipi_handler(int irq, void *data)
> @@ -1024,10 +1028,26 @@ static irqreturn_t ipi_handler(int irq, void *data)
>  
>  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
>  {
> +	unsigned int cpu;
> +
> +	for_each_cpu(cpu, target)
> +		per_cpu(pending_ipi, cpu) = true;
> +

Why isn't all of this part of the core IRQ management? We already
track things like timers, I assume for similar reasons. If IPIs have
to be singled out, I'd rather this is done in common code, and not on
a per architecture basis.

>  	trace_ipi_raise(target, ipi_types[ipinr]);
>  	arm64_send_ipi(target, ipinr);
>  }
>  
> +bool cpus_has_pending_ipi(const struct cpumask *mask)
> +{
> +	unsigned int cpu;
> +
> +	for_each_cpu(cpu, mask) {
> +		if (per_cpu(pending_ipi, cpu))
> +			return true;
> +	}
> +	return false;
> +}
> +

The lack of memory barriers makes me wonder how reliable this is.
Maybe this is relying on the IPIs themselves acting as such, but
that's extremely racy no matter how you look at it.

	M.

-- 
Without deviation from the norm, progress is not possible.
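One concrete instance of the race Marc points at, reconstructed from
the patch above (an illustrative interleaving, not an exhaustive
analysis):

/*
 * CPU 0                                 CPU 1
 * -----                                 -----
 *                                       do_handle_IPI(ipinr)
 *                                         ... handle first IPI ...
 * smp_cross_call({1}, ipinr2)
 *   per_cpu(pending_ipi, 1) = true
 *   arm64_send_ipi({1}, ipinr2)
 *                                         per_cpu(pending_ipi, 1) = false
 *
 * The unconditional clear at the end of do_handle_IPI() wipes the flag
 * for the second, not-yet-delivered IPI, so cpus_has_pending_ipi()
 * reports a false negative until that IPI is actually handled.
 */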
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Ulf Hansson 2 months, 1 week ago
On Mon, 6 Oct 2025 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 03 Oct 2025 16:02:44 +0100,
> Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > To add support for keeping track of whether there may be a pending IPI
> > scheduled for a CPU or a group of CPUs, let's implement
> > cpus_has_pending_ipi() for arm64.
> >
> > Note, the implementation is intentionally lightweight and doesn't use any
> > additional lock. This is good enough for cpuidle based decisions.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  arch/arm64/kernel/smp.c | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 68cea3a4a35c..dd1acfa91d44 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -55,6 +55,8 @@
> >
> >  #include <trace/events/ipi.h>
> >
> > +static DEFINE_PER_CPU(bool, pending_ipi);
> > +
> >  /*
> >   * as from 2.5, kernels no longer have an init_tasks structure
> >   * so we need some other way of telling a new secondary core
> > @@ -1012,6 +1014,8 @@ static void do_handle_IPI(int ipinr)
> >
> >       if ((unsigned)ipinr < NR_IPI)
> >               trace_ipi_exit(ipi_types[ipinr]);
> > +
> > +     per_cpu(pending_ipi, cpu) = false;
> >  }
> >
> >  static irqreturn_t ipi_handler(int irq, void *data)
> > @@ -1024,10 +1028,26 @@ static irqreturn_t ipi_handler(int irq, void *data)
> >
> >  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> >  {
> > +     unsigned int cpu;
> > +
> > +     for_each_cpu(cpu, target)
> > +             per_cpu(pending_ipi, cpu) = true;
> > +
>
> Why isn't all of this part of the core IRQ management? We already
> track things like timers, I assume for similar reasons. If IPIs have
> to be singled out, I'd rather this is done in common code, and not on
> a per architecture basis.

The idea was to start simple and avoid running code on architectures
that don't seem to need it, by using this opt-in and lightweight
approach.

I guess we could do this in generic IRQ code too. Perhaps making it
conditional behind a Kconfig, if required.

>
> >       trace_ipi_raise(target, ipi_types[ipinr]);
> >       arm64_send_ipi(target, ipinr);
> >  }
> >
> > +bool cpus_has_pending_ipi(const struct cpumask *mask)
> > +{
> > +     unsigned int cpu;
> > +
> > +     for_each_cpu(cpu, mask) {
> > +             if (per_cpu(pending_ipi, cpu))
> > +                     return true;
> > +     }
> > +     return false;
> > +}
> > +
>
> The lack of memory barriers makes me wonder how reliable this is.
> Maybe this is relying on the IPIs themselves acting as such, but
> that's extremely racy no matter how you look at it.

It's deliberately lightweight. I am worried about introducing
locking/barriers, as those could be costly and introduce latencies in
these paths.

Still, this is good enough to significantly improve cpuidle-based
decisions in this regard. Please have a look at the commit message of
patch 3.

That said, I am certainly open to suggestions on how to improve the
"racyness", while still keeping it lightweight.

Kind regards
Uffe
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Mark Rutland 2 months, 1 week ago
On Fri, Oct 10, 2025 at 10:30:11AM +0200, Ulf Hansson wrote:
> On Mon, 6 Oct 2025 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
> > On Fri, 03 Oct 2025 16:02:44 +0100,
> > Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > To add support for keeping track of whether there may be a pending IPI
> > > scheduled for a CPU or a group of CPUs, let's implement
> > > cpus_has_pending_ipi() for arm64.
> > >
> > > Note, the implementation is intentionally lightweight and doesn't use any
> > > additional lock. This is good enough for cpuidle based decisions.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

> > > +bool cpus_has_pending_ipi(const struct cpumask *mask)
> > > +{
> > > +     unsigned int cpu;
> > > +
> > > +     for_each_cpu(cpu, mask) {
> > > +             if (per_cpu(pending_ipi, cpu))
> > > +                     return true;
> > > +     }
> > > +     return false;
> > > +}
> > > +
> >
> > The lack of memory barriers makes me wonder how reliable this is.
> > Maybe this is relying on the IPIs themselves acting as such, but
> > that's extremely racy no matter how you look at it.
> 
> It's deliberately lightweight. I am worried about introducing
> locking/barriers, as those could be costly and introduce latencies in
> these paths.

I think the concern is that the naming implies a precise semantic that
the code doesn't actually provide. As written and commented, this
function definitely has false positives and false negatives.

The commit message says "This is good enough for cpuidle based
decisions", but doesn't say what those decisions require nor why this is
good enough.

If false positives and/or false negatives are ok, add a comment block
above the function to mention that those are acceptable. Presumably
there's some boundary at which incorrectness is not acceptable (e.g. if
it's wrong 50% of the time), and we'd want to understand how we can
ensure that we're the right side of that boundary.

Mark.
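A comment block along the lines Mark asks for might read (illustrative
wording only):

/*
 * Best-effort hint for cpuidle: may return false positives (the IPI
 * has already been handled) and false negatives (an IPI is raised
 * right after the check, or the flag update is not yet visible).
 * Callers must treat the result as an optimization hint and never
 * rely on it for correctness.
 */
bool cpus_has_pending_ipi(const struct cpumask *mask)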
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Marc Zyngier 2 months, 1 week ago
On Fri, 10 Oct 2025 09:30:11 +0100,
Ulf Hansson <ulf.hansson@linaro.org> wrote:
> 
> On Mon, 6 Oct 2025 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 03 Oct 2025 16:02:44 +0100,
> > Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > To add support for keeping track of whether there may be a pending IPI
> > > scheduled for a CPU or a group of CPUs, let's implement
> > > cpus_has_pending_ipi() for arm64.
> > >
> > > Note, the implementation is intentionally lightweight and doesn't use any
> > > additional lock. This is good enough for cpuidle based decisions.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > ---
> > >  arch/arm64/kernel/smp.c | 20 ++++++++++++++++++++
> > >  1 file changed, 20 insertions(+)
> > >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 68cea3a4a35c..dd1acfa91d44 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -55,6 +55,8 @@
> > >
> > >  #include <trace/events/ipi.h>
> > >
> > > +static DEFINE_PER_CPU(bool, pending_ipi);
> > > +
> > >  /*
> > >   * as from 2.5, kernels no longer have an init_tasks structure
> > >   * so we need some other way of telling a new secondary core
> > > @@ -1012,6 +1014,8 @@ static void do_handle_IPI(int ipinr)
> > >
> > >       if ((unsigned)ipinr < NR_IPI)
> > >               trace_ipi_exit(ipi_types[ipinr]);
> > > +
> > > +     per_cpu(pending_ipi, cpu) = false;
> > >  }
> > >
> > >  static irqreturn_t ipi_handler(int irq, void *data)
> > > @@ -1024,10 +1028,26 @@ static irqreturn_t ipi_handler(int irq, void *data)
> > >
> > >  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > >  {
> > > +     unsigned int cpu;
> > > +
> > > +     for_each_cpu(cpu, target)
> > > +             per_cpu(pending_ipi, cpu) = true;
> > > +
> >
> > Why isn't all of this part of the core IRQ management? We already
> > track things like timers, I assume for similar reasons. If IPIs have
> > to be singled out, I'd rather this is done in common code, and not on
> > a per architecture basis.
> 
> The idea was to start simple and avoid running code on architectures
> that don't seem to need it, by using this opt-in and lightweight
> approach.

If this stuff is remotely useful, then it is useful to everyone, and I
don't see the point in littering the arch code with it. We have plenty
of buy-in features that can be selected by an architecture and ignored
by others if they see fit.

> 
> I guess we could do this in generic IRQ code too. Perhaps making it
> conditional behind a Kconfig, if required.
> 
> >
> > >       trace_ipi_raise(target, ipi_types[ipinr]);
> > >       arm64_send_ipi(target, ipinr);
> > >  }
> > >
> > > +bool cpus_has_pending_ipi(const struct cpumask *mask)
> > > +{
> > > +     unsigned int cpu;
> > > +
> > > +     for_each_cpu(cpu, mask) {
> > > +             if (per_cpu(pending_ipi, cpu))
> > > +                     return true;
> > > +     }
> > > +     return false;
> > > +}
> > > +
> >
> > The lack of memory barriers makes me wonder how reliable this is.
> > Maybe this is relying on the IPIs themselves acting as such, but
> > that's extremely racy no matter how you look at it.
> 
> It's deliberately lightweight. I am worried about introducing
> locking/barriers, as those could be costly and introduce latencies in
> these paths.

"I've made this car 10% faster by removing the brakes. It's great! Try
it!"

> Still, this is good enough to significantly improve cpuidle-based
> decisions in this regard. Please have a look at the commit message of
> patch 3.

If I can't see how this thing is *correct*, I really don't care how
fast it is. You might as well remove most locks and barriers from the
kernel -- it will be even faster!

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Sudeep Holla 2 months, 1 week ago
On Fri, Oct 03, 2025 at 05:02:44PM +0200, Ulf Hansson wrote:
> To add support for keeping track of whether there may be a pending IPI
> scheduled for a CPU or a group of CPUs, let's implement
> cpus_has_pending_ipi() for arm64.
> 
> Note, the implementation is intentionally lightweight and doesn't use any
> additional lock. This is good enough for cpuidle based decisions.
> 

I’m not completely against this change, but I’d like to discuss a few points
based on my understanding (which might also be incorrect):

1. For systems that don’t use PM domains for idle, wouldn’t this be
   unnecessary? It might be worth making this conditional if we decide to
   proceed.

2. I understand this is intended for the DragonBoard 410c, where the firmware
   can’t be updated. However, ideally, the PSCI firmware should handle checking
   for pending IPIs if that’s important for the platform. The firmware could
   perform this check at the CPU PPU/HW level and prevent entering the
   state if needed.

3. I’m not an expert, but on systems with a large number of CPUs, tracking
   this for idle (which may or may not be enabled) seems a bit excessive,
   especially under heavy load when the system isn’t really idling.

-- 
Regards,
Sudeep
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Ulf Hansson 2 months, 1 week ago
On Mon, 6 Oct 2025 at 12:54, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Fri, Oct 03, 2025 at 05:02:44PM +0200, Ulf Hansson wrote:
> > To add support for keeping track of whether there may be a pending IPI
> > scheduled for a CPU or a group of CPUs, let's implement
> > cpus_has_pending_ipi() for arm64.
> >
> > Note, the implementation is intentionally lightweight and doesn't use any
> > additional lock. This is good enough for cpuidle based decisions.
> >
>
> I’m not completely against this change, but I’d like to discuss a few points
> based on my understanding (which might also be incorrect):
>
> 1. For systems that don’t use PM domains for idle, wouldn’t this be
>    unnecessary? It might be worth making this conditional if we decide to
>    proceed.

For the non-PM-domain case, cpuidle_idle_call() calls need_resched()
and bails out if it returns true. I think that does the job for the
other, more common cases.

Making this conditional could make sense. I'm not sure how costly it
is to update the per-CPU variables.

>
> 2. I understand this is intended for the DragonBoard 410c, where the firmware
>    can’t be updated. However, ideally, the PSCI firmware should handle checking
>    for pending IPIs if that’s important for the platform. The firmware could
>    perform this check at the CPU PPU/HW level and prevent entering the
>    state if needed.

I think this is exactly what is happening on Dragonboard 410c (see the
stats I shared in the commit message of patch 3).

The PSCI FW refuses to enter the suggested idle state and the call fails.

>
> 3. I’m not an expert, but on systems with a large number of CPUs, tracking
>    this for idle (which may or may not be enabled) seems a bit excessive,
>    especially under heavy load when the system isn’t really idling.

Right, making the tracking mechanism conditional sounds worth
exploring. I guess the trick is to find a good way to dynamically
enable it.

Kind regards
Uffe
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Sudeep Holla 2 months, 1 week ago
On Mon, Oct 06, 2025 at 02:22:49PM +0200, Ulf Hansson wrote:
> On Mon, 6 Oct 2025 at 12:54, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > 2. I understand this is intended for the DragonBoard 410c, where the firmware
> >    can’t be updated. However, ideally, the PSCI firmware should handle checking
> >    for pending IPIs if that’s important for the platform. The firmware could
> >    perform this check at the CPU PPU/HW level and prevent entering the
> >    state if needed.
> 
> I think this is exactly what is happening on Dragonboard 410c (see the
> stats I shared in the commit message of patch 3).
> 
> The PSCI FW refuses to enter the suggested idle state and the call fails.
> 

Ah OK, so the PSCI FW is doing its job correctly, and we are just
attempting to reduce the failures by catching a few cases earlier in
the OSPM itself? Sure, it only reduces the failures, but it can't
eliminate them, as an IPI might be issued after this check in the
OSPM. I understand the call to the firmware can be prevented.

-- 
Regards,
Sudeep
Re: [PATCH 2/3] arm64: smp: Implement cpus_has_pending_ipi()
Posted by Ulf Hansson 2 months, 1 week ago
On Mon, 6 Oct 2025 at 16:41, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Mon, Oct 06, 2025 at 02:22:49PM +0200, Ulf Hansson wrote:
> > On Mon, 6 Oct 2025 at 12:54, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > 2. I understand this is intended for the DragonBoard 410c, where the firmware
> > >    can’t be updated. However, ideally, the PSCI firmware should handle checking
> > >    for pending IPIs if that’s important for the platform. The firmware could
> > >    perform this check at the CPU PPU/HW level and prevent entering the
> > >    state if needed.
> >
> > I think this is exactly what is happening on Dragonboard 410c (see the
> > stats I shared in the commit message of patch 3).
> >
> > The PSCI FW refuses to enter the suggested idle state and the call fails.
> >
>
> Ah OK, so the PSCI FW is doing its job correctly, and we are just
> attempting to reduce the failures by catching a few cases earlier in
> the OSPM itself?

Correct!

> Sure, it only reduces the failures, but it can't eliminate them, as an
> IPI might be issued after this check in the OSPM. I understand the
> call to the firmware can be prevented.

Yes!

Although it seems we end up playing a ping-pong game with the FW. Note
that if the FW responds with an error because we have tried to enter
an idle state for a group of CPUs, nothing prevents idling the CPU
again, and hence we might retry the same idle state (at least until
the pending IPI gets delivered).

My point is, this problem is not negligible, as you can see from the
stats I have shared in the commit message of patch 3.

Kind regards
Uffe