smp_processor_id() is pointless to fetch in the loop. Move it out.
No functional change.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02cca2c7a98d..9e561c1234b2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12635,11 +12635,11 @@ static inline int on_null_domain(struct rq *rq)
*/
static inline int find_new_ilb(struct cpumask *cpus)
{
+ int this_cpu = smp_processor_id();
int ilb_cpu;
for_each_cpu(ilb_cpu, cpus) {
-
- if (ilb_cpu == smp_processor_id())
+ if (ilb_cpu == this_cpu)
continue;
if (idle_cpu(ilb_cpu))
--
2.43.0
Hello Shrikanth, On 3/19/2026 12:23 PM, Shrikanth Hegde wrote: > smp_processor_id() is pointless to fetch in the loop. Move it out. > No functional change. > > Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Makes sense! Feel free to include: Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> -- Thanks and Regards, Prateek
On Thu, Mar 19, 2026 at 12:23:14PM +0530, Shrikanth Hegde wrote: > smp_processor_id() is pointless to fetch in the loop. Move it out. > No functional change. Isn't this what compilers are for?
Hi Peter. On 3/19/26 2:50 PM, Peter Zijlstra wrote: > On Thu, Mar 19, 2026 at 12:23:14PM +0530, Shrikanth Hegde wrote: >> smp_processor_id() is pointless to fetch in the loop. Move it out. >> No functional change. > > Isn't this what compilers are for? This is what i see in pseries. c00000000028cc7c: 1d 70 80 48 bl c000000000a93c98 <_find_next_and_bit> c00000000028cc80: 00 00 00 60 nop c00000000028cc84: 00 00 bd 80 lwz r5,0(r29) c00000000028cc88: b4 07 7e 7c extsw r30,r3 c00000000028cc8c: 78 1b 7f 7c mr r31,r3 << This is ilb_cpu c00000000028cc90: 78 1b 7a 7c mr r26,r3 c00000000028cc94: 40 18 05 7c cmplw r5,r3 c00000000028cc98: 78 f3 c3 7f mr r3,r30 c00000000028cc9c: 5c 00 81 40 ble c00000000028ccf8 <kick_ilb+0x10c> c00000000028cca0: 08 00 2d a1 lhz r9,8(r13) << cpu comes from paca_index c00000000028cca4: 00 f8 09 7c cmpw r9,r31 c00000000028cca8: 18 00 82 41 beq c00000000028ccc0 <kick_ilb+0xd4> c00000000028ccac: 8d 0c 04 48 bl c0000000002cd938 <idle_cpu+0x8> c00000000028ccb0: 00 00 00 60 nop c00000000028ccb4: 00 00 03 2c cmpwi r3,0 c00000000028ccb8: 78 00 82 40 bne c00000000028cd30 <kick_ilb+0x144> c00000000028ccbc: 00 00 bd 80 lwz r5,0(r29) c00000000028ccc0: 01 00 df 38 addi r6,r31,1 c00000000028ccc4: 20 00 a5 78 clrldi r5,r5,32 c00000000028ccc8: 78 e3 84 7f mr r4,r28 c00000000028cccc: 78 db 63 7f mr r3,r27 c00000000028ccd0: b4 07 c6 7c extsw r6,r6 c00000000028ccd4: c5 6f 80 48 bl c000000000a93c98 <_find_next_and_bit> So, yes it is done inside the loop. But accessing it should be cheap as it is r13 access which is the paca pointer. Since different archs implement it in their own ways (many would have it cheap, but for some may be costly) and there is CONFIG_DEBUG_PREEMPT too. So it would be good to hoist it out of the loop IMHO.
On Thu, Mar 19, 2026 at 06:33:19PM +0530, Shrikanth Hegde wrote: > Hi Peter. > > On 3/19/26 2:50 PM, Peter Zijlstra wrote: > > On Thu, Mar 19, 2026 at 12:23:14PM +0530, Shrikanth Hegde wrote: > > > smp_processor_id() is pointless to fetch in the loop. Move it out. > > > No functional change. > > > > Isn't this what compilers are for? > > This is what i see in pseries. > > c00000000028cc7c: 1d 70 80 48 bl c000000000a93c98 <_find_next_and_bit> > c00000000028cc80: 00 00 00 60 nop > c00000000028cc84: 00 00 bd 80 lwz r5,0(r29) > c00000000028cc88: b4 07 7e 7c extsw r30,r3 > c00000000028cc8c: 78 1b 7f 7c mr r31,r3 << This is ilb_cpu > c00000000028cc90: 78 1b 7a 7c mr r26,r3 > c00000000028cc94: 40 18 05 7c cmplw r5,r3 > c00000000028cc98: 78 f3 c3 7f mr r3,r30 > c00000000028cc9c: 5c 00 81 40 ble c00000000028ccf8 <kick_ilb+0x10c> > c00000000028cca0: 08 00 2d a1 lhz r9,8(r13) << cpu comes from paca_index > c00000000028cca4: 00 f8 09 7c cmpw r9,r31 > c00000000028cca8: 18 00 82 41 beq c00000000028ccc0 <kick_ilb+0xd4> > c00000000028ccac: 8d 0c 04 48 bl c0000000002cd938 <idle_cpu+0x8> > c00000000028ccb0: 00 00 00 60 nop > c00000000028ccb4: 00 00 03 2c cmpwi r3,0 > c00000000028ccb8: 78 00 82 40 bne c00000000028cd30 <kick_ilb+0x144> > c00000000028ccbc: 00 00 bd 80 lwz r5,0(r29) > c00000000028ccc0: 01 00 df 38 addi r6,r31,1 > c00000000028ccc4: 20 00 a5 78 clrldi r5,r5,32 > c00000000028ccc8: 78 e3 84 7f mr r4,r28 > c00000000028cccc: 78 db 63 7f mr r3,r27 > c00000000028ccd0: b4 07 c6 7c extsw r6,r6 > c00000000028ccd4: c5 6f 80 48 bl c000000000a93c98 <_find_next_and_bit> > > > > So, yes it is done inside the loop. But accessing it should be cheap as it is r13 access > which is the paca pointer. > > Since different archs implement it in their own ways > (many would have it cheap, but for some may be costly) > and there is CONFIG_DEBUG_PREEMPT too. > So it would be good to hoist it out of the loop IMHO. Ah, right. And I suppose it isn't generally sound to hoist smp_processor_id() unless you have disabled preemption, and since we can't tell the compiler this, it just never can. Fair enough.
On Thu, Mar 19, 2026 at 12:23:14PM +0530, Shrikanth Hegde wrote:
> smp_processor_id() is pointless to fetch in the loop. Move it out.
> No functional change.
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> kernel/sched/fair.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02cca2c7a98d..9e561c1234b2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12635,11 +12635,11 @@ static inline int on_null_domain(struct rq *rq)
> */
> static inline int find_new_ilb(struct cpumask *cpus)
> {
> + int this_cpu = smp_processor_id();
> int ilb_cpu;
>
> for_each_cpu(ilb_cpu, cpus) {
> -
> - if (ilb_cpu == smp_processor_id())
> + if (ilb_cpu == this_cpu)
> continue;
>
> if (idle_cpu(ilb_cpu))
> --
> 2.43.0
>
LGTM
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
© 2016 - 2026 Red Hat, Inc.