This series improves performance of scheduler wakeups on large systems
by skipping queued wakeups only when CPUs share their L2 cache, rather
than when they share their LLC.

The speedup mainly reproduces on workloads which have at least *some*
idle time (because it significantly increases the number of migrations,
and thus remote wakeups), *and* it needs to have a sufficient load to
cause contention on the runqueue locks.

Feedback is welcome,

Thanks,

Mathieu

Mathieu Desnoyers (3):
  sched: Rename cpus_share_cache to cpus_share_llc
  sched: Introduce cpus_share_l2c (v3)
  sched: ttwu_queue_cond: skip queued wakeups across different l2 caches

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Julien Desfossez <jdesfossez@digitalocean.com>
Cc: x86@kernel.org

 block/blk-mq.c                 |  2 +-
 include/linux/sched/topology.h | 10 ++++++++--
 kernel/sched/core.c            | 14 +++++++++++---
 kernel/sched/fair.c            |  8 ++++----
 kernel/sched/sched.h           |  2 ++
 kernel/sched/topology.c        | 32 +++++++++++++++++++++++++++++---
 6 files changed, 55 insertions(+), 13 deletions(-)

-- 
2.39.2
Hello Mathieu,
On 8/22/2023 5:01 PM, Mathieu Desnoyers wrote:
> This series improves performance of scheduler wakeups on large systems
> by skipping queued wakeups only when CPUs share their L2 cache, rather
> than when they share their LLC.
>
> The speedup mainly reproduces on workloads which have at least *some*
> idle time (because it significantly increases the number of migrations,
> and thus remote wakeups), *and* it needs to have a sufficient load to
> cause contention on the runqueue locks.
>
> Feedback is welcome,
I ran some micro-benchmarks as part of testing this series. Here are the
observations:
- Hackbench shows improvement with this patch and Aaron's patch with
6.5-rc1 kernel as the baseline.
- tbench and netperf show some dip in performance in the highly
overloaded case.
- Other micro-benchmarks show more or less similar performance with
these patches.
o System Details
- 4th Generation EPYC System
- 2 x 128C/256T
- NPS1 mode
o Kernels
base: 6.5.0-rc1
base + mathieu-queued-wakeup: 6.5.0-rc1 + Mathieu's patches [1]
base + aaron-tg-load-avg: 6.5.0-rc1 + Aaron's patch [2]
base + queued-wakeup + tg-load-avg: 6.5.0-rc1 + Mathieu's patches [1] + Aaron's patch [2]
[References]
[1] "sched: Skip queued wakeups only when L2 is shared"
(https://lore.kernel.org/all/20230822113133.643238-1-mathieu.desnoyers@efficios.com/)
[2] "Reduce cost of accessing tg->load_avg"
(https://lore.kernel.org/lkml/20230823060832.454842-1-aaron.lu@intel.com/)
==================================================================
Test : hackbench
Units : Time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Test: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1-groups: 22.15 (0.00 pct) 22.46 (-1.39 pct) 22.35 (-0.90 pct) 21.20 (4.28 pct)
2-groups: 22.76 (0.00 pct) 21.78 (4.30 pct) 22.60 (0.70 pct) 21.90 (3.77 pct)
4-groups: 22.12 (0.00 pct) 22.02 (0.45 pct) 22.22 (-0.45 pct) 21.94 (0.81 pct)
8-groups: 24.80 (0.00 pct) 22.36 (9.83 pct) 22.99 (7.29 pct) 22.00 (11.29 pct)
16-groups: 31.09 (0.00 pct) 21.56 (30.65 pct) 22.13 (28.81 pct) 20.60 (33.74 pct)
==================================================================
Test : tbench
Units : Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1 261.49 (0.00 pct) 261.18 (-0.11 pct) 262.29 (0.30 pct) 257.80 (-1.41 pct)
2 514.08 (0.00 pct) 521.30 (1.40 pct) 517.66 (0.69 pct) 510.96 (-0.60 pct)
4 1002.51 (0.00 pct) 988.81 (-1.36 pct) 995.04 (-0.74 pct) 987.74 (-1.47 pct)
8 1978.74 (0.00 pct) 1966.60 (-0.61 pct) 1991.85 (0.66 pct) 1941.39 (-1.88 pct)
16 3864.14 (0.00 pct) 3952.03 (2.27 pct) 3914.80 (1.31 pct) 3873.88 (0.25 pct)
32 7473.19 (0.00 pct) 7602.38 (1.72 pct) 7585.94 (1.50 pct) 7423.44 (-0.66 pct)
64 14335.10 (0.00 pct) 14313.17 (-0.15 pct) 14474.67 (0.97 pct) 14030.63 (-2.12 pct)
128 27275.73 (0.00 pct) 25176.80 (-7.69 pct) 28066.53 (2.89 pct) 25045.53 (-8.17 pct)
256 41688.17 (0.00 pct) 44373.40 (6.44 pct) 43779.37 (5.01 pct) 41427.00 (-0.62 pct)
512 137481.33 (0.00 pct) 136466.67 (-0.73 pct) 134824.00 (-1.93 pct) 141280.00 (2.76 pct)
1024 140534.00 (0.00 pct) 141916.33 (0.98 pct) 137008.33 (-2.50 pct) 126319.33 (-10.11 pct)
2048 145378.00 (0.00 pct) 145479.33 (0.06 pct) 138763.67 (-4.54 pct) 124471.00 (-14.38 pct)
==================================================================
Test : netperf
Units : Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1-clients: 59642.88 (0.00 pct) 61647.37 (3.36 pct) 61186.24 (2.58 pct) 59099.11 (-0.91 pct)
2-clients: 59349.65 (0.00 pct) 60896.01 (2.60 pct) 60582.49 (2.07 pct) 62738.47 (5.70 pct)
4-clients: 59197.37 (0.00 pct) 60457.29 (2.12 pct) 63042.52 (6.49 pct) 60879.58 (2.84 pct)
8-clients: 61977.66 (0.00 pct) 60389.92 (-2.56 pct) 62078.15 (0.16 pct) 60314.65 (-2.68 pct)
16-clients: 61518.83 (0.00 pct) 61143.51 (-0.61 pct) 60946.08 (-0.93 pct) 59388.78 (-3.46 pct)
32-clients: 58230.81 (0.00 pct) 58653.20 (0.72 pct) 58594.14 (0.62 pct) 58188.52 (-0.07 pct)
64-clients: 58050.92 (0.00 pct) 57834.55 (-0.37 pct) 58183.51 (0.22 pct) 57565.75 (-0.83 pct)
128-clients: 54324.55 (0.00 pct) 54385.60 (0.11 pct) 54913.43 (1.08 pct) 53917.11 (-0.75 pct)
256-clients: 70155.29 (0.00 pct) 69390.68 (-1.08 pct) 70097.50 (-0.08 pct) 64410.66 (-8.18 pct)
512-clients: 61511.77 (0.00 pct) 61480.99 (-0.05 pct) 54493.82 (-11.40 pct) 46227.05 (-24.84 pct)
==================================================================
Test : stream-10
Units : Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
Copy: 353336.76 (0.00 pct) 352956.36 (-0.10 pct) 349583.67 (-1.06 pct) 351152.80 (-0.61 pct)
Scale: 353474.88 (0.00 pct) 354582.35 (0.31 pct) 350543.75 (-0.82 pct) 353275.74 (-0.05 pct)
Add: 371984.24 (0.00 pct) 372824.87 (0.22 pct) 369173.72 (-0.75 pct) 370483.63 (-0.40 pct)
Triad: 372625.41 (0.00 pct) 278389.62 (-25.28 pct) 369504.06 (-0.83 pct) 369070.11 (-0.95 pct)
==================================================================
Test : stream-100
Units : Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
Copy: 353476.35 (0.00 pct) 354954.50 (0.41 pct) 354614.56 (0.32 pct) 353512.71 (0.01 pct)
Scale: 353214.73 (0.00 pct) 354884.12 (0.47 pct) 355841.17 (0.74 pct) 353220.53 (0.00 pct)
Add: 370755.48 (0.00 pct) 372292.72 (0.41 pct) 375307.35 (1.22 pct) 369917.77 (-0.22 pct)
Triad: 370652.02 (0.00 pct) 372732.11 (0.56 pct) 375718.85 (1.36 pct) 369926.26 (-0.19 pct)
==================================================================
Test : schbench (old)
Units : 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1: 56.00 (0.00 pct) 58.00 (-3.57 pct) 60.00 (-7.14 pct) 60.00 (-7.14 pct)
2: 61.00 (0.00 pct) 56.00 (8.19 pct) 59.00 (3.27 pct) 60.00 (1.63 pct)
4: 64.00 (0.00 pct) 62.00 (3.12 pct) 66.00 (-3.12 pct) 64.00 (0.00 pct)
8: 96.00 (0.00 pct) 78.00 (18.75 pct) 76.00 (20.83 pct) 93.00 (3.12 pct)
16: 98.00 (0.00 pct) 95.00 (3.06 pct) 98.00 (0.00 pct) 95.00 (3.06 pct)
32: 137.00 (0.00 pct) 144.00 (-5.10 pct) 133.00 (2.91 pct) 130.00 (5.10 pct)
64: 206.00 (0.00 pct) 210.00 (-1.94 pct) 200.00 (2.91 pct) 217.00 (-5.33 pct)
128: 348.00 (0.00 pct) 347.00 (0.28 pct) 413.00 (-18.67 pct) 366.00 (-5.17 pct)
256: 679.00 (0.00 pct) 669.00 (1.47 pct) 669.00 (1.47 pct) 675.00 (0.58 pct)
512: 1366.00 (0.00 pct) 1366.00 (0.00 pct) 1442.00 (-5.56 pct) 1430.00 (-4.68 pct)
==================================================================
Test : schbench (new)
Units : 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
Metric: wakeup_lat_summary
#workers: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1: 15.00 (0.00 pct) 15.00 (0.00 pct) 16.00 (-6.66 pct) 17.00 (-13.33 pct)
2: 16.00 (0.00 pct) 16.00 (0.00 pct) 17.00 (-6.25 pct) 17.00 (-6.25 pct)
4: 17.00 (0.00 pct) 17.00 (0.00 pct) 15.00 (11.76 pct) 17.00 (0.00 pct)
8: 11.00 (0.00 pct) 13.00 (-18.18 pct) 11.00 (0.00 pct) 11.00 (0.00 pct)
16: 11.00 (0.00 pct) 11.00 (0.00 pct) 10.00 (9.09 pct) 9.00 (18.18 pct)
32: 11.00 (0.00 pct) 11.00 (0.00 pct) 11.00 (0.00 pct) 11.00 (0.00 pct)
64: 10.00 (0.00 pct) 11.00 (-10.00 pct) 10.00 (0.00 pct) 10.00 (0.00 pct)
128: 11.00 (0.00 pct) 12.00 (-9.09 pct) 12.00 (-9.09 pct) 11.00 (0.00 pct)
256: 117.00 (0.00 pct) 162.00 (-38.46 pct) 90.00 (23.07 pct) 103.00 (11.96 pct)
512: 22496.00 (0.00 pct) 21664.00 (3.69 pct) 22368.00 (0.56 pct) 21408.00 (4.83 pct)
Metric: request_lat_summary
#workers: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1: 6872.00 (0.00 pct) 6872.00 (0.00 pct) 6792.00 (1.16 pct) 6856.00 (0.23 pct)
2: 6824.00 (0.00 pct) 6824.00 (0.00 pct) 6872.00 (-0.70 pct) 6856.00 (-0.46 pct)
4: 6824.00 (0.00 pct) 6808.00 (0.23 pct) 6872.00 (-0.70 pct) 6824.00 (0.00 pct)
8: 6824.00 (0.00 pct) 6824.00 (0.00 pct) 6872.00 (-0.70 pct) 6824.00 (0.00 pct)
16: 6824.00 (0.00 pct) 6840.00 (-0.23 pct) 6872.00 (-0.70 pct) 6840.00 (-0.23 pct)
32: 6840.00 (0.00 pct) 6840.00 (0.00 pct) 6888.00 (-0.70 pct) 6856.00 (-0.23 pct)
64: 6840.00 (0.00 pct) 6872.00 (-0.46 pct) 6888.00 (-0.70 pct) 6872.00 (-0.46 pct)
128: 12272.00 (0.00 pct) 12784.00 (-4.17 pct) 13200.00 (-7.56 pct) 12016.00 (2.08 pct)
256: 13328.00 (0.00 pct) 13392.00 (-0.48 pct) 13712.00 (-2.88 pct) 13552.00 (-1.68 pct)
512: 88832.00 (0.00 pct) 86400.00 (2.73 pct) 88192.00 (0.72 pct) 85632.00 (3.60 pct)
Metric: rps_summary
#workers: 6.5.0-rc1 (base) base + mathieu-queued-wakeup base + aaron-tg-load-avg base + queued-wakeup + tg-load-avg
1: 297.00 (0.00 pct) 297.00 (0.00 pct) 297.00 (0.00 pct) 299.00 (-0.67 pct)
2: 601.00 (0.00 pct) 603.00 (-0.33 pct) 595.00 (0.99 pct) 601.00 (0.00 pct)
4: 1206.00 (0.00 pct) 1206.00 (0.00 pct) 1190.00 (1.32 pct) 1206.00 (0.00 pct)
8: 2412.00 (0.00 pct) 2412.00 (0.00 pct) 2396.00 (0.66 pct) 2420.00 (-0.33 pct)
16: 4840.00 (0.00 pct) 4824.00 (0.33 pct) 4792.00 (0.99 pct) 4840.00 (0.00 pct)
32: 9648.00 (0.00 pct) 9648.00 (0.00 pct) 9584.00 (0.66 pct) 9680.00 (-0.33 pct)
64: 19360.00 (0.00 pct) 19296.00 (0.33 pct) 19168.00 (0.99 pct) 19296.00 (0.33 pct)
128: 37952.00 (0.00 pct) 35264.00 (7.08 pct) 36672.00 (3.37 pct) 38080.00 (-0.33 pct)
256: 41408.00 (0.00 pct) 41536.00 (-0.30 pct) 39744.00 (4.01 pct) 40896.00 (1.23 pct)
512: 36288.00 (0.00 pct) 36800.00 (-1.41 pct) 35264.00 (2.82 pct) 35776.00 (1.41 pct)
Tested-by: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
>
> Thanks,
>
> Mathieu
>
> Mathieu Desnoyers (3):
> sched: Rename cpus_share_cache to cpus_share_llc
> sched: Introduce cpus_share_l2c (v3)
> sched: ttwu_queue_cond: skip queued wakeups across different l2 caches
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
> Cc: Aaron Lu <aaron.lu@intel.com>
> Cc: Julien Desfossez <jdesfossez@digitalocean.com>
> Cc: x86@kernel.org
>
> block/blk-mq.c | 2 +-
> include/linux/sched/topology.h | 10 ++++++++--
> kernel/sched/core.c | 14 +++++++++++---
> kernel/sched/fair.c | 8 ++++----
> kernel/sched/sched.h | 2 ++
> kernel/sched/topology.c | 32 +++++++++++++++++++++++++++++---
> 6 files changed, 55 insertions(+), 13 deletions(-)
>
--
Thanks and Regards,
Swapnil
On 8/25/23 06:11, Swapnil Sapkal wrote:
> Hello Mathieu,
>
> On 8/22/2023 5:01 PM, Mathieu Desnoyers wrote:
>> This series improves performance of scheduler wakeups on large systems
>> by skipping queued wakeups only when CPUs share their L2 cache, rather
>> than when they share their LLC.
>>
>> The speedup mainly reproduces on workloads which have at least *some*
>> idle time (because it significantly increases the number of migrations,
>> and thus remote wakeups), *and* it needs to have a sufficient load to
>> cause contention on the runqueue locks.
>>
>> Feedback is welcome,
>
> I ran some micro-benchmarks as part of testing this series. Here are the
> observations:
>
> - Hackbench shows improvement with this patch and Aaron's patch with
>   6.5-rc1 kernel as the baseline.
>
> - tbench and netperf show some dip in performance in the highly
>   overloaded case.
>
> - Other micro-benchmarks show more or less similar performance with
>   these patches.

Those results look promising!

Thanks for testing!

Mathieu

> [...]
>
> Tested-by: Swapnil Sapkal <Swapnil.Sapkal@amd.com>

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com