[RFC PATCH v0 0/3] sched/numa: Process Adaptive autoNUMA

Bharata B Rao posted 3 patches 4 years, 5 months ago
include/linux/mm_types.h |  14 ++
kernel/sched/debug.c     |   2 +
kernel/sched/fair.c      | 344 ++++++++++++++++++++++++++++++++++++++-
kernel/sched/sched.h     |   2 +
4 files changed, 358 insertions(+), 4 deletions(-)
Posted by Bharata B Rao 4 years, 5 months ago
Hi,

This patchset implements an adaptive algorithm for calculating the autonuma
scan period. In the existing mechanism of scan period calculation,

- scan period is derived from the per-thread stats.
- static threshold (NUMA_PERIOD_THRESHOLD) is used for changing the
  scan rate.

In this new approach (Process Adaptive autoNUMA or PAN), we gather NUMA
fault stats at per-process level which allows for capturing the application
behaviour better. In addition, the algorithm learns and adjusts the scan
rate based on remote fault rate. By not sticking to a static threshold, the
algorithm can respond better to different workload behaviours.
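
To make the contrast with a static threshold concrete, here is a minimal
user-space sketch of one way a scan period could follow the per-process
remote fault rate. It is not the code from the patches: struct pan_window,
pan_remote_pct(), pan_next_scan_period(), the halve/double policy and the
1000 ms / 60000 ms bounds are all names and values assumed purely for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scan period bounds in milliseconds (assumed values). */
#define PAN_SCAN_PERIOD_MIN_MS 1000u
#define PAN_SCAN_PERIOD_MAX_MS 60000u

/* Hypothetical per-process NUMA hint fault counts for one scan window. */
struct pan_window {
	uint64_t local_faults;
	uint64_t remote_faults;
};

/* Remote faults as a percentage (0..100) of all hint faults in a window. */
static unsigned int pan_remote_pct(const struct pan_window *w)
{
	uint64_t total = w->local_faults + w->remote_faults;

	if (!total)
		return 0;
	return (unsigned int)(w->remote_faults * 100 / total);
}

/*
 * Pick the next scan period by comparing the remote fault rate of the
 * current window with the previous one instead of a fixed threshold:
 * a rising remote rate halves the period (scan faster), a falling one
 * doubles it (scan slower), and an unchanged rate keeps it as is.
 */
static unsigned int pan_next_scan_period(unsigned int period_ms,
					 const struct pan_window *prev,
					 const struct pan_window *cur)
{
	unsigned int prev_pct = pan_remote_pct(prev);
	unsigned int cur_pct = pan_remote_pct(cur);
	uint64_t next = period_ms;

	assert(period_ms >= PAN_SCAN_PERIOD_MIN_MS &&
	       period_ms <= PAN_SCAN_PERIOD_MAX_MS);

	if (cur_pct > prev_pct)
		next = period_ms / 2;
	else if (cur_pct < prev_pct)
		next = (uint64_t)period_ms * 2;

	/* Keep the result inside the assumed bounds. */
	if (next < PAN_SCAN_PERIOD_MIN_MS)
		next = PAN_SCAN_PERIOD_MIN_MS;
	if (next > PAN_SCAN_PERIOD_MAX_MS)
		next = PAN_SCAN_PERIOD_MAX_MS;
	return (unsigned int)next;
}
```

The point of the sketch is only that the decision depends on how the remote
fault rate moves between windows, so no single cut-off value has to suit
every workload.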

Since the threads of a process are already considered as a group,
we add a bunch of metrics to the task's mm to track the various
types of faults and derive the scan rate from them.

The new per-process fault stats contribute only to the per-process
scan period calculation, while the existing per-thread stats continue
to contribute towards the numa_group stats which eventually
determine the thresholds for migrating memory and threads
across nodes.
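
The split between the two sets of stats can be pictured with the following
user-space sketch. The structures and the pan_account_fault() helper are
hypothetical stand-ins, not the fields added by the patches; the sketch
only assumes that every hint fault is counted once for the faulting thread
and once for the address space shared by all threads of the process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process (per-mm) counters feeding the scan period. */
struct pan_mm_stats {
	unsigned long faults_local;
	unsigned long faults_remote;
};

/* Hypothetical per-thread counters feeding the numa_group stats. */
struct pan_task_stats {
	unsigned long faults_local;
	unsigned long faults_remote;
	struct pan_mm_stats *mm;	/* shared by all threads of the process */
};

/* Account one hint fault covering 'pages' pages at both levels. */
static void pan_account_fault(struct pan_task_stats *t, bool local,
			      unsigned long pages)
{
	assert(t != NULL && t->mm != NULL);

	if (local) {
		t->faults_local += pages;
		t->mm->faults_local += pages;
	} else {
		t->faults_remote += pages;
		t->mm->faults_remote += pages;
	}
}
```

With two threads pointing at the same pan_mm_stats, the per-mm counters end
up as the sum of the per-thread ones, which is the process-wide view that
the scan period calculation would consume.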

This patchset has been tested with a bunch of benchmarks on the
following system:

2 socket AMD Milan System
32 cores or 64 threads per socket
256GB memory per socket amounting to 512GB in total
transparent_hugepage=never has been used for all the below tests.
NPS1 NUMA configuration where each socket is a NUMA node

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 0 size: 257645 MB
node 0 free: 255193 MB
node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 257984 MB
node 1 free: 257017 MB
node distances:
node   0   1 
  0:  10  32 
  1:  32  10 

This is an early version that we have so far tried only on the NPS1
configuration with THP off. We plan to test it on other configurations
as well, but are posting it now to get some early feedback.

Here are the numbers from some of the benchmarks that we have
tried. A brief description of these benchmarks is also given at
the end of this mail. The detailed results are available at
https://drive.google.com/drive/folders/1O7sY-YBsT3F5GHZMiOGQbXkgpp6O0BEo

Plots of how scan period varies in default vs PAN for a couple
of benchmarks are also present in the above folder.

------------------------------------------------------
% gain of PAN vs default (Avg of 3 runs)
------------------------------------------------------
NAS-BT		-0.17
NAS-CG		+9.39
NAS-MG		+8.19
NAS-FT		+2.23
Hashjoin	+0.58
Graph500	+14.93
Pagerank	+0.37
------------------------------------------------------
		Default		PAN		%diff
------------------------------------------------------
		NUMA hint faults(Total of 3 runs)
------------------------------------------------------
NAS-BT		758282358	539850429	+29
NAS-CG		2179458823	1180301361	+46
NAS-MG		517641172	346066391	+33
NAS-FT		297044964	230033861	+23
Hashjoin	201684863	268436275	-33
Graph500	261808733	154338827	+41
Pagerank	217917818	211260310	+03
------------------------------------------------------
		Migrations(Total of 3 runs)
------------------------------------------------------
NAS-BT		106888517	86482076	+19
NAS-CG		81191368	12859924	+84
NAS-MG		83927451	39651254	+53
NAS-FT		61807715	38934618	+37
Hashjoin	45406983	59828843	-32
Graph500	22798837	21560714	+05
Pagerank	59072135	44968673	+24
------------------------------------------------------
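
A note on reading the %diff column: the sign convention is not spelled out
above, but the numbers are consistent with a relative reduction, i.e.
(Default - PAN) / Default * 100 rounded to the nearest integer, so a
positive value means fewer faults or migrations with PAN. The helper below
is a small sketch under that assumption; pct_reduction() is a name made up
here.

```c
#include <assert.h>

/*
 * Relative reduction of 'pan' versus 'def' in percent, rounded to the
 * nearest integer (halves rounded away from zero). Positive means PAN
 * recorded fewer events than the default kernel.
 */
static long long pct_reduction(long long def, long long pan)
{
	long long num = (def - pan) * 100;

	assert(def > 0);

	if (num >= 0)
		return (num + def / 2) / def;
	return -((-num + def / 2) / def);
}
```

For example, pct_reduction(758282358LL, 539850429LL) gives 29 for the
NAS-BT hint faults, and pct_reduction(201684863LL, 268436275LL) gives -33
for Hashjoin, matching the table.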

And here are results from a few microbenchmarks of the mmtests suite.
(The results are trimmed a bit here; the complete results can
be viewed at the above-mentioned link.)

Hackbench
---------
hackbench-process-pipes
                           hackbench              hackbench
                             default                    pan
Min       256     23.5510 (   0.00%)     23.1900 (   1.53%)
Amean     256     24.4604 (   0.00%)     24.0353 *   1.74%*
Stddev    256      0.4420 (   0.00%)      0.7611 ( -72.18%)
CoeffVar  256      1.8072 (   0.00%)      3.1666 ( -75.22%)
Max       256     25.4930 (   0.00%)     30.5450 ( -19.82%)
BAmean-50 256     24.1074 (   0.00%)     23.6616 (   1.85%)
BAmean-95 256     24.4111 (   0.00%)     23.9308 (   1.97%)
BAmean-99 256     24.4499 (   0.00%)     23.9696 (   1.96%)

                   hackbench   hackbench
                     default         pan
Duration User       25810.02    25158.93
Duration System    276322.70   271729.32
Duration Elapsed     2707.75     2671.33

                                      hackbench      hackbench
                                        default            pan
Ops NUMA alloc hit                1082415453.00  1088025994.00
Ops NUMA alloc miss                        0.00           0.00
Ops NUMA interleave hit                    0.00           0.00
Ops NUMA alloc local              1082415441.00  1088025974.00
Ops NUMA base-page range updates       33475.00      228900.00
Ops NUMA PTE updates                   33475.00      228900.00
Ops NUMA PMD updates                       0.00           0.00
Ops NUMA hint faults                   15758.00      222100.00
Ops NUMA hint local faults %           15371.00      214570.00
Ops NUMA hint local percent               97.54          96.61
Ops NUMA pages migrated                  235.00        4029.00
Ops AutoNUMA cost                         79.03        1112.18
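
In these Ops tables the "NUMA hint local faults %" row holds a raw count
despite its label, and "NUMA hint local percent" is that count divided by
"NUMA hint faults". The sketch below reproduces the percentage from the two
counts; hint_local_pct_x100() is a hypothetical helper, and it returns
hundredths of a percent to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of local hint faults in hundredths of a percent, rounded to
 * nearest, e.g. 9754 stands for 97.54%. Returns 0 when no hint faults
 * were recorded.
 */
static unsigned int hint_local_pct_x100(uint64_t local, uint64_t total)
{
	assert(local <= total);

	if (!total)
		return 0;
	return (unsigned int)((local * 10000 + total / 2) / total);
}
```

With the hackbench numbers above, hint_local_pct_x100(15371, 15758) is 9754
and hint_local_pct_x100(214570, 222100) is 9661, i.e. the 97.54 and 96.61
shown in the table.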

tbench
------
tbench4
                              tbench                 tbench
                             default                    pan
Hmean     1        436.89 (   0.00%)      432.73 *  -0.95%*
Hmean     2        834.27 (   0.00%)      848.11 *   1.66%*
Hmean     4       1629.50 (   0.00%)     1614.22 *  -0.94%*
Hmean     8       2944.06 (   0.00%)     3031.66 *   2.98%*
Hmean     16      5418.25 (   0.00%)     5674.74 *   4.73%*
Hmean     32      9959.60 (   0.00%)     9009.82 *  -9.54%*
Hmean     64     13999.14 (   0.00%)    12160.51 * -13.13%*
Hmean     128    16797.09 (   0.00%)    16506.14 *  -1.73%*
Hmean     256    25344.27 (   0.00%)    25683.66 *   1.34%*
Hmean     512    25289.03 (   0.00%)    25513.77 *   0.89%*
BHmean-50 1        437.13 (   0.00%)      433.01 (  -0.94%)
BHmean-50 2        836.35 (   0.00%)      848.85 (   1.49%)
BHmean-50 4       1631.39 (   0.00%)     1618.43 (  -0.79%)
BHmean-50 8       2948.25 (   0.00%)     3037.86 (   3.04%)
BHmean-50 16      5425.17 (   0.00%)     5684.25 (   4.78%)
BHmean-50 32      9969.17 (   0.00%)     9034.06 (  -9.38%)
BHmean-50 64     14013.93 (   0.00%)    12202.07 ( -12.93%)
BHmean-50 128    16881.94 (   0.00%)    16571.27 (  -1.84%)
BHmean-50 256    25379.59 (   0.00%)    25819.18 (   1.73%)
BHmean-50 512    25435.41 (   0.00%)    25718.02 (   1.11%)
BHmean-95 1        436.92 (   0.00%)      432.81 (  -0.94%)
BHmean-95 2        834.59 (   0.00%)      848.23 (   1.63%)
BHmean-95 4       1629.73 (   0.00%)     1614.83 (  -0.91%)
BHmean-95 8       2945.02 (   0.00%)     3032.19 (   2.96%)
BHmean-95 16      5418.86 (   0.00%)     5675.91 (   4.74%)
BHmean-95 32      9962.57 (   0.00%)     9014.17 (  -9.52%)
BHmean-95 64     14002.44 (   0.00%)    12164.32 ( -13.13%)
BHmean-95 128    16820.56 (   0.00%)    16522.82 (  -1.77%)
BHmean-95 256    25347.34 (   0.00%)    25692.56 (   1.36%)
BHmean-95 512    25302.10 (   0.00%)    25528.52 (   0.89%)
BHmean-99 1        436.90 (   0.00%)      432.75 (  -0.95%)
BHmean-99 2        834.35 (   0.00%)      848.17 (   1.66%)
BHmean-99 4       1629.57 (   0.00%)     1614.38 (  -0.93%)
BHmean-99 8       2944.36 (   0.00%)     3031.77 (   2.97%)
BHmean-99 16      5418.40 (   0.00%)     5675.01 (   4.74%)
BHmean-99 32      9961.01 (   0.00%)     9011.43 (  -9.53%)
BHmean-99 64     14000.68 (   0.00%)    12161.34 ( -13.14%)
BHmean-99 128    16803.44 (   0.00%)    16511.94 (  -1.73%)
BHmean-99 256    25344.93 (   0.00%)    25685.57 (   1.34%)
BHmean-99 512    25291.87 (   0.00%)    25516.94 (   0.89%)

                      tbench      tbench
                     default         pan
Duration User        8482.50     8289.35
Duration System     49462.63    49364.56
Duration Elapsed     2217.10     2217.08

                                         tbench         tbench
                                        default            pan
Ops NUMA alloc hit                 388738400.00   378941469.00
Ops NUMA alloc miss                        0.00           0.00
Ops NUMA interleave hit                    0.00           0.00
Ops NUMA alloc local               388738391.00   378941455.00
Ops NUMA base-page range updates      266760.00      266275.00
Ops NUMA PTE updates                  266760.00      266275.00
Ops NUMA PMD updates                       0.00           0.00
Ops NUMA hint faults                  241547.00      257790.00
Ops NUMA hint local faults %          145814.00      126410.00
Ops NUMA hint local percent               60.37          49.04
Ops NUMA pages migrated                51535.00       66083.00
Ops AutoNUMA cost                       1210.58        1292.07

dbench
------
dbench4 Latency
                                       dbench                 dbench
                                      default                    pan
Amean     latency-1           2.02 (   0.00%)        2.05 *  -1.52%*
Amean     latency-2           2.60 (   0.00%)        2.55 *   1.64%*
Amean     latency-4           3.52 (   0.00%)        3.56 *  -1.17%*
Amean     latency-8          12.79 (   0.00%)       11.83 *   7.49%*
Amean     latency-16         23.33 (   0.00%)       19.09 *  18.19%*
Amean     latency-32         19.30 (   0.00%)       18.83 *   2.43%*
Amean     latency-64         25.32 (   0.00%)       24.30 *   4.00%*
Amean     latency-128        45.25 (   0.00%)       42.93 *   5.13%*
Amean     latency-1024        0.00 (   0.00%)        0.00 *   0.00%*
BAmean-50 latency-1           1.65 (   0.00%)        1.74 (  -5.16%)
BAmean-50 latency-2           2.10 (   0.00%)        2.10 (  -0.13%)
BAmean-50 latency-4           2.65 (   0.00%)        2.71 (  -2.28%)
BAmean-50 latency-8           6.21 (   0.00%)        4.64 (  25.30%)
BAmean-50 latency-16         17.64 (   0.00%)       14.08 (  20.16%)
BAmean-50 latency-32         15.58 (   0.00%)       15.90 (  -2.07%)
BAmean-50 latency-64         20.76 (   0.00%)       20.31 (   2.15%)
BAmean-50 latency-128        36.22 (   0.00%)       34.85 (   3.80%)
BAmean-50 latency-1024        0.00 (   0.00%)        0.00 (   0.00%)
BAmean-95 latency-1           1.88 (   0.00%)        1.94 (  -3.17%)
BAmean-95 latency-2           2.25 (   0.00%)        2.26 (  -0.26%)
BAmean-95 latency-4           3.00 (   0.00%)        3.08 (  -2.71%)
BAmean-95 latency-8          11.66 (   0.00%)       10.03 (  13.97%)
BAmean-95 latency-16         22.30 (   0.00%)       17.68 (  20.73%)
BAmean-95 latency-32         17.95 (   0.00%)       17.70 (   1.38%)
BAmean-95 latency-64         23.57 (   0.00%)       22.72 (   3.62%)
BAmean-95 latency-128        42.44 (   0.00%)       39.96 (   5.84%)
BAmean-95 latency-1024        0.00 (   0.00%)        0.00 (   0.00%)
BAmean-99 latency-1           1.90 (   0.00%)        1.96 (  -3.30%)
BAmean-99 latency-2           2.38 (   0.00%)        2.37 (   0.48%)
BAmean-99 latency-4           3.24 (   0.00%)        3.34 (  -3.26%)
BAmean-99 latency-8          12.34 (   0.00%)       10.71 (  13.27%)
BAmean-99 latency-16         22.79 (   0.00%)       18.27 (  19.82%)
BAmean-99 latency-32         18.68 (   0.00%)       18.32 (   1.93%)
BAmean-99 latency-64         24.69 (   0.00%)       23.69 (   4.06%)
BAmean-99 latency-128        44.44 (   0.00%)       42.15 (   5.17%)
BAmean-99 latency-1024        0.00 (   0.00%)        0.00 (   0.00%)

dbench4 Throughput (misleading but traditional)
                               dbench                 dbench
                              default                    pan
Hmean     1         505.12 (   0.00%)      492.96 *  -2.41%*
Hmean     2         824.14 (   0.00%)      824.06 *  -0.01%*
Hmean     4        1174.61 (   0.00%)     1207.86 *   2.83%*
Hmean     8        1665.10 (   0.00%)     1667.27 *   0.13%*
Hmean     16       2215.59 (   0.00%)     2160.93 *  -2.47%*
Hmean     32       2727.05 (   0.00%)     2633.26 *  -3.44%*
Hmean     64       3128.64 (   0.00%)     3098.73 *  -0.96%*
Hmean     128      3282.89 (   0.00%)     3340.26 *   1.75%*
Hmean     1024     2551.02 (   0.00%)     2559.41 *   0.33%*
BHmean-50 1         509.87 (   0.00%)      495.10 (  -2.90%)
BHmean-50 2         829.35 (   0.00%)      828.14 (  -0.15%)
BHmean-50 4        1182.38 (   0.00%)     1219.30 (   3.12%)
BHmean-50 8        1678.49 (   0.00%)     1678.83 (   0.02%)
BHmean-50 16       2251.01 (   0.00%)     2194.52 (  -2.51%)
BHmean-50 32       2751.39 (   0.00%)     2678.45 (  -2.65%)
BHmean-50 64       3189.69 (   0.00%)     3154.45 (  -1.10%)
BHmean-50 128      3396.18 (   0.00%)     3451.59 (   1.63%)
BHmean-50 1024     2836.80 (   0.00%)     2836.84 (   0.00%)
BHmean-95 1         506.13 (   0.00%)      493.24 (  -2.55%)
BHmean-95 2         824.84 (   0.00%)      824.30 (  -0.06%)
BHmean-95 4        1175.91 (   0.00%)     1208.57 (   2.78%)
BHmean-95 8        1666.46 (   0.00%)     1668.22 (   0.11%)
BHmean-95 16       2219.59 (   0.00%)     2163.86 (  -2.51%)
BHmean-95 32       2731.26 (   0.00%)     2640.34 (  -3.33%)
BHmean-95 64       3144.73 (   0.00%)     3108.59 (  -1.15%)
BHmean-95 128      3306.51 (   0.00%)     3363.33 (   1.72%)
BHmean-95 1024     2658.37 (   0.00%)     2668.88 (   0.40%)
BHmean-99 1         505.37 (   0.00%)      493.08 (  -2.43%)
BHmean-99 2         824.31 (   0.00%)      824.12 (  -0.02%)
BHmean-99 4        1174.94 (   0.00%)     1208.02 (   2.81%)
BHmean-99 8        1665.40 (   0.00%)     1667.48 (   0.12%)
BHmean-99 16       2216.51 (   0.00%)     2161.60 (  -2.48%)
BHmean-99 32       2728.09 (   0.00%)     2635.09 (  -3.41%)
BHmean-99 64       3135.81 (   0.00%)     3102.12 (  -1.07%)
BHmean-99 128      3291.11 (   0.00%)     3349.16 (   1.76%)
BHmean-99 1024     2645.54 (   0.00%)     2655.67 (   0.38%)


                      dbench      dbench
                     default         pan
Duration User         822.55      827.85
Duration System      8384.99     8164.83
Duration Elapsed     1671.36     1670.74

                                         dbench         dbench
                                        default            pan
Ops NUMA alloc hit                 183324626.00   182350114.00
Ops NUMA alloc miss                        0.00           0.00
Ops NUMA interleave hit                    0.00           0.00
Ops NUMA alloc local               183324508.00   182350004.00
Ops NUMA base-page range updates      181531.00      515929.00
Ops NUMA PTE updates                  181531.00      515929.00
Ops NUMA PMD updates                       0.00           0.00
Ops NUMA hint faults                  162742.00      510979.00
Ops NUMA hint local faults %          120309.00      426848.00
Ops NUMA hint local percent               73.93          83.54
Ops NUMA pages migrated                37605.00       59519.00
Ops AutoNUMA cost                        815.70        2559.64

Netperf-RR
----------
netperf-udp-rr
                           netperf                netperf
                        rr-default                 rr-pan
Min       1   104915.69 (   0.00%)   104505.71 (  -0.39%)
Hmean     1   105865.46 (   0.00%)   105899.22 *   0.03%*
Stddev    1      528.45 (   0.00%)      881.92 ( -66.89%)
CoeffVar  1        0.50 (   0.00%)        0.83 ( -66.83%)
Max       1   106410.28 (   0.00%)   107196.52 (   0.74%)
BHmean-50 1   106232.53 (   0.00%)   106568.26 (   0.32%)
BHmean-95 1   105972.05 (   0.00%)   106056.35 (   0.08%)
BHmean-99 1   105972.05 (   0.00%)   106056.35 (   0.08%)

                     netperf     netperf
                  rr-default      rr-pan
Duration User          11.20       10.74
Duration System       202.40      201.32
Duration Elapsed      303.09      303.08

                                        netperf        netperf
                                     rr-default         rr-pan
Ops NUMA alloc hit                    183999.00      183853.00
Ops NUMA alloc miss                        0.00           0.00
Ops NUMA interleave hit                    0.00           0.00
Ops NUMA alloc local                  183999.00      183853.00
Ops NUMA base-page range updates           0.00       24370.00
Ops NUMA PTE updates                       0.00       24370.00
Ops NUMA PMD updates                       0.00           0.00
Ops NUMA hint faults                     539.00       24470.00
Ops NUMA hint local faults %             539.00       24447.00
Ops NUMA hint local percent              100.00          99.91
Ops NUMA pages migrated                    0.00          23.00
Ops AutoNUMA cost                          2.69         122.52

netperf-tcp-rr
                           netperf                netperf
                        rr-default                 rr-pan
Min       1    96156.03 (   0.00%)    96556.87 (   0.42%)
Hmean     1    96627.24 (   0.00%)    97551.38 *   0.96%*
Stddev    1      284.71 (   0.00%)      637.74 (-123.99%)
CoeffVar  1        0.29 (   0.00%)        0.65 (-121.87%)
Max       1    96974.45 (   0.00%)    98554.94 (   1.63%)
BHmean-50 1    96840.81 (   0.00%)    98067.19 (   1.27%)
BHmean-95 1    96679.89 (   0.00%)    97663.14 (   1.02%)
BHmean-99 1    96679.89 (   0.00%)    97663.14 (   1.02%)

                     netperf     netperf
                  rr-default      rr-pan
Duration User          10.21       10.26
Duration System       207.90      208.28
Duration Elapsed      302.99      303.02

                                        netperf        netperf
                                     rr-default         rr-pan
Ops NUMA alloc hit                    183669.00      183695.00
Ops NUMA alloc miss                        0.00           0.00
Ops NUMA interleave hit                    0.00           0.00
Ops NUMA alloc local                  183657.00      183695.00
Ops NUMA base-page range updates        3949.00       38561.00
Ops NUMA PTE updates                    3949.00       38561.00
Ops NUMA PMD updates                       0.00           0.00
Ops NUMA hint faults                    4186.00       43328.00
Ops NUMA hint local faults %            4100.00       43195.00
Ops NUMA hint local percent               97.95          99.69
Ops NUMA pages migrated                    9.00          73.00
Ops AutoNUMA cost                         20.96         216.91

Autonumabench
-------------
autonumabench
                                           autonumabench          autonumabench
                                                 default                    pan
Amean     syst-NUMA01                11664.40 (   0.00%)    11616.17 *   0.41%*
Amean     syst-NUMA01_THREADLOCAL        0.24 (   0.00%)        0.22 *   7.78%*
Amean     syst-NUMA02                    1.55 (   0.00%)        9.31 *-499.26%*
Amean     syst-NUMA02_SMT                1.14 (   0.00%)        4.04 *-254.39%*
Amean     elsp-NUMA01                  223.52 (   0.00%)      221.43 *   0.93%*
Amean     elsp-NUMA01_THREADLOCAL        0.95 (   0.00%)        0.94 *   0.76%*
Amean     elsp-NUMA02                    6.83 (   0.00%)        5.74 *  15.90%*
Amean     elsp-NUMA02_SMT                6.65 (   0.00%)        6.25 *   5.97%*
BAmean-50 syst-NUMA01                11455.44 (   0.00%)    10985.76 (   4.10%)
BAmean-50 syst-NUMA01_THREADLOCAL        0.22 (   0.00%)        0.21 (   7.46%)
BAmean-50 syst-NUMA02                    1.11 (   0.00%)        8.91 (-703.00%)
BAmean-50 syst-NUMA02_SMT                0.94 (   0.00%)        3.42 (-262.19%)
BAmean-50 elsp-NUMA01                  217.38 (   0.00%)      214.03 (   1.54%)
BAmean-50 elsp-NUMA01_THREADLOCAL        0.94 (   0.00%)        0.94 (   0.35%)
BAmean-50 elsp-NUMA02                    6.66 (   0.00%)        5.45 (  18.08%)
BAmean-50 elsp-NUMA02_SMT                6.50 (   0.00%)        6.09 (   6.31%)
BAmean-95 syst-NUMA01                11611.74 (   0.00%)    11448.30 (   1.41%)
BAmean-95 syst-NUMA01_THREADLOCAL        0.23 (   0.00%)        0.22 (   7.14%)
BAmean-95 syst-NUMA02                    1.27 (   0.00%)        9.21 (-624.93%)
BAmean-95 syst-NUMA02_SMT                0.97 (   0.00%)        3.90 (-300.34%)
BAmean-95 elsp-NUMA01                  221.75 (   0.00%)      218.53 (   1.45%)
BAmean-95 elsp-NUMA01_THREADLOCAL        0.94 (   0.00%)        0.94 (   0.53%)
BAmean-95 elsp-NUMA02                    6.75 (   0.00%)        5.68 (  15.81%)
BAmean-95 elsp-NUMA02_SMT                6.61 (   0.00%)        6.23 (   5.82%)
BAmean-99 syst-NUMA01                11611.74 (   0.00%)    11448.30 (   1.41%)
BAmean-99 syst-NUMA01_THREADLOCAL        0.23 (   0.00%)        0.22 (   7.14%)
BAmean-99 syst-NUMA02                    1.27 (   0.00%)        9.21 (-624.93%)
BAmean-99 syst-NUMA02_SMT                0.97 (   0.00%)        3.90 (-300.34%)
BAmean-99 elsp-NUMA01                  221.75 (   0.00%)      218.53 (   1.45%)
BAmean-99 elsp-NUMA01_THREADLOCAL        0.94 (   0.00%)        0.94 (   0.53%)
BAmean-99 elsp-NUMA02                    6.75 (   0.00%)        5.68 (  15.81%)
BAmean-99 elsp-NUMA02_SMT                6.61 (   0.00%)        6.23 (   5.82%)

              autonumabench autonumabench
                     default         pan
Duration User       94363.43    94436.71
Duration System     81671.72    81408.53
Duration Elapsed     1676.81     1647.99

                                  autonumabench  autonumabench
                                        default            pan
Ops NUMA alloc hit                 539544115.00   539522029.00
Ops NUMA alloc miss                        0.00           0.00
Ops NUMA interleave hit                    0.00           0.00
Ops NUMA alloc local               279025768.00   281735736.00
Ops NUMA base-page range updates    69695169.00    84767502.00
Ops NUMA PTE updates                69695169.00    84767502.00
Ops NUMA PMD updates                       0.00           0.00
Ops NUMA hint faults                69691818.00    87895044.00
Ops NUMA hint local faults %        56565519.00    65819747.00
Ops NUMA hint local percent               81.17          74.88
Ops NUMA pages migrated              5950362.00     8310169.00
Ops AutoNUMA cost                     349060.01      440226.49

Here is a short description of different benchmarks used:

NAS Parallel benchmarks
-----------------------
Using OpenMP version from https://www.nas.nasa.gov/software/npb.html
Variations tried: BT, CG, MG and FT
Memory Footprint:
BT - 170GB
CG - 196GB
MG - 211GB
FT - 81GB
Results: Operations per second.

Hashjoin
--------
Benchmark from IISc that does SQL join between two tables.
Used the version from https://github.com/mitosis-project/mitosis-asplos20-artifact 
It performs a fixed number of join operations
Memory Footprint: 80GB
Results: Time taken

Graph500
--------
Benchmark from https://graph500.org/ that does breadth-first search
on a graph. The score is reported as TEPS (Traversed Edges Per Second).
It comprises both search and validation phases, but only the
score phase contributes to the final score here.
Memory Footprint: 70GB
Results: TEPS

PageRank
--------
This is part of the GAP Benchmark Suite from https://github.com/sbeamer/gapbs
It performs basic search operations.
Memory Footprint: 91GB
Results: Time taken

Disha Talreja (3):
  sched/numa: Process based autonuma scan period framework
  sched/numa: Add cumulative history of per-process fault stats
  sched/numa: Add adaptive scan period calculation

 include/linux/mm_types.h |  14 ++
 kernel/sched/debug.c     |   2 +
 kernel/sched/fair.c      | 344 ++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h     |   2 +
 4 files changed, 358 insertions(+), 4 deletions(-)

-- 
2.25.1

Re: [RFC PATCH v0 0/3] sched/numa: Process Adaptive autoNUMA
Posted by Mel Gorman 4 years, 4 months ago
On Fri, Jan 28, 2022 at 10:58:48AM +0530, Bharata B Rao wrote:
> Hi,
> 
> This patchset implements an adaptive algorithm for calculating the autonuma
> scan period.

autonuma refers to the khugepaged-like approach to NUMA balancing that
was later superseded by NUMA Balancing (NUMAB), and the naming in the
tree generally reflects that, e.g. run git grep -i autonuma and note how
few references there are to autonuma versus numab or "NUMA balancing".
I know MMTests still refers to AutoNUMA, but mostly because at the time
it was written, autoNUMA was what was being evaluated and I never
updated the naming.

> In the existing mechanism of scan period calculation,
> 
> - scan period is derived from the per-thread stats.
> - static threshold (NUMA_PERIOD_THRESHOLD) is used for changing the
>   scan rate.
> 
> In this new approach (Process Adaptive autoNUMA or PAN), we gather NUMA
> fault stats at per-process level which allows for capturing the application
> behaviour better. In addition, the algorithm learns and adjusts the scan
> rate based on remote fault rate. By not sticking to a static threshold, the
> algorithm can respond better to different workload behaviours.
> 

NUMA Balancing is concerned with threads (task) and an address space (mm)
so basing the naming on Address Space rather than process may be more
appropriate although I admit the acronym is not as snappy.

> Since the threads of a process are already considered as a group,
> we add a bunch of metrics to the task's mm to track the various
> types of faults and derive the scan rate from them.
> 

Enumerate the types of faults and note how the per-thread and
per-address-space metrics are related.

> The new per-process fault stats contribute only to the per-process
> scan period calculation, while the existing per-thread stats continue
> to contribute towards the numa_group stats which eventually
> determine the thresholds for migrating memory and threads
> across nodes.
> 
> This patchset has been tested with a bunch of benchmarks on the
> following system:
> 

Please include the comparisons of both the headline metrics and notes on
the change in scan rates in the changelog of the patch. Not everyone
has access to Google Drive, and the folder is not guaranteed to remain
available forever. Similarly, the leader is not guaranteed to appear in
the git history.

> ------------------------------------------------------
> % gain of PAN vs default (Avg of 3 runs)
> ------------------------------------------------------
> NAS-BT		-0.17
> NAS-CG		+9.39
> NAS-MG		+8.19
> NAS-FT		+2.23
> Hashjoin	+0.58
> Graph500	+14.93
> Pagerank	+0.37



> ------------------------------------------------------
> 		Default		PAN		%diff
> ------------------------------------------------------
> 		NUMA hint faults(Total of 3 runs)
> ------------------------------------------------------
> NAS-BT		758282358	539850429	+29
> NAS-CG		2179458823	1180301361	+46
> NAS-MG		517641172	346066391	+33
> NAS-FT		297044964	230033861	+23
> Hashjoin	201684863	268436275	-33
> Graph500	261808733	154338827	+41
> Pagerank	217917818	211260310	+03
> ------------------------------------------------------
> 		Migrations(Total of 3 runs)
> ------------------------------------------------------
> NAS-BT		106888517	86482076	+19
> NAS-CG		81191368	12859924	+84
> NAS-MG		83927451	39651254	+53
> NAS-FT		61807715	38934618	+37
> Hashjoin	45406983	59828843	-32
> Graph500	22798837	21560714	+05
> Pagerank	59072135	44968673	+24
> ------------------------------------------------------
> 
> And here are some tests from a few microbenchmarks of mmtests suite.
> (The results are trimmed a bit here, the complete results can
> be viewed in the above mentioned link)
> 
> Hackbench
> ---------
> hackbench-process-pipes
>                            hackbench              hackbench
>                              default                    pan
> Min       256     23.5510 (   0.00%)     23.1900 (   1.53%)
> Amean     256     24.4604 (   0.00%)     24.0353 *   1.74%*
> Stddev    256      0.4420 (   0.00%)      0.7611 ( -72.18%)
> CoeffVar  256      1.8072 (   0.00%)      3.1666 ( -75.22%)
> Max       256     25.4930 (   0.00%)     30.5450 ( -19.82%)
> BAmean-50 256     24.1074 (   0.00%)     23.6616 (   1.85%)
> BAmean-95 256     24.4111 (   0.00%)     23.9308 (   1.97%)
> BAmean-99 256     24.4499 (   0.00%)     23.9696 (   1.96%)
> 
>                    hackbench   hackbench
>                      default         pan
> Duration User       25810.02    25158.93
> Duration System    276322.70   271729.32
> Duration Elapsed     2707.75     2671.33
> 

>                                       hackbench      hackbench
>                                         default            pan
> Ops NUMA alloc hit                1082415453.00  1088025994.00
> Ops NUMA alloc miss                        0.00           0.00
> Ops NUMA interleave hit                    0.00           0.00
> Ops NUMA alloc local              1082415441.00  1088025974.00
> Ops NUMA base-page range updates       33475.00      228900.00
> Ops NUMA PTE updates                   33475.00      228900.00
> Ops NUMA PMD updates                       0.00           0.00
> Ops NUMA hint faults                   15758.00      222100.00
> Ops NUMA hint local faults %           15371.00      214570.00
> Ops NUMA hint local percent               97.54          96.61
> Ops NUMA pages migrated                  235.00        4029.00
> Ops AutoNUMA cost                         79.03        1112.18
> 

Hackbench processes are generally short-lived enough that NUMA balancing
has a marginal impact. It is interesting, though, that PTE updates and
hint faults increased by a lot, relatively speaking.

> tbench
> ------
> tbench4
>                               tbench                 tbench
>                              default                    pan
> Hmean     1        436.89 (   0.00%)      432.73 *  -0.95%*
> Hmean     2        834.27 (   0.00%)      848.11 *   1.66%*
> Hmean     4       1629.50 (   0.00%)     1614.22 *  -0.94%*
> Hmean     8       2944.06 (   0.00%)     3031.66 *   2.98%*
> Hmean     16      5418.25 (   0.00%)     5674.74 *   4.73%*
> Hmean     32      9959.60 (   0.00%)     9009.82 *  -9.54%*
> Hmean     64     13999.14 (   0.00%)    12160.51 * -13.13%*
> Hmean     128    16797.09 (   0.00%)    16506.14 *  -1.73%*
> Hmean     256    25344.27 (   0.00%)    25683.66 *   1.34%*
> Hmean     512    25289.03 (   0.00%)    25513.77 *   0.89%*
> BHmean-50 1        437.13 (   0.00%)      433.01 (  -0.94%)
> BHmean-50 2        836.35 (   0.00%)      848.85 (   1.49%)
> BHmean-50 4       1631.39 (   0.00%)     1618.43 (  -0.79%)
> BHmean-50 8       2948.25 (   0.00%)     3037.86 (   3.04%)
> BHmean-50 16      5425.17 (   0.00%)     5684.25 (   4.78%)
> BHmean-50 32      9969.17 (   0.00%)     9034.06 (  -9.38%)
> BHmean-50 64     14013.93 (   0.00%)    12202.07 ( -12.93%)
> BHmean-50 128    16881.94 (   0.00%)    16571.27 (  -1.84%)
> BHmean-50 256    25379.59 (   0.00%)    25819.18 (   1.73%)
> BHmean-50 512    25435.41 (   0.00%)    25718.02 (   1.11%)
> BHmean-95 1        436.92 (   0.00%)      432.81 (  -0.94%)
> BHmean-95 2        834.59 (   0.00%)      848.23 (   1.63%)
> BHmean-95 4       1629.73 (   0.00%)     1614.83 (  -0.91%)
> BHmean-95 8       2945.02 (   0.00%)     3032.19 (   2.96%)
> BHmean-95 16      5418.86 (   0.00%)     5675.91 (   4.74%)
> BHmean-95 32      9962.57 (   0.00%)     9014.17 (  -9.52%)
> BHmean-95 64     14002.44 (   0.00%)    12164.32 ( -13.13%)
> BHmean-95 128    16820.56 (   0.00%)    16522.82 (  -1.77%)
> BHmean-95 256    25347.34 (   0.00%)    25692.56 (   1.36%)
> BHmean-95 512    25302.10 (   0.00%)    25528.52 (   0.89%)
> BHmean-99 1        436.90 (   0.00%)      432.75 (  -0.95%)
> BHmean-99 2        834.35 (   0.00%)      848.17 (   1.66%)
> BHmean-99 4       1629.57 (   0.00%)     1614.38 (  -0.93%)
> BHmean-99 8       2944.36 (   0.00%)     3031.77 (   2.97%)
> BHmean-99 16      5418.40 (   0.00%)     5675.01 (   4.74%)
> BHmean-99 32      9961.01 (   0.00%)     9011.43 (  -9.53%)
> BHmean-99 64     14000.68 (   0.00%)    12161.34 ( -13.14%)
> BHmean-99 128    16803.44 (   0.00%)    16511.94 (  -1.73%)
> BHmean-99 256    25344.93 (   0.00%)    25685.57 (   1.34%)
> BHmean-99 512    25291.87 (   0.00%)    25516.94 (   0.89%)
> 
>                       tbench      tbench
>                      default         pan
> Duration User        8482.50     8289.35
> Duration System     49462.63    49364.56
> Duration Elapsed     2217.10     2217.08
> 
>                                          tbench         tbench
>                                         default            pan
> Ops NUMA alloc hit                 388738400.00   378941469.00
> Ops NUMA alloc miss                        0.00           0.00
> Ops NUMA interleave hit                    0.00           0.00
> Ops NUMA alloc local               388738391.00   378941455.00
> Ops NUMA base-page range updates      266760.00      266275.00
> Ops NUMA PTE updates                  266760.00      266275.00
> Ops NUMA PMD updates                       0.00           0.00
> Ops NUMA hint faults                  241547.00      257790.00
> Ops NUMA hint local faults %          145814.00      126410.00
> Ops NUMA hint local percent               60.37          49.04
> Ops NUMA pages migrated                51535.00       66083.00
> Ops AutoNUMA cost                       1210.58        1292.07
> 

Not much change.

> dbench
> ------
> dbench4 Latency
>                                        dbench                 dbench
>                                       default                    pan
> Amean     latency-1           2.02 (   0.00%)        2.05 *  -1.52%*
> Amean     latency-2           2.60 (   0.00%)        2.55 *   1.64%*
> Amean     latency-4           3.52 (   0.00%)        3.56 *  -1.17%*
> Amean     latency-8          12.79 (   0.00%)       11.83 *   7.49%*
> Amean     latency-16         23.33 (   0.00%)       19.09 *  18.19%*
> Amean     latency-32         19.30 (   0.00%)       18.83 *   2.43%*
> Amean     latency-64         25.32 (   0.00%)       24.30 *   4.00%*
> Amean     latency-128        45.25 (   0.00%)       42.93 *   5.13%*
> Amean     latency-1024        0.00 (   0.00%)        0.00 *   0.00%*
> BAmean-50 latency-1           1.65 (   0.00%)        1.74 (  -5.16%)
> BAmean-50 latency-2           2.10 (   0.00%)        2.10 (  -0.13%)
> BAmean-50 latency-4           2.65 (   0.00%)        2.71 (  -2.28%)
> BAmean-50 latency-8           6.21 (   0.00%)        4.64 (  25.30%)
> BAmean-50 latency-16         17.64 (   0.00%)       14.08 (  20.16%)
> BAmean-50 latency-32         15.58 (   0.00%)       15.90 (  -2.07%)
> BAmean-50 latency-64         20.76 (   0.00%)       20.31 (   2.15%)
> BAmean-50 latency-128        36.22 (   0.00%)       34.85 (   3.80%)
> BAmean-50 latency-1024        0.00 (   0.00%)        0.00 (   0.00%)
> BAmean-95 latency-1           1.88 (   0.00%)        1.94 (  -3.17%)
> BAmean-95 latency-2           2.25 (   0.00%)        2.26 (  -0.26%)
> BAmean-95 latency-4           3.00 (   0.00%)        3.08 (  -2.71%)
> BAmean-95 latency-8          11.66 (   0.00%)       10.03 (  13.97%)
> BAmean-95 latency-16         22.30 (   0.00%)       17.68 (  20.73%)
> BAmean-95 latency-32         17.95 (   0.00%)       17.70 (   1.38%)
> BAmean-95 latency-64         23.57 (   0.00%)       22.72 (   3.62%)
> BAmean-95 latency-128        42.44 (   0.00%)       39.96 (   5.84%)
> BAmean-95 latency-1024        0.00 (   0.00%)        0.00 (   0.00%)
> BAmean-99 latency-1           1.90 (   0.00%)        1.96 (  -3.30%)
> BAmean-99 latency-2           2.38 (   0.00%)        2.37 (   0.48%)
> BAmean-99 latency-4           3.24 (   0.00%)        3.34 (  -3.26%)
> BAmean-99 latency-8          12.34 (   0.00%)       10.71 (  13.27%)
> BAmean-99 latency-16         22.79 (   0.00%)       18.27 (  19.82%)
> BAmean-99 latency-32         18.68 (   0.00%)       18.32 (   1.93%)
> BAmean-99 latency-64         24.69 (   0.00%)       23.69 (   4.06%)
> BAmean-99 latency-128        44.44 (   0.00%)       42.15 (   5.17%)
> BAmean-99 latency-1024        0.00 (   0.00%)        0.00 (   0.00%)
> 
> dbench4 Throughput (misleading but traditional)
>                                dbench                 dbench
>                               default                    pan
> Hmean     1         505.12 (   0.00%)      492.96 *  -2.41%*
> Hmean     2         824.14 (   0.00%)      824.06 *  -0.01%*
> Hmean     4        1174.61 (   0.00%)     1207.86 *   2.83%*
> Hmean     8        1665.10 (   0.00%)     1667.27 *   0.13%*
> Hmean     16       2215.59 (   0.00%)     2160.93 *  -2.47%*
> Hmean     32       2727.05 (   0.00%)     2633.26 *  -3.44%*
> Hmean     64       3128.64 (   0.00%)     3098.73 *  -0.96%*
> Hmean     128      3282.89 (   0.00%)     3340.26 *   1.75%*
> Hmean     1024     2551.02 (   0.00%)     2559.41 *   0.33%*
> BHmean-50 1         509.87 (   0.00%)      495.10 (  -2.90%)
> BHmean-50 2         829.35 (   0.00%)      828.14 (  -0.15%)
> BHmean-50 4        1182.38 (   0.00%)     1219.30 (   3.12%)
> BHmean-50 8        1678.49 (   0.00%)     1678.83 (   0.02%)
> BHmean-50 16       2251.01 (   0.00%)     2194.52 (  -2.51%)
> BHmean-50 32       2751.39 (   0.00%)     2678.45 (  -2.65%)
> BHmean-50 64       3189.69 (   0.00%)     3154.45 (  -1.10%)
> BHmean-50 128      3396.18 (   0.00%)     3451.59 (   1.63%)
> BHmean-50 1024     2836.80 (   0.00%)     2836.84 (   0.00%)
> BHmean-95 1         506.13 (   0.00%)      493.24 (  -2.55%)
> BHmean-95 2         824.84 (   0.00%)      824.30 (  -0.06%)
> BHmean-95 4        1175.91 (   0.00%)     1208.57 (   2.78%)
> BHmean-95 8        1666.46 (   0.00%)     1668.22 (   0.11%)
> BHmean-95 16       2219.59 (   0.00%)     2163.86 (  -2.51%)
> BHmean-95 32       2731.26 (   0.00%)     2640.34 (  -3.33%)
> BHmean-95 64       3144.73 (   0.00%)     3108.59 (  -1.15%)
> BHmean-95 128      3306.51 (   0.00%)     3363.33 (   1.72%)
> BHmean-95 1024     2658.37 (   0.00%)     2668.88 (   0.40%)
> BHmean-99 1         505.37 (   0.00%)      493.08 (  -2.43%)
> BHmean-99 2         824.31 (   0.00%)      824.12 (  -0.02%)
> BHmean-99 4        1174.94 (   0.00%)     1208.02 (   2.81%)
> BHmean-99 8        1665.40 (   0.00%)     1667.48 (   0.12%)
> BHmean-99 16       2216.51 (   0.00%)     2161.60 (  -2.48%)
> BHmean-99 32       2728.09 (   0.00%)     2635.09 (  -3.41%)
> BHmean-99 64       3135.81 (   0.00%)     3102.12 (  -1.07%)
> BHmean-99 128      3291.11 (   0.00%)     3349.16 (   1.76%)
> BHmean-99 1024     2645.54 (   0.00%)     2655.67 (   0.38%)
> 
> 
>                       dbench      dbench
>                      default         pan
> Duration User         822.55      827.85
> Duration System      8384.99     8164.83
> Duration Elapsed     1671.36     1670.74
> 
>                                          dbench         dbench
>                                         default            pan
> Ops NUMA alloc hit                 183324626.00   182350114.00
> Ops NUMA alloc miss                        0.00           0.00
> Ops NUMA interleave hit                    0.00           0.00
> Ops NUMA alloc local               183324508.00   182350004.00
> Ops NUMA base-page range updates      181531.00      515929.00
> Ops NUMA PTE updates                  181531.00      515929.00
> Ops NUMA PMD updates                       0.00           0.00
> Ops NUMA hint faults                  162742.00      510979.00
> Ops NUMA hint local faults %          120309.00      426848.00
> Ops NUMA hint local percent               73.93          83.54
> Ops NUMA pages migrated                37605.00       59519.00
> Ops AutoNUMA cost                        815.70        2559.64
> 

More hinting faults and migrations.

> Netperf-RR
> ----------
> netperf-udp-rr
>                            netperf                netperf
>                         rr-default                 rr-pan
> Min       1   104915.69 (   0.00%)   104505.71 (  -0.39%)
> Hmean     1   105865.46 (   0.00%)   105899.22 *   0.03%*
> Stddev    1      528.45 (   0.00%)      881.92 ( -66.89%)
> CoeffVar  1        0.50 (   0.00%)        0.83 ( -66.83%)
> Max       1   106410.28 (   0.00%)   107196.52 (   0.74%)
> BHmean-50 1   106232.53 (   0.00%)   106568.26 (   0.32%)
> BHmean-95 1   105972.05 (   0.00%)   106056.35 (   0.08%)
> BHmean-99 1   105972.05 (   0.00%)   106056.35 (   0.08%)
> 
>                      netperf     netperf
>                   rr-default      rr-pan
> Duration User          11.20       10.74
> Duration System       202.40      201.32
> Duration Elapsed      303.09      303.08
> 
>                                         netperf        netperf
>                                      rr-default         rr-pan
> Ops NUMA alloc hit                    183999.00      183853.00
> Ops NUMA alloc miss                        0.00           0.00
> Ops NUMA interleave hit                    0.00           0.00
> Ops NUMA alloc local                  183999.00      183853.00
> Ops NUMA base-page range updates           0.00       24370.00
> Ops NUMA PTE updates                       0.00       24370.00
> Ops NUMA PMD updates                       0.00           0.00
> Ops NUMA hint faults                     539.00       24470.00
> Ops NUMA hint local faults %             539.00       24447.00
> Ops NUMA hint local percent              100.00          99.91
> Ops NUMA pages migrated                    0.00          23.00
> Ops AutoNUMA cost                          2.69         122.52
> 

Netperf these days usually runs on the same node so NUMA balancing
triggers very rarely.

> netperf-tcp-rr
>                            netperf                netperf
>                         rr-default                 rr-pan
> Min       1    96156.03 (   0.00%)    96556.87 (   0.42%)
> Hmean     1    96627.24 (   0.00%)    97551.38 *   0.96%*
> Stddev    1      284.71 (   0.00%)      637.74 (-123.99%)
> CoeffVar  1        0.29 (   0.00%)        0.65 (-121.87%)
> Max       1    96974.45 (   0.00%)    98554.94 (   1.63%)
> BHmean-50 1    96840.81 (   0.00%)    98067.19 (   1.27%)
> BHmean-95 1    96679.89 (   0.00%)    97663.14 (   1.02%)
> BHmean-99 1    96679.89 (   0.00%)    97663.14 (   1.02%)
> 
>                      netperf     netperf
>                   rr-default      rr-pan
> Duration User          10.21       10.26
> Duration System       207.90      208.28
> Duration Elapsed      302.99      303.02
> 
>                                         netperf        netperf
>                                      rr-default         rr-pan
> Ops NUMA alloc hit                    183669.00      183695.00
> Ops NUMA alloc miss                        0.00           0.00
> Ops NUMA interleave hit                    0.00           0.00
> Ops NUMA alloc local                  183657.00      183695.00
> Ops NUMA base-page range updates        3949.00       38561.00
> Ops NUMA PTE updates                    3949.00       38561.00
> Ops NUMA PMD updates                       0.00           0.00
> Ops NUMA hint faults                    4186.00       43328.00
> Ops NUMA hint local faults %            4100.00       43195.00
> Ops NUMA hint local percent               97.95          99.69
> Ops NUMA pages migrated                    9.00          73.00
> Ops AutoNUMA cost                         20.96         216.91
> 

Same.

> Autonumabench
> -------------
> autonumabench
>                                            autonumabench          autonumabench
>                                                  default                    pan
> Amean     syst-NUMA01                11664.40 (   0.00%)    11616.17 *   0.41%*
> Amean     syst-NUMA01_THREADLOCAL        0.24 (   0.00%)        0.22 *   7.78%*
> Amean     syst-NUMA02                    1.55 (   0.00%)        9.31 *-499.26%*
> Amean     syst-NUMA02_SMT                1.14 (   0.00%)        4.04 *-254.39%*
> Amean     elsp-NUMA01                  223.52 (   0.00%)      221.43 *   0.93%*
> Amean     elsp-NUMA01_THREADLOCAL        0.95 (   0.00%)        0.94 *   0.76%*
> Amean     elsp-NUMA02                    6.83 (   0.00%)        5.74 *  15.90%*
> Amean     elsp-NUMA02_SMT                6.65 (   0.00%)        6.25 *   5.97%*
> BAmean-50 syst-NUMA01                11455.44 (   0.00%)    10985.76 (   4.10%)
> BAmean-50 syst-NUMA01_THREADLOCAL        0.22 (   0.00%)        0.21 (   7.46%)
> BAmean-50 syst-NUMA02                    1.11 (   0.00%)        8.91 (-703.00%)
> BAmean-50 syst-NUMA02_SMT                0.94 (   0.00%)        3.42 (-262.19%)
> BAmean-50 elsp-NUMA01                  217.38 (   0.00%)      214.03 (   1.54%)
> BAmean-50 elsp-NUMA01_THREADLOCAL        0.94 (   0.00%)        0.94 (   0.35%)
> BAmean-50 elsp-NUMA02                    6.66 (   0.00%)        5.45 (  18.08%)
> BAmean-50 elsp-NUMA02_SMT                6.50 (   0.00%)        6.09 (   6.31%)
> BAmean-95 syst-NUMA01                11611.74 (   0.00%)    11448.30 (   1.41%)
> BAmean-95 syst-NUMA01_THREADLOCAL        0.23 (   0.00%)        0.22 (   7.14%)
> BAmean-95 syst-NUMA02                    1.27 (   0.00%)        9.21 (-624.93%)
> BAmean-95 syst-NUMA02_SMT                0.97 (   0.00%)        3.90 (-300.34%)
> BAmean-95 elsp-NUMA01                  221.75 (   0.00%)      218.53 (   1.45%)
> BAmean-95 elsp-NUMA01_THREADLOCAL        0.94 (   0.00%)        0.94 (   0.53%)
> BAmean-95 elsp-NUMA02                    6.75 (   0.00%)        5.68 (  15.81%)
> BAmean-95 elsp-NUMA02_SMT                6.61 (   0.00%)        6.23 (   5.82%)
> BAmean-99 syst-NUMA01                11611.74 (   0.00%)    11448.30 (   1.41%)
> BAmean-99 syst-NUMA01_THREADLOCAL        0.23 (   0.00%)        0.22 (   7.14%)
> BAmean-99 syst-NUMA02                    1.27 (   0.00%)        9.21 (-624.93%)
> BAmean-99 syst-NUMA02_SMT                0.97 (   0.00%)        3.90 (-300.34%)
> BAmean-99 elsp-NUMA01                  221.75 (   0.00%)      218.53 (   1.45%)
> BAmean-99 elsp-NUMA01_THREADLOCAL        0.94 (   0.00%)        0.94 (   0.53%)
> BAmean-99 elsp-NUMA02                    6.75 (   0.00%)        5.68 (  15.81%)
> BAmean-99 elsp-NUMA02_SMT                6.61 (   0.00%)        6.23 (   5.82%)
> 
>               autonumabench autonumabench
>                      default         pan
> Duration User       94363.43    94436.71
> Duration System     81671.72    81408.53
> Duration Elapsed     1676.81     1647.99
> 
>                                   autonumabench  autonumabench
>                                         default            pan
> Ops NUMA alloc hit                 539544115.00   539522029.00
> Ops NUMA alloc miss                        0.00           0.00
> Ops NUMA interleave hit                    0.00           0.00
> Ops NUMA alloc local               279025768.00   281735736.00
> Ops NUMA base-page range updates    69695169.00    84767502.00
> Ops NUMA PTE updates                69695169.00    84767502.00
> Ops NUMA PMD updates                       0.00           0.00
> Ops NUMA hint faults                69691818.00    87895044.00
> Ops NUMA hint local faults %        56565519.00    65819747.00
> Ops NUMA hint local percent               81.17          74.88
> Ops NUMA pages migrated              5950362.00     8310169.00
> Ops AutoNUMA cost                     349060.01      440226.49
> 

More hinting faults and migrations. Not clear which sub-test exactly but
most likely NUMA02.
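
The "hint local percent" row in these tables is consistent with local hint
faults divided by total hint faults. A small sketch, assuming that
definition (the function name is illustrative and not taken from MMTests):

```c
/*
 * Share of NUMA hint faults that were node-local, in percent.
 * Returns 0.0 when no hint faults were recorded.
 */
static double hint_local_percent(unsigned long long local_faults,
				 unsigned long long hint_faults)
{
	if (hint_faults == 0)
		return 0.0;
	return 100.0 * (double)local_faults / (double)hint_faults;
}
```

With the autonumabench counters above, hint_local_percent(56565519,
69691818) and hint_local_percent(65819747, 87895044) reproduce the 81.17
and 74.88 shown in the table. So although the absolute number of local
hint faults went up with pan, the remote share grew as well.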

-- 
Mel Gorman
SUSE Labs
Re: [RFC PATCH v0 0/3] sched/numa: Process Adaptive autoNUMA
Posted by Bharata B Rao 4 years, 4 months ago
On 1/31/2022 5:47 PM, Mel Gorman wrote:
> On Fri, Jan 28, 2022 at 10:58:48AM +0530, Bharata B Rao wrote:
>> Hi,
>>
>> This patchset implements an adaptive algorithm for calculating the autonuma
>> scan period.
> 
> autonuma refers to the khugepaged-like approach to NUMA balancing that
> was later superseded by NUMA Balancing (NUMAB) and is generally reflected
> by the naming e.g. git grep -i autonuma and note how few references there
> are to autonuma versus numab or "NUMA balancing". I know MMTests still
> refers to AutoNUMA but mostly because at the time it was written,
> autoNUMA was what was being evaluated and I never updated the naming.

Thanks, noted. I will use the appropriate terminology from now on.

> 
>> In the existing mechanism of scan period calculation,
>>
>> - scan period is derived from the per-thread stats.
>> - static threshold (NUMA_PERIOD_THRESHOLD) is used for changing the
>>   scan rate.
>>
>> In this new approach (Process Adaptive autoNUMA or PAN), we gather NUMA
>> fault stats at per-process level which allows for capturing the application
>> behaviour better. In addition, the algorithm learns and adjusts the scan
>> rate based on remote fault rate. By not sticking to a static threshold, the
>> algorithm can respond better to different workload behaviours.
>>
> 
> NUMA Balancing is concerned with threads (task) and an address space (mm)
> so basing the naming on Address Space rather than process may be more
> appropriate although I admit the acronym is not as snappy.

Sure, will think about more appropriate naming.

> 
>> Since the threads of a process are already considered as a group,
>> we add a bunch of metrics to the task's mm to track the various
>> types of faults and derive the scan rate from them.
>>
> 
> Enumerate the types of faults and note how the per-thread and
> per-address-space metrics are related.

Sure, I will list the types of faults and describe them.

Per-address-space metrics are essentially an aggregate of the existing
per-thread metrics. Unlike the existing task_numa_group mechanism, the
threads are implicitly/already considered part of the address space group
(p->mm).
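
As a rough illustration of that relationship, here is a minimal userspace
sketch in which every fault updates both a private per-thread counter and
a counter shared through the address space. All names (fault_type,
mm_fault_stats, task_fault_stats, record_fault, mm_remote_percent) and the
local/remote split are assumptions made for this example; they are not
the fields or functions added by the patches.

```c
#include <stdatomic.h>

/* Hypothetical fault classes; the patchset's actual set may differ. */
enum fault_type { FAULT_LOCAL, FAULT_REMOTE, FAULT_TYPES };

/* Stand-in for per-address-space counters, shared by all threads. */
struct mm_fault_stats {
	atomic_ulong faults[FAULT_TYPES];
};

/* Stand-in for a thread: private counters plus a pointer to the shared mm. */
struct task_fault_stats {
	unsigned long faults[FAULT_TYPES];
	struct mm_fault_stats *mm;
};

/* Account one hinting fault covering 'pages' pages. */
static void record_fault(struct task_fault_stats *t, enum fault_type type,
			 unsigned long pages)
{
	t->faults[type] += pages;			/* per-thread stat */
	atomic_fetch_add(&t->mm->faults[type], pages);	/* per-mm aggregate */
}

/* Remote share of all faults seen by the address space, in percent. */
static unsigned long mm_remote_percent(struct mm_fault_stats *mm)
{
	unsigned long local = atomic_load(&mm->faults[FAULT_LOCAL]);
	unsigned long remote = atomic_load(&mm->faults[FAULT_REMOTE]);
	unsigned long total = local + remote;

	return total ? remote * 100 / total : 0;
}
```

No explicit grouping step is needed in this model: two task_fault_stats
that point at the same mm_fault_stats contribute to the same aggregate
simply by sharing it.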

> 
>> The new per-process fault stats contribute only to the per-process
>> scan period calculation, while the existing per-thread stats continue
>> to contribute towards the numa_group stats which eventually
>> determine the thresholds for migrating memory and threads
>> across nodes.
>>
>> This patchset has been tested with a bunch of benchmarks on the
>> following system:
>>
> 
> Please include the comparisons of both the headline metrics and notes on
> the change in scan rates in the changelog of the patch. Not everyone has
> access to Google Drive and it is not guaranteed to remain forever.
> Similarly, the leader is not guaranteed to appear in the git history.

Sure, noted.

> 
>> ------------------------------------------------------
>> % gain of PAN vs default (Avg of 3 runs)
>> ------------------------------------------------------
>> NAS-BT		-0.17
>> NAS-CG		+9.39
>> NAS-MG		+8.19
>> NAS-FT		+2.23
>> Hashjoin	+0.58
>> Graph500	+14.93
>> Pagerank	+0.37
> 
> 
> 
>> ------------------------------------------------------
>> 		Default		PAN		%diff
>> ------------------------------------------------------
>> 		NUMA hint faults(Total of 3 runs)
>> ------------------------------------------------------
>> NAS-BT		758282358	539850429	+29
>> NAS-CG		2179458823	1180301361	+46
>> NAS-MG		517641172	346066391	+33
>> NAS-FT		297044964	230033861	+23
>> Hashjoin	201684863	268436275	-33
>> Graph500	261808733	154338827	+41
>> Pagerank	217917818	211260310	+03
>> ------------------------------------------------------
>> 		Migrations(Total of 3 runs)
>> ------------------------------------------------------
>> NAS-BT		106888517	86482076	+19
>> NAS-CG		81191368	12859924	+84
>> NAS-MG		83927451	39651254	+53
>> NAS-FT		61807715	38934618	+37
>> Hashjoin	45406983	59828843	-32
>> Graph500	22798837	21560714	+05
>> Pagerank	59072135	44968673	+24
>> ------------------------------------------------------
>>
>> And here are some tests from a few microbenchmarks of mmtests suite.
>> (The results are trimmed a bit here, the complete results can
>> be viewed in the above mentioned link)
>>
>> Hackbench
>> ---------
>> hackbench-process-pipes
>>                            hackbench              hackbench
>>                              default                    pan
>> Min       256     23.5510 (   0.00%)     23.1900 (   1.53%)
>> Amean     256     24.4604 (   0.00%)     24.0353 *   1.74%*
>> Stddev    256      0.4420 (   0.00%)      0.7611 ( -72.18%)
>> CoeffVar  256      1.8072 (   0.00%)      3.1666 ( -75.22%)
>> Max       256     25.4930 (   0.00%)     30.5450 ( -19.82%)
>> BAmean-50 256     24.1074 (   0.00%)     23.6616 (   1.85%)
>> BAmean-95 256     24.4111 (   0.00%)     23.9308 (   1.97%)
>> BAmean-99 256     24.4499 (   0.00%)     23.9696 (   1.96%)
>>
>>                    hackbench   hackbench
>>                      default         pan
>> Duration User       25810.02    25158.93
>> Duration System    276322.70   271729.32
>> Duration Elapsed     2707.75     2671.33
>>
> 
>>                                       hackbench      hackbench
>>                                         default            pan
>> Ops NUMA alloc hit                1082415453.00  1088025994.00
>> Ops NUMA alloc miss                        0.00           0.00
>> Ops NUMA interleave hit                    0.00           0.00
>> Ops NUMA alloc local              1082415441.00  1088025974.00
>> Ops NUMA base-page range updates       33475.00      228900.00
>> Ops NUMA PTE updates                   33475.00      228900.00
>> Ops NUMA PMD updates                       0.00           0.00
>> Ops NUMA hint faults                   15758.00      222100.00
>> Ops NUMA hint local faults %           15371.00      214570.00
>> Ops NUMA hint local percent               97.54          96.61
>> Ops NUMA pages migrated                  235.00        4029.00
>> Ops AutoNUMA cost                         79.03        1112.18
>>
> 
> Hackbench processes are generally short-lived enough that NUMA balancing
> has a marginal impact. Interesting though that updates and hints were
> increased by a lot relatively speaking.

Yes. The increased AutoNUMA cost shows up mostly with these microbenchmarks
and is typically not seen with the other benchmarks listed at the beginning,
which we believe contributes to the gain that those benchmarks see.

The algorithm tries aggressively to learn the application behaviour
at the beginning and short-lived tasks will see more scanning than
default.
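
A toy model of why a fast-start policy penalises short-lived tasks is
sketched below. It is NOT the algorithm from the patches: the doubling and
halving rule, the bounds and all names (scan_state, scan_state_init,
scan_state_update, SCAN_PERIOD_MIN_MS, SCAN_PERIOD_MAX_MS) are assumptions
chosen only to illustrate the effect.

```c
/* Illustrative bounds for the scan period, in milliseconds. */
#define SCAN_PERIOD_MIN_MS	 1000U
#define SCAN_PERIOD_MAX_MS	60000U

struct scan_state {
	unsigned int period_ms;		/* current scan period */
	unsigned int prev_remote_pct;	/* remote fault % of previous window */
};

/* Start at the shortest period, i.e. scan as fast as allowed. */
static void scan_state_init(struct scan_state *s)
{
	s->period_ms = SCAN_PERIOD_MIN_MS;
	s->prev_remote_pct = 100;
}

/* Called once per window with the remote fault percentage just observed. */
static void scan_state_update(struct scan_state *s, unsigned int remote_pct)
{
	if (remote_pct < s->prev_remote_pct)
		s->period_ms *= 2;	/* locality improving: back off */
	else if (remote_pct > s->prev_remote_pct)
		s->period_ms /= 2;	/* locality worsening: scan faster */

	if (s->period_ms < SCAN_PERIOD_MIN_MS)
		s->period_ms = SCAN_PERIOD_MIN_MS;
	if (s->period_ms > SCAN_PERIOD_MAX_MS)
		s->period_ms = SCAN_PERIOD_MAX_MS;

	s->prev_remote_pct = remote_pct;
}
```

In this model a task only reaches a long scan period after several windows
of improving locality. A task that exits within the first window or two
spends its whole life at or near the minimum period, which is one way to
end up with more PTE updates and hint faults than a policy that starts
slower.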

Having said that, we need to investigate and check why some of these
micro benchmarks incur higher autonuma cost.

> 
>> Netperf-RR
>> ----------
>> netperf-udp-rr
>>                            netperf                netperf
>>                         rr-default                 rr-pan
>> Min       1   104915.69 (   0.00%)   104505.71 (  -0.39%)
>> Hmean     1   105865.46 (   0.00%)   105899.22 *   0.03%*
>> Stddev    1      528.45 (   0.00%)      881.92 ( -66.89%)
>> CoeffVar  1        0.50 (   0.00%)        0.83 ( -66.83%)
>> Max       1   106410.28 (   0.00%)   107196.52 (   0.74%)
>> BHmean-50 1   106232.53 (   0.00%)   106568.26 (   0.32%)
>> BHmean-95 1   105972.05 (   0.00%)   106056.35 (   0.08%)
>> BHmean-99 1   105972.05 (   0.00%)   106056.35 (   0.08%)
>>
>>                      netperf     netperf
>>                   rr-default      rr-pan
>> Duration User          11.20       10.74
>> Duration System       202.40      201.32
>> Duration Elapsed      303.09      303.08
>>
>>                                         netperf        netperf
>>                                      rr-default         rr-pan
>> Ops NUMA alloc hit                    183999.00      183853.00
>> Ops NUMA alloc miss                        0.00           0.00
>> Ops NUMA interleave hit                    0.00           0.00
>> Ops NUMA alloc local                  183999.00      183853.00
>> Ops NUMA base-page range updates           0.00       24370.00
>> Ops NUMA PTE updates                       0.00       24370.00
>> Ops NUMA PMD updates                       0.00           0.00
>> Ops NUMA hint faults                     539.00       24470.00
>> Ops NUMA hint local faults %             539.00       24447.00
>> Ops NUMA hint local percent              100.00          99.91
>> Ops NUMA pages migrated                    0.00          23.00
>> Ops AutoNUMA cost                          2.69         122.52
>>
> 
> Netperf these days usually runs on the same node so NUMA balancing
> triggers very rarely.

But we still see an increase in hint faults; this needs investigation.


>>               autonumabench autonumabench
>>                      default         pan
>> Duration User       94363.43    94436.71
>> Duration System     81671.72    81408.53
>> Duration Elapsed     1676.81     1647.99
>>
>>                                   autonumabench  autonumabench
>>                                         default            pan
>> Ops NUMA alloc hit                 539544115.00   539522029.00
>> Ops NUMA alloc miss                        0.00           0.00
>> Ops NUMA interleave hit                    0.00           0.00
>> Ops NUMA alloc local               279025768.00   281735736.00
>> Ops NUMA base-page range updates    69695169.00    84767502.00
>> Ops NUMA PTE updates                69695169.00    84767502.00
>> Ops NUMA PMD updates                       0.00           0.00
>> Ops NUMA hint faults                69691818.00    87895044.00
>> Ops NUMA hint local faults %        56565519.00    65819747.00
>> Ops NUMA hint local percent               81.17          74.88
>> Ops NUMA pages migrated              5950362.00     8310169.00
>> Ops AutoNUMA cost                     349060.01      440226.49
>>
> 
> More hinting faults and migrations. Not clear which sub-test exactly but
> most likely NUMA02.

I will have to run them separately and check.

Regards,
Bharata.