RE: [PATCH v3 0/8] x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems

Shaopeng Tan (Fujitsu) posted 8 patches 2 years, 6 months ago
RE: [PATCH v3 0/8] x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems
Posted by Shaopeng Tan (Fujitsu) 2 years, 6 months ago
Hi Tony,

I ran selftests/resctrl in my environment,
and the test result is "not ok".

Processor in my environment:
Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz

kernel:
$ uname -r
6.5.0-rc1+

Result:
Sub-NUMA enabled:
xxx@xxx:~/linux_v6.5_rc1l$ sudo make -C tools/testing/selftests/resctrl run_tests
make: Entering directory '/.../tools/testing/selftests/resctrl'
TAP version 13
1..1
# timeout set to 120
# selftests: resctrl: resctrl_tests
# TAP version 13
# # Pass: Check kernel supports resctrl filesystem
# # Pass: Check resctrl mountpoint "/sys/fs/resctrl" exists
# # resctrl filesystem not mounted
# # dmesg: [    3.060018] resctrl: L3 allocation detected
# # dmesg: [    3.098180] resctrl: MB allocation detected
# # dmesg: [    3.118507] resctrl: L3 monitoring detected
# 1..4
# # Starting MBM BW change ...
# # Mounting resctrl to "/sys/fs/resctrl"
# # Mounting resctrl to "/sys/fs/resctrl"
# # Benchmark PID: 14784
# # Writing benchmark parameters to resctrl FS
# # Write schema "MB:0=100" to resctrl FS
# # Checking for pass/fail
# # Fail: Check MBM diff within 5%
# # avg_diff_per: 100%
# # Span (MB): 250
# # avg_bw_imc: 14185
# # avg_bw_resc: 28389
# not ok 1 MBM: bw change
# # Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.
# # Starting MBA Schemata change ...
# # Mounting resctrl to "/sys/fs/resctrl"
# # Mounting resctrl to "/sys/fs/resctrl"
# # Benchmark PID: 14787
# # Writing benchmark parameters to resctrl FS
# # Write schema "MB:0=100" to resctrl FS
# # Write schema "MB:0=90" to resctrl FS
# # Write schema "MB:0=80" to resctrl FS
# # Write schema "MB:0=70" to resctrl FS
# # Write schema "MB:0=60" to resctrl FS
# # Write schema "MB:0=50" to resctrl FS
# # Write schema "MB:0=40" to resctrl FS
# # Write schema "MB:0=30" to resctrl FS
# # Write schema "MB:0=20" to resctrl FS
# # Write schema "MB:0=10" to resctrl FS
# # Results are displayed in (MB)
# # Fail: Check MBA diff within 5% for schemata 100
# # avg_diff_per: 99%
# # avg_bw_imc: 14179
# # avg_bw_resc: 28340
# # Fail: Check MBA diff within 5% for schemata 90
# # avg_diff_per: 100%
# # avg_bw_imc: 9244
# # avg_bw_resc: 18497
# # Fail: Check MBA diff within 5% for schemata 80
# # avg_diff_per: 100%
# # avg_bw_imc: 9249
# # avg_bw_resc: 18504
# # Fail: Check MBA diff within 5% for schemata 70
# # avg_diff_per: 100%
# # avg_bw_imc: 9250
# # avg_bw_resc: 18506
# # Fail: Check MBA diff within 5% for schemata 60
# # avg_diff_per: 100%
# # avg_bw_imc: 7521
# # avg_bw_resc: 15055
# # Fail: Check MBA diff within 5% for schemata 50
# # avg_diff_per: 100%
# # avg_bw_imc: 7455
# # avg_bw_resc: 14917
# # Fail: Check MBA diff within 5% for schemata 40
# # avg_diff_per: 100%
# # avg_bw_imc: 5962
# # avg_bw_resc: 11934
# # Fail: Check MBA diff within 5% for schemata 30
# # avg_diff_per: 100%
# # avg_bw_imc: 4208
# # avg_bw_resc: 8436
# # Fail: Check MBA diff within 5% for schemata 20
# # avg_diff_per: 98%
# # avg_bw_imc: 2972
# # avg_bw_resc: 5909
# # Fail: Check MBA diff within 5% for schemata 10
# # avg_diff_per: 99%
# # avg_bw_imc: 1715
# # avg_bw_resc: 3426
# # Fail: Check schemata change using MBA
# # At least one test failed
# not ok 2 MBA: schemata change
# # Starting CMT test ...
# # Mounting resctrl to "/sys/fs/resctrl"
# # Mounting resctrl to "/sys/fs/resctrl"
# # Cache size :6488064
# # Benchmark PID: 14793
# # Writing benchmark parameters to resctrl FS
# # Checking for pass/fail
# # Fail: Check cache miss rate within 15%
# # Percent diff=91
# # Number of bits: 5
# # Average LLC val: 5640192
# # Cache span (bytes): 2949120
# not ok 3 CMT: test
# # Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.
# # Starting CAT test ...
# # Mounting resctrl to "/sys/fs/resctrl"
# # Mounting resctrl to "/sys/fs/resctrl"
# # Cache size :6488064
# # Writing benchmark parameters to resctrl FS
# # Write schema "L3:0=3f" to resctrl FS
# # Checking for pass/fail
# # Fail: Check cache miss rate within 4%
# # Percent diff=6
# # Number of bits: 6
# # Average LLC val: 51475
# # Cache span (lines): 55296
# not ok 4 CAT: test
# # Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0
not ok 1 selftests: resctrl: resctrl_tests # exit=1
make: Leaving directory '/...l/tools/testing/selftests/resctrl'

Sub-NUMA disabled:
xxx@xxx:~/linux_v6.5_rc1l$ sudo make -C tools/testing/selftests/resctrl run_tests
...
# # Starting CAT test ...
# # Mounting resctrl to "/sys/fs/resctrl"
# # Mounting resctrl to "/sys/fs/resctrl"
# # Cache size :6488064
# # Writing benchmark parameters to resctrl FS
# # Write schema "L3:0=3f" to resctrl FS
# # Checking for pass/fail
# # Fail: Check cache miss rate within 4%
# # Percent diff=6
# # Number of bits: 6
# # Average LLC val: 51899
# # Cache span (lines): 55296
# not ok 4 CAT: test
# # Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
not ok 1 selftests: resctrl: resctrl_tests # exit=1
make: Leaving directory '/.../tools/testing/selftests/resctrl'

Best regards,
Shaopeng TAN
Re: [PATCH v3 0/8] x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems
Posted by Tony Luck 2 years, 6 months ago
On Wed, Jul 19, 2023 at 02:43:20AM +0000, Shaopeng Tan (Fujitsu) wrote:
> Hi Tony,
> 
> I ran selftests/resctrl in my environment,
> and the test result is "not ok".
> 
> Processor in my environment:
> Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
> 
> kernel:
> $ uname -r
> 6.5.0-rc1+
> 
> Result:
> Sub-NUMA enabled:
> xxx@xxx:~/linux_v6.5_rc1l$ sudo make -C tools/testing/selftests/resctrl run_tests
> make: Entering directory '/.../tools/testing/selftests/resctrl'

Most tests pass for me. There was just one failure on my most recent
run with the v4 patch series:

# # Fail: Check MBA diff within 5% for schemata 10
# # avg_diff_per: 7%
# # avg_bw_imc: 883
# # avg_bw_resc: 815
# # Fail: Check schemata change using MBA

That run only just missed the 5% target, though,
unlike the near-total failures that you see.
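
For reference, the avg_diff_per values in both logs are consistent with
|avg_bw_resc - avg_bw_imc| / avg_bw_imc, truncated to a whole percent.
A minimal sketch of that arithmetic, assuming this is the formula the
selftest uses (the function name diff_per is made up for illustration):

```shell
#!/bin/sh
# diff_per IMC RESC
# Print the difference between the resctrl and iMC bandwidth readings
# as a percentage of the iMC reading, truncated to an integer.
# Assumption: this mirrors how the selftest derives avg_diff_per.
diff_per() {
    imc=$1
    resc=$2
    if [ "$imc" -eq 0 ]; then
        echo "diff_per: iMC value must be non-zero" >&2
        return 1
    fi
    diff=$((resc - imc))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    echo $((diff * 100 / imc))
}

diff_per 14185 28389   # MBM values from the SNC-enabled log: prints 100
diff_per 883 815       # schemata 10 values from the v4 run: prints 7
```

In the SNC-enabled log, avg_bw_resc is roughly twice avg_bw_imc at every
schemata value, which is why each check reports a diff close to 100%.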

I wonder if there is a cross-SNC node memory
allocation issue.  Can you try running the test
bound to a CPU in one node:

$ taskset -c 1 sudo make -C tools/testing/selftests/resctrl run_tests

Try different "-c" arguments to bind to different nodes. Do you
see different results on different nodes?
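
One way to pick the "-c" values is to take the first CPU listed for each
NUMA node in sysfs. The sketch below only prints the commands and does
not run them. It assumes the SNC nodes show up under
/sys/devices/system/node, and the helper names first_cpu and
node_commands are made up for illustration:

```shell
#!/bin/sh
# first_cpu LIST
# Print the first CPU id in a cpulist string such as "0-17,36-53".
first_cpu() {
    list=${1%%,*}        # keep the first comma-separated range
    echo "${list%%-*}"   # keep the start of that range
}

# node_commands [SYSFS_DIR]
# Print one taskset command per NUMA node found under SYSFS_DIR
# (default: /sys/devices/system/node).
node_commands() {
    sysfs=${1:-/sys/devices/system/node}
    for f in "$sysfs"/node*/cpulist; do
        [ -r "$f" ] || continue
        cpus=$(cat "$f")
        [ -n "$cpus" ] || continue   # skip nodes that have no CPUs
        node=${f%/cpulist}
        node=${node##*/}
        echo "# $node (CPUs $cpus)"
        echo "taskset -c $(first_cpu "$cpus") sudo make -C tools/testing/selftests/resctrl run_tests"
    done
}

node_commands "$@"
```

Running each printed command in turn and comparing the avg_bw_imc and
avg_bw_resc lines should show whether the results depend on the node.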

-Tony