[v3] selftests/resctrl: Fixes and improvements focused on Intel platforms

[PATCH v3 08/10] selftests/resctrl: Remove requirement on cache miss rate

Posted by Reinette Chatre 3 weeks, 3 days ago

As the CAT test reads the same buffer into different sized cache portions
it compares the number of cache misses against an expected percentage
based on the size of the cache portion.

Systems and test conditions vary. The CAT test is a test of resctrl
subsystem health and not a test of the hardware architecture, so it need
not constrain how large the difference in cache misses is. It only needs
to verify that the number of cache misses when reading a buffer
increases as the cache portion used for the buffer decreases.

Remove additional constraint on how big the difference between cache
misses should be as the cache portion size changes. Only test that the
cache misses increase as the cache portion size decreases. This remains
a good sanity check of resctrl subsystem health while reducing impact
of hardware architectural differences and the various conditions under
which the test may run.

Increase the size difference between cache portions to additionally avoid
any consequences resulting from smaller increments.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
---
Changes since v2:
- Add Chen Yu's tag.
---
 tools/testing/selftests/resctrl/cat_test.c | 33 ++++------------------
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index f00b622c1460..8bc47f06679a 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -14,42 +14,20 @@
 #define RESULT_FILE_NAME	"result_cat"
 #define NUM_OF_RUNS		5
 
-/*
- * Minimum difference in LLC misses between a test with n+1 bits CBM to the
- * test with n bits is MIN_DIFF_PERCENT_PER_BIT * (n - 1). With e.g. 5 vs 4
- * bits in the CBM mask, the minimum difference must be at least
- * MIN_DIFF_PERCENT_PER_BIT * (4 - 1) = 3 percent.
- *
- * The relationship between number of used CBM bits and difference in LLC
- * misses is not expected to be linear. With a small number of bits, the
- * margin is smaller than with larger number of bits. For selftest purposes,
- * however, linear approach is enough because ultimately only pass/fail
- * decision has to be made and distinction between strong and stronger
- * signal is irrelevant.
- */
-#define MIN_DIFF_PERCENT_PER_BIT	1UL
-
 static int show_results_info(__u64 sum_llc_val, int no_of_bits,
 			     unsigned long cache_span,
-			     unsigned long min_diff_percent,
 			     unsigned long num_of_runs, bool platform,
 			     __s64 *prev_avg_llc_val)
 {
 	__u64 avg_llc_val = 0;
-	float avg_diff;
 	int ret = 0;
 
 	avg_llc_val = sum_llc_val / num_of_runs;
 	if (*prev_avg_llc_val) {
-		float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
-
-		avg_diff = delta / *prev_avg_llc_val;
-		ret = platform && (avg_diff * 100) < (float)min_diff_percent;
-
-		ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
-			       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
+		ret = platform && (avg_llc_val < *prev_avg_llc_val);
 
-		ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
+		ksft_print_msg("%s Check cache miss rate increased\n",
+			       ret ? "Fail:" : "Pass:");
 	}
 	*prev_avg_llc_val = avg_llc_val;
 
@@ -58,10 +36,10 @@ static int show_results_info(__u64 sum_llc_val, int no_of_bits,
 	return ret;
 }
 
-/* Remove the highest bit from CBM */
+/* Remove the highest bits from CBM */
 static unsigned long next_mask(unsigned long current_mask)
 {
-	return current_mask & (current_mask >> 1);
+	return current_mask & (current_mask >> 2);
 }
 
 static int check_results(struct resctrl_val_param *param, const char *cache_type,
@@ -112,7 +90,6 @@ static int check_results(struct resctrl_val_param *param, const char *cache_type
 
 		ret = show_results_info(sum_llc_perf_miss, bits,
 					alloc_size / 64,
-					MIN_DIFF_PERCENT_PER_BIT * (bits - 1),
 					runs, get_vendor() == ARCH_INTEL,
 					&prev_avg_llc_val);
 		if (ret)
-- 
2.50.1
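
For readers following the mask change, here is a minimal sketch of how the
updated next_mask() steps through a contiguous CBM. next_mask() is copied
from the diff above; walk_masks() is a hypothetical helper added only for
illustration and is not part of the selftest. Starting from 0x3f, the masks
visited are 0x3f, 0xf and 0x3, which matches the 6, 4 and 2 bit steps
reported later in this thread.

```c
#include <stdio.h>

/* Same expression as in the patch: drops the two highest bits of a contiguous mask. */
static unsigned long next_mask(unsigned long current_mask)
{
	return current_mask & (current_mask >> 2);
}

/*
 * Hypothetical helper (not in cat_test.c): print every mask visited from
 * start_mask until the mask becomes empty and return how many were visited.
 */
static int walk_masks(unsigned long start_mask)
{
	unsigned long mask;
	int steps = 0;

	for (mask = start_mask; mask; mask = next_mask(mask)) {
		printf("mask=0x%lx\n", mask);
		steps++;
	}

	return steps;
}
```

Calling walk_masks(0x3f) from a small main() visits three masks. With the
previous shift of 1 the same start mask would be visited in single-bit
steps, which is the smaller increment the commit message refers to.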

Re: [PATCH v3 08/10] selftests/resctrl: Remove requirement on cache miss rate

Posted by Ilpo Järvinen 1 week, 3 days ago

On Fri, 13 Mar 2026, Reinette Chatre wrote:

> As the CAT test reads the same buffer into different sized cache portions
> it compares the number of cache misses against an expected percentage
> based on the size of the cache portion.
> 
> Systems and test conditions vary. The CAT test is a test of resctrl
> subsystem health and not a test of the hardware architecture, so it need
> not constrain how large the difference in cache misses is. It only needs
> to verify that the number of cache misses when reading a buffer
> increases as the cache portion used for the buffer decreases.
> 
> Remove additional constraint on how big the difference between cache
> misses should be as the cache portion size changes. Only test that the
> cache misses increase as the cache portion size decreases. This remains
> a good sanity check of resctrl subsystem health while reducing impact
> of hardware architectural differences and the various conditions under
> which the test may run.
> 
> Increase the size difference between cache portions to additionally avoid
> any consequences resulting from smaller increments.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> Tested-by: Chen Yu <yu.c.chen@intel.com>
> ---
> Changes since v2:
> - Add Chen Yu's tag.
> ---
>  tools/testing/selftests/resctrl/cat_test.c | 33 ++++------------------
>  1 file changed, 5 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> index f00b622c1460..8bc47f06679a 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -14,42 +14,20 @@
>  #define RESULT_FILE_NAME	"result_cat"
>  #define NUM_OF_RUNS		5
>  
> -/*
> - * Minimum difference in LLC misses between a test with n+1 bits CBM to the
> - * test with n bits is MIN_DIFF_PERCENT_PER_BIT * (n - 1). With e.g. 5 vs 4
> - * bits in the CBM mask, the minimum difference must be at least
> - * MIN_DIFF_PERCENT_PER_BIT * (4 - 1) = 3 percent.
> - *
> - * The relationship between number of used CBM bits and difference in LLC
> - * misses is not expected to be linear. With a small number of bits, the
> - * margin is smaller than with larger number of bits. For selftest purposes,
> - * however, linear approach is enough because ultimately only pass/fail
> - * decision has to be made and distinction between strong and stronger
> - * signal is irrelevant.
> - */
> -#define MIN_DIFF_PERCENT_PER_BIT	1UL
> -
>  static int show_results_info(__u64 sum_llc_val, int no_of_bits,
>  			     unsigned long cache_span,
> -			     unsigned long min_diff_percent,
>  			     unsigned long num_of_runs, bool platform,
>  			     __s64 *prev_avg_llc_val)
>  {
>  	__u64 avg_llc_val = 0;
> -	float avg_diff;
>  	int ret = 0;
>  
>  	avg_llc_val = sum_llc_val / num_of_runs;
>  	if (*prev_avg_llc_val) {
> -		float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
> -
> -		avg_diff = delta / *prev_avg_llc_val;
> -		ret = platform && (avg_diff * 100) < (float)min_diff_percent;
> -
> -		ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
> -			       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
> +		ret = platform && (avg_llc_val < *prev_avg_llc_val);
>  
> -		ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
> +		ksft_print_msg("%s Check cache miss rate increased\n",
> +			       ret ? "Fail:" : "Pass:");

While I'm fine with removing the amount of change check, this no longer 
shows any numbers which would be a bit annoying if/when there's a failure.

-- 
 i.

>  	}
>  	*prev_avg_llc_val = avg_llc_val;
>  
> @@ -58,10 +36,10 @@ static int show_results_info(__u64 sum_llc_val, int no_of_bits,
>  	return ret;
>  }
>  
> -/* Remove the highest bit from CBM */
> +/* Remove the highest bits from CBM */
>  static unsigned long next_mask(unsigned long current_mask)
>  {
> -	return current_mask & (current_mask >> 1);
> +	return current_mask & (current_mask >> 2);
>  }
>  
>  static int check_results(struct resctrl_val_param *param, const char *cache_type,
> @@ -112,7 +90,6 @@ static int check_results(struct resctrl_val_param *param, const char *cache_type
>  
>  		ret = show_results_info(sum_llc_perf_miss, bits,
>  					alloc_size / 64,
> -					MIN_DIFF_PERCENT_PER_BIT * (bits - 1),
>  					runs, get_vendor() == ARCH_INTEL,
>  					&prev_avg_llc_val);
>  		if (ret)
>

Re: [PATCH v3 08/10] selftests/resctrl: Remove requirement on cache miss rate

Posted by Reinette Chatre 1 week, 3 days ago

Hi Ilpo,

On 3/27/26 10:45 AM, Ilpo Järvinen wrote:
> On Fri, 13 Mar 2026, Reinette Chatre wrote:
>> -/*
>> - * Minimum difference in LLC misses between a test with n+1 bits CBM to the
>> - * test with n bits is MIN_DIFF_PERCENT_PER_BIT * (n - 1). With e.g. 5 vs 4
>> - * bits in the CBM mask, the minimum difference must be at least
>> - * MIN_DIFF_PERCENT_PER_BIT * (4 - 1) = 3 percent.
>> - *
>> - * The relationship between number of used CBM bits and difference in LLC
>> - * misses is not expected to be linear. With a small number of bits, the
>> - * margin is smaller than with larger number of bits. For selftest purposes,
>> - * however, linear approach is enough because ultimately only pass/fail
>> - * decision has to be made and distinction between strong and stronger
>> - * signal is irrelevant.
>> - */
>> -#define MIN_DIFF_PERCENT_PER_BIT	1UL
>> -
>>  static int show_results_info(__u64 sum_llc_val, int no_of_bits,
>>  			     unsigned long cache_span,
>> -			     unsigned long min_diff_percent,
>>  			     unsigned long num_of_runs, bool platform,
>>  			     __s64 *prev_avg_llc_val)
>>  {
>>  	__u64 avg_llc_val = 0;
>> -	float avg_diff;
>>  	int ret = 0;
>>  
>>  	avg_llc_val = sum_llc_val / num_of_runs;
>>  	if (*prev_avg_llc_val) {
>> -		float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
>> -
>> -		avg_diff = delta / *prev_avg_llc_val;
>> -		ret = platform && (avg_diff * 100) < (float)min_diff_percent;
>> -
>> -		ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
>> -			       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
>> +		ret = platform && (avg_llc_val < *prev_avg_llc_val);
>>  
>> -		ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
>> +		ksft_print_msg("%s Check cache miss rate increased\n",
>> +			       ret ? "Fail:" : "Pass:");
> 
> While I'm fine with removing the amount of change check, this no longer 
> shows any numbers which would be a bit annoying if/when there's a failure.
> 

This snippet only removes display of the number that is no longer computed ("avg_diff").
The values that are compared now, avg_llc_val and its previous value, are printed
in the call to show_cache_info() that follows this snippet but is not visible in the diff.
 
Below is an example of what a user running the CAT test will see after these changes.
Since show_cache_info() always prints avg_llc_val the user can obtain insight into failure
by considering it and its previous measurement.

# Starting L3_CAT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Cache size :117964800
# Writing benchmark parameters to resctrl FS
# Write schema "L2:1=0x1" to resctrl FS
# Write schema "L3:0=1fc0" to resctrl FS
# Write schema "L3:0=3f" to resctrl FS
# Write schema "L3:0=1ff0" to resctrl FS
# Write schema "L3:0=f" to resctrl FS
# Write schema "L3:0=1ffc" to resctrl FS
# Write schema "L3:0=3" to resctrl FS
# Checking for pass/fail
# Number of bits: 6
# Average LLC val: 445092
# Cache span (lines): 737280
# Pass: Check cache miss rate increased
# Number of bits: 4
# Average LLC val: 724472
# Cache span (lines): 491520
# Pass: Check cache miss rate increased
# Number of bits: 2
# Average LLC val: 1085470
# Cache span (lines): 245760
ok 4 L3_CAT: test

Reinette
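
As an illustration of the relaxed check, the following sketch applies the
same "must not decrease" comparison to a list of averages. first_decrease()
is a hypothetical helper written for this note, not a function from
cat_test.c, and it simplifies the real code by leaving out the platform flag
and the skip when the previous value is zero.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in cat_test.c): return the index of the first
 * average that is lower than the one before it, or -1 if the averages
 * never decrease. Averages are ordered from the largest cache portion to
 * the smallest, as in the test output.
 */
static int first_decrease(const unsigned long long *avg_llc_vals, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		/* Same direction as the patch: fail only when misses went down. */
		if (avg_llc_vals[i] < avg_llc_vals[i - 1])
			return (int)i;
	}

	return -1;
}
```

Fed the three "Average LLC val" figures from the output above (445092,
724472, 1085470), the helper returns -1, meaning no step would fail. Equal
consecutive values also pass, because the patch only fails on a strict
decrease.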

Re: [PATCH v3 08/10] selftests/resctrl: Remove requirement on cache miss rate

Posted by Ilpo Järvinen 1 week ago

On Fri, 27 Mar 2026, Reinette Chatre wrote:

> Hi Ilpo,
> 
> On 3/27/26 10:45 AM, Ilpo Järvinen wrote:
> > On Fri, 13 Mar 2026, Reinette Chatre wrote:
> >> -/*
> >> - * Minimum difference in LLC misses between a test with n+1 bits CBM to the
> >> - * test with n bits is MIN_DIFF_PERCENT_PER_BIT * (n - 1). With e.g. 5 vs 4
> >> - * bits in the CBM mask, the minimum difference must be at least
> >> - * MIN_DIFF_PERCENT_PER_BIT * (4 - 1) = 3 percent.
> >> - *
> >> - * The relationship between number of used CBM bits and difference in LLC
> >> - * misses is not expected to be linear. With a small number of bits, the
> >> - * margin is smaller than with larger number of bits. For selftest purposes,
> >> - * however, linear approach is enough because ultimately only pass/fail
> >> - * decision has to be made and distinction between strong and stronger
> >> - * signal is irrelevant.
> >> - */
> >> -#define MIN_DIFF_PERCENT_PER_BIT	1UL
> >> -
> >>  static int show_results_info(__u64 sum_llc_val, int no_of_bits,
> >>  			     unsigned long cache_span,
> >> -			     unsigned long min_diff_percent,
> >>  			     unsigned long num_of_runs, bool platform,
> >>  			     __s64 *prev_avg_llc_val)
> >>  {
> >>  	__u64 avg_llc_val = 0;
> >> -	float avg_diff;
> >>  	int ret = 0;
> >>  
> >>  	avg_llc_val = sum_llc_val / num_of_runs;
> >>  	if (*prev_avg_llc_val) {
> >> -		float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
> >> -
> >> -		avg_diff = delta / *prev_avg_llc_val;
> >> -		ret = platform && (avg_diff * 100) < (float)min_diff_percent;
> >> -
> >> -		ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
> >> -			       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
> >> +		ret = platform && (avg_llc_val < *prev_avg_llc_val);
> >>  
> >> -		ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
> >> +		ksft_print_msg("%s Check cache miss rate increased\n",
> >> +			       ret ? "Fail:" : "Pass:");
> > 
> > While I'm fine with removing the amount of change check, this no longer 
> > shows any numbers which would be a bit annoying if/when there's a failure.
> > 
> 
> This snippet only removes display of the number that is no longer computed ("avg_diff").
> The values that are compared now, avg_llc_val and its previous value, are printed
> in the call to show_cache_info() that follows this snippet but is not visible in the diff.
>  
> Below is an example of what a user running the CAT test will see after these changes.
> Since show_cache_info() always prints avg_llc_val the user can obtain insight into failure
> by considering it and its previous measurement.
> 
> # Starting L3_CAT test ...
> # Mounting resctrl to "/sys/fs/resctrl"
> # Cache size :117964800
> # Writing benchmark parameters to resctrl FS
> # Write schema "L2:1=0x1" to resctrl FS
> # Write schema "L3:0=1fc0" to resctrl FS
> # Write schema "L3:0=3f" to resctrl FS
> # Write schema "L3:0=1ff0" to resctrl FS
> # Write schema "L3:0=f" to resctrl FS
> # Write schema "L3:0=1ffc" to resctrl FS
> # Write schema "L3:0=3" to resctrl FS
> # Checking for pass/fail
> # Number of bits: 6
> # Average LLC val: 445092
> # Cache span (lines): 737280
> # Pass: Check cache miss rate increased
> # Number of bits: 4
> # Average LLC val: 724472
> # Cache span (lines): 491520
> # Pass: Check cache miss rate increased
> # Number of bits: 2
> # Average LLC val: 1085470
> # Cache span (lines): 245760
> ok 4 L3_CAT: test

Okay, I didn't remember there was another place printing the numbers.
No problem with this then,

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

-- 
 i.

Re: [PATCH v3 08/10] selftests/resctrl: Remove requirement on cache miss rate

Posted by Reinette Chatre 6 days, 15 hours ago


On 3/31/26 1:07 AM, Ilpo Järvinen wrote:
> On Fri, 27 Mar 2026, Reinette Chatre wrote:

>> Below is an example of what a user running the CAT test will see after these changes.
>> Since show_cache_info() always prints avg_llc_val the user can obtain insight into failure
>> by considering it and its previous measurement.
>>
>> # Starting L3_CAT test ...
>> # Mounting resctrl to "/sys/fs/resctrl"
>> # Cache size :117964800
>> # Writing benchmark parameters to resctrl FS
>> # Write schema "L2:1=0x1" to resctrl FS
>> # Write schema "L3:0=1fc0" to resctrl FS
>> # Write schema "L3:0=3f" to resctrl FS
>> # Write schema "L3:0=1ff0" to resctrl FS
>> # Write schema "L3:0=f" to resctrl FS
>> # Write schema "L3:0=1ffc" to resctrl FS
>> # Write schema "L3:0=3" to resctrl FS
>> # Checking for pass/fail
>> # Number of bits: 6
>> # Average LLC val: 445092
>> # Cache span (lines): 737280
>> # Pass: Check cache miss rate increased
>> # Number of bits: 4
>> # Average LLC val: 724472
>> # Cache span (lines): 491520
>> # Pass: Check cache miss rate increased
>> # Number of bits: 2
>> # Average LLC val: 1085470
>> # Cache span (lines): 245760
>> ok 4 L3_CAT: test
> 
> Okay, I didn't remember there was another place printing the numbers.
> No problem with this then,
> 
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 

Thank you very much Ilpo.

Reinette