[v2] selftests/resctrl: Support diverse platforms with MBM and MBA tests

[PATCH V2 13/13] selftests/resctrl: Keep results from first test run

Posted by Reinette Chatre 1 year, 4 months ago

The resctrl selftests drop the results of the first run of every test
to avoid data that is (per comment) "inaccurate due to monitoring setup
transition phase". Previously, inaccurate data resulted both from
workloads needing some time to "settle" and from the measurements
themselves needing earlier measurements in order to measure across the
required time frame.

commit da50de0a92f3 ("selftests/resctrl: Calculate resctrl FS derived mem
bw over sleep(1) only")

ensured that measurements accurately cover just the time frame of
interest. The default "fill_buf" benchmark has since separated the buffer
preparation phase from the benchmark run phase, reducing the need for the
tests themselves to accommodate the benchmark's "settle" time.

With these enhancements, nothing remains that needs to "settle", so the
first test run can contribute to the measurements.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Remove comment about needing to remove results from the first run.
- Fix existing incorrect spacing while changing line.
---
 tools/testing/selftests/resctrl/cmt_test.c |  5 ++---
 tools/testing/selftests/resctrl/mba_test.c | 10 +++-------
 tools/testing/selftests/resctrl/mbm_test.c | 10 +++-------
 3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index a7effe76b419..d4b85d144985 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -99,14 +99,13 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
 		}
 
 		/* Field 3 is llc occ resc value */
-		if (runs > 0)
-			sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
+		sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
 		runs++;
 	}
 	fclose(fp);
 
 	return show_results_info(sum_llc_occu_resc, no_of_bits, span,
-				 MAX_DIFF, MAX_DIFF_PERCENT, runs - 1, true);
+				 MAX_DIFF, MAX_DIFF_PERCENT, runs, true);
 }
 
 static void cmt_test_cleanup(void)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 5c6063d0a77c..89c2446b9f80 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -86,18 +86,14 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 		int avg_diff_per;
 		float avg_diff;
 
-		/*
-		 * The first run is discarded due to inaccurate value from
-		 * phase transition.
-		 */
-		for (runs = NUM_OF_RUNS * allocation + 1;
+		for (runs = NUM_OF_RUNS * allocation;
 		     runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS ; runs++) {
 			sum_bw_imc += bw_imc[runs];
 			sum_bw_resc += bw_resc[runs];
 		}
 
-		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
-		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+		avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
+		avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
 		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
 			ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
 				       THROTTLE_THRESHOLD,
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 7635ee6b9339..8c818e292dce 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -22,17 +22,13 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, size_t span)
 	int runs, ret, avg_diff_per;
 	float avg_diff = 0;
 
-	/*
-	 * Discard the first value which is inaccurate due to monitoring setup
-	 * transition phase.
-	 */
-	for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
+	for (runs = 0; runs < NUM_OF_RUNS; runs++) {
 		sum_bw_imc += bw_imc[runs];
 		sum_bw_resc += bw_resc[runs];
 	}
 
-	avg_bw_imc = sum_bw_imc / 4;
-	avg_bw_resc = sum_bw_resc / 4;
+	avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
+	avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
 	avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
 	avg_diff_per = (int)(avg_diff * 100);
 
-- 
2.46.0

Re: [PATCH V2 13/13] selftests/resctrl: Keep results from first test run

Posted by Ilpo Järvinen 1 year, 4 months ago

On Thu, 12 Sep 2024, Reinette Chatre wrote:

...
> diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
> index 7635ee6b9339..8c818e292dce 100644
> --- a/tools/testing/selftests/resctrl/mbm_test.c
> +++ b/tools/testing/selftests/resctrl/mbm_test.c
> @@ -22,17 +22,13 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, size_t span)
>  	int runs, ret, avg_diff_per;
>  	float avg_diff = 0;
>  
> -	/*
> -	 * Discard the first value which is inaccurate due to monitoring setup
> -	 * transition phase.
> -	 */
> -	for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
> +	for (runs = 0; runs < NUM_OF_RUNS; runs++) {
>  		sum_bw_imc += bw_imc[runs];
>  		sum_bw_resc += bw_resc[runs];
>  	}
>  
> -	avg_bw_imc = sum_bw_imc / 4;
> -	avg_bw_resc = sum_bw_resc / 4;
> +	avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
> +	avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
>  	avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
>  	avg_diff_per = (int)(avg_diff * 100);

While the patch itself is fine, I notice the code has this magic number 
gem too:

        unsigned long bw_imc[1024], bw_resc[1024];

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>


-- 
 i.

Re: [PATCH V2 13/13] selftests/resctrl: Keep results from first test run

Posted by Reinette Chatre 1 year, 4 months ago

Hi Ilpo,

On 10/4/24 7:29 AM, Ilpo Järvinen wrote:
> On Thu, 12 Sep 2024, Reinette Chatre wrote:

...

>> diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
>> index 7635ee6b9339..8c818e292dce 100644
>> --- a/tools/testing/selftests/resctrl/mbm_test.c
>> +++ b/tools/testing/selftests/resctrl/mbm_test.c
>> @@ -22,17 +22,13 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, size_t span)
>>  	int runs, ret, avg_diff_per;
>>  	float avg_diff = 0;
>>  
>> -	/*
>> -	 * Discard the first value which is inaccurate due to monitoring setup
>> -	 * transition phase.
>> -	 */
>> -	for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
>> +	for (runs = 0; runs < NUM_OF_RUNS; runs++) {
>>  		sum_bw_imc += bw_imc[runs];
>>  		sum_bw_resc += bw_resc[runs];
>>  	}
>>  
>> -	avg_bw_imc = sum_bw_imc / 4;
>> -	avg_bw_resc = sum_bw_resc / 4;
>> +	avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
>> +	avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
>>  	avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
>>  	avg_diff_per = (int)(avg_diff * 100);
> 
> While the patch itself is fine, I notice the code has this magic number 
> gem too:
> 
>         unsigned long bw_imc[1024], bw_resc[1024];

That could be related to NUM_OF_RUNS ... I'll take a look. While this is safe
since both the array size and the number of runs are hardcoded in the test,
you are of course right that this can be improved.

I'm also concerned about code like the below, which makes some
assumptions about external data ... not that we expect the kernel
interface to change, but code like the below should be made more robust:

static int read_from_imc_dir(char *imc_dir, int count)
{
	char cas_count_cfg[1024],...
	...
	if (fscanf(fp, "%s", cas_count_cfg) <= 0) { /* May read more than 1024 */
		...
	}
}

> 
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Thank you very much for your review. Much appreciated.

Reinette