[PATCH V2 12/13] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth

Reinette Chatre posted 13 patches 2 months, 2 weeks ago
There is a newer version of this series
[PATCH V2 12/13] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
Posted by Reinette Chatre 2 months, 2 weeks ago
The MBA test incrementally throttles memory bandwidth, each time
followed by a comparison between the memory bandwidth observed
by the performance counters and resctrl respectively.

While a comparison between performance counters and resctrl is
generally appropriate, they do not have an identical view of
memory bandwidth. For example RAS features or memory performance
features that generate memory traffic may drive accesses that are
counted differently by performance counters and MBM respectively,
for instance generating "overhead" traffic which is not counted
against any specific RMID. As a ratio, this different view of memory
bandwidth becomes more apparent at low memory bandwidths.

It is not practical to enable/disable the various features that
may generate memory bandwidth to give performance counters and
resctrl an identical view. Instead, do not compare performance
counters and resctrl view of memory bandwidth when the memory
bandwidth is low.

Bandwidth throttling behaves differently across platforms
so it is not appropriate to drop measurement data simply based
on the throttling level. Instead, use a threshold of 750MiB
that has been observed to support adequate comparison between
performance counters and resctrl.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Fix code alignment and spacing.
- Modify flow to use "continue" instead of "break" now that
  earlier changes decreases throttling.
- Expand comment of define to elaborate causes of discrepancy
  between performance counters and MBM.
---
 tools/testing/selftests/resctrl/mba_test.c |  7 +++++++
 tools/testing/selftests/resctrl/resctrl.h  | 10 ++++++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index d8d9637c1951..5c6063d0a77c 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -98,6 +98,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 
 		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
 		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
+			ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
+				       THROTTLE_THRESHOLD,
+				       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
+			continue;
+		}
+
 		avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
 		avg_diff_per = (int)(avg_diff * 100);
 
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index dc01dc75cba5..eb151ac74938 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -43,6 +43,16 @@
 
 #define DEFAULT_SPAN		(250 * MB)
 
+/*
+ * Memory bandwidth (in MiB) below which the bandwidth comparisons
+ * between iMC and resctrl are considered unreliable. For example RAS
+ * features or memory performance features that generate memory traffic
+ * may drive accesses that are counted differently by performance counters
+ * and MBM respectively, for instance generating "overhead" traffic which
+ * is not counted against any specific RMID.
+ */
+#define THROTTLE_THRESHOLD	750
+
 /*
  * fill_buf_param:	"fill_buf" benchmark parameters
  * @buf_size:		Size (in bytes) of buffer used in benchmark.
-- 
2.46.0
Re: [PATCH V2 12/13] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
Posted by Ilpo Järvinen 1 month, 3 weeks ago
On Thu, 12 Sep 2024, Reinette Chatre wrote:

> The MBA test incrementally throttles memory bandwidth, each time
> followed by a comparison between the memory bandwidth observed
> by the performance counters and resctrl respectively.
> 
> While a comparison between performance counters and resctrl is
> generally appropriate, they do not have an identical view of
> memory bandwidth. For example RAS features or memory performance
> features that generate memory traffic may drive accesses that are
> counted differently by performance counters and MBM respectively,
> for instance generating "overhead" traffic which is not counted
> against any specific RMID. As a ratio, this different view of memory
> bandwidth becomes more apparent at low memory bandwidths.
> 
> It is not practical to enable/disable the various features that
> may generate memory bandwidth to give performance counters and
> resctrl an identical view. Instead, do not compare performance
> counters and resctrl view of memory bandwidth when the memory
> bandwidth is low.
> 
> Bandwidth throttling behaves differently across platforms
> so it is not appropriate to drop measurement data simply based
> on the throttling level. Instead, use a threshold of 750MiB
> that has been observed to support adequate comparison between
> performance counters and resctrl.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Fix code alignment and spacing.
> - Modify flow to use "continue" instead of "break" now that
>   earlier changes decreases throttling.
> - Expand comment of define to elaborate causes of discrepancy
>   between performance counters and MBM.
> ---
>  tools/testing/selftests/resctrl/mba_test.c |  7 +++++++
>  tools/testing/selftests/resctrl/resctrl.h  | 10 ++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
> index d8d9637c1951..5c6063d0a77c 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -98,6 +98,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
>  
>  		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>  		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
> +			ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
> +				       THROTTLE_THRESHOLD,
> +				       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
> +			continue;
> +		}
> +
>  		avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
>  		avg_diff_per = (int)(avg_diff * 100);
>  
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index dc01dc75cba5..eb151ac74938 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -43,6 +43,16 @@
>  
>  #define DEFAULT_SPAN		(250 * MB)
>  
> +/*
> + * Memory bandwidth (in MiB) below which the bandwidth comparisons
> + * between iMC and resctrl are considered unreliable. For example RAS
> + * features or memory performance features that generate memory traffic
> + * may drive accesses that are counted differently by performance counters
> + * and MBM respectively, for instance generating "overhead" traffic which
> + * is not counted against any specific RMID.
> + */
> +#define THROTTLE_THRESHOLD	750
> +
>  /*
>   * fill_buf_param:	"fill_buf" benchmark parameters
>   * @buf_size:		Size (in bytes) of buffer used in benchmark.
> 

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

-- 
 i.