[PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4

Ben Horgan posted 47 patches 4 weeks ago
There is a newer version of this series
Posted by Ben Horgan 4 weeks ago
From: Shanker Donthineni <sdonthineni@nvidia.com>

In the T241 implementation of memory-bandwidth partitioning, in the absence
of contention for bandwidth, the minimum bandwidth setting can affect the
amount of achieved bandwidth. Specifically, the achieved bandwidth in the
absence of contention can settle to any value between the values of
MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.  Also, if MPAMCFG_MBW_MIN is set to
zero (below 0.78125%), once a core enters a throttled state, it will never
leave that state.

The first issue is not a concern if the MPAM software allows
MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
ensures that MBW_MIN=1 (0.78125%) is programmed whenever MPAMCFG_MBW_MIN=0
would otherwise be programmed.

In the scenario where resctrl doesn't support the MBW_MIN interface via
sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention,
software should configure a relatively narrow gap between MBW_MIN and
MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
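
As an illustration of the 5% gap recommendation, the sketch below derives
an MBW_MIN value from a percentage MBW_MAX. It assumes a 16-bit fixed-point
fraction of which only the top bwa_wd bits are implemented. The helper
names percent_to_fract16() and mbw_min_for_gap(), and the truncating
conversion, are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a percentage (0..100) to a 16-bit
 * fixed-point fraction and keep only the top bwa_wd implemented bits
 * (1 <= bwa_wd <= 16). The conversion truncates rather than rounds.
 */
static uint16_t percent_to_fract16(unsigned int pc, unsigned int bwa_wd)
{
	uint32_t v = (pc * 65536u) / 100u;
	uint16_t mask = (uint16_t)(((1u << bwa_wd) - 1) << (16 - bwa_wd));

	if (v > 0xffffu)
		v = 0xffffu;

	return (uint16_t)(v & mask);
}

/*
 * Hypothetical helper: pick MBW_MIN so that it sits gap_pc percentage
 * points below MBW_MAX, but never below one hardware granule, which is
 * the floor this workaround enforces.
 */
static uint16_t mbw_min_for_gap(unsigned int max_pc, unsigned int gap_pc,
				unsigned int bwa_wd)
{
	unsigned int min_pc = max_pc > gap_pc ? max_pc - gap_pc : 0;
	uint16_t min = percent_to_fract16(min_pc, bwa_wd);
	uint16_t floor = (uint16_t)(1u << (16 - bwa_wd));

	return min < floor ? floor : min;
}
```

With bwa_wd = 7, mbw_min_for_gap(50, 5, 7) gives 0x7200 (about 44.5%), and
a result that would fall below one granule is raised to 0x0200.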

[ morse: Added as second quirk, adapted to use the new intermediate values
in mpam_extend_config() ]

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
Handling when reset_mbw_min is set
---
 Documentation/arch/arm64/silicon-errata.rst |  2 ++
 drivers/resctrl/mpam_devices.c              | 34 +++++++++++++++++++--
 drivers/resctrl/mpam_internal.h             |  1 +
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 4e86b85fe3d6..b18bc704d4a1 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -248,6 +248,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | T241 MPAM       | T241-MPAM-1     | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | T241 MPAM       | T241-MPAM-4     | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bdf13a22d98f..884ca6a6d8f3 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -679,6 +679,12 @@ static const struct mpam_quirk mpam_quirks[] = {
 	.iidr_mask  = MPAM_IIDR_MATCH_ONE,
 	.workaround = T241_SCRUB_SHADOW_REGS,
 	},
+	{
+	/* NVIDIA t241 erratum T241-MPAM-4 */
+	.iidr       = MPAM_IIDR_NVIDIA_T241,
+	.iidr_mask  = MPAM_IIDR_MATCH_ONE,
+	.workaround = T241_FORCE_MBW_MIN_TO_ONE,
+	},
 	{ NULL } /* Sentinel */
 };
 
@@ -1464,6 +1470,17 @@ static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
 		mpam_apply_t241_erratum(ris, partid);
 }
 
+static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
+{
+	u16 max_hw_value, min_hw_granule, res0_bits;
+
+	res0_bits = 16 - props->bwa_wd;
+	max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
+	min_hw_granule = ~max_hw_value;
+
+	return min_hw_granule + 1;
+}
+
 /* Called via IPI. Call while holding an SRCU reference */
 static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 				      struct mpam_config *cfg)
@@ -1508,10 +1525,15 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 
 	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
 	    mpam_has_feature(mpam_feat_mbw_min, cfg)) {
-		if (cfg->reset_mbw_min)
-			mpam_write_partsel_reg(msc, MBW_MIN, 0);
-		else
+		if (cfg->reset_mbw_min) {
+			u16 reset = 0;
+
+			if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, msc))
+				reset = mpam_wa_t241_force_mbw_min_to_one(rprops);
+			mpam_write_partsel_reg(msc, MBW_MIN, reset);
+		} else {
 			mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
+		}
 	}
 
 	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
@@ -2570,6 +2592,12 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
 		cfg->mbw_min = max(min, min_hw_granule);
 		mpam_set_feature(mpam_feat_mbw_min, cfg);
 	}
+
+	if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
+	    cfg->mbw_min <= min_hw_granule) {
+		cfg->mbw_min = min_hw_granule + 1;
+		mpam_set_feature(mpam_feat_mbw_min, cfg);
+	}
 }
 
 static void mpam_reset_component_cfg(struct mpam_component *comp)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9d15d37d4b5a..7b4566814945 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -224,6 +224,7 @@ struct mpam_props {
 /* Workaround bits for msc->quirks */
 enum mpam_device_quirks {
 	T241_SCRUB_SHADOW_REGS,
+	T241_FORCE_MBW_MIN_TO_ONE,
 	MPAM_QUIRK_LAST
 };
 
-- 
2.43.0
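
For reference, the arithmetic in mpam_wa_t241_force_mbw_min_to_one() can be
checked in user space. The sketch below copies the helper's computation and
mirrors the clamp added to mpam_extend_config(). Two things are assumptions
rather than facts from the patch: bwa_wd = 7 is inferred from the quoted
0.78125% (1/128), and min_hw_granule in mpam_extend_config() is taken to be
the same mask of unimplemented low bits. The function names here are
hypothetical.

```c
#include <stdint.h>

/*
 * User-space copy of the helper's arithmetic. bwa_wd is the number of
 * implemented bits in the 16-bit bandwidth fraction (1 <= bwa_wd <= 16).
 */
static uint16_t t241_mbw_min_floor(unsigned int bwa_wd)
{
	unsigned int res0_bits = 16 - bwa_wd;
	uint16_t max_hw_value = (uint16_t)(((1u << bwa_wd) - 1) << res0_bits);
	uint16_t min_hw_granule = (uint16_t)~max_hw_value;

	/* Smallest value with a non-zero implemented field. */
	return (uint16_t)(min_hw_granule + 1);
}

/*
 * Mirrors the clamp in mpam_extend_config(): any value at or below the
 * granule mask would read as zero in hardware, so raise it to the floor.
 */
static uint16_t t241_clamp_mbw_min(uint16_t mbw_min, unsigned int bwa_wd)
{
	uint16_t floor = t241_mbw_min_floor(bwa_wd);
	uint16_t min_hw_granule = (uint16_t)(floor - 1);

	return mbw_min <= min_hw_granule ? floor : mbw_min;
}
```

With bwa_wd = 7, t241_mbw_min_floor(7) is 0x0200, which is 512/65536 =
0.78125%. t241_clamp_mbw_min() raises 0 and 0x01ff to 0x0200 and leaves
larger values such as 0x8000 unchanged.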
Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
Posted by Fenghua Yu 3 weeks, 3 days ago
Hi, Shanker and Ben,

On 1/12/26 08:59, Ben Horgan wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
> 
> In the T241 implementation of memory-bandwidth partitioning, in the absence
> of contention for bandwidth, the minimum bandwidth setting can affect the
> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
> absence of contention can settle to any value between the values of
> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.  Also, if MPAMCFG_MBW_MIN is set to
> zero (below 0.78125%), once a core enters a throttled state, it will never
> leave that state.
> 
> The first issue is not a concern if the MPAM software allows
> MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
> ensures that MBW_MIN=1 (0.78125%) is programmed whenever MPAMCFG_MBW_MIN=0
> would otherwise be programmed.

With MBW_MIN=1, the minimum memory bandwidth can be very low under
contention, which may hurt memory access performance. Is it possible to
set MBW_MIN higher so that the floor on memory bandwidth stays high?

Thanks.

-Fenghua

Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
Posted by Ben Horgan 2 weeks, 6 days ago
Hi Fenghua,

On 1/15/26 23:20, Fenghua Yu wrote:
> Hi, Shanker and Ben,
> 
> On 1/12/26 08:59, Ben Horgan wrote:
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> In the T241 implementation of memory-bandwidth partitioning, in the
>> absence
>> of contention for bandwidth, the minimum bandwidth setting can affect the
>> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
>> absence of contention can settle to any value between the values of
>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.  Also, if MPAMCFG_MBW_MIN is set to
>> zero (below 0.78125%), once a core enters a throttled state, it will
>> never leave that state.
>>
>> The first issue is not a concern if the MPAM software allows
>> MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
>> ensures that MBW_MIN=1 (0.78125%) is programmed whenever
>> MPAMCFG_MBW_MIN=0 would otherwise be programmed.
> 
> With MBW_MIN=1, the minimum memory bandwidth can be very low under
> contention, which may hurt memory access performance. Is it possible to
> set MBW_MIN higher so that the floor on memory bandwidth stays high?

Isn't that a policy decision rather than something we should be putting
in a quirk framework?

Thanks,

Ben

Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
Posted by Fenghua Yu 1 week, 3 days ago
Hi, Ben,

On 1/19/26 12:56, Ben Horgan wrote:
> Hi Fenghua,
> 
> On 1/15/26 23:20, Fenghua Yu wrote:
>> Hi, Shanker and Ben,
>>
>> On 1/12/26 08:59, Ben Horgan wrote:
>>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>>
>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>> absence
>>> of contention for bandwidth, the minimum bandwidth setting can affect the
>>> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
>>> absence of contention can settle to any value between the values of
>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.  Also, if MPAMCFG_MBW_MIN is set to
>>> zero (below 0.78125%), once a core enters a throttled state, it will
>>> never leave that state.
>>>
>>> The first issue is not a concern if the MPAM software allows
>>> MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
>>> ensures that MBW_MIN=1 (0.78125%) is programmed whenever
>>> MPAMCFG_MBW_MIN=0 would otherwise be programmed.
>>
>> With MBW_MIN=1, the minimum memory bandwidth can be very low under
>> contention, which may hurt memory access performance. Is it possible to
>> set MBW_MIN higher so that the floor on memory bandwidth stays high?
> 
> Isn't that a policy decision rather than something we should be putting
> in a quirk framework?

MBW_MIN is 1% or 5% less than MBW_MAX.

A lower MBW_MIN hints to the hardware that it may reduce memory bandwidth
under memory access contention. That causes memory performance degradation.

Is it possible to make the following changes to fix the performance issue?
1. By default, make min mbw equal to max mbw, so that hardware won't lower
performance unless it's needed. This can fix the current performance issue.
2. Add a new schemata line (e.g. MBI:<id>=x;<id>=y;...) to specify min
mbw, just as max mbw is specified by the schemata line "MB:...". Users can
use this line to change min mbw per partition per node. This could be added
in the future.
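
To make the second suggestion concrete, here is a minimal user-space sketch
of parsing such a line. Everything in it is hypothetical: the "MBI:" prefix
comes only from the example above, the values are assumed to be percentages
in the range 0..100, and struct mbi_entry and parse_mbi_line() are
illustrative names, not existing resctrl code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one "<id>=<min percentage>" pair from an MBI line. */
struct mbi_entry {
	unsigned long id;
	unsigned long min_pc;
};

/*
 * Parse "MBI:<id>=<val>;<id>=<val>;...". Returns the number of entries
 * stored in out[], or -1 if the line is malformed, a value exceeds 100,
 * or more than max entries are present.
 */
static int parse_mbi_line(const char *line, struct mbi_entry *out, int max)
{
	const char *p;
	char *end;
	int n = 0;

	if (strncmp(line, "MBI:", 4) != 0)
		return -1;
	p = line + 4;

	while (*p != '\0') {
		if (n == max)
			return -1;

		out[n].id = strtoul(p, &end, 10);
		if (end == p || *end != '=')
			return -1;
		p = end + 1;

		out[n].min_pc = strtoul(p, &end, 10);
		if (end == p || out[n].min_pc > 100)
			return -1;
		n++;

		if (*end == ';')
			p = end + 1;
		else if (*end == '\0')
			p = end;
		else
			return -1;
	}

	return n;
}
```

For "MBI:0=50;1=60" this returns 2 entries, (0, 50) and (1, 60). A line
with a different prefix, or a value above 100, is rejected with -1.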

Thanks.

-Fenghua
Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
Posted by Ben Horgan 1 week, 3 days ago
Hi Fenghua,

On 1/29/26 22:14, Fenghua Yu wrote:
> Hi, Ben,
> 
> On 1/19/26 12:56, Ben Horgan wrote:
>> Hi Fenghua,
>>
>> On 1/15/26 23:20, Fenghua Yu wrote:
>>> Hi, Shanker and Ben,
>>>
>>> On 1/12/26 08:59, Ben Horgan wrote:
>>>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>>>
>>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>>> absence
>>>> of contention for bandwidth, the minimum bandwidth setting can
>>>> affect the
>>>> amount of achieved bandwidth. Specifically, the achieved bandwidth
>>>> in the
>>>> absence of contention can settle to any value between the values of
>>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.  Also, if MPAMCFG_MBW_MIN is set
>>>> to zero (below 0.78125%), once a core enters a throttled state, it will
>>>> never leave that state.
>>>>
>>>> The first issue is not a concern if the MPAM software allows
>>>> MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This
>>>> patch ensures that MBW_MIN=1 (0.78125%) is programmed whenever
>>>> MPAMCFG_MBW_MIN=0 would otherwise be programmed.
>>>
>>> With MBW_MIN=1, the minimum memory bandwidth can be very low under
>>> contention, which may hurt memory access performance. Is it possible to
>>> set MBW_MIN higher so that the floor on memory bandwidth stays high?
>>
>> Isn't that a policy decision rather than something we should be putting
>> in a quirk framework?
> 
> MBW_MIN is 1% or 5% less than MBW_MAX.
> 
> A lower MBW_MIN hints to the hardware that it may reduce memory bandwidth
> under memory access contention. That causes memory performance degradation.
> 
> Is it possible to make the following changes to fix the performance issue?
> 1. By default, make min mbw equal to max mbw, so that hardware won't lower
> performance unless it's needed. This can fix the current performance issue.
> 2. Add a new schemata line (e.g. MBI:<id>=x;<id>=y;...) to specify min
> mbw, just as max mbw is specified by the schemata line "MB:...". Users can
> use this line to change min mbw per partition per node. This could be added
> in the future.

Thanks for bringing this up. It raises some more general queries about
the handling of mbw_min and so I'll move the discussion to
[PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
as it isn't specific to the nvidia quirks.

> 
> Thanks.
> 
> -Fenghua

Thanks,

Ben