[RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers

James Morse posted 38 patches 2 months ago
There is a newer version of this series
[RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by James Morse 2 months ago
MPAM allows traffic in the SoC to be labeled by the OS, these labels
are used to apply policy in caches and bandwidth regulators, and to
monitor traffic in the SoC. The label is made up of a PARTID and PMG
value. The x86 equivalent calls these CLOSID and RMID, but they don't
map precisely.

MPAM has two CPU system registers that is used to hold the PARTID and PMG
values that traffic generated at each exception level will use. These can be
set per-task by the resctrl file system. (resctrl is the defacto interface
for controlling this stuff).

Add a helper to switch this.

struct task_struct's separate CLOSID and RMID fields are insufficient
to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
and PMG (sort of like the RMID) separately. On x86, the rmid is an
independent number, so a race that writes a mismatched  closid and rmid
into hardware is benign. On arm64, the pmg bits extend the partid.
(i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
In this case, mismatching the values will 'dirty' a pmg value that
resctrl believes is clean, and is not tracking with its 'limbo' code.

To avoid this, the partid and pmg are always read and written as a pair.
Instead of making struct task_struct's closid and rmid fields an
endian-unsafe union, add the value to struct thread_info and always use
READ_ONCE()/WRITE_ONCE() when accessing this field.

Resctrl allows a per-cpu 'default' value to be set, this overrides the
values when scheduling a task in the default control-group, which has
PARTID 0. The way 'code data prioritisation' gets emulated means the
register value for the default group needs to be a variable.

The current system register value is kept in a per-cpu variable to
avoid writing to the system register if the value isn't going to change.
Writes to this register may reset the hardware state for regulating
bandwidth.

Finally, there is no reason to context switch these registers unless
there is a driver changing the values in struct task_struct. Hide
the whole thing behind a static key. This also allows the driver to
disable MPAM in response to errors reported by hardware. Move the
existing static key to belong to the arch code, as in the future
the MPAM driver may become a loadable module.

All this should depend on whether there is an MPAM driver, hide
it behind CONFIG_MPAM.

CC: Amit Singh Tomar <amitsinght@marvell.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/Kconfig                   |  2 +
 arch/arm64/include/asm/mpam.h        | 74 ++++++++++++++++++++++++++++
 arch/arm64/include/asm/thread_info.h |  3 ++
 arch/arm64/kernel/Makefile           |  1 +
 arch/arm64/kernel/mpam.c             | 13 +++++
 arch/arm64/kernel/process.c          |  7 +++
 drivers/resctrl/mpam_devices.c       |  2 -
 drivers/resctrl/mpam_internal.h      |  2 +
 8 files changed, 102 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/mpam.h
 create mode 100644 arch/arm64/kernel/mpam.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 004d58cfbff8..558baa9e7c08 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2048,6 +2048,8 @@ config ARM64_MPAM
 
 	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
 
+	  This option enables the extra context switch code.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
new file mode 100644
index 000000000000..86a55176f884
--- /dev/null
+++ b/arch/arm64/include/asm/mpam.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __ASM__MPAM_H
+#define __ASM__MPAM_H
+
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+
+#include <asm/cpucaps.h>
+#include <asm/cpufeature.h>
+#include <asm/sysreg.h>
+
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+DECLARE_PER_CPU(u64, arm64_mpam_default);
+DECLARE_PER_CPU(u64, arm64_mpam_current);
+
+/*
+ * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
+ * This is used by the context switch code to use the resctrl CPU property
+ * instead. The value is modified when CDP is enabled/disabled by mounting
+ * the resctrl filesystem.
+ */
+extern u64 arm64_mpam_global_default;
+
+/*
+ * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
+ * which may race with reads in __mpam_sched_in(). Ensure only one of the old
+ * or new values are used. Particular care should be taken with the pmg field
+ * as __mpam_sched_in() may read a partid and pmg that don't match, causing
+ * this value to be stored with cache allocations, despite being considered
+ * 'free' by resctrl.
+ *
+ * A value in struct thread_info is used instead of struct task_struct as the
+ * cpu's u64 register format is used, but struct task_struct has two u32'.
+ */
+static inline u64 mpam_get_regval(struct task_struct *tsk)
+{
+#ifdef CONFIG_ARM64_MPAM
+	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
+#else
+	return 0;
+#endif
+}
+
+static inline void mpam_thread_switch(struct task_struct *tsk)
+{
+	u64 oldregval;
+	int cpu = smp_processor_id();
+	u64 regval = mpam_get_regval(tsk);
+
+	if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
+	    !static_branch_likely(&mpam_enabled))
+		return;
+
+	if (regval == READ_ONCE(arm64_mpam_global_default))
+		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
+
+	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+	if (oldregval == regval)
+		return;
+
+	write_sysreg_s(regval, SYS_MPAM1_EL1);
+	isb();
+
+	/* Synchronising the EL0 write is left until the ERET to EL0 */
+	write_sysreg_s(regval, SYS_MPAM0_EL1);
+
+	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
+}
+#endif /* __ASM__MPAM_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f241b8601ebd..c226dabd5019 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -41,6 +41,9 @@ struct thread_info {
 #ifdef CONFIG_SHADOW_CALL_STACK
 	void			*scs_base;
 	void			*scs_sp;
+#endif
+#ifdef CONFIG_ARM64_MPAM
+	u64			mpam_partid_pmg;
 #endif
 	u32			cpu;
 };
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 76f32e424065..15979f366519 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
 obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
+obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
new file mode 100644
index 000000000000..9866d2ca0faa
--- /dev/null
+++ b/arch/arm64/kernel/mpam.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Arm Ltd. */
+
+#include <asm/mpam.h>
+
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+
+DEFINE_STATIC_KEY_FALSE(mpam_enabled);
+DEFINE_PER_CPU(u64, arm64_mpam_default);
+DEFINE_PER_CPU(u64, arm64_mpam_current);
+
+u64 arm64_mpam_global_default;
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index fba7ca102a8c..b510c0699313 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -51,6 +51,7 @@
 #include <asm/fpsimd.h>
 #include <asm/gcs.h>
 #include <asm/mmu_context.h>
+#include <asm/mpam.h>
 #include <asm/mte.h>
 #include <asm/processor.h>
 #include <asm/pointer_auth.h>
@@ -737,6 +738,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	if (prev->thread.sctlr_user != next->thread.sctlr_user)
 		update_sctlr_el1(next->thread.sctlr_user);
 
+	/*
+	 * MPAM thread switch happens after the DSB to ensure prev's accesses
+	 * use prev's MPAM settings.
+	 */
+	mpam_thread_switch(next);
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0b5b158e1aaf..2996ad93fc3e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,8 +29,6 @@
 
 #include "mpam_internal.h"
 
-DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
-
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e79c3c47259c..4508a6654fe0 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -17,6 +17,8 @@
 #include <linux/srcu.h>
 #include <linux/types.h>
 
+#include <asm/mpam.h>
+
 #define MPAM_MSC_MAX_NUM_RIS	16
 
 struct platform_device;
-- 
2.39.5
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Jonathan Cameron 1 month, 3 weeks ago
On Fri, 5 Dec 2025 21:58:24 +0000
James Morse <james.morse@arm.com> wrote:

> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
> 
> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> values that traffic generated at each exception level will use. These can be
> set per-task by the resctrl file system. (resctrl is the defacto interface
> for controlling this stuff).
> 
> Add a helper to switch this.
> 
> struct task_struct's separate CLOSID and RMID fields are insufficient
> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
> and PMG (sort of like the RMID) separately. On x86, the rmid is an
> independent number, so a race that writes a mismatched  closid and rmid
> into hardware is benign. On arm64, the pmg bits extend the partid.
> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
> In this case, mismatching the values will 'dirty' a pmg value that
> resctrl believes is clean, and is not tracking with its 'limbo' code.
> 
> To avoid this, the partid and pmg are always read and written as a pair.
> Instead of making struct task_struct's closid and rmid fields an
> endian-unsafe union, add the value to struct thread_info and always use
> READ_ONCE()/WRITE_ONCE() when accessing this field.
> 
> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> values when scheduling a task in the default control-group, which has
> PARTID 0. The way 'code data prioritisation' gets emulated means the
> register value for the default group needs to be a variable.
> 
> The current system register value is kept in a per-cpu variable to
> avoid writing to the system register if the value isn't going to change.
> Writes to this register may reset the hardware state for regulating
> bandwidth.
> 
> Finally, there is no reason to context switch these registers unless
> there is a driver changing the values in struct task_struct. Hide
> the whole thing behind a static key. This also allows the driver to
> disable MPAM in response to errors reported by hardware. Move the
> existing static key to belong to the arch code, as in the future
> the MPAM driver may become a loadable module.
> 
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.
> 
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>

> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..86a55176f884
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,74 @@
...

> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field
> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> + * this value to be stored with cache allocations, despite being considered
> + * 'free' by resctrl.
> + *
> + * A value in struct thread_info is used instead of struct task_struct as the
> + * cpu's u64 register format is used, but struct task_struct has two u32'.

This comment probably wants to provide a little more info if it is to be useful,

Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
I'm not immediately understanding why that matters given you could slap
a union on it without (I think) resulting in anything else moving.

Now having it in thread_info moves it into arch header territory so
might make sense for that reason.

> + */
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +#else
> +	return 0;
> +#endif
> +}
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 1 month, 3 weeks ago
Hi Jonathan,

On 12/18/25 10:35, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:24 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> MPAM allows traffic in the SoC to be labeled by the OS, these labels
>> are used to apply policy in caches and bandwidth regulators, and to
>> monitor traffic in the SoC. The label is made up of a PARTID and PMG
>> value. The x86 equivalent calls these CLOSID and RMID, but they don't
>> map precisely.
>>
>> MPAM has two CPU system registers that is used to hold the PARTID and PMG
>> values that traffic generated at each exception level will use. These can be
>> set per-task by the resctrl file system. (resctrl is the defacto interface
>> for controlling this stuff).
>>
>> Add a helper to switch this.
>>
>> struct task_struct's separate CLOSID and RMID fields are insufficient
>> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
>> and PMG (sort of like the RMID) separately. On x86, the rmid is an
>> independent number, so a race that writes a mismatched  closid and rmid
>> into hardware is benign. On arm64, the pmg bits extend the partid.
>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
>> In this case, mismatching the values will 'dirty' a pmg value that
>> resctrl believes is clean, and is not tracking with its 'limbo' code.
>>
>> To avoid this, the partid and pmg are always read and written as a pair.
>> Instead of making struct task_struct's closid and rmid fields an
>> endian-unsafe union, add the value to struct thread_info and always use
>> READ_ONCE()/WRITE_ONCE() when accessing this field.
>>
>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>> values when scheduling a task in the default control-group, which has
>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>> register value for the default group needs to be a variable.
>>
>> The current system register value is kept in a per-cpu variable to
>> avoid writing to the system register if the value isn't going to change.
>> Writes to this register may reset the hardware state for regulating
>> bandwidth.
>>
>> Finally, there is no reason to context switch these registers unless
>> there is a driver changing the values in struct task_struct. Hide
>> the whole thing behind a static key. This also allows the driver to
>> disable MPAM in response to errors reported by hardware. Move the
>> existing static key to belong to the arch code, as in the future
>> the MPAM driver may become a loadable module.
>>
>> All this should depend on whether there is an MPAM driver, hide
>> it behind CONFIG_MPAM.
>>
>> CC: Amit Singh Tomar <amitsinght@marvell.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> new file mode 100644
>> index 000000000000..86a55176f884
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -0,0 +1,74 @@
> ...
> 
>> +/*
>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>> + * or new values are used. Particular care should be taken with the pmg field
>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>> + * this value to be stored with cache allocations, despite being considered
>> + * 'free' by resctrl.
>> + *
>> + * A value in struct thread_info is used instead of struct task_struct as the
>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
> 
> This comment probably wants to provide a little more info if it is to be useful,
> 
> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
> I'm not immediately understanding why that matters given you could slap
> a union on it without (I think) resulting in anything else moving.

Yes, the fields referred to are those closid and rmid. As James writes
in the commit message a union is an alternative, but it would be endian
unsafe. Unlikely to matter but lets not break things.

I'm replying for James as he is otherwise engaged. Thanks for the review
of this series and all your review on the previous MPAM series.

> 
> Now having it in thread_info moves it into arch header territory so
> might make sense for that reason.
> 
>> + */
>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> +#else
>> +	return 0;
>> +#endif
>> +}
> 

Thanks,

Ben
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 1 month, 3 weeks ago

On 12/18/25 14:52, Ben Horgan wrote:
> Hi Jonathan,
> 
> On 12/18/25 10:35, Jonathan Cameron wrote:
>> On Fri, 5 Dec 2025 21:58:24 +0000
>> James Morse <james.morse@arm.com> wrote:
>>
>>> MPAM allows traffic in the SoC to be labeled by the OS, these labels
>>> are used to apply policy in caches and bandwidth regulators, and to
>>> monitor traffic in the SoC. The label is made up of a PARTID and PMG
>>> value. The x86 equivalent calls these CLOSID and RMID, but they don't
>>> map precisely.
>>>
>>> MPAM has two CPU system registers that is used to hold the PARTID and PMG
>>> values that traffic generated at each exception level will use. These can be
>>> set per-task by the resctrl file system. (resctrl is the defacto interface
>>> for controlling this stuff).
>>>
>>> Add a helper to switch this.
>>>
>>> struct task_struct's separate CLOSID and RMID fields are insufficient
>>> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
>>> and PMG (sort of like the RMID) separately. On x86, the rmid is an
>>> independent number, so a race that writes a mismatched  closid and rmid
>>> into hardware is benign. On arm64, the pmg bits extend the partid.
>>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
>>> In this case, mismatching the values will 'dirty' a pmg value that
>>> resctrl believes is clean, and is not tracking with its 'limbo' code.
>>>
>>> To avoid this, the partid and pmg are always read and written as a pair.
>>> Instead of making struct task_struct's closid and rmid fields an
>>> endian-unsafe union, add the value to struct thread_info and always use
>>> READ_ONCE()/WRITE_ONCE() when accessing this field.
>>>
>>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>>> values when scheduling a task in the default control-group, which has
>>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>>> register value for the default group needs to be a variable.
>>>
>>> The current system register value is kept in a per-cpu variable to
>>> avoid writing to the system register if the value isn't going to change.
>>> Writes to this register may reset the hardware state for regulating
>>> bandwidth.
>>>
>>> Finally, there is no reason to context switch these registers unless
>>> there is a driver changing the values in struct task_struct. Hide
>>> the whole thing behind a static key. This also allows the driver to
>>> disable MPAM in response to errors reported by hardware. Move the
>>> existing static key to belong to the arch code, as in the future
>>> the MPAM driver may become a loadable module.
>>>
>>> All this should depend on whether there is an MPAM driver, hide
>>> it behind CONFIG_MPAM.
>>>
>>> CC: Amit Singh Tomar <amitsinght@marvell.com>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>
>>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>>> new file mode 100644
>>> index 000000000000..86a55176f884
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/mpam.h
>>> @@ -0,0 +1,74 @@
>> ...
>>
>>> +/*
>>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>>> + * or new values are used. Particular care should be taken with the pmg field
>>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>>> + * this value to be stored with cache allocations, despite being considered
>>> + * 'free' by resctrl.
>>> + *
>>> + * A value in struct thread_info is used instead of struct task_struct as the
>>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
>>
>> This comment probably wants to provide a little more info if it is to be useful,
>>
>> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
>> I'm not immediately understanding why that matters given you could slap
>> a union on it without (I think) resulting in anything else moving.
> 
> Yes, the fields referred to are those closid and rmid. As James writes
> in the commit message a union is an alternative, but it would be endian
> unsafe. Unlikely to matter but lets not break things.

Meant to say... I'll add clarification in this vein to the comment.

> 
> I'm replying for James as he is otherwise engaged. Thanks for the review
> of this series and all your review on the previous MPAM series.
> 
>>
>> Now having it in thread_info moves it into arch header territory so
>> might make sense for that reason.
>>
>>> + */
>>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>>> +{
>>> +#ifdef CONFIG_ARM64_MPAM
>>> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>>> +#else
>>> +	return 0;
>>> +#endif
>>> +}
>>
> 
> Thanks,
> 
> Ben
> 

-- 
Thanks,

Ben
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Jonathan Cameron 1 month, 3 weeks ago
On Thu, 18 Dec 2025 14:55:23 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> On 12/18/25 14:52, Ben Horgan wrote:
> > Hi Jonathan,
> > 
> > On 12/18/25 10:35, Jonathan Cameron wrote:  
> >> On Fri, 5 Dec 2025 21:58:24 +0000
> >> James Morse <james.morse@arm.com> wrote:
> >>  
> >>> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> >>> are used to apply policy in caches and bandwidth regulators, and to
> >>> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> >>> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> >>> map precisely.
> >>>
> >>> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> >>> values that traffic generated at each exception level will use. These can be
> >>> set per-task by the resctrl file system. (resctrl is the defacto interface
> >>> for controlling this stuff).
> >>>
> >>> Add a helper to switch this.
> >>>
> >>> struct task_struct's separate CLOSID and RMID fields are insufficient
> >>> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
> >>> and PMG (sort of like the RMID) separately. On x86, the rmid is an
> >>> independent number, so a race that writes a mismatched  closid and rmid
> >>> into hardware is benign. On arm64, the pmg bits extend the partid.
> >>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
> >>> In this case, mismatching the values will 'dirty' a pmg value that
> >>> resctrl believes is clean, and is not tracking with its 'limbo' code.
> >>>
> >>> To avoid this, the partid and pmg are always read and written as a pair.
> >>> Instead of making struct task_struct's closid and rmid fields an
> >>> endian-unsafe union, add the value to struct thread_info and always use
> >>> READ_ONCE()/WRITE_ONCE() when accessing this field.
> >>>
> >>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> >>> values when scheduling a task in the default control-group, which has
> >>> PARTID 0. The way 'code data prioritisation' gets emulated means the
> >>> register value for the default group needs to be a variable.
> >>>
> >>> The current system register value is kept in a per-cpu variable to
> >>> avoid writing to the system register if the value isn't going to change.
> >>> Writes to this register may reset the hardware state for regulating
> >>> bandwidth.
> >>>
> >>> Finally, there is no reason to context switch these registers unless
> >>> there is a driver changing the values in struct task_struct. Hide
> >>> the whole thing behind a static key. This also allows the driver to
> >>> disable MPAM in response to errors reported by hardware. Move the
> >>> existing static key to belong to the arch code, as in the future
> >>> the MPAM driver may become a loadable module.
> >>>
> >>> All this should depend on whether there is an MPAM driver, hide
> >>> it behind CONFIG_MPAM.
> >>>
> >>> CC: Amit Singh Tomar <amitsinght@marvell.com>
> >>> Signed-off-by: James Morse <james.morse@arm.com>  
> >>  
> >>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> >>> new file mode 100644
> >>> index 000000000000..86a55176f884
> >>> --- /dev/null
> >>> +++ b/arch/arm64/include/asm/mpam.h
> >>> @@ -0,0 +1,74 @@  
> >> ...
> >>  
> >>> +/*
> >>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> >>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> >>> + * or new values are used. Particular care should be taken with the pmg field
> >>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> >>> + * this value to be stored with cache allocations, despite being considered
> >>> + * 'free' by resctrl.
> >>> + *
> >>> + * A value in struct thread_info is used instead of struct task_struct as the
> >>> + * cpu's u64 register format is used, but struct task_struct has two u32'.  
> >>
> >> This comment probably wants to provide a little more info if it is to be useful,
> >>
> >> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
> >> I'm not immediately understanding why that matters given you could slap
> >> a union on it without (I think) resulting in anything else moving.  
> > 
> > Yes, the fields referred to are those closid and rmid. As James writes
> > in the commit message a union is an alternative, but it would be endian
> > unsafe. Unlikely to matter but lets not break things.  
> 
> Meant to say... I'll add clarification in this vein to the comment.

Goes to show I didn't read the patch description.  Oops.

I'm probably just being slow today, but why would it be endian unsafe?
I didn't think the alternative would be to assume the two uses of the storage
were valid at the same time but rather just to reuse the space (which would
have 64bit alignment anyway).  For that matter we could just put a u64 in
under a separate ifdef CONFIG_...

Obviously if the code made use of the access to closid / rmid for arm64
systems it would be a problem.

Anyway just expanding on the comment a bit should do the job with no
need for any other changes.

Thanks,

Jonathan


> 
> > 
> > I'm replying for James as he is otherwise engaged. Thanks for the review
> > of this series and all your review on the previous MPAM series.
> >   
> >>
> >> Now having it in thread_info moves it into arch header territory so
> >> might make sense for that reason.
> >>  
> >>> + */
> >>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> >>> +{
> >>> +#ifdef CONFIG_ARM64_MPAM
> >>> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> >>> +#else
> >>> +	return 0;
> >>> +#endif
> >>> +}  
> >>  
> > 
> > Thanks,
> > 
> > Ben
> >   
>
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 1 month, 3 weeks ago

On 12/18/25 15:38, Jonathan Cameron wrote:
> On Thu, 18 Dec 2025 14:55:23 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> On 12/18/25 14:52, Ben Horgan wrote:
>>> Hi Jonathan,
>>>
>>> On 12/18/25 10:35, Jonathan Cameron wrote:  
>>>> On Fri, 5 Dec 2025 21:58:24 +0000
>>>> James Morse <james.morse@arm.com> wrote:
>>>>  
>>>>> MPAM allows traffic in the SoC to be labeled by the OS, these labels
>>>>> are used to apply policy in caches and bandwidth regulators, and to
>>>>> monitor traffic in the SoC. The label is made up of a PARTID and PMG
>>>>> value. The x86 equivalent calls these CLOSID and RMID, but they don't
>>>>> map precisely.
>>>>>
>>>>> MPAM has two CPU system registers that is used to hold the PARTID and PMG
>>>>> values that traffic generated at each exception level will use. These can be
>>>>> set per-task by the resctrl file system. (resctrl is the defacto interface
>>>>> for controlling this stuff).
>>>>>
>>>>> Add a helper to switch this.
>>>>>
>>>>> struct task_struct's separate CLOSID and RMID fields are insufficient
>>>>> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
>>>>> and PMG (sort of like the RMID) separately. On x86, the rmid is an
>>>>> independent number, so a race that writes a mismatched  closid and rmid
>>>>> into hardware is benign. On arm64, the pmg bits extend the partid.
>>>>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
>>>>> In this case, mismatching the values will 'dirty' a pmg value that
>>>>> resctrl believes is clean, and is not tracking with its 'limbo' code.
>>>>>
>>>>> To avoid this, the partid and pmg are always read and written as a pair.
>>>>> Instead of making struct task_struct's closid and rmid fields an
>>>>> endian-unsafe union, add the value to struct thread_info and always use
>>>>> READ_ONCE()/WRITE_ONCE() when accessing this field.
>>>>>
>>>>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>>>>> values when scheduling a task in the default control-group, which has
>>>>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>>>>> register value for the default group needs to be a variable.
>>>>>
>>>>> The current system register value is kept in a per-cpu variable to
>>>>> avoid writing to the system register if the value isn't going to change.
>>>>> Writes to this register may reset the hardware state for regulating
>>>>> bandwidth.
>>>>>
>>>>> Finally, there is no reason to context switch these registers unless
>>>>> there is a driver changing the values in struct task_struct. Hide
>>>>> the whole thing behind a static key. This also allows the driver to
>>>>> disable MPAM in response to errors reported by hardware. Move the
>>>>> existing static key to belong to the arch code, as in the future
>>>>> the MPAM driver may become a loadable module.
>>>>>
>>>>> All this should depend on whether there is an MPAM driver, hide
>>>>> it behind CONFIG_MPAM.
>>>>>
>>>>> CC: Amit Singh Tomar <amitsinght@marvell.com>
>>>>> Signed-off-by: James Morse <james.morse@arm.com>  
>>>>  
>>>>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>>>>> new file mode 100644
>>>>> index 000000000000..86a55176f884
>>>>> --- /dev/null
>>>>> +++ b/arch/arm64/include/asm/mpam.h
>>>>> @@ -0,0 +1,74 @@  
>>>> ...
>>>>  
>>>>> +/*
>>>>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>>>>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>>>>> + * or new values are used. Particular care should be taken with the pmg field
>>>>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>>>>> + * this value to be stored with cache allocations, despite being considered
>>>>> + * 'free' by resctrl.
>>>>> + *
>>>>> + * A value in struct thread_info is used instead of struct task_struct as the
>>>>> + * cpu's u64 register format is used, but struct task_struct has two u32'.  
>>>>
>>>> This comment probably wants to provide a little more info if it is to be useful,
>>>>
>>>> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
>>>> I'm not immediately understanding why that matters given you could slap
>>>> a union on it without (I think) resulting in anything else moving.  
>>>
>>> Yes, the fields referred to are those closid and rmid. As James writes
>>> in the commit message a union is an alternative, but it would be endian
>>> unsafe. Unlikely to matter but lets not break things.  
>>
>> Meant to say... I'll add clarification in this vein to the comment.
> 
> Goes to show I didn't read the patch description.  Oops.
> 
> I'm probably just being slow today, but why would it be endian unsafe?
> I didn't think the alternative would be to assume the two uses of the storage
> were valid at the same time but rather just to reuse the space (which would
> have 64bit alignment anyway).  For that matter we could just put a u64 in
> under a separate ifdef CONFIG_...
> 
> Obviously if the code made use of the access to closid / rmid for arm64
> systems it would be a problem.

Yes, I think it would only be unsafe if closid / rmid were accessed, but
if they're not, just reusing the spot is cleaner. I assume James' point
is that as we can't use what we've already got it's ok just to do
something new.

> 
> Anyway just expanding on the comment a bit should do the job with no
> need for any other changes.
> 
> Thanks,
> 
> Jonathan
> 
> 
>>
>>>
>>> I'm replying for James as he is otherwise engaged. Thanks for the review
>>> of this series and all your review on the previous MPAM series.
>>>   
>>>>
>>>> Now having it in thread_info moves it into arch header territory so
>>>> might make sense for that reason.
>>>>  
>>>>> + */
>>>>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>>>>> +{
>>>>> +#ifdef CONFIG_ARM64_MPAM
>>>>> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>>>>> +#else
>>>>> +	return 0;
>>>>> +#endif
>>>>> +}  
>>>>  
>>>
>>> Thanks,
>>>
>>> Ben
>>>   
>>
> 

Thanks,

Ben
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 1 month, 4 weeks ago
Hi James,

On 12/5/25 21:58, James Morse wrote:
> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
> 
> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> values that traffic generated at each exception level will use. These can be
> set per-task by the resctrl file system. (resctrl is the defacto interface
> for controlling this stuff).
> 
> Add a helper to switch this.
> 
> struct task_struct's separate CLOSID and RMID fields are insufficient
> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
> and PMG (sort of like the RMID) separately. On x86, the rmid is an
> independent number, so a race that writes a mismatched  closid and rmid
> into hardware is benign. On arm64, the pmg bits extend the partid.
> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
> In this case, mismatching the values will 'dirty' a pmg value that
> resctrl believes is clean, and is not tracking with its 'limbo' code.
> 
> To avoid this, the partid and pmg are always read and written as a pair.
> Instead of making struct task_struct's closid and rmid fields an
> endian-unsafe union, add the value to struct thread_info and always use
> READ_ONCE()/WRITE_ONCE() when accessing this field.
> 
> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> values when scheduling a task in the default control-group, which has
> PARTID 0. The way 'code data prioritisation' gets emulated means the
> register value for the default group needs to be a variable.
> 
> The current system register value is kept in a per-cpu variable to
> avoid writing to the system register if the value isn't going to change.
> Writes to this register may reset the hardware state for regulating
> bandwidth.
> 
> Finally, there is no reason to context switch these registers unless
> there is a driver changing the values in struct task_struct. Hide
> the whole thing behind a static key. This also allows the driver to
> disable MPAM in response to errors reported by hardware. Move the
> existing static key to belong to the arch code, as in the future
> the MPAM driver may become a loadable module.
> 
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.
> 
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
[...]
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> +	u64 oldregval;
> +	int cpu = smp_processor_id();
> +	u64 regval = mpam_get_regval(tsk);
> +
> +	if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
> +	    !static_branch_likely(&mpam_enabled))
> +		return;
> +
> +	if (regval == READ_ONCE(arm64_mpam_global_default))
> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> +	if (oldregval == regval)
> +		return;
> +
> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
> +	isb();
> +
> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
> +	write_sysreg_s(regval, SYS_MPAM0_EL1);

SYS_MPAMSM_EL1 needs to be written here too to account for accesses
generated by SME loads. Also, when in streaming mode, SVE and FP loads,
stores and SVE prefetches.

SYS_MPAMSM_EL1 should also be considered in initialisation code too.

> +
> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
Thanks,

Ben
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 2 months ago
Hi James,

On 12/5/25 21:58, James Morse wrote:
> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
> 
[...]
> 
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.

CONFIG_MPAM -> CONFIG_ARM64_MPAM

> 
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm64/Kconfig                   |  2 +
>  arch/arm64/include/asm/mpam.h        | 74 ++++++++++++++++++++++++++++
>  arch/arm64/include/asm/thread_info.h |  3 ++
>  arch/arm64/kernel/Makefile           |  1 +
>  arch/arm64/kernel/mpam.c             | 13 +++++
>  arch/arm64/kernel/process.c          |  7 +++
>  drivers/resctrl/mpam_devices.c       |  2 -
>  drivers/resctrl/mpam_internal.h      |  2 +
>  8 files changed, 102 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/include/asm/mpam.h
>  create mode 100644 arch/arm64/kernel/mpam.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 004d58cfbff8..558baa9e7c08 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2048,6 +2048,8 @@ config ARM64_MPAM
>  
>  	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
>  
> +	  This option enables the extra context switch code.
> +
>  endmenu # "ARMv8.4 architectural features"
>  
>  menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..86a55176f884
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/bitops.h>
> +#include <linux/init.h>
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/cpucaps.h>
> +#include <asm/cpufeature.h>
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field
> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> + * this value to be stored with cache allocations, despite being considered
> + * 'free' by resctrl.

__mpam_sched_in() -> mpam_thread_switch()
This is called from resctrl_arch_sched_in().

> + *
> + * A value in struct thread_info is used instead of struct task_struct as the
> + * cpu's u64 register format is used, but struct task_struct has two u32'.
> + */
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +#else
> +	return 0;
> +#endif
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> +	u64 oldregval;
> +	int cpu = smp_processor_id();
> +	u64 regval = mpam_get_regval(tsk);
> +
> +	if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
> +	    !static_branch_likely(&mpam_enabled))
> +		return;
> +
> +	if (regval == READ_ONCE(arm64_mpam_global_default))
> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> +	if (oldregval == regval)
> +		return;
> +
> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
> +	isb();
> +
> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
> +#endif /* __ASM__MPAM_H */
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index f241b8601ebd..c226dabd5019 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -41,6 +41,9 @@ struct thread_info {
>  #ifdef CONFIG_SHADOW_CALL_STACK
>  	void			*scs_base;
>  	void			*scs_sp;
> +#endif
> +#ifdef CONFIG_ARM64_MPAM
> +	u64			mpam_partid_pmg;
>  #end if
>  	u32			cpu;
>  };
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 76f32e424065..15979f366519 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
>  obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
>  obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
>  obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
> +obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
>  obj-$(CONFIG_ARM64_MTE)			+= mte.o
>  obj-y					+= vdso-wrap.o
>  obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> new file mode 100644
> index 000000000000..9866d2ca0faa
> --- /dev/null
> +++ b/arch/arm64/kernel/mpam.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#include <asm/mpam.h>
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +
> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
> +DEFINE_PER_CPU(u64, arm64_mpam_default);
> +DEFINE_PER_CPU(u64, arm64_mpam_current);
> +
> +u64 arm64_mpam_global_default;
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index fba7ca102a8c..b510c0699313 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -51,6 +51,7 @@
>  #include <asm/fpsimd.h>
>  #include <asm/gcs.h>
>  #include <asm/mmu_context.h>
> +#include <asm/mpam.h>
>  #include <asm/mte.h>
>  #include <asm/processor.h>
>  #include <asm/pointer_auth.h>
> @@ -737,6 +738,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
>  	if (prev->thread.sctlr_user != next->thread.sctlr_user)
>  		update_sctlr_el1(next->thread.sctlr_user);
>  
> +	/*
> +	 * MPAM thread switch happens after the DSB to ensure prev's accesses
> +	 * use prev's MPAM settings.
> +	 */
> +	mpam_thread_switch(next);
> +
>  	/* the actual thread switch */
>  	last = cpu_switch_to(prev, next);
>  
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 0b5b158e1aaf..2996ad93fc3e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -29,8 +29,6 @@
>  
>  #include "mpam_internal.h"
>  
> -DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
> -
>  /*
>   * mpam_list_lock protects the SRCU lists when writing. Once the
>   * mpam_enabled key is enabled these lists are read-only,
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index e79c3c47259c..4508a6654fe0 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -17,6 +17,8 @@
>  #include <linux/srcu.h>
>  #include <linux/types.h>
>  
> +#include <asm/mpam.h>
> +
>  #define MPAM_MSC_MAX_NUM_RIS	16
>  
>  struct platform_device;

The DECLARE_STATIC_KEY_FALSE(mpam_enabled) on the next line can now be
removed.

Thanks,

Ben
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Fenghua Yu 2 months ago
Hi, James,

On 12/5/25 13:58, James Morse wrote:
> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
> 
> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> values that traffic generated at each exception level will use. These can be
> set per-task by the resctrl file system. (resctrl is the defacto interface
> for controlling this stuff).
> 
> Add a helper to switch this.
> 
> struct task_struct's separate CLOSID and RMID fields are insufficient
> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
> and PMG (sort of like the RMID) separately. On x86, the rmid is an
> independent number, so a race that writes a mismatched  closid and rmid
> into hardware is benign. On arm64, the pmg bits extend the partid.
> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
> In this case, mismatching the values will 'dirty' a pmg value that
> resctrl believes is clean, and is not tracking with its 'limbo' code.
> 
> To avoid this, the partid and pmg are always read and written as a pair.
> Instead of making struct task_struct's closid and rmid fields an
> endian-unsafe union, add the value to struct thread_info and always use
> READ_ONCE()/WRITE_ONCE() when accessing this field.
> 
> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> values when scheduling a task in the default control-group, which has
> PARTID 0. The way 'code data prioritisation' gets emulated means the
> register value for the default group needs to be a variable.
> 
> The current system register value is kept in a per-cpu variable to
> avoid writing to the system register if the value isn't going to change.
> Writes to this register may reset the hardware state for regulating
> bandwidth.
> 
> Finally, there is no reason to context switch these registers unless
> there is a driver changing the values in struct task_struct. Hide
> the whole thing behind a static key. This also allows the driver to
> disable MPAM in response to errors reported by hardware. Move the
> existing static key to belong to the arch code, as in the future
> the MPAM driver may become a loadable module.
> 
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.
> 
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/arm64/Kconfig                   |  2 +
>   arch/arm64/include/asm/mpam.h        | 74 ++++++++++++++++++++++++++++
>   arch/arm64/include/asm/thread_info.h |  3 ++
>   arch/arm64/kernel/Makefile           |  1 +
>   arch/arm64/kernel/mpam.c             | 13 +++++
>   arch/arm64/kernel/process.c          |  7 +++
>   drivers/resctrl/mpam_devices.c       |  2 -
>   drivers/resctrl/mpam_internal.h      |  2 +
>   8 files changed, 102 insertions(+), 2 deletions(-)
>   create mode 100644 arch/arm64/include/asm/mpam.h
>   create mode 100644 arch/arm64/kernel/mpam.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 004d58cfbff8..558baa9e7c08 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2048,6 +2048,8 @@ config ARM64_MPAM
>   
>   	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
>   
> +	  This option enables the extra context switch code.
> +
>   endmenu # "ARMv8.4 architectural features"
>   
>   menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..86a55176f884
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/bitops.h>
> +#include <linux/init.h>
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/cpucaps.h>
> +#include <asm/cpufeature.h>
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field
> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> + * this value to be stored with cache allocations, despite being considered
> + * 'free' by resctrl.
> + *
> + * A value in struct thread_info is used instead of struct task_struct as the
> + * cpu's u64 register format is used, but struct task_struct has two u32'.
> + */
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +#else
> +	return 0;
> +#endif
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> +	u64 oldregval;
> +	int cpu = smp_processor_id();
> +	u64 regval = mpam_get_regval(tsk);
> +
> +	if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
> +	    !static_branch_likely(&mpam_enabled))
> +		return;
> +
> +	if (regval == READ_ONCE(arm64_mpam_global_default))
Why is this check needed? We need to read arm64_mpam_default in any 
case, right?> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> +	if (oldregval == regval)
> +		return;
> +
> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
> +	isb();
> +
> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
> +#endif /* __ASM__MPAM_H */

[SNIP]
Thanks.

-Fenghua
Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 2 months ago
Hi Fenghua,

On 12/5/25 23:53, Fenghua Yu wrote:
> Hi, James,
> 
> On 12/5/25 13:58, James Morse wrote:
[...]
>>
>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>> values when scheduling a task in the default control-group, which has
>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>> register value for the default group needs to be a variable.
>>
[...]
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/
>> mpam.h
>> new file mode 100644
>> index 000000000000..86a55176f884
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mpam.h
[...]
>> +/*
>> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's
>> default group.
>> + * This is used by the context switch code to use the resctrl CPU
>> property
>> + * instead. The value is modified when CDP is enabled/disabled by
>> mounting
>> + * the resctrl filesystem.
>> + */
>> +extern u64 arm64_mpam_global_default;
>> +
>> +/*
>> + * The resctrl filesystem writes to the partid/pmg values for threads
>> and CPUs,
>> + * which may race with reads in __mpam_sched_in(). Ensure only one of
>> the old
>> + * or new values are used. Particular care should be taken with the
>> pmg field
>> + * as __mpam_sched_in() may read a partid and pmg that don't match,
>> causing
>> + * this value to be stored with cache allocations, despite being
>> considered
>> + * 'free' by resctrl.
>> + *
>> + * A value in struct thread_info is used instead of struct
>> task_struct as the
>> + * cpu's u64 register format is used, but struct task_struct has two
>> u32'.
>> + */
>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> +    return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> +#else
>> +    return 0;
>> +#endif
>> +}
>> +
>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>> +{
>> +    u64 oldregval;
>> +    int cpu = smp_processor_id();
>> +    u64 regval = mpam_get_regval(tsk);
>> +
>> +    if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
>> +        !static_branch_likely(&mpam_enabled))
>> +        return;
>> +
>> +    if (regval == READ_ONCE(arm64_mpam_global_default))
> Why is this check needed? We need to read arm64_mpam_default in any
> case, right?

If a task is in the default resctrl group then the per-cpu resctrl
configuration determines the mpam configuration. The way this is dealt
with here is that for a task in the default group mpam_partid_pmg is set
to arm64_mpam_global_default. When mpam_partid_pmg is
arm64_mpam_global_default we know the task is in the default group,
otherwise it would have a different partid, and so we consider the
per-cpu configuration, arm64_mpam_default. When the task is not in the
default resctrl group then we can just consider the per-task configuration.

As mentioned in the commit message, this is complicated by code data
prioritisation (cdp) and the value of arm64_mpam_global_default, depends
on whether this is enabled or not. See resctrl_arch_set_cdp_enabled()
added in a later patch. I'm not sure of a way to make this clearer in
the code while keeping the arch and drivers/resctrl patches separate.

> +        regval = READ_ONCE(per_cpu(arm64_mpam_default,
> cpu));
>> +
>> +    oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> +    if (oldregval == regval)
>> +        return;
>> +
>> +    write_sysreg_s(regval, SYS_MPAM1_EL1);
>> +    isb();
>> +
>> +    /* Synchronising the EL0 write is left until the ERET to EL0 */
>> +    write_sysreg_s(regval, SYS_MPAM0_EL1);
>> +
>> +    WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>> +}
>> +#endif /* __ASM__MPAM_H */
> 
> [SNIP]
> Thanks.
> 
> -Fenghua
> 
Thanks,

Ben