[PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers

Ben Horgan posted 47 patches 4 weeks ago
There is a newer version of this series
[PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 4 weeks ago
From: James Morse <james.morse@arm.com>

MPAM allows traffic in the SoC to be labeled by the OS, these labels are
used to apply policy in caches and bandwidth regulators, and to monitor
traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
equivalent calls these CLOSID and RMID, but they don't map precisely.

MPAM has two CPU system registers that is used to hold the PARTID and PMG
values that traffic generated at each exception level will use. These can
be set per-task by the resctrl file system. (resctrl is the defacto
interface for controlling this stuff).

Add a helper to switch this.

struct task_struct's separate CLOSID and RMID fields are insufficient to
implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
PMG (sort of like the RMID) separately. On x86, the rmid is an independent
number, so a race that writes a mismatched closid and rmid into hardware is
benign. On arm64, the pmg bits extend the partid.
(i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).  In
this case, mismatching the values will 'dirty' a pmg value that resctrl
believes is clean, and is not tracking with its 'limbo' code.

To avoid this, the partid and pmg are always read and written as a
pair. This requires a new u64 field. In struct task_struct there are two
u32, rmid and closid for the x86 case, but as we can't use them here do
something else. Add this new field, mpam_partid_pmg, to struct thread_info
to avoid adding more architecture specific code to struct task_struct.
Always use READ_ONCE()/WRITE_ONCE() when accessing this field.

Resctrl allows a per-cpu 'default' value to be set, this overrides the
values when scheduling a task in the default control-group, which has
PARTID 0. The way 'code data prioritisation' gets emulated means the
register value for the default group needs to be a variable.

The current system register value is kept in a per-cpu variable to avoid
writing to the system register if the value isn't going to change.  Writes
to this register may reset the hardware state for regulating bandwidth.

Finally, there is no reason to context switch these registers unless there
is a driver changing the values in struct task_struct. Hide the whole thing
behind a static key. This also allows the driver to disable MPAM in
response to errors reported by hardware. Move the existing static key to
belong to the arch code, as in the future the MPAM driver may become a
loadable module.

All this should depend on whether there is an MPAM driver, hide it behind
CONFIG_ARM64_MPAM.

CC: Amit Singh Tomar <amitsinght@marvell.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
Remove extra DECLARE_STATIC_KEY_FALSE
Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
Remove unused headers
Expand comment (Jonathan)

Changes since v2:
Tidy up ifdefs
---
 arch/arm64/Kconfig                   |  2 +
 arch/arm64/include/asm/mpam.h        | 67 ++++++++++++++++++++++++++++
 arch/arm64/include/asm/thread_info.h |  3 ++
 arch/arm64/kernel/Makefile           |  1 +
 arch/arm64/kernel/mpam.c             | 13 ++++++
 arch/arm64/kernel/process.c          |  7 +++
 drivers/resctrl/mpam_devices.c       |  2 -
 drivers/resctrl/mpam_internal.h      |  4 +-
 8 files changed, 95 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mpam.h
 create mode 100644 arch/arm64/kernel/mpam.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 93173f0a09c7..cdcc5b76a110 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2049,6 +2049,8 @@ config ARM64_MPAM
 
 	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
 
+	  This option enables the extra context switch code.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
new file mode 100644
index 000000000000..14011e5970ce
--- /dev/null
+++ b/arch/arm64/include/asm/mpam.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __ASM__MPAM_H
+#define __ASM__MPAM_H
+
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+
+#include <asm/sysreg.h>
+
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+DECLARE_PER_CPU(u64, arm64_mpam_default);
+DECLARE_PER_CPU(u64, arm64_mpam_current);
+
+/*
+ * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
+ * This is used by the context switch code to use the resctrl CPU property
+ * instead. The value is modified when CDP is enabled/disabled by mounting
+ * the resctrl filesystem.
+ */
+extern u64 arm64_mpam_global_default;
+
+/*
+ * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
+ * which may race with reads in mpam_thread_switch(). Ensure only one of the old
+ * or new values are used. Particular care should be taken with the pmg field as
+ * mpam_thread_switch() may read a partid and pmg that don't match, causing this
+ * value to be stored with cache allocations, despite being considered 'free' by
+ * resctrl.
+ */
+#ifdef CONFIG_ARM64_MPAM
+static inline u64 mpam_get_regval(struct task_struct *tsk)
+{
+	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
+}
+
+static inline void mpam_thread_switch(struct task_struct *tsk)
+{
+	u64 oldregval;
+	int cpu = smp_processor_id();
+	u64 regval = mpam_get_regval(tsk);
+
+	if (!static_branch_likely(&mpam_enabled))
+		return;
+
+	if (regval == READ_ONCE(arm64_mpam_global_default))
+		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
+
+	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+	if (oldregval == regval)
+		return;
+
+	write_sysreg_s(regval, SYS_MPAM1_EL1);
+	isb();
+
+	/* Synchronising the EL0 write is left until the ERET to EL0 */
+	write_sysreg_s(regval, SYS_MPAM0_EL1);
+
+	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
+}
+#else
+static inline void mpam_thread_switch(struct task_struct *tsk) {}
+#endif /* CONFIG_ARM64_MPAM */
+
+#endif /* __ASM__MPAM_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index a803b887b0b4..fc801a26ff9e 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -41,6 +41,9 @@ struct thread_info {
 #ifdef CONFIG_SHADOW_CALL_STACK
 	void			*scs_base;
 	void			*scs_sp;
+#endif
+#ifdef CONFIG_ARM64_MPAM
+	u64			mpam_partid_pmg;
 #endif
 	u32			cpu;
 };
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 76f32e424065..15979f366519 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
 obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
+obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
new file mode 100644
index 000000000000..9866d2ca0faa
--- /dev/null
+++ b/arch/arm64/kernel/mpam.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Arm Ltd. */
+
+#include <asm/mpam.h>
+
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+
+DEFINE_STATIC_KEY_FALSE(mpam_enabled);
+DEFINE_PER_CPU(u64, arm64_mpam_default);
+DEFINE_PER_CPU(u64, arm64_mpam_current);
+
+u64 arm64_mpam_global_default;
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 489554931231..47698955fa1e 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -51,6 +51,7 @@
 #include <asm/fpsimd.h>
 #include <asm/gcs.h>
 #include <asm/mmu_context.h>
+#include <asm/mpam.h>
 #include <asm/mte.h>
 #include <asm/processor.h>
 #include <asm/pointer_auth.h>
@@ -738,6 +739,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	if (prev->thread.sctlr_user != next->thread.sctlr_user)
 		update_sctlr_el1(next->thread.sctlr_user);
 
+	/*
+	 * MPAM thread switch happens after the DSB to ensure prev's accesses
+	 * use prev's MPAM settings.
+	 */
+	mpam_thread_switch(next);
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b495d5291868..860181266b15 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,8 +29,6 @@
 
 #include "mpam_internal.h"
 
-DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
-
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e8971842b124..4632985bcca6 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -16,12 +16,12 @@
 #include <linux/srcu.h>
 #include <linux/types.h>
 
+#include <asm/mpam.h>
+
 #define MPAM_MSC_MAX_NUM_RIS	16
 
 struct platform_device;
 
-DECLARE_STATIC_KEY_FALSE(mpam_enabled);
-
 #ifdef CONFIG_MPAM_KUNIT_TEST
 #define PACKED_FOR_KUNIT __packed
 #else
-- 
2.43.0
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Catalin Marinas 3 weeks, 3 days ago
On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>  menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..14011e5970ce
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in mpam_thread_switch(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field as
> + * mpam_thread_switch() may read a partid and pmg that don't match, causing this
> + * value to be stored with cache allocations, despite being considered 'free' by
> + * resctrl.
> + */
> +#ifdef CONFIG_ARM64_MPAM
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> +	u64 oldregval;
> +	int cpu = smp_processor_id();
> +	u64 regval = mpam_get_regval(tsk);
> +
> +	if (!static_branch_likely(&mpam_enabled))
> +		return;
> +
> +	if (regval == READ_ONCE(arm64_mpam_global_default))
> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> +	if (oldregval == regval)
> +		return;
> +
> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
> +	isb();
> +
> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
> +	write_sysreg_s(regval, SYS_MPAM0_EL1);

Since we have an isb() already, does it make any difference if we write
MPAM0 before the barrier? Similar question for other places where we
write these two registers.

At some point, we should go through __switch_to() and coalesce the isbs
into fewer as we keep accumulating them (e.g. all those switch function
setting some sync variable if needed).

> +
> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);

Is it too expensive to read the MPAM sysregs and avoid carrying around
another per-CPU state? You use it for pm restoring but we could just
save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
feels like this function got unnecessarily complicated (it took me a bit
to figure out what it all does).

A related question - is resctrl_arch_set_cdp_enabled() always called in
non-preemptible contexts? We potentially have a race between setting
current->mpam_partid_msg and arm64_mpam_global_default, so the check in
mpam_thread_switch() can get confused.

And I couldn't figure out where the MPAMx_EL1 registers are written. If
any global/per-cpu/per-task value is changed, does the kernel wait until
the next thread switch to write the sysreg? The only places I can found
touching these sysregs are the thread switch, pm notifiers and KVM.

-- 
Catalin
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 3 weeks ago
Hi Catalin,

On 1/15/26 17:58, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>>  menu "ARMv8.5 architectural features"
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> new file mode 100644
>> index 000000000000..14011e5970ce
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -0,0 +1,67 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright (C) 2025 Arm Ltd. */
>> +
>> +#ifndef __ASM__MPAM_H
>> +#define __ASM__MPAM_H
>> +
>> +#include <linux/jump_label.h>
>> +#include <linux/percpu.h>
>> +#include <linux/sched.h>
>> +
>> +#include <asm/sysreg.h>
>> +
>> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
>> +DECLARE_PER_CPU(u64, arm64_mpam_default);
>> +DECLARE_PER_CPU(u64, arm64_mpam_current);
>> +
>> +/*
>> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
>> + * This is used by the context switch code to use the resctrl CPU property
>> + * instead. The value is modified when CDP is enabled/disabled by mounting
>> + * the resctrl filesystem.
>> + */
>> +extern u64 arm64_mpam_global_default;
>> +
>> +/*
>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>> + * which may race with reads in mpam_thread_switch(). Ensure only one of the old
>> + * or new values are used. Particular care should be taken with the pmg field as
>> + * mpam_thread_switch() may read a partid and pmg that don't match, causing this
>> + * value to be stored with cache allocations, despite being considered 'free' by
>> + * resctrl.
>> + */
>> +#ifdef CONFIG_ARM64_MPAM
>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>> +{
>> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> +}
>> +
>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>> +{
>> +	u64 oldregval;
>> +	int cpu = smp_processor_id();
>> +	u64 regval = mpam_get_regval(tsk);
>> +
>> +	if (!static_branch_likely(&mpam_enabled))
>> +		return;
>> +
>> +	if (regval == READ_ONCE(arm64_mpam_global_default))
>> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>> +
>> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> +	if (oldregval == regval)
>> +		return;
>> +
>> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
>> +	isb();
>> +
>> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
>> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
> 
> Since we have an isb() already, does it make any difference if we write
> MPAM0 before the barrier? Similar question for other places where we
> write these two registers.

The reason for the isb() placement is to document that it's not required
for the MPAM0_EL1. All instructions running at EL1take their MPAM
configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
about in a different thread.

> 
> At some point, we should go through __switch_to() and coalesce the isbs
> into fewer as we keep accumulating them (e.g. all those switch function
> setting some sync variable if needed).
> 
>> +
>> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> 
> Is it too expensive to read the MPAM sysregs and avoid carrying around
> another per-CPU state? You use it for pm restoring but we could just
> save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
> feels like this function got unnecessarily complicated (it took me a bit
> to figure out what it all does).

It's done this way as it matches what's done in x86. I expect it would
be cheap enough to read the register to check whether a write is required.

> 
> A related question - is resctrl_arch_set_cdp_enabled() always called in
> non-preemptible contexts? We potentially have a race between setting
> current->mpam_partid_msg and arm64_mpam_global_default, so the check in
> mpam_thread_switch() can get confused.

The resctrl filesystem can only ever get mounted once and
resctrl_arch_set_cdp_enabled() is always called in mount. In
rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
This is done while holding the rdtgroup_mutex and the user can't change
the mpam configuration from the default until the mutex is released and
it can claim it.

> 
> And I couldn't figure out where the MPAMx_EL1 registers are written. If
> any global/per-cpu/per-task value is changed, does the kernel wait until
> the next thread switch to write the sysreg? The only places I can found
> touching these sysregs are the thread switch, pm notifiers and KVM.
> 

If the task configuration is changed then the MPAM sysreg will only be
updated the next time it goes into the kernel. So, just eventually
consistent. For cpu configuration, update_closid_rmid() gets called
which ipis the cpus and ends up calling mpam_thread_switch() from
resctrl_arch_sched_in().

Thanks,

Ben
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Catalin Marinas 2 weeks, 3 days ago
Hi Ben,

On Mon, Jan 19, 2026 at 12:23:13PM +0000, Ben Horgan wrote:
> On 1/15/26 17:58, Catalin Marinas wrote:
> > On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
> >> +static inline void mpam_thread_switch(struct task_struct *tsk)
> >> +{
> >> +	u64 oldregval;
> >> +	int cpu = smp_processor_id();
> >> +	u64 regval = mpam_get_regval(tsk);
> >> +
> >> +	if (!static_branch_likely(&mpam_enabled))
> >> +		return;
> >> +
> >> +	if (regval == READ_ONCE(arm64_mpam_global_default))
> >> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> >> +
> >> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> >> +	if (oldregval == regval)
> >> +		return;
> >> +
> >> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
> >> +	isb();
> >> +
> >> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
> >> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
> > 
> > Since we have an isb() already, does it make any difference if we write
> > MPAM0 before the barrier? Similar question for other places where we
> > write these two registers.
> 
> The reason for the isb() placement is to document that it's not required
> for the MPAM0_EL1. All instructions running at EL1take their MPAM
> configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
> about in a different thread.

It's fine to keep it this way if LDTR/STTR are not affected by the MPAM0
register.

> >> +
> >> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> > 
> > Is it too expensive to read the MPAM sysregs and avoid carrying around
> > another per-CPU state? You use it for pm restoring but we could just
> > save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
> > feels like this function got unnecessarily complicated (it took me a bit
> > to figure out what it all does).
> 
> It's done this way as it matches what's done in x86. I expect it would
> be cheap enough to read the register to check whether a write is required.

Since you use it for CPU suspend/resume, I guess it's fine, it saves us
from having to preserve it in the low-level asm sleep code. I don't have
a strong preference either way.

> > A related question - is resctrl_arch_set_cdp_enabled() always called in
> > non-preemptible contexts? We potentially have a race between setting
> > current->mpam_partid_msg and arm64_mpam_global_default, so the check in
> > mpam_thread_switch() can get confused.
> 
> The resctrl filesystem can only ever get mounted once and
> resctrl_arch_set_cdp_enabled() is always called in mount. In
> rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
> This is done while holding the rdtgroup_mutex and the user can't change
> the mpam configuration from the default until the mutex is released and
> it can claim it.

What if resctrl_arch_set_cdp_enabled() gets preempted between updating
current task partid and setting arm64_mpam_global_default? The mutex
doesn't help.

> > And I couldn't figure out where the MPAMx_EL1 registers are written. If
> > any global/per-cpu/per-task value is changed, does the kernel wait until
> > the next thread switch to write the sysreg? The only places I can found
> > touching these sysregs are the thread switch, pm notifiers and KVM.
> 
> If the task configuration is changed then the MPAM sysreg will only be
> updated the next time it goes into the kernel.

Is it updated when it goes into the kernel or when we have a context
switch? If you have a long-running user thread that is never scheduled
out, it may not notice the update even if it entered the kernel (but no
context switch).

> So, just eventually
> consistent. For cpu configuration, update_closid_rmid() gets called
> which ipis the cpus and ends up calling mpam_thread_switch() from
> resctrl_arch_sched_in().

I see, it should be fine.

-- 
Catalin
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 2 weeks ago
Hi Catalin,

On 1/23/26 14:29, Catalin Marinas wrote:
> Hi Ben,
> 
> On Mon, Jan 19, 2026 at 12:23:13PM +0000, Ben Horgan wrote:
>> On 1/15/26 17:58, Catalin Marinas wrote:
>>> On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>>>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>>>> +{
>>>> +	u64 oldregval;
>>>> +	int cpu = smp_processor_id();
>>>> +	u64 regval = mpam_get_regval(tsk);
>>>> +
>>>> +	if (!static_branch_likely(&mpam_enabled))
>>>> +		return;
>>>> +
>>>> +	if (regval == READ_ONCE(arm64_mpam_global_default))
>>>> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>>>> +
>>>> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>>>> +	if (oldregval == regval)
>>>> +		return;
>>>> +
>>>> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
>>>> +	isb();
>>>> +
>>>> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
>>>> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
>>>
>>> Since we have an isb() already, does it make any difference if we write
>>> MPAM0 before the barrier? Similar question for other places where we
>>> write these two registers.
>>
>> The reason for the isb() placement is to document that it's not required
>> for the MPAM0_EL1. All instructions running at EL1take their MPAM
>> configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
>> about in a different thread.
> 
> It's fine to keep it this way if LDTR/STTR are not affected by the MPAM0
> register.
> 
>>>> +
>>>> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>>>
>>> Is it too expensive to read the MPAM sysregs and avoid carrying around
>>> another per-CPU state? You use it for pm restoring but we could just
>>> save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
>>> feels like this function got unnecessarily complicated (it took me a bit
>>> to figure out what it all does).
>>
>> It's done this way as it matches what's done in x86. I expect it would
>> be cheap enough to read the register to check whether a write is required.
> 
> Since you use it for CPU suspend/resume, I guess it's fine, it saves us
> from having to preserve it in the low-level asm sleep code. I don't have
> a strong preference either way.
> 
>>> A related question - is resctrl_arch_set_cdp_enabled() always called in
>>> non-preemptible contexts? We potentially have a race between setting
>>> current->mpam_partid_msg and arm64_mpam_global_default, so the check in
>>> mpam_thread_switch() can get confused.
>>
>> The resctrl filesystem can only ever get mounted once and
>> resctrl_arch_set_cdp_enabled() is always called in mount. In
>> rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
>> This is done while holding the rdtgroup_mutex and the user can't change
>> the mpam configuration from the default until the mutex is released and
>> it can claim it.
> 
> What if resctrl_arch_set_cdp_enabled() gets preempted between updating
> current task partid and setting arm64_mpam_global_default? The mutex
> doesn't help.

I see, I had misinterpreted your question.

When looking into this I spotted that the per-cpu default,
arm64_mpam_default, is not updated on the cdp enable and so the default
tasks carry on running with the non-cdp default pmg/partid configuration.

On the race itself, if current->mpam_partid_pmg is set but not
arm64_mpam_global_default then if the current task is preempted at that
point there is a discrepancy. When the task gets context switched back
in then the comparison in mpam_thread_switch() will be false when true
would be expected.

In order to update arm64_mpam_default and make sure the mpam variables
and registers for the online cpus are in the right state by the end of
resctrl_arch_set_cdp_enabled() I'll add a per cpu call to
resctrl_arch_sync_cpu_closid_rmid(). I'll also add something in the
resume path so that arm64_mpam_default gets set correctly for the cpus
that were offline.

> 
>>> And I couldn't figure out where the MPAMx_EL1 registers are written. If
>>> any global/per-cpu/per-task value is changed, does the kernel wait until
>>> the next thread switch to write the sysreg? The only places I can found
>>> touching these sysregs are the thread switch, pm notifiers and KVM.
>>
>> If the task configuration is changed then the MPAM sysreg will only be
>> updated the next time it goes into the kernel.
> 
> Is it updated when it goes into the kernel or when we have a context
> switch? If you have a long-running user thread that is never scheduled
> out, it may not notice the update even if it entered the kernel (but no
> context switch).

Yes, only on context switch.

> 
>> So, just eventually
>> consistent. For cpu configuration, update_closid_rmid() gets called
>> which ipis the cpus and ends up calling mpam_thread_switch() from
>> resctrl_arch_sched_in().
> 
> I see, it should be fine.
> 

Thanks,

Ben
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 2 weeks ago

On 1/26/26 14:30, Ben Horgan wrote:
> Hi Catalin,
> 
> On 1/23/26 14:29, Catalin Marinas wrote:
>> Hi Ben,
>>
>> On Mon, Jan 19, 2026 at 12:23:13PM +0000, Ben Horgan wrote:
>>> On 1/15/26 17:58, Catalin Marinas wrote:
>>>> On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>>>>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>>>>> +{
>>>>> +	u64 oldregval;
>>>>> +	int cpu = smp_processor_id();
>>>>> +	u64 regval = mpam_get_regval(tsk);
>>>>> +
>>>>> +	if (!static_branch_likely(&mpam_enabled))
>>>>> +		return;
>>>>> +
>>>>> +	if (regval == READ_ONCE(arm64_mpam_global_default))
>>>>> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>>>>> +
>>>>> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>>>>> +	if (oldregval == regval)
>>>>> +		return;
>>>>> +
>>>>> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
>>>>> +	isb();
>>>>> +
>>>>> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
>>>>> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
>>>>
>>>> Since we have an isb() already, does it make any difference if we write
>>>> MPAM0 before the barrier? Similar question for other places where we
>>>> write these two registers.
>>>
>>> The reason for the isb() placement is to document that it's not required
>>> for the MPAM0_EL1. All instructions running at EL1take their MPAM
>>> configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
>>> about in a different thread.
>>
>> It's fine to keep it this way if LDTR/STTR are not affected by the MPAM0
>> register.
>>
>>>>> +
>>>>> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>>>>
>>>> Is it too expensive to read the MPAM sysregs and avoid carrying around
>>>> another per-CPU state? You use it for pm restoring but we could just
>>>> save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
>>>> feels like this function got unnecessarily complicated (it took me a bit
>>>> to figure out what it all does).
>>>
>>> It's done this way as it matches what's done in x86. I expect it would
>>> be cheap enough to read the register to check whether a write is required.
>>
>> Since you use it for CPU suspend/resume, I guess it's fine, it saves us
>> from having to preserve it in the low-level asm sleep code. I don't have
>> a strong preference either way.
>>
>>>> A related question - is resctrl_arch_set_cdp_enabled() always called in
>>>> non-preemptible contexts? We potentially have a race between setting
>>>> current->mpam_partid_msg and arm64_mpam_global_default, so the check in
>>>> mpam_thread_switch() can get confused.
>>>
>>> The resctrl filesystem can only ever get mounted once and
>>> resctrl_arch_set_cdp_enabled() is always called in mount. In
>>> rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
>>> This is done while holding the rdtgroup_mutex and the user can't change
>>> the mpam configuration from the default until the mutex is released and
>>> it can claim it.
>>
>> What if resctrl_arch_set_cdp_enabled() gets preempted between updating
>> current task partid and setting arm64_mpam_global_default? The mutex
>> doesn't help.
> 
> I see, I had misinterpreted your question.
> 
> When looking into this I spotted that the per-cpu default,
> arm64_mpam_default, is not updated on the cdp enable and so the default
> tasks carry on running with the non-cdp default pmg/partid configuration.
> 
> On the race itself, if current->mpam_partid_pmg is set but not
> arm64_mpam_global_default then if the current task is preempted at that
> point there is a discrepancy. When the task gets context switched back
> in then the comparison in mpam_thread_switch() will be false when true
> would be expected.
> 
> In order to update arm64_mpam_default and make sure the mpam variables
> and registers for the online cpus are in the right state by the end of
> resctrl_arch_set_cdp_enabled() I'll add a per cpu call to
> resctrl_arch_sync_cpu_closid_rmid(). I'll also add something in the
> resume path so that arm64_mpam_default gets set correctly for the cpus
> that were offline.

Rather than the resume path I'll just set all the arm64_mpam_default in
resctrl_arch_set_cdp_enabled(). In code:

--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -193,6 +193,7 @@ static void resctrl_reset_task_closids(void)
 int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool
enable)
 {
        u32 partid_i = RESCTRL_RESERVED_CLOSID, partid_d =
RESCTRL_RESERVED_CLOSID;
+       int cpu;

        cdp_enabled = enable;

@@ -209,6 +210,10 @@ int resctrl_arch_set_cdp_enabled(enum
resctrl_res_level ignored, bool enable)

        resctrl_reset_task_closids();

+       for_each_possible_cpu(cpu)
+               mpam_set_cpu_defaults(cpu, partid_d, partid_i, 0, 0);
+       on_each_cpu(resctrl_arch_sync_cpu_closid_rmid, NULL, 1);
+
        return 0;
 }

> 
>>
>>>> And I couldn't figure out where the MPAMx_EL1 registers are written. If
>>>> any global/per-cpu/per-task value is changed, does the kernel wait until
>>>> the next thread switch to write the sysreg? The only places I can found
>>>> touching these sysregs are the thread switch, pm notifiers and KVM.
>>>
>>> If the task configuration is changed then the MPAM sysreg will only be
>>> updated the next time it goes into the kernel.
>>
>> Is it updated when it goes into the kernel or when we have a context
>> switch? If you have a long-running user thread that is never scheduled
>> out, it may not notice the update even if it entered the kernel (but no
>> context switch).
> 
> Yes, only on context switch.
> 
>>
>>> So, just eventually
>>> consistent. For cpu configuration, update_closid_rmid() gets called
>>> which ipis the cpus and ends up calling mpam_thread_switch() from
>>> resctrl_arch_sched_in().
>>
>> I see, it should be fine.
>>
> 
> Thanks,
> 
> Ben
> 
> 

Thanks,

Ben
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Gavin Shan 3 weeks, 4 days ago
Hi Ben,

On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> MPAM allows traffic in the SoC to be labeled by the OS, these labels are
> used to apply policy in caches and bandwidth regulators, and to monitor
> traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
> equivalent calls these CLOSID and RMID, but they don't map precisely.
> 
> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> values that traffic generated at each exception level will use. These can
> be set per-task by the resctrl file system. (resctrl is the defacto
> interface for controlling this stuff).
> 
> Add a helper to switch this.
> 
> struct task_struct's separate CLOSID and RMID fields are insufficient to
> implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
> PMG (sort of like the RMID) separately. On x86, the rmid is an independent
> number, so a race that writes a mismatched closid and rmid into hardware is
> benign. On arm64, the pmg bits extend the partid.
> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).  In
> this case, mismatching the values will 'dirty' a pmg value that resctrl
> believes is clean, and is not tracking with its 'limbo' code.
> 
> To avoid this, the partid and pmg are always read and written as a
> pair. This requires a new u64 field. In struct task_struct there are two
> u32, rmid and closid for the x86 case, but as we can't use them here do
> something else. Add this new field, mpam_partid_pmg, to struct thread_info
> to avoid adding more architecture specific code to struct task_struct.
> Always use READ_ONCE()/WRITE_ONCE() when accessing this field.
> 
> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> values when scheduling a task in the default control-group, which has
> PARTID 0. The way 'code data prioritisation' gets emulated means the
> register value for the default group needs to be a variable.
> 
> The current system register value is kept in a per-cpu variable to avoid
> writing to the system register if the value isn't going to change.  Writes
> to this register may reset the hardware state for regulating bandwidth.
> 
> Finally, there is no reason to context switch these registers unless there
> is a driver changing the values in struct task_struct. Hide the whole thing
> behind a static key. This also allows the driver to disable MPAM in
> response to errors reported by hardware. Move the existing static key to
> belong to the arch code, as in the future the MPAM driver may become a
> loadable module.
> 
> All this should depend on whether there is an MPAM driver, hide it behind
> CONFIG_ARM64_MPAM.
> 
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
> Remove extra DECLARE_STATIC_KEY_FALSE
> Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
> Remove unused headers
> Expand comment (Jonathan)
> 
> Changes since v2:
> Tidy up ifdefs
> ---
>   arch/arm64/Kconfig                   |  2 +
>   arch/arm64/include/asm/mpam.h        | 67 ++++++++++++++++++++++++++++
>   arch/arm64/include/asm/thread_info.h |  3 ++
>   arch/arm64/kernel/Makefile           |  1 +
>   arch/arm64/kernel/mpam.c             | 13 ++++++
>   arch/arm64/kernel/process.c          |  7 +++
>   drivers/resctrl/mpam_devices.c       |  2 -
>   drivers/resctrl/mpam_internal.h      |  4 +-
>   8 files changed, 95 insertions(+), 4 deletions(-)
>   create mode 100644 arch/arm64/include/asm/mpam.h
>   create mode 100644 arch/arm64/kernel/mpam.c
> 

With the following nitpick addressed:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 93173f0a09c7..cdcc5b76a110 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2049,6 +2049,8 @@ config ARM64_MPAM
>   
>   	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
>   
> +	  This option enables the extra context switch code.
> +
>   endmenu # "ARMv8.4 architectural features"
>   
>   menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..14011e5970ce
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in mpam_thread_switch(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field as
> + * mpam_thread_switch() may read a partid and pmg that don't match, causing this
> + * value to be stored with cache allocations, despite being considered 'free' by
> + * resctrl.
> + */
> +#ifdef CONFIG_ARM64_MPAM
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +	return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> +	u64 oldregval;
> +	int cpu = smp_processor_id();
> +	u64 regval = mpam_get_regval(tsk);
> +
> +	if (!static_branch_likely(&mpam_enabled))
> +		return;
> +
> +	if (regval == READ_ONCE(arm64_mpam_global_default))
> +		regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> +	oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> +	if (oldregval == regval)
> +		return;
> +
> +	write_sysreg_s(regval, SYS_MPAM1_EL1);
> +	isb();
> +
> +	/* Synchronising the EL0 write is left until the ERET to EL0 */
> +	write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> +	WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
> +#else
> +static inline void mpam_thread_switch(struct task_struct *tsk) {}
> +#endif /* CONFIG_ARM64_MPAM */
> +
> +#endif /* __ASM__MPAM_H */
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index a803b887b0b4..fc801a26ff9e 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -41,6 +41,9 @@ struct thread_info {
>   #ifdef CONFIG_SHADOW_CALL_STACK
>   	void			*scs_base;
>   	void			*scs_sp;
> +#endif
> +#ifdef CONFIG_ARM64_MPAM
> +	u64			mpam_partid_pmg;
>   #endif
>   	u32			cpu;
>   };
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 76f32e424065..15979f366519 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
>   obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
>   obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
>   obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
> +obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
>   obj-$(CONFIG_ARM64_MTE)			+= mte.o
>   obj-y					+= vdso-wrap.o
>   obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> new file mode 100644
> index 000000000000..9866d2ca0faa
> --- /dev/null
> +++ b/arch/arm64/kernel/mpam.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#include <asm/mpam.h>
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +

Nitpick: Needn't include those two header files since they have been included
to <asm/mpam.h>
  
> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
> +DEFINE_PER_CPU(u64, arm64_mpam_default);
> +DEFINE_PER_CPU(u64, arm64_mpam_current);
> +
> +u64 arm64_mpam_global_default;
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 489554931231..47698955fa1e 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -51,6 +51,7 @@
>   #include <asm/fpsimd.h>
>   #include <asm/gcs.h>
>   #include <asm/mmu_context.h>
> +#include <asm/mpam.h>
>   #include <asm/mte.h>
>   #include <asm/processor.h>
>   #include <asm/pointer_auth.h>
> @@ -738,6 +739,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
>   	if (prev->thread.sctlr_user != next->thread.sctlr_user)
>   		update_sctlr_el1(next->thread.sctlr_user);
>   
> +	/*
> +	 * MPAM thread switch happens after the DSB to ensure prev's accesses
> +	 * use prev's MPAM settings.
> +	 */
> +	mpam_thread_switch(next);
> +
>   	/* the actual thread switch */
>   	last = cpu_switch_to(prev, next);
>   
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index b495d5291868..860181266b15 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -29,8 +29,6 @@
>   
>   #include "mpam_internal.h"
>   
> -DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
> -
>   /*
>    * mpam_list_lock protects the SRCU lists when writing. Once the
>    * mpam_enabled key is enabled these lists are read-only,
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index e8971842b124..4632985bcca6 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -16,12 +16,12 @@
>   #include <linux/srcu.h>
>   #include <linux/types.h>
>   
> +#include <asm/mpam.h>
> +
>   #define MPAM_MSC_MAX_NUM_RIS	16
>   
>   struct platform_device;
>   
> -DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> -
>   #ifdef CONFIG_MPAM_KUNIT_TEST
>   #define PACKED_FOR_KUNIT __packed
>   #else

Thanks,
Gavin
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Jonathan Cameron 3 weeks, 4 days ago
On Thu, 15 Jan 2026 14:47:28 +0800
Gavin Shan <gshan@redhat.com> wrote:

> Hi Ben,
> 
> On 1/13/26 12:58 AM, Ben Horgan wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > MPAM allows traffic in the SoC to be labeled by the OS, these labels are
> > used to apply policy in caches and bandwidth regulators, and to monitor
> > traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
> > equivalent calls these CLOSID and RMID, but they don't map precisely.
> > 
> > MPAM has two CPU system registers that is used to hold the PARTID and PMG
> > values that traffic generated at each exception level will use. These can
> > be set per-task by the resctrl file system. (resctrl is the defacto
> > interface for controlling this stuff).
> > 
> > Add a helper to switch this.
> > 
> > struct task_struct's separate CLOSID and RMID fields are insufficient to
> > implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
> > PMG (sort of like the RMID) separately. On x86, the rmid is an independent
> > number, so a race that writes a mismatched closid and rmid into hardware is
> > benign. On arm64, the pmg bits extend the partid.
> > (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).  In
> > this case, mismatching the values will 'dirty' a pmg value that resctrl
> > believes is clean, and is not tracking with its 'limbo' code.
> > 
> > To avoid this, the partid and pmg are always read and written as a
> > pair. This requires a new u64 field. In struct task_struct there are two
> > u32, rmid and closid for the x86 case, but as we can't use them here do
> > something else. Add this new field, mpam_partid_pmg, to struct thread_info
> > to avoid adding more architecture specific code to struct task_struct.
> > Always use READ_ONCE()/WRITE_ONCE() when accessing this field.
> > 
> > Resctrl allows a per-cpu 'default' value to be set, this overrides the
> > values when scheduling a task in the default control-group, which has
> > PARTID 0. The way 'code data prioritisation' gets emulated means the
> > register value for the default group needs to be a variable.
> > 
> > The current system register value is kept in a per-cpu variable to avoid
> > writing to the system register if the value isn't going to change.  Writes
> > to this register may reset the hardware state for regulating bandwidth.
> > 
> > Finally, there is no reason to context switch these registers unless there
> > is a driver changing the values in struct task_struct. Hide the whole thing
> > behind a static key. This also allows the driver to disable MPAM in
> > response to errors reported by hardware. Move the existing static key to
> > belong to the arch code, as in the future the MPAM driver may become a
> > loadable module.
> > 
> > All this should depend on whether there is an MPAM driver, hide it behind
> > CONFIG_ARM64_MPAM.
> > 
> > CC: Amit Singh Tomar <amitsinght@marvell.com>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> > ---
> > Changes since rfc:
> > CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
> > Remove extra DECLARE_STATIC_KEY_FALSE
> > Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
> > Remove unused headers
> > Expand comment (Jonathan)
> > 
> > Changes since v2:
> > Tidy up ifdefs
> > ---
> >   arch/arm64/Kconfig                   |  2 +
> >   arch/arm64/include/asm/mpam.h        | 67 ++++++++++++++++++++++++++++
> >   arch/arm64/include/asm/thread_info.h |  3 ++
> >   arch/arm64/kernel/Makefile           |  1 +
> >   arch/arm64/kernel/mpam.c             | 13 ++++++
> >   arch/arm64/kernel/process.c          |  7 +++
> >   drivers/resctrl/mpam_devices.c       |  2 -
> >   drivers/resctrl/mpam_internal.h      |  4 +-
> >   8 files changed, 95 insertions(+), 4 deletions(-)
> >   create mode 100644 arch/arm64/include/asm/mpam.h
> >   create mode 100644 arch/arm64/kernel/mpam.c
> >   
> 
> With the following nitpick addressed:
> 

I commented on the nitpick.

> Reviewed-by: Gavin Shan <gshan@redhat.com>

> > diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> > index 76f32e424065..15979f366519 100644
> > --- a/arch/arm64/kernel/Makefile
> > +++ b/arch/arm64/kernel/Makefile
> > @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
> >   obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
> >   obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
> >   obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
> > +obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
> >   obj-$(CONFIG_ARM64_MTE)			+= mte.o
> >   obj-y					+= vdso-wrap.o
> >   obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
> > diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> > new file mode 100644
> > index 000000000000..9866d2ca0faa
> > --- /dev/null
> > +++ b/arch/arm64/kernel/mpam.c
> > @@ -0,0 +1,13 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (C) 2025 Arm Ltd. */
> > +
> > +#include <asm/mpam.h>
> > +
> > +#include <linux/jump_label.h>
> > +#include <linux/percpu.h>
> > +  
> 
> Nitpick: Needn't include those two header files since they have been included
> to <asm/mpam.h>

That is a non obvious include chain that we should not rely on.
Please keep the headers and continue to follow include what you use
style (with exceptions when a given header is clearly documented as always including
another like some of the bit map stuff.) It is more obviously correct and
causes less grief if headers get refactored in future.

>   
> > +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
> > +DEFINE_PER_CPU(u64, arm64_mpam_default);
> > +DEFINE_PER_CPU(u64, arm64_mpam_current);
> > +
> > +u64 arm64_mpam_global_default;

>
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Ben Horgan 3 weeks ago
Hi Gavin, Jonathan,

On 1/15/26 12:09, Jonathan Cameron wrote:
> On Thu, 15 Jan 2026 14:47:28 +0800
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> Hi Ben,
>>
>> On 1/13/26 12:58 AM, Ben Horgan wrote:
>>> From: James Morse <james.morse@arm.com>
>>>
>>> MPAM allows traffic in the SoC to be labeled by the OS, these labels are
>>> used to apply policy in caches and bandwidth regulators, and to monitor
>>> traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
>>> equivalent calls these CLOSID and RMID, but they don't map precisely.
>>>
>>> MPAM has two CPU system registers that is used to hold the PARTID and PMG
>>> values that traffic generated at each exception level will use. These can
>>> be set per-task by the resctrl file system. (resctrl is the defacto
>>> interface for controlling this stuff).
>>>
>>> Add a helper to switch this.
>>>
>>> struct task_struct's separate CLOSID and RMID fields are insufficient to
>>> implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
>>> PMG (sort of like the RMID) separately. On x86, the rmid is an independent
>>> number, so a race that writes a mismatched closid and rmid into hardware is
>>> benign. On arm64, the pmg bits extend the partid.
>>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).  In
>>> this case, mismatching the values will 'dirty' a pmg value that resctrl
>>> believes is clean, and is not tracking with its 'limbo' code.
>>>
>>> To avoid this, the partid and pmg are always read and written as a
>>> pair. This requires a new u64 field. In struct task_struct there are two
>>> u32, rmid and closid for the x86 case, but as we can't use them here do
>>> something else. Add this new field, mpam_partid_pmg, to struct thread_info
>>> to avoid adding more architecture specific code to struct task_struct.
>>> Always use READ_ONCE()/WRITE_ONCE() when accessing this field.
>>>
>>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>>> values when scheduling a task in the default control-group, which has
>>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>>> register value for the default group needs to be a variable.
>>>
>>> The current system register value is kept in a per-cpu variable to avoid
>>> writing to the system register if the value isn't going to change.  Writes
>>> to this register may reset the hardware state for regulating bandwidth.
>>>
>>> Finally, there is no reason to context switch these registers unless there
>>> is a driver changing the values in struct task_struct. Hide the whole thing
>>> behind a static key. This also allows the driver to disable MPAM in
>>> response to errors reported by hardware. Move the existing static key to
>>> belong to the arch code, as in the future the MPAM driver may become a
>>> loadable module.
>>>
>>> All this should depend on whether there is an MPAM driver, hide it behind
>>> CONFIG_ARM64_MPAM.
>>>
>>> CC: Amit Singh Tomar <amitsinght@marvell.com>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>> ---
>>> Changes since rfc:
>>> CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
>>> Remove extra DECLARE_STATIC_KEY_FALSE
>>> Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
>>> Remove unused headers
>>> Expand comment (Jonathan)
>>>
>>> Changes since v2:
>>> Tidy up ifdefs
>>> ---
>>>   arch/arm64/Kconfig                   |  2 +
>>>   arch/arm64/include/asm/mpam.h        | 67 ++++++++++++++++++++++++++++
>>>   arch/arm64/include/asm/thread_info.h |  3 ++
>>>   arch/arm64/kernel/Makefile           |  1 +
>>>   arch/arm64/kernel/mpam.c             | 13 ++++++
>>>   arch/arm64/kernel/process.c          |  7 +++
>>>   drivers/resctrl/mpam_devices.c       |  2 -
>>>   drivers/resctrl/mpam_internal.h      |  4 +-
>>>   8 files changed, 95 insertions(+), 4 deletions(-)
>>>   create mode 100644 arch/arm64/include/asm/mpam.h
>>>   create mode 100644 arch/arm64/kernel/mpam.c
>>>   
>>
>> With the following nitpick addressed:
>>
> 
> I commented on the nitpick.
> 
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>>> index 76f32e424065..15979f366519 100644
>>> --- a/arch/arm64/kernel/Makefile
>>> +++ b/arch/arm64/kernel/Makefile
>>> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
>>>   obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
>>>   obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
>>>   obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
>>> +obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
>>>   obj-$(CONFIG_ARM64_MTE)			+= mte.o
>>>   obj-y					+= vdso-wrap.o
>>>   obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
>>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>>> new file mode 100644
>>> index 000000000000..9866d2ca0faa
>>> --- /dev/null
>>> +++ b/arch/arm64/kernel/mpam.c
>>> @@ -0,0 +1,13 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/* Copyright (C) 2025 Arm Ltd. */
>>> +
>>> +#include <asm/mpam.h>
>>> +
>>> +#include <linux/jump_label.h>
>>> +#include <linux/percpu.h>
>>> +  
>>
>> Nitpick: Needn't include those two header files since they have been included
>> to <asm/mpam.h>
> 
> That is a non obvious include chain that we should not rely on.
> Please keep the headers and continue to follow include what you use
> style (with exceptions when a given header is clearly documented as always including
> another like some of the bit map stuff.) It is more obviously correct and
> causes less grief if headers get refactored in future.

Keeping the includes here makes sense to me too. Gavin, are you ok with
keeping this as is?

> 
>>   
>>> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
>>> +DEFINE_PER_CPU(u64, arm64_mpam_default);
>>> +DEFINE_PER_CPU(u64, arm64_mpam_current);
>>> +
>>> +u64 arm64_mpam_global_default;
> 
>>
> 

Thanks,

Ben
Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
Posted by Gavin Shan 2 weeks, 6 days ago
Hi Ben and Jonathan,

On 1/19/26 10:00 PM, Ben Horgan wrote:
> Hi Gavin, Jonathan,
> 
> On 1/15/26 12:09, Jonathan Cameron wrote:
>> On Thu, 15 Jan 2026 14:47:28 +0800
>> Gavin Shan <gshan@redhat.com> wrote:
>>
>>> Hi Ben,
>>>
>>> On 1/13/26 12:58 AM, Ben Horgan wrote:
>>>> From: James Morse <james.morse@arm.com>
>>>>
>>>> MPAM allows traffic in the SoC to be labeled by the OS, these labels are
>>>> used to apply policy in caches and bandwidth regulators, and to monitor
>>>> traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
>>>> equivalent calls these CLOSID and RMID, but they don't map precisely.
>>>>
>>>> MPAM has two CPU system registers that is used to hold the PARTID and PMG
>>>> values that traffic generated at each exception level will use. These can
>>>> be set per-task by the resctrl file system. (resctrl is the defacto
>>>> interface for controlling this stuff).
>>>>
>>>> Add a helper to switch this.
>>>>
>>>> struct task_struct's separate CLOSID and RMID fields are insufficient to
>>>> implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
>>>> PMG (sort of like the RMID) separately. On x86, the rmid is an independent
>>>> number, so a race that writes a mismatched closid and rmid into hardware is
>>>> benign. On arm64, the pmg bits extend the partid.
>>>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).  In
>>>> this case, mismatching the values will 'dirty' a pmg value that resctrl
>>>> believes is clean, and is not tracking with its 'limbo' code.
>>>>
>>>> To avoid this, the partid and pmg are always read and written as a
>>>> pair. This requires a new u64 field. In struct task_struct there are two
>>>> u32, rmid and closid for the x86 case, but as we can't use them here do
>>>> something else. Add this new field, mpam_partid_pmg, to struct thread_info
>>>> to avoid adding more architecture specific code to struct task_struct.
>>>> Always use READ_ONCE()/WRITE_ONCE() when accessing this field.
>>>>
>>>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>>>> values when scheduling a task in the default control-group, which has
>>>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>>>> register value for the default group needs to be a variable.
>>>>
>>>> The current system register value is kept in a per-cpu variable to avoid
>>>> writing to the system register if the value isn't going to change.  Writes
>>>> to this register may reset the hardware state for regulating bandwidth.
>>>>
>>>> Finally, there is no reason to context switch these registers unless there
>>>> is a driver changing the values in struct task_struct. Hide the whole thing
>>>> behind a static key. This also allows the driver to disable MPAM in
>>>> response to errors reported by hardware. Move the existing static key to
>>>> belong to the arch code, as in the future the MPAM driver may become a
>>>> loadable module.
>>>>
>>>> All this should depend on whether there is an MPAM driver, hide it behind
>>>> CONFIG_ARM64_MPAM.
>>>>
>>>> CC: Amit Singh Tomar <amitsinght@marvell.com>
>>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>>> Signed-off-by: James Morse <james.morse@arm.com>
>>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>>> ---
>>>> Changes since rfc:
>>>> CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
>>>> Remove extra DECLARE_STATIC_KEY_FALSE
>>>> Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
>>>> Remove unused headers
>>>> Expand comment (Jonathan)
>>>>
>>>> Changes since v2:
>>>> Tidy up ifdefs
>>>> ---
>>>>    arch/arm64/Kconfig                   |  2 +
>>>>    arch/arm64/include/asm/mpam.h        | 67 ++++++++++++++++++++++++++++
>>>>    arch/arm64/include/asm/thread_info.h |  3 ++
>>>>    arch/arm64/kernel/Makefile           |  1 +
>>>>    arch/arm64/kernel/mpam.c             | 13 ++++++
>>>>    arch/arm64/kernel/process.c          |  7 +++
>>>>    drivers/resctrl/mpam_devices.c       |  2 -
>>>>    drivers/resctrl/mpam_internal.h      |  4 +-
>>>>    8 files changed, 95 insertions(+), 4 deletions(-)
>>>>    create mode 100644 arch/arm64/include/asm/mpam.h
>>>>    create mode 100644 arch/arm64/kernel/mpam.c
>>>>    
>>>
>>> With the following nitpick addressed:
>>>
>>
>> I commented on the nitpick.
>>
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>
>>>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>>>> index 76f32e424065..15979f366519 100644
>>>> --- a/arch/arm64/kernel/Makefile
>>>> +++ b/arch/arm64/kernel/Makefile
>>>> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
>>>>    obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
>>>>    obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
>>>>    obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
>>>> +obj-$(CONFIG_ARM64_MPAM)		+= mpam.o
>>>>    obj-$(CONFIG_ARM64_MTE)			+= mte.o
>>>>    obj-y					+= vdso-wrap.o
>>>>    obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
>>>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>>>> new file mode 100644
>>>> index 000000000000..9866d2ca0faa
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kernel/mpam.c
>>>> @@ -0,0 +1,13 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/* Copyright (C) 2025 Arm Ltd. */
>>>> +
>>>> +#include <asm/mpam.h>
>>>> +
>>>> +#include <linux/jump_label.h>
>>>> +#include <linux/percpu.h>
>>>> +
>>>
>>> Nitpick: Needn't include those two header files since they have been included
>>> to <asm/mpam.h>
>>
>> That is a non obvious include chain that we should not rely on.
>> Please keep the headers and continue to follow include what you use
>> style (with exceptions when a given header is clearly documented as always including
>> another like some of the bit map stuff.) It is more obviously correct and
>> causes less grief if headers get refactored in future.
> 
> Keeping the includes here makes sense to me too. Gavin, are you ok with
> keeping this as is?
> 

Yeah, I'm fine to keep it as of being :-)

>>
>>>    
>>>> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
>>>> +DEFINE_PER_CPU(u64, arm64_mpam_default);
>>>> +DEFINE_PER_CPU(u64, arm64_mpam_current);
>>>> +
>>>> +u64 arm64_mpam_global_default;
>>
>>>
>>
> 
> Thanks,
> 
> Ben
> 
> 

Thanks,
Gavin