[RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

Babu Moger posted 19 patches 2 weeks, 3 days ago
Posted by Babu Moger 2 weeks, 3 days ago
The resctrl subsystem writes the task's RMID/CLOSID to IA32_PQR_ASSOC in
__resctrl_sched_in(). With PLZA support being introduced and guarded by
rdt_plza_enable_key, the kernel needs a way to track and program the PLZA
association independently of the regular RMID/CLOSID path.

Extend the per-CPU resctrl_pqr_state to track PLZA-related state, including
the current and default PLZA values along with the associated RMID and
CLOSID.

Update the resctrl scheduling-in path to program the PLZA MSR when PLZA
support is enabled. During the context switch, the task-specific PLZA
setting is applied if present; otherwise, the per-CPU default PLZA value is
used. The MSR is only written when the PLZA state changes, avoiding
unnecessary writes.

PLZA programming is guarded by a static key to ensure there is no overhead
when the feature is disabled.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/resctrl.h | 19 +++++++++++++++++++
 include/linux/sched.h          |  1 +
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index fc0a7f64649e..76de7d6051b7 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -38,6 +38,10 @@ struct resctrl_pqr_state {
 	u32			cur_closid;
 	u32			default_rmid;
 	u32			default_closid;
+	u32			cur_plza;
+	u32			default_plza;
+	u32			plza_rmid;
+	u32			plza_closid;
 };
 
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
@@ -115,6 +119,7 @@ static inline void __resctrl_sched_in(struct task_struct *tsk)
 	struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
 	u32 closid = READ_ONCE(state->default_closid);
 	u32 rmid = READ_ONCE(state->default_rmid);
+	u32 plza = READ_ONCE(state->default_plza);
 	u32 tmp;
 
 	/*
@@ -138,6 +143,20 @@ static inline void __resctrl_sched_in(struct task_struct *tsk)
 		state->cur_rmid = rmid;
 		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
 	}
+
+	if (static_branch_likely(&rdt_plza_enable_key)) {
+		tmp = READ_ONCE(tsk->plza);
+		if (tmp)
+			plza = tmp;
+
+		if (plza != state->cur_plza) {
+			state->cur_plza = plza;
+			wrmsr(MSR_IA32_PQR_PLZA_ASSOC,
+			      RMID_EN | state->plza_rmid,
+			      (plza ? PLZA_EN : 0) | CLOSID_EN | state->plza_closid);
+		}
+	}
+
 }
 
 static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8f3a60f13393..d573163865ae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1326,6 +1326,7 @@ struct task_struct {
 #ifdef CONFIG_X86_CPU_RESCTRL
 	u32				closid;
 	u32				rmid;
+	u32				plza;
 #endif
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user	*robust_list;
-- 
2.34.1
Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling
Posted by Luck, Tony 1 week, 4 days ago
On Wed, Jan 21, 2026 at 03:12:51PM -0600, Babu Moger wrote:
> @@ -138,6 +143,20 @@ static inline void __resctrl_sched_in(struct task_struct *tsk)
>  		state->cur_rmid = rmid;
>  		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
>  	}
> +
> +	if (static_branch_likely(&rdt_plza_enable_key)) {
> +		tmp = READ_ONCE(tsk->plza);
> +		if (tmp)
> +			plza = tmp;
> +
> +		if (plza != state->cur_plza) {
> +			state->cur_plza = plza;
> +			wrmsr(MSR_IA32_PQR_PLZA_ASSOC,
> +			      RMID_EN | state->plza_rmid,
> +			      (plza ? PLZA_EN : 0) | CLOSID_EN | state->plza_closid);
> +		}
> +	}
> +

Babu,

This addition to the context switch code surprised me. After your talk
at LPC I had imagined that PLZA would be a single global setting so that
every syscall/page-fault/interrupt would run with a different CLOSID
(presumably one configured with more cache and memory bandwidth).

But this patch series looks like things are more flexible with the
ability to set different values (of RMID as well as CLOSID) per group.

It looks like it is possible to have some resctrl group with very
limited resources just bump up a bit when in ring0, while other
groups may get some different amount.

The additions for plza to the Documentation aren't helping me
understand how users will apply this.

Do you have some more examples?

-Tony
Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling
Posted by Moger, Babu 1 week, 3 days ago
Hi Tony,

Thanks for the comment.

On 1/27/2026 4:30 PM, Luck, Tony wrote:
> On Wed, Jan 21, 2026 at 03:12:51PM -0600, Babu Moger wrote:
>> @@ -138,6 +143,20 @@ static inline void __resctrl_sched_in(struct task_struct *tsk)
>>   		state->cur_rmid = rmid;
>>   		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
>>   	}
>> +
>> +	if (static_branch_likely(&rdt_plza_enable_key)) {
>> +		tmp = READ_ONCE(tsk->plza);
>> +		if (tmp)
>> +			plza = tmp;
>> +
>> +		if (plza != state->cur_plza) {
>> +			state->cur_plza = plza;
>> +			wrmsr(MSR_IA32_PQR_PLZA_ASSOC,
>> +			      RMID_EN | state->plza_rmid,
>> +			      (plza ? PLZA_EN : 0) | CLOSID_EN | state->plza_closid);
>> +		}
>> +	}
>> +
> 
> Babu,
> 
> This addition to the context switch code surprised me. After your talk
> at LPC I had imagined that PLZA would be a single global setting so that
> every syscall/page-fault/interrupt would run with a different CLOSID
> (presumably one configured with more cache and memory bandwidth).
> 
> But this patch series looks like things are more flexible with the
> ability to set different values (of RMID as well as CLOSID) per group.

Yes. This is similar to what we have with MSR_IA32_PQR_ASSOC. The association
can be done either through CPUs (just one MSR write) or through task-based
association (more MSR writes as the task moves around).
> 
> It looks like it is possible to have some resctrl group with very
> limited resources just bump up a bit when in ring0, while other
> groups may get some different amount.
> 
> The additions for plza to the Documentation aren't helping me
> understand how users will apply this.
> 
> Do you have some more examples?

Group creation is similar to what we have currently.

1. Create a regular group and set up the limits.
    # mkdir /sys/fs/resctrl/group

2. Assign tasks or CPUs.
    # echo 1234 > /sys/fs/resctrl/group/tasks

    This is a regular group.

3. Now you figure out that you need to change things in CPL0 for this task.

4. Create a PLZA group and tweak the limits.

    # mkdir /sys/fs/resctrl/group1

    # echo 1 > /sys/fs/resctrl/group1/plza

    # echo "MB:0=100" > /sys/fs/resctrl/group1/schemata

5. Assign the same task to the plza group.

    # echo 1234 > /sys/fs/resctrl/group1/tasks


Now the task 1234 will be using the limits from group1 when running in 
CPL0.

I will add a few more details in my next revision.

Thanks
Babu
Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling
Posted by Luck, Tony 1 week, 3 days ago
On Wed, Jan 28, 2026 at 10:01:39AM -0600, Moger, Babu wrote:
> Hi Tony,
> 
> Thanks for the comment.
> 
> On 1/27/2026 4:30 PM, Luck, Tony wrote:
> > On Wed, Jan 21, 2026 at 03:12:51PM -0600, Babu Moger wrote:
> > > @@ -138,6 +143,20 @@ static inline void __resctrl_sched_in(struct task_struct *tsk)
> > >   		state->cur_rmid = rmid;
> > >   		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
> > >   	}
> > > +
> > > +	if (static_branch_likely(&rdt_plza_enable_key)) {
> > > +		tmp = READ_ONCE(tsk->plza);
> > > +		if (tmp)
> > > +			plza = tmp;
> > > +
> > > +		if (plza != state->cur_plza) {
> > > +			state->cur_plza = plza;
> > > +			wrmsr(MSR_IA32_PQR_PLZA_ASSOC,
> > > +			      RMID_EN | state->plza_rmid,
> > > +			      (plza ? PLZA_EN : 0) | CLOSID_EN | state->plza_closid);
> > > +		}
> > > +	}
> > > +
> > 
> > Babu,
> > 
> > This addition to the context switch code surprised me. After your talk
> > at LPC I had imagined that PLZA would be a single global setting so that
> > every syscall/page-fault/interrupt would run with a different CLOSID
> > (presumably one configured with more cache and memory bandwidth).
> > 
> > But this patch series looks like things are more flexible with the
> > ability to set different values (of RMID as well as CLOSID) per group.
> 
> Yes. This is similar to what we have with MSR_IA32_PQR_ASSOC. The association
> can be done either through CPUs (just one MSR write) or through task-based
> association (more MSR writes as the task moves around).
> > 
> > It looks like it is possible to have some resctrl group with very
> > limited resources just bump up a bit when in ring0, while other
> > groups may get some different amount.
> > 
> > The additions for plza to the Documentation aren't helping me
> > understand how users will apply this.
> > 
> > Do you have some more examples?
> 
> Group creation is similar to what we have currently.
> 
> 1. Create a regular group and set up the limits.
>    # mkdir /sys/fs/resctrl/group
> 
> 2. Assign tasks or CPUs.
>    # echo 1234 > /sys/fs/resctrl/group/tasks
> 
>    This is a regular group.
> 
> 3. Now you figure out that you need to change things in CPL0 for this task.
> 
> 4. Create a PLZA group and tweak the limits.
> 
>    # mkdir /sys/fs/resctrl/group1
> 
>    # echo 1 > /sys/fs/resctrl/group1/plza
> 
>    # echo "MB:0=100" > /sys/fs/resctrl/group1/schemata
> 
> 5. Assign the same task to the plza group.
> 
>    # echo 1234 > /sys/fs/resctrl/group1/tasks
> 
> 
> Now the task 1234 will be using the limits from group1 when running in CPL0.
> 
> I will add a few more details in my next revision.
> 

Babu,

I've read a bit more of the code now and I think I understand more.

Some useful additions to your explanation.

1) Only one CTRL group can be marked as PLZA
2) It can't be the root/default group
3) It can't have sub monitor groups
4) It can't be pseudo-locked

Would a potential use case involve putting *all* tasks into the PLZA
group? That would avoid any additional context switch overhead as the
PLZA MSR would never need to change.

If that is the case, maybe for the PLZA group we should allow user to
do:

# echo '*' > tasks

-Tony