The mbm_cntr_assign mode provides an option for the user to assign a
counter to an RMID, event pair and monitor the bandwidth as long as
the counter is assigned. The number of assignments depends on the
number of monitoring counters available.

Provide an interface to display the number of monitoring counters
supported. The interface file 'num_mbm_cntrs' is available when an
architecture supports mbm_cntr_assign mode.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v8: Commit message update and documentation update.
v7: Minor commit log text changes.
v6: No changes.
v5: Changed the display name from num_cntrs to num_mbm_cntrs.
Updated the commit message.
Moved the patch after mbm_mode is introduced.
v4: Changed the counter name to num_cntrs. And few text changes.
v3: Changed the field name to mbm_assign_cntrs.
v2: Changed the field name to mbm_assignable_counters from abmc_counte
---
Documentation/arch/x86/resctrl.rst | 4 ++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
3 files changed, 21 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index e4a7d6e815f6..1b5c05a35793 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -291,6 +291,10 @@ with the following files:
 	that case reading the mbm_total_bytes and mbm_local_bytes may report
 	'Unavailable' if there is no counter associated with that group.
 
+"num_mbm_cntrs":
+	The number of monitoring counters available for assignment when the
+	architecture supports mbm_cntr_assign mode.
+
 "max_threshold_occupancy":
 	Read/write file provides the largest value (in
 	bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2f3bf4529498..7aa579a99501 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1236,6 +1236,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		r->mon.mbm_cntr_assignable = true;
 		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
 		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
+		resctrl_file_fflags_init("num_mbm_cntrs", RFTYPE_MON_INFO);
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 895264c207c7..c48b5450e6c2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -869,6 +869,16 @@ static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
+				       struct seq_file *s, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(s, "%d\n", r->mon.num_mbm_cntrs);
+
+	return 0;
+}
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 /*
@@ -1940,6 +1950,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_cpus_show,
 		.fflags		= RFTYPE_BASE,
 	},
+	{
+		.name		= "num_mbm_cntrs",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_num_mbm_cntrs_show,
+	},
 	{
 		.name		= "cpus_list",
 		.mode		= 0644,
--
2.34.1

On Wed, Oct 09, 2024 at 12:39:33PM -0500, Babu Moger wrote:
> +"num_mbm_cntrs":
> +	The number of monitoring counters available for assignment when the
> +	architecture supports mbm_cntr_assign mode.

It's not obvious (to me) how these counters work. When I create
a group with both local and total monitoring enabled, does that
use up two counters (even though I only used up one RMID)?

Are the counters multi-purpose? E.g. if I disable local counting
on all groups, are the freed-up counters available for use to
count total bandwidth on some additional groups?

From the examples it looks like if there are free counters
available when user does mkdir, then they will be assigned
to the new rdtgroup. If only one counter is free, does it
get assigned to local or total?

Thanks

-Tony

Hi Tony,

On 10/9/24 17:49, Tony Luck wrote:
> On Wed, Oct 09, 2024 at 12:39:33PM -0500, Babu Moger wrote:
>> +"num_mbm_cntrs":
>> +	The number of monitoring counters available for assignment when the
>> +	architecture supports mbm_cntr_assign mode.
>
> It's not obvious (to me) how these counters work. When I create
> a group with both local and total monitoring enabled, does that
> use up two counters (even though I only used up one RMID)?

That is correct. One RMID can be associated with multiple h/w counters.

> Are the counters multi-purpose? E.g. if I disable local counting
> on all groups, are the freed-up counters available for use to
> count total bandwidth on some additional groups?

Yes. That is correct. With 32 counters you can enable both the events on
up to 16 groups. You can also enable only one event in up to 32 groups.

> From the examples it looks like if there are free counters
> available when user does mkdir, then they will be assigned
> to the new rdtgroup. If only one counter is free, does it
> get assigned to local or total?

Right now total event takes a priority.

All good points. How about this text:

"num_mbm_cntrs":
The number of monitoring counters available for assignment when the
architecture supports mbm_cntr_assign mode.

Resctrl subsystem provides the interface to count maximum of two memory
bandwidth events per group, from a combination of available total and
local events. Keeping the current interface, users can enable a maximum of
2 counters per group. User will also have the option to enable only one
counter to the group to maximize the number of groups monitored.

--
Thanks
Babu Moger

Hi Babu and Tony,

On 10/10/24 8:12 AM, Moger, Babu wrote:
>
> All good points. How about this text:
>
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
>
> Resctrl subsystem provides the interface to count maximum of two memory

subsystem -> filesystem

> bandwidth events per group, from a combination of available total and

Is this "from a combination of ..." snippet intended to hint at BMEC?

> local events. Keeping the current interface, users can enable a maximum of

What is meant by "Keeping the current interface"? Which interface? What will
"current" mean when a user reads this documentation?

> 2 counters per group. User will also have the option to enable only one

"User will also have" is talking about the future. When will this be the case?

> counter to the group to maximize the number of groups monitored.

I think that we need to be careful when making this documentation so specific
to the ABMC implementation. We already know that "soft-ABMC" is coming and
Peter already shared [1] that with software assignment it will not be possible
to assign counters to individual events.

The goal of this work is to create a generic interface and this is the
documentation for it. If this documentation is created to be specific to the
first implementation it will make it difficult to use this same interface to
support other implementations.

Reinette

[1] https://lore.kernel.org/all/CALPaoCi_TBZnULHQpYns+H+30jODZvyQpUHJRDHNwjQzajrD=A@mail.gmail.com/

Hi Reinette,

On 10/14/24 11:25, Reinette Chatre wrote:
> Hi Babu and Tony,
>
> On 10/10/24 8:12 AM, Moger, Babu wrote:
>>
>> All good points. How about this text:
>>
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>>
>> Resctrl subsystem provides the interface to count maximum of two memory
>
> subsystem -> filesystem

Sure.

>> bandwidth events per group, from a combination of available total and
>
> Is this "from a combination of ..." snippet intended to hint at BMEC?

No. We support 2 MBM events right now. That is why I added combination of
total and local. I can remove that text.

>> local events. Keeping the current interface, users can enable a maximum of
>
> What is meant by "Keeping the current interface"? Which interface? What will
> "current" mean when a user reads this documentation?

I meant not to change any interface to support the mbm_cntr_assign feature.

>> 2 counters per group. User will also have the option to enable only one
>
> "User will also have" is talking about the future. When will this be the case?

Again.. will have to change the text here.

>> counter to the group to maximize the number of groups monitored.
>
> I think that we need to be careful when making this documentation so specific
> to the ABMC implementation. We already know that "soft-ABMC" is coming and
> Peter already shared [1] that with software assignment it will not be possible
> to assign counters to individual events.
>
> The goal of this work is to create a generic interface and this is the
> documentation for it. If this documentation is created to be specific to the
> first implementation it will make it difficult to use this same interface to
> support other implementations.

Agree.

How about this?

"num_mbm_cntrs":
The number of monitoring counters available for assignment when the
architecture supports mbm_cntr_assign mode.

The resctrl filesystem allows user track up to two memory bandwidth events
per group, using a mix of total and local events. Users can enable up to 2
counters per group. There's also an option to enable just one counter per
group, which allows monitoring more groups.

--
Thanks
Babu Moger

Hi Babu,

On 10/14/24 10:46 AM, Moger, Babu wrote:
> On 10/14/24 11:25, Reinette Chatre wrote:
>> On 10/10/24 8:12 AM, Moger, Babu wrote:
[...]
>
> Agree.
>
> How about this?
>
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
>
> The resctrl filesystem allows user track up to two memory bandwidth events
> per group, using a mix of total and local events. Users can enable up to 2

"a mix of" remains unclear to me since there are only two options. I think we
can be specific here.

> counters per group. There's also an option to enable just one counter per
> group, which allows monitoring more groups.

How about below for the second paragraph:

The resctrl filesystem supports tracking up to two memory bandwidth
events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
Up to two counters can be assigned per monitoring group, one for each
memory bandwidth event. More monitoring groups can be tracked by
assigning one counter per monitoring group. However, doing so limits
memory bandwidth tracking to a single memory bandwidth event per
monitoring group.

Reinette

Hi Reinette,

On 10/14/24 13:30, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/14/24 10:46 AM, Moger, Babu wrote:
[...]
>> How about this?
>>
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>>
>> The resctrl filesystem allows user track up to two memory bandwidth events
>> per group, using a mix of total and local events. Users can enable up to 2
>
> "a mix of" remains unclear to me since there are only two options. I think we
> can be specific here.
>
>> counters per group. There's also an option to enable just one counter per
>> group, which allows monitoring more groups.
>
> How about below for the second paragraph:
>
> The resctrl filesystem supports tracking up to two memory bandwidth
> events per monitoring group: mbm_total_bytes and/or mbm_local_bytes.
> Up to two counters can be assigned per monitoring group, one for each
> memory bandwidth event. More monitoring groups can be tracked by
> assigning one counter per monitoring group. However, doing so limits
> memory bandwidth tracking to a single memory bandwidth event per
> monitoring group.

Sure. Looks good.

--
Thanks
Babu Moger

> All good points. How about this text:
>
> "num_mbm_cntrs":
> The number of monitoring counters available for assignment when the
> architecture supports mbm_cntr_assign mode.
>
> Resctrl subsystem provides the interface to count maximum of two memory
> bandwidth events per group, from a combination of available total and
> local events. Keeping the current interface, users can enable a maximum of
> 2 counters per group. User will also have the option to enable only one
> counter to the group to maximize the number of groups monitored.

Much better. Looks OK to me.

New questions:

1) Should resctrl provide a file to tell the user how many free
counters are available? They can figure it out by counting all the 'l' and 't'
in "mbm_assign_control" and subtracting that from "num_mbm_cntrs".
But that seems complex.

2) Even more so because free counters might be different per socket
if the user did some "0=tl;1=_" assignments as in one of your examples.

Maybe a UI like:

$ cat /sys/fs/resctrl/info/L3_MON/free_mbm_cntrs
0=5;1=9

-Tony

Hi Tony,

On 10/10/24 10:58, Luck, Tony wrote:
>> All good points. How about this text:
>>
>> "num_mbm_cntrs":
>> The number of monitoring counters available for assignment when the
>> architecture supports mbm_cntr_assign mode.
>>
>> Resctrl subsystem provides the interface to count maximum of two memory
>> bandwidth events per group, from a combination of available total and
>> local events. Keeping the current interface, users can enable a maximum of
>> 2 counters per group. User will also have the option to enable only one
>> counter to the group to maximize the number of groups monitored.
>
> Much better. Looks OK to me.

thanks

> New questions:
>
> 1) Should resctrl provide a file to tell the user how many free
> counters are available? They can figure it out by counting all the 'l' and 't'
> in "mbm_assign_control" and subtracting that from "num_mbm_cntrs".
> But that seems complex.

We have the information already in r->mon.mbm_cntr_free_map.

How about adding an extra text while printing num_mbm_cntrs?

$ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
Total 32, Available 16

These are all global counters, we don't differentiate between sockets just
like number of CLOSIDs.

> 2) Even more so because free counters might be different per socket
> if the user did some "0=tl;1=_" assignments as in one of your examples.
>
> Maybe a UI like:
>
> $ cat /sys/fs/resctrl/info/L3_MON/free_mbm_cntrs
> 0=5;1=9

--
Thanks
Babu Moger
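
A minimal sketch of what such a combined read handler could look like,
building on rdtgroup_num_mbm_cntrs_show() from the patch above. It assumes
mbm_cntr_free_map is a single unsigned long bitmap with one bit set per free
counter; the exact declaration in the series may differ:

	/*
	 * Sketch only: report total and free (available) assignable
	 * counters, assuming num_mbm_cntrs <= BITS_PER_LONG.
	 */
	static int rdtgroup_num_mbm_cntrs_show(struct kernfs_open_file *of,
					       struct seq_file *s, void *v)
	{
		struct rdt_resource *r = of->kn->parent->priv;

		seq_printf(s, "Total %d, Available %d\n",
			   r->mon.num_mbm_cntrs,
			   bitmap_weight(&r->mon.mbm_cntr_free_map,
					 r->mon.num_mbm_cntrs));
		return 0;
	}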

Babu,

> We have the information already in r->mon.mbm_cntr_free_map.
>
> How about adding an extra text while printing num_mbm_cntrs?
>
> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> Total 32, Available 16

Either that or:

Total 32
Available 16

which looks fractionally simpler to parse. But I don't have strong feelings.

> These are all global counters, we don't differentiate between sockets just
> like number of CLOSIDs.

Interesting. So there is no real benefit from "0=tl;1=_" ... you are using
up two counters, just not reporting them on socket 1.

Why have this complexity in mbm_assign_control syntax?

You could have just {grouppath}/{allocation}

where allocation is one of _, t, l, tl

-Tony

Hi Tony,

On 10/10/24 12:08, Luck, Tony wrote:
> Babu,
>
>> We have the information already in r->mon.mbm_cntr_free_map.
>>
>> How about adding an extra text while printing num_mbm_cntrs?
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> Total 32, Available 16
>
> Either that or:
>
> Total 32
> Available 16

Sure. Fine with me.

> which looks fractionally simpler to parse. But I don't have strong feelings.
>
>> These are all global counters, we don't differentiate between sockets just
>> like number of CLOSIDs.
>
> Interesting. So there is no real benefit from "0=tl;1=_" ... you are using
> up two counters, just not reporting them on socket 1.
>
> Why have this complexity in mbm_assign_control syntax?

Lets take an example:

$ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
Total 32
Available 30

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;

Here default group has taken two counters.

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=_;1=tl;

Here default group has two counters.
Domain 0 does not have counters applied. So, you wont be able to read the
MBM values for domain 0.
Domain 1 has both the counters applied.

Domain level application is important. This is similar to what we have with
schemata. You can change the value in each individual domain.

# cat schemata
MB:0=2048;1=2048;2=2048;3=2048
L3:0=ffff;1=ffff;2=ffff;3=ffff

> You could have just {grouppath}/{allocation}
>
> where allocation is one of _, t, l, tl

--
Thanks
Babu Moger

Hi Tony and Babu,

On 10/10/24 11:36 AM, Moger, Babu wrote:
> Hi Tony,
>
> On 10/10/24 12:08, Luck, Tony wrote:
>> Babu,
>>
>>> We have the information already in r->mon.mbm_cntr_free_map.
>>>
>>> How about adding an extra text while printing num_mbm_cntrs?
>>>
>>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>> Total 32, Available 16
>>
>> Either that or:
>>
>> Total 32
>> Available 16
>
> Sure. Fine with me.

I think separate files would be easier to parse and matches the existing
resctrl interface in this regard. How about "available_mbm_cntrs"?

Reinette

Hi Reinette,

On 10/14/24 11:59, Reinette Chatre wrote:
> Hi Tony and Babu,
>
> On 10/10/24 11:36 AM, Moger, Babu wrote:
>> Hi Tony,
>>
>> On 10/10/24 12:08, Luck, Tony wrote:
>>> Babu,
>>>
>>>> We have the information already in r->mon.mbm_cntr_free_map.
>>>>
>>>> How about adding an extra text while printing num_mbm_cntrs?
>>>>
>>>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>>> Total 32, Available 16
>>>
>>> Either that or:
>>>
>>> Total 32
>>> Available 16
>>
>> Sure. Fine with me.
>
> I think separate files would be easier to parse and matches the existing
> resctrl interface in this regard. How about "available_mbm_cntrs"?

Sure.

Thanks
Babu Moger

>> Why have this complexity in mbm_assign_control syntax?
>
> Lets take an example:
>
> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> Total 32
> Available 30
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;
>
> Here default group has taken two counters.
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=_;1=tl;
>
> Here default group has two counters.
> Domain 0 does not have counters applied. So, you wont be able to read the
> MBM values for domain 0.
> Domain 1 has both the counters applied.

Is there some benefit from doing this? You are still using the same
number of counters. You now can't read them from domain 0.

You said the counters are system-wide. Does that mean that in
this case:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=tl;1=tl;

there aren't separate counts from each of domain 0 and domain 1.
I.e. if I read both I'd see the same value (sum of traffic on both domains):

$ grep . /sys/fs/resctrl/mon_data/*/*total*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes:260039467008
/sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes:260039467008

-Tony

Hi Tony,

On 10/10/24 13:57, Luck, Tony wrote:
>>> Why have this complexity in mbm_assign_control syntax?
>>
>> Lets take an example:
>>
>> $ cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> Total 32
>> Available 30
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=tl;1=tl;
>>
>> Here default group has taken two counters.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=_;1=tl;
>>
>> Here default group has two counters.
>> Domain 0 does not have counters applied. So, you wont be able to read the
>> MBM values for domain 0.
>> Domain 1 has both the counters applied.
>
> Is there some benefit from doing this? You are still using the same
> number of counters. You now can't read them from domain 0.
>
> You said the counters are system-wide. Does that mean that in
> this case:

Counters are system wide. We also keep track of whether a counter is applied
to a specific domain or not. We have two bitmaps to keep track of this. There
is a cost to applying a counter to the domain (an IPI needs to be sent to the
domain).

> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=tl;1=tl;
>
> there aren't separate counts from each of domain 0 and domain 1.

Yes. There is. Each domain has its own count. I am not sure about your config.

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=_;1=tl;

# grep . /sys/fs/resctrl/mon_data/*/*total*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes:Unassigned
/sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes:22976

> I.e. if I read both I'd see the same value (sum of traffic on both domains):
>
> $ grep . /sys/fs/resctrl/mon_data/*/*total*
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes:260039467008
> /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes:260039467008
>
> -Tony

--
Thanks
Babu Moger

On Thu, Oct 10, 2024 at 03:32:08PM -0500, Moger, Babu wrote:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=tl;1=tl;
>>
>> there aren't separate counts from each of domain 0 and domain 1.
>
> Yes. There is. Each domain has its own count. I am not sure about your config.

I've been reading the code and see better now.

There are a bunch (32) of counters per domain.

But you have a system-wide allocator. So when making
a group you may allocate counters 2 and 3 for total
and local respectively. Then configure the local instance
of counter 2 on each domain (recording that in the per-domain
bitmap) for total bandwidth. Ditto for counter 3 instances
on each domain.

If the user updates the configuration to stop counting
on domain 1. Then the per-domain bitmap is updated to
show counters 2 and 3 are no longer in use on this domain.
But those counters aren't freed (because domain 0 is still
using them).

Is there some hardware limitation that would prevent
re-using domain 1 counters 2 & 3 for some other group (RMID)?

Or is this just a s/w implementation detail because
you have a system wide allocator for counters?

-Tony
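
A sketch of the system-wide allocation model described above, with a
hypothetical mbm_cntr_alloc() helper; the series tracks free counters in
r->mon.mbm_cntr_free_map, treated here as a plain unsigned long bitmap:

	/* Sketch only: allocate the first free counter from a global pool. */
	static int mbm_cntr_alloc(struct rdt_resource *r)
	{
		unsigned int cntr_id;

		cntr_id = find_first_bit(&r->mon.mbm_cntr_free_map,
					 r->mon.num_mbm_cntrs);
		/* find_first_bit() returns the bitmap size when no bit is set */
		if (cntr_id == r->mon.num_mbm_cntrs)
			return -ENOSPC;
		__clear_bit(cntr_id, &r->mon.mbm_cntr_free_map);

		return cntr_id;
	}

Freeing would set the bit again; the per-domain bitmaps Babu mentions would
then only record on which domains an already-allocated counter has been
configured.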

Hi Tony,

On 10/11/2024 12:44 PM, Tony Luck wrote:
> On Thu, Oct 10, 2024 at 03:32:08PM -0500, Moger, Babu wrote:
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>> //0=tl;1=tl;
>>>
>>> there aren't separate counts from each of domain 0 and domain 1.
>>
>> Yes. There is. Each domain has its own count. I am not sure about your config.
>
> I've been reading the code and see better now.
>
> There are a bunch (32) of counters per domain.
>
> But you have a system-wide allocator. So when making
> a group you may allocate counters 2 and 3 for total
> and local respectively. Then configure the local instance
> of counter 2 on each domain (recording that in the per-domain
> bitmap) for total bandwidth. Ditto for counter 3 instances
> on each domain.

Yes. That is correct.

> If the user updates the configuration to stop counting
> on domain 1. Then the per-domain bitmap is updated to
> show counters 2 and 3 are no longer in use on this domain.
> But those counters aren't freed (because domain 0 is still
> using them).

Yes. Correct.

> Is there some hardware limitation that would prevent
> re-using domain 1 counters 2 & 3 for some other group (RMID)?
>
> Or is this just a s/w implementation detail because
> you have a system wide allocator for counters?

There is no hardware limitation. It is how resctrl is designed.

In case of Intel (with two sockets, 16 CLOSIDs), you can only create 16
groups. Each group will have two domains (domain 0 for socket 0 and domain 1
for socket 1).

# cat schemata
MB:0=100;1=100
L3:0=ffff;1=ffff;

We may have to think of addressing this sometime in the future.

--
- Babu Moger

On Fri, Oct 11, 2024 at 03:49:48PM -0500, Moger, Babu wrote:
>> Is there some hardware limitation that would prevent
>> re-using domain 1 counters 2 & 3 for some other group (RMID)?
>>
>> Or is this just a s/w implementation detail because
>> you have a system wide allocator for counters?
>
> There is no hardware limitation. It is how resctrl is designed.
> In case of Intel (with two sockets, 16 CLOSIDs), you can only create 16
> groups. Each group will have two domains (domain 0 for socket 0 and domain 1
> for socket 1).
>
> # cat schemata
> MB:0=100;1=100
> L3:0=ffff;1=ffff;
>
> We may have to think of addressing this sometime in the future.

In this example, the hardware would support using the instances
of counters 2 & 3 on socket 1 for a different group (RMID).
But your code doesn't allow it because the instances of
counters 2 & 3 are active on socket 0.

If you had a separate counter allocation pool for each domain
you would not have this limitation. When counters 2 & 3 are
freed on domain 1, they could be allocated to the domain 1
element of some other group.

Maybe that isn't an interesting use case, so not worth doing?

But if that is the goal, then there is no benefit in having
/sys/fs/resctrl/info/L3_MON/mbm_assign_control allow different
domains to choose different counter allocation policies. E.g. in
this example from Documentation:

/child_default_mon_grp/0=tl;1=l;

This group allocated two counters (because domain 0 is counting
both total and local). Domain 1 is only counting local, but that
means a counter on domain 1 is sitting idle. It can't be used
because the matching counter is active on domain 0.

I.e. the user who chose this simply gave up being able to
read total bandwidth on domain 1, but didn't get an extra
counter in exchange for this sacrifice. That doesn't seem
like a good deal.

I see two options for improvement:

1) Implement per-domain allocation of counters. Then a counter
freed in a domain becomes available for use in that domain for
other groups.

2) Go all-in on the global counter model and simplify the syntax
of mbm_assign_control to allocate the same counters in all
domains. That would simplify the parsing code.

-Tony

Hi Tony,

On 10/11/24 2:36 PM, Tony Luck wrote:
> On Fri, Oct 11, 2024 at 03:49:48PM -0500, Moger, Babu wrote:
>
> I.e. the user who chose this simply gave up being able to
> read total bandwidth on domain 1, but didn't get an extra
> counter in exchange for this sacrifice. That doesn't seem
> like a good deal.

As Babu mentioned earlier, this seems equivalent to the existing
CLOSid management. For example, if a user assigns only CPUs
from one domain to a resource group, it does not free up the
CLOSID to create a new resource group dedicated to other domain(s).

Reinette

On 10/14/24 11:46, Reinette Chatre wrote:
> Hi Tony,
>
> On 10/11/24 2:36 PM, Tony Luck wrote:
>> On Fri, Oct 11, 2024 at 03:49:48PM -0500, Moger, Babu wrote:
>>
>> I.e. the user who chose this simply gave up being able to
>> read total bandwidth on domain 1, but didn't get an extra
>> counter in exchange for this sacrifice. That doesn't seem
>> like a good deal.
>
> As Babu mentioned earlier, this seems equivalent to the existing
> CLOSid management. For example, if a user assigns only CPUs
> from one domain to a resource group, it does not free up the
> CLOSID to create a new resource group dedicated to other domain(s).

Thanks for the confirmation here.

I was wondering if this works differently on Intel. I was trying to figure
out on a 2 socket Intel system if we can create two separate resctrl groups
sharing the same CLOSID (one group using CLOSID 1 on socket 0 and another
group CLOSID 1 on socket 1). No. We cannot do that.

Even though hardware supports separate allocation for each domain, resctrl
design does not support that.

--
Thanks
Babu Moger

>>> I.e. the user who chose this simply gave up being able to
>>> read total bandwidth on domain 1, but didn't get an extra
>>> counter in exchange for this sacrifice. That doesn't seem
>>> like a good deal.
>>
>> As Babu mentioned earlier, this seems equivalent to the existing
>> CLOSid management. For example, if a user assigns only CPUs
>> from one domain to a resource group, it does not free up the
>> CLOSID to create a new resource group dedicated to other domain(s).

I hadn't considered the case where a user is assigning CPUs to resctrl
groups instead of assigning tasks. With that context this makes sense
to me now. Thanks.

> Thanks for the confirmation here.
>
> I was wondering if this works differently on Intel. I was trying to figure
> out on a 2 socket Intel system if we can create two separate resctrl groups
> sharing the same CLOSID (one group using CLOSID 1 on socket 0 and another
> group CLOSID 1 on socket 1). No. We cannot do that.
>
> Even though hardware supports separate allocation for each domain, resctrl
> design does not support that.

So CLOSIDs and counters are blanket assigned across all domains. I understand
that now.

Back to my question of why complicate code and resctrl files by providing a
mechanism to enable event counters differently per-domain.

"0=tl;1=_" requires allocation of the same counters as "0=tl;1=tl" or
"0=t;1=l"

What advantage does it have over skipping the per-domain list and
just providing a single value for all domains? You clearly expect this
will be a common user request since you implemented the "*" means
apply to all domains.

-Tony

Hi Tony,

On 10/14/24 12:49, Luck, Tony wrote:
[...]
> So CLOSIDs and counters are blanket assigned across all domains. I understand
> that now.
>
> Back to my question of why complicate code and resctrl files by providing a
> mechanism to enable event counters differently per-domain.
>
> "0=tl;1=_" requires allocation of the same counters as "0=tl;1=tl" or
> "0=t;1=l"

Yes. That is correct.

> What advantage does it have over skipping the per-domain list and
> just providing a single value for all domains? You clearly expect this
> will be a common user request since you implemented the "*" means
> apply to all domains.

We started with a global assignment by applying assignment across all the
domains initially.

But we wanted to give a generic approach which allows both the options
(domain specific assignment and global assignment with '*'). It also
matches with other managements (RMID/CLOSID management) we are doing in
resctrl right now. Also, there is an extra IPI for each domain if the user
is only interested in one domain.

Some of the discussions are here.
https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r

Thanks
Babu Moger

>> What advantage does it have over skipping the per-domain list and
>> just providing a single value for all domains? You clearly expect this
>> will be a common user request since you implemented the "*" means
>> apply to all domains.
>
> We started with a global assignment by applying assignment across all the
> domains initially.
>
> But we wanted to give a generic approach which allows both the options
> (domain specific assignment and global assignment with '*'). It also
> matches with other managements (RMID/CLOSID management) we are doing in
> resctrl right now. Also, there is an extra IPI for each domain if the user
> is only interested in one domain.
>
> Some of the discussions are here.
> https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r

My summary of that:

Peter: Complex, don't need per-domain.
Reinette: Maybe some architecture might want per-domain.

Since you seem to want to keep the flexibility for a possible future
where per-domain is needed, the "available_mbm_cntrs" file
suggested in another thread would need to list available counters
on each domain to avoid ABI problems should that future arrive.

$ cat num_mbm_cntrs
32

$ cat available_mbm_cntrs
0=12;1=9

Current implementation would show same number for all domains.

-Tony
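
A rough sketch of a per-domain read handler for the suggested
"available_mbm_cntrs" file, assuming the struct rdt_mon_domain /
r->mon_domains layout of this era of resctrl. The per-domain free-counter
bitmap (d->cntr_free_map) is hypothetical; with the global allocator in the
series as posted, every domain would show the same value:

	/* Sketch only: print "<domain id>=<free counters>" pairs. */
	static int rdtgroup_available_mbm_cntrs_show(struct kernfs_open_file *of,
						     struct seq_file *s, void *v)
	{
		struct rdt_resource *r = of->kn->parent->priv;
		struct rdt_mon_domain *d;
		char sep = '\0';

		mutex_lock(&rdtgroup_mutex);
		list_for_each_entry(d, &r->mon_domains, hdr.list) {
			if (sep)
				seq_putc(s, sep);
			/* d->cntr_free_map: hypothetical per-domain bitmap */
			seq_printf(s, "%d=%u", d->hdr.id,
				   bitmap_weight(d->cntr_free_map,
						 r->mon.num_mbm_cntrs));
			sep = ';';
		}
		seq_putc(s, '\n');
		mutex_unlock(&rdtgroup_mutex);

		return 0;
	}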

Hi Tony,

On 10/14/24 12:51 PM, Luck, Tony wrote:
>>> What advantage does it have over skipping the per-domain list and
>>> just providing a single value for all domains? You clearly expect this
>>> will be a common user request since you implemented the "*" means
>>> apply to all domains.
>>
>> We started with a global assignment by applying assignment across all the
>> domains initially.
>>
>> But we wanted to give a generic approach which allows both the options
>> (domain specific assignment and global assignment with '*'). It also
>> matches with other managements (RMID/CLOSID management) we are doing in
>> resctrl right now. Also, there is an extra IPI for each domain if the user
>> is only interested in one domain.
>>
>> Some of the discussions are here.
>> https://lore.kernel.org/lkml/f7dac996d87b4144e4c786178a7fd3d218eaebe8.1711674410.git.babu.moger@amd.com/#r
>
> My summary of that:
>
> Peter: Complex, don't need per-domain.
> Reinette: Maybe some architecture might want per-domain.

To be specific ... we already have an architecture that supports per-domain:
AMD's ABMC. When I considered the lifetime of user interfaces (forever?)
while knowing that ABMC does indeed support per-domain counter assignment,
it seems a good precaution for the user interface to support that, even if
the first implementation does not.

There are two parts to this work: (a) the new user interface and (b) support
for ABMC. I believe that the user interface has to be flexible to support
all ABMC features that users may want to take advantage of, even if the
first implementation does not enable those features. In addition, the user
interface should support future usages that we know of, "soft-ABMC" and
MPAM.

I do not think that we should require all implementations to support
everything made possible by the user interface though. As I mentioned in
that thread [1] I do think that the user _interface_ needs to be flexible by
supporting domain level counter assignment, but that it may be possible that
the _implementation_ only supports assignment to '*' domain values.

I thus do not think we should simplify the syntax of mbm_assign_control, but
I also do not think we should require that all implementations support all
that the syntax makes possible.

> Since you seem to want to keep the flexibility for a possible future
> where per-domain is needed, the "available_mbm_cntrs" file
> suggested in another thread would need to list available counters
> on each domain to avoid ABI problems should that future arrive.
>
> $ cat num_mbm_cntrs
> 32
>
> $ cat available_mbm_cntrs
> 0=12;1=9

Good point.

> Current implementation would show same number for all domains.

Reinette

[1] https://lore.kernel.org/all/c8a23c54-237c-4ebb-9c88-39606b9ae1ab@intel.com/

Hi Reinette/Tony,

On 10/14/24 15:05, Reinette Chatre wrote:
> Hi Tony,
>
> On 10/14/24 12:51 PM, Luck, Tony wrote:
[...]
>> Since you seem to want to keep the flexibility for a possible future
>> where per-domain is needed, the "available_mbm_cntrs" file
>> suggested in another thread would need to list available counters
>> on each domain to avoid ABI problems should that future arrive.
>>
>> $ cat num_mbm_cntrs
>> 32
>>
>> $ cat available_mbm_cntrs
>> 0=12;1=9
>
> Good point.

Working on this now. Wanted to confirm if we really need domain specific
information?

To me, it does not seem necessary for the user. User cannot make any
decisions based on this information.

All user wants to know is if there are global counters available.

$ cat num_mbm_cntrs
32

$ cat available_mbm_cntrs
15

--
Thanks
Babu Moger

>>> Since you seem to want to keep the flexibility for a possible future
>>> where per-domain is needed, the "available_mbm_cntrs" file
>>> suggested in another thread would need to list available counters
>>> on each domain to avoid ABI problems should that future arrive.
>>>
>>> $ cat num_mbm_cntrs
>>> 32
>>>
>>> $ cat available_mbm_cntrs
>>> 0=12;1=9
>>
>> Good point.
>
> Working on this now. Wanted to confirm if we really need domain specific
> information?
>
> To me, it does not seem necessary for the user. User cannot make any
> decisions based on this information.
>
> All user wants to know is if there are global counters available.
>
> $ cat num_mbm_cntrs
> 32
>
> $ cat available_mbm_cntrs
> 15

This approach paints resctrl into an ABI corner where it can't later
update resctrl to track counters per-domain. Maybe you'll never want to do
that, but some other architecture might want to have that flexibility.

-Tony

Hi Tony,

On 10/24/2024 12:37 PM, Luck, Tony wrote:
[...]
>> All user wants to know is if there are global counters available.
>>
>> $ cat num_mbm_cntrs
>> 32
>>
>> $ cat available_mbm_cntrs
>> 15
>
> This approach paints resctrl into an ABI corner where it can't later
> update resctrl to track counters per-domain. Maybe you'll never want to do
> that, but some other architecture might want to have that flexibility.

Ok. Fine. Lets keep the per-domain counters.

--
- Babu Moger

Hi Reinette/Tony,

On 10/14/24 15:05, Reinette Chatre wrote:
> Hi Tony,
>
> On 10/14/24 12:51 PM, Luck, Tony wrote:
[...]
>> Since you seem to want to keep the flexibility for a possible future
>> where per-domain is needed, the "available_mbm_cntrs" file
>> suggested in another thread would need to list available counters
>> on each domain to avoid ABI problems should that future arrive.
>>
>> $ cat num_mbm_cntrs
>> 32
>>
>> $ cat available_mbm_cntrs
>> 0=12;1=9
>
> Good point.

Ok. Will add it.

Thanks
Babu Moger