Introduce the interface file "mbm_assign_mode" to list monitor modes
supported.

The "mbm_cntr_assign" mode provides the option to assign a counter to
an RMID, event pair and monitor the bandwidth as long as it is assigned.

On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
Bandwidth Monitoring Counters) hardware feature and is enabled by default.

The "default" mode is the existing monitoring mode that works without
explicit counter assignment. Instead, it relies on the hardware to assign
counters dynamically, and the hardware may not dedicate a counter to a given
RMID, event pair. Monitoring data reads for such a pair return "Unavailable".

Provide an interface to display the monitor mode on the system.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_assign]
default

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v9: Updated user documentation based on comments.

v8: Commit message update.

v7: Updated the descriptions/commit log in resctrl.rst to generic text.
    Thanks to James and Reinette.
    Renamed mbm_mode to mbm_assign_mode.
    Introduced mutex lock in rdtgroup_mbm_mode_show().

v6: Added documentation for mbm_cntr_assign and legacy mode.
    Moved mbm_mode fflags initialization to static initialization.

v5: Changed interface name to mbm_mode.
    It will always be available even if the ABMC feature is not supported.
    Added description in resctrl.rst about ABMC mode.
    Made the display of abmc and legacy consistent.

v4: Fixed the checks for legacy and abmc mode. Default is ABMC.

v3: New patch to display ABMC capability.
---
Documentation/arch/x86/resctrl.rst | 33 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
2 files changed, 64 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 30586728a4cd..a93d7980e25f 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,39 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x30;1=0x30;3=0x15;4=0x15
+"mbm_assign_mode":
+ Reports the list of monitoring modes supported. The enclosed brackets
+ indicate which mode is enabled.
+ ::
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
+ [mbm_cntr_assign]
+ default
+
+ "mbm_cntr_assign":
+
+ In mbm_cntr_assign mode user-space is able to specify which of the
+ events in CTRL_MON or MON groups should have a counter assigned using the
+ "mbm_assign_control" file. The number of counters available is described
+ in the "num_mbm_cntrs" file. Changing the mode may cause all counters on
+ a resource to reset.
+
+ The mode is useful on platforms which support more CTRL_MON and MON
+ groups than the hardware counters, meaning 'unassigned' events on CTRL_MON or
+ MON groups will report 'Unavailable' or count the traffic in an unpredictable
+ way.
+
+ AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
+ enable this mode by default so that counters remain assigned even when the
+ corresponding RMID is not in use by any processor.
+
+ "default":
+
+ In default mode resctrl assumes there is a hardware counter for each
+ event within every CTRL_MON and MON group. Reading mbm_total_bytes or
+ mbm_local_bytes may report 'Unavailable' if there is no counter associated
+ with that event.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d54c2701c09c..f25ff1430014 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -845,6 +845,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
return ret;
}
+static int rdtgroup_mbm_assign_mode_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ mutex_lock(&rdtgroup_mutex);
+
+ if (r->mon.mbm_cntr_assignable) {
+ if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ seq_puts(s, "[mbm_cntr_assign]\n");
+ seq_puts(s, "default\n");
+ } else {
+ seq_puts(s, "mbm_cntr_assign\n");
+ seq_puts(s, "[default]\n");
+ }
+ } else {
+ seq_puts(s, "[default]\n");
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1901,6 +1925,13 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "mbm_assign_mode",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_assign_mode_show,
+ .fflags = RFTYPE_MON_INFO,
+ },
{
.name = "cpus",
.mode = 0644,
--
2.34.1
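The mbm_assign_mode file added by this patch prints one mode per line and encloses the enabled mode in brackets. A user-space consumer might extract the enabled mode, and check whether "mbm_cntr_assign" is offered at all, along the lines of the sketch below. The helpers active_assign_mode(), mode_listed() and friends are hypothetical and not part of resctrl; the sketch only assumes the output format shown in the patch (the caller is assumed to have read /sys/fs/resctrl/info/L3_MON/mbm_assign_mode into a NUL-terminated buffer).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helpers (not part of resctrl) that parse the contents of
 * /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: one mode name per line,
 * with the enabled mode enclosed in brackets.
 */

/* Length of the line starting at @line, excluding the newline. */
static size_t line_len(const char *line)
{
	const char *nl = strchr(line, '\n');

	return nl ? (size_t)(nl - line) : strlen(line);
}

/* Strip the brackets from "[name]"; returns 1 if the line was bracketed. */
static int strip_brackets(const char **name, size_t *len)
{
	if (*len >= 2 && (*name)[0] == '[' && (*name)[*len - 1] == ']') {
		(*name)++;
		*len -= 2;
		return 1;
	}
	return 0;
}

/* Copy the enabled (bracketed) mode into @out; 0 on success, -1 otherwise. */
static int active_assign_mode(const char *contents, char *out, size_t outlen)
{
	const char *line = contents;

	while (*line) {
		size_t len = line_len(line);
		const char *name = line;

		if (strip_brackets(&name, &len)) {
			if (len + 1 > outlen)
				return -1;
			memcpy(out, name, len);
			out[len] = '\0';
			return 0;
		}
		/* Not bracketed: len is unchanged, skip the line and its newline. */
		line += len + (line[len] == '\n');
	}
	return -1;
}

/* Returns 1 if @mode appears in the list, whether enabled or not. */
static int mode_listed(const char *contents, const char *mode)
{
	size_t mlen = strlen(mode);
	const char *line = contents;

	while (*line) {
		size_t len = line_len(line);
		size_t next = len + (line[len] == '\n');
		const char *name = line;

		strip_brackets(&name, &len);
		if (len == mlen && memcmp(name, mode, mlen) == 0)
			return 1;
		line += next;
	}
	return 0;
}
```

Fed the output shown in the commit message ("[mbm_cntr_assign]\ndefault\n"), active_assign_mode() yields "mbm_cntr_assign". On a platform without assignable counters the show function prints only "[default]", so mode_listed(..., "mbm_cntr_assign") returns 0 there.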
Hi Babu,

On 10/29/24 4:21 PM, Babu Moger wrote:
> Introduce the interface file "mbm_assign_mode" to list monitor modes
> supported.

...

> + The mode is useful on platforms which support more CTRL_MON and MON
> + groups than the hardware counters, meaning 'unassigned' events on CTRL_MON or

" than the hardware counters" -> " than hardware counters"?

> + MON groups will report 'Unavailable' or count the traffic in an unpredictable
> + way.

I think the above can be confusing to users. It mentioned "*will* report Unavailable"
and then "*or* count the traffic in an unpredictable way". It is not possible for
counter to report "Unavailable" while also reporting unpredictable data.

My concern is that there is no way for a user to know if the platform supports more
CTRL_MON and MON groups than hardware counters and the above seems to imply that counters
may be unreliable ... so how does a user know if counters are unreliable or not?

Can this be made specific to help users know if their platforms are impacted? From
what I know all AMD platforms are impacted so perhaps a straight-forward:

"The mode is useful on AMD platforms which support more CTRL_MON and MON ..."

I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode
to make the event data "more predictable" and then be concerned when the mode does
not exist.

As an alternative, is it possible to know the number of hardware counters on AMD systems
without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for
users to know if their platform may be impacted by this type of "unpredictability" (by comparing
num_mbm_cntrs to num_rmids).

> +
> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
> + enable this mode by default so that counters remain assigned even when the
> + corresponding RMID is not in use by any processor.
> +
> + "default":
> +
> + In default mode resctrl assumes there is a hardware counter for each
> + event within every CTRL_MON and MON group. Reading mbm_total_bytes or
> + mbm_local_bytes may report 'Unavailable' if there is no counter associated
> + with that event.

If I understand correctly, on AMD platforms without ABMC the events only report
"Unavailable" if there is no counter assigned at the time of the query. If a counter
is unassigned and then reassigned then the event count will reset and the user
will get some data back but it may thus be unpredictable (to match earlier language).
Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
"unpredictable" event counts (not just "Unavailable") ... this gets complicated
because users should be steered to avoid "default" mode if mbm_assign_mode is
available, while not be made concerned to use "default" mode on Intel where
mbm_assign_mode is not available.

Reinette
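Reinette's alternative above (comparing num_mbm_cntrs to num_rmids) amounts to a capacity check. The sketch below assumes, as the commit message states, that each counter is assigned to one RMID, event pair, so keeping every pair counted at once needs num_rmids times the number of monitored events (for example 2 for mbm_total_bytes and mbm_local_bytes). The values would come from info/L3_MON/num_rmids and, where it exists, the num_mbm_cntrs file the patch documentation mentions; all helper names are hypothetical. As the rest of the thread shows, non-ABMC AMD systems offer no reliable way to obtain a hardware counter count, so such a check only applies where num_mbm_cntrs is actually exposed.

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned decimal value from an open info file. */
static int read_ulong(FILE *f, unsigned long *out)
{
	return fscanf(f, "%lu", out) == 1 ? 0 : -1;
}

/*
 * Counters needed to keep every (RMID, event) pair assigned at once,
 * assuming one counter per pair (overflow is ignored in this sketch).
 */
static unsigned long counters_needed(unsigned long num_rmids, unsigned int nr_events)
{
	return num_rmids * nr_events;
}

/* Non-zero if some (RMID, event) pairs would have to go without a counter. */
static int may_run_out_of_counters(unsigned long num_rmids, unsigned int nr_events,
				   unsigned long num_mbm_cntrs)
{
	return counters_needed(num_rmids, nr_events) > num_mbm_cntrs;
}
```

For a hypothetical system reporting num_rmids = 128 and num_mbm_cntrs = 32 while monitoring both MBM events, 256 counters would be needed, so may_run_out_of_counters(128, 2, 32) returns 1.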
Hi Reinette,

On 11/15/24 18:00, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/29/24 4:21 PM, Babu Moger wrote:
>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>> supported.

...

>> + The mode is useful on platforms which support more CTRL_MON and MON
>> + groups than the hardware counters, meaning 'unassigned' events on CTRL_MON or
>
> " than the hardware counters" -> " than hardware counters"?

Sure.

>
>> + MON groups will report 'Unavailable' or count the traffic in an unpredictable
>> + way.
>
> I think the above can be confusing to users. It mentioned "*will* report Unavailable"
> and then "*or* count the traffic in an unpredictable way". It is not possible for
> counter to report "Unavailable" while also reporting unpredictable data.
>
> My concern is that there is no way for a user to know if the platform supports more
> CTRL_MON and MON groups than hardware counters and the above seems to imply that counters
> may be unreliable ... so how does a user know if counters are unreliable or not?

That is correct. There is no definite way to find out if the counters are unreliable.

>
> Can this be made specific to help users know if their platforms are impacted? From
> what I know all AMD platforms are impacted so perhaps a straight-forward:
>
> "The mode is useful on AMD platforms which support more CTRL_MON and MON ..."

Sure.

>
> I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode
> to make the event data "more predictable" and then be concerned when the mode does
> not exist.
>
> As an alternative, is it possible to know the number of hardware counters on AMD systems
> without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for
> users to know if their platform may be impacted by this type of "unpredictability" (by comparing
> num_mbm_cntrs to num_rmids).

There is some round about(or hacky) way to find that out number of RMIDs
that can be active.

>
>> +
>> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>> + enable this mode by default so that counters remain assigned even when the
>> + corresponding RMID is not in use by any processor.
>> +
>> + "default":
>> +
>> + In default mode resctrl assumes there is a hardware counter for each
>> + event within every CTRL_MON and MON group. Reading mbm_total_bytes or
>> + mbm_local_bytes may report 'Unavailable' if there is no counter associated
>> + with that event.
>
> If I understand correctly, on AMD platforms without ABMC the events only report
> "Unavailable" if there is no counter assigned at the time of the query. If a counter
> is unassigned and then reassigned then the event count will reset and the user
> will get some data back but it may thus be unpredictable (to match earlier language).
> Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
> "unpredictable" event counts (not just "Unavailable") ... this gets complicated

Yes. All the AMD systems without ABMC are affected by this problem.

> because users should be steered to avoid "default" mode if mbm_assign_mode is
> available, while not be made concerned to use "default" mode on Intel where
> mbm_assign_mode is not available.

Can we add text to clarify this?

>
> Reinette
>

--
Thanks
Babu Moger
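The behaviour Reinette describes and Babu confirms above (reads report "Unavailable" while no counter tracks the RMID, and a counter that is re-acquired starts again from zero) can be illustrated with a toy model. It also follows the AMD text Babu quotes later in this thread, where the first read after tracking begins is flagged as unavailable. This is a sketch, not a model of any specific AMD part: the number of counter slots (2), the number of RMIDs (4) and round-robin eviction are all assumptions, since the thread does not say how hardware picks the RMID to stop tracking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy sizes (assumptions): fewer counter slots than RMIDs. */
#define TOY_NR_RMIDS 4
#define TOY_NR_SLOTS 2

struct toy_rmid {
	bool tracked;     /* a counter slot currently tracks this RMID */
	bool unavail;     /* next read reports "Unavailable" */
	uint64_t bytes;   /* traffic counted since tracking (re)started */
};

struct toy_hw {
	int slot_rmid[TOY_NR_SLOTS];  /* RMID held by each slot, -1 if free */
	int next_victim;              /* round-robin eviction cursor (assumption) */
	struct toy_rmid rmid[TOY_NR_RMIDS];
};

static void toy_init(struct toy_hw *hw)
{
	for (int i = 0; i < TOY_NR_SLOTS; i++)
		hw->slot_rmid[i] = -1;
	hw->next_victim = 0;
	for (int r = 0; r < TOY_NR_RMIDS; r++)
		hw->rmid[r] = (struct toy_rmid){ .tracked = false, .unavail = true, .bytes = 0 };
}

/* A CPU starts running with @rmid; the model assumes the evicted RMID is idle. */
static void toy_schedule(struct toy_hw *hw, int rmid)
{
	struct toy_rmid *s = &hw->rmid[rmid];
	int slot;

	if (s->tracked)
		return;

	slot = hw->next_victim;
	hw->next_victim = (slot + 1) % TOY_NR_SLOTS;

	if (hw->slot_rmid[slot] >= 0) {
		/* The RMID previously in this slot loses its counter. */
		hw->rmid[hw->slot_rmid[slot]].tracked = false;
		hw->rmid[hw->slot_rmid[slot]].unavail = true;
	}

	hw->slot_rmid[slot] = rmid;
	s->tracked = true;
	s->unavail = true;  /* not yet read since tracking began */
	s->bytes = 0;       /* counts are cleared when tracking begins */
}

static void toy_traffic(struct toy_hw *hw, int rmid, uint64_t bytes)
{
	if (hw->rmid[rmid].tracked)
		hw->rmid[rmid].bytes += bytes;
}

/* Returns false for "Unavailable"; otherwise stores the count in @val. */
static bool toy_read(struct toy_hw *hw, int rmid, uint64_t *val)
{
	struct toy_rmid *s = &hw->rmid[rmid];

	if (!s->tracked)
		return false;
	if (s->unavail) {
		s->unavail = false;  /* a later read may succeed */
		return false;
	}
	*val = s->bytes;
	return true;
}
```

Scheduling RMID 0, letting RMIDs 1 and 2 take both slots, and then scheduling RMID 0 again shows the 100 bytes counted before eviction silently disappearing: the final successful read returns only the bytes counted after tracking restarted. That is the "unpredictable" outcome, as opposed to a plain "Unavailable".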
Hi Babu,

On 11/18/24 11:04 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 11/15/24 18:00, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/29/24 4:21 PM, Babu Moger wrote:
>>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>>> supported.

...

>> I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode
>> to make the event data "more predictable" and then be concerned when the mode does
>> not exist.
>>
>> As an alternative, is it possible to know the number of hardware counters on AMD systems
>> without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for
>> users to know if their platform may be impacted by this type of "unpredictability" (by comparing
>> num_mbm_cntrs to num_rmids).
>
> There is some round about(or hacky) way to find that out number of RMIDs
> that can be active.

Does this give consistent and accurate data? Is this something that can be added to resctrl?
(Reading your other message [1] it does not sound as though it can produce an accurate
number on boot.)
If not then it will be up to the documentation to be accurate.

>>> +
>>> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>>> + enable this mode by default so that counters remain assigned even when the
>>> + corresponding RMID is not in use by any processor.
>>> +
>>> + "default":
>>> +
>>> + In default mode resctrl assumes there is a hardware counter for each
>>> + event within every CTRL_MON and MON group. Reading mbm_total_bytes or
>>> + mbm_local_bytes may report 'Unavailable' if there is no counter associated
>>> + with that event.
>>
>> If I understand correctly, on AMD platforms without ABMC the events only report
>> "Unavailable" if there is no counter assigned at the time of the query. If a counter
>> is unassigned and then reassigned then the event count will reset and the user
>> will get some data back but it may thus be unpredictable (to match earlier language).
>> Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
>> "unpredictable" event counts (not just "Unavailable") ... this gets complicated
>
> Yes. All the AMD systems without ABMC are affected by this problem.
>
>> because users should be steered to avoid "default" mode if mbm_assign_mode is
>> available, while not be made concerned to use "default" mode on Intel where
>> mbm_assign_mode is not available.
>
> Can we add text to clarify this?

Please do.

Reinette

[1] https://lore.kernel.org/all/35fc70fd-0281-4ac8-b32b-efa2f4516901@amd.com/
Hi Reinette,

On 11/18/2024 4:07 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/18/24 11:04 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 11/15/24 18:00, Reinette Chatre wrote:
>>> Hi Babu,

...

>> There is some round about(or hacky) way to find that out number of RMIDs
>> that can be active.
>
> Does this give consistent and accurate data? Is this something that can be added to resctrl?
> (Reading your other message [1] it does not sound as though it can produce an accurate
> number on boot.)
> If not then it will be up to the documentation to be accurate.

...

>>> because users should be steered to avoid "default" mode if mbm_assign_mode is
>>> available, while not be made concerned to use "default" mode on Intel where
>>> mbm_assign_mode is not available.
>>
>> Can we add text to clarify this?
>
> Please do.

I think we need to add text about AMD systems. How about this?

"default":
In default mode resctrl assumes there is a hardware counter for each
event within every CTRL_MON and MON group. On AMD systems with 16 more monitoring groups, reading mbm_total_bytes or mbm_local_bytes may report 'Unavailable' if there is no counter associated with that event. It is therefore recommended to use the 'mbm_cntr_assign' mode, if supported."

>
> Reinette
>
> [1] https://lore.kernel.org/all/35fc70fd-0281-4ac8-b32b-efa2f4516901@amd.com/
>

--
- Babu Moger
Hi Babu,

On 11/22/24 10:25 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 11/18/2024 4:07 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 11/18/24 11:04 AM, Moger, Babu wrote:

...

>>> There is some round about(or hacky) way to find that out number of RMIDs
>>> that can be active.
>>
>> Does this give consistent and accurate data? Is this something that can be added to resctrl?
>> (Reading your other message [1] it does not sound as though it can produce an accurate
>> number on boot.)
>> If not then it will be up to the documentation to be accurate.

...

>>> Can we add text to clarify this?
>>
>> Please do.
>
> I think we need to add text about AMD systems. How about this?
>
> "default":
> In default mode resctrl assumes there is a hardware counter for each
> event within every CTRL_MON and MON group. On AMD systems with 16 more monitoring groups, reading mbm_total_bytes or mbm_local_bytes may report 'Unavailable' if there is no counter associated with that event. It is therefore recommended to use the 'mbm_cntr_assign' mode, if supported."

What is meant with "On AMD systems with 16 more monitoring groups"? First, the language is
not clear, second, you mentioned earlier that there is just a "hacky" way to determine number
of RMIDs that can be active but here "16" is made official in the documentation?

Reinette
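The proposed text says reads of mbm_total_bytes or mbm_local_bytes may report 'Unavailable', so a user-space reader has to treat the file content as either a byte count or a status word. The sketch below assumes a mon_data file holds a decimal count, "Unavailable", or "Error" (as far as we can tell, resctrl's mon_data show path prints the latter two for unavailable and error counter states); parse_mbm_value() and the enum are hypothetical names. A numeric result does not by itself prove the count is trustworthy: per the discussion above, a counter that was lost and re-acquired reports a number again after its reset.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum mbm_read_status {
	MBM_READ_OK,
	MBM_READ_UNAVAILABLE,
	MBM_READ_ERROR,
	MBM_READ_BAD_FORMAT,
};

/* Classify the contents of a mon_data event file such as mbm_total_bytes. */
static enum mbm_read_status parse_mbm_value(const char *buf, uint64_t *bytes)
{
	unsigned long long v;
	char *end;

	if (strncmp(buf, "Unavailable", strlen("Unavailable")) == 0)
		return MBM_READ_UNAVAILABLE;
	if (strncmp(buf, "Error", strlen("Error")) == 0)
		return MBM_READ_ERROR;

	/* strtoull() accepts leading blanks and a minus sign; require a digit. */
	if (!isdigit((unsigned char)buf[0]))
		return MBM_READ_BAD_FORMAT;

	errno = 0;
	v = strtoull(buf, &end, 10);
	if (errno == ERANGE || (*end != '\n' && *end != '\0'))
		return MBM_READ_BAD_FORMAT;

	*bytes = v;
	return MBM_READ_OK;
}
```

A monitoring tool would typically skip a sample on MBM_READ_UNAVAILABLE rather than treating it as zero traffic.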
Hi Reinette, On 11/22/2024 3:37 PM, Reinette Chatre wrote: > Hi Babu, > > On 11/22/24 10:25 AM, Moger, Babu wrote: >> Hi Reinette, >> >> On 11/18/2024 4:07 PM, Reinette Chatre wrote: >>> Hi Babu, >>> >>> On 11/18/24 11:04 AM, Moger, Babu wrote: >>>> Hi Reinette, >>>> >>>> On 11/15/24 18:00, Reinette Chatre wrote: >>>>> Hi Babu, >>>>> >>>>> On 10/29/24 4:21 PM, Babu Moger wrote: >>>>>> Introduce the interface file "mbm_assign_mode" to list monitor modes >>>>>> supported. >>>>>> >>>>>> The "mbm_cntr_assign" mode provides the option to assign a counter to >>>>>> an RMID, event pair and monitor the bandwidth as long as it is assigned. >>>>>> >>>>>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable >>>>>> Bandwidth Monitoring Counters) hardware feature and is enabled by default. >>>>>> >>>>>> The "default" mode is the existing monitoring mode that works without the >>>>>> explicit counter assignment, instead relying on dynamic counter assignment >>>>>> by hardware that may result in hardware not dedicating a counter resulting >>>>>> in monitoring data reads returning "Unavailable". >>>>>> >>>>>> Provide an interface to display the monitor mode on the system. >>>>>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode >>>>>> [mbm_cntr_assign] >>>>>> default >>>>>> >>>>>> Signed-off-by: Babu Moger <babu.moger@amd.com> >>>>>> --- >>> >>> ... >>> >>>>> I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode >>>>> to make the event data "more predictable" and then be concerned when the mode does >>>>> not exist. >>>>> >>>>> As an alternative, is it possible to know the number of hardware counters on AMD systems >>>>> without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for >>>>> users to know if their platform may be impacted by this type of "unpredictability" (by comparing >>>>> num_mbm_cntrs to num_rmids). 
>>>>
>>>> There is some round about(or hacky) way to find that out number of RMIDs
>>>> that can be active.
>>>
>>> Does this give consistent and accurate data? Is this something that can be added to resctrl?
>>> (Reading your other message [1] it does not sound as though it can produce an accurate
>>> number on boot.)
>>> If not then it will be up to the documentation to be accurate.
>>>
>>>
>>>>>> +
>>>>>> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>>>>>> + enable this mode by default so that counters remain assigned even when the
>>>>>> + corresponding RMID is not in use by any processor.
>>>>>> +
>>>>>> + "default":
>>>>>> +
>>>>>> + In default mode resctrl assumes there is a hardware counter for each
>>>>>> + event within every CTRL_MON and MON group. Reading mbm_total_bytes or
>>>>>> + mbm_local_bytes may report 'Unavailable' if there is no counter associated
>>>>>> + with that event.
>>>>>
>>>>> If I understand correctly, on AMD platforms without ABMC the events only report
>>>>> "Unavailable" if there is no counter assigned at the time of the query. If a counter
>>>>> is unassigned and then reassigned then the event count will reset and the user
>>>>> will get some data back but it may thus be unpredictable (to match earlier language).
>>>>> Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
>>>>> "unpredictable" event counts (not just "Unavailable") ... this gets complicated
>>>>
>>>> Yes. All the AMD systems without ABMC are affected by this problem.
>>>>
>>>>> because users should be steered to avoid "default" mode if mbm_assign_mode is
>>>>> available, while not be made concerned to use "default" mode on Intel where
>>>>> mbm_assign_mode is not available.
>>>>
>>>> Can we add text to clarify this?
>>>
>>> Please do.
>>
>> I think we need to add text about AMD systems. How about this?
>>
>> "default":
>> In default mode resctrl assumes there is a hardware counter for each
>> event within every CTRL_MON and MON group. On AMD systems with 16 more monitoring groups, reading mbm_total_bytes or mbm_local_bytes may report 'Unavailable' if there is no counter associated with that event. It is therefore recommended to use the 'mbm_cntr_assign' mode, if supported."
>
>
> What is meant with "On AMD systems with 16 more monitoring groups"? First, the language is
> not clear, second, you mentioned earlier that there is just a "hacky" way to determine number
> of RMIDs that can be active but here "16" is made official in the documentation?
>

The lowest number of RMIDs that can be active at the same time is 16. I could not determine it using the hacky way; instead, I verified it on all previous generations of AMD systems by creating monitoring groups until reads reported "Unavailable". On recent systems it is 32. We can drop the exact number to keep the text generic.

There is no clear documentation on this. Here is what the documentation says:

A given implementation may have insufficient hardware to simultaneously track the bandwidth for all RMID values which the hardware supports. If an attempt is made to read a Bandwidth Count for an RMID that has been impacted by these hardware limitations, the “U” bit of the QM_CTR will be set when the counter is read. Subsequent QM_CTR reads for that RMID and Event may return a value with the "U" bit clear. Potential causes of the “U” bit being set include (but are not limited to)

• RMID is not currently tracked by the hardware.
• RMID was not tracked by the hardware at some time since it was last read.
• RMID has not been read since it started being tracked by the hardware.

All RMIDs which are currently in use by one or more processors in the QOS domain will be tracked. The hardware will always begin tracking a new RMID value when it gets written to the PQR_ASSOC register of any of the processors in the QOS domain and it is not already being tracked. When the hardware begins tracking an RMID that it was not previously tracking, it will clear the QM_CTR for all events in the new RMID

- Babu Moger
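To make the "U" bit behavior in the excerpt concrete, here is a minimal C sketch that classifies a raw QM_CTR value. It assumes the bit positions behind resctrl's RMID_VAL_ERROR (bit 63) and RMID_VAL_UNAVAIL (bit 62) definitions and a caller-supplied counter width; the names qm_ctr_decode() and enum qm_ctr_status are hypothetical and are not part of resctrl.

```c
#include <stdint.h>

/* Bit positions assumed to match resctrl's RMID_VAL_ERROR / RMID_VAL_UNAVAIL. */
#define QM_CTR_ERROR_BIT	(1ULL << 63)
#define QM_CTR_UNAVAIL_BIT	(1ULL << 62)

/* Hypothetical result of classifying one QM_CTR read. */
enum qm_ctr_status {
	QM_CTR_STATUS_OK,
	QM_CTR_STATUS_ERROR,	/* "E": e.g. unsupported RMID or event */
	QM_CTR_STATUS_UNAVAIL,	/* "U": count not available, e.g. RMID not tracked */
};

/*
 * Classify a raw QM_CTR value. Only when neither status bit is set is the
 * value masked to the counter width (1..62 bits) and stored in *count.
 * "E" is checked before "U", the same order resctrl uses.
 */
static enum qm_ctr_status qm_ctr_decode(uint64_t raw, unsigned int width,
					uint64_t *count)
{
	if (raw & QM_CTR_ERROR_BIT)
		return QM_CTR_STATUS_ERROR;
	if (raw & QM_CTR_UNAVAIL_BIT)
		return QM_CTR_STATUS_UNAVAIL;
	*count = raw & ((1ULL << width) - 1);
	return QM_CTR_STATUS_OK;
}
```

Note that a single read only reports the state at that moment: per the excerpt, a later read of the same RMID and event may come back with the "U" bit clear.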
Hi Babu,

On 11/22/24 4:02 PM, Moger, Babu wrote:
> On 11/22/2024 3:37 PM, Reinette Chatre wrote:
>> On 11/22/24 10:25 AM, Moger, Babu wrote:
>>> On 11/18/2024 4:07 PM, Reinette Chatre wrote:
>>>> On 11/18/24 11:04 AM, Moger, Babu wrote:
>>>>> On 11/15/24 18:00, Reinette Chatre wrote:
>>>>>> On 10/29/24 4:21 PM, Babu Moger wrote:

...

> A given implementation may have insufficient hardware to simultaneously track the bandwidth for all RMID values which the hardware supports. If an attempt is made to read a Bandwidth Count for an RMID that has been impacted by these hardware limitations, the “U” bit of the
> QM_CTR will be set when the counter is read. Subsequent QM_CTR reads for that RMID and Event may return a value with the "U" bit clear. Potential causes of the “U” bit being set include (but are not limited to)
>
> • RMID is not currently tracked by the hardware.
> • RMID was not tracked by the hardware at some time since it was last read.
> • RMID has not been read since it started being tracked by the hardware.
>
> All RMIDs which are currently in use by one or more processors in the QOS domain will be tracked. The hardware will always begin tracking a new RMID value when it gets written to the PQR_ASSOC register of any of the processors in the QOS domain and it is not already being tracked. When the hardware begins tracking an RMID that it was not previously tracking, it will clear the QM_CTR for all events in the new RMID

I think I am starting to understand what is meant by "count the traffic in an unpredictable way". From what I understand, the hardware uses the "U" bit to indicate that an RMID was not tracked for a while, but it only sets this bit on the first read. After that, the "U" bit may be cleared if a counter can later be assigned to the RMID.

If only user space read the data, it would be clear to the user when the hardware limitation is encountered, and hardware behavior could thus be "predictable". However, since the overflow handler runs once per second, it may well be the overflow handler that encounters the "U" bit, and that bit is not currently handled. This could leave user space with the impression that events always return valid data when that data may in fact be wrong.

In another thread [1] Tony confirmed that the "U" bit is not returned by Intel systems, so this issue only impacts AMD. As I understand it, the other scenarios in which AMD systems can return "U" (the first read after assigning an RMID and the first read after changing the memory config) are all scenarios that can be controlled by resctrl.

I do not see why unpredictable data should be addressed with documentation. Could this not be fixed instead? Essentially stating "AMD systems without ABMC count the traffic in an unpredictable way" seems like a poor user experience.

What if resctrl instead handles the "U" bit better? For example, when the overflow handler encounters the "U" bit, the RMID could be permanently marked as "Unavailable". Would that not be better than the counter behaving unpredictably, with users never knowing whether they can trust the event counters?

Reinette

[1] https://lore.kernel.org/all/ZzUvA2XE01U25A38@agluck-desk3/
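The "permanently mark as Unavailable" idea can be sketched in a few lines of C. This is a minimal illustration, assuming a hypothetical sticky per-RMID flag (lost_track) that does not exist in the current arch_mbm_state: the periodic handler latches the "U" bit so that a later user read still reports "Unavailable" even after the hardware has cleared the bit. The struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define QM_CTR_UNAVAIL	(1ULL << 62)	/* "U" bit, assumed as in RMID_VAL_UNAVAIL */

/* Hypothetical software state for one RMID/event pair. */
struct rmid_state {
	bool lost_track;	/* sticky: a "U" read was seen at some point */
};

/* Periodic (overflow handler) read: latch the "U" bit instead of dropping it. */
static void overflow_handler_read(struct rmid_state *s, uint64_t raw)
{
	if (raw & QM_CTR_UNAVAIL)
		s->lost_track = true;
}

/*
 * User read: report "Unavailable" (return false) if this read or any
 * earlier read, including one made by the overflow handler, saw "U".
 */
static bool user_read(struct rmid_state *s, uint64_t raw, uint64_t *val)
{
	if (raw & QM_CTR_UNAVAIL)
		s->lost_track = true;
	if (s->lost_track)
		return false;
	*val = raw;
	return true;
}
```

Without the latch, the handler's read would be the only one that saw "U", and the next user read would return a plausible-looking count.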
Hi Reinette,

On 11/25/2024 12:17 PM, Reinette Chatre wrote:
> On 11/22/24 4:02 PM, Moger, Babu wrote:
>> On 11/22/2024 3:37 PM, Reinette Chatre wrote:
>>> On 11/22/24 10:25 AM, Moger, Babu wrote:
>>>> On 11/18/2024 4:07 PM, Reinette Chatre wrote:
>>>>> On 11/18/24 11:04 AM, Moger, Babu wrote:
>>>>>> On 11/15/24 18:00, Reinette Chatre wrote:
>>>>>>> On 10/29/24 4:21 PM, Babu Moger wrote:

...

> I do not see why unpredictable data should be addressed with documentation. Could this not be
> fixed instead? Essentially stating "AMD systems without ABMC count the traffic in an unpredictable
> way" seems like a poor user experience.
> What if instead resctrl handles the "U" bit better? For example, when the overflow
> handler encounters the "U" bit the RMID can be permanently marked as "Unavailable"? Would
> that not be better than the counter behaving unpredictably with users never knowing if they
> can trust the event counters?

Actually, I was looking at handling "Unavailable" in a slightly better way. Right now, I see that it reports "Unavailable" first, then goes into overflow and stays in overflow forever.

Also, setting the RMID Unavailable permanently is not a good option. We should have a way to reset it. At some later point the RMID can become active and report the correct numbers.

I was thinking of introducing a new arch state (in arch_mbm_state) to handle this case. I need to investigate this more. What do you think?

-- 
- Babu Moger
Hi Babu,

On 11/26/24 9:09 AM, Moger, Babu wrote:
> On 11/25/2024 12:17 PM, Reinette Chatre wrote:
>> On 11/22/24 4:02 PM, Moger, Babu wrote:
>>> On 11/22/2024 3:37 PM, Reinette Chatre wrote:
>>>> On 11/22/24 10:25 AM, Moger, Babu wrote:
>>>>> On 11/18/2024 4:07 PM, Reinette Chatre wrote:
>>>>>> On 11/18/24 11:04 AM, Moger, Babu wrote:
>>>>>>> On 11/15/24 18:00, Reinette Chatre wrote:
>>>>>>>> On 10/29/24 4:21 PM, Babu Moger wrote:

...

>> What if instead resctrl handles the "U" bit better? For example, when the overflow
>> handler encounters the "U" bit the RMID can be permanently marked as "Unavailable"? Would
>> that not be better than the counter behaving unpredictably with users never knowing if they
>> can trust the event counters?
>
> Actually, I was looking at handling "Unavailable" in little bit better way. Right now, I see it reports "Unavailable" first then it goes into overflow and stays in overflow forever.

Could you please elaborate on what you mean by "stays in overflow forever"?

>
> Also setting the RMID Unavailable permanently is not a good option. We should have a way to reset it. At some later point the RMID can become active and report the correct numbers.

I assume that an RMID becoming active again cannot be the trigger to reset it, since user space would then not be aware that a counter was not available for a while.

> I was thinking of introducing a new arch state(in arch_mbm_state) to handle this case. Need to investigate more on this. What do you think?
>

Some new state is surely needed to reflect that the RMID's data may be wrong. It is not clear to me how you envision the reset of the state. If it is driven from user space, then I expect that resctrl needs to be taught something about this and it cannot just be buried in arch code.

Reinette
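One way to picture the state discussed here is a small state machine in which only an explicit user-space request returns an RMID to the trusted state, while a valid hardware read after a "U" read does not. This is a minimal C sketch under that assumption; the states, struct, and function names are hypothetical, how user space would request the reset is left open (as it is in the thread), and counter wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define QM_CTR_UNAVAIL	(1ULL << 62)	/* "U" bit, assumed as in RMID_VAL_UNAVAIL */

/* Hypothetical tracking states for one RMID/event pair. */
enum mbm_track_state {
	MBM_TRACKED,	/* counts since the baseline can be trusted */
	MBM_LOST,	/* "U" was seen since the last user reset */
};

struct mbm_rmid {
	enum mbm_track_state state;
	bool have_base;		/* true once base holds a valid raw value */
	uint64_t base;		/* raw value that reported counts are relative to */
};

/* Any read (periodic handler or user): only a "U" read changes the state. */
static void mbm_observe(struct mbm_rmid *m, uint64_t raw)
{
	if (raw & QM_CTR_UNAVAIL) {
		m->state = MBM_LOST;
		return;
	}
	/* A valid read does not move a LOST RMID back to TRACKED. */
	if (m->state == MBM_TRACKED && !m->have_base) {
		m->base = raw;
		m->have_base = true;
	}
}

/* Explicit user-space reset: the only transition back to MBM_TRACKED. */
static void mbm_user_reset(struct mbm_rmid *m)
{
	m->state = MBM_TRACKED;
	m->have_base = false;	/* the next valid read becomes the new baseline */
}

/* User read: false means "Unavailable"; otherwise bytes since the baseline. */
static bool mbm_report(struct mbm_rmid *m, uint64_t raw, uint64_t *bytes)
{
	mbm_observe(m, raw);
	if (m->state == MBM_LOST || !m->have_base)
		return false;
	*bytes = raw - m->base;	/* wraparound ignored in this sketch */
	return true;
}
```

Restarting the baseline inside mbm_user_reset() means counts reported after a reset never mix traffic from before the RMID lost tracking, and user space knows the counts restarted because it asked for it.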
Hi Reinette,

On 11/26/2024 1:01 PM, Reinette Chatre wrote:
> On 11/26/24 9:09 AM, Moger, Babu wrote:
>> On 11/25/2024 12:17 PM, Reinette Chatre wrote:
>>> On 11/22/24 4:02 PM, Moger, Babu wrote:
>>>> On 11/22/2024 3:37 PM, Reinette Chatre wrote:
>>>>> On 11/22/24 10:25 AM, Moger, Babu wrote:
>>>>>> On 11/18/2024 4:07 PM, Reinette Chatre wrote:
>>>>>>> On 11/18/24 11:04 AM, Moger, Babu wrote:
>>>>>>>> On 11/15/24 18:00, Reinette Chatre wrote:
>>>>>>>>> On 10/29/24 4:21 PM, Babu Moger wrote:

...

>> Actually, I was looking at handling "Unavailable" in little bit better way. Right now, I see it reports "Unavailable" first then it goes into overflow and stays in overflow forever.
>
> Could you please elaborate what you mean with "stays in overflow forever"?

This may not be an issue. Once the overflow (a large number) happens, it stays in that state until there is another change. But we are only concerned about the delta, and the delta is fine.

>
>>
>> Also setting the RMID Unavailable permanently is not a good option. We should have a way to reset it. At some later point the RMID can become active and report the correct numbers.
>
> I assume that when an RMID becomes active cannot be the trigger to reset it since user space cannot
> then be aware that a counter was not available for a while.

Yes, that is correct.

>
>> I was thinking of introducing a new arch state(in arch_mbm_state) to handle this case. Need to investigate more on this. What do you think?
>>
>
> Some new state is surely needed to reflect that the RMID's data may be wrong. It is not clear to
> me how you envision the reset of the state. If it is driven from user space then I expect that
> resctrl needs to be taught something about this and it cannot just be buried in arch code.
>

Yes. We need to take a hard look at this.

-- 
- Babu Moger
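The point that only the delta matters can be illustrated with the width-shift technique used by resctrl's mbm_overflow_count(): shifting both raw values to the top of a 64-bit word lets unsigned subtraction absorb a single wraparound of a width-bit counter. This is a standalone sketch modeled on that helper, not the kernel code itself; the name mbm_delta() is hypothetical, and the result is only correct if the counter wrapped at most once between the two reads.

```c
#include <stdint.h>

/*
 * Delta between two reads of a width-bit hardware counter (1 <= width <= 64).
 * Shifting both values left by (64 - width) places the counter's top bit at
 * bit 63, so the unsigned subtraction wraps exactly like the hardware counter;
 * shifting back right yields the delta modulo 2^width.
 */
static uint64_t mbm_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;

	return ((cur << shift) - (prev << shift)) >> shift;
}
```

For example, with a 24-bit counter, a previous read of 0xFFFFF0 and a current read of 0x10 give a delta of 0x20 even though the raw value went down.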