Resctrl was written with the assumption that all monitor events
can be displayed as unsigned decimal integers.
Some telemetry events provide greater precision where architecture code
uses a fixed point format with 18 binary places.
Add a "display_format" field to struct mon_evt which can specify
that the value for the event be displayed as an integer for legacy
events, or as a floating point value with six decimal places converted
from the fixed point format received from architecture code.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl_types.h | 5 +++++
fs/resctrl/internal.h | 2 ++
fs/resctrl/ctrlmondata.c | 24 +++++++++++++++++++++++-
fs/resctrl/monitor.c | 21 ++++++++++++---------
4 files changed, 42 insertions(+), 10 deletions(-)
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index 5ef14a24008c..6245034f6c76 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -50,4 +50,9 @@ enum resctrl_event_id {
#define QOS_NUM_MBM_EVENTS (QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
#define MBM_EVENT_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
+/* Event value display formats */
+enum resctrl_event_fmt {
+ EVT_FORMAT_U64,
+ EVT_FORMAT_U46_18,
+};
#endif /* __LINUX_RESCTRL_TYPES_H */
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index d8aa69b42c74..aaa74a17257d 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -73,6 +73,7 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
* @configurable: true if the event is configurable
* @enabled: true if the event is enabled
* @any_cpu: true if the event can be read from any CPU
+ * @display_format: format to display value to users
* @list: entry in &rdt_resource->evt_list
*/
struct mon_evt {
@@ -82,6 +83,7 @@ struct mon_evt {
bool configurable;
bool enabled;
bool any_cpu;
+ enum resctrl_event_fmt display_format;
struct list_head list;
};
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 7a2957b9c13e..1544c103446b 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -569,6 +569,28 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
}
+#define NUM_FRAC_BITS 18
+#define FRAC_MASK GENMASK(NUM_FRAC_BITS - 1, 0)
+
+static void print_event_value(struct seq_file *m, enum resctrl_event_fmt type, u64 val)
+{
+ u64 frac;
+
+ switch (type) {
+ case EVT_FORMAT_U64:
+ seq_printf(m, "%llu\n", val);
+ break;
+ case EVT_FORMAT_U46_18:
+ frac = val & FRAC_MASK;
+ frac = frac * 1000000;
+ /* round values up to nearest decimal representation */
+ frac += 1ul << (NUM_FRAC_BITS - 1);
+ frac >>= NUM_FRAC_BITS;
+ seq_printf(m, "%llu.%06llu\n", val >> NUM_FRAC_BITS, frac);
+ break;
+ }
+}
+
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
@@ -637,7 +659,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
else
- seq_printf(m, "%llu\n", rr.val);
+ print_event_value(m, evt->display_format, rr.val);
out:
rdtgroup_kn_unlock(of->kn);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index e903d3c076ee..be78488a15e5 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -844,19 +844,22 @@ static void dom_data_exit(struct rdt_resource *r)
struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
[QOS_L3_OCCUP_EVENT_ID] = {
- .name = "llc_occupancy",
- .evtid = QOS_L3_OCCUP_EVENT_ID,
- .rid = RDT_RESOURCE_L3,
+ .name = "llc_occupancy",
+ .evtid = QOS_L3_OCCUP_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ .display_format = EVT_FORMAT_U64,
},
[QOS_L3_MBM_TOTAL_EVENT_ID] = {
- .name = "mbm_total_bytes",
- .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
- .rid = RDT_RESOURCE_L3,
+ .name = "mbm_total_bytes",
+ .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ .display_format = EVT_FORMAT_U64,
},
[QOS_L3_MBM_LOCAL_EVENT_ID] = {
- .name = "mbm_local_bytes",
- .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
- .rid = RDT_RESOURCE_L3,
+ .name = "mbm_local_bytes",
+ .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ .display_format = EVT_FORMAT_U64,
},
};
--
2.48.1
Hi Tony,
shortlog nit: "fs/resctrl: Support additional monitor event display formats"
On 4/28/25 5:33 PM, Tony Luck wrote:
> Resctrl was written with the assumption that all monitor events
> can be displayed as unsigned decimal integers.
>
> Some telemetry events provide greater precision where architecture code
> uses a fixed point format with 18 binary places.
>
> Add a "display_format" field to struct mon_evt which can specify
> that the value for the event be displayed as an integer for legacy
> events, or as a floating point value with six decimal places converted
> from the fixed point format received from architecture code.
There was no discussion on this during the previous version.
While this version addresses the issue of architecture changing the
format it does not address the issue of how to handle different
architecture formats. With this change any architecture that may
want to support any of these events will be required to translate
whatever format it uses into the one Intel uses to be translated
again into format for user space. Do you think this is reasonable?
Alternatively, resctrl could add additional file that contains the
format so that if an architecture in the future needs to present data
differently, an interface will exist to guide userspace how to parse it.
Creation of such user interface cannot be delayed until the time
it is needed since then these formats would be ABI.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl_types.h | 5 +++++
> fs/resctrl/internal.h | 2 ++
> fs/resctrl/ctrlmondata.c | 24 +++++++++++++++++++++++-
> fs/resctrl/monitor.c | 21 ++++++++++++---------
> 4 files changed, 42 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> index 5ef14a24008c..6245034f6c76 100644
> --- a/include/linux/resctrl_types.h
> +++ b/include/linux/resctrl_types.h
This needs to be internal to resctrl fs.
resctrl_types.h should only contain the types required in asm/resctrl.h
> @@ -50,4 +50,9 @@ enum resctrl_event_id {
> #define QOS_NUM_MBM_EVENTS (QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
> #define MBM_EVENT_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
>
> +/* Event value display formats */
Please add details about what each format means (how it should
be interpreted).
Reinette
On Thu, May 08, 2025 at 08:49:56AM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> shortlog nit: "fs/resctrl: Support additional monitor event display formats"
>
> On 4/28/25 5:33 PM, Tony Luck wrote:
> > Resctrl was written with the assumption that all monitor events
> > can be displayed as unsigned decimal integers.
> >
> > Some telemetry events provide greater precision where architecture code
> > uses a fixed point format with 18 binary places.
> >
> > Add a "display_format" field to struct mon_evt which can specify
> > that the value for the event be displayed as an integer for legacy
> > events, or as a floating point value with six decimal places converted
> > from the fixed point format received from architecture code.
>
> There was no discussion on this during the previous version.
> While this version addresses the issue of architecture changing the
> format it does not address the issue of how to handle different
> architecture formats. With this change any architecture that may
> want to support any of these events will be required to translate
> whatever format it uses into the one Intel uses to be translated
> again into format for user space. Do you think this is reasonable?
>
> Alternatively, resctrl could add additional file that contains the
> format so that if an architecture in the future needs to present data
> differently, an interface will exist to guide userspace how to parse it.
> Creation of such user interface cannot be delayed until the time
> it is needed since then these formats would be ABI.
What if resctrl filesystem allows architecture to supply the number
of binary places for fixed point values when enabling an event?
That would allow h/w implementations to pick an appropriate precision
for each new event. Different implementations of the same event
(e.g. "core_energy") may pick different precision across architectures
or between generations of the same architecture.
File system code can then do:
if (binary_places == 0)
display as integer
else
convert to floating point (with one decimal place per
three binary places)
Existing events are all integers and won't change (it would be weird
for an architecture to report "mbm_local_bytes" with a fixed point
rather than integer value).
New events may report in either integer or floating point format
with varying amounts of precision. But I'm not sure that would be
a burden for writing tools that can run on different architectures.
> >
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> > include/linux/resctrl_types.h | 5 +++++
> > fs/resctrl/internal.h | 2 ++
> > fs/resctrl/ctrlmondata.c | 24 +++++++++++++++++++++++-
> > fs/resctrl/monitor.c | 21 ++++++++++++---------
> > 4 files changed, 42 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> > index 5ef14a24008c..6245034f6c76 100644
> > --- a/include/linux/resctrl_types.h
> > +++ b/include/linux/resctrl_types.h
>
> This needs to be internal to resctrl fs.
> resctrl_types.h should only contain the types required in asm/resctrl.h
>
> > @@ -50,4 +50,9 @@ enum resctrl_event_id {
> > #define QOS_NUM_MBM_EVENTS (QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
> > #define MBM_EVENT_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
> >
> > +/* Event value display formats */
>
> Please add details about what each format means (how it should
> be interpreted).
>
> Reinette
-Tony
Hi Tony, On 5/8/25 1:28 PM, Luck, Tony wrote: > On Thu, May 08, 2025 at 08:49:56AM -0700, Reinette Chatre wrote: >> On 4/28/25 5:33 PM, Tony Luck wrote: >>> Resctrl was written with the assumption that all monitor events >>> can be displayed as unsigned decimal integers. >>> >>> Some telemetry events provide greater precision where architecture code >>> uses a fixed point format with 18 binary places. >>> >>> Add a "display_format" field to struct mon_evt which can specify >>> that the value for the event be displayed as an integer for legacy >>> events, or as a floating point value with six decimal places converted >>> from the fixed point format received from architecture code. >> >> There was no discussion on this during the previous version. >> While this version addresses the issue of architecture changing the >> format it does not address the issue of how to handle different >> architecture formats. With this change any architecture that may >> want to support any of these events will be required to translate >> whatever format it uses into the one Intel uses to be translated >> again into format for user space. Do you think this is reasonable? >> >> Alternatively, resctrl could add additional file that contains the >> format so that if an architecture in the future needs to present data >> differently, an interface will exist to guide userspace how to parse it. >> Creation of such user interface cannot be delayed until the time >> it is needed since then these formats would be ABI. > > What if resctrl filesystem allows architecture to supply the number > of binary places for fixed point values when enabling an event? This sounds good. I do not think we are in a position to come up with an ideal solution. That would require assumptions of what another architecture may or may not do and thus we do not have complete information. > > That would allow h/w implementations to pick an appropriate precision > for each new event. Different implementations of the same event > (e.g. "core_energy") may pick different precision across architectures > or between generations of the same architecture. > > File system code can then do: > > if (binary_places == 0) > display as integer > else > convert to floating point (with one decimal place per > three binary places) I do not think this problem needs to be solved in this work but there needs to be a plan for how other architectures can be supported. When similar enabling needs to be done for that hypothetical architecture then it can be implemented ... if it is still valid based on what that architecture actually supports. It may be sufficient for the "plan" (as above) to be in comments. > > Existing events are all integers and won't change (it would be weird > for an architecture to report "mbm_local_bytes" with a fixed point > rather than integer value). > > New events may report in either integer or floating point format > with varying amounts of precision. But I'm not sure that would be Partly this will depend on the unit of measurement that should form part of the definition of the event. For example, events reporting cycles or ticks should only be integer, no? > a burden for writing tools that can run on different architectures. Maybe just a comment in the docs then ... and now I see that you did so already. My apologies, I did not get to the last four patches. Reinette
Hi, (Backtrace retained for context -- see my comment at the end.) Cheers ---Dave [...] On Thu, May 08, 2025 at 04:45:21PM -0700, Reinette Chatre wrote: > Hi Tony, > > On 5/8/25 1:28 PM, Luck, Tony wrote: > > On Thu, May 08, 2025 at 08:49:56AM -0700, Reinette Chatre wrote: > >> On 4/28/25 5:33 PM, Tony Luck wrote: > >>> Resctrl was written with the assumption that all monitor events > >>> can be displayed as unsigned decimal integers. > >>> > >>> Some telemetry events provide greater precision where architecture code > >>> uses a fixed point format with 18 binary places. > >>> > >>> Add a "display_format" field to struct mon_evt which can specify > >>> that the value for the event be displayed as an integer for legacy > >>> events, or as a floating point value with six decimal places converted > >>> from the fixed point format received from architecture code. > >> > >> There was no discussion on this during the previous version. > >> While this version addresses the issue of architecture changing the > >> format it does not address the issue of how to handle different > >> architecture formats. With this change any architecture that may > >> want to support any of these events will be required to translate > >> whatever format it uses into the one Intel uses to be translated > >> again into format for user space. Do you think this is reasonable? > >> > >> Alternatively, resctrl could add additional file that contains the > >> format so that if an architecture in the future needs to present data > >> differently, an interface will exist to guide userspace how to parse it. > >> Creation of such user interface cannot be delayed until the time > >> it is needed since then these formats would be ABI. > > > > What if resctrl filesystem allows architecture to supply the number > > of binary places for fixed point values when enabling an event? > > This sounds good. I do not think we are in a position to come up with > an ideal solution. That would require assumptions of what another > architecture may or may not do and thus we do not have complete information. > > > > > That would allow h/w implementations to pick an appropriate precision > > for each new event. Different implementations of the same event > > (e.g. "core_energy") may pick different precision across architectures > > or between generations of the same architecture. > > > > File system code can then do: > > > > if (binary_places == 0) > > display as integer > > else > > convert to floating point (with one decimal place per > > three binary places) > > I do not think this problem needs to be solved in this work but there needs > to be a plan for how other architectures can be supported. When similar > enabling needs to be done for that hypothetical architecture then it can > be implemented ... if it is still valid based on what that architecture actually > supports. > It may be sufficient for the "plan" (as above) to be in comments. > > > > > Existing events are all integers and won't change (it would be weird > > for an architecture to report "mbm_local_bytes" with a fixed point > > rather than integer value). > > > > New events may report in either integer or floating point format > > with varying amounts of precision. But I'm not sure that would be > > Partly this will depend on the unit of measurement that should form part of > the definition of the event. For example, events reporting cycles or ticks > should only be integer, no? > > > a burden for writing tools that can run on different architectures. > > Maybe just a comment in the docs then ... and now I see that you did > so already. My apologies, I did not get to the last four patches. > > Reinette Just a thought, but I think that while it's not possible to be fully generic, a parameter model along the lines of quantity = raw_value * ((double)multiplier / divisor) * BASE_UNIT would cover most things that we have or can reasonably foresee, including memory bandwidth control values. raw_value, multiplier and divisor would all be integers. Since raw_integer can be the value used by the hardware, its precision can probably be fixed at 1, though we could still report it explicitly. Fundamental base units would be things like "byte", "bytes per second" and "none" (i.e., dimensionless quantities). (Are there others?) Since we cannot guess for certain what userspace wants to do with the values, it feels better to let userspace do any scaling calculations itself, rather than trying to prettify the interface. For example: scaling memory bandwidth percentages for MPAM is a nuisance because the hardware uses fixed-point values scaled by a power of 2, not by 100: the two scales can never match up anywhere except at multiples of 25%, leading to irregular increments when rounded to an integer percentage value and uncertainty about what the bandwidth_gran parameter means. Round-trip conversions between the two representations become error-prone due to repeated rounding -- this proved quite fiddly to get right. Precision beyond 1% increments may also be available in the hardware, but is not accessible through the resctrl interface. For backwards compatibility we probably shouldn't change that particular interface, but if we can avoid new instances of the same kind of problem then that would be a benefit: i.e., explicitly tell userspace how to scale a given parameter. Cheers ---Dave
Hi Dave, On Fri, May 9, 2025 at 1:29 PM Dave Martin <Dave.Martin@arm.com> wrote: > > Hi, > > (Backtrace retained for context -- see my comment at the end.) > > Cheers > ---Dave > > [...] > > On Thu, May 08, 2025 at 04:45:21PM -0700, Reinette Chatre wrote: > > Hi Tony, > > > > On 5/8/25 1:28 PM, Luck, Tony wrote: > > > On Thu, May 08, 2025 at 08:49:56AM -0700, Reinette Chatre wrote: > > >> On 4/28/25 5:33 PM, Tony Luck wrote: > > >>> Resctrl was written with the assumption that all monitor events > > >>> can be displayed as unsigned decimal integers. > > >>> > > >>> Some telemetry events provide greater precision where architecture code > > >>> uses a fixed point format with 18 binary places. > > >>> > > >>> Add a "display_format" field to struct mon_evt which can specify > > >>> that the value for the event be displayed as an integer for legacy > > >>> events, or as a floating point value with six decimal places converted > > >>> from the fixed point format received from architecture code. > > >> > > >> There was no discussion on this during the previous version. > > >> While this version addresses the issue of architecture changing the > > >> format it does not address the issue of how to handle different > > >> architecture formats. With this change any architecture that may > > >> want to support any of these events will be required to translate > > >> whatever format it uses into the one Intel uses to be translated > > >> again into format for user space. Do you think this is reasonable? > > >> > > >> Alternatively, resctrl could add additional file that contains the > > >> format so that if an architecture in the future needs to present data > > >> differently, an interface will exist to guide userspace how to parse it. > > >> Creation of such user interface cannot be delayed until the time > > >> it is needed since then these formats would be ABI. > > > > > > What if resctrl filesystem allows architecture to supply the number > > > of binary places for fixed point values when enabling an event? > > > > This sounds good. I do not think we are in a position to come up with > > an ideal solution. That would require assumptions of what another > > architecture may or may not do and thus we do not have complete information. > > > > > > > > That would allow h/w implementations to pick an appropriate precision > > > for each new event. Different implementations of the same event > > > (e.g. "core_energy") may pick different precision across architectures > > > or between generations of the same architecture. > > > > > > File system code can then do: > > > > > > if (binary_places == 0) > > > display as integer > > > else > > > convert to floating point (with one decimal place per > > > three binary places) > > > > I do not think this problem needs to be solved in this work but there needs > > to be a plan for how other architectures can be supported. When similar > > enabling needs to be done for that hypothetical architecture then it can > > be implemented ... if it is still valid based on what that architecture actually > > supports. > > It may be sufficient for the "plan" (as above) to be in comments. > > > > > > > > Existing events are all integers and won't change (it would be weird > > > for an architecture to report "mbm_local_bytes" with a fixed point > > > rather than integer value). > > > > > > New events may report in either integer or floating point format > > > with varying amounts of precision. But I'm not sure that would be > > > > Partly this will depend on the unit of measurement that should form part of > > the definition of the event. For example, events reporting cycles or ticks > > should only be integer, no? > > > > > a burden for writing tools that can run on different architectures. > > > > Maybe just a comment in the docs then ... and now I see that you did > > so already. My apologies, I did not get to the last four patches. > > > > Reinette > > Just a thought, but I think that while it's not possible to be fully > generic, a parameter model along the lines of > > quantity = raw_value * ((double)multiplier / divisor) * BASE_UNIT > > would cover most things that we have or can reasonably foresee, > including memory bandwidth control values. > raw_value, multiplier and divisor would all be integers. > > Since raw_integer can be the value used by the hardware, its precision > can probably be fixed at 1, though we could still report it explicitly. > > Fundamental base units would be things like "byte", "bytes per second" > and "none" (i.e., dimensionless quantities). (Are there others?) > > > Since we cannot guess for certain what userspace wants to do with the > values, it feels better to let userspace do any scaling calculations > itself, rather than trying to prettify the interface. > > For example: scaling memory bandwidth percentages for MPAM is a > nuisance because the hardware uses fixed-point values scaled by a power > of 2, not by 100: the two scales can never match up anywhere except at > multiples of 25%, leading to irregular increments when rounded to an > integer percentage value and uncertainty about what the bandwidth_gran > parameter means. Round-trip conversions between the two > representations become error-prone due to repeated rounding -- this > proved quite fiddly to get right. Precision beyond 1% increments may > also be available in the hardware, but is not accessible through the > resctrl interface. Google users got annoyed with these rounding errors very quickly and asked me to change the MBA interface to the raw, fixed-point value used by the MPAM register interface. (but at least shifted down, since the MBW_MIN/MAX fields are left-justified) > > For backwards compatibility we probably shouldn't change that > particular interface, but if we can avoid new instances of the same > kind of problem then that would be a benefit: i.e., explicitly tell > userspace how to scale a given parameter. MBA is not programmed by percentage on AMD, so I'm not sure why this is considered necessary for backwards compatibility. -Peter
Hi, On Fri, May 09, 2025 at 04:46:30PM +0200, Peter Newman wrote: > Hi Dave, > > On Fri, May 9, 2025 at 1:29 PM Dave Martin <Dave.Martin@arm.com> wrote: [...] > > For example: scaling memory bandwidth percentages for MPAM is a > > nuisance because the hardware uses fixed-point values scaled by a power > > of 2, not by 100: the two scales can never match up anywhere except at > > multiples of 25%, leading to irregular increments when rounded to an > > integer percentage value and uncertainty about what the bandwidth_gran > > parameter means. Round-trip conversions between the two > > representations become error-prone due to repeated rounding -- this > > proved quite fiddly to get right. Precision beyond 1% increments may > > also be available in the hardware, but is not accessible through the > > resctrl interface. > > Google users got annoyed with these rounding errors very quickly and > asked me to change the MBA interface to the raw, fixed-point value > used by the MPAM register interface. (but at least shifted down, since > the MBW_MIN/MAX fields are left-justified) That's interesting. Do you find a need to do things like step the bandwidth allocation for a control group? So, as part of a tuning regime, the bandwidth value is read out, stepped to the next distinct hardware value and written back in? That kind of thing does not map in a convenient way onto the current interface, although fire-and-forget programming of a predetermined percentage works fine. Extending my model outline, a 6-bit MPAM MBW_PART implementation might be described by: min: 1 max: 64 step size: 1 multiplier: 1 divisor: 64 How easy / difficult do you think it would be for userspace to work with this, if resctrlfs were to expose the raw control (minus the ignored bits) with that metadata? Needless to say, the max and divisor values would dependent on the hardware and possibly other factors. They would be fixed for the lifetime of a single resctrl instance at the very least. > > For backwards compatibility we probably shouldn't change that > > particular interface, but if we can avoid new instances of the same > > kind of problem then that would be a benefit: i.e., explicitly tell > > userspace how to scale a given parameter. > > MBA is not programmed by percentage on AMD, so I'm not sure why this > is considered necessary for backwards compatibility. I presumed scripts (or pre-tuned data fed through them) are in practice pretty platform-specific, so that it will upset people if the interface changes between kernel versions at least on a given hardware family. The divergence between AMD and Intel in this area is unfortunate, but absolute and proportional bandwidth measures do not really seem to be interchangeable -- so a truly unified interface may not be easy to achieve either. Having two control names in the interface might work, say: MBP: proportion of total available memory bandwidth (%) MBA: absolute memory bandwidth (B/s) Then just expose the one that the hardware implements natively (while still exposing MB as a backwards compatible alias if necessary). Cheers ---Dave
On Fri, May 09, 2025 at 04:46:30PM +0200, Peter Newman wrote: > Hi Dave, > > On Fri, May 9, 2025 at 1:29 PM Dave Martin <Dave.Martin@arm.com> wrote: > > > > Hi, > > > > (Backtrace retained for context -- see my comment at the end.) > > > > Cheers > > ---Dave > > > > [...] > > > > On Thu, May 08, 2025 at 04:45:21PM -0700, Reinette Chatre wrote: > > > Hi Tony, > > > > > > On 5/8/25 1:28 PM, Luck, Tony wrote: > > > > On Thu, May 08, 2025 at 08:49:56AM -0700, Reinette Chatre wrote: > > > >> On 4/28/25 5:33 PM, Tony Luck wrote: > > > >>> Resctrl was written with the assumption that all monitor events > > > >>> can be displayed as unsigned decimal integers. > > > >>> > > > >>> Some telemetry events provide greater precision where architecture code > > > >>> uses a fixed point format with 18 binary places. > > > >>> > > > >>> Add a "display_format" field to struct mon_evt which can specify > > > >>> that the value for the event be displayed as an integer for legacy > > > >>> events, or as a floating point value with six decimal places converted > > > >>> from the fixed point format received from architecture code. > > > >> > > > >> There was no discussion on this during the previous version. > > > >> While this version addresses the issue of architecture changing the > > > >> format it does not address the issue of how to handle different > > > >> architecture formats. With this change any architecture that may > > > >> want to support any of these events will be required to translate > > > >> whatever format it uses into the one Intel uses to be translated > > > >> again into format for user space. Do you think this is reasonable? > > > >> > > > >> Alternatively, resctrl could add additional file that contains the > > > >> format so that if an architecture in the future needs to present data > > > >> differently, an interface will exist to guide userspace how to parse it. > > > >> Creation of such user interface cannot be delayed until the time > > > >> it is needed since then these formats would be ABI. > > > > > > > > What if resctrl filesystem allows architecture to supply the number > > > > of binary places for fixed point values when enabling an event? > > > > > > This sounds good. I do not think we are in a position to come up with > > > an ideal solution. That would require assumptions of what another > > > architecture may or may not do and thus we do not have complete information. > > > > > > > > > > > That would allow h/w implementations to pick an appropriate precision > > > > for each new event. Different implementations of the same event > > > > (e.g. "core_energy") may pick different precision across architectures > > > > or between generations of the same architecture. > > > > > > > > File system code can then do: > > > > > > > > if (binary_places == 0) > > > > display as integer > > > > else > > > > convert to floating point (with one decimal place per > > > > three binary places) > > > > > > I do not think this problem needs to be solved in this work but there needs > > > to be a plan for how other architectures can be supported. When similar > > > enabling needs to be done for that hypothetical architecture then it can > > > be implemented ... if it is still valid based on what that architecture actually > > > supports. > > > It may be sufficient for the "plan" (as above) to be in comments. > > > > > > > > > > > Existing events are all integers and won't change (it would be weird > > > > for an architecture to report "mbm_local_bytes" with a fixed point > > > > rather than integer value). > > > > > > > > New events may report in either integer or floating point format > > > > with varying amounts of precision. But I'm not sure that would be > > > > > > Partly this will depend on the unit of measurement that should form part of > > > the definition of the event. For example, events reporting cycles or ticks > > > should only be integer, no? > > > > > > > a burden for writing tools that can run on different architectures. > > > > > > Maybe just a comment in the docs then ... and now I see that you did > > > so already. My apologies, I did not get to the last four patches. > > > > > > Reinette > > > > Just a thought, but I think that while it's not possible to be fully > > generic, a parameter model along the lines of > > > > quantity = raw_value * ((double)multiplier / divisor) * BASE_UNIT > > > > would cover most things that we have or can reasonably foresee, > > including memory bandwidth control values. > > raw_value, multiplier and divisor would all be integers. > > > > Since raw_integer can be the value used by the hardware, its precision > > can probably be fixed at 1, though we could still report it explicitly. > > > > Fundamental base units would be things like "byte", "bytes per second" > > and "none" (i.e., dimensionless quantities). (Are there others?) The energy telemetry counters implemented in this series have: core_energy: Units are Joules activity: Units are Farads (Dynamic Capicitance or "CDyn") Each of these is reported by h/w as a fixed-point binary value with 18 binary places. I'm proposing reporing these as floating point decimal value with six decimal places (since 1/2^18 ~= 0.0000038) Loss of precision from conversion to decimal is likely far smaller than the error bars on the estimation of these values). > > > > Since we cannot guess for certain what userspace wants to do with the > > values, it feels better to let userspace do any scaling calculations > > itself, rather than trying to prettify the interface. > > > > For example: scaling memory bandwidth percentages for MPAM is a > > nuisance because the hardware uses fixed-point values scaled by a power > > of 2, not by 100: the two scales can never match up anywhere except at > > multiples of 25%, leading to irregular increments when rounded to an > > integer percentage value and uncertainty about what the bandwidth_gran > > parameter means. Round-trip conversions between the two > > representations become error-prone due to repeated rounding -- this > > proved quite fiddly to get right. Precision beyond 1% increments may > > also be available in the hardware, but is not accessible through the > > resctrl interface. Sounds like my fixed-point proposal could be useful for these memory bandwidth values. > Google users got annoyed with these rounding errors very quickly and > asked me to change the MBA interface to the raw, fixed-point value > used by the MPAM register interface. (but at least shifted down, since > the MBW_MIN/MAX fields are left-justified) > > > > > For backwards compatibility we probably shouldn't change that > > particular interface, but if we can avoid new instances of the same > > kind of problem then that would be a benefit: i.e., explicitly tell > > userspace how to scale a given parameter. > > MBA is not programmed by percentage on AMD, so I'm not sure why this > is considered necessary for backwards compatibility. Upcoming "region aware" Intel RDT features also present challenges for backward compatibility as there are going to be separate counters and controls for each region. Maintaining the use of "percentage" for controls only gives feeling of familiarity while any tools that are using the "MB:" line in the schemata file aren't going to work at all. Sadly, we won't have a direct "MBytes/second" interface (though our goal is to get to that some day). H/w interface values for throttling change from legacy 0,10,20,...90 (no throttle up to max) to 255,254...1 (max bandwidth to min). There's also the bonus feature that memory bandwidth limits can specify min and max values so hardware can grant jobs some amount of extra bandwidth when the system is not busy, or throttle just the low priority jobs when approaching capacity limits. > -Peter -Tony
© 2016 - 2025 Red Hat, Inc.