[v13] x86,fs/resctrl telemetry monitoring

[PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Tony Luck 3 months, 1 week ago

There are now three meanings for "number of RMIDs":

1) The number for legacy features enumerated by CPUID leaf 0xF. This
is the maximum number of distinct values that can be loaded into
MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled
will force scaling down the CPUID enumerated value by the number of SNC
nodes per L3-cache.

2) The number of registers in MMIO space for each event. This
is enumerated in the XML files and is the value initialized into
event_group::num_rmids.

3) The number of "hardware counters" (this isn't a strictly accurate
description of how things work, but serves as a useful analogy that
does describe the limitations) feeding to those MMIO registers. This
is enumerated in telemetry_region::num_rmids returned from the call to
intel_pmt_get_regions_by_feature()

Event groups with insufficient "hardware counters" to track all RMIDs
are difficult for users to use, since the system may reassign "hardware
counters" at any time. This means that users cannot reliably collect
two consecutive event counts to compute the rate at which events are
occurring.

Introduce rdt_set_feature_disabled() to mark any under-resourced event groups
(those with telemetry_region::num_rmids < event_group::num_rmids  for any of
the event group's telemetry regions) as unusable.  Note that the rdt_options[]
structure must now be writable at run-time.

Limit an under-resourced event group's number of possible monitor
resource groups to the lowest number of "hardware counters" if the
user explicitly requests to enable it.

Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG
resource "num_rmids" value to the smallest of these values as this value
will be used later to compare against the number of RMIDs supported
by other resources to determine how many monitoring resource groups
are supported.

N.B. Change type of rdt_resource::num_rmid to u32 to match type of
event_group::num_rmids so that min(r->num_rmid, e->num_rmids) won't
complain about mixing signed and unsigned types.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h                 |  2 +-
 arch/x86/kernel/cpu/resctrl/internal.h  |  2 +
 arch/x86/kernel/cpu/resctrl/core.c      | 18 +++++++-
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 55 +++++++++++++++++++++++++
 fs/resctrl/rdtgroup.c                   |  2 +-
 5 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 34ad0f5f1309..a2bf335052d6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -292,7 +292,7 @@ enum resctrl_schema_fmt {
  *			events of monitor groups created via mkdir.
  */
 struct resctrl_mon {
-	int			num_rmid;
+	u32			num_rmid;
 	unsigned int		mbm_cfg_mask;
 	int			num_mbm_cntrs;
 	bool			mbm_cntr_assignable;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3710b9f993e..cea76f88422c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -227,6 +227,8 @@ void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 
 bool rdt_is_feature_enabled(char *name);
 
+void rdt_set_feature_disabled(char *name);
+
 #ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7013911d3575..a8eb197e27db 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -788,7 +788,7 @@ struct rdt_options {
 	bool	force_off, force_on;
 };
 
-static struct rdt_options rdt_options[]  __ro_after_init = {
+static struct rdt_options rdt_options[] = {
 	RDT_OPT(RDT_FLAG_CMT,	    "cmt",	X86_FEATURE_CQM_OCCUP_LLC),
 	RDT_OPT(RDT_FLAG_MBM_TOTAL, "mbmtotal", X86_FEATURE_CQM_MBM_TOTAL),
 	RDT_OPT(RDT_FLAG_MBM_LOCAL, "mbmlocal", X86_FEATURE_CQM_MBM_LOCAL),
@@ -851,6 +851,22 @@ bool rdt_cpu_has(int flag)
 	return ret;
 }
 
+/*
+ * Can be called during feature enumeration if sanity check of
+ * a feature's parameters indicates problems with the feature.
+ */
+void rdt_set_feature_disabled(char *name)
+{
+	struct rdt_options *o;
+
+	for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
+		if (!strcmp(name, o->name)) {
+			o->force_off = true;
+			return;
+		}
+	}
+}
+
 /*
  * Hardware features that do not have X86_FEATURE_* bits. There is no
  * "hardware does not support this at all" case. Assume that the caller
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 781ca8ede39e..252a3fd4260c 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -25,6 +25,7 @@
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
+#include <linux/minmax.h>
 #include <linux/overflow.h>
 #include <linux/printk.h>
 #include <linux/rculist.h>
@@ -57,6 +58,7 @@ struct pmt_event {
  * struct event_group - All information about a group of telemetry events.
  * @feature:		Argument to intel_pmt_get_regions_by_feature() to
  *			discover if this event_group is supported.
+ * @name:		Name for this group (used by boot rdt= option)
  * @pfg:		Points to the aggregated telemetry space information
  *			returned by the intel_pmt_get_regions_by_feature()
  *			call to the INTEL_PMT_TELEMETRY driver that contains
@@ -64,6 +66,10 @@ struct pmt_event {
  *			Valid if the system supports the event group.
  *			NULL otherwise.
  * @guid:		Unique number per XML description file.
+ * @num_rmids:		Number of RMIDs supported by this group. May be
+ *			adjusted downwards if enumeration from
+ *			intel_pmt_get_regions_by_feature() indicates fewer
+ *			RMIDs can be tracked simultaneously.
  * @mmio_size:		Number of bytes of MMIO registers for this group.
  * @num_events:		Number of events in this group.
  * @evts:		Array of event descriptors.
@@ -71,10 +77,12 @@ struct pmt_event {
 struct event_group {
 	/* Data fields for additional structures to manage this group. */
 	enum pmt_feature_id		feature;
+	char				*name;
 	struct pmt_feature_group	*pfg;
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
+	u32				num_rmids;
 	size_t				mmio_size;
 	unsigned int			num_events;
 	struct pmt_event		evts[] __counted_by(num_events);
@@ -89,7 +97,9 @@ struct event_group {
  */
 static struct event_group energy_0x26696143 = {
 	.feature	= FEATURE_PER_RMID_ENERGY_TELEM,
+	.name		= "energy",
 	.guid		= 0x26696143,
+	.num_rmids	= 576,
 	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
 	.num_events	= 2,
 	.evts		= {
@@ -104,7 +114,9 @@ static struct event_group energy_0x26696143 = {
  */
 static struct event_group perf_0x26557651 = {
 	.feature	= FEATURE_PER_RMID_PERF_TELEM,
+	.name		= "perf",
 	.guid		= 0x26557651,
+	.num_rmids	= 576,
 	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
 	.num_events	= 7,
 	.evts		= {
@@ -174,11 +186,54 @@ static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_g
 	return usable_regions;
 }
 
+static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group
+*p)
+{
+	struct telemetry_region *tr;
+	bool ret = true;
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		tr = &p->regions[i];
+		if (tr->num_rmids < e->num_rmids)
+			ret = false;
+	}
+
+	return ret;
+}
+
 static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+
 	if (!group_has_usable_regions(e, p))
 		return false;
 
+	/* Disable feature if insufficient RMIDs */
+	if (!all_regions_have_sufficient_rmid(e, p))
+		rdt_set_feature_disabled(e->name);
+
+	/* User can override above disable from kernel command line */
+	if (!rdt_is_feature_enabled(e->name))
+		return false;
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		/*
+		 * e->num_rmids only adjusted lower if user (via rdt= kernel
+		 * parameter) forces an event group with insufficient RMID
+		 * to be enabled.
+		 */
+		e->num_rmids = min(e->num_rmids, p->regions[i].num_rmids);
+	}
+
+	if (r->mon.num_rmid)
+		r->mon.num_rmid = min(r->mon.num_rmid, e->num_rmids);
+	else
+		r->mon.num_rmid = e->num_rmids;
+
 	for (int j = 0; j < e->num_events; j++)
 		resctrl_enable_mon_event(e->evts[j].id, true,
 					 e->evts[j].bin_bits, &e->evts[j]);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 84336b6e1679..b67faf6a5012 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1135,7 +1135,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
 
-	seq_printf(seq, "%d\n", r->mon.num_rmid);
+	seq_printf(seq, "%u\n", r->mon.num_rmid);
 
 	return 0;
 }
-- 
2.51.0

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Reinette Chatre 2 months, 3 weeks ago

Hi Tony,

On 10/29/25 9:21 AM, Tony Luck wrote:
> There are now three meanings for "number of RMIDs":
> 
> 1) The number for legacy features enumerated by CPUID leaf 0xF. This
> is the maximum number of distinct values that can be loaded into
> MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled
> will force scaling down the CPUID enumerated value by the number of SNC
> nodes per L3-cache.
> 
> 2) The number of registers in MMIO space for each event. This
> is enumerated in the XML files and is the value initialized into
> event_group::num_rmids.
> 
> 3) The number of "hardware counters" (this isn't a strictly accurate
> description of how things work, but serves as a useful analogy that
> does describe the limitations) feeding to those MMIO registers. This
> is enumerated in telemetry_region::num_rmids returned from the call to
> intel_pmt_get_regions_by_feature()
> 
> Event groups with insufficient "hardware counters" to track all RMIDs
> are difficult for users to use, since the system may reassign "hardware
> counters" at any time. This means that users cannot reliably collect
> two consecutive event counts to compute the rate at which events are
> occurring.
> 
> Introduce rdt_set_feature_disabled() to mark any under-resourced event groups
> (those with telemetry_region::num_rmids < event_group::num_rmids  for any of

Extra space above, also note how line lengths differ between paragraphs.

> the event group's telemetry regions) as unusable.  Note that the rdt_options[]
> structure must now be writable at run-time.
> 
> Limit an under-resourced event group's number of possible monitor
> resource groups to the lowest number of "hardware counters" if the
> user explicitly requests to enable it.
> 
> Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG
> resource "num_rmids" value to the smallest of these values as this value
> will be used later to compare against the number of RMIDs supported
> by other resources to determine how many monitoring resource groups
> are supported.
> 
> N.B. Change type of rdt_resource::num_rmid to u32 to match type of

resctrl_mon::num_rmid

> event_group::num_rmids so that min(r->num_rmid, e->num_rmids) won't
> complain about mixing signed and unsigned types.

May be worthwhile to highlight that resctrl_mon::num_rmid is already
used as a u32 so changing its type does not need to cascade into all
users. Something like "Change type of resctrl_mon::num_rmid to u32 to
match its usage and the type of ..."


> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 781ca8ede39e..252a3fd4260c 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -25,6 +25,7 @@
>  #include <linux/intel_pmt_features.h>
>  #include <linux/intel_vsec.h>
>  #include <linux/io.h>
> +#include <linux/minmax.h>
>  #include <linux/overflow.h>
>  #include <linux/printk.h>
>  #include <linux/rculist.h>
> @@ -57,6 +58,7 @@ struct pmt_event {
>   * struct event_group - All information about a group of telemetry events.
>   * @feature:		Argument to intel_pmt_get_regions_by_feature() to
>   *			discover if this event_group is supported.
> + * @name:		Name for this group (used by boot rdt= option)
>   * @pfg:		Points to the aggregated telemetry space information
>   *			returned by the intel_pmt_get_regions_by_feature()
>   *			call to the INTEL_PMT_TELEMETRY driver that contains
> @@ -64,6 +66,10 @@ struct pmt_event {
>   *			Valid if the system supports the event group.
>   *			NULL otherwise.
>   * @guid:		Unique number per XML description file.
> + * @num_rmids:		Number of RMIDs supported by this group. May be
> + *			adjusted downwards if enumeration from
> + *			intel_pmt_get_regions_by_feature() indicates fewer
> + *			RMIDs can be tracked simultaneously.
>   * @mmio_size:		Number of bytes of MMIO registers for this group.
>   * @num_events:		Number of events in this group.
>   * @evts:		Array of event descriptors.
> @@ -71,10 +77,12 @@ struct pmt_event {
>  struct event_group {
>  	/* Data fields for additional structures to manage this group. */
>  	enum pmt_feature_id		feature;
> +	char				*name;
>  	struct pmt_feature_group	*pfg;
>  
>  	/* Remaining fields initialized from XML file. */
>  	u32				guid;
> +	u32				num_rmids;

This addition is to resctrl where there is already a resctrl_mon::num_rmid.
Can this naming be made consistent with resctrl instead of INTEL_PMT_TELEMETRY?

>  	size_t				mmio_size;
>  	unsigned int			num_events;
>  	struct pmt_event		evts[] __counted_by(num_events);
> @@ -89,7 +97,9 @@ struct event_group {
>   */
>  static struct event_group energy_0x26696143 = {
>  	.feature	= FEATURE_PER_RMID_ENERGY_TELEM,
> +	.name		= "energy",
>  	.guid		= 0x26696143,
> +	.num_rmids	= 576,
>  	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
>  	.num_events	= 2,
>  	.evts		= {
> @@ -104,7 +114,9 @@ static struct event_group energy_0x26696143 = {
>   */
>  static struct event_group perf_0x26557651 = {
>  	.feature	= FEATURE_PER_RMID_PERF_TELEM,
> +	.name		= "perf",
>  	.guid		= 0x26557651,
> +	.num_rmids	= 576,
>  	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
>  	.num_events	= 7,
>  	.evts		= {
> @@ -174,11 +186,54 @@ static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_g
>  	return usable_regions;
>  }
>  
> +static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group
> +*p)
> +{
> +	struct telemetry_region *tr;
> +	bool ret = true;
> +
> +	for (int i = 0; i < p->count; i++) {
> +		if (!p->regions[i].addr)
> +			continue;
> +		tr = &p->regions[i];
> +		if (tr->num_rmids < e->num_rmids)
> +			ret = false;
> +	}
> +
> +	return ret;
> +}
> +
>  static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>  {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> +
>  	if (!group_has_usable_regions(e, p))
>  		return false;
>  
> +	/* Disable feature if insufficient RMIDs */
> +	if (!all_regions_have_sufficient_rmid(e, p))
> +		rdt_set_feature_disabled(e->name);
> +
> +	/* User can override above disable from kernel command line */
> +	if (!rdt_is_feature_enabled(e->name))
> +		return false;

Considering this from the user's perspective I do not think there is an easy way for user space
to know that the feature was force disabled and that there is an option to override this.
What is the use case considered here? If I understand correctly there may be a way to deduce this
by consulting files in /sys/class/intel_pmt/ where the telemetry regions' number of supported
RMID is printed. A user can compare that to the XML files to determine that there is an issue
that can explain resctrl not exposing the feature. Considering this difficulty I wonder if it
may be helpful to print an error message here? This is of course not perfect since the kernel
log cannot be guaranteed to forever contain it ... but it may help?

Reinette

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Luck, Tony 2 months, 3 weeks ago

On Thu, Nov 13, 2025 at 02:51:45PM -0800, Reinette Chatre wrote:
> >  static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> >  {
> > +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> > +
> >  	if (!group_has_usable_regions(e, p))
> >  		return false;
> >  
> > +	/* Disable feature if insufficient RMIDs */
> > +	if (!all_regions_have_sufficient_rmid(e, p))
> > +		rdt_set_feature_disabled(e->name);
> > +
> > +	/* User can override above disable from kernel command line */
> > +	if (!rdt_is_feature_enabled(e->name))
> > +		return false;
> 
> Considering this from the user's perspective I do not think there is an easy way for user space
> to know that the feature was force disabled and that there is an option to override this.
> What is the use case considered here? If I understand correctly there may be a way to deduce this
> by consulting files in /sys/class/intel_pmt/ where the telemetry regions' number of supported
> RMID is printed. A user can compare that to the XML files to determine that there is an issue
> that can explain resctrl not exposing the feature. Considering this difficulty I wonder if it
> may be helpful to print an error message here? This is of course not perfect since the kernel
> log cannot be guaranteed to forever contain it ... but it may help?

Reinette,

Good idea. Here's a draft. Comments welcome to improve the user message
that would look like:

resctrl: Feature energy guid=0x26696143 not enabled due to insufficient RMIDs


static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
{
	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
	bool warn_disable = false;

	if (!group_has_usable_regions(e, p))
		return false;

	/* Disable feature if insufficient RMIDs */
	if (!all_regions_have_sufficient_rmid(e, p)) {
		warn_disable = true;
		rdt_set_feature_disabled(e->name);
	}

	/* User can override above disable from kernel command line */
	if (!rdt_is_feature_enabled(e->name)) {
		if (warn_disable)
			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
				e->name, e->guid);
		return false;
	}
	...
}

-Tony

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Reinette Chatre 2 months, 3 weeks ago

Hi Tony,

On 11/14/25 1:55 PM, Luck, Tony wrote:
> 
> resctrl: Feature energy guid=0x26696143 not enabled due to insufficient RMIDs
> 
> 
> static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> {
> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> 	bool warn_disable = false;
> 
> 	if (!group_has_usable_regions(e, p))
> 		return false;
> 
> 	/* Disable feature if insufficient RMIDs */
> 	if (!all_regions_have_sufficient_rmid(e, p)) {
> 		warn_disable = true;
> 		rdt_set_feature_disabled(e->name);
> 	}
> 
> 	/* User can override above disable from kernel command line */
> 	if (!rdt_is_feature_enabled(e->name)) {
> 		if (warn_disable)
> 			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
> 				e->name, e->guid);
> 		return false;
> 	}
> 	...
> }

Thank you for considering. This looks good to me.

I now realize that if a system supports, for example, two energy guid and only one has insufficient
RMID then one or both may be disabled by default depending on which resctrl attempts to enable
first. This is arbitrary based on where the event group appears in the array.

How a system with two guid of the same feature type would work is not clear to me though. Looks
like they cannot share events at all since an event is uniquely associated with a struct pmt_event
that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
will complain loudly but still proceed and allow the event group to be enabled.
I think the resctrl_enable_mon_event() warnings were added to support enabling of new features
so that the WARNs can catch issues during development ... now it may encounter issues when a
kernel with this implementation is run on a system that supports a single feature with
multiple guid. Do you have more insight in how the "single feature with multiple guid" may look to
better prepare resctrl to handle them?

Should "enable_events" be split so that a feature can be disabled for all its event groups if
any of them cannot be enabled due to insufficient RMIDs?
Perhaps resctrl_enable_mon_event() should also now return success/fail so that an event group
cannot be enabled if its events cannot be enabled?
Finally, a system with two guid of the same feature type will end up printing duplicate
"<feature type> monitoring detected" that could be more descriptive?

Reinette

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Luck, Tony 2 months, 3 weeks ago

On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 11/14/25 1:55 PM, Luck, Tony wrote:
> > 
> > resctrl: Feature energy guid=0x26696143 not enabled due to insufficient RMIDs
> > 
> > 
> > static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> > {
> > 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> > 	bool warn_disable = false;
> > 
> > 	if (!group_has_usable_regions(e, p))
> > 		return false;
> > 
> > 	/* Disable feature if insufficient RMIDs */
> > 	if (!all_regions_have_sufficient_rmid(e, p)) {
> > 		warn_disable = true;
> > 		rdt_set_feature_disabled(e->name);
> > 	}
> > 
> > 	/* User can override above disable from kernel command line */
> > 	if (!rdt_is_feature_enabled(e->name)) {
> > 		if (warn_disable)
> > 			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
> > 				e->name, e->guid);
> > 		return false;
> > 	}
> > 	...
> > }
> 
> Thank you for considering. This looks good to me.
> 
> I now realize that if a system supports, for example, two energy guid and only one has insufficient
> RMID then one or both may be disabled by default depending on which resctrl attempts to enable
> first. This is arbitrary based on where the event group appears in the array.

intel_pmt_get_regions_by_feature() does return arrays of telemetry_region
with different guids today, but not currently for the "RMID" features.
So this could be a problem in the future.

I think I need to drop the "rdt=perf,!energy" command line control as
being too coarse. Instead add a new boot argument. E.g.

	rdtguid=0x26696143,!0x26557651

to give the user control per-guid instead of per-pmt_feature_id. Users
can discover which guids are supported on a system by looking in
/sys/bus/auxiliary/devices/intel_vsec.discovery.*/intel_pmt/features*/per_rmid*
where there are "guids" and "num_rmids" files.

> How a system with two guid of the same feature type would work is not clear to me though. Looks
> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
> will complain loudly but still proceed and allow the event group to be enabled.

I can't see a good reason why the same event would be enabled under
different guids present on the same system. We can revisit my assumption
if the "Duplicate enable for event" message shows up.

> I think the resctrl_enable_mon_event() warnings were added to support enabling of new features
> so that the WARNs can catch issues during development ... now it may encounter issues when a
> kernel with this implementation is run on a system that supports a single feature with
> multiple guid. Do you have more insight in how the "single feature with multiple guid" may look to
> better prepare resctrl to handle them?
> 
> Should "enable_events" be split so that a feature can be disabled for all its event groups if
> any of them cannot be enabled due to insufficient RMIDs?
> Perhaps resctrl_enable_mon_event() should also now return success/fail so that an event group
> cannot be enabled if its events cannot be enabled?
> Finally, a system with two guid of the same feature type will end up printing duplicate
> "<feature type> monitoring detected" that could be more descriptive?

I need to add the guid to that message.
> 
> Reinette

-Tony

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Reinette Chatre 2 months, 3 weeks ago

Hi Tony,

On 11/17/25 8:37 AM, Luck, Tony wrote:
> On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 11/14/25 1:55 PM, Luck, Tony wrote:
>>>
>>> resctrl: Feature energy guid=0x26696143 not enabled due to insufficient RMIDs
>>>
>>>
>>> static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>>> {
>>> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
>>> 	bool warn_disable = false;
>>>
>>> 	if (!group_has_usable_regions(e, p))
>>> 		return false;
>>>
>>> 	/* Disable feature if insufficient RMIDs */
>>> 	if (!all_regions_have_sufficient_rmid(e, p)) {
>>> 		warn_disable = true;
>>> 		rdt_set_feature_disabled(e->name);
>>> 	}
>>>
>>> 	/* User can override above disable from kernel command line */
>>> 	if (!rdt_is_feature_enabled(e->name)) {
>>> 		if (warn_disable)
>>> 			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
>>> 				e->name, e->guid);
>>> 		return false;
>>> 	}
>>> 	...
>>> }
>>
>> Thank you for considering. This looks good to me.
>>
>> I now realize that if a system supports, for example, two energy guid and only one has insufficient
>> RMID then one or both may be disabled by default depending on which resctrl attempts to enable
>> first. This is arbitrary based on where the event group appears in the array.
> 
> intel_pmt_get_regions_by_feature() does return arrays of telemetry_region
> with different guids today, but not currently for the "RMID" features.
> So this could be a problem in the future.
> 
> I think I need to drop the "rdt=perf,!energy" command line control as
> being too coarse. Instead add a new boot argument. E.g.
> 
> 	rdtguid=0x26696143,!0x26557651
> 
> to give the user control per-guid instead of per-pmt_feature_id. Users
> can discover which guids are supported on a system by looking in
> /sys/bus/auxiliary/devices/intel_vsec.discovery.*/intel_pmt/features*/per_rmid*
> where there are "guids" and "num_rmids" files.

Should disable/enable be per RMID telemetry feature? I do not see anything preventing a system from
using the same guid for different RMID telemetry features.

I think it will be useful to look at how other kernel parameters distinguish different
categories of parameters so that resctrl can be consistent here. Looks like an underscore is
most useful and also flexible since it allows both a dash and underscore to be used.

Another alternative that is common in kernel parameters is to use ":". For example, 
	rdt=energy:0x26696143

With something like above user can, for example, use just "energy" to disable all RMID energy
telemetry or be specific to which guid should be disabled. This seems to fit well with existing
rdt parameters and be quite flexible.

> 
>> How a system with two guid of the same feature type would work is not clear to me though. Looks
>> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
>> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
>> will complain loudly but still proceed and allow the event group to be enabled.
> 
> I can't see a good reason why the same event would be enabled under
> different guids present on the same system. We can revisit my assumption
> if the "Duplicate enable for event" message shows up.

This would be difficult to handle at that time, no? From what I can tell this would enable
an unusable event group to actually be enabled resulting in untested and invalid flows.
I think it will be safer to not enable an event group in this scenario and seems to math your
expectation that this would be unexpected. The "Duplicate enable for event" message will still
appear and we can still revisit those assumptions when they do, but the systems encountering
them will not be running with enabled event groups that are not actually fully enabled.

> 
>> I think the resctrl_enable_mon_event() warnings were added to support enabling of new features
>> so that the WARNs can catch issues during development ... now it may encounter issues when a
>> kernel with this implementation is run on a system that supports a single feature with
>> multiple guid. Do you have more insight in how the "single feature with multiple guid" may look to
>> better prepare resctrl to handle them?
>>
>> Should "enable_events" be split so that a feature can be disabled for all its event groups if
>> any of them cannot be enabled due to insufficient RMIDs?
>> Perhaps resctrl_enable_mon_event() should also now return success/fail so that an event group
>> cannot be enabled if its events cannot be enabled?
>> Finally, a system with two guid of the same feature type will end up printing duplicate
>> "<feature type> monitoring detected" that could be more descriptive?
> 
> I need to add the guid to that message.

Sounds good. Thank you.

Reinette

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Luck, Tony 2 months, 3 weeks ago

On Mon, Nov 17, 2025 at 09:31:41AM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 11/17/25 8:37 AM, Luck, Tony wrote:
> > On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
> >> Hi Tony,
> >>
> >> On 11/14/25 1:55 PM, Luck, Tony wrote:
> >>>
> >>> resctrl: Feature energy guid=0x26696143 not enabled due to insufficient RMIDs
> >>>
> >>>
> >>> static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> >>> {
> >>> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> >>> 	bool warn_disable = false;
> >>>
> >>> 	if (!group_has_usable_regions(e, p))
> >>> 		return false;
> >>>
> >>> 	/* Disable feature if insufficient RMIDs */
> >>> 	if (!all_regions_have_sufficient_rmid(e, p)) {
> >>> 		warn_disable = true;
> >>> 		rdt_set_feature_disabled(e->name);
> >>> 	}
> >>>
> >>> 	/* User can override above disable from kernel command line */
> >>> 	if (!rdt_is_feature_enabled(e->name)) {
> >>> 		if (warn_disable)
> >>> 			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
> >>> 				e->name, e->guid);
> >>> 		return false;
> >>> 	}
> >>> 	...
> >>> }
> >>
> >> Thank you for considering. This looks good to me.
> >>
> >> I now realize that if a system supports, for example, two energy guid and only one has insufficient
> >> RMID then one or both may be disabled by default depending on which resctrl attempts to enable
> >> first. This is arbitrary based on where the event group appears in the array.
> > 
> > intel_pmt_get_regions_by_feature() does return arrays of telemetry_region
> > with different guids today, but not currently for the "RMID" features.
> > So this could be a problem in the future.
> > 
> > I think I need to drop the "rdt=perf,!energy" command line control as
> > being too coarse. Instead add a new boot argument. E.g.
> > 
> > 	rdtguid=0x26696143,!0x26557651
> > 
> > to give the user control per-guid instead of per-pmt_feature_id. Users
> > can discover which guids are supported on a system by looking in
> > /sys/bus/auxiliary/devices/intel_vsec.discovery.*/intel_pmt/features*/per_rmid*
> > where there are "guids" and "num_rmids" files.
> 
> Should disable/enable be per RMID telemetry feature? I do not see anything preventing a system from
> using the same guid for different RMID telemetry features.
> 
> I think it will be useful to look at how other kernel parameters distinguish different
> categories of parameters so that resctrl can be consistent here. Looks like an underscore is
> most useful and also flexible since it allows both a dash and underscore to be used.
> 
> Another alternative that is common in kernel parameters is to use ":". For example, 
> 	rdt=energy:0x26696143
> 
> With something like above user can, for example, use just "energy" to disable all RMID energy
> telemetry or be specific to which guid should be disabled. This seems to fit well with existing
> rdt parameters and be quite flexible.

See rough patch at foot of this e-mail. It's just on top of my WIP v14,
but if it looks like the right direction I will merge it into the series
in the patch that adds the energy/perf options.

> > 
> >> How a system with two guid of the same feature type would work is not clear to me though. Looks
> >> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
> >> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
> >> will complain loudly but still proceed and allow the event group to be enabled.
> > 
> > I can't see a good reason why the same event would be enabled under
> > different guids present on the same system. We can revisit my assumption
> > if the "Duplicate enable for event" message shows up.
> 
> This would be difficult to handle at that time, no? From what I can tell this would enable
> an unusable event group to actually be enabled resulting in untested and invalid flows.
> I think it will be safer to not enable an event group in this scenario and seems to math your
> expectation that this would be unexpected. The "Duplicate enable for event" message will still
> appear and we can still revisit those assumptions when they do, but the systems encountering
> them will not be running with enabled event groups that are not actually fully enabled.

There's a hardware cost to including an event in an aggregator.
Inclusing the same event in mutliple aggregators described by
different guids is really something that should never happen.
Just printing a warning and skipping the event seems an adequate
defense.
> 
> > 
> >> I think the resctrl_enable_mon_event() warnings were added to support enabling of new features
> >> so that the WARNs can catch issues during development ... now it may encounter issues when a
> >> kernel with this implementation is run on a system that supports a single feature with
> >> multiple guid. Do you have more insight in how the "single feature with multiple guid" may look to
> >> better prepare resctrl to handle them?
> >>
> >> Should "enable_events" be split so that a feature can be disabled for all its event groups if
> >> any of them cannot be enabled due to insufficient RMIDs?
> >> Perhaps resctrl_enable_mon_event() should also now return success/fail so that an event group
> >> cannot be enabled if its events cannot be enabled?
> >> Finally, a system with two guid of the same feature type will end up printing duplicate
> >> "<feature type> monitoring detected" that could be more descriptive?
> > 
> > I need to add the guid to that message.
> 
> Sounds good. Thank you.
> 
> Reinette

-Tony

---

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 08eb78acb988..25df1abc1537 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -241,6 +241,7 @@ int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
 void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 				struct list_head *add_pos);
 void intel_aet_add_debugfs(void);
+void intel_aet_option(bool force_off, const char *option, const char *suboption);
 #else
 static inline bool intel_aet_get_events(void) { return false; }
 static inline void __exit intel_aet_exit(void) { }
@@ -252,6 +253,7 @@ static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64
 static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 					      struct list_head *add_pos) { }
 static inline void intel_aet_add_debugfs(void) { }
+static inline void intel_aet_option(bool force_off, const char *option, const char *suboption) { };
 #endif
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5cae4119686e..68195f458c0b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -842,6 +842,7 @@ static struct rdt_options rdt_options[] = {
 static int __init set_rdt_options(char *str)
 {
 	struct rdt_options *o;
+	char *suboption;
 	bool force_off;
 	char *tok;
 
@@ -851,6 +852,11 @@ static int __init set_rdt_options(char *str)
 		force_off = *tok == '!';
 		if (force_off)
 			tok++;
+		suboption = strpbrk(tok, ":");
+		if (suboption) {
+			*suboption++ = '\0';
+			intel_aet_option(force_off, tok, suboption);
+		}
 		for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
 			if (strcmp(tok, o->name) == 0) {
 				if (force_off)
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 6028bfec229b..b3c61bcd3e8f 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -69,6 +69,9 @@ struct pmt_event {
  *			data for all telemetry regions type @feature.
  *			Valid if the system supports the event group.
  *			NULL otherwise.
+ * @force_off:		Set true when "rdt" command line disables this @guid.
+ * @force_on:		Set true when "rdt" command line overrides disable of
+ *			this @guid due to insufficeint @num_rmid.
  * @guid:		Unique number per XML description file.
  * @num_rmid:		Number of RMIDs supported by this group. May be
  *			adjusted downwards if enumeration from
@@ -83,6 +86,7 @@ struct event_group {
 	enum pmt_feature_id		feature;
 	char				*name;
 	struct pmt_feature_group	*pfg;
+	bool				force_off, force_on;
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
@@ -144,6 +148,26 @@ static struct event_group *known_event_groups[] = {
 	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
 	     _peg++)
 
+void intel_aet_option(bool force_off, const char *option, const char *suboption)
+{
+	struct event_group **peg;
+	u32 guid;
+
+	if (kstrtou32(suboption, 16, &guid))
+		return;
+
+	for_each_event_group(peg) {
+		if (!strcmp(option, (*peg)->name))
+			continue;
+		if ((*peg)->guid != guid)
+			continue;
+		if (force_off)
+			(*peg)->force_off = true;
+		else
+			(*peg)->force_on = true;
+	}
+}
+
 /*
  * Clear the address field of regions that did not pass the checks in
  * skip_telem_region() so they will not be used by intel_aet_read_event().
@@ -252,6 +276,9 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
 	bool warn_disable = false;
 
+	if (e->force_off)
+		return false;
+
 	if (!group_has_usable_regions(e, p))
 		return false;
 
@@ -262,7 +289,7 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 	}
 
 	/* User can override above disable from kernel command line */
-	if (!rdt_is_feature_enabled(e->name)) {
+	if (!rdt_is_feature_enabled(e->name) && !e->force_on) {
 		if (warn_disable)
 			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
 				e->name, e->guid);

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Reinette Chatre 2 months, 3 weeks ago

Hi Tony,

On 11/17/25 10:52 AM, Luck, Tony wrote:
> On Mon, Nov 17, 2025 at 09:31:41AM -0800, Reinette Chatre wrote:
>> On 11/17/25 8:37 AM, Luck, Tony wrote:
>>> On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
>>>> On 11/14/25 1:55 PM, Luck, Tony wrote:
>>>> How a system with two guid of the same feature type would work is not clear to me though. Looks
>>>> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
>>>> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
>>>> will complain loudly but still proceed and allow the event group to be enabled.
>>>
>>> I can't see a good reason why the same event would be enabled under
>>> different guids present on the same system. We can revisit my assumption
>>> if the "Duplicate enable for event" message shows up.
>>
>> This would be difficult to handle at that time, no? From what I can tell this would enable
>> an unusable event group to actually be enabled resulting in untested and invalid flows.
>> I think it will be safer to not enable an event group in this scenario and seems to math your
>> expectation that this would be unexpected. The "Duplicate enable for event" message will still
>> appear and we can still revisit those assumptions when they do, but the systems encountering
>> them will not be running with enabled event groups that are not actually fully enabled.
> 
> There's a hardware cost to including an event in an aggregator.
> Inclusing the same event in mutliple aggregators described by
> different guids is really something that should never happen.
> Just printing a warning and skipping the event seems an adequate
> defense.

My concern is that after skipping the event there is a deceiving message that the event group was
enabled successfully.

...

> ---
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 08eb78acb988..25df1abc1537 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -241,6 +241,7 @@ int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
>  void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
>  				struct list_head *add_pos);
>  void intel_aet_add_debugfs(void);
> +void intel_aet_option(bool force_off, const char *option, const char *suboption);
>  #else
>  static inline bool intel_aet_get_events(void) { return false; }
>  static inline void __exit intel_aet_exit(void) { }
> @@ -252,6 +253,7 @@ static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64
>  static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
>  					      struct list_head *add_pos) { }
>  static inline void intel_aet_add_debugfs(void) { }
> +static inline void intel_aet_option(bool force_off, const char *option, const char *suboption) { };

(nit: stray semicolon)

>  #endif
>  
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 5cae4119686e..68195f458c0b 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -842,6 +842,7 @@ static struct rdt_options rdt_options[] = {
>  static int __init set_rdt_options(char *str)
>  {
>  	struct rdt_options *o;
> +	char *suboption;
>  	bool force_off;
>  	char *tok;
>  
> @@ -851,6 +852,11 @@ static int __init set_rdt_options(char *str)
>  		force_off = *tok == '!';
>  		if (force_off)
>  			tok++;
> +		suboption = strpbrk(tok, ":");
> +		if (suboption) {
> +			*suboption++ = '\0';

This looks like an open code of strsep()?

> +			intel_aet_option(force_off, tok, suboption);
> +		}

I think this can be simplified. It also looks possible to follow some patterns of existing
option handling. 

By adding the force_on/force_off members to struct event_group I do not see the perf and
energy options needed in rdt_options[] anymore. rdt_set_feature_disabled() and
rdt_is_feature_enabled() now also seems unnecessary because the event_group::force_on and
event_group::force_off are sufficient.

It looks to me that the entire token can be passed here to intel_aet_option() and it
returns 1 to indicate that it was able to "handle" the token, 0 otherwise. If intel_aet_option()
was able to handle the option then it is not necessary to do further parsing.
"handle" means that it could successfully initialize the new members of struct event_group.

So instead, how about something like:
		if (intel_aet_option(force_off, tok))
			continue;

>  		for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
>  			if (strcmp(tok, o->name) == 0) {
>  				if (force_off)
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 6028bfec229b..b3c61bcd3e8f 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -69,6 +69,9 @@ struct pmt_event {
>   *			data for all telemetry regions type @feature.
>   *			Valid if the system supports the event group.
>   *			NULL otherwise.
> + * @force_off:		Set true when "rdt" command line disables this @guid.

To be consistent, can also be true if event group was found to have insufficient RMID.

> + * @force_on:		Set true when "rdt" command line overrides disable of
> + *			this @guid due to insufficeint @num_rmid.

"Set" can be dropped to be explicit of state instead of potentially confusing "Set" to
be a verb.

insufficeint -> insufficient

>   * @guid:		Unique number per XML description file.
>   * @num_rmid:		Number of RMIDs supported by this group. May be
>   *			adjusted downwards if enumeration from
> @@ -83,6 +86,7 @@ struct event_group {
>  	enum pmt_feature_id		feature;
>  	char				*name;
>  	struct pmt_feature_group	*pfg;
> +	bool				force_off, force_on;
>  
>  	/* Remaining fields initialized from XML file. */
>  	u32				guid;
> @@ -144,6 +148,26 @@ static struct event_group *known_event_groups[] = {
>  	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
>  	     _peg++)
>  
> +void intel_aet_option(bool force_off, const char *option, const char *suboption)
> +{
> +	struct event_group **peg;
> +	u32 guid;
> +

Can use strsep() here to split provided token into name and guid. Take care to
check if guid NULL before attempting kstrtou32().

> +	if (kstrtou32(suboption, 16, &guid))
> +		return;
> +
> +	for_each_event_group(peg) {
> +		if (!strcmp(option, (*peg)->name))

!strcmp() -> strcmp()?

> +			continue;
> +		if ((*peg)->guid != guid)
> +			continue;

If no guid provided then all event groups with matching name can have
force_on/force_off member set to support user providing, for example: "!perf" to
disable all perf event groups.

> +		if (force_off)
> +			(*peg)->force_off = true;
> +		else
> +			(*peg)->force_on = true;
> +	}
> +}
> +
>  /*
>   * Clear the address field of regions that did not pass the checks in
>   * skip_telem_region() so they will not be used by intel_aet_read_event().
> @@ -252,6 +276,9 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
>  	bool warn_disable = false;
>  
> +	if (e->force_off)
> +		return false;
> +
>  	if (!group_has_usable_regions(e, p))
>  		return false;
>  

rdt_set_feature_disabled() that is around here can be replaced with setting
event_group::force_off

> @@ -262,7 +289,7 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>  	}
>  
>  	/* User can override above disable from kernel command line */
> -	if (!rdt_is_feature_enabled(e->name)) {
> +	if (!rdt_is_feature_enabled(e->name) && !e->force_on) {

rdt_is_feature_enabled() can be replaced with check of event_group::force_off

>  		if (warn_disable)
>  			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
>  				e->name, e->guid);
> 
> 

I believe changes I mention would simplify the implementation a lot while making it
more powerful to support the, for example, "!perf" use case. What do you think?

Reinette

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Luck, Tony 2 months, 3 weeks ago

On Tue, Nov 18, 2025 at 08:48:18AM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 11/17/25 10:52 AM, Luck, Tony wrote:
> > On Mon, Nov 17, 2025 at 09:31:41AM -0800, Reinette Chatre wrote:
> >> On 11/17/25 8:37 AM, Luck, Tony wrote:
> >>> On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
> >>>> On 11/14/25 1:55 PM, Luck, Tony wrote:
> >>>> How a system with two guid of the same feature type would work is not clear to me though. Looks
> >>>> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
> >>>> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
> >>>> will complain loudly but still proceed and allow the event group to be enabled.
> >>>
> >>> I can't see a good reason why the same event would be enabled under
> >>> different guids present on the same system. We can revisit my assumption
> >>> if the "Duplicate enable for event" message shows up.
> >>
> >> This would be difficult to handle at that time, no? From what I can tell this would enable
> >> an unusable event group to actually be enabled resulting in untested and invalid flows.
> >> I think it will be safer to not enable an event group in this scenario and seems to math your
> >> expectation that this would be unexpected. The "Duplicate enable for event" message will still
> >> appear and we can still revisit those assumptions when they do, but the systems encountering
> >> them will not be running with enabled event groups that are not actually fully enabled.
> > 
> > There's a hardware cost to including an event in an aggregator.
> > Inclusing the same event in mutliple aggregators described by
> > different guids is really something that should never happen.
> > Just printing a warning and skipping the event seems an adequate
> > defense.
> 
> My concern is that after skipping the event there is a deceiving message that the event group was
> enabled successfully.

I can change resctrl_enable_mon_event() to return a "bool" to say
whether each event was successfully enabled.

Then change to:

	int skipped_events = 0;

	for (int j = 0; j < e->num_events; j++) {
		if (!resctrl_enable_mon_event(e->evts[j].id, true,
					      e->evts[j].bin_bits, &e->evts[j]))
			skipped_events++;
	}

	if (e->num_events == skipped_events) {
		pr_info("No events enabled in %s %s:0x%x\n", r->name, e->name, e->guid);
		return false;
	}

	if (skipped_events)
		pr_info("%s %s:0x%x monitoring detected (skipped %d events)\n", r->name,
			r->name, e->name, e->guid, skipped_events);
	else
		pr_info("%s %s:0x%x monitoring detected\n", r->name, e->name, e->guid);
> 
> ...
> 
> > ---
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 08eb78acb988..25df1abc1537 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -241,6 +241,7 @@ int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
> >  void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
> >  				struct list_head *add_pos);
> >  void intel_aet_add_debugfs(void);
> > +void intel_aet_option(bool force_off, const char *option, const char *suboption);
> >  #else
> >  static inline bool intel_aet_get_events(void) { return false; }
> >  static inline void __exit intel_aet_exit(void) { }
> > @@ -252,6 +253,7 @@ static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64
> >  static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
> >  					      struct list_head *add_pos) { }
> >  static inline void intel_aet_add_debugfs(void) { }
> > +static inline void intel_aet_option(bool force_off, const char *option, const char *suboption) { };
> 
> (nit: stray semicolon)
> 
> >  #endif
> >  
> >  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 5cae4119686e..68195f458c0b 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -842,6 +842,7 @@ static struct rdt_options rdt_options[] = {
> >  static int __init set_rdt_options(char *str)
> >  {
> >  	struct rdt_options *o;
> > +	char *suboption;
> >  	bool force_off;
> >  	char *tok;
> >  
> > @@ -851,6 +852,11 @@ static int __init set_rdt_options(char *str)
> >  		force_off = *tok == '!';
> >  		if (force_off)
> >  			tok++;
> > +		suboption = strpbrk(tok, ":");
> > +		if (suboption) {
> > +			*suboption++ = '\0';
> 
> This looks like an open code of strsep()?
> 
> > +			intel_aet_option(force_off, tok, suboption);
> > +		}
> 
> I think this can be simplified. It also looks possible to follow some patterns of existing
> option handling. 
> 
> By adding the force_on/force_off members to struct event_group I do not see the perf and
> energy options needed in rdt_options[] anymore. rdt_set_feature_disabled() and
> rdt_is_feature_enabled() now also seems unnecessary because the event_group::force_on and
> event_group::force_off are sufficient.
> 
> It looks to me that the entire token can be passed here to intel_aet_option() and it
> returns 1 to indicate that it was able to "handle" the token, 0 otherwise. If intel_aet_option()
> was able to handle the option then it is not necessary to do further parsing.
> "handle" means that it could successfully initialize the new members of struct event_group.
> 
> So instead, how about something like:
> 		if (intel_aet_option(force_off, tok))
> 			continue;
> 
> >  		for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
> >  			if (strcmp(tok, o->name) == 0) {
> >  				if (force_off)
> > diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > index 6028bfec229b..b3c61bcd3e8f 100644
> > --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > @@ -69,6 +69,9 @@ struct pmt_event {
> >   *			data for all telemetry regions type @feature.
> >   *			Valid if the system supports the event group.
> >   *			NULL otherwise.
> > + * @force_off:		Set true when "rdt" command line disables this @guid.
> 
> To be consistent, can also be true if event group was found to have insufficient RMID.
> 
> > + * @force_on:		Set true when "rdt" command line overrides disable of
> > + *			this @guid due to insufficeint @num_rmid.
> 
> "Set" can be dropped to be explicit of state instead of potentially confusing "Set" to
> be a verb.
> 
> insufficeint -> insufficient
> 
> >   * @guid:		Unique number per XML description file.
> >   * @num_rmid:		Number of RMIDs supported by this group. May be
> >   *			adjusted downwards if enumeration from
> > @@ -83,6 +86,7 @@ struct event_group {
> >  	enum pmt_feature_id		feature;
> >  	char				*name;
> >  	struct pmt_feature_group	*pfg;
> > +	bool				force_off, force_on;
> >  
> >  	/* Remaining fields initialized from XML file. */
> >  	u32				guid;
> > @@ -144,6 +148,26 @@ static struct event_group *known_event_groups[] = {
> >  	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
> >  	     _peg++)
> >  
> > +void intel_aet_option(bool force_off, const char *option, const char *suboption)
> > +{
> > +	struct event_group **peg;
> > +	u32 guid;
> > +
> 
> Can use strsep() here to split provided token into name and guid. Take care to
> check if guid NULL before attempting kstrtou32().
> 
> > +	if (kstrtou32(suboption, 16, &guid))
> > +		return;
> > +
> > +	for_each_event_group(peg) {
> > +		if (!strcmp(option, (*peg)->name))
> 
> !strcmp() -> strcmp()?
> 
> > +			continue;
> > +		if ((*peg)->guid != guid)
> > +			continue;
> 
> If no guid provided then all event groups with matching name can have
> force_on/force_off member set to support user providing, for example: "!perf" to
> disable all perf event groups.
> 
> > +		if (force_off)
> > +			(*peg)->force_off = true;
> > +		else
> > +			(*peg)->force_on = true;
> > +	}
> > +}
> > +
> >  /*
> >   * Clear the address field of regions that did not pass the checks in
> >   * skip_telem_region() so they will not be used by intel_aet_read_event().
> > @@ -252,6 +276,9 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> >  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> >  	bool warn_disable = false;
> >  
> > +	if (e->force_off)
> > +		return false;
> > +
> >  	if (!group_has_usable_regions(e, p))
> >  		return false;
> >  
> 
> rdt_set_feature_disabled() that is around here can be replaced with setting
> event_group::force_off
> 
> > @@ -262,7 +289,7 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> >  	}
> >  
> >  	/* User can override above disable from kernel command line */
> > -	if (!rdt_is_feature_enabled(e->name)) {
> > +	if (!rdt_is_feature_enabled(e->name) && !e->force_on) {
> 
> rdt_is_feature_enabled() can be replaced with check of event_group::force_off
> 
> >  		if (warn_disable)
> >  			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
> >  				e->name, e->guid);
> > 
> > 
> 
> I believe changes I mention would simplify the implementation a lot while making it
> more powerful to support the, for example, "!perf" use case. What do you think?

Agreed. Simpler and more flexible. I'll make these changes.
> 
> Reinette

-Tony

Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG

Posted by Reinette Chatre 2 months, 3 weeks ago

Hi Tony,

On 11/18/25 9:35 AM, Luck, Tony wrote:
> On Tue, Nov 18, 2025 at 08:48:18AM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 11/17/25 10:52 AM, Luck, Tony wrote:
>>> On Mon, Nov 17, 2025 at 09:31:41AM -0800, Reinette Chatre wrote:
>>>> On 11/17/25 8:37 AM, Luck, Tony wrote:
>>>>> On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
>>>>>> On 11/14/25 1:55 PM, Luck, Tony wrote:
>>>>>> How a system with two guid of the same feature type would work is not clear to me though. Looks
>>>>>> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
>>>>>> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
>>>>>> will complain loudly but still proceed and allow the event group to be enabled.
>>>>>
>>>>> I can't see a good reason why the same event would be enabled under
>>>>> different guids present on the same system. We can revisit my assumption
>>>>> if the "Duplicate enable for event" message shows up.
>>>>
>>>> This would be difficult to handle at that time, no? From what I can tell this would enable
>>>> an unusable event group to actually be enabled resulting in untested and invalid flows.
>>>> I think it will be safer to not enable an event group in this scenario and seems to math your
>>>> expectation that this would be unexpected. The "Duplicate enable for event" message will still
>>>> appear and we can still revisit those assumptions when they do, but the systems encountering
>>>> them will not be running with enabled event groups that are not actually fully enabled.
>>>
>>> There's a hardware cost to including an event in an aggregator.
>>> Inclusing the same event in mutliple aggregators described by
>>> different guids is really something that should never happen.
>>> Just printing a warning and skipping the event seems an adequate
>>> defense.
>>
>> My concern is that after skipping the event there is a deceiving message that the event group was
>> enabled successfully.
> 
> I can change resctrl_enable_mon_event() to return a "bool" to say
> whether each event was successfully enabled.
> 
> Then change to:
> 
> 	int skipped_events = 0;
> 
> 	for (int j = 0; j < e->num_events; j++) {
> 		if (!resctrl_enable_mon_event(e->evts[j].id, true,
> 					      e->evts[j].bin_bits, &e->evts[j]))
> 			skipped_events++;
> 	}
> 
> 	if (e->num_events == skipped_events) {
> 		pr_info("No events enabled in %s %s:0x%x\n", r->name, e->name, e->guid);
> 		return false;
> 	}
> 
> 	if (skipped_events)
> 		pr_info("%s %s:0x%x monitoring detected (skipped %d events)\n", r->name,
> 			r->name, e->name, e->guid, skipped_events);
> 	else
> 		pr_info("%s %s:0x%x monitoring detected\n", r->name, e->name, e->guid);

This looks good to me. Thank you. I am not able to tell from this snippet but since enabling of
an event group can fail at this point I think the r->mon.num_rmid initialization should
now be moved later so that a failing event group will not impact the number of RMIDs the
resource can use.

Reinette