[PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry events

Tony Luck posted 31 patches 4 months, 2 weeks ago
There is a newer version of this series
[PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry events
Posted by Tony Luck 4 months, 2 weeks ago
The INTEL_PMT_TELEMETRY driver provides telemetry region structures of the
types requested by resctrl.

Scan these structures to discover which pass sanity checks to derive
a list of valid regions:

1) They have guid known to resctrl.
2) They have a valid package ID.
3) The enumerated size of the MMIO region matches the expected
   value from the XML description file.
4) At least one region passes the above checks.

For each valid region enable all the events in the associated
event_group::evts[].

Pass a pointer to the pmt_event structure of the event within the struct
event_group that resctrl stores in mon_evt::arch_priv. resctrl passes
this pointer back when asking to read the event data which enables the
data to be found in MMIO.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 38 +++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index f9b5f6cd08f8..98ba9ba05ee5 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -20,9 +20,11 @@
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
 #include <linux/overflow.h>
+#include <linux/printk.h>
 #include <linux/resctrl.h>
 #include <linux/resctrl_types.h>
 #include <linux/stddef.h>
+#include <linux/topology.h>
 #include <linux/types.h>
 
 #include "internal.h"
@@ -114,12 +116,44 @@ static struct event_group *known_perf_event_groups[] = {
 	for (_peg = (_grp); _peg < &_grp[ARRAY_SIZE(_grp)]; _peg++)	\
 		if ((*_peg)->pfg)
 
-/* Stub for now */
-static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
+static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
 {
+	if (tr->guid != e->guid)
+		return true;
+	if (tr->plat_info.package_id >= topology_max_packages()) {
+		pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
+			tr->guid);
+		return true;
+	}
+	if (tr->size != e->mmio_size) {
+		pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
+			tr->size, e->guid, e->mmio_size);
+		return true;
+	}
+
 	return false;
 }
 
+static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
+{
+	bool usable_events = false;
+
+	for (int i = 0; i < p->count; i++) {
+		if (skip_telem_region(&p->regions[i], e))
+			continue;
+		usable_events = true;
+	}
+
+	if (!usable_events)
+		return false;
+
+	for (int j = 0; j < e->num_events; j++)
+		resctrl_enable_mon_event(e->evts[j].id, true,
+					 e->evts[j].bin_bits, &e->evts[j]);
+
+	return true;
+}
+
 DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
 		if (!IS_ERR_OR_NULL(_T))
 			intel_pmt_put_feature_group(_T))
-- 
2.51.0
Re: [PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry events
Posted by Reinette Chatre 4 months ago
Hi Tony,

On 9/25/25 1:03 PM, Tony Luck wrote:
> The INTEL_PMT_TELEMETRY driver provides telemetry region structures of the
> types requested by resctrl.
> 
> Scan these structures to discover which pass sanity checks to derive
> a list of valid regions:

The "to derive a list of valid regions" does not align with the
"At least one region passes the above checks" requirement. If this is about
valid (usable?) regions then I think (4) should be dropped. If this is instead about
valid events then above should be reworded to say that instead.

> 
> 1) They have guid known to resctrl.
> 2) They have a valid package ID.
> 3) The enumerated size of the MMIO region matches the expected
>    value from the XML description file.
> 4) At least one region passes the above checks.
> 

Everything below is clear by looking at the patch. It can also be seen from patch
that enabling is done only once if there is *any* valid region instead of "for each
valid region". One thing that may be useful to add is "why" all events
can be enabled. If I understand correctly it can be something like:

	Enable events that usable telemetry regions are responsible for.

> For each valid region enable all the events in the associated
> event_group::evts[].
> 
> Pass a pointer to the pmt_event structure of the event within the struct
> event_group that resctrl stores in mon_evt::arch_priv. resctrl passes
> this pointer back when asking to read the event data which enables the
> data to be found in MMIO.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 38 +++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index f9b5f6cd08f8..98ba9ba05ee5 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -20,9 +20,11 @@
>  #include <linux/intel_pmt_features.h>
>  #include <linux/intel_vsec.h>
>  #include <linux/overflow.h>
> +#include <linux/printk.h>
>  #include <linux/resctrl.h>
>  #include <linux/resctrl_types.h>
>  #include <linux/stddef.h>
> +#include <linux/topology.h>
>  #include <linux/types.h>
>  
>  #include "internal.h"
> @@ -114,12 +116,44 @@ static struct event_group *known_perf_event_groups[] = {
>  	for (_peg = (_grp); _peg < &_grp[ARRAY_SIZE(_grp)]; _peg++)	\
>  		if ((*_peg)->pfg)
>  
> -/* Stub for now */
> -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
>  {
> +	if (tr->guid != e->guid)
> +		return true;
> +	if (tr->plat_info.package_id >= topology_max_packages()) {
> +		pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
> +			tr->guid);
> +		return true;
> +	}

I have not encountered any mention of the possibility that packages may differ
in which telemetry region types they support. For example, could it be possible for package
A to have usable regions of the PERF type but package B doesn't? From what I can tell
INTEL_PMT_TELEMETRY supports layouts where this can be possible. If I understand correctly
this implementation will create event files for these domains but when the user attempts to
read the data it will fail. Can this work add some snippet about possibility of this
scenario and if/how it is supported?

> +	if (tr->size != e->mmio_size) {
> +		pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
> +			tr->size, e->guid, e->mmio_size);
> +		return true;
> +	}
> +
>  	return false;
>  }
>  
> +static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> +	bool usable_events = false;
> +
> +	for (int i = 0; i < p->count; i++) {
> +		if (skip_telem_region(&p->regions[i], e))
> +			continue;
> +		usable_events = true;

A previous concern [1] was why this loop does not break out at this point. I think it will 
help to make this clear if marking a telemetry region as unusable (mark_telem_region_unusable())
is done in this patch. Doing so makes the "usable" and "unusable" distinction in one
patch while making clear that the loop needs to complete.

> +	}
> +
> +	if (!usable_events)
> +		return false;
> +
> +	for (int j = 0; j < e->num_events; j++)
> +		resctrl_enable_mon_event(e->evts[j].id, true,
> +					 e->evts[j].bin_bits, &e->evts[j]);
> +
> +	return true;
> +}
> +
>  DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
>  		if (!IS_ERR_OR_NULL(_T))
>  			intel_pmt_put_feature_group(_T))

Reinette

[1] https://lore.kernel.org/lkml/9ac43e78-8955-db5d-61be-e08008e41f0d@linux.intel.com/
Re: [PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry events
Posted by Luck, Tony 4 months ago
On Fri, Oct 03, 2025 at 04:52:01PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 9/25/25 1:03 PM, Tony Luck wrote:
> > The INTEL_PMT_TELEMETRY driver provides telemetry region structures of the
> > types requested by resctrl.
> > 
> > Scan these structures to discover which pass sanity checks to derive
> > a list of valid regions:
> 
> The "to derive a list of valid regions" does not align with the
> "At least one region passes the above checks" requirement. If this is about
> valid (usable?) regions then I think (4) should be dropped. If this is instead about
> valid events then above should be reworded to say that instead.

Will drop "4".
> 
> > 
> > 1) They have guid known to resctrl.
> > 2) They have a valid package ID.
> > 3) The enumerated size of the MMIO region matches the expected
> >    value from the XML description file.
> > 4) At least one region passes the above checks.
> > 
> 
> Everything below is clear by looking at the patch. It can also be seen from patch
> that enabling is done only once if there is *any* valid region instead of "for each
> valid region". One thing that may be useful to add is "why" all events
> can be enabled. If I understand correctly it can be something like:
> 
> 	Enable events that usable telemetry regions are responsible for.

Looks better. I will use this.

> 
> > For each valid region enable all the events in the associated
> > event_group::evts[].
> > 
> > Pass a pointer to the pmt_event structure of the event within the struct
> > event_group that resctrl stores in mon_evt::arch_priv. resctrl passes
> > this pointer back when asking to read the event data which enables the
> > data to be found in MMIO.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/intel_aet.c | 38 +++++++++++++++++++++++--
> >  1 file changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > index f9b5f6cd08f8..98ba9ba05ee5 100644
> > --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > @@ -20,9 +20,11 @@
> >  #include <linux/intel_pmt_features.h>
> >  #include <linux/intel_vsec.h>
> >  #include <linux/overflow.h>
> > +#include <linux/printk.h>
> >  #include <linux/resctrl.h>
> >  #include <linux/resctrl_types.h>
> >  #include <linux/stddef.h>
> > +#include <linux/topology.h>
> >  #include <linux/types.h>
> >  
> >  #include "internal.h"
> > @@ -114,12 +116,44 @@ static struct event_group *known_perf_event_groups[] = {
> >  	for (_peg = (_grp); _peg < &_grp[ARRAY_SIZE(_grp)]; _peg++)	\
> >  		if ((*_peg)->pfg)
> >  
> > -/* Stub for now */
> > -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> > +static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
> >  {
> > +	if (tr->guid != e->guid)
> > +		return true;
> > +	if (tr->plat_info.package_id >= topology_max_packages()) {
> > +		pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
> > +			tr->guid);
> > +		return true;
> > +	}
> 
> I have not encountered any mention of the possibility that packages may differ
> in which telemetry region types they support. For example, could it be possible for package
> A to have usable regions of the PERF type but package B doesn't? From what I can tell
> INTEL_PMT_TELEMETRY supports layouts where this can be possible. If I understand correctly
> this implementation will create event files for these domains but when the user attempts to
> read the data it will fail. Can this work add some snippet about possibility of this
> scenario and if/how it is supported?

Yes, this is architecturally possible. But I do not expect that systems will
be built that do this. You are right that such a system will create files that
always return "Unavailable" when read.

Is it sufficient to document this in the commit message?

I don't feel that it would be worthwhile to suppress creation of these files for
a "can't happen" situation. I'm not sure that doing so would be significantly
better from a user interface perspective. Users would get slightly more notice
(-ENOENT when trying to open the file). But the code would require
architecture calls from file system code to check which files need to be created
separately for each domain.

> 
> > +	if (tr->size != e->mmio_size) {
> > +		pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
> > +			tr->size, e->guid, e->mmio_size);
> > +		return true;
> > +	}
> > +
> >  	return false;
> >  }
> >  
> > +static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> > +{
> > +	bool usable_events = false;
> > +
> > +	for (int i = 0; i < p->count; i++) {
> > +		if (skip_telem_region(&p->regions[i], e))
> > +			continue;
> > +		usable_events = true;
> 
> A previous concern [1] was why this loop does not break out at this point. I think it will 
> help to make this clear if marking a telemetry region as unusable (mark_telem_region_unusable())
> is done in this patch. Doing so makes the "usable" and "unusable" distinction in one
> patch while making clear that the loop needs to complete.

Ok. I'll pull mark_telem_region_unusable() into this patch.

> > +	}
> > +
> > +	if (!usable_events)
> > +		return false;
> > +
> > +	for (int j = 0; j < e->num_events; j++)
> > +		resctrl_enable_mon_event(e->evts[j].id, true,
> > +					 e->evts[j].bin_bits, &e->evts[j]);
> > +
> > +	return true;
> > +}
> > +
> >  DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
> >  		if (!IS_ERR_OR_NULL(_T))
> >  			intel_pmt_put_feature_group(_T))
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/9ac43e78-8955-db5d-61be-e08008e41f0d@linux.intel.com/

-Tony
Re: [PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry events
Posted by Reinette Chatre 4 months ago
Hi Tony,

On 10/6/25 12:58 PM, Luck, Tony wrote:
> On Fri, Oct 03, 2025 at 04:52:01PM -0700, Reinette Chatre wrote:
>> On 9/25/25 1:03 PM, Tony Luck wrote:
>>> @@ -114,12 +116,44 @@ static struct event_group *known_perf_event_groups[] = {
>>>  	for (_peg = (_grp); _peg < &_grp[ARRAY_SIZE(_grp)]; _peg++)	\
>>>  		if ((*_peg)->pfg)
>>>  
>>> -/* Stub for now */
>>> -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>>> +static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
>>>  {
>>> +	if (tr->guid != e->guid)
>>> +		return true;
>>> +	if (tr->plat_info.package_id >= topology_max_packages()) {
>>> +		pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
>>> +			tr->guid);
>>> +		return true;
>>> +	}
>>
>> I have not encountered any mention of the possibility that packages may differ
>> in which telemetry region types they support. For example, could it be possible for package
>> A to have usable regions of the PERF type but package B doesn't? From what I can tell
>> INTEL_PMT_TELEMETRY supports layouts where this can be possible. If I understand correctly
>> this implementation will create event files for these domains but when the user attempts to
>> read the data it will fail. Can this work add some snippet about possibility of this
>> scenario and if/how it is supported?
> 
> Yes, this is architecturally possible. But I do not expect that systems will
> be built that do this. You are right that such a system will create files that
> always return "Unavailable" when read.
> 
> Is it sufficient to document this in the commit message?
> 
> I don't feel that it would be worthwhile to suppress creation of these files for
> a "can't happen" situation. I'm not sure that doing so would be significantly
> better from a user interface perspective. Users would get slightly more notice
> (-ENOENT when trying to open the file). But the code would require
> architecture calls from file system code to check which files need to be created
> separately for each domain.

I think it is sufficient to document this in the commit message to help create
confidence in robustness in support of different scenarios. I have not encountered such
a system but could this scenario be similar to one where a two socket system supports MBM
but only one socket has memory populated? I do not know what reading the counter MSR will
return in this case though.

> 
>>
>>> +	if (tr->size != e->mmio_size) {
>>> +		pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
>>> +			tr->size, e->guid, e->mmio_size);
>>> +		return true;
>>> +	}
>>> +
>>>  	return false;
>>>  }
>>>  
>>> +static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>>> +{
>>> +	bool usable_events = false;
>>> +
>>> +	for (int i = 0; i < p->count; i++) {
>>> +		if (skip_telem_region(&p->regions[i], e))
>>> +			continue;
>>> +		usable_events = true;
>>
>> A previous concern [1] was why this loop does not break out at this point. I think it will 
>> help to make this clear if marking a telemetry region as unusable (mark_telem_region_unusable())
>> is done in this patch. Doing so makes the "usable" and "unusable" distinction in one
>> patch while making clear that the loop needs to complete.
> 
> Ok. I'll pull mark_telem_region_unusable() into this patch.

Thank you.


Reinette
RE: [PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry events
Posted by Luck, Tony 4 months ago
> > I don't feel that it would be worthwhile to suppress creation of these files for
> > a "can't happen" situation. I'm not sure that doing so would be significantly
> > better from a user interface perspective. Users would get slightly more notice
> > (-ENOENT when trying to open the file). But the code would require
> > architecture calls from file system code to check which files need to be created
> > separately for each domain.
>
> I think it is sufficient to document this in the commit message to help create
> confidence in robustness in support of different scenarios. I have not encountered such
> a system but could this scenario be similar to one where a two socket system supports MBM
> but only one socket has memory populated? I do not know what reading the counter MSR will
> return in this case though.

The counting h/w is likely unaware of whether DIMMs slots are populated or not. My guess
would be that in this case the counters would read as zero forever, rather than "unavailable".

CXL supports hot-add memory. So whether a node has memory could change at runtime
on a system with CXL support.

-Tony