[PATCH v8 00/32] x86,fs/resctrl telemetry monitoring

Tony Luck posted 32 patches 1 month, 3 weeks ago
There is a newer version of this series
.../admin-guide/kernel-parameters.txt         |   2 +-
Documentation/filesystems/resctrl.rst         |  85 +++-
include/linux/resctrl.h                       |  89 +++-
include/linux/resctrl_types.h                 |  26 +-
arch/x86/include/asm/resctrl.h                |  16 -
arch/x86/kernel/cpu/resctrl/internal.h        |  56 ++-
fs/resctrl/internal.h                         |  68 ++-
arch/x86/kernel/cpu/resctrl/core.c            | 310 ++++++++----
arch/x86/kernel/cpu/resctrl/intel_aet.c       | 448 ++++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c         |  76 +--
fs/resctrl/ctrlmondata.c                      | 123 ++++-
fs/resctrl/monitor.c                          | 310 ++++++------
fs/resctrl/rdtgroup.c                         | 272 +++++++----
arch/x86/Kconfig                              |  13 +
arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
15 files changed, 1442 insertions(+), 453 deletions(-)
create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
[PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Tony Luck 1 month, 3 weeks ago
The prerequisite patch series to the Intel Telemetry code is
now in upstream v6.17-rc1. So that's the basis for this series
with no extra dependencies.

This patch series also available from the rdt-aet-v8 of:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

Changes since v7 was posted here:
Link: https://lore.kernel.org/all/20250711235341.113933-1-tony.luck@intel.com/

--- cover ---
Removed references to dependencies, there are none.
Updated change log.

--- 1 ---
Added Reinette RB tag

--- 2 ---
No change

--- 3 ---
No change

--- 4 ---
No change

--- 5 ---
Improved problem/solution descriptions in commit comment.

--- 6 ---
Added "break" to default case in domain_add_cpu_mon()

Improved commit problem description. Dropped "function"
when referring to name with "()".

--- 7 ---
Improve context statement in commit comment.
Add a problem statement.
Apply suggestions to fix portion of commit comment.

--- 8 ---
In domain_remove_cpu_ctrl() s/&d->hdr.list/&hdr->list/

--- 9 ---
Avoid double WARN_ON_ONCE() in __mon_event_count() and mbm_bw_count().
Move domain_header_is_valid() in mkdir_mondata_subdir() to where
it is needed before container_of().

Fix commit comment to use present tense for background information
and imperative for the changes. Separate problem statement into
its own paragraph.

--- 10 ---
Fix commit comment to use present tense for background information.
Separate problem statement into own paragraph.

--- 11 ---
Move these renames here from patch 26:
resctrl_mon_resource_init -> resctrl_mon_l3_resource_init()
resctrl_mon_resource_exit() -> resctrl_mon_l3_resource_exit()

--- 12 ---
Clean up commit comment with problem statement in its own
paragraph, and with better description.
Keep original kerneldoc for rmid_read::evt

--- 13 ---
Changed comment for cpu_on_correct_domain() as suggested.
Move WARN_ON_ONCE() check for L3 resource from __mon_event_count
to cpu_on_correct_domain().
Fix commit comments as suggested.

--- 14 ---
Break long line in declaration of resctrl_enable_mon_event()
Reformat comments in print_event_value() use more horizontal space.
s/binary_bit/binary_bits/. s/chosemn/chosen/
Commit comment cleanups to use present tense for background and
imperative for solution.

--- 15 ---
Enumeration of telemetry features is performed by sub-drivers
of the OOBMSM driver. These are loaded during device_init() time,
but the probe functions are triggered by deferred_probe_work_func()
which can happen after resctrl_init().
Tracing this procedure shows that it completes several seconds before
the earliest possible mount of the resctrl file system.

Updated Kconfig dependencies to require INTEL_TPMI=y so that all
drivers needed for enumeration are built in.

Fix commit to use imperative tone.

--- 16 ---
Drop "core" from Subject line.
s/these events/telemetry events/
Merge two solution paragraphs into one.

--- 17 ---
s/CONFIG_X86_RESCTRL_CPU_INTEL_AET/CONFIG_X86_CPU_RESCTRL_INTEL_AET/
Make it depend on INTEL_PMT_TELEMETRY instead of INTEL_PMT_DISCOVERY.
Defer addition of active_event_groups list until patch 19.
Drop NUM_KNOWN_* macros and open code ARRAY_SIZE() at point of use.
s/Try to use any with a known matching guid/
Try to use every telemetry aggregator with a known guid/
Change type of "ret" in get_pmt_feature() from bool to int.
Rewrote commit changelog to provide more details on how telemetry
aggregators are connected to MMIO regions and to event groups.

--- 18 ---
Move the "Sanity check" description in the commit message to before the
"Scan the array" and rewrite to describe which regions are usable by
Linux. s/package_id/package id/
Pull extra blank line in discover_events() into this patch from patch 19.

--- 19 ---
s/addresses of the telemetry regions used by each aggregator/
addresses of the telemetry regions/
Drop unneeded call to free_pkg_mmio_info((*peg)->pkginfo);
s/for each aggregator/for each region/
s/struct mmio_info/struct pkg_mmio_info/
Change ascii art to note that we can support "M" aggregators in
one package and "N" in another (which the code can support, but
likely will not happen in practice).
Implement active_event_groups list (previously scattered between
patches 16-20)


--- 20 ---
Moved stray list_add(&e->list, &active_event_groups); into patch 19
where it belongs.
Fixed kerneldoc comment for pmt_event::bin_bits to say this is number
of bits in the fraction part of fixed-point.
Add closing quote around "energy" in commit comment.

--- 21 ---
s/passed/passes/ s/rmid/RMID/
Point out the explicit problem being solved with example of telemetry
counters needing the offset in MMIO space.

--- 22 ---
Drop local "eventid" in discover_events(). Use e->evts[i].id directly.
Change intel_aet_read_event() to return success if ANY aggregator has
valid data instead on only when ALL are valid. This matches existing
resctrl behaviour for providing a summed value for a CTRL_MON group
together with all child MON groups.
Restructure commit message to include details of how events are enabled
with an arch_priv pointer to the pmt_event.

--- 23 ---
Move setup_intel_aet_mon_domain() from arch/x86/kernel/cpu/resctrl/core.c
to arch/x86/kernel/cpu/resctrl/intel_aet.c. Rename to
intel_aet_setup_mon_domain() to match pattern of other functions. Add
a stub to arch/x86/kernel/cpu/resctrl/internal.h for when
CONFIG_X86_CPU_RESCTRL_INTEL_AET is not set.
s/Intel-PMT-scoped/package scoped/

--- 24 ---
"that support is present" -> "that hardware support is present"
s/option/feature/

--- 25 ---
In check_rmid_count() change test to "if (tr->num_rmids < e->num_rmids)"
(i.e. if the number of hardware counters in the aggregator is fewer
than the number of counter locations in the MMIO region).
Eliminate the rdt_num_system_rmids variable.
"features" -> "feature's" (in comment for rdt_set_feature_disabled())
"h/w" -> "hardware" (three places in commit comment).

--- 26 ---
Function renames pulled into patch 11.
Drop the helper functions, just move the code to alloc/free
closid_num_dirty_rmid[] inline intoresctrl_mon_l3_resource_init()
and resctrl_mon_l3_resource_exit().
Fix leak of closid_num_dirty_rmid[] in resctrl_mon_l3_resource_init()
when dom_data_init() fails.
Note in commit comment that this change is prep for moving
the allocation rmid_ptrs[] to mount time.

--- 27 ---
Split into two parts:
0027:
	Update resctrl_arch_system_num_rmid_idx() to compute minimum
	RMID across resources.
0028: 
	Better function names.
	Rename: dom_data_init() -> setup_rmid_lru_list()
	Rename: dom_data_exit() -> free_rmid_lru_list()
	Move calls out of resctrl_mon_l3_resource_{init,exit}() and into
	rdt_get_tree(), resctrl_exit()
	Fix comment in setup_rmid_lru_list() about setup of rdtgroup_default.

--- 28 ---
Add check that RDT_RESOURCE_PERF_PKG resource is mon_capable. Also
set global rdt_mon_capable (to cover the case where there are no
enabled L3 monitor events).

--- 29 ---
Move declaration of debugfs_resctrl_info next to debugfs_resctrl.
s/debugs/debugfs/
Separate problem from context in commit comment.

--- 30 ---
Unchanged

--- 31 ---
Unchanged

Background
----------
On Intel systems that support per-RMID telemetry monitoring each logical
processor keeps a local count for various events. When the
IA32_PQR_ASSOC.RMID value for the logical processor changes (or when a
two millisecond counter expires) these event counts are transmitted to
an event aggregator on the same package as the processor together with
the current RMID value. The event counters are reset to zero to begin
counting again.

Each aggregator takes the incoming event counts and adds them to
cumulative counts for each event for each RMID. Note that there can be
multiple aggregators on each package with no architectural association
between logical processors and an aggregator.

All of these aggregated counters can be read by an operating system from
the MMIO space of the Out Of Band Management Service Module (OOBMSM)
device(s) on a system. Any counter can be read from any logical processor.

Intel publishes details for each processor generation showing which
events are counted by each logical processor and the offsets for each
accumulated counter value within the MMIO space in XML files here:
https://github.com/intel/Intel-PMT.

For example there are two energy related telemetry events for the
Clearwater Forest family of processors and the MMIO space looks like this:

Offset  RMID    Event
------  ----    -----
0x0000  0       core_energy
0x0008  0       activity
0x0010  1       core_energy
0x0018  1       activity
...
0x23F0  575     core_energy
0x23F8  575     activity

In addition the XML file provides the units (Joules for core_energy,
Farads for activity) and the type of data (fixed-point binary with
bit 63 used to indicate the data is valid, and the low 18 bits as a
binary fraction).

Finally, each XML file provides a 32-bit unique id (or guid) that is
used as an index to find the correct XML description file for each
telemetry implementation.

The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
to enumerate the aggregator instances (also referred to as "telemetry
regions" in this series) on a platform. It provides:

1) guid  - so resctrl can determine which events are supported
2) MMIO base address of counters
3) package id

Resctrl accumulates counts from all aggregators on a package in order
to provide a consistent user interface across processor generations.

Directory structure for the telemetry events looks like this:

$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
mon_data
├── mon_PERF_PKG_00
│   ├── activity
│   └── core_energy
└── mon_PERF_PKG_01
    ├── activity
    └── core_energy

Reading the "core_energy" file from some resctrl mon_data directory shows
the cumulative energy (in Joules) used by all tasks that ran with the RMID
associated with that directory on a given package. Note that "core_energy"
reports only energy consumed by CPU cores (data processing units,
L1/L2 caches, etc.). It does not include energy used in the "uncore"
(L3 cache, on package devices, etc.), or used by memory or I/O devices.


Signed-off-by: Tony Luck <tony.luck@intel.com>


Tony Luck (32):
  x86,fs/resctrl: Consolidate monitor event descriptions
  x86,fs/resctrl: Replace architecture event enabled checks
  x86/resctrl: Remove 'rdt_mon_features' global variable
  x86,fs/resctrl: Prepare for more monitor events
  x86,fs/resctrl: Improve domain type checking
  x86/resctrl: Move L3 initialization into new helper function
  x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain
    types
  x86/resctrl: Clean up domain_remove_cpu_ctrl()
  x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct
    rdt_mon_domain
  x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  x86,fs/resctrl: Rename some L3 specific functions
  fs/resctrl: Make event details accessible to functions when reading
    events
  x86,fs/resctrl: Handle events that can be read from any CPU
  x86,fs/resctrl: Support binary fixed point event counters
  x86,fs/resctrl: Add an architectural hook called for each mount
  x86,fs/resctrl: Add and initialize rdt_resource for package scope
    monitor
  x86/resctrl: Discover hardware telemetry events
  x86/resctrl: Count valid telemetry aggregators per package
  x86/resctrl: Complete telemetry event enumeration
  x86,fs/resctrl: Fill in details of events for guid 0x26696143 and
    0x26557651
  x86,fs/resctrl: Add architectural event pointer
  x86/resctrl: Read telemetry events
  x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
  x86/resctrl: Add energy/perf choices to rdt boot option
  x86/resctrl: Handle number of RMIDs supported by telemetry resources
  fs/resctrl: Move allocation/free of closid_num_dirty_rmid
  fs,x86/resctrl: Compute number of RMIDs as minimum across resources
  fs/resctrl: Move RMID initialization to first mount
  x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
  fs/resctrl: Provide interface to create architecture specific debugfs
    area
  x86/resctrl: Add debugfs files to show telemetry aggregator status
  x86,fs/resctrl: Update Documentation for package events

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/filesystems/resctrl.rst         |  85 +++-
 include/linux/resctrl.h                       |  89 +++-
 include/linux/resctrl_types.h                 |  26 +-
 arch/x86/include/asm/resctrl.h                |  16 -
 arch/x86/kernel/cpu/resctrl/internal.h        |  56 ++-
 fs/resctrl/internal.h                         |  68 ++-
 arch/x86/kernel/cpu/resctrl/core.c            | 310 ++++++++----
 arch/x86/kernel/cpu/resctrl/intel_aet.c       | 448 ++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c         |  76 +--
 fs/resctrl/ctrlmondata.c                      | 123 ++++-
 fs/resctrl/monitor.c                          | 310 ++++++------
 fs/resctrl/rdtgroup.c                         | 272 +++++++----
 arch/x86/Kconfig                              |  13 +
 arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
 15 files changed, 1442 insertions(+), 453 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c


base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
-- 
2.50.1

Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month, 2 weeks ago
Hi Tony,

On 8/11/25 11:16 AM, Tony Luck wrote:
> Background
> ----------
> On Intel systems that support per-RMID telemetry monitoring each logical
> processor keeps a local count for various events. When the
> IA32_PQR_ASSOC.RMID value for the logical processor changes (or when a
> two millisecond counter expires) these event counts are transmitted to
> an event aggregator on the same package as the processor together with
> the current RMID value. The event counters are reset to zero to begin
> counting again.
> 
> Each aggregator takes the incoming event counts and adds them to
> cumulative counts for each event for each RMID. Note that there can be
> multiple aggregators on each package with no architectural association
> between logical processors and an aggregator.
> 
> All of these aggregated counters can be read by an operating system from
> the MMIO space of the Out Of Band Management Service Module (OOBMSM)
> device(s) on a system. Any counter can be read from any logical processor.
> 
> Intel publishes details for each processor generation showing which
> events are counted by each logical processor and the offsets for each
> accumulated counter value within the MMIO space in XML files here:
> https://github.com/intel/Intel-PMT.
> 
> For example there are two energy related telemetry events for the
> Clearwater Forest family of processors and the MMIO space looks like this:
> 
> Offset  RMID    Event
> ------  ----    -----
> 0x0000  0       core_energy
> 0x0008  0       activity
> 0x0010  1       core_energy
> 0x0018  1       activity
> ...
> 0x23F0  575     core_energy
> 0x23F8  575     activity
> 
> In addition the XML file provides the units (Joules for core_energy,
> Farads for activity) and the type of data (fixed-point binary with
> bit 63 used to indicate the data is valid, and the low 18 bits as a
> binary fraction).
> 
> Finally, each XML file provides a 32-bit unique id (or guid) that is
> used as an index to find the correct XML description file for each
> telemetry implementation.
> 
> The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
> to enumerate the aggregator instances (also referred to as "telemetry
> regions" in this series) on a platform. It provides:
> 
> 1) guid  - so resctrl can determine which events are supported
> 2) MMIO base address of counters
> 3) package id
> 
> Resctrl accumulates counts from all aggregators on a package in order
> to provide a consistent user interface across processor generations.
> 
> Directory structure for the telemetry events looks like this:
> 
> $ tree /sys/fs/resctrl/mon_data/
> /sys/fs/resctrl/mon_data/
> mon_data
> ├── mon_PERF_PKG_00
> │   ├── activity
> │   └── core_energy
> └── mon_PERF_PKG_01
>     ├── activity
>     └── core_energy
> 
> Reading the "core_energy" file from some resctrl mon_data directory shows
> the cumulative energy (in Joules) used by all tasks that ran with the RMID
> associated with that directory on a given package. Note that "core_energy"
> reports only energy consumed by CPU cores (data processing units,
> L1/L2 caches, etc.). It does not include energy used in the "uncore"
> (L3 cache, on package devices, etc.), or used by memory or I/O devices.
> 
> 

I think this series is close to being ready to pass to the x86 maintainers.
To that end I focused a lot on the changelogs with the goal to meet the
tip requirements that mostly involved switching to imperative tone. I do not
expect that I found all the cases though (and I may also have made some mistakes
in my suggested text!) so please ensure the changelogs are in imperative tone
and uses consistent terms throughout the series.

If you disagree with any feedback or if any of the feedback is unclear please
let us discuss before you spin a new version. Of course it is not required
that you follow all feedback but if you don't I would like to learn why.

Please note that I will be offline next week.

Reinette

RE: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month, 2 weeks ago
> I think this series is close to being ready to pass to the x86 maintainers.
> To that end I focused a lot on the changelogs with the goal to meet the
> tip requirements that mostly involved switching to imperative tone. I do not
> expect that I found all the cases though (and I may also have made some mistakes
> in my suggested text!) so please ensure the changelogs are in imperative tone
> and uses consistent terms throughout the series.
>
> If you disagree with any feedback or if any of the feedback is unclear please
> let us discuss before you spin a new version. Of course it is not required
> that you follow all feedback but if you don't I would like to learn why.
>
> Please note that I will be offline next week.

Reinette,

I took one fast pass through all your comments. I think they all
look good (and I believe I understand each one). Thanks for all
the suggestions.

I'll try to keep (better) notes on what I change as I work through
each patch so I'll have a summary of any areas that I'm unsure
about for you to read when you get back before I post v9.

Enjoy your time away.

-Tony
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month, 1 week ago
On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
> > I think this series is close to being ready to pass to the x86 maintainers.
> > To that end I focused a lot on the changelogs with the goal to meet the
> > tip requirements that mostly involved switching to imperative tone. I do not
> > expect that I found all the cases though (and I may also have made some mistakes
> > in my suggested text!) so please ensure the changelogs are in imperative tone
> > and uses consistent terms throughout the series.
> >
> > If you disagree with any feedback or if any of the feedback is unclear please
> > let us discuss before you spin a new version. Of course it is not required
> > that you follow all feedback but if you don't I would like to learn why.
> >
> > Please note that I will be offline next week.
> 
> Reinette,
> 
> I took one fast pass through all your comments. I think they all
> look good (and I believe I understand each one). Thanks for all
> the suggestions.
> 
> I'll try to keep (better) notes on what I change as I work through
> each patch so I'll have a summary of any areas that I'm unsure
> about for you to read when you get back before I post v9.
> 
> Enjoy your time away.

Reinette,

I've completed a longer, slower, pass through making changes to prepare
for v9.  Summary of changes per patch below. I didn't disagree with any
of your suggestions.

The bulk of the changes are small, and localized to each patch. The
exception being removal of pkg_mmio_info[] which dropped patch 18 (which
just counted regions prior to allocation of pkg_mmio_info[]) and patch
19 (which finished filling out the details.

discover_events() is now named enable_events(), since there are almost
no "steps" involved, it doesn't have a header comment. The name now
describes what it does.

Theoretically skip_this_region() might find some regions unusable, while
others in the same pmt_feature_group are OK. To handle this I zero the
telemetry_region::addr so that intel_aet_read_event() can easily skip
"bad" regions.

I've pushed a rdt-aet-v9-wip branch to:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
if you want to take a quick look at the end result before
I patch bomb the list.

-Tony

--- cover ---
INTEL_PMT_DISCOVERY -> INTEL_PMT_TELEMETRY

--- 1 ---
No change

--- 2 ---
No change

--- 3 ---
No change

--- 4 ---
No change

--- 5 ---
"The "type" field provides" -> "rdt_domain_hdr::type provides"
Update final paragraph of commit message as suggested.

--- 6 ---
Code: in domain_add_cpu_mon() change WARN_ON_ONCE(1); in default case to
pr_warn_once("Unknown resource rid=%d\n", r->rid);


--- 7 ---
Code: in domain_remove_cpu_mon() move domain_header_is_valid()to
immediately before container_of() and switch check from r->rid to
hard-coded RDT_RESOURCE_L3.
s/list_del_rcu(&d->hdr.list);/list_del_rcu(&hdr->list);/

"separated" -> "separate"
"can skip" -> "skips"

--- 8 ---
Code: in domain_remove_cpu_ctrl()  move domain_header_is_valid()
to immediately before container_of().
s/list_del_rcu(&d->hdr.list);/list_del_rcu(&hdr->list);/
(moved here from patch 9)

"refactor" -> "refactor domain_remove_cpu_ctrl()"


--- 9 ---
Change second paragraph of commit comment as suggested.

"so the rmid_read::d field is replaced" -> "so replace the L3 specific domain
pointer rmid_read::d with rmid_read::hdr that points to the generic domain
header"

Use imperative tone for last paragraph "Update kerneldoc for mon_data::sum ..."

--- 10 ---
Change rdt_mon_domain to rdt_l3_mon_domain in comment above logical_rmid_to_physical_rmid()

Replace entire commit comment with improved version.

--- 11 ---
"different resources" -> "a different resource"
"these functions" -> "the L3 resource specific functions"
"Two groups of functions renamed here:" -> "Rename three groups of functions:"
Added rdt_get_mon_l3_config() to list of renamed functions to put the "l3"
before the "mon" for consistency.

When changing names for resctrl_mon_resource_{init,exit} add the "l3_" before
"mon" for consistency with other *l3_mon*" naming.

--- 12 ---
"to lower levels" -> "via the mon_data and rmid_read structures to
the functions finally reading the monitoring data."

Replace 3rd paragraph of commit comment with supplied better version.

--- 13 ---
Code: In cpu_on_correct_domain() s/return -EINVAL;/return false;/

--- 14 ---
Code: Move definition of MAX_BINARY_BITS from <linux/resctrl.h> to
fs/resctrl/internal.h.

Be explicit in commit comment that the file system makes the determination
on which events can be displayed in floating point format.

--- 15 ---
"asyncronnous" -> "asynchronous"
"for these drivers" -> "of these drivers"
"are called" -> "completes"
"But expectations" -> "Expectations"
"The call is made with no locks held." -> "resctrl filesystem calls the
hook with no locks held."

--- 16 ---
s/CPU hotplug/CPU hot plug/
Add Reinette's RB tag.

--- 17 ---
Code:
"OOBMSM" -> "INTEL_PMT_TELEMETRY"
"INTEL_PMT_DISCOVERY" -> "INTEL_PMT_TELEMETRY"
re-wrap comment for get_pmt_feature() to use 80 columns.
"OOBMSM discovery" -> "INTEL_PMT_TELEMETRY"
Add the intel_pmt_put_feature_group() calls to intel_aet_exit()
to match the intel_pmt_get_regions_by_feature() calls in get_pmt_feature()
using a new macro for_each_enabled_event_group().
Rename discover_events() to enable_events()

Add period at end of help text in arch/x86/Kconfig.

"Data for telemetry events is collected by each CPU and sent" ->
"Each CPU collects data for telemetry events that it sends"
"is changed" -> "changes"
"or when two milliseconds have elapsed" -> "or when a two millisecond timer expires"
"mad" -> "made"
"Enumeration of support for telemetry events is done" ->
"The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events."
Drop references to INTEL_PMT_DISCOVERY driver.

Merge last two paragraphs of commit message.

Reformat commit to use more of page width.

Add maintainer note about checkpatch complaints for DEFINE_FREE()

--- 18 ---
Dropped. See new patch 0019

--- 19 ---
Dropped. See new patch 0019

--- 20 ---
Now 0018
Rewrite opening paragraph to avoid "continuation of the subject"

Add note/link on the source of the XML files that describe events.

--- 21 ---
Now 0019
Drop "vague first sentence" of second paragraph.

--- NEW 0020 ---
Replaces most of parts 18/32 and 19/32.
Contains skip_this_region() from patch 18/32, but skips all the
code to count regions and allocat pkg_mmio_info[].

Also includes event enabling code from 22/32

--- 22 ---
Now 0021
Modify intel_aet_read_event() to dig into pmt_event::pfg to
find the MMIO base addresses that v8 patch stored in the pkg_mmio_info[]
structures.

--- 23 ---
Now 0022
Add domain_header_is_valid() check in domain_remove_cpu_mon()
before using container_of().

"There are structures" -> "There are per domain structures ..."

Replace my commit fix description with better version from Reinette.

--- 24 ---
Unparseable last sentence of commit message replaced with details
and examples.

--- 25 ---
RMIDS -> RMIDs
"Adjusted downwards ..." -> "May be adjusted downwards ..."
check_rmid_count() -> all_regions_have_sufficient_rmid()
"Potentially disable" -> "Disable"
Add comment:
/* e->num_rmids only adjusted lower if user forces an unusable region to be usable */

--- 26 ---
Add "during resctrl initialization" and "during resctrl exit" to commit
background statement.

Add "closid_num_dirty_rmid[]" to be specific about what is being allocated/freed.

--- 27 ---
"mon capable" -> "mon_capable" (three times)
"alloc capable" -> "alloc_capable"
"rdt_l3_mon_domain::states[]" -> "rdt_l3_mon_domain::mbm_states[] and
+rdt_l3_mon_domain::rmid_busy_llc"

"the number of RMIDs" -> "the system's number of RMIDs"

--- 28 ---
Add comment to setup_rmid_lru_list() to note that rmid_ptrs[]
is allocated of first mount and is reused on subsequent mounts.
It is freed in resctrl_exit().

Lock/unlock rdtgroup_mutex around body of free_rmid_lru_list()

Rewrite commit based on suggestions with some modifications to explain
why error paths in rdt_get_tree() do not call free_rmid_lru_list().

--- 29 ---
TODO: recheck for use of "CPU hot plug notifiers" may have been called "hooks", "callbacks", and
"handlers" through this series.

--- 30 ---
"a resource" -> "a monitoring resource"

--- 31 ---
"last_update_timestamp" -> "agg_last_update_timestamp" in commit comment

Move creation of debugfs files to the end of enable_events().

Rewrite to work based on event_group::pfg since event_group::pkginfo[] is gone.


--- 32 ---
Describe "num_rmids" file values independently for L3 and telemetry.
Move the note about upper bound on directory creation to its own
paragraph to say it is the smaller of reported "num_rmids" values.

"or of a processor package" -> "another for each processor package"

Change paragraph about contents of subdirectories of mon_data to
gize example file names instead of hard coding specifics.

prescence -> presence
will vary -> may vary
last_update_timestamp -> agg_last_update_timestamp

Simplify commit comment as suggested.
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month, 1 week ago
Hi Tony,

On 8/25/25 3:20 PM, Luck, Tony wrote:
> On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
>>> I think this series is close to being ready to pass to the x86 maintainers.
>>> To that end I focused a lot on the changelogs with the goal to meet the
>>> tip requirements that mostly involved switching to imperative tone. I do not
>>> expect that I found all the cases though (and I may also have made some mistakes
>>> in my suggested text!) so please ensure the changelogs are in imperative tone
>>> and uses consistent terms throughout the series.
>>>
>>> If you disagree with any feedback or if any of the feedback is unclear please
>>> let us discuss before you spin a new version. Of course it is not required
>>> that you follow all feedback but if you don't I would like to learn why.
>>>
>>> Please note that I will be offline next week.
>>
>> Reinette,
>>
>> I took one fast pass through all your comments. I think they all
>> look good (and I believe I understand each one). Thanks for all
>> the suggestions.
>>
>> I'll try to keep (better) notes on what I change as I work through
>> each patch so I'll have a summary of any areas that I'm unsure
>> about for you to read when you get back before I post v9.
>>
>> Enjoy your time away.
> 
> Reinette,
> 
> I've completed a longer, slower, pass through making changes to prepare
> for v9.  Summary of changes per patch below. I didn't disagree with any
> of your suggestions.

For me the item that I expected may need discussion is the use of
active_event_groups that no longer exists in v9.

> 
> The bulk of the changes are small, and localized to each patch. The
> exception being removal of pkg_mmio_info[] which dropped patch 18 (which
> just counted regions prior to allocation of pkg_mmio_info[]) and patch
> 19 (which finished filling out the details.
> 
> discover_events() is now named enable_events(), since there are almost
> no "steps" involved, it doesn't have a header comment. The name now
> describes what it does.
> 
> Theoretically skip_this_region() might find some regions unusable, while
> others in the same pmt_feature_group are OK. To handle this I zero the
> telemetry_region::addr so that intel_aet_read_event() can easily skip
> "bad" regions.

This is a significant change that simplifies the implementation a lot. 
Even so, it is concerning that resctrl takes liberty to reach in and modify
INTEL_PMT_TELEMETRY's data structure for its convenience though. Could the
changelog motivate why it is ok and safe to do so? Should something like
this not rather be done with a helper exposed by subsystem (INTEL_PMT_TELEMETRY)
to be able to track such changes?

Reinette
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month, 1 week ago
On Thu, Aug 28, 2025 at 09:45:41AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 8/25/25 3:20 PM, Luck, Tony wrote:
> > On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
> >>> I think this series is close to being ready to pass to the x86 maintainers.
> >>> To that end I focused a lot on the changelogs with the goal to meet the
> >>> tip requirements that mostly involved switching to imperative tone. I do not
> >>> expect that I found all the cases though (and I may also have made some mistakes
> >>> in my suggested text!) so please ensure the changelogs are in imperative tone
> >>> and uses consistent terms throughout the series.
> >>>
> >>> If you disagree with any feedback or if any of the feedback is unclear please
> >>> let us discuss before you spin a new version. Of course it is not required
> >>> that you follow all feedback but if you don't I would like to learn why.
> >>>
> >>> Please note that I will be offline next week.
> >>
> >> Reinette,
> >>
> >> I took one fast pass through all your comments. I think they all
> >> look good (and I believe I understand each one). Thanks for all
> >> the suggestions.
> >>
> >> I'll try to keep (better) notes on what I change as I work through
> >> each patch so I'll have a summary of any areas that I'm unsure
> >> about for you to read when you get back before I post v9.
> >>
> >> Enjoy your time away.
> > 
> > Reinette,
> > 
> > I've completed a longer, slower, pass through making changes to prepare
> > for v9.  Summary of changes per patch below. I didn't disagree with any
> > of your suggestions.
> 
> For me the item that I expected may need discussion is the use of
> active_event_groups that no longer exists in v9.

Yes. It was removed as part of the refactor to drop the pkg_mmio_info[]
array.

> 
> > 
> > The bulk of the changes are small, and localized to each patch. The
> > exception being removal of pkg_mmio_info[] which dropped patch 18 (which
> > just counted regions prior to allocation of pkg_mmio_info[]) and patch
> > 19 (which finished filling out the details.
> > 
> > discover_events() is now named enable_events(), since there are almost
> > no "steps" involved, it doesn't have a header comment. The name now
> > describes what it does.
> > 
> > Theoretically skip_this_region() might find some regions unusable, while
> > others in the same pmt_feature_group are OK. To handle this I zero the
> > telemetry_region::addr so that intel_aet_read_event() can easily skip
> > "bad" regions.
> 
> This is a significant change that simplifies the implementation a lot. 
> Even so, it is concerning that resctrl takes liberty to reach in and modify
> INTEL_PMT_TELEMETRY's data structure for its convenience though. Could the
> changelog motivate why it is ok and safe to do so? Should something like
> this not rather be done with a helper exposed by subsystem (INTEL_PMT_TELEMETRY)
> to be able to track such changes?

I can update the commit message to explain. I did check how the INTEL_PMT_TELEMETRY
code handles intel_pmt_put_feature_group(). It just does kref_put() on
pmt_feature_group::kref which results in a call to
pmt_feature_group_release() which simply does a kfree() for the
structure. So it doesn't care if I modify any fields (except for kref!)

If INTEL_PMT_TELEMETRY did care, it would have marked the return pointer
from intel_pmt_get_regions_by_feature() as "const".

If that isn't sufficient, can you expand on your thoughts about a helper
in the INTEL_PMT_TELEMETRY subsystem?

> Reinette

-Tony
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month, 1 week ago
Hi Tony,

On 8/28/25 1:14 PM, Luck, Tony wrote:
> On Thu, Aug 28, 2025 at 09:45:41AM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 8/25/25 3:20 PM, Luck, Tony wrote:
>>> On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
>>>>> I think this series is close to being ready to pass to the x86 maintainers.
>>>>> To that end I focused a lot on the changelogs with the goal to meet the
>>>>> tip requirements that mostly involved switching to imperative tone. I do not
>>>>> expect that I found all the cases though (and I may also have made some mistakes
>>>>> in my suggested text!) so please ensure the changelogs are in imperative tone
>>>>> and uses consistent terms throughout the series.
>>>>>
>>>>> If you disagree with any feedback or if any of the feedback is unclear please
>>>>> let us discuss before you spin a new version. Of course it is not required
>>>>> that you follow all feedback but if you don't I would like to learn why.
>>>>>
>>>>> Please note that I will be offline next week.
>>>>
>>>> Reinette,
>>>>
>>>> I took one fast pass through all your comments. I think they all
>>>> look good (and I believe I understand each one). Thanks for all
>>>> the suggestions.
>>>>
>>>> I'll try to keep (better) notes on what I change as I work through
>>>> each patch so I'll have a summary of any areas that I'm unsure
>>>> about for you to read when you get back before I post v9.
>>>>
>>>> Enjoy your time away.
>>>
>>> Reinette,
>>>
>>> I've completed a longer, slower, pass through making changes to prepare
>>> for v9.  Summary of changes per patch below. I didn't disagree with any
>>> of your suggestions.
>>
>> For me the item that I expected may need discussion is the use of
>> active_event_groups that no longer exists in v9.
> 
> Yes. It was removed as part of the refactor to drop the pkg_mmio_info[]
> array.
> 
>>
>>>
>>> The bulk of the changes are small, and localized to each patch. The
>>> exception being removal of pkg_mmio_info[] which dropped patch 18 (which
>>> just counted regions prior to allocation of pkg_mmio_info[]) and patch
>>> 19 (which finished filling out the details.
>>>
>>> discover_events() is now named enable_events(), since there are almost
>>> no "steps" involved, it doesn't have a header comment. The name now
>>> describes what it does.
>>>
>>> Theoretically skip_this_region() might find some regions unusable, while
>>> others in the same pmt_feature_group are OK. To handle this I zero the
>>> telemetry_region::addr so that intel_aet_read_event() can easily skip
>>> "bad" regions.
>>
>> This is a significant change that simplifies the implementation a lot. 
>> Even so, it is concerning that resctrl takes liberty to reach in and modify
>> INTEL_PMT_TELEMETRY's data structure for its convenience though. Could the
>> changelog motivate why it is ok and safe to do so? Should something like
>> this not rather be done with a helper exposed by subsystem (INTEL_PMT_TELEMETRY)
>> to be able to track such changes?
> 
> I can update the commit message to explain. I did check how the INTEL_PMT_TELEMETRY
> code handles intel_pmt_put_feature_group(). It just does kref_put() on
> pmt_feature_group::kref which results in a call to
> pmt_feature_group_release() which simply does a kfree() for the
> structure. So it doesn't care if I modify any fields (except for kref!)

My assumption was that "getting" a reference means that there is a single 
object to which multiple users can obtain a reference and then the object is
released when the final reference is dropped. Now after looking more closely
at intel_pmt_get_regions_by_feature() (btw, it is still remapped to 
fake_intel_pmt_get_regions_by_feature in the branch you provide) I see that
every caller gets a fresh copy of a struct pmt_feature_group and not a reference to
an existing struct pmt_feature_group as I assumed. This is unexpected to me and
not clear to me why the reference counting is needed in INTEL_PMT_TELEMETRY.
Are there any kref_get()/kref_put() on the pmt_feature_group's kref?
 
> If INTEL_PMT_TELEMETRY did care, it would have marked the return pointer
> from intel_pmt_get_regions_by_feature() as "const".
> 
> If that isn't sufficient, can you expand on your thoughts about a helper
> in the INTEL_PMT_TELEMETRY subsystem?

Updating the changelog to highlight that INTEL_PMT_TELEMETRY provides a copy of
a struct pmt_feature_group for resctrl's private use instead of a reference to an
existing one will be sufficient.

Thank you

Reinette
RE: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month ago
>> If that isn't sufficient, can you expand on your thoughts about a helper
>> in the INTEL_PMT_TELEMETRY subsystem?
>
> Updating the changelog to highlight that INTEL_PMT_TELEMETRY provides a copy of
> a struct pmt_feature_group for resctrl's private use instead of a reference to an
> existing one will be sufficient..

Okay. I will update the comment.

I checked with David about whether the resources might be unmapped underneath us
since there is no additional reference taken when INTEL_PMT_TELEMETRY hands
us this copy. The answer is currently "no". The mappings would only be released if
the INTEL_PMT_TELEMTRY driver were unloaded. But we require the driver to be
built-in so that we can call from resctrl.

This could change if resctrl ever becomes a loadable module and we remove the
requirement that INTEL_PMT_TELEMTRY be built in.

-Tony

Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month, 3 weeks ago
Hi Tony,

On 8/11/25 11:16 AM, Tony Luck wrote:

...

> 
> The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()

INTEL_PMT_DISCOVERY -> INTEL_PMT_TELEMETRY ?

Reinette
RE: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month, 3 weeks ago
> > The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
>
> INTEL_PMT_DISCOVERY -> INTEL_PMT_TELEMETRY ?

Reinette,

Perhaps.  The discovery driver is somewhat it's own thing:

drivers/platform/x86/intel/pmt/Makefile:obj-$(CONFIG_INTEL_PMT_DISCOVERY)       += pmt_discovery.o

But it is automatically pulled in if you enable INTEL_PMT_TELEMETRY:

drivers/platform/x86/intel/pmt/Kconfig-config INTEL_PMT_TELEMETRY
drivers/platform/x86/intel/pmt/Kconfig- tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver"
drivers/platform/x86/intel/pmt/Kconfig- depends on INTEL_VSEC
drivers/platform/x86/intel/pmt/Kconfig: select INTEL_PMT_DISCOVERY

-Tony
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month, 3 weeks ago
Hi Tony,

On 8/14/25 8:44 AM, Luck, Tony wrote:
>>> The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
>>
>> INTEL_PMT_DISCOVERY -> INTEL_PMT_TELEMETRY ?
> 
> Reinette,
> 
> Perhaps.  The discovery driver is somewhat it's own thing:
> 
> drivers/platform/x86/intel/pmt/Makefile:obj-$(CONFIG_INTEL_PMT_DISCOVERY)       += pmt_discovery.o
> 
> But it is automatically pulled in if you enable INTEL_PMT_TELEMETRY:
> 
> drivers/platform/x86/intel/pmt/Kconfig-config INTEL_PMT_TELEMETRY
> drivers/platform/x86/intel/pmt/Kconfig- tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver"
> drivers/platform/x86/intel/pmt/Kconfig- depends on INTEL_VSEC
> drivers/platform/x86/intel/pmt/Kconfig: select INTEL_PMT_DISCOVERY

The sentence reads:
"The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()"

Looking at code:
intel_pmt_get_regions_by_feature() is implemented in:
	drivers/platform/x86/intel/pmt/telemetry.c

Looking at Makefile (drivers/platform/x86/intel/pmt/Makefile):
obj-$(CONFIG_INTEL_PMT_TELEMETRY)	+= pmt_telemetry.o
pmt_telemetry-y				:= telemetry.o

Since intel_pmt_get_regions_by_feature() is in file associated with
INTEL_PMT_TELEMETRY I believe it is more accurate to say:

"The INTEL_PMT_TELEMETRY driver provides intel_pmt_get_regions_by_feature()"

I do not see INTEL_PMT_DISCOVERY depend on INTEL_PMT_TELEMETRY so
having a kernel built with just INTEL_PMT_DISCOVERY will not
include intel_pmt_get_regions_by_feature() so I do not believe 
"The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()" 
is accurate.

Reinette