RE: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring

Luck, Tony posted 32 patches 1 month, 2 weeks ago
Only 0 patches received!
There is a newer version of this series
RE: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month, 2 weeks ago
> I think this series is close to being ready to pass to the x86 maintainers.
> To that end I focused a lot on the changelogs with the goal to meet the
> tip requirements that mostly involved switching to imperative tone. I do not
> expect that I found all the cases though (and I may also have made some mistakes
> in my suggested text!) so please ensure the changelogs are in imperative tone
> and uses consistent terms throughout the series.
>
> If you disagree with any feedback or if any of the feedback is unclear please
> let us discuss before you spin a new version. Of course it is not required
> that you follow all feedback but if you don't I would like to learn why.
>
> Please note that I will be offline next week.

Reinette,

I took one fast pass through all your comments. I think they all
look good (and I believe I understand each one). Thanks for all
the suggestions.

I'll try to keep (better) notes on what I change as I work through
each patch so I'll have a summary of any areas that I'm unsure
about for you to read when you get back before I post v9.

Enjoy your time away.

-Tony
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month, 1 week ago
On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
> > I think this series is close to being ready to pass to the x86 maintainers.
> > To that end I focused a lot on the changelogs with the goal to meet the
> > tip requirements that mostly involved switching to imperative tone. I do not
> > expect that I found all the cases though (and I may also have made some mistakes
> > in my suggested text!) so please ensure the changelogs are in imperative tone
> > and uses consistent terms throughout the series.
> >
> > If you disagree with any feedback or if any of the feedback is unclear please
> > let us discuss before you spin a new version. Of course it is not required
> > that you follow all feedback but if you don't I would like to learn why.
> >
> > Please note that I will be offline next week.
> 
> Reinette,
> 
> I took one fast pass through all your comments. I think they all
> look good (and I believe I understand each one). Thanks for all
> the suggestions.
> 
> I'll try to keep (better) notes on what I change as I work through
> each patch so I'll have a summary of any areas that I'm unsure
> about for you to read when you get back before I post v9.
> 
> Enjoy your time away.

Reinette,

I've completed a longer, slower, pass through making changes to prepare
for v9.  Summary of changes per patch below. I didn't disagree with any
of your suggestions.

The bulk of the changes are small, and localized to each patch. The
exception being removal of pkg_mmio_info[] which dropped patch 18 (which
just counted regions prior to allocation of pkg_mmio_info[]) and patch
19 (which finished filling out the details.

discover_events() is now named enable_events(), since there are almost
no "steps" involved, it doesn't have a header comment. The name now
describes what it does.

Theoretically skip_this_region() might find some regions unusable, while
others in the same pmt_feature_group are OK. To handle this I zero the
telemetry_region::addr so that intel_aet_read_event() can easily skip
"bad" regions.

I've pushed a rdt-aet-v9-wip branch to:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
if you want to take a quick look at the end result before
I patch bomb the list.

-Tony

--- cover ---
INTEL_PMT_DISCOVERY -> INTEL_PMT_TELEMETRY

--- 1 ---
No change

--- 2 ---
No change

--- 3 ---
No change

--- 4 ---
No change

--- 5 ---
"The "type" field provides" -> "rdt_domain_hdr::type provides"
Update final paragraph of commit message as suggested.

--- 6 ---
Code: in domain_add_cpu_mon() change WARN_ON_ONCE(1); in default case to
pr_warn_once("Unknown resource rid=%d\n", r->rid);


--- 7 ---
Code: in domain_remove_cpu_mon() move domain_header_is_valid()to
immediately before container_of() and switch check from r->rid to
hard-coded RDT_RESOURCE_L3.
s/list_del_rcu(&d->hdr.list);/list_del_rcu(&hdr->list);/

"separated" -> "separate"
"can skip" -> "skips"

--- 8 ---
Code: in domain_remove_cpu_ctrl()  move domain_header_is_valid()
to immediately before container_of().
s/list_del_rcu(&d->hdr.list);/list_del_rcu(&hdr->list);/
(moved here from patch 9)

"refactor" -> "refactor domain_remove_cpu_ctrl()"


--- 9 ---
Change second paragraph of commit comment as suggested.

"so the rmid_read::d field is replaced" -> "so replace the L3 specific domain
pointer rmid_read::d with rmid_read::hdr that points to the generic domain
header"

Use imperative tone for last paragraph "Update kerneldoc for mon_data::sum ..."

--- 10 ---
Change rdt_mon_domain to rdt_l3_mon_domain in comment above logical_rmid_to_physical_rmid()

Replace entire commit comment with improved version.

--- 11 ---
"different resources" -> "a different resource"
"these functions" -> "the L3 resource specific functions"
"Two groups of functions renamed here:" -> "Rename three groups of functions:"
Added rdt_get_mon_l3_config() to list of renamed functions to put the "l3"
before the "mon" for consistency.

When changing names for resctrl_mon_resource_{init,exit} add the "l3_" before
"mon" for consistency with other *l3_mon*" naming.

--- 12 ---
"to lower levels" -> "via the mon_data and rmid_read structures to
the functions finally reading the monitoring data."

Replace 3rd paragraph of commit comment with supplied better version.

--- 13 ---
Code: In cpu_on_correct_domain() s/return -EINVAL;/return false;/

--- 14 ---
Code: Move definition of MAX_BINARY_BITS from <linux/resctrl.h> to
fs/resctrl/internal.h.

Be explicit in commit comment that the file system makes the determination
on which events can be displayed in floating point format.

--- 15 ---
"asyncronnous" -> "asynchronous"
"for these drivers" -> "of these drivers"
"are called" -> "completes"
"But expectations" -> "Expectations"
"The call is made with no locks held." -> "resctrl filesystem calls the
hook with no locks held."

--- 16 ---
s/CPU hotplug/CPU hot plug/
Add Reinette's RB tag.

--- 17 ---
Code:
"OOBMSM" -> "INTEL_PMT_TELEMETRY"
"INTEL_PMT_DISCOVERY" -> "INTEL_PMT_TELEMETRY"
re-wrap comment for get_pmt_feature() to use 80 columns.
"OOBMSM discovery" -> "INTEL_PMT_TELEMETRY"
Add the intel_pmt_put_feature_group() calls to intel_aet_exit()
to match the intel_pmt_get_regions_by_feature() calls in get_pmt_feature()
using a new macro for_each_enabled_event_group().
Rename discover_events() to enable_events()

Add period at end of help text in arch/x86/Kconfig.

"Data for telemetry events is collected by each CPU and sent" ->
"Each CPU collects data for telemetry events that it sends"
"is changed" -> "changes"
"or when two milliseconds have elapsed" -> "or when a two millisecond timer expires"
"mad" -> "made"
"Enumeration of support for telemetry events is done" ->
"The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events."
Drop references to INTEL_PMT_DISCOVERY driver.

Merge last two paragraphs of commit message.

Reformat commit to use more of page width.

Add maintainer note about checkpatch complaints for DEFINE_FREE()

--- 18 ---
Dropped. See new patch 0019

--- 19 ---
Dropped. See new patch 0019

--- 20 ---
Now 0018
Rewrite opening paragraph to avoid "continuation of the subject"

Add note/link on the source of the XML files that describe events.

--- 21 ---
Now 0019
Drop "vague first sentence" of second paragraph.

--- NEW 0020 ---
Replaces most of parts 18/32 and 19/32.
Contains skip_this_region() from patch 18/32, but skips all the
code to count regions and allocat pkg_mmio_info[].

Also includes event enabling code from 22/32

--- 22 ---
Now 0021
Modify intel_aet_read_event() to dig into pmt_event::pfg to
find the MMIO base addresses that v8 patch stored in the pkg_mmio_info[]
structures.

--- 23 ---
Now 0022
Add domain_header_is_valid() check in domain_remove_cpu_mon()
before using container_of().

"There are structures" -> "There are per domain structures ..."

Replace my commit fix description with better version from Reinette.

--- 24 ---
Unparseable last sentence of commit message replaced with details
and examples.

--- 25 ---
RMIDS -> RMIDs
"Adjusted downwards ..." -> "May be adjusted downwards ..."
check_rmid_count() -> all_regions_have_sufficient_rmid()
"Potentially disable" -> "Disable"
Add comment:
/* e->num_rmids only adjusted lower if user forces an unusable region to be usable */

--- 26 ---
Add "during resctrl initialization" and "during resctrl exit" to commit
background statement.

Add "closid_num_dirty_rmid[]" to be specific about what is being allocated/freed.

--- 27 ---
"mon capable" -> "mon_capable" (three times)
"alloc capable" -> "alloc_capable"
"rdt_l3_mon_domain::states[]" -> "rdt_l3_mon_domain::mbm_states[] and
+rdt_l3_mon_domain::rmid_busy_llc"

"the number of RMIDs" -> "the system's number of RMIDs"

--- 28 ---
Add comment to setup_rmid_lru_list() to note that rmid_ptrs[]
is allocated of first mount and is reused on subsequent mounts.
It is freed in resctrl_exit().

Lock/unlock rdtgroup_mutex around body of free_rmid_lru_list()

Rewrite commit based on suggestions with some modifications to explain
why error paths in rdt_get_tree() do not call free_rmid_lru_list().

--- 29 ---
TODO: recheck for use of "CPU hot plug notifiers" may have been called "hooks", "callbacks", and
"handlers" through this series.

--- 30 ---
"a resource" -> "a monitoring resource"

--- 31 ---
"last_update_timestamp" -> "agg_last_update_timestamp" in commit comment

Move creation of debugfs files to the end of enable_events().

Rewrite to work based on event_group::pfg since event_group::pkginfo[] is gone.


--- 32 ---
Describe "num_rmids" file values independently for L3 and telemetry.
Move the note about upper bound on directory creation to its own
paragraph to say it is the smaller of reported "num_rmids" values.

"or of a processor package" -> "another for each processor package"

Change paragraph about contents of subdirectories of mon_data to
gize example file names instead of hard coding specifics.

prescence -> presence
will vary -> may vary
last_update_timestamp -> agg_last_update_timestamp

Simplify commit comment as suggested.
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month ago
Hi Tony,

On 8/25/25 3:20 PM, Luck, Tony wrote:
> On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
>>> I think this series is close to being ready to pass to the x86 maintainers.
>>> To that end I focused a lot on the changelogs with the goal to meet the
>>> tip requirements that mostly involved switching to imperative tone. I do not
>>> expect that I found all the cases though (and I may also have made some mistakes
>>> in my suggested text!) so please ensure the changelogs are in imperative tone
>>> and uses consistent terms throughout the series.
>>>
>>> If you disagree with any feedback or if any of the feedback is unclear please
>>> let us discuss before you spin a new version. Of course it is not required
>>> that you follow all feedback but if you don't I would like to learn why.
>>>
>>> Please note that I will be offline next week.
>>
>> Reinette,
>>
>> I took one fast pass through all your comments. I think they all
>> look good (and I believe I understand each one). Thanks for all
>> the suggestions.
>>
>> I'll try to keep (better) notes on what I change as I work through
>> each patch so I'll have a summary of any areas that I'm unsure
>> about for you to read when you get back before I post v9.
>>
>> Enjoy your time away.
> 
> Reinette,
> 
> I've completed a longer, slower, pass through making changes to prepare
> for v9.  Summary of changes per patch below. I didn't disagree with any
> of your suggestions.

For me the item that I expected may need discussion is the use of
active_event_groups that no longer exists in v9.

> 
> The bulk of the changes are small, and localized to each patch. The
> exception being removal of pkg_mmio_info[] which dropped patch 18 (which
> just counted regions prior to allocation of pkg_mmio_info[]) and patch
> 19 (which finished filling out the details.
> 
> discover_events() is now named enable_events(), since there are almost
> no "steps" involved, it doesn't have a header comment. The name now
> describes what it does.
> 
> Theoretically skip_this_region() might find some regions unusable, while
> others in the same pmt_feature_group are OK. To handle this I zero the
> telemetry_region::addr so that intel_aet_read_event() can easily skip
> "bad" regions.

This is a significant change that simplifies the implementation a lot. 
Even so, it is concerning that resctrl takes liberty to reach in and modify
INTEL_PMT_TELEMETRY's data structure for its convenience though. Could the
changelog motivate why it is ok and safe to do so? Should something like
this not rather be done with a helper exposed by subsystem (INTEL_PMT_TELEMETRY)
to be able to track such changes?

Reinette
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month ago
On Thu, Aug 28, 2025 at 09:45:41AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 8/25/25 3:20 PM, Luck, Tony wrote:
> > On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
> >>> I think this series is close to being ready to pass to the x86 maintainers.
> >>> To that end I focused a lot on the changelogs with the goal to meet the
> >>> tip requirements that mostly involved switching to imperative tone. I do not
> >>> expect that I found all the cases though (and I may also have made some mistakes
> >>> in my suggested text!) so please ensure the changelogs are in imperative tone
> >>> and uses consistent terms throughout the series.
> >>>
> >>> If you disagree with any feedback or if any of the feedback is unclear please
> >>> let us discuss before you spin a new version. Of course it is not required
> >>> that you follow all feedback but if you don't I would like to learn why.
> >>>
> >>> Please note that I will be offline next week.
> >>
> >> Reinette,
> >>
> >> I took one fast pass through all your comments. I think they all
> >> look good (and I believe I understand each one). Thanks for all
> >> the suggestions.
> >>
> >> I'll try to keep (better) notes on what I change as I work through
> >> each patch so I'll have a summary of any areas that I'm unsure
> >> about for you to read when you get back before I post v9.
> >>
> >> Enjoy your time away.
> > 
> > Reinette,
> > 
> > I've completed a longer, slower, pass through making changes to prepare
> > for v9.  Summary of changes per patch below. I didn't disagree with any
> > of your suggestions.
> 
> For me the item that I expected may need discussion is the use of
> active_event_groups that no longer exists in v9.

Yes. It was removed as part of the refactor to drop the pkg_mmio_info[]
array.

> 
> > 
> > The bulk of the changes are small, and localized to each patch. The
> > exception being removal of pkg_mmio_info[] which dropped patch 18 (which
> > just counted regions prior to allocation of pkg_mmio_info[]) and patch
> > 19 (which finished filling out the details.
> > 
> > discover_events() is now named enable_events(), since there are almost
> > no "steps" involved, it doesn't have a header comment. The name now
> > describes what it does.
> > 
> > Theoretically skip_this_region() might find some regions unusable, while
> > others in the same pmt_feature_group are OK. To handle this I zero the
> > telemetry_region::addr so that intel_aet_read_event() can easily skip
> > "bad" regions.
> 
> This is a significant change that simplifies the implementation a lot. 
> Even so, it is concerning that resctrl takes liberty to reach in and modify
> INTEL_PMT_TELEMETRY's data structure for its convenience though. Could the
> changelog motivate why it is ok and safe to do so? Should something like
> this not rather be done with a helper exposed by subsystem (INTEL_PMT_TELEMETRY)
> to be able to track such changes?

I can update the commit message to explain. I did check how the INTEL_PMT_TELEMETRY
code handles intel_pmt_put_feature_group(). It just does kref_put() on
pmt_feature_group::kref which results in a call to
pmt_feature_group_release() which simply does a kfree() for the
structure. So it doesn't care if I modify any fields (except for kref!)

If INTEL_PMT_TELEMETRY did care, it would have marked the return pointer
from intel_pmt_get_regions_by_feature() as "const".

If that isn't sufficient, can you expand on your thoughts about a helper
in the INTEL_PMT_TELEMETRY subsystem?

> Reinette

-Tony
Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Reinette Chatre 1 month ago
Hi Tony,

On 8/28/25 1:14 PM, Luck, Tony wrote:
> On Thu, Aug 28, 2025 at 09:45:41AM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 8/25/25 3:20 PM, Luck, Tony wrote:
>>> On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
>>>>> I think this series is close to being ready to pass to the x86 maintainers.
>>>>> To that end I focused a lot on the changelogs with the goal to meet the
>>>>> tip requirements that mostly involved switching to imperative tone. I do not
>>>>> expect that I found all the cases though (and I may also have made some mistakes
>>>>> in my suggested text!) so please ensure the changelogs are in imperative tone
>>>>> and uses consistent terms throughout the series.
>>>>>
>>>>> If you disagree with any feedback or if any of the feedback is unclear please
>>>>> let us discuss before you spin a new version. Of course it is not required
>>>>> that you follow all feedback but if you don't I would like to learn why.
>>>>>
>>>>> Please note that I will be offline next week.
>>>>
>>>> Reinette,
>>>>
>>>> I took one fast pass through all your comments. I think they all
>>>> look good (and I believe I understand each one). Thanks for all
>>>> the suggestions.
>>>>
>>>> I'll try to keep (better) notes on what I change as I work through
>>>> each patch so I'll have a summary of any areas that I'm unsure
>>>> about for you to read when you get back before I post v9.
>>>>
>>>> Enjoy your time away.
>>>
>>> Reinette,
>>>
>>> I've completed a longer, slower, pass through making changes to prepare
>>> for v9.  Summary of changes per patch below. I didn't disagree with any
>>> of your suggestions.
>>
>> For me the item that I expected may need discussion is the use of
>> active_event_groups that no longer exists in v9.
> 
> Yes. It was removed as part of the refactor to drop the pkg_mmio_info[]
> array.
> 
>>
>>>
>>> The bulk of the changes are small, and localized to each patch. The
>>> exception being removal of pkg_mmio_info[] which dropped patch 18 (which
>>> just counted regions prior to allocation of pkg_mmio_info[]) and patch
>>> 19 (which finished filling out the details.
>>>
>>> discover_events() is now named enable_events(), since there are almost
>>> no "steps" involved, it doesn't have a header comment. The name now
>>> describes what it does.
>>>
>>> Theoretically skip_this_region() might find some regions unusable, while
>>> others in the same pmt_feature_group are OK. To handle this I zero the
>>> telemetry_region::addr so that intel_aet_read_event() can easily skip
>>> "bad" regions.
>>
>> This is a significant change that simplifies the implementation a lot. 
>> Even so, it is concerning that resctrl takes liberty to reach in and modify
>> INTEL_PMT_TELEMETRY's data structure for its convenience though. Could the
>> changelog motivate why it is ok and safe to do so? Should something like
>> this not rather be done with a helper exposed by subsystem (INTEL_PMT_TELEMETRY)
>> to be able to track such changes?
> 
> I can update the commit message to explain. I did check how the INTEL_PMT_TELEMETRY
> code handles intel_pmt_put_feature_group(). It just does kref_put() on
> pmt_feature_group::kref which results in a call to
> pmt_feature_group_release() which simply does a kfree() for the
> structure. So it doesn't care if I modify any fields (except for kref!)

My assumption was that "getting" a reference means that there is a single 
object to which multiple users can obtain a reference and then the object is
released when the final reference is dropped. Now after looking more closely
at intel_pmt_get_regions_by_feature() (btw, it is still remapped to 
fake_intel_pmt_get_regions_by_feature in the branch you provide) I see that
every caller gets a fresh copy of a struct pmt_feature_group and not a reference to
an existing struct pmt_feature_group as I assumed. This is unexpected to me and
not clear to me why the reference counting is needed in INTEL_PMT_TELEMETRY.
Are there any kref_get()/kref_put() on the pmt_feature_group's kref?
 
> If INTEL_PMT_TELEMETRY did care, it would have marked the return pointer
> from intel_pmt_get_regions_by_feature() as "const".
> 
> If that isn't sufficient, can you expand on your thoughts about a helper
> in the INTEL_PMT_TELEMETRY subsystem?

Updating the changelog to highlight that INTEL_PMT_TELEMETRY provides a copy of
a struct pmt_feature_group for resctrl's private use instead of a reference to an
existing one will be sufficient.

Thank you

Reinette
RE: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Posted by Luck, Tony 1 month ago
>> If that isn't sufficient, can you expand on your thoughts about a helper
>> in the INTEL_PMT_TELEMETRY subsystem?
>
> Updating the changelog to highlight that INTEL_PMT_TELEMETRY provides a copy of
> a struct pmt_feature_group for resctrl's private use instead of a reference to an
> existing one will be sufficient..

Okay. I will update the comment.

I checked with David about whether the resources might be unmapped underneath us
since there is no additional reference taken when INTEL_PMT_TELEMETRY hands
us this copy. The answer is currently "no". The mappings would only be released if
the INTEL_PMT_TELEMTRY driver were unloaded. But we require the driver to be
built-in so that we can call from resctrl.

This could change if resctrl ever becomes a loadable module and we remove the
requirement that INTEL_PMT_TELEMTRY be built in.

-Tony