[RFC PATCH 0/7] Add driver registration i/f to resctrl

Tony Luck posted 7 patches 2 years, 8 months ago
[RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Tony Luck 2 years, 8 months ago
This is very much proof of concept code at this stage. I have a few
quality of service features that are hard to integrate into the core
resctrl code because they are model specific, or have input parameters
that do not fit neatly into the existing schemata model.

Also, as AMD, ARM, and now RISC-V are looking to share the core resctrl
code, it might be helpful to have "driver" as a software layer for
per-CPU architectural code to avoid cluttering the core.

None of my drivers are ready to post, so this series has a simple example
driver that would meet the same debug requirements as Babu Moger's
patch to expose the CLOSID/RMID in files in each directory:

  https://lore.kernel.org/all/168177449635.1758847.13040588638888054027.stgit@bmoger-ubuntu/

Doing this debug with a driver that can be loaded and unloaded without
having to unmount and remount the resctrl file system appears slightly
more convenient than a "-o debug" option. But this example driver is
really intended just as a toy example of what can be done.

The series is broken into steps that add callback functions into various
different parts of the resctrl hierarchy. That list of parts has been
driven by the needs of the drivers that I want to write. The
registration interface could be extended if there are additional
hooks needed for other drivers.
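
Roughly, the registration interface is a structure of optional callbacks
plus register/unregister functions. The sketch below is purely illustrative
(the names are not necessarily the ones used in the patches):

	struct resctrl_driver {
		struct list_head list;
		const char *name;		/* e.g. used for an info/<name>/ directory */
		/* populate/clean up per-group files as directories come and go */
		int  (*add_files)(struct kernfs_node *kn);
		void (*rmdir)(struct kernfs_node *kn);
		/* optional hook to take over parsing of one schemata line */
		int  (*schemata_write)(char *tok, u32 closid);
	};

	int resctrl_driver_register(struct resctrl_driver *d);
	void resctrl_driver_unregister(struct resctrl_driver *d);

A driver fills in only the callbacks it needs; NULL entries are skipped.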

I'm looking for high level comments on the desirability of this approach
at this time. I don't expect any of this to be merged until I have some
real drivers that use this to offer to upstream.

Tony Luck (7):
  x86/resctrl: Add register/unregister functions for driver to hook into
    resctrl
  x86/resctrl: Add an interface to add/remove a new info/directory
  x86/resctrl: Add driver callback when directories are removed
  x86/resctrl: Add capability to driver registration to create control
    files
  x86/resctrl: Enhance driver registration to hook into schemata files
  x86/resctrl: Allow a device to override an existing schemata entry
  x86/resctrl: Example resctrl driver

 include/linux/resctrl.h                       |  37 +++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  17 +-
 .../cpu/resctrl/drivers/resctrl_example.c     |  77 ++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 227 ++++++++++++++++++
 arch/x86/Kconfig                              |  11 +
 arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
 arch/x86/kernel/cpu/resctrl/drivers/Makefile  |   1 +
 7 files changed, 368 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/drivers/resctrl_example.c
 create mode 100644 arch/x86/kernel/cpu/resctrl/drivers/Makefile


base-commit: 6a8f57ae2eb07ab39a6f0ccad60c760743051026
-- 
2.39.2
Re: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Reinette Chatre 2 years, 7 months ago
Hi Tony,

On 4/20/2023 3:06 PM, Tony Luck wrote:
> This is very much proof of concept code at this stage. I have a few
> quality of service features that are hard to integrate into the core
> resctrl code because they are model specific, or have input parameters
> that do not fit neatly into the existing schemata model.

I would like to understand these two motivations better.

Regarding "because they are model specific": the features would remain
model specific no matter from where they are supported so it sounds to
me that you would like to move/contain the model specific logic to
the drivers? Why would this be a motivation since it seems that this
would still make things model specific. I do not think resctrl is
averse to model specific code when I consider the code like
cache_alloc_hsw_probe() and __check_quirks_intel().

Regarding "do not fit neatly into the existing schemata model": could
you please elaborate? If I understand correctly a driver would like
to take ownership of a line in the schemata file, so I then look at
a driver as providing support for a resource. It looks like 
these new resources may not be "domain aware" so would require unique
parsing, but that is something we can do in resctrl, no? Something
like a unique "parse_line()" associated with each resctrl resource?

Considering the above it is not clear to me at this point why this
driver interface is needed. Why could each new driver not instead be
a resctrl resource?

> Also, as AMD, ARM, and now RISC-V are looking to share the core resctrl
> code, it might be helpful to have "driver" as a software layer for
> per-CPU architectural code to avoid cluttering the core.
> 
> None of my drivers are ready to post, so this series has a simple example
> driver that would meet the same debug requirements as Babu Moger's
> patch to expose the CLOSID/RMID in files in each directory:
> 
>   https://lore.kernel.org/all/168177449635.1758847.13040588638888054027.stgit@bmoger-ubuntu/
> 
> Doing this debug with a driver that can be loaded and unloaded without
> having to unmount and remount the resctrl file system appears slightly
> more convenient than a "-o debug" option. But this example driver is
> really intended just as a toy example of what can be done.

The driver seems simple but I think it already shows that this can get
complicated with many sharp corners. If I understand correctly this driver
will add the "closid" and "rmid" files in every control and monitor group. This
driver thus assumes a system that supports both control and monitoring, but
that is not guaranteed. For robustness the "rmid" file should not appear
in a control group that does not support monitoring.

> 
> The series is broken into steps that add callback functions into various
> different parts of the resctrl hierarchy. That list of parts has been
> driven by the needs of the drivers that I want to write. The
> registration interface could be extended if there are additional
> hooks needed for other drivers.

The boundaries of the resctrl and driver interface are not clear to me.
Looking at where the new driver API is created and how it is used in the
example code I see that this occurs in include/linux/resctrl.h. This is
the API that an architecture using resctrl is intended to use and
thus provides much more to the drivers than I'd expect them to want or
be able to use based on this description.

> 
> I'm looking for high level comments on the desirability of this approach
> at this time. I don't expect any of this to be merged until I have some
> real drivers that use this to offer to upstream.

Some hints about scenarios that require this driver interface would be
helpful.

Apart from the high level comments above I looked briefly at the patches
and responded there where I have some high level comments/questions.

Reinette
RE: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Luck, Tony 2 years, 7 months ago
Reinette,

Thanks for the review. Your comments on individual patches are all good, but I'll try to address
them here to keep all the pieces together.

You sum up everything in your concise:

  "I would like to understand these two motivations better."

I'll try to give some additional details. But the specific features of the drivers I
want to add are still under wraps, so I will have to be somewhat vague at times.
If that means that a final decision can't be made until more details are forthcoming,
that's fine. At least the discussion has begun now so people have some time
to think about this well before I need to get code upstream. Or if the answer
is "No. We will never create s/w layers in the resctrl code." Then I can go
back to puzzling some other solutions.

Intel has always had some model specific "RDT" features that we have
not attempted to push to upstream. Rationale is that architectural features
have lasting value across CPU generations. Once the code is upstream it
"just works" (TM) for all time. Cluttering up core code with features that are
only applicable to one or two CPU generations seemed like a recipe for
disaster from a long-term maintenance perspective.

But things are changing. Previously the end user industries that wanted
the model specific features were content with out-of-tree patches for their
specific systems. We now have some features that may be useful to a
wider audience, ones that require a solution integrated into upstream
so that they don't take on technical debt to move the solution to new
kernel versions.

I'm also guessing that with other architectures (AMD, ARM, RISC-V)
all building on the resctrl foundation, some of them may also have some
features that don't fit neatly into the core of resctrl.

My RFC patches were just to show that creating a s/w layer in resctrl
is possible. If there is a better dividing line between core code and 
architecture/model specific code I'm happy to discuss cleaner ways to
draw the line. E.g. you make the point here and in one of your comments
to the individual patches that making new resctrl resources may be a 
cleaner solution. I didn't do it that way partly because the existing code
has a static array of resources. But it might be relatively simple to transform
that into a list to make dynamically adding new resources easier. But see
Q6 below about tracking domains in a resource for other challenges.

Specific questions you raise:

Q1) Will there be a need for drivers to "call into" resctrl rather than rely
on call backs? May make locking complex.

A1) So far I haven't found a case. But I only have three drivers (in addition
to the example one) so it is possible that more complex things may be needed.
I agree that this will raise many locking challenges. Exporting the resctrl
mutex seems like a terrible idea.

Q2) What about exclusive groups?
A2) I didn’t try to handle in this RFC. Additional bits will be needed.

Q3) How to make visible to the driver other resctrl assumptions (e.g. default
group is CLOSID=0, RMID=0).
A3) I think this specific example is unlikely to ever change (it is somewhat tied
to the power-on/reset state of the IA32_PQR_ASSOC register. But the general
point is true that assumptions by drivers may create challenges to refactor core
code in ways that break those assumptions.

Q4) Suppressing schemata resources from a driver surprised you.
A4) This is, as you guessed, about conflicting resources. There are other ways it
could be handled. E.g. use existing mount options to suppress the resource from
the schemata. To be safe that might also need some way to fail to load of a driver
that needs other access to a resource unless the correct mount options are in
force.

Q5) Boundaries are not clear. Too much of resctrl internals made visible to drivers.
A5) Can work on more abstract interfaces if we move forward with some sort of
layered approach.

Q6) Domain awareness of drivers.
A6) This is a challenge. Especially as the domain for a driver may not match up
with any existing resource scope (e.g. driver may be socket scoped, which may
not be the same as "L3 cache" scoped). After I posted this series I added
an entry in the resource table with socket scope to handle this. Dynamically adding
a new resource with a custom scope has challenges (because the domain lists
attached to that resource are maintained by the resctrl cpu hot plug callbacks as
CPUs come online and go offline).

-Tony
Re: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Reinette Chatre 2 years, 7 months ago
Hi Tony,

On 5/8/2023 11:32 AM, Luck, Tony wrote:
> Reinette,
> 
> Thanks for the review. Your comments on individual patches are all good, but I'll try to address
> them here to keep all the pieces together.
> 
> You sum up everything in your concise:
> 
>   "I would like to understand these two motivations better."
> 
> I'll try to give some additional details. But the specific features of the drivers I
> want to add are still under wraps, so I will have to be somewhat vague at times.
> If that means that a final decision can't be made until more details are forthcoming,
> that's fine. At least the discussion has begun now so people have some time
> to think about this well before I need to get code upstream. Or if the answer
> is "No. We will never create s/w layers in the resctrl code." Then I can go
> back to puzzling some other solutions.
> 
> Intel has always had some model specific "RDT" features that we have
> not attempted to push to upstream. Rationale is that architectural features
> have lasting value across CPU generations. Once the code is upstream it
> "just works" (TM) for all time. Cluttering up core code with features that are
> only applicable to one or two CPU generations seemed like a recipe for
> disaster from a long-term maintenance perspective.

Could you please elaborate how this seems like a "recipe for disaster"? I
can certainly see how removing a driver is easy when it is decided that
something is "end of life". I rarely see "end of life" in practice
though and viewing removal of obsolete code from a driver as a "disaster"
is not clear to me. Re-factoring code occurs frequently.

> But things are changing. Previously the end user industries that wanted
> the model specific features were content with out-of-tree patches for their
> specific systems. We now have some features that may be useful to a
> wider audience, ones that require a solution integrated into upstream
> so that they don't take on technical debt to move the solution to new
> kernel versions.
> 
> I'm also guessing that with other architectures (AMD, ARM, RISC-V)
> all building on the resctrl foundation, some of them may also have some
> features that don't fit neatly into the core of resctrl.

They absolutely have features that don't fit neatly into the core of
resctrl. I am not able to tell whether this style of interface would
solve this.

The second motivation from your original email was that these new features
have "input parameters that do not fit neatly into the existing schemata
model". It is not obvious to me where you address that, but maybe it is
related to "Q6".

> My RFC patches were just to show that creating a s/w layer in resctrl
> is possible. If there is a better dividing line between core code and 
> architecture/model specific code I'm happy to discuss cleaner ways to
> draw the line. E.g. you make the point here and in one of your comments
> to the individual patches that making new resctrl resources may be a 
> cleaner solution. I didn't do it that way partly because the existing code
> has a static array of resources. But it might be relatively simple to transform
> that into a list to make dynamically adding new resources easier. But see
> Q6 below about tracking domains in a resource for other challenges.
> 
> Specific questions you raise:
> 
> Q1) Will there be a need for drivers to "call into" resctrl rather than rely
> on call backs? May make locking complex.
> 
> A1) So far I haven't found a case. But I only have three drivers (in addition
> to the example one) so it is possible that more complex things may be needed.
> I agree that this will raise many locking challenges. Exporting the resctrl
> mutex seems like a terrible idea.

I agree that we should not export the mutex. You may be interested in the work
that James is doing on separating the locks:
https://lore.kernel.org/lkml/20230320172620.18254-1-james.morse@arm.com/

It would be great if you could join these discussions to see if there are
some common requirements that can be solved together.

> Q2) What about exclusive groups?
> A2) I didn’t try to handle in this RFC. Additional bits will be needed.

It seems like every time a driver needs "additional bits" it would impact
all the other drivers.

> 
> Q3) How to make visible to the driver other resctrl assumptions (e.g. default
> group is CLOSID=0, RMID=0).

Actually I think resctrl assumptions should be _invisible_ to drivers.

> A3) I think this specific example is unlikely to ever change (it is somewhat tied
> to the power-on/reset state of the IA32_PQR_ASSOC register). But the general
> point is true that assumptions by drivers may create challenges to refactor core
> code in ways that break those assumptions.
> 
> Q4) Suppressing schemata resources from a driver surprised you.
> A4) This is, as you guessed, about conflicting resources. There are other ways it
> could be handled. E.g. use existing mount options to suppress the resource from
> the schemata. To be safe that might also need some way to fail the load of a driver
> that needs other access to a resource unless the correct mount options are in
> force.

From above it sounds like there may be scenarios where a driver layer would still
be accompanied by core changes (like mount options added to the core that will 
allow/deny certain drivers). If there was no driver layer it could just be handled
in a single spot.

The second part of my original question was "Where would it be decided whether
the overriding driver should be loaded and why can that logic not be in
enumeration within resctrl?" Is it the user that needs to determine that there are
conflicting resources?

> Q5) Boundaries are not clear. Too much of resctrl internals made visible to drivers.
> A5) Can work on more abstract interfaces if we move forward with some sort of
> layered approach.
> 
> Q6) Domain awareness of drivers.
> A6) This is a challenge. Especially as the domain for a driver may not match up
> with any existing resource scope (e.g. driver may be socket scoped, which may
> not be the same as "L3 cache" scoped). After I posted this series I added
> an entry in the resource table with socket scope to handle this. Dynamically adding
> a new resource with a custom scope has challenges (because the domain lists
> attached to that resource are maintained by the resctrl cpu hot plug callbacks as
> CPUs come online and go offline).

My comment was not about a need to make drivers "domain aware". My assumption was that
drivers are not domain aware since I did not see any related information shared
with the drivers and since the drivers override the schemata entries I thus assumed
that the schemata entries use some driver specific scope. 
The challenge to add a resource with a custom scope seems like the biggest problem
raised thus far. Is this perhaps what started the venture down this driver interface?

Reinette


RE: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Luck, Tony 2 years, 7 months ago
>> Intel has always had some model specific "RDT" features that we have
>> not attempted to push to upstream. Rationale is that architectural features
>> have lasting value across CPU generations. Once the code is upstream it
>> "just works" (TM) for all time. Cluttering up core code with features that are
>> only applicable to one or two CPU generations seemed like a recipe for
>> disaster from a long-term maintenance perspective.
>
> Could you please elaborate how this seems like a "recipe for disaster"? I
> can certainly see how removing a driver is easy when it is decided that
> something is "end of life". I rarely see "end of life" in practice
> though and viewing removal of obsolete code from a driver as a "disaster"
> is not clear to me. Re-factoring code occurs frequently.

I'm thinking of the amount of code under arch/x86/kernel/cpu/resctrl. In
v6.4-rc1 it looks like:

$ wc -l $resctrl/*.[ch]
   996 arch/x86/kernel/cpu/resctrl/core.c
   581 arch/x86/kernel/cpu/resctrl/ctrlmondata.c
   560 arch/x86/kernel/cpu/resctrl/internal.h
   845 arch/x86/kernel/cpu/resctrl/monitor.c
  1600 arch/x86/kernel/cpu/resctrl/pseudo_lock.c
    43 arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
  3733 arch/x86/kernel/cpu/resctrl/rdtgroup.c
  8358 total

Fenghua did a built-in implementation for one of the features that I'd
like to implement as a driver and the bottom line of "git diff --stat" for
his series of patches was:

9 files changed, 1300 insertions(+), 10 deletions(-)

Projecting forward a few CPU generations there may be 2-3 different
versions of that code. Plus all the other model specific features that
we'd like to support. The core resctrl architectural code is going to
disappear in the maze of "do this for CPU models X & Y, but do that
for CPU model Z". 


>> I'm also guessing that with other architectures (AMD, ARM, RISC-V)
>> all building on the resctrl foundation, some of them may also have some
>> features that don't fit neatly into the core of resctrl.
>
> They absolutely have features that don't fit neatly into the core of
> resctrl. I am not able to tell whether this style of interface would
> solve this.

I'm hoping James and others copied on this thread can chime in to
answer that.

> The second motivation from your original email was that these new features
> have "input parameters that do not fit neatly into the existing schemata
> model". It is not obvious to me where you address that, but maybe it is
> related to "Q6".

That's really tied into the specific features that aren't up for public discussion
at this point.

> I agree that we should not export the mutex. You may be interested in the work
> that James is doing on separating the locks:
> https://lore.kernel.org/lkml/20230320172620.18254-1-james.morse@arm.com/
>
> It would be great if you could join these discussions to see if there are
> some common requirements that can be solved together.

Thanks for the link. I'll go off and read that thread.

>> Q2) What about exclusive groups?
>> A2) I didn’t try to handle in this RFC. Additional bits will be needed.
>
> It seems like every time a driver needs "additional bits" it would impact
> all the other drivers.

It depends. If the new hook is just some additional callback function, then
existing drivers would have an implied ".newfunc = NULL," in the registration
structure, so they wouldn't need any changes.

The hooks I implemented in my RFC series are the union of the requirements
of each driver. But each driver just sets up the hooks that it needs. E.g. my
silly example driver only used the "add files to the ctrlmon directories" hook.
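
For instance, the example driver's registration boils down to something like
this (illustrative names again, not the exact code from the series):

	static struct resctrl_driver example_driver = {
		.name      = "example",
		.add_files = example_add_closid_rmid_files,
		/* every other hook is implicitly NULL and simply never called */
	};

	resctrl_driver_register(&example_driver);

so a new ".newfunc" slot added on behalf of some future driver costs the
existing drivers nothing.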

>> Q3) How to make visible to the driver other resctrl assumptions (e.g. default
>> group is CLOSID=0, RMID=0).
>
> Actually I think resctrl assumptions should be _invisible_ to drivers.

That's a worthy goal. I'm not sure whether it can always be met. E.g. for
the default CLOSID/RMID case the driver will want to initialize to some
useful default because it only receives callbacks from the core resctrl code as
the user writes to files to set up a specific configuration.

>> Q4) Suppressing schemata resources from a driver surprised you.
>> A4) This is, as you guessed, about conflicting resources. There are other ways it
>> could be handled. E.g. use existing mount options to suppress the resource from
>> the schemata. To be safe that might also need some way to fail to load of a driver
>> that needs other access to a resource unless the correct mount options are in
>> force.
>
>From above it sounds like there may be scenarios where a driver layer would still
>be accompanied by core changes (like mount options added to the core that will
>allow/deny certain drivers). If there was no driver layer it could just be handled
>in a single spot.

I threw out mount options as an alternative way of suppressing features. Having
thought about it, I'm not fond of it at all.

> The second part of my original question was "Where would it be decided whether
> the overriding driver should be loaded and why can that logic not be in
> enumeration within resctrl?" Is it the user that needs to determine that there are
> conflicting resources?

The horror of model specific features is appalling, or non-existent, enumeration.
In the dim and distant past of resctrl there was once a point where it did
string compares of model strings against a list of specific SKUs that supported
early RDT features.

>> Q6) Domain awareness of drivers.
>> A6) This is a challenge. Especially as the domain for a driver may not match up
>> with any existing resource scope (e.g. driver may be socket scoped, which may
>> not be the same as "L3 cache" scoped). After I posted this series I added
>> an entry in the resource table with socket scope to handle this. Dynamically adding
>> a new resource with a custom scope has challenges (because the domain lists
>> attached to that resource are maintained by the resctrl cpu hot plug callbacks as
>> CPUs come online and go offline).
>
>My comment was not about a need to make drivers "domain aware". My assumption was that
>drivers are not domain aware since I did not see any related information shared
>with the drivers and since the drivers override the schemata entries I thus assumed
>that the schemata entries use some driver specific scope.
>The challenge to add a resource with a custom scope seems like the biggest problem
>raised thus far. Is this perhaps what started the venture down this driver interface?

New domain scopes weren't a driving motivation, just a thing that was found along
the implementation journey. After playing with some ways to have each driver keep
track of scope I found that I'd replicated some of the core domain tracking cpuhp
code and decided that just making the core keep track of a socket scoped resource
with call backs to the driver(s) for socket add/delete was the cleanest way to go.

That might mean asking the core to track other scopes (like "tile") in future if some
control/measure feature has that scope. Having created a "node" scope in my
patch series for SNC[1], it then is quite trivial to add additional resources
with any scope needed.
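
The shape I have in mind is an enumeration of scopes that the core knows how
to track, something like (sketch only; the SNC series may spell it differently):

	enum resctrl_scope {
		RESCTRL_L2_CACHE,
		RESCTRL_L3_CACHE,
		RESCTRL_NODE,		/* added for SNC */
		RESCTRL_SOCKET,		/* for socket scoped driver resources */
	};

with each resource carrying a scope instead of just a cache level, and the
cpuhp callbacks building the domain lists for whichever scope it names.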

-Tony

[1] https://lore.kernel.org/all/20230126184157.27626-1-tony.luck@intel.com/ ... that
series is not forgotten. Just needs a slightly different approach which needs an internal
approval before I can post version 2 of the series.

RE: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Luck, Tony 2 years, 7 months ago
Reinette,

You asked for examples of resources that don't fit well into the existing schemata model.

Here's the details behind one of those cases.

Existing resources have a single value per domain (e.g. a cache-way bitmask, or a memory
bandwidth percentage or target). One of my new resources has several parameters. At first
glance this might be solved by just listing a comma separated list of these parameters for
each domain. So the schemata entry for feature XYZ that has two domains might look like
this:

XYZ:0=param1,param2,param3;1=param1,param2,param3

But this feature has a second problem. The hardware supports a very limited set of variations.
This could be handled by reporting num_closid for this resource as that low number. But then
resctrl core code would limit all resources to that value. Instead the h/w allows programming
a mapping feature from closid numbers to resource instances (as the saying goes "any computer
science problem can be solved with one extra level of indirection").

So if the driver named these instances A, B, C, and D, then a schemata file might look like:

XYZ:0=B;1=C

meaning that the driver will set things up so the CLOSID for this resctrl resource is mapped to instance "B"
on domain 0 and to instance "C" on domain 1.

The driver provides a way to set param1, param2, param3 for each of the A, B, C, D
instances.
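
Purely to illustrate the indirection (the names and sizes below are invented
for this description, not taken from the real feature), the driver's
per-domain bookkeeping would be something like:

	#define XYZ_NUM_INSTANCES	4	/* A, B, C, D */

	struct xyz_instance {
		u32 param1, param2, param3;	/* set via info/XYZ/param* files */
	};

	struct xyz_domain {
		/* the few parameter sets the hardware actually supports */
		struct xyz_instance	instance[XYZ_NUM_INSTANCES];
		/* "XYZ:0=B" in the schemata file programs this mapping */
		u8			instance_of_closid[64];	/* 64 = illustrative num_closid */
	};

Every CLOSID still exists from resctrl's point of view; it just points at one
of the few hardware instances on each domain.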

-Tony
Re: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Reinette Chatre 2 years, 7 months ago
Hi Tony,

On 5/11/2023 1:35 PM, Luck, Tony wrote:
> Reinette,
> 
> You asked for examples of resources that don't fit well into the
> existing schemata model.

Primarily about me trying to understand what about the existing schemata
model makes these new features hard to support. My first confusion is
that it is not clear to me what you mean by "schemata
model". I think there are two parts to this: the schemata model as
the user interface and the schemata model as implemented within
resctrl.

It is not clear in your motivation whether you are talking about the
former or the latter, and I thus dug through the patches to figure this
out. At first it seemed as though the driver aims for a totally
different configuration mechanism with the support for driver specific
writable files in every group ... but then the driver pivots and
hooks into schemata files so it then seems that the driver mechanism does
aim to maintain the schemata model as user interface (perhaps with
additional configuration files to give more meaning to the values
written to the schemata file).

So, it seems to me that the concern is not with the user interface, but
how the schemata parsing is done in resctrl. I just seem to keep guessing
here so I would appreciate more insight.

> Here's the details behind one of those cases.
> 
> Existing resources have a single value per domain (e.g. a cache-way
> bitmask, or a memory bandwidth percentage or target). One of my new
> resources has several parameters. At first glance this might be
> solved by just listing a comma separated list of these parameters
> for each domain. So the schemata entry for feature XYZ that has two
> domains might look like this:

 
> XYZ:0=param1,param2,param3;1=param1,param2,param3
> 
> But this feature has a second problem. The hardware supports a very
> limited set of variations. This could be handled by reporting
> num_closid for this resource to that low number. But then resctrl
> core code would limit all resources to that value. Instead the h/w
> allows programming a mapping feature from closid numbers to resource
> instances (as the saying goes "any computer science problem can be
> solved with one extra level of indirection").
> 
> So if the driver named these instances: A, B, C, D. Then a schemata
> file might look like:
> 
> XYZ:0=B;1=C
> 
> meaning that the driver will set up so CLOSID for this resctrl
> resource is mapped to instance "B" on domain 0 and to instance "C" on
> domain 1.

Apologies but it is still not clear to me how this cannot be handled
in resctrl. For example, consider this hypothetical snippet that uses
a unique callback to parse a resource's schemata entry (similar to the
s->update() callback you introduce, but not have it be optional):

	struct resctrl_schema *s;

	list_for_each_entry(s, &resctrl_schema_all, list) {
		if (!strcmp(resname, s->name) && rdtgrp->closid < s->num_closid)
			return s->parse_line(tok, s, rdtgrp);
	}
	rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", resname);

 
> The driver provides a way to set param1, param2, param3 for each of
> the A, B, C, D instances.

Seems like these could be some new RFTYPE_RES_XYZ files?
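
Roughly (the XYZ names are made up, but following the pattern of the existing
res_common_files[] entries in rdtgroup.c):

	{
		.name		= "xyz_param1",
		.mode		= 0644,
		.kops		= &rdtgroup_kf_single_ops,
		.seq_show	= xyz_param1_show,
		.write		= xyz_param1_write,
		.fflags		= RFTYPE_RES_XYZ,
	},

so the existing file creation code would only instantiate them where the
matching fflags are set.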

On the model specific motivation topic: If the goal of the driver
interface is to support model specific changes to resctrl there is a
publicly available use case for consideration:
https://lore.kernel.org/lkml/20230421141723.2405942-1-peternewman@google.com/

It is not obvious to me how that model specific problem could be solved
with this driver interface. Your insight here would be appreciated but it
seems to me that we will have to keep accommodating model specific code
in resctrl.

Reinette
RE: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Luck, Tony 2 years, 7 months ago
>> You asked for examples of resources that don't fit well into the
>> existing schemata model.
>Primarily about me trying to understand what about the existing schemata
>model makes these new features hard to support. My first confusion is
>that it is not clear to me what you mean with "schemata
>model". I think there are two parts to this: the schemata model as
>the user interface and the schemata model as implemented within
>resctrl.
>
>It is not clear in your motivation whether you are talking about the
>former or the latter and thus dug through the patches to figure this
>out. At first it seemed as though the driver aims for a totally
>different configuration mechanism with the support for driver specific
>writable files in every group ... but then then the driver pivots and
>hooks into schemata files so it then seems that the driver mechanism does
>aim to maintain the schemata model as user interface (perhaps with
>additional configuration files to give more meaning to the values
>written to the schemata file).

Different drivers may choose different subsets of the hooks available.

I have one case where a new file is, IMHO, required. There's no way to
fit what needs to be done into schemata. Another where I thought
the schemata hook makes sense (though that could be handled with
a file ... but it would look very schemata-like). 

>So, it seems to me that the concern is not with the user interface, but
>how the schemata parsing is done in resctrl. I just seem to keep guessing
>here so I would appreciate more insight.
>
>> Here's the details behind one of those cases.
>>
>> Existing resources have a single value per domain (e.g. a cache-way
>> bitmask, or a memory bandwidth percentage or target). One of my new
>> resources has several parameters. At first glance this might be
>> solved by just listing a comma separated list of these parameters
>> for each domain. So the schemata entry for feature XYZ that has two
>> domains might look like this:
>
>
>> XYZ:0=param1,param2,param3;1=param1,param2,param3
>>
>> But this feature has a second problem. The hardware supports a very
>> limited set of variations. This could be handled by reporting
>> num_closid for this resource to that low number. But then resctrl
>> core code would limit all resources to that value. Instead the h/w
>> allows programming a mapping feature from closid numbers to resource
>> instances (as the saying goes "any computer science problem can be
>> solved with one extra level of indirection").
>>
>> So if the driver named these instances: A, B, C, D. Then a schemata
>> file might look like:
>>
>> XYZ:0=B;1=C
>>
>> meaning that the driver will set up so CLOSID for this resctrl
>> resource is mapped to instance "B" on domain 0 and to instance "C" on
>> domain 1
>Apologies but it is still not clear to me how this cannot be handled
>in resctrl. For example, consider this hypothetical snippet that uses
>a unique callback to parse a resource's schemata entry (similar to the
>s->update() callback you introduce, but not have it be optional):
>
>       struct resctrl_schema *s;
>
>       list_for_each_entry(s, &resctrl_schema_all, list) {
>               if (!strcmp(resname, s->name) && rdtgrp->closid < s->num_closid)
>                       return s->parse_line(tok, s, rdtgrp);
>       }
>       rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", resname);

That is much cleaner. It requires patching up the existing resctrl_schema to provide
a ->parse_line() ... and also a ->show_line() function.

If this proceeds, I can make the changes to provide this consistent interface. Thanks.
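
I.e. something along these lines (just a sketch, exact signatures to be
worked out when I rework the patches):

	struct resctrl_schema {
		/* ... existing fields ... */

		/* per-resource schemata handling, supplied by the resource
		 * (or by a driver that takes the resource over) */
		int (*parse_line)(char *tok, struct resctrl_schema *s,
				  struct rdtgroup *rdtgrp);
		int (*show_line)(struct seq_file *sf, struct resctrl_schema *s,
				 struct rdtgroup *rdtgrp);
	};

with rdtgroup_schemata_write()/rdtgroup_schemata_show() looping over
resctrl_schema_all and calling these instead of the current hard coded
parse_line()/show_doms().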

>> The driver provides a way to set param1, param2, param3 for each of
>> the A, B, C, D instances.
>
>Seems like these could be some new RFTYPE_RES_XYZ files?

In my current implementation the driver can make some/all of the files added
under the info/ directory writable. So this XYZ driver uses info/XYZ/param* files
as the interface for the user to set the parameters.

>
>On the model specific motivation topic: If the goal of the driver
>interface is to support model specific changes to resctrl there is a
>publicly available use case for consideration:
>https://lore.kernel.org/lkml/20230421141723.2405942-1-peternewman@google.com/
>
>It is not obvious to me how that model specific problem could be solved
>with this driver interface. Your insight here would be appreciated but it
>seems to me that we will have to keep accommodating model specific code
>in resctrl.

All my drivers are currently related to the control features of resctrl rather than the
monitor features. I don't see a way for a model specific driver to step in and mitigate
that particular problem that Peter has.

Seems quite hacky though. The trick with the "hard" RMID per core is going to miscount
LLC eviction traffic. If a core is switching rapidly between tasks with different RMIDs
that could be a significant difference from actual values.


But requests for some model specific quirks to work around h/w limitations don't
mean that we shouldn't create a driver layer for cases where it is possible to
separate s/w into layers.

-Tony
Re: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Reinette Chatre 2 years, 7 months ago
Hi Tony,

On 5/12/2023 1:35 PM, Luck, Tony wrote:

>>> The driver provides a way to set param1, param2, param3 for each of
>>> the A, B, C, D instances.
>>
>> Seems like these could be some new RFTYPE_RES_XYZ files?
> 
> In my current implementation the driver can make some/all of the files added
> under the info/ directory writable. So this XYZ driver uses info/XYZ/param* files
> as the interface for the user to set the parameters.

It sounds as though you are concerned about making info/* files writable?
I am not sure because whether the files are made writable via the core or
a driver does not matter since they will still be writable from the user
interface perspective.

Please note that there already are a few info files that are writable:
info/L3_MON/max_threshold_occupancy
info/L3_MON/mbm_total_bytes_config
info/L3_MON/mbm_local_bytes_config

It thus seems appropriate to use the info/* files for global resource
configuration, with files within the resource groups handling local
configurations.

>> On the model specific motivation topic: If the goal of the driver
>> interface is to support model specific changes to resctrl there is a
>> publicly available use case for consideration:
>> https://lore.kernel.org/lkml/20230421141723.2405942-1-peternewman@google.com/
>>
>> It is not obvious to me how that model specific problem could be solved
>> with this driver interface. Your insight here would be appreciated but it
>> seems to me that we will have to keep accommodating model specific code
>> in resctrl.
> 
> All my drivers are currently related to the control features of resctrl rather than the
> monitor features. I don't see a way for a model specific driver to step in and mitigate
> that particular problem that Peter has.
> 
> Seems quite hacky though. The trick with the "hard" RMID per core is going to miscount
> LLC eviction traffic. If a core is switching rapidly between tasks with different RMIDs
> that could be a significant difference from actual values.

The submission is upfront about this and indeed disables LLC occupancy
monitoring when "soft RMIDs" are enabled.
Please do provide direct feedback to that submission.

> But requests for some model specific quirks to work around h/w limitations don't
> mean that we shouldn't create a driver layer for cases where it is possible to
> separate s/w into layers.

Some of my questions were trimmed from replies without answers
so I am still trying to understand the driver interface motivation, namely that
something being model specific is hard to integrate.

Reinette
Re: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Reinette Chatre 2 years, 7 months ago
Hi Tony,

On 5/9/2023 4:35 PM, Luck, Tony wrote:
>>> Intel has always had some model specific "RDT" features that we have
>>> not attempted to push to upstream. Rationale is that architectural features
>>> have lasting value across CPU generations. Once the code is upstream it
>>> "just works" (TM) for all time. Cluttering up core code with features that are
>>> only applicable to one or two CPU generations seemed like a recipe for
>>> disaster from a long-term maintenance perspective.
>>
>> Could you please elaborate how this seems like a "recipe for disaster"? I
>> can certainly see how removing a driver is easy when it is decided that
>> something is "end of life". I rarely see "end of life" in practice
>> though and viewing removal of obsolete code from a driver as a "disaster"
>> is not clear to me. Re-factoring code occurs frequently.
> 
> I'm thinking of the amount of code under arch/x86/kernel/cpu/resctrl. In
> v6.4-rc1 it looks like:
> 
> $ wc -l $resctrl/*.[ch]
>    996 arch/x86/kernel/cpu/resctrl/core.c
>    581 arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>    560 arch/x86/kernel/cpu/resctrl/internal.h
>    845 arch/x86/kernel/cpu/resctrl/monitor.c
>   1600 arch/x86/kernel/cpu/resctrl/pseudo_lock.c
>     43 arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
>   3733 arch/x86/kernel/cpu/resctrl/rdtgroup.c
>   8358 total
> 
> Fenghua did a built-in implementation for one of the features that I'd
> like to implement as a driver and the bottom line of "git diff --stat" for
> his series of patches was:
> 
> 9 files changed, 1300 insertions(+), 10 deletions(-)
> 
> Projecting forward a few CPU generations there may be 2-3 different
> versions of that code. Plus all the other model specific features that
> we'd like to support. The core resctrl architectural code is going to
> disappear in the maze of "do this for CPU models X & Y, but do that
> for CPU model Z". 

It is hard to tell from just a diffstat how this implementation impacts
the core. A similar diffstat for the driver implementation may
help. "1300 insertions(+), 10 deletions(-)" does not seem like a lot of 
core refactoring.

>>> Q2) What about exclusive groups?
>>> A2) I didn’t try to handle in this RFC. Additional bits will be needed.
>>
>> It seems like every time a driver needs "additional bits" it would impact
>> all the other drivers.
> 
> It depends. If the new hook is just some additional callback function, then
> existing drivers would have an implied ".newfunc = NULL," in the registration
> structure. So wouldn't need any changes.
> 
> The hooks I implemented in my RFC series are the union of the requirements
> of each driver. But each driver just sets up the hooks that it needs. E.g. my
> silly example driver only used the "add files to the ctrlmon directories" hook.

My point is that the hooks themselves appear to be made safe by just providing
limited information (no pointers back to structures maintained by resctrl) and
thus when a new driver has different requirements it would have broad impact.
Similar to the example driver that you provided, if I understood correctly it
already pointed out that there may be a missing parameter of the group type
(control vs monitor).

>> The second part of my original question was "Where would it be decided whether
>> the overriding driver should be loaded and why can that logic not be in
>> enumeration within resctrl?" It is the user that needs to determine that there are
>> conflicting resources?
> 
> The horror of model specific features is appalling, or non-existent, enumeration.
> In the dim and distant past of resctrl there was once a point where it did
> string compares of model strings against a list of specific SKUs that supported
> early RDT features.

My question was trying to understand where this logic is moved to (re. "Where would
it be decided whether the overriding driver should be loaded"). The feature will
remain model specific whether it is implemented in the core or a driver, so these
checks will need to be done somewhere, no? 

> 
>>> Q6) Domain awareness of drivers.
>>> A6) This is a challenge. Especially as the domain for a driver may not match up
>>> with any existing resource scope (e.g. driver may be socket scoped, which may
>>> not be the same as "L3 cache" scoped). After I posted this series I added
>>> an entry in the resource table with socket scope to handle this. Dynamically adding
>>> a new resource with a custom scope has challenges (because the domain lists
>>> attached to that resource are maintained by the resctrl cpu hot plug callbacks as
>>> CPUs come online and go offline.
>>
>> My comment was not about a need to make drivers "domain aware". My assumption was that
>> drivers are not domain aware since I did not see any related information shared
>> with the drivers and since the drivers override the schemata entries I thus assumed
>> that the schemata entries use some driver specific scope.
>> The challenge to add a resource with a custom scope seems like the biggest problem
>> raised thus far. Is this perhaps what started the venture down this driver interface?
> 
> New domain scopes weren't a driving motivation, just a thing that was found along
> the implementation journey. After playing with some ways to have each driver keep
> track of scope I found that I'd replicated some of the core domain tracking cpuhp
> code and decided that just making the core keep track of a socket scoped resource
> with call backs to the driver(s) for socket add/delete was the cleanest way to go.
> 
> That might mean asking the core to track other scopes (like "tile") in future if some
> control/measure feature has that scope. Having created a "node" scope in my
> patch series for SNC[1], it then is quite trivial to add additional resources
> with any scope needed.

I see. I thought that tracking scope was the hardest problem needing solving
in the driver.

Reinette
RE: [RFC PATCH 0/7] Add driver registration i/f to resctrl
Posted by Luck, Tony 2 years, 7 months ago
> It is hard to tell from just a diffstat how this implementation impacts
> the core. A similar diffstat for the driver implementation may
> help. "1300 insertions(+), 10 deletions(-)" does not seem like a lot of 
> core refactoring.

Here are the resctrl bits of Fenghua's diffstat (with the filename of the new
feature changed to avoid giving too much away):

 arch/x86/kernel/cpu/resctrl/core.c        |  39 +++-
 arch/x86/kernel/cpu/resctrl/newfeature.c  | 780 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c     |   4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 333 ++++++++++++++++++++++++++++-

Changes to core files are ~375 lines ... about 50% more than I added to core code
to provide a registration hook for this and many other drivers.  My driver registration
hooks will likely inflate by that much when the rough edges are smoothed.

-Tony