[PATCH v5 06/29] x86,fs/resctrl: Improve domain type checking

Tony Luck posted 29 patches 6 months, 3 weeks ago
[PATCH v5 06/29] x86,fs/resctrl: Improve domain type checking
Posted by Tony Luck 6 months, 3 weeks ago
The rdt_domain_hdr structure is used in both control and monitor
domain structures to provide common methods for operations such as
adding a CPU to a domain, removing a CPU from a domain, accessing
the mask of all CPUs in a domain.

The "type" field provides a simple check whether a domain is a
control or monitor domain so that programming errors operating
on domains will be quickly caught.

To prepare for additional domain types that depend on the rdt_resource
to which they are connected add the resource id into the header
and check that in addition to the type.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  9 +++++++++
 arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
 fs/resctrl/ctrlmondata.c           |  2 +-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 40f2d0d48d02..d6b09952ef92 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -131,15 +131,24 @@ enum resctrl_domain_type {
  * @list:		all instances of this resource
  * @id:			unique id for this instance
  * @type:		type of this instance
+ * @rid:		index of resource for this domain
  * @cpu_mask:		which CPUs share this resource
  */
 struct rdt_domain_hdr {
 	struct list_head		list;
 	int				id;
 	enum resctrl_domain_type	type;
+	enum resctrl_res_level		rid;
 	struct cpumask			cpu_mask;
 };
 
+static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
+					  enum resctrl_domain_type type,
+					  enum resctrl_res_level rid)
+{
+	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
+}
+
 /**
  * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
  * @hdr:		common header for different domain types
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4403a820db12..4983f6f81218 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -456,7 +456,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 
 	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_ctrl_domain, hdr);
 
@@ -473,6 +473,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_CTRL_DOMAIN;
+	d->hdr.rid = r->rid;
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
 
 	rdt_domain_reconfigure_cdp(r);
@@ -511,7 +512,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 
@@ -526,6 +527,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = r->rid;
 	d->ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!d->ci) {
 		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
@@ -581,7 +583,7 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
@@ -627,7 +629,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_mon_domain, hdr);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 6be423c5e2e0..7d16f7eb6985 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -638,7 +638,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		 * the resource to find the domain with "domid".
 		 */
 		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
-		if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid)) {
 			ret = -ENOENT;
 			goto out;
 		}
-- 
2.49.0
Re: [PATCH v5 06/29] x86,fs/resctrl: Improve domain type checking
Posted by Reinette Chatre 6 months, 1 week ago
Hi Tony,

On 5/21/25 3:50 PM, Tony Luck wrote:
> The rdt_domain_hdr structure is used in both control and monitor
> domain structures to provide common methods for operations such as
> adding a CPU to a domain, removing a CPU from a domain, accessing
> the mask of all CPUs in a domain.
> 
> The "type" field provides a simple check whether a domain is a
> control or monitor domain so that programming errors operating
> on domains will be quickly caught.
> 
> To prepare for additional domain types that depend on the rdt_resource
> to which they are connected add the resource id into the header
> and check that in addition to the type.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h            |  9 +++++++++
>  arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
>  fs/resctrl/ctrlmondata.c           |  2 +-
>  3 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 40f2d0d48d02..d6b09952ef92 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -131,15 +131,24 @@ enum resctrl_domain_type {
>   * @list:		all instances of this resource
>   * @id:			unique id for this instance
>   * @type:		type of this instance
> + * @rid:		index of resource for this domain
>   * @cpu_mask:		which CPUs share this resource
>   */
>  struct rdt_domain_hdr {
>  	struct list_head		list;
>  	int				id;
>  	enum resctrl_domain_type	type;
> +	enum resctrl_res_level		rid;
>  	struct cpumask			cpu_mask;
>  };
>  
> +static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
> +					  enum resctrl_domain_type type,
> +					  enum resctrl_res_level rid)
> +{
> +	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
> +}
> +
>  /**
>   * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
>   * @hdr:		common header for different domain types
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 4403a820db12..4983f6f81218 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -456,7 +456,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
>  
>  	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
>  	if (hdr) {
> -		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
> +		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
>  			return;
>  		d = container_of(hdr, struct rdt_ctrl_domain, hdr);
>  

This is quite subtle and not obvious until a few patches later that the
domain_header_is_valid() is done in preparation for using the
rdt_domain_hdr::rid to verify that the correct containing structure is
obtained in a subsequent container_of() call.

Patch #10 mentions it explicitly: "Add sanity checks where
container_of() is used to find the surrounding domain structure that
hdr has the expected type."
           
The change above, when combined with later changes, results in
code like:

	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
		/* handle failure */

	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
	...

Considering this all I do not think using a variable r->rid is appropriate
here. Specifically, if the code has it hardcoded that, for example,
the containing structure is "struct rdt_l3_mon_domain" then should the
test not similarly be hardcoded to ensure that rid is RDT_RESOURCE_L3?

Reinette
Re: [PATCH v5 06/29] x86,fs/resctrl: Improve domain type checking
Posted by Luck, Tony 6 months, 1 week ago
On Tue, Jun 03, 2025 at 08:31:07PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 5/21/25 3:50 PM, Tony Luck wrote:
> > The rdt_domain_hdr structure is used in both control and monitor
> > domain structures to provide common methods for operations such as
> > adding a CPU to a domain, removing a CPU from a domain, accessing
> > the mask of all CPUs in a domain.
> > 
> > The "type" field provides a simple check whether a domain is a
> > control or monitor domain so that programming errors operating
> > on domains will be quickly caught.
> > 
> > To prepare for additional domain types that depend on the rdt_resource
> > to which they are connected add the resource id into the header
> > and check that in addition to the type.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  include/linux/resctrl.h            |  9 +++++++++
> >  arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
> >  fs/resctrl/ctrlmondata.c           |  2 +-
> >  3 files changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> > index 40f2d0d48d02..d6b09952ef92 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -131,15 +131,24 @@ enum resctrl_domain_type {
> >   * @list:		all instances of this resource
> >   * @id:			unique id for this instance
> >   * @type:		type of this instance
> > + * @rid:		index of resource for this domain
> >   * @cpu_mask:		which CPUs share this resource
> >   */
> >  struct rdt_domain_hdr {
> >  	struct list_head		list;
> >  	int				id;
> >  	enum resctrl_domain_type	type;
> > +	enum resctrl_res_level		rid;
> >  	struct cpumask			cpu_mask;
> >  };
> >  
> > +static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
> > +					  enum resctrl_domain_type type,
> > +					  enum resctrl_res_level rid)
> > +{
> > +	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
> > +}
> > +
> >  /**
> >   * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
> >   * @hdr:		common header for different domain types
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 4403a820db12..4983f6f81218 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -456,7 +456,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
> >  
> >  	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
> >  	if (hdr) {
> > -		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
> > +		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
> >  			return;

This type check was added as part of the split of the rdt_domain
structure into sepaarte ctrl and mon structures. I think the concern
was that some code might look at the wrong rdt_resource list and
try to operate on a ctrl domain structure that is actually a mon
structure (or vice versa). This felt like a real possibility.

Extending this to save and check the resource id seemed like a
natural extension at the time. But I'm starting to doubt the value
of doing so.

For this new check to ever fail we would have to somehow add
a domain for some resource type to a list on a different
rdt_resource structure. I'm struggling to see how such an
error could ever occur. Domains are only added to an rdt_resource
list by one of domain_add_cpu_ctrl() or domain_add_cpu_mon().
But these same functions are the ones to allocate the domain
structure and initialize the "d->hdr.id" field a dozen or so
lines earlier in the function.

Note that I'm not disputing your comments where my patches
are still passing a rdt_l3_mon_domain structure down through
several levels of function calls only to do:

	if (r->rid == RDT_RESOURCE_PERF_PKG)
		return intel_aet_read_event(d->hdr.id, rmid, eventid, val);

revealing that it wasn't an rdt_l3_mon_domain at all!

But these domain_header_is_valid() checks didn't help uncover
that.

Bottom line: I'd like to just keep the "type" check and not
extend to check the resource id.

> >  		d = container_of(hdr, struct rdt_ctrl_domain, hdr);
> >  
> 
> This is quite subtle and not obvious until a few patches later that the
> domain_header_is_valid() is done in preparation for using the
> rdt_domain_hdr::rid to verify that the correct containing structure is
> obtained in a subsequent container_of() call.
> 
> Patch #10 mentions it explicitly: "Add sanity checks where
> container_of() is used to find the surrounding domain structure that
> hdr has the expected type."
>            
> The change above, when combined with later changes, results in
> code like:
> 
> 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
> 		/* handle failure */
> 
> 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
> 	...
> 
> Considering this all I do not think using a variable r->rid is appropriate
> here. Specifically, if the code has it hardcoded that, for example,
> the containing structure is "struct rdt_l3_mon_domain" then should the
> test not similarly be hardcoded to ensure that rid is RDT_RESOURCE_L3?
> 
> Reinette

-Tony
Re: [PATCH v5 06/29] x86,fs/resctrl: Improve domain type checking
Posted by Reinette Chatre 6 months, 1 week ago
Hi Tony,

On 6/4/25 3:58 PM, Luck, Tony wrote:
> On Tue, Jun 03, 2025 at 08:31:07PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 5/21/25 3:50 PM, Tony Luck wrote:
>>> The rdt_domain_hdr structure is used in both control and monitor
>>> domain structures to provide common methods for operations such as
>>> adding a CPU to a domain, removing a CPU from a domain, accessing
>>> the mask of all CPUs in a domain.
>>>
>>> The "type" field provides a simple check whether a domain is a
>>> control or monitor domain so that programming errors operating
>>> on domains will be quickly caught.
>>>
>>> To prepare for additional domain types that depend on the rdt_resource
>>> to which they are connected add the resource id into the header
>>> and check that in addition to the type.
>>>
>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>> ---
>>>  include/linux/resctrl.h            |  9 +++++++++
>>>  arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
>>>  fs/resctrl/ctrlmondata.c           |  2 +-
>>>  3 files changed, 16 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index 40f2d0d48d02..d6b09952ef92 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -131,15 +131,24 @@ enum resctrl_domain_type {
>>>   * @list:		all instances of this resource
>>>   * @id:			unique id for this instance
>>>   * @type:		type of this instance
>>> + * @rid:		index of resource for this domain
>>>   * @cpu_mask:		which CPUs share this resource
>>>   */
>>>  struct rdt_domain_hdr {
>>>  	struct list_head		list;
>>>  	int				id;
>>>  	enum resctrl_domain_type	type;
>>> +	enum resctrl_res_level		rid;
>>>  	struct cpumask			cpu_mask;
>>>  };
>>>  
>>> +static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
>>> +					  enum resctrl_domain_type type,
>>> +					  enum resctrl_res_level rid)
>>> +{
>>> +	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
>>> +}
>>> +
>>>  /**
>>>   * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
>>>   * @hdr:		common header for different domain types
>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 4403a820db12..4983f6f81218 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -456,7 +456,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
>>>  
>>>  	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
>>>  	if (hdr) {
>>> -		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
>>> +		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
>>>  			return;
> 
> This type check was added as part of the split of the rdt_domain
> structure into sepaarte ctrl and mon structures. I think the concern
> was that some code might look at the wrong rdt_resource list and
> try to operate on a ctrl domain structure that is actually a mon
> structure (or vice versa). This felt like a real possibility.
> 
> Extending this to save and check the resource id seemed like a
> natural extension at the time. But I'm starting to doubt the value
> of doing so.
> 
> For this new check to ever fail we would have to somehow add
> a domain for some resource type to a list on a different
> rdt_resource structure. I'm struggling to see how such an

I disagree with this statement. I do not see the failure as related
to the list to which the domain belongs but instead related to how
functions interpret a domain passed to it. There are a couple of functions
that are provided a domain pointer and the function is hardcoded to expect
the domain pointed to to be of a specific type.

For example, rdtgroup_mondata_show() is hardcoded to work with an
L3 resource domain. If my suggestion here is followed then
rdtgroup_mondata_show() would contain the specific check below because
it interprets the domain as belonging to a L3 resource:
	domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)

With the check used as above the current issue would be exposed.

> error could ever occur. Domains are only added to an rdt_resource
> list by one of domain_add_cpu_ctrl() or domain_add_cpu_mon().
> But these same functions are the ones to allocate the domain
> structure and initialize the "d->hdr.id" field a dozen or so
> lines earlier in the function.
> 
> Note that I'm not disputing your comments where my patches
> are still passing a rdt_l3_mon_domain structure down through
> several levels of function calls only to do:
> 
> 	if (r->rid == RDT_RESOURCE_PERF_PKG)
> 		return intel_aet_read_event(d->hdr.id, rmid, eventid, val);
> 
> revealing that it wasn't an rdt_l3_mon_domain at all!
> 
> But these domain_header_is_valid() checks didn't help uncover
> that.

This is not because of the check itself but how it is used in this version
... it essentially gave the check the wrong value to check against.


> Bottom line: I'd like to just keep the "type" check and not
> extend to check the resource id.

Pointers to domains of different types are passed around (irrespective
of the list they belong to) but required to be of particular type
when acted on. The way I see it this check is required if this
design continues. If used correctly in this implementation it will help
to expose those places where L3 domain specific functions are used as
"generic" to operate on PERF_PKG domains. 

Reinette