[PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache

Tony Luck posted 17 patches 1 year, 7 months ago
There is a newer version of this series
[PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Tony Luck 1 year, 7 months ago
Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This cost may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

The other mode divides the RMIDs and renumbers the ones on the second
SNC node to start from zero.

Even with this renumbering SNC mode requires several changes in resctrl
behavior for correct operation.

Add a global integer "snc_nodes_per_l3_cache" that shows how many
SNC nodes share each L3 cache. When "snc_nodes_per_l3_cache" is "1",
SNC mode is either not implemented, or not enabled.

Update all places to take appropriate action when SNC mode is enabled:
1) The number of logical RMIDs per L3 cache available for use is the
   number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be divided by the number of SNC
   nodes.
3) Disable the "-o mba_MBps" mount option in SNC mode
   because the monitoring is being done per SNC node, while the
   bandwidth allocation is still done at the L3 cache scope.
   Trying to use this feedback loop might result in contradictory
   changes to the throttling level coming from each of the SNC
   node bandwidth measurements.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
 arch/x86/kernel/cpu/resctrl/core.c     | 6 ++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 4 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 3 ++-
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 135190e0711c..49440f194253 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -484,6 +484,8 @@ extern struct rdt_hw_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
 extern struct dentry *debugfs_resctrl;
 
+extern unsigned int snc_nodes_per_l3_cache;
+
 enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 395bac851f6e..bfa9d3a429fd 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -331,6 +331,12 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
 	return r->default_ctrl;
 }
 
+/*
+ * Number of SNC nodes that share each L3 cache.  Default is 1 for
+ * systems that do not support SNC, or have SNC disabled.
+ */
+unsigned int snc_nodes_per_l3_cache = 1;
+
 static void mba_wrmsr_intel(struct msr_param *m)
 {
 	struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(m->dom);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 89d7e6fcbaa1..0f66825a1ac9 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1022,8 +1022,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	int ret;
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
-	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
-	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
+	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cc31ede1a1e7..0923492a8bd0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2346,7 +2346,8 @@ static bool supports_mba_mbps(void)
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
 
 	return (is_mbm_local_enabled() &&
-		r->alloc_capable && is_mba_linear());
+		r->alloc_capable && is_mba_linear() &&
+		snc_nodes_per_l3_cache == 1);
 }
 
 /*
-- 
2.44.0
Re: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Reinette Chatre 1 year, 7 months ago
Hi Tony,

On 5/15/2024 3:23 PM, Tony Luck wrote:
> Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
> and memory controllers on a socket into two or more groups. These are
> presented to the operating system as NUMA nodes.
> 
> This may enable some workloads to have slightly lower latency to memory
> as the memory controller(s) in an SNC node are electrically closer to the
> CPU cores on that SNC node. This cost may be offset by lower bandwidth
> since the memory accesses for each core can only be interleaved between
> the memory controllers on the same SNC node.
> 
> Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
> to track L3 cache occupancy and memory bandwidth. There is an MSR that
> controls how the RMIDs are shared between SNC nodes.
> 
> The default mode divides them numerically. E.g. when there are two SNC
> nodes on a socket the lower number half of the RMIDs are given to the
> first node, the remainder to the second node. This would be difficult
> to use with the Linux resctrl interface as specific RMID values assigned
> to resctrl groups are not visible to users.
> 
> The other mode divides the RMIDs and renumbers the ones on the second
> SNC node to start from zero.
> 
> Even with this renumbering SNC mode requires several changes in resctrl
> behavior for correct operation.
> 
> Add a global integer "snc_nodes_per_l3_cache" that shows how many
> SNC nodes share each L3 cache. When "snc_nodes_per_l3_cache" is "1",
> SNC mode is either not implemented, or not enabled.
> 
> Update all places to take appropriate action when SNC mode is enabled:
> 1) The number of logical RMIDs per L3 cache available for use is the
>    number of physical RMIDs divided by the number of SNC nodes.

Should it then perhaps be "number of logical RMIDs per SNC node"?
The way this feature is introduced makes it hard to understand how
RMIDs are used when SNC is enabled since the implementation appears
to distinguish between (a) RMIDs assigned to monitor group and written
to PQR register and (b) RMIDs for which the event counters are read.
The former ((a)) is used directly after the adjustment described in (1) but
the latter needs an adjustment that I did not notice being mentioned.

I think it will help to move the get_node_rmid() helper that is
introduced later here and explain why it is needed.

> 2) Likewise the "mon_scale" value must be divided by the number of SNC
>    nodes.
> 3) Disable the "-o mba_MBps" mount option in SNC mode
>    because the monitoring is being done per SNC node, while the
>    bandwidth allocation is still done at the L3 cache scope.
>    Trying to use this feedback loop might result in contradictory
>    changes to the throttling level coming from each of the SNC
>    node bandwidth measurements.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
>  arch/x86/kernel/cpu/resctrl/core.c     | 6 ++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 4 ++--
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 3 ++-
>  4 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 135190e0711c..49440f194253 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -484,6 +484,8 @@ extern struct rdt_hw_resource rdt_resources_all[];
>  extern struct rdtgroup rdtgroup_default;
>  extern struct dentry *debugfs_resctrl;
>  
> +extern unsigned int snc_nodes_per_l3_cache;
> +
>  enum resctrl_res_level {
>  	RDT_RESOURCE_L3,
>  	RDT_RESOURCE_L2,
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 395bac851f6e..bfa9d3a429fd 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -331,6 +331,12 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
>  	return r->default_ctrl;
>  }
>  
> +/*
> + * Number of SNC nodes that share each L3 cache.  Default is 1 for
> + * systems that do not support SNC, or have SNC disabled.
> + */
> +unsigned int snc_nodes_per_l3_cache = 1;
> +
>  static void mba_wrmsr_intel(struct msr_param *m)
>  {
>  	struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(m->dom);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 89d7e6fcbaa1..0f66825a1ac9 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1022,8 +1022,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	int ret;
>  
>  	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
> -	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
> -	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
> +	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
> +	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
>  	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
>  
>  	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index cc31ede1a1e7..0923492a8bd0 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2346,7 +2346,8 @@ static bool supports_mba_mbps(void)
>  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>  
>  	return (is_mbm_local_enabled() &&
> -		r->alloc_capable && is_mba_linear());
> +		r->alloc_capable && is_mba_linear() &&
> +		snc_nodes_per_l3_cache == 1);
>  }
>  
>  /*

Since the software controller is a filesystem feature the above
now requires that snc_nodes_per_l3_cache becomes part of the resctrl
filesystem code and every architecture will need to set snc_nodes_per_l3_cache.
Every architecture will thus need to interpret what "SNC" means for them
using the term introduced here. That may be ok ... but the term "SNC"
will then surely not identify an Intel feature and Intel needs to be ok
that any architecture calls their "similar to SNC but not quite identical"
"SNC".

I assume now that as part of the fs/arch split there needs to be
a new helper that allows different architectures to set this
filesystem variable?

Reinette
RE: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Luck, Tony 1 year, 6 months ago
> >     return (is_mbm_local_enabled() &&
> > -           r->alloc_capable && is_mba_linear());
> > +           r->alloc_capable && is_mba_linear() &&
> > +           snc_nodes_per_l3_cache == 1);
> >  }
> >
> >  /*
>
> Since the software controller is a filesystem feature the above
> now requires that snc_nodes_per_l3_cache becomes part of the resctrl
> filesystem code and every architecture will need to set snc_nodes_per_l3_cache.
> Every architecture will thus need to interpret what "SNC" means for them
> using the term introduced here. That may be ok ... but the term "SNC"
> will then surely not identify an Intel feature and Intel needs to be ok
> that any architecture calls their "similar to SNC but not quite identical"
> "SNC".
>
> I assume now that as part of the fs/arch split there needs to be
> a new helper that allows different architectures to set this
> filesystem variable?

I can change this check to better reflect the underlying reason to
disable the software controller. Which is that the MBM monitor scope
does not match the MBA control scope. This seems like an architecture
neutral expression.

So code would look like this:

	struct rdt_resource *rmbm = &rdt_resources_all[RDT_RESOURCE_L3].r_rescrl;
	struct rdt_resource *rmba = &rdt_resources_all[RDT_RESOURCE_MBA].r_rescrl;

	...

	 return (is_mbm_local_enabled() &&
                r->alloc_capable && is_mba_linear() &&
                rmbm->mon_scope == rmba->ctrl_scope);

I'm also contemplating dropping snc_nodes_per_l3_cache from being a
global variable and making it a field in "struct rdt_resource" (only needed
for the RDT_RESOURCE_L3 resource). N.B. Babu had suggested it
shouldn't be global many patch versions ago.

Perhaps name it .domains_per_l3_cache or .subdomains_per_l3_cache?

Bad idea? Good idea (but you have a better name for the field)?

-Tony


Re: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Reinette Chatre 1 year, 6 months ago
Hi Tony,

On 5/23/24 12:04 PM, Luck, Tony wrote:
>>>      return (is_mbm_local_enabled() &&
>>> -           r->alloc_capable && is_mba_linear());
>>> +           r->alloc_capable && is_mba_linear() &&
>>> +           snc_nodes_per_l3_cache == 1);
>>>   }
>>>
>>>   /*
>>
>> Since the software controller is a filesystem feature the above
>> now requires that snc_nodes_per_l3_cache becomes part of the resctrl
>> filesystem code and every architecture will need to set snc_nodes_per_l3_cache.
>> Every architecture will thus need to interpret what "SNC" means for them
>> using the term introduced here. That may be ok ... but the term "SNC"
>> will then surely not identify an Intel feature and Intel needs to be ok
>> that any architecture calls their "similar to SNC but not quite identical"
>> "SNC".
>>
>> I assume now that as part of the fs/arch split there needs to be
>> a new helper that allows different architectures to set this
>> filesystem variable?
> 
> I can change this check to better reflect the underlying reason to
> disable the software controller. Which is that the MBM monitor scope
> does not match the MBA control scope. This seems like an architecture
> neutral expression.
> 
> So code would look like this:
> 
> 	struct rdt_resource *rmbm = &rdt_resources_all[RDT_RESOURCE_L3].r_rescrl;
> 	struct rdt_resource *rmba = &rdt_resources_all[RDT_RESOURCE_MBA].r_rescrl;
> 
> 	...
> 
> 	 return (is_mbm_local_enabled() &&
>                  r->alloc_capable && is_mba_linear() &&
>                  rmbm->mon_scope == rmba->ctrl_scope);
> 
> I'm also contemplating dropping snc_nodes_per_l3_cache from being a
> global variable and making it a field in "struct rdt_resource" (only needed
> for the RDT_RESOURCE_L3 resource). N.B. Babu had suggested it
> shouldn't be global many patch versions ago.
> 
> Perhaps name it .domains_per_l3_cache or .subdomains_per_l3_cache?
> 
> Bad idea? Good idea (but you have a better name for the field)?

With the check in supports_mba_mbps() changed I do not see
snc_nodes_per_l3_cache needed by anything but arch specific code.
I thus do not see any reason for the resctrl filesystem (or, for
example, Arm) to know that this value even exists.

While struct rdt_hw_resource is a place to put architecture
specific information it does not seem appropriate to force every
resource to carry what is essentially a system wide (not specific to
resctrl) L3 specific property. To me this really seems like an
architecture specific global setting but I'd also like to hear the
motivations why it should not be.

Reinette
RE: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Luck, Tony 1 year, 6 months ago
> > I'm also contemplating dropping snc_nodes_per_l3_cache from being a
> > global variable and making it a field in "struct rdt_resource" (only needed
> > for the RDT_RESOURCE_L3 resource). N.B. Babu had suggested it
> > shouldn't be global many patch versions ago.
> >
> > Perhaps name it .domains_per_l3_cache or .subdomains_per_l3_cache?
> >
> > Bad idea? Good idea (but you have a better name for the field)?
>
> With the check in supports_mba_mbps() changed I do not see
> snc_nodes_per_l3_cache needed by anything but arch specific code.
> I thus do not see any reason for the resctrl filesystem (or, for
> example, Arm) to know that this value even exists.
>
> While struct rdt_hw_resource is a place to put architecture
> specific information it does not seem appropriate to force every
> resource to carry what is essentially a system wide (not specific to
> resctrl) L3 specific property. To me this really seems like an
> architecture specific global setting but I'd also like to hear the
> motivations why it should not be.

So (in arch/x86/kernel/cpu/resctrl/monitor.c)

static int snc_nodes_per_l3_cache = 1;

Set and use only in this (arch specific) file.

-Tony
Re: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Reinette Chatre 1 year, 6 months ago
Hi Tony,

On 5/23/24 2:25 PM, Luck, Tony wrote:
>>> I'm also contemplating dropping snc_nodes_per_l3_cache from being a
>>> global variable and making it a field in "struct rdt_resource" (only needed
>>> for the RDT_RESOURCE_L3 resource). N.B. Babu had suggested it
>>> shouldn't be global many patch versions ago.
>>>
>>> Perhaps name it .domains_per_l3_cache or .subdomains_per_l3_cache?
>>>
>>> Bad idea? Good idea (but you have a better name for the field)?
>>
>> With the check in supports_mba_mbps() changed I do not see
>> snc_nodes_per_l3_cache needed by anything but arch specific code.
>> I thus do not see any reason for the resctrl filesystem (or, for
>> example, Arm) to know that this value even exists.
>>
>> While struct rdt_hw_resource is a place to put architecture
>> specific information it does not seem appropriate to force every
>> resource to carry what is essentially a system wide (not specific to
>> resctrl) L3 specific property. To me this really seems like an
>> architecture specific global setting but I'd also like to hear the
>> motivations why it should not be.
> 
> So (in arch/x86/kernel/cpu/resctrl/monitor.c)
> 
> static int snc_nodes_per_l3_cache = 1;
> 
> Set and use only in this (arch specific) file.

Since this series initializes this value in
arch/x86/kernel/cpu/resctrl/core.c it is not clear to
me from just this one line how you envision the changes.

Just to be clear ... when I refer to arch specific and
filesystem code I am considering how this series integrates with [1]
since that is the direction resctrl is headed in.
Being "arch specific" thus does not require that it be moved into
monitor.c - it could be added to arch/x86/kernel/cpu/resctrl/internal.h
where it can remain after the fs/arch split.

It will be very helpful if you view your series while taking
[1] into account. For example, when looking at [1] you will see that
mon_event_count() and __mon_event_count() are resctrl filesystem
functions. When you consider that it should be clear that adding
an arch specific get_node_rmid() between these functions will make
the arch/fs split more difficult.

Reinette

[1] https://lore.kernel.org/lkml/20240321165106.31602-1-james.morse@arm.com/
RE: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Luck, Tony 1 year, 6 months ago
> > So (in arch/x86/kernel/cpu/resctrl/monitor.c)
> >
> > static int snc_nodes_per_l3_cache = 1;
> > 
> > Set and use only in this (arch specific) file.
>
> Since this series initializes this value in
> arch/x86/kernel/cpu/resctrl/core.c it is not clear to
> me from just this one line how you envision the changes.

v18 did the initialization in core.c. But since SNC is all about monitor
features it looks more logical to do this here:

resctrl_late_init()
    get_rdt_resources()
	get_rdt_mon_resources()
	    rdt_get_mon_l3_config()  <<<< Do SNC enumeration here
	    

> Just to be clear ... when I refer to arch specific and
> filesystem code I am considering how this series integrates with [1]
> since that is the direction resctrl is headed in.
> Being "arch specific" thus does not require that it be moved into
> monitor.c - it could be added to arch/x86/kernel/cpu/resctrl/internal.h
> where it can remain after the fs/arch split.

The logical place to convert from logical RMID to physical RMID looks
to be in __rmid_read(). Just need to pass in the domain pointer (from
both resctrl_arch_reset_rmid() and resctrl_arch_rmid_read().
>
> It will be very helpful if you view your series while taking
> [1] into account. For example, when looking at [1] you will see that
> mon_event_count() and __mon_event_count() are resctrl filesystem
> functions. When you consider that it should be clear that adding
> an arch specific get_node_rmid() between these functions will make
> the arch/fs split more difficult.

I'll try to keep that in mind as I rework my series. In v18 my "sum" code
went into __mon_event_count().  I'll see if I can push that down another
level.

-Tony
Re: [PATCH v18 06/17] x86/resctrl: Introduce snc_nodes_per_l3_cache
Posted by Reinette Chatre 1 year, 6 months ago
Hi Tony,

On 5/23/24 4:18 PM, Luck, Tony wrote:
>>> So (in arch/x86/kernel/cpu/resctrl/monitor.c)
>>>
>>> static int snc_nodes_per_l3_cache = 1;
>>>
>>> Set and use only in this (arch specific) file.
>>
>> Since this series initializes this value in
>> arch/x86/kernel/cpu/resctrl/core.c it is not clear to
>> me from just this one line how you envision the changes.
> 
> v18 did the initialization in core.c. But since SNC is all about monitor
> features it looks more logical to do this here:
> 
> resctrl_late_init()
>      get_rdt_resources()
> 	get_rdt_mon_resources()
> 	    rdt_get_mon_l3_config()  <<<< Do SNC enumeration here
> 	

ok.

> 
>> Just to be clear ... when I refer to arch specific and
>> filesystem code I am considering how this series integrates with [1]
>> since that is the direction resctrl is headed in.
>> Being "arch specific" thus does not require that it be moved into
>> monitor.c - it could be added to arch/x86/kernel/cpu/resctrl/internal.h
>> where it can remain after the fs/arch split.
> 
> The logical place to convert from logical RMID to physical RMID looks
> to be in __rmid_read(). Just need to pass in the domain pointer (from
> both resctrl_arch_reset_rmid() and resctrl_arch_rmid_read().

This is not obvious to me. Do you need domain pointer to figure out
which node the domain belongs to? It looks to me as though these
calls are already running on a CPU belonging to the domain so perhaps
smp_processor_id() is sufficient?

>> It will be very helpful if you view your series while taking
>> [1] into account. For example, when looking at [1] you will see that
>> mon_event_count() and __mon_event_count() are resctrl filesystem
>> functions. When you consider that it should be clear that adding
>> an arch specific get_node_rmid() between these functions will make
>> the arch/fs split more difficult.
> 
> I'll try to keep that in mind as I rework my series. In v18 my "sum" code
> went into __mon_event_count().  I'll see if I can push that down another
> level.

Thank you

Reinette