arch/x86/kernel/cpu/resctrl/monitor.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Commit 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by
index") adds logic to map individual monitoring groups into a
global index space used for tracking allocated RMIDs.
That patch keept the logic to ignore requests to free the default
RMID in free_rmid(), and this works fine on x86.
With arm64 MPAM, there is a latent bug here: on platforms with no
monitors exposed through resctrl, each control group still gets a
different monitoring group ID as seen by the hardware, since the
CLOSID always forms part of the monitoring group ID. This means
that when removing a control group, the code may try to free this
group's default monitoring group RMID for real. If there are no
monitors however, the RMID tracking table rmid_ptrs[] would be a
waste of memory and is never allocated, leading to a splat when a
free_rmid() tries to dereference the table.
One option would be to treat RMID 0 as special for every CLOSID,
but this would be ugly since we still want to do bookkeeping for
these monitoring group IDs when there are monitors present in the
hardware.
Instead, add a gating check of resctrl_arch_mon_capable() in
free_rmid(), and just do nothing if the hardware doesn't have
monitors.
This fix mirrors the gating checks already present in
mkdir_rdt_prepare_rmid_alloc() and elsewhere.
No functional change on x86.
Fixes: 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
Based on v6.10-rc3.
Tested on x86 (But so far for the monitors-present case.
Testing on Atom would be appreciated.)
Tested on arm64 for the no-monitors case.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2345e6836593..366f496ca3ce 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -519,7 +519,8 @@ void free_rmid(u32 closid, u32 rmid)
* allows architectures that ignore the closid parameter to avoid an
* unnecessary check.
*/
- if (idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
+ if (!resctrl_arch_mon_capable() ||
+ idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
RESCTRL_RESERVED_RMID))
return;
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
--
2.34.1
Hi Dave,
On 6/14/24 9:08 AM, Dave Martin wrote:
> Commit 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by
> index") adds logic to map individual monitoring groups into a
> global index space used for tracking allocated RMIDs.
>
> That patch keept the logic to ignore requests to free the default
keept -> kept
nitpick: I actually do not know if "that patch" gets same hate as
"this patch" so to avoid any potential feedback about this I'd like
to suggest that this is rewritten without this term. Perhaps
something like: "Requests to free the default RMID in free_rmid()
are ignored, and this works fine on x86."
> RMID in free_rmid(), and this works fine on x86.
>
> With arm64 MPAM, there is a latent bug here: on platforms with no
> monitors exposed through resctrl, each control group still gets a
> different monitoring group ID as seen by the hardware, since the
> CLOSID always forms part of the monitoring group ID. This means
> that when removing a control group, the code may try to free this
> group's default monitoring group RMID for real. If there are no
> monitors however, the RMID tracking table rmid_ptrs[] would be a
> waste of memory and is never allocated, leading to a splat when a
> free_rmid() tries to dereference the table.
>
> One option would be to treat RMID 0 as special for every CLOSID,
> but this would be ugly since we still want to do bookkeeping for
> these monitoring group IDs when there are monitors present in the
> hardware.
>
> Instead, add a gating check of resctrl_arch_mon_capable() in
> free_rmid(), and just do nothing if the hardware doesn't have
> monitors.
>
> This fix mirrors the gating checks already present in
> mkdir_rdt_prepare_rmid_alloc() and elsewhere.
>
> No functional change on x86.
>
> Fixes: 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>
> ---
>
> Based on v6.10-rc3.
>
> Tested on x86 (But so far for the monitors-present case.
Tested by booting with "rdt=!cmt,!mbmtotal,!mbmlocal".
> Testing on Atom would be appreciated.)
>
> Tested on arm64 for the no-monitors case.
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 2345e6836593..366f496ca3ce 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -519,7 +519,8 @@ void free_rmid(u32 closid, u32 rmid)
> * allows architectures that ignore the closid parameter to avoid an
> * unnecessary check.
> */
> - if (idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
> + if (!resctrl_arch_mon_capable() ||
> + idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
> RESCTRL_RESERVED_RMID))
> return;
>
>
Thank you for catching this. This reveals how subtle resctrl is by
calling free_rmid() in paths for convenience and then relying on an
uninitialized variable for correct behavior.
This replaces this subtle assumption with an actual check.
With changelog comments addressed:
| Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reinette
Hi Reinette,
On Fri, Jun 14, 2024 at 03:47:58PM -0700, Reinette Chatre wrote:
> Hi Dave,
>
> On 6/14/24 9:08 AM, Dave Martin wrote:
> > Commit 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by
> > index") adds logic to map individual monitoring groups into a
> > global index space used for tracking allocated RMIDs.
> >
> > That patch keept the logic to ignore requests to free the default
>
> keept -> kept
>
> nitpick: I actually do not know if "that patch" gets same hate as
> "this patch" so to avoid any potential feedback about this I'd like
> to suggest that this is rewritten without this term. Perhaps
> something like: "Requests to free the default RMID in free_rmid()
> are ignored, and this works fine on x86."
>
> > RMID in free_rmid(), and this works fine on x86.
> >
How about recasting the first paragraph into the past tense (since it
relates a past commit), and rewording as "Requests to free the default
RMID continued to be ignored in free_rmid(), and this works fine on
x86."
(I agree that "this patch" would have been ambiguous. "That patch" was
an attempt to be clearer, but felt a bit clumsy. Naming the commit
again felt worse, though would have been clearer. I've noticed that
people who do not have English as their first language tend to use
"this" and "that" a little differently from native English speakers, so
there is probably more scope for confusion here that I like to
assume...)
> > With arm64 MPAM, there is a latent bug here: on platforms with no
> > monitors exposed through resctrl, each control group still gets a
> > different monitoring group ID as seen by the hardware, since the
> > CLOSID always forms part of the monitoring group ID. This means
> > that when removing a control group, the code may try to free this
> > group's default monitoring group RMID for real. If there are no
> > monitors however, the RMID tracking table rmid_ptrs[] would be a
> > waste of memory and is never allocated, leading to a splat when a
> > free_rmid() tries to dereference the table.
> >
> > One option would be to treat RMID 0 as special for every CLOSID,
> > but this would be ugly since we still want to do bookkeeping for
> > these monitoring group IDs when there are monitors present in the
> > hardware.
> >
> > Instead, add a gating check of resctrl_arch_mon_capable() in
> > free_rmid(), and just do nothing if the hardware doesn't have
> > monitors.
> >
> > This fix mirrors the gating checks already present in
> > mkdir_rdt_prepare_rmid_alloc() and elsewhere.
> >
> > No functional change on x86.
> >
> > Fixes: 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> >
> > ---
> >
> > Based on v6.10-rc3.
> >
> > Tested on x86 (But so far for the monitors-present case.
>
> Tested by booting with "rdt=!cmt,!mbmtotal,!mbmlocal".
Thanks (I take it that's your test, not a request to be more specific
about mine?)
As it happens I tested with rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp
(though I made no effort to exercise these features other than running
the selftests). I can note this in the commit if you prefer.
>
> > Testing on Atom would be appreciated.)
> >
> > Tested on arm64 for the no-monitors case.
> > ---
> > arch/x86/kernel/cpu/resctrl/monitor.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 2345e6836593..366f496ca3ce 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -519,7 +519,8 @@ void free_rmid(u32 closid, u32 rmid)
> > * allows architectures that ignore the closid parameter to avoid an
> > * unnecessary check.
> > */
> > - if (idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
> > + if (!resctrl_arch_mon_capable() ||
> > + idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
> > RESCTRL_RESERVED_RMID))
> > return;
> >
>
> Thank you for catching this. This reveals how subtle resctrl is by
> calling free_rmid() in paths for convenience and then relying on an
> uninitialized variable for correct behavior.
>
> This replaces this subtle assumption with an actual check.
>
> With changelog comments addressed:
> | Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
>
> Reinette
Thanks for that. I'll wait for your reply before respinning this.
Cheers
---Dave
Hi Dave,
On 6/17/24 4:55 AM, Dave Martin wrote:
> Hi Reinette,
>
> On Fri, Jun 14, 2024 at 03:47:58PM -0700, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 6/14/24 9:08 AM, Dave Martin wrote:
>>> Commit 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by
>>> index") adds logic to map individual monitoring groups into a
>>> global index space used for tracking allocated RMIDs.
>>>
>>> That patch keept the logic to ignore requests to free the default
>>
>> keept -> kept
>>
>> nitpick: I actually do not know if "that patch" gets same hate as
>> "this patch" so to avoid any potential feedback about this I'd like
>> to suggest that this is rewritten without this term. Perhaps
>> something like: "Requests to free the default RMID in free_rmid()
>> are ignored, and this works fine on x86."
>>
>>> RMID in free_rmid(), and this works fine on x86.
>>>
>
> How about recasting the first paragraph into the past tense (since it
> relates a past commit), and rewording as "Requests to free the default
> RMID continued to be ignored in free_rmid(), and this works fine on
> x86."
Please keep it in the present tense. I do not see this as relating to
a "past commit" but instead it is an existing commit responsible for
current behavior. Documentation/process/maintainer-tip.rst contains
some example changelogs created by x86 maintainers that captures their
requirements. The beginning "context" portion is always in present
tense.
>
> (I agree that "this patch" would have been ambiguous. "That patch" was
> an attempt to be clearer, but felt a bit clumsy. Naming the commit
> again felt worse, though would have been clearer. I've noticed that
> people who do not have English as their first language tend to use
> "this" and "that" a little differently from native English speakers, so
> there is probably more scope for confusion here that I like to
> assume...)
>
>>> With arm64 MPAM, there is a latent bug here: on platforms with no
>>> monitors exposed through resctrl, each control group still gets a
>>> different monitoring group ID as seen by the hardware, since the
>>> CLOSID always forms part of the monitoring group ID. This means
>>> that when removing a control group, the code may try to free this
>>> group's default monitoring group RMID for real. If there are no
>>> monitors however, the RMID tracking table rmid_ptrs[] would be a
>>> waste of memory and is never allocated, leading to a splat when a
>>> free_rmid() tries to dereference the table.
>>>
>>> One option would be to treat RMID 0 as special for every CLOSID,
>>> but this would be ugly since we still want to do bookkeeping for
>>> these monitoring group IDs when there are monitors present in the
>>> hardware.
>>>
>>> Instead, add a gating check of resctrl_arch_mon_capable() in
>>> free_rmid(), and just do nothing if the hardware doesn't have
>>> monitors.
>>>
>>> This fix mirrors the gating checks already present in
>>> mkdir_rdt_prepare_rmid_alloc() and elsewhere.
>>>
>>> No functional change on x86.
>>>
>>> Fixes: 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by index")
>>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>>>
>>> ---
>>>
>>> Based on v6.10-rc3.
>>>
>>> Tested on x86 (But so far for the monitors-present case.
>>
>> Tested by booting with "rdt=!cmt,!mbmtotal,!mbmlocal".
>
> Thanks (I take it that's your test, not a request to be more specific
> about mine?)
Yes, I did test it with those parameters. You are also welcome to
add:
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
>
> As it happens I tested with rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp
> (though I made no effort to exercise these features other than running
> the selftests). I can note this in the commit if you prefer.
hmmm ... those parameters should not be necessary unless the system
has those features forced off by default because of errata. Doing
functional testing on these systems via such enabling is fine
though.
Reinette
Hi,
On Mon, Jun 17, 2024 at 08:53:38AM -0700, Reinette Chatre wrote:
> Hi Dave,
>
> On 6/17/24 4:55 AM, Dave Martin wrote:
> > Hi Reinette,
> >
> > On Fri, Jun 14, 2024 at 03:47:58PM -0700, Reinette Chatre wrote:
> > > Hi Dave,
> > >
> > > On 6/14/24 9:08 AM, Dave Martin wrote:
> > > > Commit 6791e0ea3071 ("x86/resctrl: Access per-rmid structures by
> > > > index") adds logic to map individual monitoring groups into a
> > > > global index space used for tracking allocated RMIDs.
> > > >
> > > > That patch keept the logic to ignore requests to free the default
> > >
> > > keept -> kept
> > >
> > > nitpick: I actually do not know if "that patch" gets same hate as
> > > "this patch" so to avoid any potential feedback about this I'd like
> > > to suggest that this is rewritten without this term. Perhaps
> > > something like: "Requests to free the default RMID in free_rmid()
> > > are ignored, and this works fine on x86."
> > >
> > > > RMID in free_rmid(), and this works fine on x86.
> > > >
> >
> > How about recasting the first paragraph into the past tense (since it
> > relates a past commit), and rewording as "Requests to free the default
> > RMID continued to be ignored in free_rmid(), and this works fine on
> > x86."
>
> Please keep it in the present tense. I do not see this as relating to
> a "past commit" but instead it is an existing commit responsible for
> current behavior. Documentation/process/maintainer-tip.rst contains
> some example changelogs created by x86 maintainers that captures their
> requirements. The beginning "context" portion is always in present
> tense.
Fair enough. I'll reword this as per your suggestion.
[...]
> > > > Tested on x86 (But so far for the monitors-present case.
> > >
> > > Tested by booting with "rdt=!cmt,!mbmtotal,!mbmlocal".
> >
> > Thanks (I take it that's your test, not a request to be more specific
> > about mine?)
>
> Yes, I did test it with those parameters. You are also welcome to
> add:
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Ah, right. Thanks.
> > As it happens I tested with rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp
> > (though I made no effort to exercise these features other than running
> > the selftests). I can note this in the commit if you prefer.
>
> hmmm ... those parameters should not be necessary unless the system
> has those features forced off by default because of errata. Doing
> functional testing on these systems via such enabling is fine
> though.
>
> Reinette
I got these kernel args from James. Apparently some of these features
are defaulted to off by the firmware on the box I'm using, but we're not
sure why. I haven't dug into it.
Cheers
---Dave
Hi Dave, On 6/18/24 2:30 AM, Dave Martin wrote: > On Mon, Jun 17, 2024 at 08:53:38AM -0700, Reinette Chatre wrote: >> On 6/17/24 4:55 AM, Dave Martin wrote: >>> As it happens I tested with rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp >>> (though I made no effort to exercise these features other than running >>> the selftests). I can note this in the commit if you prefer. >> >> hmmm ... those parameters should not be necessary unless the system >> has those features forced off by default because of errata. Doing >> functional testing on these systems via such enabling is fine >> though. >> >> Reinette > > I got these kernel args from James. Apparently some of these features > are defaulted to off by the firmware on the box I'm using, but we're not > sure why. I haven't dug into it. resctrl does not provide capability to override features that the hardware does not advertise as supporting. resctrl does sometimes force hardware advertised features off because of errata (for example, see [1]) and the kernel "rdt=" parameter can be used to override that. This same option also enables administrators to ignore some features on their platforms. Reinette [1] https://lore.kernel.org/all/3aea0a3bae219062c812668bd9b7b8f1a25003ba.1503512900.git.tony.luck@intel.com/
Hi Reinette, On Tue, Jun 18, 2024 at 08:43:51AM -0700, Reinette Chatre wrote: > Hi Dave, > > On 6/18/24 2:30 AM, Dave Martin wrote: > > On Mon, Jun 17, 2024 at 08:53:38AM -0700, Reinette Chatre wrote: > > > On 6/17/24 4:55 AM, Dave Martin wrote: > > > > > As it happens I tested with rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp > > > > (though I made no effort to exercise these features other than running > > > > the selftests). I can note this in the commit if you prefer. > > > > > > hmmm ... those parameters should not be necessary unless the system > > > has those features forced off by default because of errata. Doing > > > functional testing on these systems via such enabling is fine > > > though. > > > > > > Reinette > > > > I got these kernel args from James. Apparently some of these features > > are defaulted to off by the firmware on the box I'm using, but we're not > > sure why. I haven't dug into it. > > resctrl does not provide capability to override features that the hardware > does not advertise as supporting. resctrl does sometimes force hardware > advertised features off because of errata (for example, see [1]) and the > kernel "rdt=" parameter can be used to override that. This same option also > enables administrators to ignore some features on their platforms. > > Reinette > > [1] https://lore.kernel.org/all/3aea0a3bae219062c812668bd9b7b8f1a25003ba.1503512900.git.tony.luck@intel.com/ Possibly the box in question falls under this case. I'll need to check with James and/or experiment. Either way, I don't think this impacts the tested-ness of this patch, but please shout if you have concerns. Cheers ---Dave
Hi Dave, On 6/18/24 9:16 AM, Dave Martin wrote: > > Either way, I don't think this impacts the tested-ness of this patch, > but please shout if you have concerns. > I agree. The submission stated that it was tested on x86 "for the monitors-present case". When booting with "rdt=!cmt,!mbmtotal,!mbmlocal" (as I did in my test) resctrl is not able to discover monitoring support and it thus tested the "monitors _not_ present case". Reinette
Hi, On Tue, Jun 18, 2024 at 10:08:47AM -0700, Reinette Chatre wrote: > Hi Dave, > > On 6/18/24 9:16 AM, Dave Martin wrote: > > > > > Either way, I don't think this impacts the tested-ness of this patch, > > but please shout if you have concerns. > > > > I agree. The submission stated that it was tested on x86 > "for the monitors-present case". When booting with > "rdt=!cmt,!mbmtotal,!mbmlocal" (as I did in my test) resctrl > is not able to discover monitoring support and it > thus tested the "monitors _not_ present case". > > Reinette Ah, right. I missed the significance of the options. Thanks again! ---Dave > >
© 2016 - 2025 Red Hat, Inc.