[PATCH v11 0/8] x86/resctrl: mba_MBps enhancement

Tony Luck posted 8 patches 1 year ago
[PATCH v11 0/8] x86/resctrl: mba_MBps enhancement
Posted by Tony Luck 1 year ago
Background
----------

The resctrl filesystem supports a mount option that allows users to
specify a memory bandwidth limit in MiB/s for each domain of a CTRL_MON
group.  The underlying implementation uses data collected from the local
memory bandwidth monitoring event for the CTRL_MON group and all of
its MON subgroups as input to a feedback loop that adjusts the memory
bandwidth allocation control percentage up or down to keep the group
within the limit set by the user.
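
For readers unfamiliar with the feedback loop described above, here is a
minimal sketch of a single iteration for one domain. It is not the kernel's
update_mba_bw(); the function name, the percentage bounds, the step size,
and the headroom parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds and step for the throttle percentage (assumptions). */
#define THROTTLE_MIN_PCT   10u
#define THROTTLE_MAX_PCT  100u
#define THROTTLE_STEP_PCT  10u

/*
 * One iteration of a bandwidth feedback loop for a single domain.
 *
 * cur_pct:         current allocation control percentage
 * measured_mibps:  bandwidth measured over the last interval (MiB/s),
 *                  summed over the CTRL_MON group and its MON subgroups
 * limit_mibps:     user-specified limit (MiB/s)
 * headroom_mibps:  margin that must be free below the limit before the
 *                  percentage is raised, to avoid oscillating
 *
 * Returns the new percentage, clamped to [MIN, MAX].
 */
static uint32_t mba_feedback_step(uint32_t cur_pct, uint64_t measured_mibps,
				  uint64_t limit_mibps, uint64_t headroom_mibps)
{
	uint32_t room;

	assert(cur_pct >= THROTTLE_MIN_PCT && cur_pct <= THROTTLE_MAX_PCT);

	if (measured_mibps > limit_mibps && cur_pct > THROTTLE_MIN_PCT) {
		/* Over the limit: throttle harder, but not below the minimum. */
		room = cur_pct - THROTTLE_MIN_PCT;
		return cur_pct - (room < THROTTLE_STEP_PCT ? room : THROTTLE_STEP_PCT);
	}

	if (limit_mibps > measured_mibps &&
	    limit_mibps - measured_mibps > headroom_mibps &&
	    cur_pct < THROTTLE_MAX_PCT) {
		/* Comfortably under the limit: relax throttling by one step. */
		room = THROTTLE_MAX_PCT - cur_pct;
		return cur_pct + (room < THROTTLE_STEP_PCT ? room : THROTTLE_STEP_PCT);
	}

	/* Within the hysteresis band, or already at a bound: no change. */
	return cur_pct;
}
```

The choice of which monitoring event feeds measured_mibps is exactly what
this series makes configurable; the rest of the loop does not depend on it.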

Problem statement
-----------------

Hard coding the local memory bandwidth monitoring event has the following
issues:

1) Some systems may support total memory bandwidth monitoring but
not local.  The user cannot use this mount option on such systems.

2) For large workloads that span NUMA domains, using local bandwidth
monitoring will not throttle jobs correctly.

3) Users may have a mix of large and small workloads and may want to
use different input events per CTRL_MON group.

Solution
--------

A) Provide a new user interface to choose which event is used for each
   CTRL_MON group.

B) Allow systems that only support total memory bandwidth monitoring to
   use the total bandwidth event.
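
As a rough illustration of how user space might drive the interface in (A),
the sketch below reads and writes the per-group "mba_MBps_event" file (the
file name comes from the patch titles in this series). The helper names are
hypothetical, and the event strings a caller would pass (for example
"mbm_total_bytes" or "mbm_local_bytes", as listed in resctrl's
info/L3_MON/mon_features) are assumed rather than taken from this cover
letter.

```c
#include <stdio.h>
#include <string.h>

/* File name taken from the patch titles in this series. */
#define MBA_EVENT_FILE "mba_MBps_event"

/*
 * Write an event name to <group_dir>/mba_MBps_event.
 * Returns 0 on success, -1 on any error.
 */
static int mba_event_set(const char *group_dir, const char *event)
{
	char path[4096];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", group_dir, MBA_EVENT_FILE);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%s\n", event) > 0;
	/* A rejected write may only surface when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Read the currently selected event name into buf (newline stripped).
 * Returns 0 on success, -1 on any error.
 */
static int mba_event_get(const char *group_dir, char *buf, size_t len)
{
	char path[4096];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", group_dir, MBA_EVENT_FILE);
	if (n < 0 || (size_t)n >= sizeof(path) || len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

On a system with resctrl mounted using the mba_MBps option, group_dir would
be a CTRL_MON group directory such as /sys/fs/resctrl/<group>; the file is
not expected to exist without that mount option.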

Changes since v10:
Link: https://lore.kernel.org/all/20241122235832.27498-1-tony.luck@intel.com/

Patch	Change
-----	------

2	s/filesystm/filesystem/
	Reformat block comment above definition of mba_mbps_default_event using
	more of 80 column width

3	s/Update_mba_bw()/update_mba_bw()/

Added Reinette's Reviewed-by tag to parts 2-8

N.B. v6.13-rc1 introduced a locking change in commit f1be1788a32e ("block:
model freeze & enter queue as lock for supporting lockdep") that throws
a lockdep splat when mounting the resctrl filesystem (with certain
CONFIG options).

Proposed patches that fix this are here:
Link: https://lore.kernel.org/all/20241206111611.978870-2-ming.lei@redhat.com/
Link: https://lore.kernel.org/all/20241206111611.978870-3-ming.lei@redhat.com/


Babu Moger (1):
  x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags

Tony Luck (7):
  x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control
  x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
  x86/resctrl: Compute memory bandwidth for all supported events
  x86/resctrl: Make mba_sc use total bandwidth if local is not supported
  x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories
  x86/resctrl: Add write option to "mba_MBps_event" file
  x86/resctrl: Document the new "mba_MBps_event" file

 Documentation/arch/x86/resctrl.rst        | 10 +++
 include/linux/resctrl.h                   |  2 +
 arch/x86/kernel/cpu/resctrl/internal.h    |  9 ++-
 arch/x86/kernel/cpu/resctrl/core.c        |  9 ++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 70 +++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c     | 94 +++++++++++------------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 43 +++++++----
 7 files changed, 171 insertions(+), 66 deletions(-)


base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
-- 
2.47.0
Re: [PATCH v11 0/8] x86/resctrl: mba_MBps enhancement
Posted by Reinette Chatre 1 year ago
Hi Boris,

On 12/6/24 8:31 AM, Tony Luck wrote:
> Background
> ----------
> 
> The resctrl filesystem supports a mount option that allows users to
> specify a memory bandwidth limit in MiB/s for each domain of a CTRL_MON
> group.  The underlying implementation uses data collected from the local
> memory bandwidth monitoring event for the CTRL_MON group and all of
> its MON subgroups as input to a feedback loop that adjusts the memory
> bandwidth allocation control percentage up or down to keep the group
> within the limit set by the user.
> 
> Problem statement
> -----------------
> 
> Hard coding the local memory bandwidth monitoring event has the following
> issues:
> 
> 1) Some systems may support total memory bandwidth monitoring but
> not local.  The user cannot use this mount option on such systems.
> 
> 2) For large workloads that span NUMA domains, using local bandwidth
> monitoring will not throttle jobs correctly.
> 
> 3) Users may have a mix of large and small workloads and may want to
> use different input events per CTRL_MON group.
> 
> Solution
> --------
> 
> A) Provide a new user interface to choose which event is used for each
>    CTRL_MON group.
> 
> B) Allow systems that only support total memory bandwidth monitoring to
>    use the total bandwidth event.
> 
> Changes since v10:
> Link: https://lore.kernel.org/all/20241122235832.27498-1-tony.luck@intel.com/
> 
> Patch	Change
> -----	------
> 
> 2	s/filesystm/filesystem/
> 	Reformat block comment above definition of mba_mbps_default_event using
> 	more of 80 column width
> 
> 3	s/Update_mba_bw()/update_mba_bw()/
> 
> Added Reinette's Reviewed-by tag to parts 2-8


Could you please consider this series for inclusion?

> 
> N.B. v6.13-rc1 introduced a locking change in commit f1be1788a32e ("block:
> model freeze & enter queue as lock for supporting lockdep") that throws
> a lockdep splat when mounting the resctrl filesystem (with certain
> CONFIG options).
> 
> Proposed patches that fix this are here:
> Link: https://lore.kernel.org/all/20241206111611.978870-2-ming.lei@redhat.com/
> Link: https://lore.kernel.org/all/20241206111611.978870-3-ming.lei@redhat.com/
> 
> 
This fix has been merged into v6.13-rc2.

Thank you very much.

Reinette
Re: [PATCH v11 0/8] x86/resctrl: mba_MBps enhancement
Posted by Borislav Petkov 1 year ago
On Mon, Dec 09, 2024 at 09:10:20AM -0800, Reinette Chatre wrote:
> Could you please consider this series for inclusion?

Lemme take a look...

> This fix has been merged into v6.13-rc2.

You mean, I should queue the set on top of -rc2?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
RE: [PATCH v11 0/8] x86/resctrl: mba_MBps enhancement
Posted by Luck, Tony 1 year ago
> > Could you please consider this series for inclusion?
>
> Lemme take a look...
>
> > This fix has been merged into v6.13-rc2.
>
> You mean, I should queue the set ontop of -rc2?

v6.13-rc1 may give a lockdep splat when mounting /sys/fs/resctrl
(known CONFIG options to trigger this are:

	PROVE_LOCKING && SELINUX && (EXT4 || BTRFS)

but there may be others).

These resctrl patches are not directly dependent on the fixes that
went into -rc2:

   22465bbac53c blk-mq: move cpuhp callback registering out of q->sysfs_lock
   4bf485a7db5d blk-mq: register cpuhp callback after hctx is added to xarray table

They work OK on top of -rc1 (as long as you ignore the lockdep splat).

I see you already have one patch in TIP x86/cache on top of v6.13-rc1.

Your choice whether to rebase that to -rc2 and then apply my series. It
might make testing smoother.

-Tony


Re: [PATCH v11 0/8] x86/resctrl: mba_MBps enhancement
Posted by Borislav Petkov 1 year ago
On Mon, Dec 09, 2024 at 05:53:12PM +0000, Luck, Tony wrote:
> v6.13-rc1 may give a lockdep splat when mounting /sys/fs/resctrl
> (known CONFIG options to trigger this are:
> 
> 	PROVE_LOCKING && SELINUX && (EXT4 || BTRFS)
> 
> but there may be others).
> 
> These resctrl patches are not directly dependent on the fixes that
> went into -rc2:
> 
>    22465bbac53c blk-mq: move cpuhp callback registering out of q->sysfs_lock
>    4bf485a7db5d blk-mq: register cpuhp callback after hctx is added to xarray table
> 
> they work ok on top of -rc1  (as long as you ignore the lockdep splat).

Thanks for the details.

> I see you already have one patch in TIP x86/cache on top of v6.13-rc1
> 
> Your choice whether to rebase that to -rc2 and then apply my series. It
> might make testing smoother.

Yah, I can rebase. Not worth the hassle of dealing with bogus reports.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v11 0/8] x86/resctrl: mba_MBps enhancement
Posted by Moger, Babu 1 year ago

On 12/9/24 11:57, Borislav Petkov wrote:
> On Mon, Dec 09, 2024 at 05:53:12PM +0000, Luck, Tony wrote:
>> v6.13-rc1 may give a lockdep splat when mounting /sys/fs/resctrl
>> (known CONFIG options to trigger this are:
>>
>> 	PROVE_LOCKING && SELINUX && (EXT4 || BTRFS)
>>
>> but there may be others).
>>
>> These resctrl patches are not directly dependent on the fixes that
>> went into -rc2:
>>
>>    22465bbac53c blk-mq: move cpuhp callback registering out of q->sysfs_lock
>>    4bf485a7db5d blk-mq: register cpuhp callback after hctx is added to xarray table
>>
>> they work ok on top of -rc1  (as long as you ignore the lockdep splat).
> 
> Thanks for the details.
> 
>> I see you already have one patch in TIP x86/cache on top of v6.13-rc1
>>
>> Your choice whether to rebase that to -rc2 and then apply my series. It
>> might make testing smoother.
> 
> Yah, I can rebase. Not worth the hassle of dealing with bogus reports.
> 

Sanity tested on AMD system. Changes look good.

Tested-by: Babu Moger <babu.moger@amd.com>

-- 
Thanks
Babu Moger