[PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

Babu Moger posted 27 patches 7 months ago
[PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Babu Moger 7 months ago
This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC), also called the QoS RMID Pinning feature.

The series is written so that assignable-counter features from other
vendors can be supported more easily.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
plus 
https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/

These patches will clearly go in after James's resctrl FS/ARCH restructure,
so they are based on it in the hope of avoiding an extra review cycle after
the merge.

# Introduction

Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs that are no longer being tracked
are reset to zero. The MBM event counters return "Unavailable" for RMIDs
that are not tracked by hardware. So only a limited number of groups can
provide guaranteed monitoring numbers. With ever-changing configurations,
there is no way to know for certain which of these groups are being tracked
at a given point in time. Users have no option to monitor a group or set of
groups for a certain period of time without worrying about counters being
reset in between.

The ABMC feature lets the user assign a hardware counter to an (RMID, event)
pair and monitor the bandwidth for as long as the counter is assigned. The
assigned RMID is tracked by the hardware until the user unassigns it
manually, so there is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.

When ABMC is not enabled, monitoring works in the current 'default' mode,
without the assignment option.

# History

Earlier implementations of ABMC had a dependency on BMEC (Bandwidth Monitoring
Event Configuration). Peter had concerns with that implementation because
it may not be compatible with ARM's MPAM.

Here are the threads discussing the concerns and the new interface that addresses them.
https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/

Here are the finalized requirements based on the discussion:

*   Remove the ABMC feature's dependency on BMEC.

*   Eliminate global assignment listing. The interface
    /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.

*   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
    The configuration file names should be free-form, allowing users to create them as needed.

*   Perform assignment listing at the group level by introducing mbm_L3_assignments
    in each monitoring group. The listing should provide the following details:

    Event Configuration: Specifies the event configuration applied. This will be crucial
    when "mkdir" on event configuration is added in the future, leading to the creation
    of mon_data/mon_l3_*/<event configuration>.

    Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.

    Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).

*   Provide an option to enable or disable auto assignment when a new group is created.

This series tries to address all the requirements listed above.

# Implementation details

Create a generic interface to support user-space assignment of the scarce
counters used for monitoring. The first user of the interface is ABMC, with
the option to expand usage to "soft-ABMC" and MPAM counters in the future.

The feature adds the following interface files:

/sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
monitoring features supported. Square brackets enclose the feature that is
currently enabled.

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.

/sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
counters free in each domain.

/sys/fs/resctrl/info/L3_MON/counter_configs: Directory to hold the counter configurations.

/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter: Default configuration
for MBM total events.

/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter: Default configuration
for MBM local events.

/sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
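
As a rough illustration of how a script might consume two of these files, the
sketch below parses the formats shown in examples (a) and (c) further down: the
bracketed entry in mbm_assign_mode and the "<domain>=<count>" pairs in
available_mbm_cntrs. The helper names enabled_mode and domains_with_free_cntrs
are hypothetical and not part of this series, and the sample strings are copied
from the example output rather than read from a live system.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not part of the series).

# Print the enabled mode from mbm_assign_mode-style text on stdin.
# Assumes one mode per line, with the enabled one in square brackets.
enabled_mode() {
    sed -n 's/^\[\(.*\)]$/\1/p'
}

# Print the IDs of domains that have at least $1 free counters, given
# available_mbm_cntrs-style text ("<domain>=<count>;...") on stdin.
domains_with_free_cntrs() {
    tr ';' '\n' | awk -F= -v need="$1" '$2 + 0 >= need + 0 { print $1 }'
}

# Sample text taken from the example output in this cover letter.
mode=$(printf '[mbm_cntr_assign]\ndefault\n' | enabled_mode)
free=$(printf '0=30;1=30\n' | domains_with_free_cntrs 2)

echo "enabled mode: $mode"
echo "domains with room for 2 counters:" $free
```

On a mounted system the same functions would presumably be fed from
/sys/fs/resctrl/info/L3_MON/mbm_assign_mode and
/sys/fs/resctrl/info/L3_MON/available_mbm_cntrs instead of the sample strings.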

# Examples

a. Check if ABMC support is available.

	# mount -t resctrl resctrl /sys/fs/resctrl/

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	[mbm_cntr_assign]
	default

	The ABMC feature is detected and enabled.

b. Check how many ABMC counters are available. 

	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
	32

c. Check how many ABMC counters are available in each domain.

	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
	0=30;1=30

d. Check default counter configuration.

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter 
	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
        local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter 
	local_reads, local_non_temporal_writes, local_reads_slow_memory

e. The series adds a new interface file "mbm_L3_assignments" in each monitoring group
   to list and modify that group's monitoring states.

	The list is displayed in the following format:

        <Event configuration>:<Domain id>=<Assignment type>

        Event configuration: A valid event configuration listed in the
        /sys/fs/resctrl/info/L3_MON/counter_configs directory.

        Domain ID: A valid domain ID number.

        Assignment types:

        _ : No event configuration assigned

        e : Event configuration assigned in exclusive mode

	To list the default group states:
	# cat /sys/fs/resctrl/mbm_L3_assignments
	mbm_total_bytes:0=e;1=e
	mbm_local_bytes:0=e;1=e

	To unassign the configuration of mbm_total_bytes on domain 0:
	# echo "mbm_total_bytes:0=_" > mbm_L3_assignments
	# cat mbm_L3_assignments
	mbm_total_bytes:0=_;1=e
	mbm_local_bytes:0=e;1=e

	To unassign the mbm_total_bytes configuration on all domains:
	# echo "mbm_total_bytes:*=_" > mbm_L3_assignments
	# cat mbm_L3_assignments
	mbm_total_bytes:0=_;1=_
	mbm_local_bytes:0=e;1=e

	To assign the mbm_total_bytes configuration on all domains in exclusive mode:
	# echo "mbm_total_bytes:*=e" > mbm_L3_assignments
	# cat mbm_L3_assignments
	mbm_total_bytes:0=e;1=e
	mbm_local_bytes:0=e;1=e

g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
   There is no change in reading the events with ABMC. If the event is unassigned
   when reading, then the read will come back as "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
	
h. Check the default event configurations.

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
	local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
	local_reads, local_non_temporal_writes, local_reads_slow_memory

i. Change the event configuration for mbm_local_bytes.

	# echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" > \
	  /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter

	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
	local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads

	This will update the assignments where mbm_local_bytes is configured.
	
j. Now read the total event again. The first read may come back with "Unavailable"
   status. The subsequent read of mbm_total_bytes will display only the read events.
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101

k. Users will have the option to go back to 'default' mbm_assign_mode if required.
   This can be done using the following command. Note that switching the
   mbm_assign_mode will reset all the MBM counters of all resctrl groups.

	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	mbm_cntr_assign
	[default]
	
l. Unmount resctrl.

	# umount /sys/fs/resctrl/
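
The "<Event configuration>:<Domain id>=<Assignment type>" format from example (e)
is easy to consume from a script. The following is only a sketch under that
assumption: the helper name assignment_state is made up for illustration and is
not provided by this series, and the sample listing is copied from the example
output above instead of being read from a live mbm_L3_assignments file.

```shell
#!/bin/sh
# Hypothetical helper for illustration only (not part of the series).

# Print the assignment type of one event configuration in one domain.
# Usage: assignment_state <event_config> <domain_id>   (listing on stdin)
assignment_state() {
    awk -F: -v evt="$1" -v dom="$2" '
        $1 == evt {
            # $2 holds the per-domain pairs, e.g. "0=_;1=e"
            n = split($2, pairs, ";")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                if (kv[1] == dom) { print kv[2]; exit }
            }
        }'
}

# Sample listing taken from example (e) after unassigning domain 0.
listing='mbm_total_bytes:0=_;1=e
mbm_local_bytes:0=e;1=e'

printf '%s\n' "$listing" | assignment_state mbm_total_bytes 0   # prints _
printf '%s\n' "$listing" | assignment_state mbm_total_bytes 1   # prints e
```

A script could use such a check before reading an event file, since reading an
unassigned event returns "Unassigned" rather than a count.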
---
v13:
   Removed the 2 BMEC-related patches that were in the previous series.
   They were related to an optimization that can be done later.

   Patches are created on top of FS/ARCH restructure. So, major changes
   are due to FS/ARCH restructure. The files are split between
   arch/x86/kernel/cpu/resctrl/ and fs/resctrl/. So, functions
   are moved between these files accordingly.

   Added fflag RFTYPE_RES_CACHE for mbm_assign_mode, num_mbm_cntrs, available_mbm_cntrs.

   Removed the references to "mbm_assign_control".
  
   Moved resctrl_arch_config_cntr() prototype to include/linux/resctrl.h.
   Changed resctrl_arch_config_cntr() to return void instead of int to simplify a few call
   sequences.

   Added the event configuration details inside the evt_list in monitor domains.
   This avoids the need for the new structure mbm_assign_config.

   Passed evtid to functions resctrl_alloc_config_cntr() and resctrl_assign_cntr_event().
   Event configuration value can be easily obtained from mon_evt list.

   Added a new patch to pass the entire struct rdtgroup to __mon_event_count(),
   mbm_update(), and related functions. We can easily get RMID, CLOSID, etc. from rdtgroup.

   Added new function __cntr_id_read_phys() to handle ABMC event reading.

   Added a new patch to hide BMEC related files when mbm_cntr_assign mode is enabled.
  
   Added the call resctrl_init_evt_configuration() to set up the event configuration during init.

   Plus a few other commit message and user doc updates.

   Removed Reviewed-by from a few patches as the patches have changed due to the FS/ARCH restructure.

   Let me know if I missed something.

v12:
   This version is kind of RFC series with a new interface.
   
   Removed Reviewed-by tag on few patches when the patch has changed.

   Moved BMEC related patches (1 and 2) to beginning of the series.
   Removed the dependency of the ABMC feature on BMEC.

   Removed the unnecessary initialization of mon_config_info structure.
   Changed wrmsrl instead of wrmsr to address the below comment.
   https://lore.kernel.org/lkml/0fc8dbd4-07d8-40bd-8eec-402b48762807@zytor.com/

   Fixed the conflicts due to recent changes in rdt_resource data structure.
   Added new mbm_cfg_mask field to resctrl_mon.
   
   Added the code to reset arch state inside _resctrl_abmc_enable().

   Added the check CONFIG_RESCTRL_ASSIGN_FIXED to take care of arm platforms.
   This will be defined only in arm and not in x86.

   Changed the code to display the max supported monitoring counters in each domain.
   
   Fixed the struct mbm_cntr_cfg code documentation.
   Moved the struct mbm_cntr_cfg definition to resctrl/internal.h as suggested by James.

   Replaced seq_puts(s, ";") with seq_putc(s, ';');
   Added missing rdt_last_cmd_clear() in resctrl_available_mbm_cntrs_show().

   Added the check to reset the architecture-specific state only when assign is requested.

   Added evt_cfg as the parameter to resctrl_arch_config_cntr() as the user will
   be passing the event configuration from /info/L3_MON/event_configs/.

   Changed the check in resctrl_alloc_config_cntr() to reduce the indentation.
   Fixed the handling error on first failure while assigning.
   Added new parameter event configuration (evt_cfg) to get the event configuration from user space.

   Added the support for reading ABMC counters. This is a fairly involved change and affects a lot of code.

   New patch to support event configurations via new counter_configs method.

   Removed mbm_cntr_reset() as it is not required while removing the group.

   Added a new patch to handle auto assign on group creation ("mbm_assign_on_mkdir").

   Added a couple of patches to add the interface for "mbm_L3_assignments" on each mon group.

   Introduced mbm_cntr_free_all() and resctrl_reset_rmid_all() to clear counters and
   non-architectural states when monitor mode is changed.
   https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@intel.com/

   Moved the resctrl_arch_mbm_cntr_assign_set_one to domain_add_cpu_mon().

   Patches 17, 18, 19, 20, 21, 23, 24 are completely new to address the new interface requirement.

v11:
   The commit 2937f9c361f7a ("x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags")
   is already merged. Removed from the series.
   
   Resolved minor conflicts due to code displacement in latest code.
 
   Moved the monitoring related calls to monitor.c file when possible.
   Moved some of the changes from include/linux/resctrl.h to arch/x86/kernel/cpu/resctrl/internal.h
   as requested by Reinette. These changes will be moved back when arch and non-arch code is separated.
   
   Renamed rdtgroup_mbm_assign_mode_show() to resctrl_mbm_assign_mode_show().
   Renamed rdtgroup_num_mbm_cntrs_show() to resctrl_num_mbm_cntrs_show().

   Moved the mon_config_info structure definition to internal.h.
   Moved resctrl_arch_mon_event_config_get() and resctrl_arch_mon_event_config_set()
   to monitor.c file.

   Moved resctrl_arch_assign_cntr() and resctrl_abmc_config_one_amd() to monitor.c.
   Added the code to reset the arch state in resctrl_arch_assign_cntr().
   Also removed resctrl_arch_reset_rmid() inside IPI as the counters are reset from the callers.

   Renamed rdtgroup_assign_cntr_event() to resctrl_assign_cntr_event().
   Refactored the resctrl_assign_cntr_event().
   Added functionality to exit on the first error during assignment.
   Simplified mbm_cntr_free().
   Removed the function mbm_cntr_assigned(). Will be using mbm_cntr_get() to
   figure out if the counter is assigned or not.
   
   Renamed rdtgroup_unassign_cntr_event() to resctrl_unassign_cntr_event().
   Refactored the resctrl_unassign_cntr_event().

   Moved mbm_cntr_reset() to monitor.c.
   Added code to reset non-architectural state in mbm_cntr_reset().
   Added missing rdtgroup_unassign_cntrs() calls on failure path.

   Domain can be NULL with SNC support so moved the unassign check in rdtgroup_mondata_show().

   Renamed rdtgroup_mbm_assign_mode_write() to resctrl_mbm_assign_mode_write().
   Added more details in resctrl.rst about mbm_cntr_assign mode.
   Re-arranged the text in resctrl.rst file in section mbm_cntr_assign.

   Moved resctrl_arch_mbm_cntr_assign_set_one() to monitor.c

   Added non-arch RMID reset in mbm_config_write_domain().
   Removed resctrl_arch_reset_rmid() call in resctrl_abmc_config_one_amd(). Not required
   as the reset of arch and non-arch RMID counters is done from the callers. It simplifies the IPI code.

   Fixed printing the separator after each domain while listing the group assignments.
   Renamed rdtgroup_mbm_assign_control_show to resctrl_mbm_assign_control_show().

   Fixed the static check warning with initializing dom_id in resctrl_process_flags()

   Added change log in each patch for specific changes.

v10:
   Major change is related to domain specific assignment.
   Added struct mbm_cntr_cfg inside mon domains. This will handle
   the domain specific assignments as discussed below.
   https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
   I did not see the need to add cntr_id in mbm_state structure. Not used in the code.
   Following patches take care of these changes.
   Patch 12, 13, 15, 16, 17, 18.
   
   Added __init attribute to cache_alloc_hsw_probe(). Followed function
   prototype rules (preferred order is storage class before return type).
   
   Moved the mon_config_info structure definition to resctrl.h
   
   Added call resctrl_arch_reset_rmid() to reset the RMID in the domain inside IPI call
   resctrl_abmc_config_one_amd.
   
   SMP and non-SMP call support is not required in resctrl_arch_config_cntr with new
   domain specific assign approach/data structure.
   
   Assigned the counter before exposing the event files.
   Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
   This is called both CNTR_MON and MON group creation.
   
   Call mbm_cntr_reset() when unmounted to clear all the assignments.
   
   Fixed the issue with finding the domain in multiple iterations in rdtgroup_process_flags().
   
   Printed full error message with domain information when assign fails.
   
   Taken care of other text comments in all the patches. Patch specific changes are in each patch.
   
   If I missed something, please point it out; it is not intentional.

v9:
   Patch 14 is a new addition. 
   Major change in patch 24.
   Moved the fix patch to address __init attribute to the beginning of the series.
   Fixed all the call sequences. Added additional Fixes tags.

   Added Reviewed-by where applicable.

   Took care of couple of minor merge conflicts with latest code.
   Re-ordered the MSR in couple of instances.
   Added available_mbm_cntrs (patch 14) to print the number of counter in a domain.

   Used MBM_EVENT_ARRAY_INDEX macro to get the event index.
   Introduced rdtgroup_cntr_id_init() to initialize the cntr_id

   Introduced new function resctrl_config_cntr to assign the counter, update
   the bitmap and reset the architectural state.
   Taken care of error handling(freeing the counter) when assignment fails.
  
   Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
   Updated couple of rdtgroup_unassign_cntrs() calls properly.

   Fixed problem changing the mode to mbm_cntr_assign mode when it is
   not supported. Added extra checks to detect if systems supports it.
   
   https://lore.kernel.org/lkml/03b278b5-6c15-4d09-9ab7-3317e84a409e@intel.com/
   As discussed in the above comment, introduced resctrl_mon_event_config_set to
   handle IPI. But sending another IPI inside IPI causes problem. Kernel
   reports SMP warning. So, introduced resctrl_arch_update_cntr() to send the
   command directly.

   Fixed handling of the special cases '//0=' and '//'.
   Removed extra strstr() call in rdtgroup_mbm_assign_control_write().
   Added generic failure text when assignment operation fails.
   Corrected user documentation format texts.

v8:
  Patches are getting into final stages. 
  Couple of changes Patch 8, Patch 19 and Patch 23.
  Most of the other changes are related to rename and text message updates.

  Details are in each patch. Here is the summary.

  Added __init attribute to dom_data_init() in patch 8/25.
  Moved the mbm_cntrs_init() and mbm_cntrs_exit() functionality inside
  dom_data_init() and dom_data_exit() respectively.

  Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
  Renamed resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
          resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().

  Rename resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
  Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
  Added the code to return the error if rdtgroup_assign_cntr_event fails.
  Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
  Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
  Added return error handling in resctrl_arch_config_cntr().
  Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
  Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
  Fixed the problem with unassigning the child MON groups of CTRL_MON group.
  Reset the internal counters after mbm_cntr_assign mode is changed.
  Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
  Renamed resctrl_arch_mbm_cntr_assign_configure to
            resctrl_arch_mbm_cntr_assign_set_one.

  Used the same IPI as event update to modify the assignment.
  Could not do the way we discussed in the thread.
  https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
  Needed to figure out event type to update the configuration.

  Moved unassign first and assign during the assign modification.
  Assign none "_" takes priority. Cannot be mixed with other flags.
  Updated the documentation and .rst file format. htmldoc looks ok.

v7:
   Major changes are related to FS and arch code separation.
   Changed a few interface names based on feedback.
   Here is the summary; each patch lists the changes specific to that patch.

   Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the bitmap.
   WARN_ON is not required anymore.
 
   Renamed the function resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

   Merged resctrl_arch_mbm_cntr_assign_enable() and resctrl_arch_mbm_cntr_assign_disable()
   and renamed to resctrl_arch_mbm_cntr_assign_set(). Passed the struct rdt_resource
   to these functions.

   Removed resctrl_arch_reset_rmid_all() from arch code. This will be done from the FS caller.

   Updated the descriptions/commit log in resctrl.rst to generic text. Removed ABMC references.
   Renamed mbm_mode to mbm_assign_mode.
   Renamed mbm_control to  mbm_assign_control.
   Introduced mutex lock in rdtgroup_mbm_mode_show().
 
   The 'legacy' mode is called 'default' mode. 

   Removed the static allocation and now allocating bitmap mbm_cntr_free_map dynamically.

   Merged rdtgroup_assign_cntr(), rdtgroup_alloc_cntr() into one.
   Merged rdtgroup_unassign_cntr(), rdtgroup_free_cntr() into one.
   
  Added struct rdt_resource to the interface functions resctrl_arch_assign_cntr ()
  and resctrl_arch_unassign_cntr().
  Rename rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
   
  Added a new patch to fix counter assignment on event config changes.

  Removed the references of ABMC from user interfaces.

  Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
  Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.

  Thomas Gleixner asked us to update https://gitlab.com/x86-cpuid.org/x86-cpuid-db.
  It needs internal approval. We are working on it.

v6:
  We still need to finalize few interface details on mbm_assign_mode and mbm_assign_control
  in case of ABMC and Soft-ABMC. We can continue the discussion with this series.

  Added support for domain-id '*' to update all the domains at once.
  Fixed assign interface to allocate the counter if counter is
  not assigned.   
  Fixed unassign interface to free the counter if the counter is not
  assigned in any of the domains.

  Renamed abmc_capable to mbm_cntr_assignable.

  Renamed abmc_enabled to mbm_cntr_assign_enabled.
  Used msr_set_bit and msr_clear_bit for msr updates.
  Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
  Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().

  Changed the display name from num_cntrs to num_mbm_cntrs.

  Removed the variable mbm_cntrs_free_map_len. This is not required.
  Removed the call mbm_cntrs_init() in arch code. This needs to be done at higher level.
  Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
  Removed unused config value definitions.

  Introduced mbm_cntr_map to track counters at domain level. With this
  we don't need to send an MSR read to read the counter configuration.

  Separated all the counter id management to upper level in FS code.

  Added checks to detect "Unassigned" before reading the RMID.

  More details in each patch.

v5:
  Rebase changes (because of SNC support)

  Interface changes.
   /sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_assign_mode.
   /sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_assign_control.

  Added few arch specific routines.
  resctrl_arch_get_abmc_enabled.
  resctrl_arch_abmc_enable.
  resctrl_arch_abmc_disable.

  Few renames
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config

  Introduced resctrl_arch_event_config_get and
    resctrl_arch_event_config_set() to update event configuration.

  Removed mon_state field mongroup. Added MON_CNTR_UNSET to initialize counters.

  Renamed ctr_id to cntr_id for the hardware counter.
 
  Report "Unassigned" in case the user attempts to read the events without assigning the counter.
  
  ABMC is enabled during the boot up. Can be enabled or disabled later.

  Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.

  Addressed all the comments as far as I know. If I missed something, it is not intentional.

v4: 
  Main change is domain specific event assignment.
  Kept the ABMC feature as a default.
  Dynamic switching between ABMC and mbm_legacy is still allowed.
  We are still not clear about mount option.
  Moved the monitoring related data in resctrl_mon structure from rdt_resource.
  Fixed the display of legacy and ABMC mode.
  Used bitmap APIs when possible.
  Removed event configuration read from MSRs. We can use the
  internal saved data (patch 12).
  Added more comments about L3_QOS_ABMC_CFG MSR.
  Added IPIs to read the assignment status for each domain (patch 18 and 19)
  More details in each patch.

v3:
   This series adds the support for global assignment mode discussed in
   the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switch between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment mode and modifying the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedback discussed. If I missed
      something then it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. Major change is the way ABMC is enabled. Earlier, user needed to remount
      with -o abmc to enable ABMC feature. Removed that option now.
      Now users can enable ABMC by "$echo 1 to /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

--------------------------------------------------------------------------------------

Previous revisions:
v12: https://lore.kernel.org/lkml/cover.1743725907.git.babu.moger@amd.com/
v11: https://lore.kernel.org/lkml/cover.1737577229.git.babu.moger@amd.com/
v10: https://lore.kernel.org/lkml/cover.1734034524.git.babu.moger@amd.com/
v9: https://lore.kernel.org/lkml/cover.1730244116.git.babu.moger@amd.com/
v8: https://lore.kernel.org/lkml/cover.1728495588.git.babu.moger@amd.com/
v7: https://lore.kernel.org/lkml/cover.1725488488.git.babu.moger@amd.com/
v6: https://lore.kernel.org/lkml/cover.1722981659.git.babu.moger@amd.com/
v5: https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/
v4: https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/
v3: https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/  
v2: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
v1: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

Babu Moger (27):
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at
    domain
  x86/resctrl: Introduce interface to display number of free MBM
    counters
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter
    with ABMC
  x86/resctrl: Introduce event configuration modes
  x86/resctrl: Add the functionality to assign MBM events
  x86/resctrl: Add the functionality to unassign MBM events
  x86/resctrl: Report 'Unassigned' for MBM events in mbm_cntr_assign
    mode
  x86/resctrl: Pass entire struct rdtgroup rather than passing
    individual members
  x86/resctrl: Add the support for reading ABMC counters
  x86/resctrl: Add definitions for MBM event configuration
  x86/resctrl: Add event configuration directory under info/L3_MON/
  x86/resctrl: Provide interface to update the event configurations
  x86/resctrl: Introduce mbm_assign_on_mkdir to configure assignments
  x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is
    enabled
  x86/resctrl: Introduce mbm_L3_assignments to list assignments in a
    group
  x86/resctrl: Introduce the interface to modify assignments in a group
  x86/resctrl: Hide the BMEC related files when mbm_cnt_assign is
    enabled
  x86/resctrl: Introduce the interface to switch between monitor modes
  x86/resctrl: Configure mbm_cntr_assign mode if supported

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/filesystems/resctrl.rst         | 188 +++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  13 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  47 ++
 arch/x86/kernel/cpu/resctrl/monitor.c         | 176 +++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 fs/resctrl/ctrlmondata.c                      |  14 +
 fs/resctrl/internal.h                         |  37 +-
 fs/resctrl/monitor.c                          | 309 ++++++-
 fs/resctrl/rdtgroup.c                         | 768 +++++++++++++++++-
 include/linux/resctrl.h                       |  74 +-
 include/linux/resctrl_types.h                 |  11 +
 15 files changed, 1577 insertions(+), 68 deletions(-)

-- 
2.34.1
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 6 months, 4 weeks ago
Hi Babu,

On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
>
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
> plus
> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>
> It is very clear these patches will go after James's resctrl FS/ARCH
> restructure. Hoping to avoid one review cycle due to the merge.
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
> to monitor a group or set of groups for certain period of time without
> worrying about counter being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
>
> # History
>
> Earlier implementation of ABMC had a dependency on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may not be compatible with ARM's MPAM.
>
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>
> Here are the finalized requirements based on the discussion:
>
> *   Remove BMEC dependency on the ABMC feature.
>
> *   Eliminate global assignment listing. The interface
>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>
> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>     The configuration file names should be free-form, allowing users to create them as needed.
>
> *   Perform assignment listing at the group level by introducing mbm_L3_assignments
>     in each monitoring group. The listing should provide the following details:
>
>     Event Configuration: Specifies the event configuration applied. This will be crucial
>     when "mkdir" on event configuration is added in the future, leading to the creation
>     of mon_data/mon_l3_*/<event configuration>.
>
>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>
>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>
> *   Provide option to enable or disable auto assignment when new group is created.

So far I was able to reenable MBM on AMD implementations (for some
users) while deferring on the counter assignment interface discussion
by just making shared assignment the default for newly-created groups.
Until they want to upgrade assignments to exclusive or break down
traffic with multiple counters to watch a particular group more
closely, they won't need to change any assignments.

Just pointing out that this turned out to be a useful first step in
deploying ABMC support.

>
> This series tries to address all the requirements listed above.
>
> # Implementation details
>
> Create a generic interface aimed to support user space assignment of scarce
> counters used for monitoring. First usage of interface is by ABMC with option
> to expand usage to "soft-ABMC" and MPAM counters in future.

I'll try to identify any issues I've encountered with "soft-ABMC".
Hopefully I'll be able to share a sample implementation based on these
patches soon.

There's now more interest at Google for allowing explicit control of
where RMIDs are assigned on Intel platforms. Even though the number of
RMIDs implemented by hardware tends to be roughly the number of
containers they want to support, they often still need to create
containers when all RMIDs have already been allocated, which is not
currently allowed. Once the container has been created and starts
running, it's no longer possible to move its threads into a monitoring
group whenever RMIDs should become available again, so it's important
for resctrl to maintain an accurate task list for a container even
when RMIDs are not available.

>
> Feature adds following interface files:
>
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
>
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.

Earlier I discussed with Reinette[1] what num_mbm_cntrs should
represent in a "soft-ABMC" implementation where assignment is
implemented by assigning an RMID, which would result in all events
being assigned at once.

My main concern is how many "counters" you can assign by assigning
RMIDs. I recall Reinette proposed reporting the number of groups which
can be assigned separately from counters which can be assigned.

>
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
> counters free in each domain.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
> for MBM total events.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
> for MBM local events.

IIUC, this needs to be implemented now so you can drop BMEC with this series?

>
> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
>
> # Examples
>
> a. Check if ABMC support is available
>         # mount -t resctrl resctrl /sys/fs/resctrl/
>
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>         [mbm_cntr_assign]
>         default
>
>         ABMC feature is detected and it is enabled.
>
> b. Check how many ABMC counters are available.
>
>         # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>         32
>
> c. Check how many ABMC counters are available in each domain.
>
>         # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>         0=30;1=30
>
> d. Check default counter configuration.
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>         local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>    to list and modify any group's monitoring states.

To confirm, would we have "mbm_<resource_name>_assignments" for each
resource where MBM-ish events could be assigned?

>
>         The list is displayed in the following format:
>
>         <Event configuration>:<Domain id>=<Assignment type>

For soft-ABMC assignment, is there just a single event configuration
representing all the events tracked by the RMID?

>
>         Event configuration: A valid event configuration listed in the
>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>
>         Domain ID: A valid domain ID number.
>
>         Assignment types:
>
>         _ : No event configuration assigned
>
>         e : Event configuration assigned in exclusive mode
>
>         To list the default group states:
>         # cat /sys/fs/resctrl/mbm_L3_assignments
>         mbm_total_bytes:0=e;1=e
>         mbm_local_bytes:0=e;1=e
>
>         To unassign the configuration of mbm_total_bytes on domain 0:
>         # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=_;1=e
>         mbm_local_bytes:0=e;1=e
>
>         To unassign the mbm_total_bytes configuration on all domains:
>         # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=_;1=_
>         mbm_local_bytes:0=e;1=e
>
>         To assign the mbm_total_bytes configuration on all domains in exclusive mode:
>         # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=e;1=e
>         mbm_local_bytes:0=e;1=e
>
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the events with ABMC. If the event is unassigned
>    when reading, then the read will come back as "Unassigned".
>
>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>         779247936
>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>         765207488
>
> h. Check the default event configurations.
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>         local_reads, local_non_temporal_writes, local_reads_slow_memory

These look like the BMEC event names converted from camel case. Will
event filter programming be portable?

Thanks,
-Peter


[1] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months, 3 weeks ago
Hi Peter,

Thanks for trying the series.

On 5/19/25 10:59, Peter Newman wrote:
> Hi Babu,
> 
> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
>> plus
>> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>>
>> It is very clear these patches will go after James's resctrl FS/ARCH
>> restructure. Hoping to avoid one review cycle due to the merge.
>>
>> # Introduction
>>
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware.
>> The counters of any other RMIDs which are no longer being tracked will be
>> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
>> that are not tracked by hardware. So, there can be only limited number of
>> groups that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups
>> are being tracked for certain point of time. Users do not have the option
>> to monitor a group or set of groups for certain period of time without
>> worrying about counter being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned.  The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask identifying
>> the specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current 'default' mode without
>> assignment option.
>>
>> # History
>>
>> Earlier implementation of ABMC had a dependency on BMEC (Bandwidth Monitoring
>> Event Configuration). Peter had concerns with that implementation because
>> it may not be compatible with ARM's MPAM.
>>
>> Here are the threads discussing the concerns and new interface to address the concerns.
>> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>> Here are the finalized requirements based on the discussion:
>>
>> *   Remove BMEC dependency on the ABMC feature.
>>
>> *   Eliminate global assignment listing. The interface
>>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>>
>> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>     The configuration file names should be free-form, allowing users to create them as needed.
>>
>> *   Perform assignment listing at the group level by introducing mbm_L3_assignments
>>     in each monitoring group. The listing should provide the following details:
>>
>>     Event Configuration: Specifies the event configuration applied. This will be crucial
>>     when "mkdir" on event configuration is added in the future, leading to the creation
>>     of mon_data/mon_l3_*/<event configuration>.
>>
>>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>>
>>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>>
>> *   Provide option to enable or disable auto assignment when new group is created.
> 
> So far I was able to reenable MBM on AMD implementations (for some
> users) while deferring on the counter assignment interface discussion
> by just making shared assignment the default for newly-created groups.
> Until they want to upgrade assignments to exclusive or break down
> traffic with multiple counters to watch a particular group more
> closely, they won't need to change any assignments.
> 
> Just pointing out that this turned out to be a useful first step in
> deploying ABMC support.

Thank you.

> 
>>
>> This series tries to address all the requirements listed above.
>>
>> # Implementation details
>>
>> Create a generic interface aimed to support user space assignment of scarce
>> counters used for monitoring. First usage of interface is by ABMC with option
>> to expand usage to "soft-ABMC" and MPAM counters in future.
> 
> I'll try to identify any issues I've encountered with "soft-ABMC".
> Hopefully I'll be able to share a sample implementation based on these
> patches soon.

That would be wonderful.

> 
> There's now more interest in Google for allowing explicit control of
> where RMIDs are assigned on Intel platforms. Even though the number of
> RMIDs implemented by hardware tends to be roughly the number of
> containers they want to support, they often still need to create
> containers when all RMIDs have already been allocated, which is not
> currently allowed. Once the container has been created and starts
> running, it's no longer possible to move its threads into a monitoring
> group whenever RMIDs should become available again, so it's important
> for resctrl to maintain an accurate task list for a container even
> when RMIDs are not available.
> 
>>
>> Feature adds following interface files:
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
>> monitoring features supported. The enclosed brackets indicate which
>> feature is enabled.
>>
>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>> counters available for assignment.
> 
> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> represent in a "soft-ABMC" implementation where assignment is
> implemented by assigning an RMID, which would result in all events
> being assigned at once.
> 
> My main concern is how many "counters" you can assign by assigning
> RMIDs. I recall Reinette proposed reporting the number of groups which
> can be assigned separately from counters which can be assigned.

More context may be needed here. Currently, num_mbm_cntrs indicates the
number of counters available per domain, which is 32.

At the moment, we can assign 2 counters to each group, meaning each RMID
can be associated with 2 hardware counters. In theory, it's possible to
assign all 32 hardware counters to a group—allowing one RMID to be linked
with up to 32 counters. However, we currently lack the interface to
support that level of assignment.

For now, the plan is to support basic assignment and expand functionality
later once we have the necessary data structure and requirements.
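
As a rough sketch of that arithmetic (not part of the series): assuming
each group consumes two counters per domain, one for mbm_total_bytes and
one for mbm_local_bytes, the number of additional groups that can still be
fully assigned in each domain follows from the available_mbm_cntrs
listing. The function name and the hard-coded sample string are
illustrative only.

```shell
#!/bin/sh
# Hypothetical helper, not part of the series: given a string in the
# available_mbm_cntrs format ("<domain>=<free counters>;..."), print how
# many more groups could be assigned in each domain if every group takes
# $2 counters (2 here: one for total, one for local bandwidth).
groups_per_domain() {
    list=$1
    per_group=$2
    out=""
    old_ifs=$IFS
    IFS=';'
    # With IFS set to ';', the unquoted expansion yields one entry per domain.
    for entry in $list; do
        dom=${entry%%=*}     # text before the first '='
        free=${entry#*=}     # text after the first '='
        out="${out}${out:+;}${dom}=$((free / per_group))"
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}

# Sample value taken from example (c) in the cover letter.
groups_per_domain "0=30;1=30" 2    # prints 0=15;1=15
```

On a live system the first argument would presumably come from reading
/sys/fs/resctrl/info/L3_MON/available_mbm_cntrs instead of a literal.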

> 
>>
>> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
>> counters free in each domain.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
>> for MBM total events.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
>> for MBM local events.
> 
> IIUC, this needs to be implemented now so you can drop BMEC with this series?

This series hides the configuration files (mbm_local_bytes_config and
mbm_total_bytes_config) required for BMEC when ABMC is enabled.

When the user switches back to "default" mode, BMEC becomes available
again. I believe it's a good approach to keep it this way.
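
A minimal sketch of how a script might tell the two cases apart, assuming
the mbm_assign_mode format shown in example (a), where the enabled mode is
the bracketed entry. The helper name and the sample text are illustrative
and not part of the series.

```shell
#!/bin/sh
# Hypothetical helper: print the enabled mode from text in the
# mbm_assign_mode format, where the enabled mode is enclosed in brackets.
active_assign_mode() {
    printf '%s\n' "$1" | sed -n 's/^\[\(.*\)]$/\1/p'
}

# Sample text from example (a) in the cover letter.
sample='[mbm_cntr_assign]
default'

mode=$(active_assign_mode "$sample")
if [ "$mode" = "mbm_cntr_assign" ]; then
    echo "counter assignment active; BMEC config files are expected to be hidden"
else
    echo "default mode; BMEC config files are expected to be visible"
fi
```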

> 
>>
>> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>>         # mount -t resctrl resctrl /sys/fs/resctrl/
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>         [mbm_cntr_assign]
>>         default
>>
>>         ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>         32
>>
>> c. Check how many ABMC counters are available in each domain.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>>         0=30;1=30
>>
>> d. Check default counter configuration.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>         local_reads, local_non_temporal_writes, local_reads_slow_memory
>>
>> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>>    to list and modify any group's monitoring states.
> 
> To confirm, would we have "mbm_<resource_name>_assignments" for each
> resource where MBM-ish events could be assigned?

This is a group-level property—it resides within each group and is not
related to any specific resource.
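
For reference, a line of the per-group mbm_L3_assignments listing quoted
above can be split into its event configuration and per-domain states
roughly as follows. This is only a sketch for reading the listing format
from the cover letter; the helper name is made up, and it is not meant for
write strings that use the "*" wildcard.

```shell
#!/bin/sh
# Hypothetical helper: expand one line in the mbm_L3_assignments listing
# format ("<event configuration>:<domain>=<type>;...") into one
# "<event> <domain> <type>" record per domain.
list_assignments() {
    line=$1
    event=${line%%:*}      # event configuration name before the ':'
    states=${line#*:}      # "<domain>=<type>" entries separated by ';'
    old_ifs=$IFS
    IFS=';'
    for entry in $states; do
        printf '%s %s %s\n' "$event" "${entry%%=*}" "${entry#*=}"
    done
    IFS=$old_ifs
}

# Sample line from the cover letter after unassigning domain 0.
list_assignments "mbm_total_bytes:0=_;1=e"
```

Each output record carries the event configuration, the domain ID and the
assignment type ("_" for unassigned, "e" for exclusive).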

> 
>>
>>         The list is displayed in the following format:
>>
>>         <Event configuration>:<Domain id>=<Assignment type>
> 
> For soft-ABMC assignment, is there just a single event configuration
> representing all the events tracked by the RMID?


I’m not sure about the details of how soft-ABMC will be supported. It’s
not available at the moment, but I believe it can be added once soft-ABMC
support is in place.

> 
>>
>>         Event configuration: A valid event configuration listed in the
>>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>>
>>         Domain ID: A valid domain ID number.
>>
>>         Assignment types:
>>
>>         _ : No event configuration assigned
>>
>>         e : Event configuration assigned in exclusive mode
>>
>>         To list the default group states:
>>         # cat /sys/fs/resctrl/mbm_L3_assignments
>>         mbm_total_bytes:0=e;1=e
>>         mbm_local_bytes:0=e;1=e
>>
>>         To unassign the configuration of mbm_total_bytes on domain 0:
>>         # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>>         # cat mbm_L3_assignments
>>         mbm_total_bytes:0=_;1=e
>>         mbm_local_bytes:0=e;1=e
>>
>>         To unassign the mbm_total_bytes configuration on all domains:
>>         # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>>         # cat mbm_L3_assignments
>>         mbm_total_bytes:0=_;1=_
>>         mbm_local_bytes:0=e;1=e
>>
>>         To assign the mbm_total_bytes configuration on all domains in exclusive mode:
>>         # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
>>         # cat mbm_L3_assignments
>>         mbm_total_bytes:0=e;1=e
>>         mbm_local_bytes:0=e;1=e
>>
>> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>>    There is no change in reading the events with ABMC. If the event is unassigned
>>    when reading, then the read will come back as "Unassigned".
>>
>>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>         779247936
>>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>         765207488
>>
>> h. Check the default event configurations.
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>         local_reads, local_non_temporal_writes, local_reads_slow_memory
> 
> These look like the BMEC event names converted from camel case. Will
> event filter programming be portable?


Yes, that’s correct. The event types (reads, writes, etc.) supported by
both BMEC and ABMC are the same, so I’ve used generalized names here.

As for portability, I can’t comment, since I’m not familiar with how event
configuration is handled in MPAM or other architectures.

> 
> Thanks,
> -Peter
> 
> 
> [1] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/

-- 
Thanks
Babu Moger
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Babu,

On 5/20/25 8:28 AM, Moger, Babu wrote:
> On 5/19/25 10:59, Peter Newman wrote:
>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:

...

>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>> counters available for assignment.
>>
>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>> represent in a "soft-ABMC" implementation where assignment is
>> implemented by assigning an RMID, which would result in all events
>> being assigned at once.
>>
>> My main concern is how many "counters" you can assign by assigning
>> RMIDs. I recall Reinette proposed reporting the number of groups which
>> can be assigned separately from counters which can be assigned.
> 
> More context may be needed here. Currently, num_mbm_cntrs indicates the
> number of counters available per domain, which is 32.
> 
> At the moment, we can assign 2 counters to each group, meaning each RMID
> can be associated with 2 hardware counters. In theory, it's possible to
> assign all 32 hardware counters to a group—allowing one RMID to be linked
> with up to 32 counters. However, we currently lack the interface to
> support that level of assignment.
> 
> For now, the plan is to support basic assignment and expand functionality
> later once we have the necessary data structure and requirements.

Looks like some requirements did not make it into this implementation.
Do you recall the discussion that resulted in you writing [2]? Looks like
there is a question to Peter in there on how to determine how many "counters"
are available in soft-ABMC. I interpreted [3] at that time to mean that this
information would be available in a future AMD publication.

Reinette

[2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/ 
[3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months, 3 weeks ago
Hi Reinette,

On 5/20/25 11:06, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/20/25 8:28 AM, Moger, Babu wrote:
>> On 5/19/25 10:59, Peter Newman wrote:
>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> 
> ...
> 
>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>> counters available for assignment.
>>>
>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>> represent in a "soft-ABMC" implementation where assignment is
>>> implemented by assigning an RMID, which would result in all events
>>> being assigned at once.
>>>
>>> My main concern is how many "counters" you can assign by assigning
>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>> can be assigned separately from counters which can be assigned.
>>
>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>> number of counters available per domain, which is 32.
>>
>> At the moment, we can assign 2 counters to each group, meaning each RMID
>> can be associated with 2 hardware counters. In theory, it's possible to
>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>> with up to 32 counters. However, we currently lack the interface to
>> support that level of assignment.
>>
>> For now, the plan is to support basic assignment and expand functionality
>> later once we have the necessary data structure and requirements.
> 
> Looks like some requirements did not make it into this implementation.
> Do you recall the discussion that resulted in you writing [2]? Looks like
> there is a question to Peter in there on how to determine how many "counters"
> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> information would be available in a future AMD publication.

We already have a method to determine the number of counters in soft-ABMC
mode, which Peter has addressed [4].

[4]
https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/

This appears to be more of a workaround, and I doubt it will be included
in any official AMD documentation. Additionally, the long-term direction
is moving towards ABMC.

I don’t believe this workaround needs to be part of the current series. It
can be added later when soft-ABMC is implemented.

> 
> Reinette
> 
> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/ 
> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/

-- 
Thanks
Babu Moger
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Babu,

On 5/20/25 10:51 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/20/25 11:06, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>> On 5/19/25 10:59, Peter Newman wrote:
>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>
>> ...
>>
>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>> counters available for assignment.
>>>>
>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>> represent in a "soft-ABMC" implementation where assignment is
>>>> implemented by assigning an RMID, which would result in all events
>>>> being assigned at once.
>>>>
>>>> My main concern is how many "counters" you can assign by assigning
>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>> can be assigned separately from counters which can be assigned.
>>>
>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>> number of counters available per domain, which is 32.
>>>
>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>> can be associated with 2 hardware counters. In theory, it's possible to
>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>> with up to 32 counters. However, we currently lack the interface to
>>> support that level of assignment.
>>>
>>> For now, the plan is to support basic assignment and expand functionality
>>> later once we have the necessary data structure and requirements.
>>
>> Looks like some requirements did not make it into this implementation.
>> Do you recall the discussion that resulted in you writing [2]? Looks like
>> there is a question to Peter in there on how to determine how many "counters"
>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>> information would be available in a future AMD publication.
> 
> We already have a method to determine the number of counters in soft-ABMC
> mode, which Peter has addressed [4].
> 
> [4]
> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> 
> This appears to be more of a workaround, and I doubt it will be included
> in any official AMD documentation. Additionally, the long-term direction
> is moving towards ABMC.
> 
> I don’t believe this workaround needs to be part of the current series. It
> can be added later when soft-ABMC is implemented.

Agreed. What about the plans described in [2]? (Thanks to Peter for
catching this!).

It is important to keep track of requirements while working on a feature to
ensure that the implementation supports the planned use cases. Re-reading that
thread, it is not clear to me how soft-ABMC's per-group assignment would look.
Could you please share how you see it progressing from this implementation,
including single-event vs. multiple-event assignment? To be clear, this is not
a request to support soft-ABMC in this implementation, but there needs to be a
plan for how it can be supported on top of the interfaces established by this
work.

Reinette

> 
>>
>> Reinette
>>
>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/ 
>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
> 

Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months, 3 weeks ago
Hi Reinette,

On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/20/25 10:51 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/20/25 11:06, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>
>>> ...
>>>
>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>> counters available for assignment.
>>>>>
>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>> implemented by assigning an RMID, which would result in all events
>>>>> being assigned at once.
>>>>>
>>>>> My main concern is how many "counters" you can assign by assigning
>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>> can be assigned separately from counters which can be assigned.
>>>>
>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>> number of counters available per domain, which is 32.
>>>>
>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>> with up to 32 counters. However, we currently lack the interface to
>>>> support that level of assignment.
>>>>
>>>> For now, the plan is to support basic assignment and expand functionality
>>>> later once we have the necessary data structure and requirements.
>>>
>>> Looks like some requirements did not make it into this implementation.
>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>> there is a question to Peter in there on how to determine how many "counters"
>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>> information would be available in a future AMD publication.
>>
>> We already have a method to determine the number of counters in soft-ABMC
>> mode, which Peter has addressed [4].
>>
>> [4]
>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>
>> This appears to be more of a workaround, and I doubt it will be included
>> in any official AMD documentation. Additionally, the long-term direction
>> is moving towards ABMC.
>>
>> I don’t believe this workaround needs to be part of the current series. It
>> can be added later when soft-ABMC is implemented.
> 
> Agreed. What about the plans described in [2]? (Thanks to Peter for
> catching this!).
> 
> It is important to keep track of requirements while working on a feature to
> ensure that the implementation supports the planned use cases. Re-reading that
> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> Could you please share how you see it progress from this implementation?
> This includes the single event vs. multiple event assignment. I would like to
> highlight that this is not a request for this to be supported in this implementation
> but there needs to be a plan for how this can be supported on top of interfaces
> established by this work.
> 

Here’s my current understanding of soft-ABMC. Peter may have a more 
in-depth perspective on this.

Soft-ABMC:
a. num_mbm_cntrs: This is a software-defined limit based on the number 
of active RMIDs that can be supported. The value can be obtained using 
the code referenced in [4].

b. Assignments: No hardware configuration is required. We simply need to 
ensure that no more than num_mbm_cntrs RMIDs are active at any given time.

c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and 
mbm_local_bytes_config.

d. Events: Only two events can be assigned (local and total).

ABMC:
a. num_mbm_cntrs: This is defined by the hardware.
b. Assignments: Requires special MSR writes to assign counters.
c. Configuration: Comes from /info/L3_MON/counter_configs/.
d. Events: More than two events could in theory be assigned to a group,
but this series currently supports up to 2.

Commonalities:
a. Assignments can be either exclusive or shared in both these modes.

Given these, I believe we can easily accommodate soft-ABMC in this 
interface.
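
To make point (b) under Soft-ABMC concrete, here is a minimal sketch in
Python. It is not kernel code and not part of this series; the class
SoftAbmcDomain and its methods are hypothetical names used only to model the
rule that no more than num_mbm_cntrs RMIDs may be active in a domain at once.

```python
class SoftAbmcDomain:
    """Toy model of one monitoring domain in a hypothetical soft-ABMC mode."""

    def __init__(self, num_mbm_cntrs):
        # Assumption: the limit is counted in RMIDs, as described in (a)/(b).
        self.num_mbm_cntrs = num_mbm_cntrs
        self.active_rmids = set()

    def assign(self, rmid):
        """Mark an RMID active; return False if the limit would be exceeded."""
        if rmid in self.active_rmids:
            return True  # already assigned, nothing to do
        if len(self.active_rmids) >= self.num_mbm_cntrs:
            return False  # no free slot; the caller must unassign first
        self.active_rmids.add(rmid)
        return True

    def unassign(self, rmid):
        """Release an RMID; no hardware configuration is modelled here."""
        self.active_rmids.discard(rmid)
```

In this model an assignment touches no per-event state, which matches the
point that both events (local and total) come and go together with the RMID.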

>>>
>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
>>
> 
> 

Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Babu,

On 5/20/25 4:25 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/20/25 10:51 AM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 5/20/25 11:06, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>> ...
>>>>
>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>>> counters available for assignment.
>>>>>>
>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>>> implemented by assigning an RMID, which would result in all events
>>>>>> being assigned at once.
>>>>>>
>>>>>> My main concern is how many "counters" you can assign by assigning
>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>>> can be assigned separately from counters which can be assigned.
>>>>>
>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>>> number of counters available per domain, which is 32.
>>>>>
>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>>> with up to 32 counters. However, we currently lack the interface to
>>>>> support that level of assignment.
>>>>>
>>>>> For now, the plan is to support basic assignment and expand functionality
>>>>> later once we have the necessary data structure and requirements.
>>>>
>>>> Looks like some requirements did not make it into this implementation.
>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>>> there is a question to Peter in there on how to determine how many "counters"
>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>>> information would be available in a future AMD publication.
>>>
>>> We already have a method to determine the number of counters in soft-ABMC
>>> mode, which Peter has addressed [4].
>>>
>>> [4]
>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>>
>>> This appears to be more of a workaround, and I doubt it will be included
>>> in any official AMD documentation. Additionally, the long-term direction
>>> is moving towards ABMC.
>>>
>>> I don’t believe this workaround needs to be part of the current series. It
>>> can be added later when soft-ABMC is implemented.
>>
>> Agreed. What about the plans described in [2]? (Thanks to Peter for
>> catching this!).
>>
>> It is important to keep track of requirements while working on a feature to
>> ensure that the implementation supports the planned use cases. Re-reading that
>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
>> Could you please share how you see it progress from this implementation?
>> This includes the single event vs. multiple event assignment. I would like to
>> highlight that this is not a request for this to be supported in this implementation
>> but there needs to be a plan for how this can be supported on top of interfaces
>> established by this work.
>>
> 
> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> 
> Soft-ABMC:
> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> 
> b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> 
> c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> 
> d. Events: Only two events can be assigned(local and total).
> 
> ABMC:
> a. num_mbm_cntrs: This is defined by the hardware.
> b. Assignments: Requires special MSR writes to assign counters.
> c. Configuration: Comes from /info/L3_MON/counter_configs/.
> d. Events: More than two events can be assigned to a group (currently up to 2).
> 
> Commonalities:
> a. Assignments can be either exclusive or shared in both these modes.
> 
> Given these, I believe we can easily accommodate soft-ABMC in this interface.

This is not so obvious to me. It looks to me as though the user is forced to
interpret the content of resctrl files differently depending on soft-ABMC vs.
ABMC, which makes the interface inconsistent and requires the user to know
implementation details. This is what the previous discussion I linked to aimed
to address. It sounds to me as though you believe that this is no longer an
issue. Could you please show examples of what a user can expect from the
interfaces and how a user will interact with the interfaces on both a non-ABMC
and an ABMC system?

Thank you

Reinette

> 
>>>>
>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
>>>
>>
>>
> 

Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 6 months, 3 weeks ago
Hi Reinette,

On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 5/20/25 4:25 PM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>> counters available for assignment.
> >>>>>>
> >>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>> being assigned at once.
> >>>>>>
> >>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>> can be assigned separately from counters which can be assigned.
> >>>>>
> >>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>> number of counters available per domain, which is 32.
> >>>>>
> >>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>> support that level of assignment.
> >>>>>
> >>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>> later once we have the necessary data structure and requirements.
> >>>>
> >>>> Looks like some requirements did not make it into this implementation.
> >>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>> there is a question to Peter in there on how to determine how many "counters"
> >>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>> information would be available in a future AMD publication.
> >>>
> >>> We already have a method to determine the number of counters in soft-ABMC
> >>> mode, which Peter has addressed [4].
> >>>
> >>> [4]
> >>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> >>>
> >>> This appears to be more of a workaround, and I doubt it will be included
> >>> in any official AMD documentation. Additionally, the long-term direction
> >>> is moving towards ABMC.
> >>>
> >>> I don’t believe this workaround needs to be part of the current series. It
> >>> can be added later when soft-ABMC is implemented.
> >>
> >> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >> catching this!).
> >>
> >> It is important to keep track of requirements while working on a feature to
> >> ensure that the implementation supports the planned use cases. Re-reading that
> >> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >> Could you please share how you see it progress from this implementation?
> >> This includes the single event vs. multiple event assignment. I would like to
> >> highlight that this is not a request for this to be supported in this implementation
> >> but there needs to be a plan for how this can be supported on top of interfaces
> >> established by this work.
> >>
> >
> > Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >
> > Soft-ABMC:
> > a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> >
> > b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> >
> > c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> >
> > d. Events: Only two events can be assigned(local and total).
> >
> > ABMC:
> > a. num_mbm_cntrs: This is defined by the hardware.
> > b. Assignments: Requires special MSR writes to assign counters.
> > c. Configuration: Comes from /info/L3_MON/counter_configs/.
> > d. Events: More than two events can be assigned to a group (currently up to 2).
> >
> > Commonalities:
> > a. Assignments can be either exclusive or shared in both these modes.
> >
> > Given these, I believe we can easily accommodate soft-ABMC in this interface.
>
> This is not so obvious to me. It looks to me as though the user is forced to interpret
> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
> inconsistent and user thus needing to know details of implementations. This is what the previous
> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
> will interact with the interfaces on both a non-ABMC and ABMC system?

At the interface level, I think mbm_L3_assignments on a non-ABMC
system would only need to contain a single line:

0=s;1=s;...;31=s

But maybe for consistency we would synthesize a single, unmodifiable
counter configuration to reflect that allocating an RMID in a domain
results in assignment to all events and deallocating the RMID
unassigns all events. We could call it "group" to say it's assigning
at the group level, or perhaps just '*':

*:0=s;1=s;...;31=s

I'm not sure about allowing a '*' on ABMC hardware, because it could
be interpreted as allocating a lot of counters when a large number of
event configurations exist.

*:0=s;1=s;...;31=s
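
As a rough illustration of how user space might read such a line, here is a
small Python sketch. It assumes only the shape shown above (an optional
"name:" prefix followed by ';'-separated "domain=flag" pairs); the function
name parse_assignment_line is hypothetical, and the flag is kept as an opaque
string since its final meaning is still under discussion.

```python
def parse_assignment_line(line):
    """Split '<name>:<dom>=<flag>;...' into (name or None, {dom: flag})."""
    line = line.strip()
    name = None
    if ":" in line:
        # Everything before the first ':' is the configuration name,
        # for example '*' or 'group' in the proposals above.
        name, line = line.split(":", 1)
    domains = {}
    for field in line.split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        dom, flag = field.split("=", 1)
        domains[int(dom)] = flag
    return name, domains
```

The elided "..." in the examples above is not valid input for this sketch; a
real line would list every domain explicitly.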

-Peter


>
> Thank you
>
> Reinette
>
> >
> >>>>
> >>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
> >>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
> >>>
> >>
> >>
> >
>
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Peter,

On 5/21/25 7:27 AM, Peter Newman wrote:
> On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 5/20/25 4:25 PM, Moger, Babu wrote:

...
>>>
>>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
>>>
>>> Soft-ABMC:
>>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
>>>
>>> b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
>>>
>>> c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
>>>
>>> d. Events: Only two events can be assigned(local and total).
>>>
>>> ABMC:
>>> a. num_mbm_cntrs: This is defined by the hardware.
>>> b. Assignments: Requires special MSR writes to assign counters.
>>> c. Configuration: Comes from /info/L3_MON/counter_configs/.
>>> d. Events: More than two events can be assigned to a group (currently up to 2).
>>>
>>> Commonalities:
>>> a. Assignments can be either exclusive or shared in both these modes.
>>>
>>> Given these, I believe we can easily accommodate soft-ABMC in this interface.
>>
>> This is not so obvious to me. It looks to me as though the user is forced to interpret
>> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
>> inconsistent and user thus needing to know details of implementations. This is what the previous
>> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
>> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
>> will interact with the interfaces on both a non-ABMC and ABMC system?
> 
> At the interface level, I think mbm_L3_assignments on a non-ABMC
> system would only need to contain a single line:
> 
> 0=s;1=s;...;31=s

It should be obvious to user space how to interpret the fields. If a single
"mbm_cntr_assign" mode is used for both ABMC and soft-ABMC, a single line like
this would be difficult to parse, since it would require user space to know
whether it is running on an ABMC or a soft-ABMC system, which we should avoid.

If there are different modes, for example "mbm_cntr_event_assign" and
"mbm_cntr_group_assign", then user space could use the mode to distinguish
how to interact with mbm_L3_assignments, making something like this possible.
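
A minimal Python sketch of what that could look like from user space follows.
The two mode names are only the examples given above and are not an
established interface; the function assignment_granularity and the mapping
are hypothetical, and how the mode string is obtained from resctrl is left
out on purpose.

```python
# Hypothetical mapping from the example mode names to how a tool
# would treat entries in mbm_L3_assignments.
GRANULARITY_BY_MODE = {
    "mbm_cntr_event_assign": "event",  # one counter per assigned event
    "mbm_cntr_group_assign": "group",  # all events follow the group's RMID
}


def assignment_granularity(mode):
    """Return 'event' or 'group' for a known mode, else raise ValueError."""
    try:
        return GRANULARITY_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown assignment mode: {mode!r}") from None
```

With the mode as the only switch, a tool never needs to know whether the
underlying system is ABMC or soft-ABMC.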

> 
> But maybe for consistency we would synthesize a single, unmodifiable
> counter configuration to reflect that allocating an RMID in a domain
> results in assignment to all events and deallocating the RMID
> unassigns all events. We could call it "group" to say it's assigning
> at the group level, or perhaps just '*':
> 
> *:0=s;1=s;...;31=s
> 
> I'm not sure about allowing a '*' on ABMC hardware, because it could
> be interpreted as allocating a lot of counters when a large number of
> event configurations exist.
> 
> *:0=s;1=s;...;31=s
> 

Either could work. Whether it is "group" or "*", ABMC systems could
respond with "not supported". I will think about this more, but would
like to hear your opinion about the flexibility that distinguishing between
a "mbm_cntr_event_assign" and a "mbm_cntr_group_assign" mode provides.

Reinette

> -Peter
> 
> 
>>
>> Thank you
>>
>> Reinette
>>
>>>
>>>>>>
>>>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
>>>>>
>>>>
>>>>
>>>
>>

Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 6 months, 3 weeks ago
Hi Reinette,

On Thu, May 22, 2025 at 1:05 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 5/21/25 7:27 AM, Peter Newman wrote:
> > On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >> On 5/20/25 4:25 PM, Moger, Babu wrote:
>
> ...
> >>>
> >>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >>>
> >>> Soft-ABMC:
> >>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> >>>
> >>> b. Assignments: No hardware configuration is required. We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> >>>
> >>> c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> >>>
> >>> d. Events: Only two events can be assigned(local and total).
> >>>
> >>> ABMC:
> >>> a. num_mbm_cntrs: This is defined by the hardware.
> >>> b. Assignments: Requires special MSR writes to assign counters.
> >>> c. Configuration: Comes from /info/L3_MON/counter_configs/.
> >>> d. Events: More than two events can be assigned to a group (currently up to 2).
> >>>
> >>> Commonalities:
> >>> a. Assignments can be either exclusive or shared in both these modes.
> >>>
> >>> Given these, I believe we can easily accommodate soft-ABMC in this interface.
> >>
> >> This is not so obvious to me. It looks to me as though the user is forced to interpret
> >> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
> >> inconsistent and user thus needing to know details of implementations. This is what the previous
> >> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
> >> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
> >> will interact with the interfaces on both a non-ABMC and ABMC system?
> >
> > At the interface level, I think mbm_L3_assignments on a non-ABMC
> > system would only need to contain a single line:
> >
> > 0=s;1=s;...;31=s
>
> It should be obvious to user space how to interpret the fields. When there is
> thus a single "mbm_cntr_assign" mode used for ABMC and soft-ABMC a single
> line like this would be difficult to parse since that would imply/require
> that user space knows whether it is running on ABMC or soft-ABMC system,
> which we should avoid.
>
> If there are different modes, for example "mbm_cntr_event_assign" and
> "mbm_cntr_group_assign" then this could be used by user space to distinguish
> how to interact with mbm_L3_assignments making something like this possible.

I meant to say I was proposing the format of this file when in the
group assignment mode. I didn't mean to imply that a separate mode
wasn't needed.

>
> >
> > But maybe for consistency we would synthesize a single, unmodifiable
> > counter configuration to reflect that allocating an RMID in a domain
> > results in assignment to all events and deallocating the RMID
> > unassigns all events. We could call it "group" to say it's assigning
> > at the group level, or perhaps just '*':
> >
> > *:0=s;1=s;...;31=s
> >
> > I'm not sure about allowing a '*' on ABMC hardware, because it could
> > be interpreted as allocating a lot of counters when a large number of
> > event configurations exist.
> >
> > *:0=s;1=s;...;31=s
> >
>
> Either could work also. Whether it is "group" or "*" ABMC systems could
> respond with "not supported". Will think about this more but would
> like to hear your opinion about the flexibility that distinguishing between
> a "mbm_cntr_event_assign" and "mbm_cntr_group_assign" mode provides.

I agree it's clearer when they are separate modes. Between "*" and
"group", I prefer "group" because it seems the least ambiguous.

I just want to make sure we'd never want both modes at the same time,
such as an implementation with both a small number of monitoring IDs
and a small number of MBM counters. I support one MPAM implementation
that has a small number of PARTIDs and only one MBWU counter per
domain. Fingers crossed that the number of PARTIDs it supports isn't
small compared to the number of jobs we would run on it. Otherwise
maybe it will work out to just pick the more limited of the two
(monitor IDs or counters) and make allocation of one drive the other.

(In case you read this before my earlier reply[1], see the note about
rdtgroup pointers in the task_struct, as this is a prerequisite for
overcommitting HW monitor IDs.)

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/CALPaoCjh_NXQLtNBqei=7a6Jsr17fEnPO+kqMaNq4xNu2UPDJA@mail.gmail.com/

>
> Reinette
>
> > -Peter
> >
> >
> >>
> >> Thank you
> >>
> >> Reinette
> >>
> >>>
> >>>>>>
> >>>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
> >>>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/
> >>>>>
> >>>>
> >>>>
> >>>
> >>
>
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Peter,

On 5/22/25 2:14 AM, Peter Newman wrote:
> On Thu, May 22, 2025 at 1:05 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 5/21/25 7:27 AM, Peter Newman wrote:

...

>>> At the interface level, I think mbm_L3_assignments on a non-ABMC
>>> system would only need to contain a single line:
>>>
>>> 0=s;1=s;...;31=s
>>
>> It should be obvious to user space how to interpret the fields. When there is
>> thus a single "mbm_cntr_assign" mode used for ABMC and soft-ABMC a single
>> line like this would be difficult to parse since that would imply/require
>> that user space knows whether it is running on ABMC or soft-ABMC system,
>> which we should avoid.
>>
>> If there are different modes, for example "mbm_cntr_event_assign" and
>> "mbm_cntr_group_assign" then this could be used by user space to distinguish
>> how to interact with mbm_L3_assignments making something like this possible.
> 
> I meant to say I was proposing the format of this file when in the
> group assignment mode. I didn't mean to imply that a separate mode
> wasn't needed.

Thanks for confirming.

>>> But maybe for consistency we would synthesize a single, unmodifiable
>>> counter configuration to reflect that allocating an RMID in a domain
>>> results in assignment to all events and deallocating the RMID
>>> unassigns all events. We could call it "group" to say it's assigning
>>> at the group level, or perhaps just '*':
>>>
>>> *:0=s;1=s;...;31=s
>>>
>>> I'm not sure about allowing a '*' on ABMC hardware, because it could
>>> be interpreted as allocating a lot of counters when a large number of
>>> event configurations exist.
>>>
>>> *:0=s;1=s;...;31=s
>>>
>>
>> Either could work also. Whether it is "group" or "*" ABMC systems could
>> respond with "not supported". Will think about this more but would
>> like to hear your opinion about the flexibility that distinguishing between
>> a "mbm_cntr_event_assign" and "mbm_cntr_group_assign" mode provides.
> 
> I agree it's clearer when they are separate modes. Between "*" and
> "group", I prefer "group" because it seems the least ambiguous.

Sounds good to me. resctrl will need extra guards to prevent the user
from creating an event named "group", but this matches what resctrl already
needs to do elsewhere (e.g. the user cannot create a monitor group named
"mon_groups").

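The kind of guard meant here can be sketched in a few lines of Python. This
is only an illustration, not the resctrl implementation: the set
RESERVED_EVENT_NAMES and the function is_valid_event_name are hypothetical,
and any other naming rules resctrl might apply are not modelled.

```python
# Names that a user-created event configuration may not take,
# assuming "group" becomes the synthesized group-level entry.
RESERVED_EVENT_NAMES = frozenset({"group"})


def is_valid_event_name(name):
    """Reject empty names and names reserved for resctrl's own use."""
    return bool(name) and name not in RESERVED_EVENT_NAMES
```
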
> 
> I just want to make sure we'd never want both modes at the same time,
> such as an implementation with both a small number of monitoring IDs

hmmm ... my assumption was that a system could only support one of these
modes ("mbm_cntr_event_assign" or "mbm_cntr_group_assign"), but a system
could conceivably support both. But beyond that, having both active at the
*same* time? That would take a lot of wrangling during runtime.

> and a small number of MBM counters. I support one MPAM implementation
> that has a small number of PARTIDs and only one MBWU counter per
> domain. Fingers crossed that the number of PARTIDs it supports isn't
> small compared to the number of jobs we would run on it. Otherwise
> maybe it will work out to just pick the more limited of the two
> (monitor IDs or counters) and make allocation of one drive the other.

Could a scenario like this be addressed by "mbm_cntr_event_assign" mode
gaining support for "shared assignment"?


> (In case you read this before my earlier reply[1], see the note about
> rdtgroup pointers in the task_struct, as this is a prerequisite for
> overcommitting HW monitor IDs.)
> 

Reinette

> [1] https://lore.kernel.org/lkml/CALPaoCjh_NXQLtNBqei=7a6Jsr17fEnPO+kqMaNq4xNu2UPDJA@mail.gmail.com/

...

>>>>>>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@amd.com/
>>>>>>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@mail.gmail.com/


Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 6 months, 3 weeks ago
Hi Babu/Reinette,

On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 5/20/25 4:25 PM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>> counters available for assignment.
> >>>>>>
> >>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>> being assigned at once.
> >>>>>>
> >>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>> can be assigned separately from counters which can be assigned.
> >>>>>
> >>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>> number of counters available per domain, which is 32.
> >>>>>
> >>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>> support that level of assignment.
> >>>>>
> >>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>> later once we have the necessary data structure and requirements.
> >>>>
> >>>> Looks like some requirements did not make it into this implementation.
> >>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>> there is a question to Peter in there on how to determine how many "counters"
> >>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>> information would be available in a future AMD publication.
> >>>
> >>> We already have a method to determine the number of counters in soft-ABMC
> >>> mode, which Peter has addressed [4].
> >>>
> >>> [4]
> >>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> >>>
> >>> This appears to be more of a workaround, and I doubt it will be included
> >>> in any official AMD documentation. Additionally, the long-term direction
> >>> is moving towards ABMC.
> >>>
> >>> I don’t believe this workaround needs to be part of the current series. It
> >>> can be added later when soft-ABMC is implemented.
> >>
> >> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >> catching this!).
> >>
> >> It is important to keep track of requirements while working on a feature to
> >> ensure that the implementation supports the planned use cases. Re-reading that
> >> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >> Could you please share how you see it progress from this implementation?
> >> This includes the single event vs. multiple event assignment. I would like to
> >> highlight that this is not a request for this to be supported in this implementation
> >> but there needs to be a plan for how this can be supported on top of interfaces
> >> established by this work.
> >>
> >
> > Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >
> > Soft-ABMC:
> > a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].

I would call it a hardware-defined limit that can be probed by software.

The main question is whether this file returns the exact number of
RMIDs hardware can track or double that number (mbm_total_bytes +
mbm_local_bytes) so that the value is always measured in events.

There's also the mongroup-RMID overcommit use case I described
above[1]. On Intel we can safely assume that there are counters to
back all RMIDs, so num_mbm_cntrs would be calculated directly from
num_rmids.

I realized this use case is more difficult to implement on MPAM,
because a PARTID is effectively a CLOSID+RMID, so deferring assigning
a unique PARTID to a group also results in it being in a different
allocation group. It would work if the unmonitored groups could find a
way to share PARTIDs, though this has consequences for allocation,
hopefully no worse than sharing CLOSIDs on x86.

There's a lot of interest in monitoring ID overcommit in Google, so I
think it's worth it for me to investigate the additional structural
changes needed in resctrl (i.e., breaking the FS-level association
between mongroups and HW monitoring IDs). Such a framework could be a
better fit for soft-ABMC. For example, if overcommit is allowed, we
would just report the number of simultaneous RMIDs we were able to
probe as num_rmids. I would want the same shared assignment scheduler
to be able to work with RMIDs and counters, though.

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Peter and Babu,

On 5/21/25 2:18 AM, Peter Newman wrote:
> Hi Babu/Reinette,
> 
> On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Babu,
>>
>> On 5/20/25 4:25 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 5/20/25 10:51 AM, Moger, Babu wrote:
>>>>> Hi Reinette,
>>>>>
>>>>> On 5/20/25 11:06, Reinette Chatre wrote:
>>>>>> Hi Babu,
>>>>>>
>>>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>>>>> counters available for assignment.
>>>>>>>>
>>>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>>>>> implemented by assigning an RMID, which would result in all events
>>>>>>>> being assigned at once.
>>>>>>>>
>>>>>>>> My main concern is how many "counters" you can assign by assigning
>>>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>>>>> can be assigned separately from counters which can be assigned.
>>>>>>>
>>>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>>>>> number of counters available per domain, which is 32.
>>>>>>>
>>>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>>>>> with up to 32 counters. However, we currently lack the interface to
>>>>>>> support that level of assignment.
>>>>>>>
>>>>>>> For now, the plan is to support basic assignment and expand functionality
>>>>>>> later once we have the necessary data structure and requirements.
>>>>>>
>>>>>> Looks like some requirements did not make it into this implementation.
>>>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>>>>> there is a question to Peter in there on how to determine how many "counters"
>>>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>>>>> information would be available in a future AMD publication.
>>>>>
>>>>> We already have a method to determine the number of counters in soft-ABMC
>>>>> mode, which Peter has addressed [4].
>>>>>
>>>>> [4]
>>>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>>>>
>>>>> This appears to be more of a workaround, and I doubt it will be included
>>>>> in any official AMD documentation. Additionally, the long-term direction
>>>>> is moving towards ABMC.
>>>>>
>>>>> I don’t believe this workaround needs to be part of the current series. It
>>>>> can be added later when soft-ABMC is implemented.
>>>>
>>>> Agreed. What about the plans described in [2]? (Thanks to Peter for
>>>> catching this!).
>>>>
>>>> It is important to keep track of requirements while working on a feature to
>>>> ensure that the implementation supports the planned use cases. Re-reading that
>>>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
>>>> Could you please share how you see it progress from this implementation?
>>>> This includes the single event vs. multiple event assignment. I would like to
>>>> highlight that this is not a request for this to be supported in this implementation
>>>> but there needs to be a plan for how this can be supported on top of interfaces
>>>> established by this work.
>>>>
>>>
>>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
>>>
>>> Soft-ABMC:
>>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> 
> I would call it a hardware-defined limit that can be probed by software.
> 
> The main question is whether this file returns the exact number of
> RMIDs hardware can track or double that number (mbm_total_bytes +
> mbm_local_bytes) so that the value is always measured in events.

tl;dr: As in [3], I still find it most intuitive for num_mbm_cntrs to be the exact
number of "active" RMIDs that the system can support *and* for the mode names to
change so that they help the user interpret num_mbm_cntrs: "mbm_cntr_event_assign"
for ABMC, "mbm_cntr_group_assign" for soft-ABMC.

details
-------

We are now back to the previous discussion about what the user can expect from
the interface. Let me try to recap that discussion so that we can all hopefully
get back on the same page. Please add corrections/updates where needed.

soft-ABMC
---------
  soft-ABMC manages "active" (term TBD) RMID assignment to monitor groups. When an
  "active" RMID is assigned to a monitor group then *all* MBM events (not LLC occupancy)
  in that monitor group are counted. "Active" RMID assignment can be done per domain.

  Requirement: resctrl should accurately reflect which events are counted. That is,
  we do not want resctrl to pretend to allow user to assign an "active" RMID to
  only one event in a monitor group while all events are actually counted.

  Caveat: To support rapid re-assignment of RMIDs to monitor groups, llc_occupancy
  event is disabled when soft-ABMC is enabled.

ABMC
----
  ABMC manages (hardware) counter assignment to (monitor group (RMID), event) pairs.
  When a hardware counter is assigned to an (RMID, event) pair then only that
  (RMID, event) pair is counted. Hardware counter assignment can be done per domain.


shared assignment
-----------------
A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
"counter" (could be hardware counter or "active" RMID) as shared and that means
the counter within that domain is shared between different monitor groups and actual
assignment is scheduled by resctrl.  
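
To make "assignment is scheduled by resctrl" concrete, here is a minimal sketch
of one shared "counter" being rotated between monitor groups. The round-robin
policy, the function name and the group names are assumptions for illustration
only; nothing in this thread specifies how the scheduler would actually pick
the next group.

```python
from collections import deque


def rotate_shared_counter(groups, slices):
    """Return which group holds the shared counter in each time slice.

    Hypothetical round-robin policy, not taken from the series.
    """
    if not groups:
        return []
    queue = deque(groups)
    schedule = []
    for _ in range(slices):
        schedule.append(queue[0])  # group at the head holds the counter
        queue.rotate(-1)           # next group moves to the head
    return schedule


schedule = rotate_shared_counter(["grp_a", "grp_b", "grp_c"], 5)
print(schedule)
```

The same rotation would apply whether the "counter" is a hardware counter (ABMC)
or an "active" RMID (soft-ABMC). Only the unit being rotated differs.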


user interface
--------------

Next, consider the interface while keeping above definitions and requirements in mind.

This series introduces (using implementation, not cover-letter):

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
"num_mbm_cntrs":                                                               
	The maximum number of monitoring counters (total of available and assigned
	counters) in each domain when the system supports mbm_cntr_assign mode. 

/sys/fs/resctrl/mbm_L3_assignments
"mbm_L3_assignments":                                                          
	This interface file is created when the mbm_cntr_assign mode is supported
	and shows the assignment status for each group.              

Consider "mbm_L3_assignments" first. The interface is documented for ABMC support
where it is possible to manage individual event assignment within monitor group.

For ABMC it is possible to assign just one event at a time and doing so consumes
one counter in that domain:

a) Starting state on system with 32 counters per domain, two events in default
   resource group consume two counters in that domain:
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=30;1=32
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=_
mbm_local_bytes:0=e;1=_

b) Assign counter to mbm_local_bytes in domain 1:
# echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=_
mbm_local_bytes:0=e;1=e
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=30;1=31
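
The ABMC bookkeeping in a) and b) can be reproduced with a small user-space
model. This is only a sketch of the accounting, assuming one counter per
(group, event, domain) assignment. The class and helper names are made up for
illustration and are not the kernel implementation.

```python
def format_domains(free):
    """Render per-domain values in the "0=30;1=32" style used above."""
    return ";".join(f"{dom}={val}" for dom, val in sorted(free.items()))


class EventAssignModel:
    """Toy model: each (group, event, domain) assignment uses one counter."""

    def __init__(self, counters_per_domain, domains):
        self.free = {dom: counters_per_domain for dom in domains}
        self.assigned = set()

    def assign(self, group, event, domain):
        key = (group, event, domain)
        if key in self.assigned:
            return  # already counted, nothing more to consume
        if self.free[domain] == 0:
            raise RuntimeError(f"no free counter in domain {domain}")
        self.free[domain] -= 1
        self.assigned.add(key)


model = EventAssignModel(counters_per_domain=32, domains=[0, 1])
model.assign("default", "mbm_total_bytes", 0)
model.assign("default", "mbm_local_bytes", 0)
state_a = format_domains(model.free)  # matches state a)
model.assign("default", "mbm_local_bytes", 1)
state_b = format_domains(model.free)  # matches state b)
print(state_a, state_b)
```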

The question is how this should look on soft-ABMC system. Let's say hypothetically
that on a soft-ABMC system it is possible to have 32 "active" RMIDs.

a) Starting state on system with 32 "active RMIDs" per domain, two events in default
   resource group consume one RMID in that domain:

# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=_
mbm_local_bytes:0=e;1=_

What should num_mbm_cntrs display?

Option A (counters are RMIDs):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=31;1=32

Option B (pretend RMIDs are events):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=62;1=64

b) Assign counter to mbm_local_bytes in domain 1:
# echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e

Note that even though the user requested only mbm_local_bytes to be assigned, both
mbm_total_bytes and mbm_local_bytes end up assigned. This ensures accurate state
representation to user space, but it also creates an inconsistent user interface
between soft-ABMC and ABMC: user space intends to use the same interface, yet
"sometimes" assigning one event assigns just that event while "sometimes" it
assigns multiple events.

wrt "num_mbm_cntrs"

Option A (counters are RMIDs):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=31;1=31

Option B (pretend RMIDs are events):
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=62;1=62 

Neither option seems ideal to me since the interface cannot be consistent
between ABMC and soft-ABMC.
As I mentioned in [2] it is not possible to hide ABMC and soft-ABMC behind
the same interface. When user space wants to monitor a particular monitor group
then it should be clear how that can be accomplished. Not knowing if
an assignment/unassignment to/from an event would impact one or all events
and whether it will consume one or multiple counters does not sound like a good
interface to me. 
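
The Option A and Option B numbers above follow from simple arithmetic, sketched
here for the hypothetical 32 "active" RMIDs per domain. The function name and
the dictionary layout are assumptions for illustration.

```python
MBM_EVENTS = ("mbm_total_bytes", "mbm_local_bytes")


def free_counters(total_rmids, rmids_in_use, count_in_events):
    """Free "counters" per domain.

    Option A (count_in_events=False): one counter per "active" RMID.
    Option B (count_in_events=True): each RMID is reported as one
    counter per MBM event.
    """
    scale = len(MBM_EVENTS) if count_in_events else 1
    return {dom: (total_rmids - used) * scale
            for dom, used in rmids_in_use.items()}


start = {0: 1, 1: 0}  # a) default group assigned in domain 0 only
after = {0: 1, 1: 1}  # b) default group also assigned in domain 1

option_a = [free_counters(32, s, False) for s in (start, after)]
option_b = [free_counters(32, s, True) for s in (start, after)]
print(option_a)
print(option_b)
```

Under Option B the value drops by two for a single user request, which is the
"one or multiple counters" ambiguity described above.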

As I understand the current interface, the user is required to know how ABMC and
soft-ABMC are implemented to be able to configure the system. For example, if the
user has a file like:
	# cat /sys/fs/resctrl/mbm_L3_assignments
	mbm_total_bytes:0=e;1=e
	mbm_local_bytes:0=e;1=e
the user must know the underlying implementation to be able to manage monitoring of
events and assigning counters; otherwise it will be a surprise to lose monitoring
of all events when unassigning one event.
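
A short sketch of that surprise: the same unassign request applied to the file
contents above, once with per-event semantics (ABMC) and once with per-group
semantics (soft-ABMC). The parser and the unassign() helper are hypothetical and
only model the text format shown in this mail.

```python
def parse_assignments(text):
    """Parse "event:dom=state;dom=state" lines into {event: {dom: state}}."""
    result = {}
    for line in text.strip().splitlines():
        event, _, domains = line.strip().partition(":")
        result[event] = {}
        for item in domains.split(";"):
            dom, _, state = item.partition("=")
            result[event][int(dom)] = state
    return result


def unassign(state, event, domain, group_mode):
    """Unassign one event; in group mode all events in the domain follow."""
    targets = list(state) if group_mode else [event]
    for name in targets:
        state[name][domain] = "_"
    return state


text = """
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
"""
event_mode = unassign(parse_assignments(text), "mbm_local_bytes", 1, False)
group_mode = unassign(parse_assignments(text), "mbm_local_bytes", 1, True)
print(event_mode)
print(group_mode)
```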

This is why I proposed in [3] that the name of the mode reflects how user can interact
with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
make it clear what the system is capable of wrt counter assignments.

Considering this the interface should be clear:
num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
"mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
of counters that can be assigned to entire monitoring groups impacting all MBM events.

mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
interface behaves wrt assignment, no surprises of multiple events impacted when
assigning/unassigning single event.
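
As a rough illustration of how user space could interpret num_mbm_cntrs once
the mode name is known, the sketch below computes how many monitor groups can
have both MBM events counted in one domain. The mode strings are the names
proposed above. The helper itself is hypothetical, and it assumes exactly two
MBM events per group.

```python
MBM_EVENTS_PER_GROUP = 2  # mbm_total_bytes and mbm_local_bytes


def fully_monitored_groups(mode, num_mbm_cntrs):
    """Groups per domain that can have every MBM event counted."""
    if mode == "mbm_cntr_event_assign":
        # One counter per (group, event) pair.
        return num_mbm_cntrs // MBM_EVENTS_PER_GROUP
    if mode == "mbm_cntr_group_assign":
        # One counter ("active" RMID) covers all MBM events of a group.
        return num_mbm_cntrs
    raise ValueError(f"unknown mode: {mode}")


print(fully_monitored_groups("mbm_cntr_event_assign", 32))
print(fully_monitored_groups("mbm_cntr_group_assign", 32))
```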

For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
of "active" RMIDs that the system can support *and* for the mode names to change
so that they help the user interpret num_mbm_cntrs.

> 
> There's also the mongroup-RMID overcommit use case I described
> above[1]. On Intel we can safely assume that there are counters to
> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> num_rmids.

This is about the:
	There's now more interest in Google for allowing explicit control of
	where RMIDs are assigned on Intel platforms. Even though the number of
	RMIDs implemented by hardware tends to be roughly the number of
	containers they want to support, they often still need to create
	containers when all RMIDs have already been allocated, which is not
	currently allowed. Once the container has been created and starts
	running, it's no longer possible to move its threads into a monitoring
	group whenever RMIDs should become available again, so it's important
	for resctrl to maintain an accurate task list for a container even
	when RMIDs are not available.

I see a monitor group as a collection of tasks that need to be monitored together.
The "task list" is the group of tasks that share a monitoring ID that
is required to be a valid ID since when any of the tasks are scheduled that ID is
written to the hardware. I intentionally tried to not use RMID since I believe
this is required for all archs.
I thus do not understand how a task can start running when it does not have
a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
there can never be "unmonitored tasks", no? I think I am missing something here.

> I realized this use case is more difficult to implement on MPAM,
> because a PARTID is effectively a CLOSID+RMID, so deferring assigning
> a unique PARTID to a group also results in it being in a different
> allocation group. It will work if the unmonitored groups could find a
> way to share PARTIDs, but this has consequences on allocation - but
> hopefully no worse than sharing CLOSIDs on x86.
> 
> There's a lot of interest in monitoring ID overcommit in Google, so I
> think it's worth it for me to investigate the additional structural
> changes needed in resctrl (i.e., breaking the FS-level association
> between mongroups and HW monitoring IDs). Such a framework could be a
> better fit for soft-ABMC. For example, if overcommit is allowed, we
> would just report the number of simultaneous RMIDs we were able to
> probe as num_rmids. I would want the same shared assignment scheduler
> to be able to work with RMIDs and counters, though.
> 
> Thanks,
> -Peter
> 
> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/

Reinette

[2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
[3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months, 3 weeks ago
Hi Reinette,

On 5/21/25 18:03, Reinette Chatre wrote:
> Hi Peter and Babu,
> 
> On 5/21/25 2:18 AM, Peter Newman wrote:
>> Hi Babu/Reinette,
>>
>> On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>>
>>> Hi Babu,
>>>
>>> On 5/20/25 4:25 PM, Moger, Babu wrote:
>>>> Hi Reinette,
>>>>
>>>> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 5/20/25 10:51 AM, Moger, Babu wrote:
>>>>>> Hi Reinette,
>>>>>>
>>>>>> On 5/20/25 11:06, Reinette Chatre wrote:
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
>>>>>>>> On 5/19/25 10:59, Peter Newman wrote:
>>>>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>>>>>>>>>> counters available for assignment.
>>>>>>>>>
>>>>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
>>>>>>>>> represent in a "soft-ABMC" implementation where assignment is
>>>>>>>>> implemented by assigning an RMID, which would result in all events
>>>>>>>>> being assigned at once.
>>>>>>>>>
>>>>>>>>> My main concern is how many "counters" you can assign by assigning
>>>>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
>>>>>>>>> can be assigned separately from counters which can be assigned.
>>>>>>>>
>>>>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
>>>>>>>> number of counters available per domain, which is 32.
>>>>>>>>
>>>>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
>>>>>>>> can be associated with 2 hardware counters. In theory, it's possible to
>>>>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
>>>>>>>> with up to 32 counters. However, we currently lack the interface to
>>>>>>>> support that level of assignment.
>>>>>>>>
>>>>>>>> For now, the plan is to support basic assignment and expand functionality
>>>>>>>> later once we have the necessary data structure and requirements.
>>>>>>>
>>>>>>> Looks like some requirements did not make it into this implementation.
>>>>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
>>>>>>> there is a question to Peter in there on how to determine how many "counters"
>>>>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
>>>>>>> information would be available in a future AMD publication.
>>>>>>
>>>>>> We already have a method to determine the number of counters in soft-ABMC
>>>>>> mode, which Peter has addressed [4].
>>>>>>
>>>>>> [4]
>>>>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
>>>>>>
>>>>>> This appears to be more of a workaround, and I doubt it will be included
>>>>>> in any official AMD documentation. Additionally, the long-term direction
>>>>>> is moving towards ABMC.
>>>>>>
>>>>>> I don’t believe this workaround needs to be part of the current series. It
>>>>>> can be added later when soft-ABMC is implemented.
>>>>>
>>>>> Agreed. What about the plans described in [2]? (Thanks to Peter for
>>>>> catching this!).
>>>>>
>>>>> It is important to keep track of requirements while working on a feature to
>>>>> ensure that the implementation supports the planned use cases. Re-reading that
>>>>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
>>>>> Could you please share how you see it progress from this implementation?
>>>>> This includes the single event vs. multiple event assignment. I would like to
>>>>> highlight that this is not a request for this to be supported in this implementation
>>>>> but there needs to be a plan for how this can be supported on top of interfaces
>>>>> established by this work.
>>>>>
>>>>
>>>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
>>>>
>>>> Soft-ABMC:
>>>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
>>
>> I would call it a hardware-defined limit that can be probed by software.
>>
>> The main question is whether this file returns the exact number of
>> RMIDs hardware can track or double that number (mbm_total_bytes +
>> mbm_local_bytes) so that the value is always measured in events.
> 
> tl;dr: I continue [3] to find it most intuitive for num_mbm_cntrs to be the exact
> number of "active" RMIDs that the system can support *and* changing the name of
> the modes to help user interpret num_mbm_cntrs: "mbm_cntr_event_assign" for ABMC,
> "mbm_cntr_group_assign" for soft-ABMC.
> 
> details
> -------
> 
> We are now back to the previous discussion about what user can expect from
> the interface. Let me try and re-cap that discussion so that we can all hopefully
> get back on the same page. Please add corrections/updates where needed.
> 
> soft-ABMC
> ---------
>   soft-ABMC manages "active" (term TBD) RMID assignment to monitor groups. When an
>   "active" RMID is assigned to a monitor group then *all* MBM events (not LLC occupancy)
>   in that monitor group are counted. "Active" RMID assignment can be done per domain.
> 
>   Requirement: resctrl should accurately reflect which events are counted. That is,
>   we do not want resctrl to pretend to allow user to assign an "active" RMID to
>   only one event in a monitor group while all events are actually counted.
> 
>   Caveat: To support rapid re-assignment of RMIDs to monitor groups, llc_occupancy
>   event is disabled when soft-ABMC is enabled.
> 
> ABMC
> ----
>   ABMC manages (hardware) counter assignment to monitor group (RMID), event pairs.
>   When a hardware counter is assigned to an RMID, event pair then only that
>   RMID, event is counted. Hardware counter assignment can be done per domain.
> 
> 
> shared assignment
> -----------------
> A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
> "counter" (could be hardware counter or "active" RMID) as shared and that means
> the counter within that domain is shared between different monitor groups and actual
> assignment is scheduled by resctrl.  

Good summary. Thanks.

> 
> 
> user interface
> --------------
> 
> Next, consider the interface while keeping above definitions and requirements in mind.
> 
> This series introduces (using implementation, not cover-letter):
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> "num_mbm_cntrs":                                                               
> 	The maximum number of monitoring counters (total of available and assigned
> 	counters) in each domain when the system supports mbm_cntr_assign mode. 
> 
> /sys/fs/resctrl/mbm_L3_assignments
> "mbm_L3_assignments":                                                          
> 	This interface file is created when the mbm_cntr_assign mode is supported
> 	and shows the assignment status for each group.              
> 
> Consider "mbm_L3_assignments" first. The interface is documented for ABMC support
> where it is possible to manage individual event assignment within monitor group.
> 
> For ABMC it is possible to assign just one event at a time and doing so consumes
> one counter in that domain:
> 
> a) Starting state on system with 32 counters per domain, two events in default
>    resource group consumes two counters in that domain:
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=32
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=e
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=31
> 
> The question is how this should look on soft-ABMC system. Let's say hypothetically
> that on a soft-ABMC system it is possible to have 32 "active" RMIDs.
> 
> a) Starting state on system with 32 "active RMIDs" per domain, two events in default
>    resource group consumes one RMID in that domain:
> 
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> What should num_mbm_cntrs display?
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=32
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=64
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
> 
> Note that even though user requested only mbm_local_bytes to be assigned, it
> actually results in both mbm_total_bytes and mbm_local_bytes to be assigned. This
> ensures accurate state representation to user space but this also creates an
> inconsistent user interface between soft-ABMC and ABMC since user space intends
> to use the same interface but "sometimes" assigning one event results in assign
> of one event while "sometimes" it results in assign of multiple events.
> 
> wrt "num_mbm_cntrs"
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=31
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=62 
> 
> Neither option seems ideal to me since the interface cannot be consistent
> between ABMC and soft-ABMC.
> As I mentioned in [2] it is not possible to hide ABMC and soft-ABMC behind
> the same interface. When user space wants to monitor a particular monitor group
> then it should be clear how that can be accomplished. Not knowing if
> an assignment/unassignment to/from an event would impact one or all events
> and whether it will consume one or multiple counters does not sound like a good
> interface to me. 
> 
> As I understand current interface, user is required to know how ABMC and soft-ABMC
> is implemented to be able to configure the system. For example, if user has file like:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> user must know underlying implementation to be able to manage monitoring of
> events and assigning counters otherwise it will be a surprise to lose monitoring
> of all events when unassigning one event.
> 
> This is why I proposed in [3] that the name of the mode reflects how user can interact
> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
> make it clear what the system is capable of wrt counter assignments.

Yes, that makes sense. Perhaps we can also simplify it further:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_cntr_evt_assign] <- for ABMC
 mbm_cntr_grp_assign  <- for soft-ABMC
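
A minimal sketch of how a tool could pick the active mode out of such output,
assuming the bracketed entry marks the mode currently in use. The "<- for ..."
annotations are only part of this mail and are stripped here for convenience.
The function name is made up.

```python
def active_mode(text):
    """Return the bracketed mode name, or None if no entry is bracketed."""
    for line in text.splitlines():
        token = line.split("<-")[0].strip()  # drop the mail's annotation
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


sample = "[mbm_cntr_evt_assign] <- for ABMC\n mbm_cntr_grp_assign  <- for soft-ABMC"
print(active_mode(sample))
```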

> 
> Considering this the interface should be clear:
> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
> of counters that can be assigned to entire monitoring groups impacting all MBM events.
> 
> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
> interface behaves wrt assignment, no surprises of multiple events impacted when
> assigning/unassigning single event.
> 
> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
> of "active" RMIDs that the system can support *and* changing the name of the modes
> to help user interpret num_mbm_cntrs.

Sure. Option A fits well here then.

 Option A (counters are RMIDs):
 # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
 0=31;1=31

> 
>>
>> There's also the mongroup-RMID overcommit use case I described
>> above[1]. On Intel we can safely assume that there are counters to
>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>> num_rmids.
> 
> This is about the:
> 	There's now more interest in Google for allowing explicit control of
> 	where RMIDs are assigned on Intel platforms. Even though the number of
> 	RMIDs implemented by hardware tends to be roughly the number of
> 	containers they want to support, they often still need to create
> 	containers when all RMIDs have already been allocated, which is not
> 	currently allowed. Once the container has been created and starts
> 	running, it's no longer possible to move its threads into a monitoring
> 	group whenever RMIDs should become available again, so it's important
> 	for resctrl to maintain an accurate task list for a container even
> 	when RMIDs are not available.
> 
> I see a monitor group as a collection of tasks that need to be monitored together.
> The "task list" is the group of tasks that share a monitoring ID that
> is required to be a valid ID since when any of the tasks are scheduled that ID is
> written to the hardware. I intentionally tried to not use RMID since I believe
> this is required for all archs.
> I thus do not understand how a task can start running when it does not have
> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
>> I realized this use case is more difficult to implement on MPAM,
>> because a PARTID is effectively a CLOSID+RMID, so deferring assigning
>> a unique PARTID to a group also results in it being in a different
>> allocation group. It will work if the unmonitored groups could find a
>> way to share PARTIDs, but this has consequences on allocation - but
>> hopefully no worse than sharing CLOSIDs on x86.
>>
>> There's a lot of interest in monitoring ID overcommit in Google, so I
>> think it's worth it for me to investigate the additional structural
>> changes needed in resctrl (i.e., breaking the FS-level association
>> between mongroups and HW monitoring IDs). Such a framework could be a
>> better fit for soft-ABMC. For example, if overcommit is allowed, we
>> would just report the number of simultaneous RMIDs we were able to
>> probe as num_rmids. I would want the same shared assignment scheduler
>> to be able to work with RMIDs and counters, though.
>>
>> Thanks,
>> -Peter
>>
>> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
> 
> Reinette
> 
> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
> 

-- 
Thanks
Babu Moger
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Babu,

On 5/22/25 8:44 AM, Moger, Babu wrote:
> On 5/21/25 18:03, Reinette Chatre wrote:

...

>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>> make it clear what the system is capable of wrt counter assignments.
> 
> Yes, that makes sense. Perhaps we can also simplify it further:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
> [mbm_cntr_evt_assign] <- for ABMC
>  mbm_cntr_grp_assign  <- for soft-ABMC

Looks good to me. Thank you.

>> Considering this the interface should be clear:
>> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
>> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
>> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
>> of counters that can be assigned to entire monitoring groups impacting all MBM events.
>>
>> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
>> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
>> interface behaves wrt assignment, no surprises of multiple events impacted when
>> assigning/unassigning single event.
>>
>> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
>> of "active" RMIDs that the system can support *and* changing the name of the modes
>> to help user interpret num_mbm_cntrs.
> 
> Sure. The option A: fits well here then.
> 
>  Option A (counters are RMIDs):
>  # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>  0=31;1=31

Thank you for considering.
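
For illustration only, here is a minimal user-space sketch in C of how a
per-domain list such as the "0=31;1=31" quoted above could be split into
domain/value pairs. The helper name parse_domain_list() and its return
convention are assumptions made for this example; nothing like it exists in
resctrl or in this series.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of resctrl: parse a per-domain list such as
 * "0=31;1=31" into ids[]/vals[]. Returns the number of entries parsed, or -1
 * on malformed input or when more than max entries are present.
 */
static int parse_domain_list(const char *s, int *ids, int *vals, int max)
{
	int n = 0;

	assert(s && ids && vals);
	while (*s && *s != '\n') {
		int id, val, used = 0;

		if (n >= max)
			return -1;
		/* Each entry has the form "<domain id>=<value>". */
		if (sscanf(s, "%d=%d%n", &id, &val, &used) != 2 || used == 0)
			return -1;
		ids[n] = id;
		vals[n] = val;
		n++;
		s += used;
		/* Entries are separated by ';'; a newline or NUL ends the list. */
		if (*s == ';')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return n;
}
```

The parsing is the same in either mode. Only the meaning of each value
differs: hardware counters per domain for ABMC, "active" RMIDs per domain
under Option A for soft-ABMC.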

Please add the requirements from this discussion to your running list. Also please keep in mind
how soft-ABMC intends to use the interfaces created by this work so that the documentation that
accompanies the ABMC support in this series leaves enough "wiggle room" for soft-ABMC to be built on top.

>>> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
>>
>> Reinette
>>
>> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
>> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
>>
> 

Reinette
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months ago
Hi Reinette,

On 5/22/2025 11:33 AM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/22/25 8:44 AM, Moger, Babu wrote:
>> On 5/21/25 18:03, Reinette Chatre wrote:
> 
> ...
> 
>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>> make it clear what the system is capable of wrt counter assignments.
>>
>> Yes, that makes sense. Perhaps we can also simplify it further:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>> [mbm_cntr_evt_assign] <- for ABMC
>>   mbm_cntr_grp_assign  <- for soft-ABMC
> 
> Looks good to me. Thank you.

I am actually ready with the v14 series. I have a good feeling that we are
getting closer to making these changes final.

So, looking back again, it might make more sense to rename a few
user-visible interfaces.

1. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
    [mbm_assign_event] <- for ABMC
     mbm_assign_group  <- for soft-ABMC

    This looks much cleaner. It matches "mbm_assign_mode".

Similarly, we can rename a few functions and variables to make them a
little more readable.

2. mbm_cntr_assignable -> mbm_assignable

3. resctrl_arch_mbm_cntr_assign_enabled
  -> resctrl_arch_mbm_assign_enabled

4. mbm_cntr_assign_enabled -> mbm_assign_enabled

5. resctrl_arch_mbm_cntr_assign_set_one
  -> resctrl_arch_mbm_assign_set_one

6. There will be a few more functions. I will look into those if you agree
with the approach.

7. No need to change the ones below. They relate to actual counters.
    num_mbm_cntrs
    available_mbm_cntrs

What do you think?

Thanks
Babu Moger
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months ago
Hi Babu,

On 6/10/25 4:19 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/22/2025 11:33 AM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/22/25 8:44 AM, Moger, Babu wrote:
>>> On 5/21/25 18:03, Reinette Chatre wrote:
>>
>> ...
>>
>>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>>> make it clear what the system is capable of wrt counter assignments.
>>>
>>> Yes, that makes sense. Perhaps we can also simplify it further:
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>>> [mbm_cntr_evt_assign] <- for ABMC
>>>   mbm_cntr_grp_assign  <- for soft-ABMC
>>
>> Looks good to me. Thank you.
> 
> I am actually ready with v14 series. I have good feeling that we are getting closer to making these changes final.
> 
> So, Looking back again, it might make more sense to rename few user visible interfaces.
> 
> 1. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode.
>    [mbm_assign_event] <- for ABMC
>     mbm_assign_group  <- for soft-ABMC
> 
>    This looks much more cleaner.  It matches with "mbm_assign_mode"

Ah, I see: dropping "cntr" reduces the confusion that arises because ABMC
assigns counters while soft-ABMC assigns RMIDs. This looks good.

Taking this further, the "assign" term in "mbm_assign_event" and "mbm_assign_group" may also
be redundant considering that the filename, "mbm_assign_mode", already has "assign" in its name.

> 
> Similarly, we can rename few functions and variable names to make little more readable.
> 
> 2. mbm_cntr_assignable -> mbm_assignable
> 

I have no insight into how the soft-ABMC implementation will look and thus if it will
build on this property. If soft-ABMC uses the property then making it more generic may
help, but if it does not then it may make the code harder to read. Since this is all
internal I'd vote for keeping it mbm_cntr_assignable since the current implementation
directly associates it with hardware counters. I do not know if there will be a scenario
where a system may support *both* event and group assignable counters. The idea did
briefly come up[1]. If that may be possible then resctrl would need to distinguish them.
Also, interesting to note that the example used in (1) above notes a system that
supports both event and group assignment.

> 3. resctrl_arch_mbm_cntr_assign_enabled
>  -> >resctrl_arch_mbm_assign_enabled
> 

This is directly connected to the choice for (2).

> 4. mbm_cntr_assign_enabled -> mbm_assign_enabled

hmmm ... here mbm_cntr_assign_enabled is even more directly associated with hardware
support for counter assignment. It is not clear what the benefit is to make it generic.

> 
> 5. resctrl_arch_mbm_cntr_assign_set_one ->
> 
>    resctrl_arch_mbm_assign_set_one.

Same as (4)

> 
> 6. There will few more functions. I will look into that if you agree with approach.
> 
> 7. No need to change few of these below. These are related to actual counters.
>    num_mbm_cntrs
>    available_mbm_cntrs
> 
> What do you think?

It sounds to me as though you are aiming to make the ABMC implementation more
generic in preparation for soft-ABMC support. If you have insight into the soft-ABMC
implementation then please share the details for this to be taken into account.
Until then I think it will be simpler for the implementation to be specific to
the feature being enabled here. When soft-ABMC enabling arrives the needed changes
can be made. Since this is about internals of resctrl (not the user interface) we
are not as pressured to "get it right" while not having all information required
to make these choices.

Reinette
 
[1] https://lore.kernel.org/lkml/CALPaoCj438UfH3QA_VnGo-pj2a_48sJufUWjBKT3MQatcMJ_Uw@mail.gmail.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months ago
Hi Reinette,

On 6/11/2025 1:29 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/10/25 4:19 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/22/2025 11:33 AM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/22/25 8:44 AM, Moger, Babu wrote:
>>>> On 5/21/25 18:03, Reinette Chatre wrote:
>>>
>>> ...
>>>
>>>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>>>> make it clear what the system is capable of wrt counter assignments.
>>>>
>>>> Yes, that makes sense. Perhaps we can also simplify it further:
>>>>
>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>>>> [mbm_cntr_evt_assign] <- for ABMC
>>>>    mbm_cntr_grp_assign  <- for soft-ABMC
>>>
>>> Looks good to me. Thank you.
>>
>> I am actually ready with v14 series. I have good feeling that we are getting closer to making these changes final.
>>
>> So, Looking back again, it might make more sense to rename few user visible interfaces.
>>
>> 1. # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode.
>>     [mbm_assign_event] <- for ABMC
>>      mbm_assign_group  <- for soft-ABMC
>>
>>     This looks much more cleaner.  It matches with "mbm_assign_mode"
> 
> ah, I see, by dropping "cntr" it reduces confusion where ABMC assigns counters
> and soft-ABMC assigned RMID. This looks good.
> 
> Taking this further, the "assign" term in "mbm_assign_event" and "mbm_assign_group" may also
> be redundant considering that the filename, "mbm_assign_mode", already has "assign" in its name.

OK, sure. It will be:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
   [mbm_event] <- for ABMC
    mbm_group  <- for soft-ABMC
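
As a rough user-space sketch (C, with a hypothetical helper name), the
bracketed entry in a listing like the one above is the active mode and could
be extracted as follows. The mode names are still the ones proposed in this
thread, and the layout is assumed from the example rather than from a
finalized interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) entry of a mode listing
 * such as "[mbm_event]\n mbm_group\n" into out. Returns 0 on success, or -1
 * if there is no well-formed bracketed entry or it does not fit in out.
 */
static int active_mode(const char *text, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(text && out);
	open = strchr(text, '[');
	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;
	/* Length of the name between the brackets. */
	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

User space would read the whole file into a buffer first and then compare
the result against the mode names it knows how to handle.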


> 
>>
>> Similarly, we can rename few functions and variable names to make little more readable.
>>
>> 2. mbm_cntr_assignable -> mbm_assignable
>>
> 
> I have no insight into how the soft-ABMC implementation will look and thus if it will
> build on this property. If soft-ABMC uses the property then making it more generic may
> help, but if it does not then it may make the code harder to read. Since this is all
> internal I'd vote for keeping it mbm_cntr_assignable since the current implementation
> directly associates it with hardware counters. I do not know if there will be a scenario
> where a system may support *both* event and group assignable counters. The idea did
> briefly come up[1]. If that may be possible then resctrl would need to distinguish them.
> Also, interesting to note that the example used in (1) above notes a system that
> supports both event and group assignment.

Ok. That is fine. Let's keep it as is then.

> 
>> 3. resctrl_arch_mbm_cntr_assign_enabled
>>   -> >resctrl_arch_mbm_assign_enabled
>>
> 
> This is directly connected to choice for (2)

Ok.

> 
>> 4. mbm_cntr_assign_enabled -> mbm_assign_enabled
> 
> hmmm ... here mbm_cntr_assign_enabled is even more directly associated with hardware
> support for counter assignment. It is not clear what the benefit is to make it generic.

Ok.

> 
>>
>> 5. resctrl_arch_mbm_cntr_assign_set_one ->
>>
>>     resctrl_arch_mbm_assign_set_one.
> 
> Same as (4)
> 
>>
>> 6. There will few more functions. I will look into that if you agree with approach.
>>
>> 7. No need to change few of these below. These are related to actual counters.
>>     num_mbm_cntrs
>>     available_mbm_cntrs
>>
>> What do you think?
> 
> It sounds to me as though you are aiming to make the ABMC implementation more
> generic in preparation for soft-ABMC support. If you have insight into the soft-ABMC
> implementation then please share the details for this to be taken into account.
> Until then I think it will be simpler for the implementation to be specific to
> the feature being enabled here. When soft-ABMC enabling arrives the needed changes
> can be made. Since this is about internals of resctrl (not the user interface) we
> are not as pressured to "get it right" while not having all information required
> to make these choices.

Ok. Sure. That is fine. Let's keep the internals implementation-specific
for now.

Thanks
Babu

> 
> Reinette
>   
> [1] https://lore.kernel.org/lkml/CALPaoCj438UfH3QA_VnGo-pj2a_48sJufUWjBKT3MQatcMJ_Uw@mail.gmail.com/
> 

Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 6 months, 3 weeks ago
Hi Reinette,

On 5/22/25 11:33, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/22/25 8:44 AM, Moger, Babu wrote:
>> On 5/21/25 18:03, Reinette Chatre wrote:
> 
> ...
> 
>>> This is why I proposed in [3] that the name of the mode reflects how user can interact
>>> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
>>> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
>>> make it clear what the system is capable of wrt counter assignments.
>>
>> Yes, that makes sense. Perhaps we can also simplify it further:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:
>> [mbm_cntr_evt_assign] <- for ABMC
>>  mbm_cntr_grp_assign  <- for soft-ABMC
> 
> Looks good to me. Thank you.
> 
>>> Considering this the interface should be clear:
>>> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
>>> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
>>> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
>>> of counters that can be assigned to entire monitoring groups impacting all MBM events.
>>>
>>> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
>>> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
>>> interface behaves wrt assignment, no surprises of multiple events impacted when
>>> assigning/unassigning single event.
>>>
>>> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
>>> of "active" RMIDs that the system can support *and* changing the name of the modes
>>> to help user interpret num_mbm_cntrs.
>>
>> Sure. The option A: fits well here then.
>>
>>  Option A (counters are RMIDs):
>>  # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>  0=31;1=31
> 
> Thank you for considering.
> 
> Please add the requirements from this discussion to your running list. Also please keep in mind
> how soft-ABMC intends to use the interfaces created by this work so that the documentation that
> accompanies the ABMC support in this series leaves enough "wiggle room" for soft-ABMC to be built on top.

Sure. Thanks

> 
>>>> [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
>>>
>>> Reinette
>>>
>>> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
>>> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
>>>
>>
> 
> Reinette
> 

-- 
Thanks
Babu Moger
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 6 months, 3 weeks ago
On Wed, May 21, 2025 at 04:03:37PM -0700, Reinette Chatre wrote:
> Hi Peter and Babu,
> 
> On 5/21/25 2:18 AM, Peter Newman wrote:
> > Hi Babu/Reinette,
> > 
> > On Wed, May 21, 2025 at 1:44 AM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Babu,
> >>
> >> On 5/20/25 4:25 PM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>>>> Hi Reinette,
> >>>>>
> >>>>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>>>> Hi Babu,
> >>>>>>
> >>>>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>>>> counters available for assignment.
> >>>>>>>>
> >>>>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>>>> being assigned at once.
> >>>>>>>>
> >>>>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>>>> can be assigned separately from counters which can be assigned.
> >>>>>>>
> >>>>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>>>> number of counters available per domain, which is 32.
> >>>>>>>
> >>>>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>>>> support that level of assignment.
> >>>>>>>
> >>>>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>>>> later once we have the necessary data structure and requirements.
> >>>>>>
> >>>>>> Looks like some requirements did not make it into this implementation.
> >>>>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>>>> there is a question to Peter in there on how to determine how many "counters"
> >>>>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>>>> information would be available in a future AMD publication.
> >>>>>
> >>>>> We already have a method to determine the number of counters in soft-ABMC
> >>>>> mode, which Peter has addressed [4].
> >>>>>
> >>>>> [4]
> >>>>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@google.com/
> >>>>>
> >>>>> This appears to be more of a workaround, and I doubt it will be included
> >>>>> in any official AMD documentation. Additionally, the long-term direction
> >>>>> is moving towards ABMC.
> >>>>>
> >>>>> I don’t believe this workaround needs to be part of the current series. It
> >>>>> can be added later when soft-ABMC is implemented.
> >>>>
> >>>> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >>>> catching this!).
> >>>>
> >>>> It is important to keep track of requirements while working on a feature to
> >>>> ensure that the implementation supports the planned use cases. Re-reading that
> >>>> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >>>> Could you please share how you see it progress from this implementation?
> >>>> This includes the single event vs. multiple event assignment. I would like to
> >>>> highlight that this is not a request for this to be supported in this implementation
> >>>> but there needs to be a plan for how this can be supported on top of interfaces
> >>>> established by this work.
> >>>>
> >>>
> >>> Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >>>
> >>> Soft-ABMC:
> >>> a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> > 
> > I would call it a hardware-defined limit that can be probed by software.
> > 
> > The main question is whether this file returns the exact number of
> > RMIDs hardware can track or double that number (mbm_total_bytes +
> > mbm_local_bytes) so that the value is always measured in events.
> 
> tl;dr: I continue [3] to find it most intuitive for num_mbm_cntrs to be the exact
> number of "active" RMIDs that the system can support *and* changing the name of
> the modes to help user interpret num_mbm_cntrs: "mbm_cntr_event_assign" for ABMC,
> "mbm_cntr_group_assign" for soft-ABMC.
> 
> details
> -------
> 
> We are now back to the previous discussion about what user can expect from
> the interface. Let me try and re-cap that discussion so that we can all hopefully
> get back on the same page. Please add corrections/updates where needed.
> 
> soft-ABMC
> ---------
>   soft-ABMC manages "active" (term TBD) RMID assignment to monitor groups. When an
>   "active" RMID is assigned to a monitor group then *all* MBM events (not LLC occupancy)
>   in that monitor group are counted. "Active" RMID assignment can be done per domain.
> 
>   Requirement: resctrl should accurately reflect which events are counted. That is,
>   we do not want resctrl to pretend to allow user to assign an "active" RMID to
>   only one event in a monitor group while all events are actually counted.
> 
>   Caveat: To support rapid re-assignment of RMIDs to monitor groups, llc_occupancy
>   event is disabled when soft-ABMC is enabled.
> 
> ABMC
> ----
>   ABMC manages (hardware) counter assignment to monitor group (RMID), event pairs.
>   When a hardware counter is assigned to an RMID, event pair then only that
>   RMID, event is counted. Hardware counter assignment can be done per domain.
> 
> 
> shared assignment
> -----------------
> A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
> "counter" (could be hardware counter or "active" RMID) as shared and that means
> the counter within that domain is shared between different monitor groups and actual
> assignment is scheduled by resctrl.  
> 
> 
> user interface
> --------------
> 
> Next, consider the interface while keeping above definitions and requirements in mind.
> 
> This series introduces (using implementation, not cover-letter):
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> "num_mbm_cntrs":                                                               
> 	The maximum number of monitoring counters (total of available and assigned
> 	counters) in each domain when the system supports mbm_cntr_assign mode. 
> 
> /sys/fs/resctrl/mbm_L3_assignments
> "mbm_L3_assignments":                                                          
> 	This interface file is created when the mbm_cntr_assign mode is supported
> 	and shows the assignment status for each group.              
> 
> Consider "mbm_L3_assignments" first. The interface is documented for ABMC support
> where it is possible to manage individual event assignment within monitor group.
> 
> For ABMC it is possible to assign just one event at a time and doing so consumes
> one counter in that domain:
> 
> a) Starting state on system with 32 counters per domain, two events in default
>    resource group consumes two counters in that domain:
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=32
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=e
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=30;1=31
> 
> The question is how this should look on soft-ABMC system. Let's say hypothetically
> that on a soft-ABMC system it is possible to have 32 "active" RMIDs.
> 
> a) Starting state on system with 32 "active RMIDs" per domain, two events in default
>    resource group consumes one RMID in that domain:
> 
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=_
> mbm_local_bytes:0=e;1=_
> 
> What should num_mbm_cntrs display?
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=32
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=64
> 
> b) Assign counter to mbm_local_bytes in domain 1:
> # echo "mbm_local_bytes:1=e" > /sys/fs/resctrl/mbm_L3_assignments
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
> 
> Note that even though user requested only mbm_local_bytes to be assigned, it
> actually results in both mbm_total_bytes and mbm_local_bytes to be assigned. This
> ensures accurate state representation to user space but this also creates an
> inconsistent user interface between soft-ABMC and ABMC since user space intends
> to use the same interface but "sometimes" assigning one event results in assign
> of one event while "sometimes" it results in assign of multiple events.
> 
> wrt "num_mbm_cntrs"
> 
> Option A (counters are RMIDs):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=31;1=31
> 
> Option B (pretend RMIDs are events):
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 0=62;1=62 
> 
> Neither option seems ideal to me since the interface cannot be consistent
> between ABMC and soft-ABMC.
> As I mentioned in [2] it is not possible to hide ABMC and soft-ABMC behind
> the same interface. When user space wants to monitor a particular monitor group
> then it should be clear how that can be accomplished. Not knowing if
> an assignment/unassignment to/from an event would impact one or all events
> and whether it will consume one or multiple counters does not sound like a good
> interface to me. 
> 
> As I understand current interface, user is required to know how ABMC and soft-ABMC
> is implemented to be able to configure the system. For example, if user has file like:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> user must know underlying implementation to be able to manage monitoring of
> events and assigning counters otherwise it will be a surprise to lose monitoring
> of all events when unassigning one event.
> 
> This is why I proposed in [3] that the name of the mode reflects how user can interact
> with the system. Instead of one "mbm_cntr_assign" mode there can be "mbm_cntr_event_assign"
> that is used for ABMC and "mbm_cntr_group_assign" that is used for soft-ABMC. The mode should
> make it clear what the system is capable of wrt counter assignments.
> 
> Considering this the interface should be clear:
> num_mbm_cntrs: reflects the number of counters in each domain that can be assigned. In
> "mbm_cntr_event_assign" this will be the number of counters that can be assigned to 
> each event within a monitoring group, in "mbm_cntr_group_assign" this will be the number
> of counters that can be assigned to entire monitoring groups impacting all MBM events.
> 
> mbm_L3_assignments: manages the counter assignment in each group. When user knows the mode
> is "mbm_cntr_event_assign"/"mbm_cntr_group_assign" then it should be clear to user space how the
> interface behaves wrt assignment, no surprises of multiple events impacted when
> assigning/unassigning single event.
> 
> For soft-ABMC I thus find it most intuitive for num_mbm_cntrs to be the exact number
> of "active" RMIDs that the system can support *and* changing the name of the modes
> to help user interpret num_mbm_cntrs.
> 
> > 
> > There's also the mongroup-RMID overcommit use case I described
> > above[1]. On Intel we can safely assume that there are counters to
> > back all RMIDs, so num_mbm_cntrs would be calculated directly from
> > num_rmids.
> 
> This is about the:
> 	There's now more interest in Google for allowing explicit control of
> 	where RMIDs are assigned on Intel platforms. Even though the number of
> 	RMIDs implemented by hardware tends to be roughly the number of
> 	containers they want to support, they often still need to create
> 	containers when all RMIDs have already been allocated, which is not
> 	currently allowed. Once the container has been created and starts
> 	running, it's no longer possible to move its threads into a monitoring
> 	group whenever RMIDs should become available again, so it's important
> 	for resctrl to maintain an accurate task list for a container even
> 	when RMIDs are not available.
> 
> I see a monitor group as a collection of tasks that need to be monitored together.
> The "task list" is the group of tasks that share a monitoring ID that
> is required to be a valid ID since when any of the tasks are scheduled that ID is
> written to the hardware. I intentionally tried to not use RMID since I believe
> this is required for all archs.
> I thus do not understand how a task can start running when it does not have
> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> there can never be "unmonitored tasks", no? I think I am missing something here.

In the AMD/RMID implementation this might be achieved with something
extra in the task structure to denote whether a task is in a monitored
group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
Tasks in an unmonitored group retain their "task->rmid" (that's what
identifies them as a member of a group) but have task->rmid_valid set
to false.  Context switch code would be updated to load "0" into the
IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
would still be monitored, but activity would be bundled with all
tasks in the default resctrl group.
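
A stand-alone sketch of that selection logic, in plain C rather than kernel
code, might look like the following. The struct, the helper name and the
DEFAULT_RMID constant are illustrative assumptions; the real change would
live in the task structure and the context switch path, which are not shown
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the per-task monitoring state; not kernel code. */
struct task_mon {
	uint32_t rmid;       /* identifies the monitor group the task belongs to */
	bool     rmid_valid; /* false while the group has no usable RMID */
};

#define DEFAULT_RMID 0u /* RMID of the default resctrl group */

/*
 * RMID that a context switch would load into the hardware association
 * register for this task. Tasks in an unmonitored group keep their rmid
 * for group membership but are counted under the default group.
 */
static uint32_t rmid_to_load(const struct task_mon *t)
{
	assert(t);
	return t->rmid_valid ? t->rmid : DEFAULT_RMID;
}
```

Because the stored rmid is never overwritten, setting rmid_valid back to
true is all that is needed to resume per-group counting for the task.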

Presumably something analogous could be done for ARM/MPAM.

> > I realized this use case is more difficult to implement on MPAM,
> > because a PARTID is effectively a CLOSID+RMID, so deferring assigning
> > a unique PARTID to a group also results in it being in a different
> > allocation group. It will work if the unmonitored groups could find a
> > way to share PARTIDs, but this has consequences on allocation - but
> > hopefully no worse than sharing CLOSIDs on x86.
> > 
> > There's a lot of interest in monitoring ID overcommit in Google, so I
> > think it's worth it for me to investigate the additional structural
> > changes needed in resctrl (i.e., breaking the FS-level association
> > between mongroups and HW monitoring IDs). Such a framework could be a
> > better fit for soft-ABMC. For example, if overcommit is allowed, we
> > would just report the number of simultaneous RMIDs we were able to
> > probe as num_rmids. I would want the same shared assignment scheduler
> > to be able to work with RMIDs and counters, though.
> > 
> > Thanks,
> > -Peter
> > 
> > [1] https://lore.kernel.org/lkml/CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com/
> 
> Reinette
> 
> [2] https://lore.kernel.org/lkml/b9e48e8f-3035-4a7e-a983-ce829bd9215a@intel.com/
> [3] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/

-Tony
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Tony,

On 5/21/25 4:43 PM, Luck, Tony wrote:
> On Wed, May 21, 2025 at 04:03:37PM -0700, Reinette Chatre wrote:
>> Hi Peter and Babu,
>>
>> On 5/21/25 2:18 AM, Peter Newman wrote:

..

>>> There's also the mongroup-RMID overcommit use case I described
>>> above[1]. On Intel we can safely assume that there are counters to
>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>> num_rmids.
>>
>> This is about the:
>> 	There's now more interest in Google for allowing explicit control of
>> 	where RMIDs are assigned on Intel platforms. Even though the number of
>> 	RMIDs implemented by hardware tends to be roughly the number of
>> 	containers they want to support, they often still need to create
>> 	containers when all RMIDs have already been allocated, which is not
>> 	currently allowed. Once the container has been created and starts
>> 	running, it's no longer possible to move its threads into a monitoring
>> 	group whenever RMIDs should become available again, so it's important
>> 	for resctrl to maintain an accurate task list for a container even
>> 	when RMIDs are not available.
>>
>> I see a monitor group as a collection of tasks that need to be monitored together.
>> The "task list" is the group of tasks that share a monitoring ID that
>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>> written to the hardware. I intentionally tried to not use RMID since I believe
>> this is required for all archs.
>> I thus do not understand how a task can start running when it does not have
>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
> In the AMD/RMID implementation this might be achieved with something
> extra in the task structure to denote whether a task is in a monitored
> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> Tasks in an unmonitored group retain their "task->rmid" (that's what
> identifies them as a member of a group) but have task->rmid_valid set
> to false.  Context switch code would be updated to load "0" into the
> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> would still be monitored, but activity would be bundled with all
> tasks in the default resctrl group.
> 
> Presumably something analogous could be done for ARM/MPAM.
> 

I do not interpret this as an unmonitored task but instead a task that
belongs to the default resource group. Specifically, any data accumulated by
such a task is attributed to the default resource group. Having tasks
in a separate group but their monitoring data accumulating in/contributed to
the default resource group (that has its own set of tasks) sounds wrong to me. 
Such an implementation makes any monitoring data of the default resource group
invalid and, by extension, makes it impossible to use the default resource group
to manage an allocation for a group of monitor groups if user space needs insight
into monitoring data across all these monitor groups. User space will need to
interact with resctrl differently and query monitor groups individually instead
of querying the CTRL_MON group once.

Reinette
RE: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 6 months, 3 weeks ago
> >>> There's also the mongroup-RMID overcommit use case I described
> >>> above[1]. On Intel we can safely assume that there are counters to
> >>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> >>> num_rmids.
> >>
> >> This is about the:
> >>    There's now more interest in Google for allowing explicit control of
> >>    where RMIDs are assigned on Intel platforms. Even though the number of
> >>    RMIDs implemented by hardware tends to be roughly the number of
> >>    containers they want to support, they often still need to create
> >>    containers when all RMIDs have already been allocated, which is not
> >>    currently allowed. Once the container has been created and starts
> >>    running, it's no longer possible to move its threads into a monitoring
> >>    group whenever RMIDs should become available again, so it's important
> >>    for resctrl to maintain an accurate task list for a container even
> >>    when RMIDs are not available.
> >>
> >> I see a monitor group as a collection of tasks that need to be monitored together.
> >> The "task list" is the group of tasks that share a monitoring ID that
> >> is required to be a valid ID since when any of the tasks are scheduled that ID is
> >> written to the hardware. I intentionally tried to not use RMID since I believe
> >> this is required for all archs.
> >> I thus do not understand how a task can start running when it does not have
> >> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> >> there can never be "unmonitored tasks", no? I think I am missing something here.
> >
> > In the AMD/RMID implementation this might be achieved with something
> > extra in the task structure to denote whether a task is in a monitored
> > group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> > Tasks in an unmonitored group retain their "task->rmid" (that's what
> > identifies them as a member of a group) but have task->rmid_valid set
> > to false.  Context switch code would be updated to load "0" into the
> > IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> > would still be monitored, but activity would be bundled with all
> > tasks in the default resctrl group.
> >
> > Presumably something analogous could be done for ARM/MPAM.
> >
>
> I do not interpret this as an unmonitored task but instead a task that
> belongs to the default resource group. Specifically, any data accumulated by
> such a task is attributed to the default resource group. Having tasks
> in a separate group but their monitoring data accumulating in/contributed to
> the default resource group (that has its own set of tasks) sounds wrong to me.
> Such an implementation makes any monitoring data of default resource group
> invalid, and by extension impossible to use default resource group to manage
> an allocation for a group of monitor groups if user space needs insight
> in monitoring data across all these monitor groups. User space will need to
> interact with resctrl differently and individually query monitor groups instead
> of CTRL_MON group once.

Maybe assign one of the limited supply of RMIDs for these "unmonitored"
tasks. Populate a resctrl group named "unmonitored" that lists all the
unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
for these tasks in a normal-looking "mon_data" directory.

-Tony
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 6 months, 3 weeks ago
Hi Tony, Reinette,

On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> > >>> There's also the mongroup-RMID overcommit use case I described
> > >>> above[1]. On Intel we can safely assume that there are counters to
> > >>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> > >>> num_rmids.
> > >>
> > >> This is about the:
> > >>    There's now more interest in Google for allowing explicit control of
> > >>    where RMIDs are assigned on Intel platforms. Even though the number of
> > >>    RMIDs implemented by hardware tends to be roughly the number of
> > >>    containers they want to support, they often still need to create
> > >>    containers when all RMIDs have already been allocated, which is not
> > >>    currently allowed. Once the container has been created and starts
> > >>    running, it's no longer possible to move its threads into a monitoring
> > >>    group whenever RMIDs should become available again, so it's important
> > >>    for resctrl to maintain an accurate task list for a container even
> > >>    when RMIDs are not available.
> > >>
> > >> I see a monitor group as a collection of tasks that need to be monitored together.
> > >> The "task list" is the group of tasks that share a monitoring ID that
> > >> is required to be a valid ID since when any of the tasks are scheduled that ID is
> > >> written to the hardware. I intentionally tried to not use RMID since I believe
> > >> this is required for all archs.
> > >> I thus do not understand how a task can start running when it does not have
> > >> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> > >> there can never be "unmonitored tasks", no? I think I am missing something here.

You are correct. I did forget to mention something...

> > >
> > > In the AMD/RMID implementation this might be achieved with something
> > > extra in the task structure to denote whether a task is in a monitored
> > > group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> > > Tasks in an unmonitored group retain their "task->rmid" (that's what
> > > identifies them as a member of a group) but have task->rmid_valid set
> > > to false.  Context switch code would be updated to load "0" into the
> > > IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> > > would still be monitored, but activity would be bundled with all
> > > tasks in the default resctrl group.
> > >
> > > Presumably something analogous could be done for ARM/MPAM.
> > >
> >
> > I do not interpret this as an unmonitored task but instead a task that
> > belongs to the default resource group. Specifically, any data accumulated by
> > such a task is attributed to the default resource group. Having tasks
> > in a separate group but their monitoring data accumulating in/contributed to
> > the default resource group (that has its own set of tasks) sounds wrong to me.
> > Such an implementation makes any monitoring data of default resource group
> > invalid, and by extension impossible to use default resource group to manage
> > an allocation for a group of monitor groups if user space needs insight
> > in monitoring data across all these monitor groups. User space will need to
> > interact with resctrl differently and individually query monitor groups instead
> > of CTRL_MON group once.
>
> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
> tasks. Populate a resctrl group named "unmonitored" that lists all the
> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
> for these tasks in normal looking "mon_data" directory.

I needed to switch to an rdtgroup struct pointer rather than hardware
IDs in the task structure to indicate group membership[1], otherwise
it's not possible to determine which tasks are in a group when it
doesn't have a unique HW ID value.

Also this is required for shared assignment so that changing a group's
IDs in a domain only requires updating running tasks rather than
needing to search the entire task list, which would lead to the same
problem we encountered in mongroup rename[2].
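
To make the difference concrete, here is a simplified user-space sketch in C.
It is not the code from [1]: "struct group", "struct task" and the helpers are
hypothetical stand-ins, reduced to the fields needed to show membership by
pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins; not the kernel's definitions. */
struct group {
	struct group *parent;  /* CTRL_MON parent of a MON group, else NULL */
	uint32_t closid;       /* meaningful on control groups only */
	uint32_t rmid;         /* may be shared, or change over time */
};

struct task {
	struct group *grp;     /* membership is the pointer, not a HW ID */
};

/* Membership test that works even when two groups share an RMID. */
static bool task_in_group(const struct task *t, const struct group *g)
{
	return t->grp == g;
}

/* IDs are resolved through the group at the time they are needed. */
static uint32_t task_closid(const struct task *t)
{
	const struct group *g = t->grp;

	return g->parent ? g->parent->closid : g->closid;
}

static uint32_t task_rmid(const struct task *t)
{
	return t->grp->rmid;
}
```

Changing a group's rmid field is then visible to every member the next time
its IDs are resolved, without walking the task list. The price is the extra
pointer dereferences in task_closid() and task_rmid().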

-Peter

[1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
[2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 6 months, 3 weeks ago
> > Maybe assign one of the limited supply of RMIDs for these "unmonitored"
> > tasks. Populate a resctrl group named "unmonitored" that lists all the
> > unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
> > for these tasks in normal looking "mon_data" directory.
> 
> I needed to switch to an rdtgroup struct pointer rather than hardware
> IDs in the task structure to indicate group membership[1], otherwise
> it's not possible to determine which tasks are in a group when it
> doesn't have a unique HW ID value.
> 
> Also this is required for shared assignment so that changing a group's
> IDs in a domain only requires updating running tasks rather than
> needing to search the entire task list, which would lead to the same
> problem we encountered in mongroup rename[2].

Having a pointer to the rdtgroup in the task structure does make
file system operations easier. But the cost appears to be more
complexity (and memory references) in the context switch code.

Your patch[1] seems to do some extra work outside of the static_branch
protected sections. So it adds a cost to context switch even if resctrl
is not in use.

Chasing pointers "closid = rgrp->mon.parent->closid;" could be
expensive when those miss in the cache.

> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
> [2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/

-Tony
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Peter,

On 5/22/25 1:47 AM, Peter Newman wrote:
> Hi Tony, Reinette,
> 
> On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@intel.com> wrote:
>>
>>>>>> There's also the mongroup-RMID overcommit use case I described
>>>>>> above[1]. On Intel we can safely assume that there are counters to
>>>>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>>>>> num_rmids.
>>>>>
>>>>> This is about the:
>>>>>    There's now more interest in Google for allowing explicit control of
>>>>>    where RMIDs are assigned on Intel platforms. Even though the number of
>>>>>    RMIDs implemented by hardware tends to be roughly the number of
>>>>>    containers they want to support, they often still need to create
>>>>>    containers when all RMIDs have already been allocated, which is not
>>>>>    currently allowed. Once the container has been created and starts
>>>>>    running, it's no longer possible to move its threads into a monitoring
>>>>>    group whenever RMIDs should become available again, so it's important
>>>>>    for resctrl to maintain an accurate task list for a container even
>>>>>    when RMIDs are not available.
>>>>>
>>>>> I see a monitor group as a collection of tasks that need to be monitored together.
>>>>> The "task list" is the group of tasks that share a monitoring ID that
>>>>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>>>>> written to the hardware. I intentionally tried to not use RMID since I believe
>>>>> this is required for all archs.
>>>>> I thus do not understand how a task can start running when it does not have
>>>>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>>>>> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
> You are correct. I did forget to mention something...
> 
>>>>
>>>> In the AMD/RMID implementation this might be achieved with something
>>>> extra in the task structure to denote whether a task is in a monitored
>>>> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
>>>> Tasks in an unmonitored group retain their "task->rmid" (that's what
>>>> identifies them as a member of a group) but have task->rmid_valid set
>>>> to false.  Context switch code would be updated to load "0" into the
>>>> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
>>>> would still be monitored, but activity would be bundled with all
>>>> tasks in the default resctrl group.
>>>>
>>>> Presumably something analogous could be done for ARM/MPAM.
>>>>
>>>
>>> I do not interpret this as an unmonitored task but instead a task that
>>> belongs to the default resource group. Specifically, any data accumulated by
>>> such a task is attributed to the default resource group. Having tasks
>>> in a separate group but their monitoring data accumulating in/contributed to
>>> the default resource group (that has its own set of tasks) sounds wrong to me.
>>> Such an implementation makes any monitoring data of default resource group
>>> invalid, and by extension impossible to use default resource group to manage
>>> an allocation for a group of monitor groups if user space needs insight
>>> in monitoring data across all these monitor groups. User space will need to
>>> interact with resctrl differently and individually query monitor groups instead
>>> of CTRL_MON group once.
>>
>> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
>> tasks. Populate a resctrl group named "unmonitored" that lists all the
>> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
>> for these tasks in normal looking "mon_data" directory.
> 
> I needed to switch to an rdtgroup struct pointer rather than hardware
> IDs in the task structure to indicate group membership[1], otherwise
> it's not possible to determine which tasks are in a group when it
> doesn't have a unique HW ID value.

Whether the task struct contains a pointer (albeit accompanied with its
own complexities) does not address the issue that I am concerned about.

Looking at [1] I expect this new feature handles "unmonitored" groups by
placing them in the default monitoring group, following Tony's first [3]
suggestion.

When considering [1] by itself in the context of current resctrl, all tasks
should be members of resource groups that have valid HW monitoring IDs allocated.
Using the default resource group in this way seems like addressing edge cases
where the pointer is not yet valid (unclear what these scenarios may be) instead of
routing many tasks to the default group. I am not sure and I'll have to study
that change more closely to reason accurately.

From what I understand, the new proposal that builds on [1] involves creating
new monitor groups that are "unmonitored" for any length of time, and when backed
by the implementation in [1] this would mean these groups will actually
still be monitored but with the data attributed to the default resource group.

As I mentioned in response [4] to Tony this fundamentally changes the
behavior users can expect from the default resource group. In addition,
this breaks the first of the "Resource monitoring rules" from
Documentation/filesystems/resctrl.rst:

1) If a task is a member of a MON group, or non-default CTRL_MON group          
   then RDT events for the task will be reported in that group.  

How does this fit with the ABMC work? I continue to think that I am missing
parts of the discussion as it seems this new feature discussion is mixed in
with the ABMC work.

Reinette

> 
> Also this is required for shared assignment so that changing a group's
> IDs in a domain only requires updating running tasks rather than
> needing to search the entire task list, which would lead to the same
> problem we encountered in mongroup rename[2].
> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
> [2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/
[3] https://lore.kernel.org/lkml/aC5lL_qY00vd8qp4@agluck-desk3/
[4] https://lore.kernel.org/lkml/a131e8ed-88b2-4fed-983b-5deea955a9a5@intel.com/
Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 6 months, 3 weeks ago
Hi Babu,

On 5/15/25 3:51 PM, Babu Moger wrote:
> 
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
> 
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
> 
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
> plus 
> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
> 
> It is very clear these patches will go after James's resctrl FS/ARCH
> restructure. Hoping to avoid one review cycle due to the merge.
> 
> # Introduction
> 
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that

"bandwidth monitoring feature on AMD system" -> " the bandwidth monitoring
feature on AMD systems"? or "the bandwidth monitoring feature on an AMD system".
Not sure.

> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option

"for certain point of time" -> "during a particular/certain(?) time"?

> to monitor a group or set of groups for certain period of time without

"for certain period of time" -> "for a certain period of time"?

This series contains many duplicate snippets. When you update one, please
check that all the duplicates are updated also.


> worrying about counter being reset in between.
>     
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.

Instead of tacking it on as an "additionally" I see this capability now as essential
to this new implementation. I tried to give this series a thorough review to help finalize
this work but I kept being turned around by all the descriptions and finally it dawned
on me that all the descriptions are at their core still based on the original "event ID"
based implementation with either a small append or as little change as possible to
adjust to the "extended event ID" based implementation.

The wording from the previous implementation that is still used (and copy&pasted many
times) in these descriptions, "assign a hardware counter to an RMID, event pair", can only
be accurate for this new implementation if an event is re-defined ... it is no longer the
original constrained "event IDs" but instead an MBM event has become a generic name that
identifies the configurable "bandwidth types" (but, see note about terminology later) to
be monitored. This re-definition is never done.

I assume "the given source" is the assigned RMID? If so I think it will help to
understand if this is specific: "bandwidth types from the assigned RMID ..."

I find this series to use several terms for the same concept,
for example, "bandwidth types", "memory transactions", "types of L3 transactions",
"bandwidth sources", etc. This work will be easier to consume if it uses consistent
and specific terminology.

> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
> 
> # History
> 
> Earlier implementation of ABMC had dependency on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may not be compatible with ARM's MPAM.
> 
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> Here are the finalized requirements based on the discussion:
> 
> *   Remove BMEC dependency on the ABMC feature.

Even stronger, BMEC and ABMC are now "incompatible" in that resctrl will not let them be used
at the same time.

> 
> *   Eliminate global assignment listing. The interface
>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
> 
> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>     The configuration file names should be free-form, allowing users to create them as needed.
> 
> *   Perform assignment listing at the group level by introducing mbm_L3_assignments

"the group level" -> "the monitoring group level"

>     in each monitoring group. The listing should provide the following details:
> 
>     Event Configuration: Specifies the event configuration applied. This will be crucial
>     when "mkdir" on event configuration is added in the future, leading to the creation
>     of mon_data/mon_l3_*/<event configuration>.

hmmm ... sounds like it has become more natural to refer to it as "event configuration", which is
a good match for what the purpose is. This thus sounds like good motivation to change "counter_configs"
to "event_configs".

> 
>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
> 
>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).

Could you please add definition of what "exclusive" and "shared" means?

> 
> *   Provide option to enable or disable auto assignment when new group is created.
> 
> This series tries to address all the requirements listed above.
> 
> # Implementation details
> 
> Create a generic interface aimed to support user space assignment of scarce
> counters used for monitoring. First usage of interface is by ABMC with option
> to expand usage to "soft-ABMC" and MPAM counters in future.
> 
> Feature adds following interface files:
> 
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.

Please aim to use consistent and clear terms to help understand this work. It is
confusing that above uses "available" in description for num_mbm_cntrs and then below
there is a new interface "available_mbm_cntrs" that uses the "available" term in name
but not description.

> 
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
> counters free in each domain.
> 
> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.

Everywhere else seems to refer to this as "event configurations". Please just stick to one,
"event configuration" seems most appropriate.

> 
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
> for MBM total events.

I think "default" should be dropped to make it clear that this is the actual configuration
that is always used, not a static "default" that may be used in "some" circumstances.

> 
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
> for MBM local events.

Same wrt "default"

> 
> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.

"Per monitor group interface to list or modify counters assigned to the group."? (Please improve.)

> 
> # Examples
> 
> a. Check if ABMC support is available

Please drop the "ABMC" from all the descriptions since this is intended to be a generic interface.

> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	[mbm_cntr_assign]
> 	default
> 

I believe the naming has been finalized in
https://lore.kernel.org/lkml/7628cec8-5914-4895-8289-027e7821777e@amd.com/.
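
Whatever the final names are, the bracket convention itself is easy for user
space to consume. Here is a minimal sketch in C, assuming only what the example
above shows (one mode per line, with the enabled mode enclosed in square
brackets). The function name enabled_mode() is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (enabled) mode name from the contents of
 * mbm_assign_mode into buf. Returns 0 on success, -1 if no bracketed
 * entry is found or it does not fit in buf.
 */
static int enabled_mode(const char *text, char *buf, size_t buflen)
{
	const char *open = strchr(text, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= buflen)
		return -1;

	memcpy(buf, open + 1, len);
	buf[len] = '\0';
	return 0;
}
```

The caller is expected to have read the file into a NUL-terminated buffer first.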

> 	ABMC feature is detected and it is enabled.
> 
> b. Check how many ABMC counters are available. 

available -> supported? This will help distinguish it from the
next interface file named "available_mbm_cntrs".

> 
> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> 	32

Please update to reflect what implementation does.

> 
> c. Check how many ABMC counters are available in each domain.

"available" -> "available for assignment"

> 
> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
> 	0=30;1=30
> 
> d. Check default counter configuration.
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter 
> 	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter 
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory

Does not look like this matches implementation wrt spacing?

> 
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>    to list and modify any group's monitoring states.

"any group's" -> "that group's"

> 
> 	The list is displayed in the following format:
> 
>         <Event configuration>:<Domain id>=<Assignment type>
> 
>         Event configuration: A valid event configuration listed in the
>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
> 
>         Domain ID: A valid domain ID number.

"A valid domain ID number" -> "A valid domain ID"

> 
>         Assignment types:
> 
>         _ : No event configuration assigned
> 
>         e : Event configuration assigned in exclusive mode
> 
> 	To list the default group states:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> 	To unassign the configuration of mbm_total_bytes on domain 0:

This unassigns a counter, as opposed to a configuration, no? How about
"To unassign the counter associated with the mbm_total_bytes event"?

> 	#echo "mbm_total_bytes:0=_" > mbm_L3_assignments
> 	#cat mbm_L3_assignments

(May help to follow if the examples consistently use full paths.)

> 	mbm_total_bytes:0=_;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> 	To unassign the mbm_total_bytes configuration on all domains:

same wrt unassigning a counter

>     	$echo "mbm_total_bytes:*=_" > mbm_L3_assignments
> 	$cat mbm_L3_assignments

# prompt is usually used for administrator and $ for user without
administrator privileges. Switching between # and $ in these examples 
is confusing.

> 	mbm_total_bytes:0=_;1=_
> 	mbm_local_bytes:0=e;1=e
> 
> 	To assign the mbm_total_bytes configuration on all domains in exclusive mode:

same wrt assigning a counter
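
For reference, the listing format quoted above,
<Event configuration>:<Domain id>=<Assignment type>, could be parsed from user
space roughly as follows. This is a sketch in C under the assumption that
entries are separated by ';' and that only the '_' and 'e' types from the
examples appear in the listing. The struct and function names are made up, and
the '*' wildcard is not handled since it is only shown on writes.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DOMAINS 16

struct assignment {
	char event[64];            /* e.g. "mbm_total_bytes" */
	int  ndomains;
	int  domain[MAX_DOMAINS];  /* domain IDs, in listing order */
	char type[MAX_DOMAINS];    /* 'e' = exclusive, '_' = unassigned */
};

/* Parse one line such as "mbm_total_bytes:0=e;1=_". 0 on success, -1 if malformed. */
static int parse_assignment(const char *line, struct assignment *out)
{
	const char *colon = strchr(line, ':');
	const char *p;
	size_t len;

	if (!colon)
		return -1;
	len = (size_t)(colon - line);
	if (len == 0 || len >= sizeof(out->event))
		return -1;
	memcpy(out->event, line, len);
	out->event[len] = '\0';
	out->ndomains = 0;

	p = colon + 1;
	while (*p && *p != '\n') {
		int id, used = 0;
		char type;

		if (out->ndomains == MAX_DOMAINS)
			return -1;
		/* %n records how many characters this entry consumed. */
		if (sscanf(p, "%d=%c%n", &id, &type, &used) != 2)
			return -1;
		if (id < 0 || (type != 'e' && type != '_'))
			return -1;
		out->domain[out->ndomains] = id;
		out->type[out->ndomains] = type;
		out->ndomains++;
		p += used;
		if (*p == ';')
			p++;
	}
	return out->ndomains ? 0 : -1;
}
```

Each line read from mbm_L3_assignments would be passed to parse_assignment()
in turn.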

>     	$echo "mbm_total_bytes:*=e" > mbm_L3_assignments
> 	$cat mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the events with ABMC. If the event is unassigned
>    when reading, then the read will come back as "Unassigned".
> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	779247936
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
> 	765207488
> 	
> h. Check the default event configurations.
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> 	local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
> 	local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory
> 
> i. Change the event configuration for mbm_local_bytes.
> 
> 	#echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
> 	/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
> 	
>         This will update the assignments where mbm_local_bytes are configured.

"This will update all (across all domains of all monitor groups) counter assignments 
associated with the mbm_local_bytes event." (Please improve).

> 	
> j. Now read the total event again. The first read may come back with "Unavailable"
>    status. The subsequent read of mbm_total_bytes will display only the read events.

Was this intended to be example of reading *local* bytes after modification in previous step?
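
Independent of which event is read here, user space has to cope with a
non-numeric result. Here is a small sketch in C of how a reader might classify
the file contents, assuming only the two strings mentioned in this cover letter
("Unavailable" and "Unassigned"). The enum and function names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum mbm_read_status {
	MBM_VALUE,        /* *value holds the byte count */
	MBM_UNAVAILABLE,  /* file contained "Unavailable" */
	MBM_UNASSIGNED,   /* file contained "Unassigned" */
	MBM_MALFORMED,    /* anything else */
};

/* Classify the text read from an mbm_*_bytes file. */
static enum mbm_read_status classify_mbm_read(const char *text,
					      unsigned long long *value)
{
	char *end;

	if (strncmp(text, "Unavailable", 11) == 0)
		return MBM_UNAVAILABLE;
	if (strncmp(text, "Unassigned", 10) == 0)
		return MBM_UNASSIGNED;

	/* Reject signs and leading whitespace that strtoull() would accept. */
	if (text[0] < '0' || text[0] > '9')
		return MBM_MALFORMED;

	errno = 0;
	*value = strtoull(text, &end, 10);
	if (errno || (*end != '\0' && *end != '\n'))
		return MBM_MALFORMED;
	return MBM_VALUE;
}
```

A monitoring tool would typically retry or skip the sample on MBM_UNAVAILABLE
and treat MBM_UNASSIGNED as "no counter assigned to this event".
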

> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	Unavailable
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	314101
> 
> k. Users will have the option to go back to 'default' mbm_assign_mode if required.

"Users will have the option" -> "Users have the option"

>    This can be done using the following command. Note that switching the
>    mbm_assign_mode will reset all the MBM counters of all resctrl groups.

"all the MBM counters " -> "all the MBM counters (and thus all MBM events)"? 

> 
> 	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	mbm_cntr_assign
> 	[default]
> 	
> l. Unmount the resctrl
> 	 
> 	#umount /sys/fs/resctrl/
> ---

Reinette