[PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

Babu Moger posted 23 patches 1 year ago
Posted by Babu Moger 1 year ago
This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC), also called the QoS RMID Pinning feature.

The series is written so that it is easier to support other assignable
features from different vendors.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'

# Introduction

Users can create as many monitor groups as there are RMIDs supported by the
hardware. However, the bandwidth monitoring feature on AMD systems only
guarantees that RMIDs currently assigned to a processor will be tracked by
hardware. The counters of any other RMIDs which are no longer being tracked
will be reset to zero. The MBM event counters return "Unavailable" for the
RMIDs that are not tracked by hardware. So, only a limited number of groups
can give guaranteed monitoring numbers. With ever-changing configurations
there is no way to know for certain which of these groups are being tracked
at a given point in time. Users do not have the option to monitor a group or
set of groups for a certain period of time without worrying about counters
being reset in between.
    
The ABMC feature provides an option to the user to assign a hardware
counter to an (RMID, event) pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask identifying
the specific bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring works in the current 'default' mode without
the assignment option.

# Linux Implementation

Create a generic interface aimed at supporting user space assignment
of scarce counters used for monitoring. The first user of the interface
is ABMC, with the option to expand usage to "soft-ABMC" and MPAM
counters in the future.

The feature adds the following interface files:

/sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
monitoring features supported. The enclosed brackets indicate which
feature is enabled.

/sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
counters available for assignment.

/sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
counters free in each domain.

/sys/fs/resctrl/info/L3_MON/mbm_assign_control: Reports the resctrl group and monitor
status of each group. Assignment state can be updated by writing to the
interface.

# Examples

a. Check if ABMC support is available
	# mount -t resctrl resctrl /sys/fs/resctrl/

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	[mbm_cntr_assign]
	default

	ABMC feature is detected and it is enabled.
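
	As a rough illustration (not part of this series), a script could pick
	out the enabled mode by looking for the bracketed entry. The helper name
	enabled_assign_mode is hypothetical; it only assumes the output format
	shown above and reads the file contents on stdin.

```shell
# Print the enabled mode, i.e. the entry enclosed in brackets.
# Hypothetical helper: reads the contents of mbm_assign_mode on stdin.
enabled_assign_mode() {
    sed -n 's/^[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p'
}
```

	It would be fed the file directly, for example
	"enabled_assign_mode < /sys/fs/resctrl/info/L3_MON/mbm_assign_mode".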

b. Check how many ABMC counters are available. 

	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
	32

c. Check how many ABMC counters are available in each domain.

	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
	0=30;1=30
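
	A small sketch of how user space might derive the number of counters
	in use per domain from the two files above. It assumes that
	num_mbm_cntrs applies to every domain (which may not hold on other
	architectures), and the helper name used_mbm_cntrs is hypothetical.

```shell
# usage: used_mbm_cntrs <num_mbm_cntrs> <available_mbm_cntrs contents>
# Assumption: every domain has <num_mbm_cntrs> counters in total.
used_mbm_cntrs() (
    total=$1
    set -f          # no pathname expansion while splitting
    IFS=';'         # domains are separated by ';'
    out=
    for d in $2; do
        # ${d%%=*} is the domain id, ${d#*=} the number of free counters
        out="${out:+$out;}${d%%=*}=$(( $total - ${d#*=} ))"
    done
    printf '%s\n' "$out"
)
```

	With the values above, "used_mbm_cntrs 32 '0=30;1=30'" reports two
	counters in use in each domain.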

d. Create a few resctrl groups.

	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp

e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   to list and modify any group's monitoring states. The file provides a single
   place to list the monitoring states of all the resctrl groups. It makes it
   easier for user space to learn about the used counters without needing to
   traverse all the groups, thus reducing the number of file system calls.

	The list uses the following format:

	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Format for specific types of groups:

	* Default CTRL_MON group:
	        "//<domain_id>=<flags>"

	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id>=<flags>"

	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id>=<flags>"

	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"

	Flags can be one of the following:

	t  MBM total event is enabled.
	l  MBM local event is enabled.
	tl Both total and local MBM events are enabled.
	_  None of the MBM events are enabled.

	Examples:

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
	non_default_ctrl_mon_grp//0=tl;1=tl
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
	//0=tl;1=tl
	/child_default_mon_grp/0=tl;1=tl
	
	There are four groups and all of them have the local and total
	events enabled on domains 0 and 1.
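
	To illustrate the list format, here is a minimal sketch of how a
	script might split one line of mbm_assign_control into its parts.
	The helper name parse_assign_line and the '|' separated output are
	hypothetical choices for this example; only the
	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>" layout comes from
	the description above.

```shell
# Split one mbm_assign_control line into "ctrl_mon|mon|domain|flags"
# records, one per domain. Empty ctrl_mon/mon fields denote the default
# CTRL_MON group or "no child MON group" respectively.
parse_assign_line() (
    line=$1
    ctrl=${line%%/*}        # text before the first '/'
    rest=${line#*/}         # text after the first '/'
    mon=${rest%%/*}         # text before the second '/'
    doms=${rest#*/}         # "<domain_id>=<flags>;..." list
    set -f                  # no pathname expansion while splitting
    IFS=';'                 # domains are separated by ';'
    for d in $doms; do
        printf '%s|%s|%s|%s\n' "$ctrl" "$mon" "${d%%=*}" "${d#*=}"
    done
)
```

	For the default group line "//0=tl;1=tl" this prints one record per
	domain with both group fields empty.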

f. Update the group assignment states using the interface file
   /sys/fs/resctrl/info/L3_MON/mbm_assign_control.

	The write format is similar to the above list format with the addition
	of an opcode for the assignment operation.
	"<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
	
	* Default CTRL_MON group:
	        "//<domain_id><opcode><flags>"
	
	* Non-default CTRL_MON group:
	        "<CTRL_MON group>//<domain_id><opcode><flags>"
	
	* Child MON group of default CTRL_MON group:
	        "/<MON group>/<domain_id><opcode><flags>"
	
	* Child MON group of non-default CTRL_MON group:
	        "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
	
	Opcode can be one of the following:
	
	= Update the assignment to match the flags.
	+ Assign a new MBM event without impacting existing assignments.
	- Unassign an MBM event from the currently assigned events.

	Flags can be one of the following:

	t  MBM total event.
	l  MBM local event.
	tl Both total and local MBM events.
	_  None of the MBM events. Only works with the '=' opcode. This flag
	   cannot be combined with other flags.
	
	Initial group status:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
	//0=tl;1=tl
	/child_default_mon_grp/0=tl;1=tl

	To update the default group to enable only total event on domain 0:
	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
	//0=t;1=tl
	/child_default_mon_grp/0=tl;1=tl

	To update the MON group child_default_mon_grp to remove total event on domain 1:
	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl
	//0=t;1=tl
	/child_default_mon_grp/0=tl;1=l

	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
	remove both local and total events on domain 1:
	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > \
	       /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
	//0=t;1=tl
	/child_default_mon_grp/0=tl;1=l

	To update the default group to add the local event on domain 0:
	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=tl;1=tl
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
	//0=tl;1=tl
	/child_default_mon_grp/0=tl;1=l

	To update the non-default CTRL_MON group non_default_ctrl_mon_grp to unassign all
	the MBM events on all the domains:
	# echo "non_default_ctrl_mon_grp//*=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

	Assignment status after the update:
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
	non_default_ctrl_mon_grp//0=_;1=_
	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_
	//0=tl;1=tl
	/child_default_mon_grp/0=tl;1=l
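
	The opcode semantics used in the updates above can be modelled in
	user space as follows. This is only a sketch of the behaviour
	described in this cover letter, not the kernel implementation; the
	helper name apply_assign_op is hypothetical.

```shell
# Compute the flags a domain ends up with after one write.
# usage: apply_assign_op <current flags> <opcode> <flags>
apply_assign_op() (
    cur=$1 op=$2 new=$3
    # has <string> <char>: succeeds if <char> occurs in <string>
    has() { case $1 in *"$2"*) return 0 ;; esac; return 1; }
    case $op in
    '='|'+'|'-') ;;
    *) echo "invalid opcode: $op" >&2; exit 1 ;;
    esac
    # '_' is only accepted together with the '=' opcode.
    if [ "$new" = "_" ] && [ "$op" != "=" ]; then
        echo "invalid: '_' requires the '=' opcode" >&2
        exit 1
    fi
    out=
    for ev in t l; do
        case $op in
        '=') has "$new" "$ev" && out=$out$ev ;;
        '+') { has "$cur" "$ev" || has "$new" "$ev"; } && out=$out$ev ;;
        '-') { has "$cur" "$ev" && ! has "$new" "$ev"; } && out=$out$ev ;;
        esac
    done
    echo "${out:-_}"        # no event left means "_"
)
```

	For instance, "apply_assign_op tl - t" corresponds to the
	"/child_default_mon_grp/1-t" write above and leaves only the local
	event assigned.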

g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
   Reading the events does not change with ABMC. If the event is unassigned
   at the time of the read, the read returns "Unassigned".
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
	
h. Check the bandwidth configuration for the group. Note that the bandwidth
   configuration has domain scope. The total event defaults to 0x7f (to
   count all the events) and the local event defaults to 0x15 (to count all
   the local NUMA events). The event bitmap decoding is available at
   https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in section "mbm_total_bytes_config", "mbm_local_bytes_config":
	
	# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	0=0x7f;1=0x7f

	# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	0=0x15;1=0x15
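
	For reference, the bitmask values can be decoded with a few lines of
	shell. The bit descriptions below are taken from the
	"mbm_total_bytes_config" section of resctrl.rst rather than from this
	series, and the helper name decode_mbm_config is hypothetical.

```shell
# Print the bandwidth sources selected by an event configuration value.
# Bit descriptions follow the table in resctrl.rst (bit 0 first).
decode_mbm_config() (
    mask=$(( $1 ))          # accepts hex such as 0x15
    bit=0
    for name in \
        "Reads to memory in the local NUMA domain" \
        "Reads to memory in the non-local NUMA domain" \
        "Non-temporal writes to local NUMA domain" \
        "Non-temporal writes to non-local NUMA domain" \
        "Reads to slow memory in the local NUMA domain" \
        "Reads to slow memory in the non-local NUMA domain" \
        "Dirty Victims from the QOS domain to all types of memory"
    do
        if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
            printf 'bit %d: %s\n' "$bit" "$name"
        fi
        bit=$(( $bit + 1 ))
    done
)
```

	"decode_mbm_config 0x15" lists only the three local sources (bits 0,
	2 and 4), while 0x33 selects the four read sources and no writes.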
	
i. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change affects the total events on domain 0.

	# echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	# cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	0=0x33;1=0x7f
	
j. Now read the total event again. The first read may come back with "Unavailable"
   status. The subsequent read of mbm_total_bytes will display only the read events.
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101

k. Users will have the option to go back to 'default' mbm_assign_mode if required.
   This can be done using the following command. Note that switching the
   mbm_assign_mode will reset all the MBM counters of all resctrl groups.

	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
	mbm_cntr_assign
	[default]
	
l. Unmount resctrl.

	# umount /sys/fs/resctrl/
---
v11:
   The commit 2937f9c361f7a ("x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags")
   is already merged. Removed from the series.
   
   Resolved minor conflicts due to code displacement in latest code.
 
   Moved the monitoring related calls to monitor.c file when possible.
   Moved some of the changes from include/linux/resctrl.h to arch/x86/kernel/cpu/resctrl/internal.h
   as requested by Reinette. These changes will be moved back when arch and non-arch code is separated.
   
   Renamed rdtgroup_mbm_assign_mode_show() to resctrl_mbm_assign_mode_show().
   Renamed rdtgroup_num_mbm_cntrs_show() to resctrl_num_mbm_cntrs_show().

   Moved the mon_config_info structure definition to internal.h.
   Moved resctrl_arch_mon_event_config_get() and resctrl_arch_mon_event_config_set()
   to monitor.c file.

   Moved resctrl_arch_assign_cntr() and resctrl_abmc_config_one_amd() to monitor.c.
   Added the code to reset the arch state in resctrl_arch_assign_cntr().
   Also removed resctrl_arch_reset_rmid() inside IPI as the counters are reset from the callers.

   Renamed rdtgroup_assign_cntr_event() to resctrl_assign_cntr_event().
   Refactored the resctrl_assign_cntr_event().
   Added functionality to exit on the first error during assignment.
   Simplified mbm_cntr_free().
   Removed the function mbm_cntr_assigned(). Will be using mbm_cntr_get() to
   figure out if the counter is assigned or not.
   
   Renamed rdtgroup_unassign_cntr_event() to resctrl_unassign_cntr_event().
   Refactored the resctrl_unassign_cntr_event().

   Moved mbm_cntr_reset() to monitor.c.
   Added code to reset non-architectural state in mbm_cntr_reset().
   Added missing rdtgroup_unassign_cntrs() calls on failure path.

   Domain can be NULL with SNC support so moved the unassign check in rdtgroup_mondata_show().

   Renamed rdtgroup_mbm_assign_mode_write() to resctrl_mbm_assign_mode_write().
   Added more details in resctrl.rst about mbm_cntr_assign mode.
   Re-arranged the text in resctrl.rst file in section mbm_cntr_assign.

   Moved resctrl_arch_mbm_cntr_assign_set_one() to monitor.c

   Added non-arch RMID reset in mbm_config_write_domain().
   Removed resctrl_arch_reset_rmid() call in resctrl_abmc_config_one_amd(). Not required
   as the reset of arch and non-arch RMID counters is done from the callers. It simplifies the IPI code.

   Fixed printing the separator after each domain while listing the group assignments.
   Renamed rdtgroup_mbm_assign_control_show to resctrl_mbm_assign_control_show().

   Fixed the static check warning with initializing dom_id in resctrl_process_flags()

   Added change log in each patch for specific changes.

v10:
   Major change is related to domain specific assignment.
   Added struct mbm_cntr_cfg inside mon domains. This will handle
   the domain specific assignments as discussed in the thread below.
   https://lore.kernel.org/lkml/CALPaoCj+zWq1vkHVbXYP0znJbe6Ke3PXPWjtri5AFgD9cQDCUg@mail.gmail.com/
   I did not see the need to add cntr_id in mbm_state structure. Not used in the code.
   Following patches take care of these changes.
   Patch 12, 13, 15, 16, 17, 18.
   
   Added __init attribute to cache_alloc_hsw_probe(). Followed function
   prototype rules (preferred order is storage class before return type).
   
   Moved the mon_config_info structure definition to resctrl.h
   
   Added call resctrl_arch_reset_rmid() to reset the RMID in the domain inside IPI call
   resctrl_abmc_config_one_amd.
   
   SMP and non-SMP call support is not required in resctrl_arch_config_cntr with new
   domain specific assign approach/data structure.
   
   Assigned the counter before exposing the event files.
   Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
   This is called for both CTRL_MON and MON group creation.
   
   Call mbm_cntr_reset() when unmounted to clear all the assignments.
   
   Fixed the issue with finding the domain in multiple iterations in rdtgroup_process_flags().
   
   Printed full error message with domain information when assign fails.
   
   Taken care of other text comments in all the patches. Patch specific changes are in each patch.
   
   If I missed something, please point it out; it is not intentional.

v9:
   Patch 14 is a new addition. 
   Major change in patch 24.
   Moved the fix patch to address __init attribute to the beginning of the series.
   Fixed all the call sequences. Added additional Fixes tags.

   Added Reviewed-by where applicable.

   Took care of a couple of minor merge conflicts with latest code.
   Re-ordered the MSR in a couple of instances.
   Added available_mbm_cntrs (patch 14) to print the number of counters in a domain.

   Used MBM_EVENT_ARRAY_INDEX macro to get the event index.
   Introduced rdtgroup_cntr_id_init() to initialize the cntr_id

   Introduced new function resctrl_config_cntr to assign the counter, update
   the bitmap and reset the architectural state.
   Taken care of error handling (freeing the counter) when assignment fails.
  
   Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
   Updated couple of rdtgroup_unassign_cntrs() calls properly.

   Fixed problem changing the mode to mbm_cntr_assign mode when it is
   not supported. Added extra checks to detect if the system supports it.
   
   https://lore.kernel.org/lkml/03b278b5-6c15-4d09-9ab7-3317e84a409e@intel.com/
   As discussed in the above comment, introduced resctrl_mon_event_config_set to
   handle IPI. But sending another IPI inside an IPI causes problems; the kernel
   reports an SMP warning. So, introduced resctrl_arch_update_cntr() to send the
   command directly.

   Fixed handling of the special cases '//0=' and '//'.
   Removed extra strstr() call in rdtgroup_mbm_assign_control_write().
   Added generic failure text when assignment operation fails.
   Corrected user documentation format texts.

v8:
  Patches are getting into final stages.
  A couple of changes in Patch 8, Patch 19 and Patch 23.
  Most of the other changes are related to rename and text message updates.

  Details are in each patch. Here is the summary.

  Added __init attribute to dom_data_init() in patch 8/25.
  Moved the mbm_cntrs_init() and mbm_cntrs_exit() functionality inside
  dom_data_init() and dom_data_exit() respectively.

  Renamed resctrl_mbm_evt_config_init() to arch_mbm_evt_config_init()
  Renamed resctrl_arch_event_config_get() to resctrl_arch_mon_event_config_get().
          resctrl_arch_event_config_set() to resctrl_arch_mon_event_config_set().

  Renamed resctrl_arch_assign_cntr to resctrl_arch_config_cntr.
  Renamed rdtgroup_assign_cntr() to rdtgroup_assign_cntr_event().
  Added the code to return the error if rdtgroup_assign_cntr_event fails.
  Moved definition of MBM_EVENT_ARRAY_INDEX to resctrl/internal.h.
  Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
  Added return error handling in resctrl_arch_config_cntr().
  Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
  Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
  Fixed the problem with unassigning the child MON groups of CTRL_MON group.
  Reset the internal counters after mbm_cntr_assign mode is changed.
  Renamed rdtgroup_mbm_cntr_reset() to mbm_cntr_reset()
  Renamed resctrl_arch_mbm_cntr_assign_configure to
            resctrl_arch_mbm_cntr_assign_set_one.

  Used the same IPI as event update to modify the assignment.
  Could not do the way we discussed in the thread.
  https://lore.kernel.org/lkml/f77737ac-d3f6-3e4b-3565-564f79c86ca8@amd.com/
  Needed to figure out event type to update the configuration.

  During assignment modification, unassign is now done first, then assign.
  Assign none "_" takes priority. Cannot be mixed with other flags.
  Updated the documentation and .rst file format. htmldoc looks ok.

v7:
   Major changes are related to FS and arch code separation.
   Changed a few interface names based on feedback.
   Here is the summary; each patch contains the changes specific to that patch.

   Removed WARN_ON for num_mbm_cntrs. Decided to dynamically allocate the bitmap.
   WARN_ON is not required anymore.
 
   Renamed the function resctrl_arch_get_abmc_enabled() to resctrl_arch_mbm_cntr_assign_enabled().

   Merged resctrl_arch_mbm_cntr_assign_enable, resctrl_arch_mbm_cntr_assign_disable
   and renamed to resctrl_arch_mbm_cntr_assign_set(). Passed the struct rdt_resource
   to these functions.

   Removed resctrl_arch_reset_rmid_all() from arch code. This will be done from the FS caller.

   Updated the descriptions/commit log in resctrl.rst to generic text. Removed ABMC references.
   Renamed mbm_mode to mbm_assign_mode.
   Renamed mbm_control to  mbm_assign_control.
   Introduced mutex lock in rdtgroup_mbm_mode_show().
 
   The 'legacy' mode is called 'default' mode. 

   Removed the static allocation and now allocating bitmap mbm_cntr_free_map dynamically.

   Merged rdtgroup_assign_cntr(), rdtgroup_alloc_cntr() into one.
   Merged rdtgroup_unassign_cntr(), rdtgroup_free_cntr() into one.
   
  Added struct rdt_resource to the interface functions resctrl_arch_assign_cntr()
  and resctrl_arch_unassign_cntr().
  Renamed rdtgroup_abmc_cfg() to resctrl_abmc_config_one_amd().
   
  Added a new patch to fix counter assignment on event config changes.

  Removed the references of ABMC from user interfaces.

  Simplified the parsing (strsep(&token, "//")) in rdtgroup_mbm_assign_control_write().
  Added mutex lock in rdtgroup_mbm_assign_control_write() while processing.

  Thomas Gleixner asked us to update  https://gitlab.com/x86-cpuid.org/x86-cpuid-db. 
  It needs internal approval. We are working on it.

v6:
  We still need to finalize a few interface details on mbm_assign_mode and mbm_assign_control
  in case of ABMC and Soft-ABMC. We can continue the discussion with this series.

  Added support for domain-id '*' to update all the domains at once.
  Fixed assign interface to allocate the counter if counter is
  not assigned.   
  Fixed unassign interface to free the counter if the counter is not
  assigned in any of the domains.

  Renamed abmc_capable to mbm_cntr_assignable.

  Renamed abmc_enabled to mbm_cntr_assign_enabled.
  Used msr_set_bit and msr_clear_bit for msr updates.
  Renamed resctrl_arch_abmc_enable() to resctrl_arch_mbm_cntr_assign_enable().
  Renamed resctrl_arch_abmc_disable() to resctrl_arch_mbm_cntr_assign_disable().

  Changed the display name from num_cntrs to num_mbm_cntrs.

  Removed the variable mbm_cntrs_free_map_len. This is not required.
  Removed the call mbm_cntrs_init() in arch code. This needs to be done at higher level.
  Used DECLARE_BITMAP to initialize mbm_cntrs_free_map.
  Removed unused config value definitions.

  Introduced mbm_cntr_map to track counters at domain level. With this
  we don't need to issue an MSR read to read the counter configuration.

  Separated all the counter id management to upper level in FS code.

  Added checks to detect "Unassigned" before reading the RMID.

  More details in each patch.

v5:
  Rebase changes (because of SNC support)

  Interface changes.
   /sys/fs/resctrl/mbm_assign to /sys/fs/resctrl/mbm_assign_mode.
   /sys/fs/resctrl/mbm_assign_control to /sys/fs/resctrl/mbm_assign_control.

  Added few arch specific routines.
  resctrl_arch_get_abmc_enabled.
  resctrl_arch_abmc_enable.
  resctrl_arch_abmc_disable.

  Few renames
   num_cntrs_free_map -> mbm_cntrs_free_map
   num_cntrs_init -> mbm_cntrs_init
   arch_domain_mbm_evt_config -> resctrl_arch_mbm_evt_config

  Introduced resctrl_arch_event_config_get and
    resctrl_arch_event_config_set() to update event configuration.

  Removed mon_state field mongroup. Added MON_CNTR_UNSET to initialize counters.

  Renamed ctr_id to cntr_id for the hardware counter.
 
  Report "Unassigned" in case the user attempts to read the events without assigning the counter.
  
  ABMC is enabled during the boot up. Can be enabled or disabled later.

  Fixed opcode and flags combination.
    "=_" is valid.
    "-_" and "+_" are not valid.

  Addressed all the comments as far as I know. If I missed something, it is not intentional.

v4: 
  Main change is domain specific event assignment.
  Kept the ABMC feature as a default.
  Dynamic switching between ABMC and mbm_legacy is still allowed.
  We are still not clear about mount option.
  Moved the monitoring related data in resctrl_mon structure from rdt_resource.
  Fixed the display of legacy and ABMC mode.
  Used bitmap APIs when possible.
  Removed event configuration read from MSRs. We can use the
  internal saved data (patch 12).
  Added more comments about L3_QOS_ABMC_CFG MSR.
  Added IPIs to read the assignment status for each domain (patch 18 and 19)
  More details in each patch.

v3:
   This series adds the support for global assignment mode discussed in
   the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switch between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment mode and modifying the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedback discussed. If I missed
      something then it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. Major change is the way ABMC is enabled. Earlier, user needed to remount
      with -o abmc to enable ABMC feature. Removed that option now.
      Now users can enable ABMC by "$echo 1 to /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

--------------------------------------------------------------------------------------

Previous revisions:

v10: https://lore.kernel.org/lkml/cover.1734034524.git.babu.moger@amd.com/
v9: https://lore.kernel.org/lkml/cover.1730244116.git.babu.moger@amd.com/
v8: https://lore.kernel.org/lkml/cover.1728495588.git.babu.moger@amd.com/
v7: https://lore.kernel.org/lkml/cover.1725488488.git.babu.moger@amd.com/
v6: https://lore.kernel.org/lkml/cover.1722981659.git.babu.moger@amd.com/
v5: https://lore.kernel.org/lkml/cover.1720043311.git.babu.moger@amd.com/
v4: https://lore.kernel.org/lkml/cover.1716552602.git.babu.moger@amd.com/
v3: https://lore.kernel.org/lkml/cover.1711674410.git.babu.moger@amd.com/  
v2: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
v1: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/


Babu Moger (23):
  x86/resctrl: Add __init attribute to functions called from
    resctrl_late_init()
  x86/cpufeatures: Add support for Assignable Bandwidth Monitoring
    Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Consolidate monitoring related data from rdt_resource
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Add support to enable/disable AMD ABMC feature
  x86/resctrl: Introduce the interface to display monitor mode
  x86/resctrl: Introduce interface to display number of monitoring
    counters
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct
    rdt_hw_mon_domain
  x86/resctrl: Remove MSR reading of event configuration value
  x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at
    domain
  x86/resctrl: Introduce interface to display number of free counters
  x86/resctrl: Add data structures and definitions for ABMC assignment
  x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter
    with ABMC
  x86/resctrl: Add the functionality to assigm MBM events
  x86/resctrl: Add the functionality to unassigm MBM events
  x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is
    enabled
  x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign
    mode
  x86/resctrl: Introduce the interface to switch between monitor modes
  x86/resctrl: Configure mbm_cntr_assign mode if supported
  x86/resctrl: Update assignments on event configuration changes
  x86/resctrl: Introduce interface to list assignment states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 242 +++++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  23 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  13 +
 arch/x86/kernel/cpu/resctrl/internal.h        |  91 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c         | 402 +++++++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 620 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  34 +-
 12 files changed, 1350 insertions(+), 84 deletions(-)

-- 
2.34.1

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by James Morse 11 months, 3 weeks ago
Hi Babu,

On 22/01/2025 20:20, Babu Moger wrote:
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
> 
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
> 
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'

I've rebased the MPAM tree on top of this v11, here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/abmc/v11
Hopefully this is sufficient evidence that this interface works for MPAM.

It would be convenient for MPAM platforms to not have to support a 'default' mode if they
are emulating ABMC - this was something that was never supported, and it's not a problem
that can be solved. (comments on the relevant patches).


Thanks,

James
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 12 months ago
Hi there,

On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> 
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
> 
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
> 
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
> 
> # Introduction

[...]

> # Examples
> 
> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	[mbm_cntr_assign]
> 	default

(Nit: can this be called "mbm_counter_assign"?  The name is already
long, so I wonder whether anything is gained by using a cryptic
abbreviation for "counter".  Same with all the "cntrs" elsewhere.
This is purely cosmetic, though -- the interface works either way.)

> 	ABMC feature is detected and it is enabled.
> 
> b. Check how many ABMC counters are available. 
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> 	32

Is this file needed?

With MPAM, it is more difficult to promise that the same number of
counters will be available everywhere.

Rather than lie, or report a "safe" value here that may waste some
counters, can we just allow the number of counters to be discovered
per domain via available_mbm_cntrs?

num_closids and num_rmids are already problematic for MPAM, so it would
be good to avoid any more parameters of this sort from being reported
to userspace unless there is a clear understanding of why they are
needed.

Reporting number of counters per monitoring domain is a more natural
fit for MPAM, as below:

> c. Check how many ABMC counters are available in each domain.
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
> 	0=30;1=30

For MPAM, this seems supportable.  Each monitoring domain will have
some counters, and a well-defined number of them will be available for
allocation at any one time.
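
As a rough illustration of how user space might consume this per-domain
format, here is a minimal shell sketch.  It assumes the
"<domain_id>=<count>" entries are separated by ";" as in the example
output above; domain_count is a hypothetical helper name, not something
provided by resctrl or by this series.

```shell
#!/bin/sh
# Print the counter count reported for one domain in a
# "<domain_id>=<count>;..." string such as "0=30;1=30".
# domain_count is a hypothetical helper name used only for illustration.
domain_count() {
    list=$1
    want=$2
    old_ifs=$IFS
    IFS=';'
    # Word splitting on ';' yields one "<domain_id>=<count>" per entry.
    for entry in $list; do
        if [ "${entry%%=*}" = "$want" ]; then
            IFS=$old_ifs
            echo "${entry#*=}"
            return 0
        fi
    done
    IFS=$old_ifs
    # No entry for the requested domain.
    return 1
}
```

On a system with this series applied, it could be called as
domain_count "$(cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs)" 0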

> d. Create few resctrl groups.
> 
> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> 
> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>    to list and modify any group's monitoring states. File provides single place
>    to list monitoring states of all the resctrl groups. It makes it easier for
>    user space to learn about the used counters without needing to traverse all
>    the groups thus reducing the number of file system calls.
> 
> 	The list follows the following format:
> 
> 	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> 	Format for specific type of groups:
> 
> 	* Default CTRL_MON group:
> 	 "//<domain_id>=<flags>"

[...]

>        Flags can be one of the following:
> 
>         t  MBM total event is enabled.
>         l  MBM local event is enabled.
>         tl Both total and local MBM events are enabled.
>         _  None of the MBM events are enabled
> 
> 	Examples:

[...]

I think that this basically works for MPAM.

The local/total distinction doesn't map in a consistent way onto MPAM,
but this problem is not specific to ABMC.  It feels sensible for ABMC
to be built around the same concepts that resctrl already has elsewhere
in the interface.  MPAM will do its best to fit (as already).

Regarding Peter's use case of assigning multiple counters to a
monitoring group [1], I feel that it's probably good enough to make
sure that the ABMC interface can be extended in future in a backwards
compatible way so as to support this, without trying to support it
immediately.

[1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/


For example, if we added new generic "letters" -- say, "0" to "9",
combined with new counter files in resctrlfs, that feels like a
possible approach.  ABMC (as in this series) should just reject such
assignments, and the new counter files wouldn't exist.

Availability of this feature could also be reported as a distinct mode
in mbm_assign_mode, say "mbm_cntr_generic", or whatever.


A _sketch_ of this follows.  This is NOT a proposal -- the key
question is whether we are confident that we can extend the interface
in this way in the future without breaking anything.

If "yes", then the ABMC interface (as proposed by this series) works as
a foundation to build on.

--8<--

[artist's impression]

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
 	mbm_cntr_generic
 	[mbm_cntr_assign]
 	default

# echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
# echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
# echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type 
# echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type 
# echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type 
# echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type 

...

# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes

etc.

-->8--

Any thoughts on this, Peter?

[...]

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 12 months ago
Hi Dave,

On 2/12/25 9:46 AM, Dave Martin wrote:
> Hi there,
> 
> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>
>> # Introduction
> 
> [...]
> 
>> # Examples
>>
>> a. Check if ABMC support is available
>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> 	[mbm_cntr_assign]
>> 	default
> 
> (Nit: can this be called "mbm_counter_assign"?  The name is already
> long, so I wonder whether anything is gained by using a cryptic
> abbreviation for "counter".  Same with all the "cntrs" elsewhere.
> This is purely cosmetic, though -- the interface works either way.)
> 
>> 	ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available. 
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
>> 	32
> 
> Is this file needed?
> 
> With MPAM, it is more difficult to promise that the same number of
> counters will be available everywhere.
> 
> Rather than lie, or report a "safe" value here that may waste some
> counters, can we just allow the number of counters to be discovered
> per domain via available_mbm_cntrs?

This sounds reasonable to me. I think us having trouble with the
user documentation of this file so late in development should also have been
a sign to rethink its value.

For a user to discover the number of counters supported via available_mbm_cntrs
would require the file's contents to be captured right after mount. Since we've
had scenarios where new userspace needs to discover an up-and-running system's
configuration this may not be possible. I thus wonder whether, instead of removing
num_mbm_cntrs, it could be modified to return the per-domain supported counters
instead of a single value?

> num_closids and num_rmids are already problematic for MPAM, so it would
> be good to avoid any more parameters of this sort from being reported
> to userspace unless there is a clear understanding of why they are
> needed.

Yes. Appreciate your help in identifying what could be problematic for MPAM.

> 
> Reporting number of counters per monitoring domain is a more natural
> fit for MPAM, as below:
> 
>> c. Check how many ABMC counters are available in each domain.
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
>> 	0=30;1=30
> 
> For MPAM, this seems supportable.  Each monitoring domain will have
> some counters, and a well-defined number of them will be available for
> allocation at any one time.
> 
>> d. Create few resctrl groups.
>>
>> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>    to list and modify any group's monitoring states. File provides single place
>>    to list monitoring states of all the resctrl groups. It makes it easier for
>>    user space to learn about the used counters without needing to traverse all
>>    the groups thus reducing the number of file system calls.
>>
>> 	The list follows the following format:
>>
>> 	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> 	Format for specific type of groups:
>>
>> 	* Default CTRL_MON group:
>> 	 "//<domain_id>=<flags>"
> 
> [...]
> 
>>        Flags can be one of the following:
>>
>>         t  MBM total event is enabled.
>>         l  MBM local event is enabled.
>>         tl Both total and local MBM events are enabled.
>>         _  None of the MBM events are enabled
>>
>> 	Examples:
> 
> [...]
> 
> I think that this basically works for MPAM.
> 
> The local/total distinction doesn't map in a consistent way onto MPAM,
> but this problem is not specific to ABMC.  It feels sensible for ABMC
> to be built around the same concepts that resctrl already has elsewhere
> in the interface.  MPAM will do its best to fit (as already).
> 
> Regarding Peter's use case of assigning multiple counters to a
> monitoring group [1], I feel that it's probably good enough to make
> sure that the ABMC interface can be extended in future in a backwards
> compatible way so as to support this, without trying to support it
> immediately.
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
> 

I do not think that resctrl's current support of mbm_total_bytes and
mbm_local_bytes should be considered as the "only" two available "slots"
into which all possible events should be forced. "mon_features" exists
to guide user space as to which events are supported and, as I see it, new events
can be listed here to inform user space of their availability, with their
associated event files available in the resource groups.

> 
> For example, if we added new generic "letters" -- say, "0" to "9",
> combined with new counter files in resctrlfs, that feels like a
> possible approach.  ABMC (as in this series) should just reject such
> assignments, and the new counter files wouldn't exist.
> 
> Availability of this feature could also be reported as a distinct mode
> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
> 
> 
> A _sketch_ of this follows.  This is NOT a proposal -- the key
> question is whether we are confident that we can extend the interface
> in this way in the future without breaking anything.
> 
> If "yes", then the ABMC interface (as proposed by this series) works as
> a foundation to build on.
> 
> --8<--
> 
> [artist's impression]
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>  	mbm_cntr_generic
>  	[mbm_cntr_assign]
>  	default
> 
> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type 
> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type 
> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type 
> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type 
> 
> ...
> 
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
> 
> etc.
> 

It is not clear to me what additional features such an interface enables. It
also looks like user space will need to track and manage counter IDs?

It sounds to me as though the issue starts with your statement
"The local/total distinction doesn't map in a consistent way onto MPAM". To
address this I expect that an MPAM system will not support nor list
mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
it would list the events that are appropriate to the system? Trying to match
with what Peter said [1] in the message you refer to, this may be possible:

# cat /sys/fs/resctrl/info/L3_MON/mon_features
mbm_local_read_bytes
mbm_local_write_bytes
mbm_local_bytes

(*) I am including mbm_local_bytes since it could be an event that can be software
defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
counted.

I see the support for MPAM events as distinct from the support of assignable counters.
Once the MPAM events are sorted, I think that they can be assigned with the existing interface.
Please help me understand if you see it differently.

Doing so would require coming up with alphabetical letters for these events,
which seems to be needed for your proposal also? If we use possible flags of:

mbm_local_read_bytes a
mbm_local_write_bytes b

Then mbm_assign_control can be used as:
# echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
<value>
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
<sum of mbm_local_read_bytes and mbm_local_write_bytes>
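
To make the format concrete, here is a minimal shell sketch that checks
whether a given flag is set for a domain in one mbm_assign_control line.
The "a"/"b" letters are only the hypothetical flags used above, and
has_flag is an illustrative helper name, not an existing tool.

```shell
#!/bin/sh
# Succeed if <flag> is set for <domain_id> in one mbm_assign_control
# line of the form "<CTRL_MON group>/<MON group>/<domain_id>=<flags>;...".
# has_flag is a hypothetical helper name used only for illustration.
has_flag() {
    line=$1
    want_dom=$2
    want_flag=$3
    # Drop the "<CTRL_MON group>/<MON group>/" prefix.
    assigns=${line##*/}
    old_ifs=$IFS
    IFS=';'
    for entry in $assigns; do
        if [ "${entry%%=*}" = "$want_dom" ]; then
            IFS=$old_ifs
            # Flags are single letters, so a substring match is enough.
            case ${entry#*=} in
                *"$want_flag"*) return 0 ;;
                *) return 1 ;;
            esac
        fi
    done
    IFS=$old_ifs
    # Domain not present on this line.
    return 1
}
```

With the hypothetical flags above, has_flag '//0=ab;1=b' 0 a succeeds,
while has_flag '//0=ab;1=b' 1 a fails because only "b" is assigned in
domain 1.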

One issue would be when resctrl needs to support more than 26 events (no more flags available),
assuming that upper case would be used for "shared" counters (unless this interface is defined
differently and only a few uppercase letters are used for it). Would this be too low of a limit?

Reinette

[1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
RE: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 12 months ago
> I do not think that resctrl's current support of the mbm_total_bytes and
> mbm_local_bytes should be considered as the "only" two available "slots"
> into which all possible events should be forced into. "mon_features" exists
> to guide user space to which events are supported and as I see it new events
> can be listed here to inform user space of their availability, with their
> associated event files available in the resource groups.

100%  I have a number of "events" in the pipeline that do not fit these
names. I'm planning on new files with descriptive[1] names for the events
they report.

-Tony

[1] When these are ready to post we can discuss the names I chose and
change them if there are better names that work across architectures.
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 12 months ago
Hi Tony,

On Thu, Feb 13, 2025 at 12:11:13AM +0000, Luck, Tony wrote:
> > I do not think that resctrl's current support of the mbm_total_bytes and
> > mbm_local_bytes should be considered as the "only" two available "slots"
> > into which all possible events should be forced into. "mon_features" exists
> > to guide user space to which events are supported and as I see it new events
> > can be listed here to inform user space of their availability, with their
> > associated event files available in the resource groups.
> 
> 100%  I have a number of "events" in the pipeline that do not fit these
> names. I'm planning on new files with descriptive[1] names for the events
> they report.
> 
> -Tony
> 
> [1] When these are ready to post we can discuss the names I chose and
> change them if there are better names that work across architectures.

Do any of the approaches discussed in [2] look viable for this?

(Ideally, reply over there.)

Cheers
---Dave

[2] https://lore.kernel.org/lkml/Z64tw2NbJXbKpLrH@e133380.arm.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 12 months ago
-Fenghua (his email address does not work anymore)

On 2/12/25 3:33 PM, Reinette Chatre wrote:
> Hi Dave,
> 
> On 2/12/25 9:46 AM, Dave Martin wrote:
>> Hi there,
>>
>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>
>>> This series adds the support for Assignable Bandwidth Monitoring Counters
>>> (ABMC). It is also called QoS RMID Pinning feature
>>>
>>> Series is written such that it is easier to support other assignable
>>> features supported from different vendors.
>>>
>>> The feature details are documented in the  APM listed below [1].
>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>> Monitoring (ABMC). The documentation is available at
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>
>>> The patches are based on top of commit
>>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>>
>>> # Introduction
>>
>> [...]
>>
>>> # Examples
>>>
>>> a. Check if ABMC support is available
>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> 	[mbm_cntr_assign]
>>> 	default
>>
>> (Nit: can this be called "mbm_counter_assign"?  The name is already
>> long, so I wonder whether anything is gained by using a cryptic
>> abbreviation for "counter".  Same with all the "cntrs" elsewhere.
>> This is purely cosmetic, though -- the interface works either way.)
>>
>>> 	ABMC feature is detected and it is enabled.
>>>
>>> b. Check how many ABMC counters are available. 
>>>
>>> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
>>> 	32
>>
>> Is this file needed?
>>
>> With MPAM, it is more difficult to promise that the same number of
>> counters will be available everywhere.
>>
>> Rather than lie, or report a "safe" value here that may waste some
>> counters, can we just allow the number of counters to be discovered
>> per domain via available_mbm_cntrs?
> 
> This sounds reasonable to me. I think us having trouble with the
> user documentation of this file so late in development should also have been
> a sign to rethink its value.
> 
> For a user to discover the number of counters supported via available_mbm_cntrs
> would require the file's contents to be captured right after mount. Since we've
> had scenarios where new userspace needs to discover an up-and-running system's
> configuration this may not be possible. I thus wonder instead of removing
> num_mbm_cntrs, it could be modified to return the per-domain supported counters
> instead of a single value? 
> 
>> num_closids and num_rmids are already problematic for MPAM, so it would
>> be good to avoid any more parameters of this sort from being reported
>> to userspace unless there is a clear understanding of why they are
>> needed.
> 
> Yes. Appreciate your help in identifying what could be problematic for MPAM.
> 
>>
>> Reporting number of counters per monitoring domain is a more natural
>> fit for MPAM, as below:
>>
>>> c. Check how many ABMC counters are available in each domain.
>>>
>>> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
>>> 	0=30;1=30
>>
>> For MPAM, this seems supportable.  Each monitoring domain will have
>> some counters, and a well-defined number of them will be available for
>> allocation at any one time.
>>
>>> d. Create few resctrl groups.
>>>
>>> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>>
>>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>    to list and modify any group's monitoring states. File provides single place
>>>    to list monitoring states of all the resctrl groups. It makes it easier for
>>>    user space to learn about the used counters without needing to traverse all
>>>    the groups thus reducing the number of file system calls.
>>>
>>> 	The list follows the following format:
>>>
>>> 	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>>
>>> 	Format for specific type of groups:
>>>
>>> 	* Default CTRL_MON group:
>>> 	 "//<domain_id>=<flags>"
>>
>> [...]
>>
>>>        Flags can be one of the following:
>>>
>>>         t  MBM total event is enabled.
>>>         l  MBM local event is enabled.
>>>         tl Both total and local MBM events are enabled.
>>>         _  None of the MBM events are enabled
>>>
>>> 	Examples:
>>
>> [...]
>>
>> I think that this basically works for MPAM.
>>
>> The local/total distinction doesn't map in a consistent way onto MPAM,
>> but this problem is not specific to ABMC.  It feels sensible for ABMC
>> to be built around the same concepts that resctrl already has elsewhere
>> in the interface.  MPAM will do its best to fit (as already).
>>
>> Regarding Peter's use case of assigning multiple counters to a
>> monitoring group [1], I feel that it's probably good enough to make
>> sure that the ABMC interface can be extended in future in a backwards
>> compatible way so as to support this, without trying to support it
>> immediately.
>>
>> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
>>
> 
> I do not think that resctrl's current support of the mbm_total_bytes and
> mbm_local_bytes should be considered as the "only" two available "slots"
> into which all possible events should be forced into. "mon_features" exists
> to guide user space to which events are supported and as I see it new events
> can be listed here to inform user space of their availability, with their
> associated event files available in the resource groups.
> 
>>
>> For example, if we added new generic "letters" -- say, "0" to "9",
>> combined with new counter files in resctrlfs, that feels like a
>> possible approach.  ABMC (as in this series) should just reject such
>> assignments, and the new counter files wouldn't exist.
>>
>> Availability of this feature could also be reported as a distinct mode
>> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
>>
>>
>> A _sketch_ of this follows.  This is NOT a proposal -- the key
>> question is whether we are confident that we can extend the interface
>> in this way in the future without breaking anything.
>>
>> If "yes", then the ABMC interface (as proposed by this series) works as
>> a foundation to build on.
>>
>> --8<--
>>
>> [artist's impression]
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>  	mbm_cntr_generic
>>  	[mbm_cntr_assign]
>>  	default
>>
>> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type 
>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type 
>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type 
>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type 
>>
>> ...
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
>>
>> etc.
>>
> 
> It is not clear to me what additional features such an interface enables. It
> also looks like user space will need to track and manage counter IDs?
> 
> It sounds to me as though the issue starts with your statement
> "The local/total distinction doesn't map in a consistent way onto MPAM". To
> address this I expect that an MPAM system will not support nor list
> mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
> it would list the events that are appropriate to the system? Trying to match
> with what Peter said [1] in the message you refer to, this may be possible:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
> mbm_local_read_bytes
> mbm_local_write_bytes
> mbm_local_bytes
> 
> (*) I am including mbm_local_bytes since it could be an event that can be software
> defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
> counted.
> 
> I see the support for MPAM events distinct from the support of assignable counters.
> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> Please help me understand if you see it differently.
> 	
> Doing so would need to come up with alphabetical letters for these events,
> which seems to be needed for your proposal also? If we use possible flags of:
> 
> mbm_local_read_bytes a
> mbm_local_write_bytes b
> 
> Then mbm_assign_control can be used as:
> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> <value>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> 
> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> assuming that upper case would be used for "shared" counters (unless this interface is defined
> differently and only few uppercase letters used for it). Would this be too low of a limit?
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 12 months ago
Hi Reinette,

On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> Hi Dave,
> 
> On 2/12/25 9:46 AM, Dave Martin wrote:
> > Hi there,
> > 
> > On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>
> >> This series adds the support for Assignable Bandwidth Monitoring Counters
> >> (ABMC). It is also called QoS RMID Pinning feature
> >>
> >> Series is written such that it is easier to support other assignable
> >> features supported from different vendors.
> >>
> >> The feature details are documented in the  APM listed below [1].
> >> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> >> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> >> Monitoring (ABMC). The documentation is available at
> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> >>
> >> The patches are based on top of commit
> >> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'

[...]

> >> b. Check how many ABMC counters are available. 
> >>
> >> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> >> 	32
> > 
> > Is this file needed?
> > 
> > With MPAM, it is more difficult to promise that the same number of
> > counters will be available everywhere.
> > 
> > Rather than lie, or report a "safe" value here that may waste some
> > counters, can we just allow the number of counters to be discovered
> > per domain via available_mbm_cntrs?
> 
> This sounds reasonable to me. I think us having trouble with the
> user documentation of this file so late in development should also have been
> a sign to rethink its value.
> 
> For a user to discover the number of counters supported via available_mbm_cntrs
> would require the file's contents to be captured right after mount. Since we've
> had scenarios where new userspace needs to discover an up-and-running system's
> configuration this may not be possible. I thus wonder instead of removing
> num_mbm_cntrs, it could be modified to return the per-domain supported counters
> instead of a single value? 

Is it actually useful to be able to discover the number of counters
that exist?  A counter that exists but is not available cannot be used,
so perhaps it is not useful to know about it in the first place.

But if we keep this file but make it report the number of counters for
each domain (similarly to available_mbm_cntrs), then I think the MPAM
driver should be able to work with that.
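
For what it's worth, if num_mbm_cntrs did report per-domain totals in
the same "<domain_id>=<count>" form (an assumption; the series as posted
reports a single value), user space could derive the counters in use
per domain.  A minimal shell sketch, with cntrs_in_use as a hypothetical
helper name:

```shell
#!/bin/sh
# Given per-domain totals and per-domain available counts, both in
# "<domain_id>=<count>;..." form, print per-domain counters in use.
# The per-domain totals format is an assumption, and cntrs_in_use is
# a hypothetical helper name used only for illustration.
cntrs_in_use() {
    totals=$1
    avail=$2
    out=
    old_ifs=$IFS
    IFS=';'
    for entry in $totals; do
        dom=${entry%%=*}
        total=${entry#*=}
        # Find the matching domain in the "available" list.
        for a in $avail; do
            if [ "${a%%=*}" = "$dom" ]; then
                out="${out}${out:+;}${dom}=$((total - ${a#*=}))"
            fi
        done
    done
    IFS=$old_ifs
    echo "$out"
}
```

For example, cntrs_in_use '0=32;1=32' '0=30;1=30' would report two
counters in use in each domain.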

> > num_closids and num_rmids are already problematic for MPAM, so it would
> > be good to avoid any more parameters of this sort from being reported
> > to userspace unless there is a clear understanding of why they are
> > needed.
> 
> Yes. Appreciate your help in identifying what could be problematic for MPAM.

For clarity: this is a background issue, mostly orthogonal to this
series.

If this series is merged as-is, with a global per-resource
num_mbm_cntrs property, then this is not really worse than the current
situation -- it's just a bit annoying from the MPAM perspective.


In a nutshell, the num_closids / num_rmids parameters seem to expose
RDT-specific hardware semantics to userspace, implying a specific
allocation model for control group and monitoring group identifiers.

The guarantees that userspace is entitled to assume when resctrl
reports particular values do not seem to be well described and are hard
to map onto the nearest-equivalent MPAM implementation.  A combination
of control and monitoring groups that can be created on x86 may not be
creatable on MPAM, even when the number of supportable control and
monitoring partitions is the same.

Even with the ABMC series, we may still be constrained on what we can
report for num_rmids: we can't know in advance whether or not the user
is going to use mbm_cntr_assign mode -- if not, we can't promise to
create more monitoring groups than the number of counters in the
hardware.


It seems natural for the counts reported by "available_mbm_cntrs" to
change dynamically when the ABMC assignment mode is changed, but I
think userspace is likely to expect the global "num_rmids" parameter
to be fixed for the lifetime of the resctrl mount (and possibly fixed
for all time on a given hardware platform -- at least, modulo CDP).


I think it might be possible to tighten up the documentation of
num_closids in particular in a way that doesn't conflict with x86 and
may make it easier for MPAM to fit in with, but that feels like a
separate conversation.

None of this should be considered a blocker for this series, either way.

> > 
> > Reporting number of counters per monitoring domain is a more natural
> > fit for MPAM, as below:
> > 
> >> c. Check how many ABMC counters are available in each domain.
> >>
> >> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
> >> 	0=30;1=30
> > 
> > For MPAM, this seems supportable.  Each monitoring domain will have
> > some counters, and a well-defined number of them will be available for
> > allocation at any one time.

[...]

> >> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control

[...]

> >>        Flags can be one of the following:
> >>
> >>         t  MBM total event is enabled.
> >>         l  MBM local event is enabled.
> >>         tl Both total and local MBM events are enabled.
> >>         _  None of the MBM events are enabled
> >>
> >> 	Examples:
> > 
> > [...]
> > 
> > I think that this basically works for MPAM.
> > 
> > The local/total distinction doesn't map in a consistent way onto MPAM,
> > but this problem is not specific to ABMC.  It feels sensible for ABMC
> > to be built around the same concepts that resctrl already has elsewhere
> > in the interface.  MPAM will do its best to fit (as already).
> > 
> > Regarding Peter's use case of assigning multiple counters to a
> > monitoring group [1], I feel that it's probably good enough to make
> > sure that the ABMC interface can be extended in future in a backwards
> > compatible way so as to support this, without trying to support it
> > immediately.
> > 
> > [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
> > 
> 
> I do not think that resctrl's current support of the mbm_total_bytes and
> mbm_local_bytes should be considered as the "only" two available "slots"
> into which all possible events should be forced into. "mon_features" exists
> to guide user space to which events are supported and as I see it new events
> can be listed here to inform user space of their availability, with their
> associated event files available in the resource groups.

That's fair.  I wasn't sure how (or if) the set of countable
events was expected to grow / evolve via this route.

Either way, I think this confirms that there is at least one viable way
to enable more counters for a single control group, on top of this
series.

(If there is more than one way, that seems fine?)

> > 
> > For example, if we added new generic "letters" -- say, "0" to "9",
> > combined with new counter files in resctrlfs, that feels like a
> > possible approach.  ABMC (as in this series) should just reject such
> > assignments, and the new counter files wouldn't exist.
> > 
> > Availability of this feature could also be reported as a distinct mode
> > in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
> > 
> > 
> > A _sketch_ of this follows.  This is NOT a proposal -- the key
> > question is whether we are confident that we can extend the interface
> > in this way in the future without breaking anything.
> > 
> > If "yes", then the ABMC interface (as proposed by this series) works as
> > a foundation to build on.
> > 
> > --8<--
> > 
> > [artist's impression]
> > 
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> >  	mbm_cntr_generic
> >  	[mbm_cntr_assign]
> >  	default
> > 
> > # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> > # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type 
> > # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type 
> > # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type 
> > # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type 
> > 
> > ...
> > 
> > # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
> > 
> > etc.
> > 
> 
> It is not clear to me what additional features such an interface enables. It
> also looks like user space will need to track and manage counter IDs?

My idea was that for these generic counters, new files could be exposed
to configure what they actually count (the ..._type files shown above;
or possibly via the ..._config files that already exist).

The "IDs" were intended as abstract; the number only relates the
assignments in mbm_assign_control to the files created elsewhere.  This
wouldn't be related to IDs assigned by the hardware.
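
To make the abstract-ID idea concrete, here is a minimal user-space
sketch in Python of how the IDs in the sketched '//0=01;1=23' write
would relate to file names.  Everything here is hypothetical: the
"mbm_cntr_generic" mode, the digit IDs and the mbm_counter<N>_bytes
files come only from the sketch above and are not part of this series.

```python
def counter_files(assign_line, mon_root="/sys/fs/resctrl/mon_data"):
    """Map a sketched 'group/domain=ids;...' line to per-domain counter files.

    The digit IDs are abstract labels (hypothetical syntax); they only tie
    an assignment to a file name and say nothing about hardware counter IDs.
    """
    # Everything after the last '/' is the per-domain assignment list.
    _group, _, domains = assign_line.rpartition("/")
    files = {}
    for item in domains.split(";"):
        domain, _, ids = item.partition("=")
        mon_dir = "%s/mon_L3_%02d" % (mon_root, int(domain))
        files[int(domain)] = [
            "%s/mbm_counter%s_bytes" % (mon_dir, c) for c in ids
        ]
    return files


for domain, paths in counter_files("//0=01;1=23").items():
    for path in paths:
        print(domain, path)
```

The only state user space would need to keep is the label-to-purpose
mapping it chose itself.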

If there are multiple resctrl users then using numeric IDs might be
problematic; though if we go eventually in the direction of making
resctrlfs multi-mountable then each mount could have its own namespace.

Allowing counters to be named and configured with a mkdir()-style
interface might be possible too; that might make it easier for users to
coexist within a single resctrl mount (if we think that's important
enough).

> It sounds to me as though the issue starts with your statement
> "The local/total distinction doesn't map in a consistent way onto MPAM". To
> address this I expect that an MPAM system will not support nor list
> mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
> it would list the events that are appropriate to the system? Trying to match
> with what Peter said [1] in the message you refer to, this may be possible:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
> mbm_local_read_bytes
> mbm_local_write_bytes
> mbm_local_bytes
> 
> (*) I am including mbm_local_bytes since it could be an event that can be software
> defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
> counted.
> 
> I see the support for MPAM events distinct from the support of assignable counters.
> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> Please help me understand if you see it differently.
> 	
> Doing so would need to come up with alphabetical letters for these events,
> which seems to be needed for your proposal also? If we use possible flags of:
> 
> mbm_local_read_bytes a
> mbm_local_write_bytes b
> 
> Then mbm_assign_control can be used as:
> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> <value>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> 
> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> assuming that upper case would be used for "shared" counters (unless this interface is defined
> differently and only few uppercase letters used for it). Would this be too low of a limit?
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/

That approach would also work, where an MPAM system has events that are
not a reasonable approximation of the generic "total" or "local".

For now we would probably stick with "total" and "local" anyway though,
because the MPAM architecture doesn't natively allow the mapping onto
the memory system topology to be discovered, and the information in
ACPI / device tree is insufficient to tell us everything we'd need to
know.  But I guess what counts as "local" in particular will be quite
hardware and topology dependent even on x86, so perhaps we shouldn't
worry about having the behaviour match exactly (?)

Regarding the code letters, my idea was that the event type might be
configured by a separate file, instead of in mbm_assign_control
directly, in which case running out of letters wouldn't be a problem.

Alternatively, if we want to be able to expand beyond single letters,
could we reserve one or more characters for extension purposes?

If braces are forbidden by the syntax today, could we add support for
something like the following later on, without breaking anything?

# echo '//0={foo}{bar};1={bar}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
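
As a rough illustration of why this could stay backwards compatible,
here is a minimal Python sketch of a flag tokenizer that accepts both
forms.  The braced form is only the hypothetical extension suggested
above, and the event names "foo" and "bar" are placeholders; it is not
how the series parses flags.

```python
def split_flags(flags):
    """Split a flag string into event tokens.

    A bare character is a one-letter flag as used today; '{name}' is the
    hypothetical extended form.  Since braces are not valid flags today,
    every existing flag string tokenizes exactly as before.
    """
    tokens = []
    i = 0
    while i < len(flags):
        ch = flags[i]
        if ch == "{":
            end = flags.find("}", i)
            if end == -1 or end == i + 1:
                raise ValueError("malformed event name in %r" % flags)
            tokens.append(flags[i + 1:end])
            i = end + 1
        elif ch == "}":
            raise ValueError("unbalanced '}' in %r" % flags)
        else:
            tokens.append(ch)
            i += 1
    return tokens


print(split_flags("tl"))          # one-letter flags, unchanged
print(split_flags("{foo}{bar}"))  # hypothetical named events
```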


For now, my main concern would be whether this series prevents that
sort of thing being added in a backwards compatible way later.

I don't really see anything that is a blocker.

What do you think?

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 12 months ago
Hi Dave,

On 2/13/25 9:37 AM, Dave Martin wrote:
> Hi Reinette,
> 
> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>> Hi there,
>>>
>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>
>>>> This series adds the support for Assignable Bandwidth Monitoring Counters
>>>> (ABMC). It is also called QoS RMID Pinning feature
>>>>
>>>> Series is written such that it is easier to support other assignable
>>>> features supported from different vendors.
>>>>
>>>> The feature details are documented in the  APM listed below [1].
>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>> Monitoring (ABMC). The documentation is available at
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>>
>>>> The patches are based on top of commit
>>>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
> 
> [...]
> 
>>>> b. Check how many ABMC counters are available. 
>>>>
>>>> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
>>>> 	32
>>>
>>> Is this file needed?
>>>
>>> With MPAM, it is more difficult to promise that the same number of
>>> counters will be available everywhere.
>>>
>>> Rather than lie, or report a "safe" value here that may waste some
>>> counters, can we just allow the number of counters to be discovered
>>> per domain via available_mbm_cntrs?
>>
>> This sounds reasonable to me. I think us having trouble with the
>> user documentation of this file so late in development should also have been
>> a sign to rethink its value.
>>
>> For a user to discover the number of counters supported via available_mbm_cntrs
>> would require the file's contents to be captured right after mount. Since we've
>> had scenarios where new userspace needs to discover an up-and-running system's
>> configuration this may not be possible. I thus wonder instead of removing
>> num_mbm_cntrs, it could be modified to return the per-domain supported counters
>> instead of a single value? 
> 
> Is it actually useful to be able to discover the number of counters
> that exist?  A counter that exists but is not available cannot be used,
> so perhaps it is not useful to know about it in the first place.

An alternative perspective of what "available" means is "how many counters
could I possibly get to do this new monitoring task". A user may be willing
to re-assign counters if the new monitoring task is important. Knowing
how many counters are already free and available for assignment would be
easy from available_mbm_cntrs but to get an idea of how many counters
could be re-assigned to help out with the new task would require
some intricate parsing of mbm_assign_control.
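
To illustrate the difference between the two numbers, here is a minimal
user-space sketch in Python.  It assumes that a per-domain num_mbm_cntrs
would use the same "domain=count" format as available_mbm_cntrs; that
format and the "0=32;1=32" sample are assumptions for illustration only,
not something this series implements.

```python
def parse_per_domain(text):
    """Parse a 'domain=value;domain=value' line into {domain_id: value}."""
    counts = {}
    for item in text.strip().split(";"):
        if not item:
            continue
        domain, _, value = item.partition("=")
        counts[int(domain)] = int(value)
    return counts


# Sample taken from the cover letter's available_mbm_cntrs example.
available = parse_per_domain("0=30;1=30")
# Hypothetical per-domain totals (assumed format, see above).
total = parse_per_domain("0=32;1=32")

# Counters currently assigned, i.e. candidates for re-assignment.
in_use = {dom: total[dom] - available[dom] for dom in total}
print("free:", available, "re-assignable:", in_use)
```

With both values exposed per domain, user space would not need to parse
mbm_assign_control just to learn how many counters could be reclaimed.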


> But if we keep this file but make it report the number of counters for
> each domain (similarly to available_mbm_cntrs), then I think the MPAM
> driver should be able to work with that.
> 
>>> num_closids and num_rmids are already problematic for MPAM, so it would
>>> be good to avoid any more parameters of this sort from being reported
>>> to userspace unless there is a clear understanding of why they are
>>> needed.
>>
>> Yes. Appreciate your help in identifying what could be problematic for MPAM.
> 
> For clarity: this is a background issue, mostly orthogonal to this
> series.
> 
> If this series is merged as-is, with a global per-resource
> num_mbm_cntrs property, then this is not really worse than the current
> situation -- it's just a bit annoying from the MPAM perspective.
> 
> 
> In a nutshell, the num_closids / num_rmids parameters seem to expose
> RDT-specific hardware semantics to userspace, implying a specific
> allocation model for control group and monitoring group identifiers.
> 
> The guarantees that userspace is entitled to assume when resctrl
> reports particular values do not seem to be well described and are hard
> to map onto the nearest-equivalent MPAM implementation.  A combination
> of control and monitoring groups that can be created on x86 may not be
> creatable on MPAM, even when the number of supportable control and
> monitoring partitions is the same.

I understand. This interface was created almost a decade ago. It would have been
wonderful if the user interface could have been created with a clear vision
of all the use cases it would end up needing to support. I am trying to be
very careful with this new user interface as I try to consider all the things I
learned while working on resctrl. All help to get this new interface right is
greatly appreciated.

Since you specifically mention issues that MPAM has with num_rmids, please
note that we have been trying (see [1], but maybe start reading the thread
at [2]) to find ways to make this work with MPAM, but there has been no word
from the MPAM side.
I see that you were not cc'd on the discussion so this is not a criticism of
you personally, but I would like to highlight that we do try to make things
work well for MPAM, yet so far this work seems ignored while being criticized
for not being done. I expect that the more use cases are thrown at an
interface as it is developed, the better it gets, and I would gladly work
with MPAM folks to improve things.

> Even with the ABMC series, we may still be constrained on what we can
> report for num_rmids: we can't know in advance whether or not the user
> is going to use mbm_cntr_assign mode -- if not, we can't promise to
> create more monitoring groups than the number of counters in the
> hardware.

It is the architecture that decides which modes are supported and
which is default.

> It seems natural for the counts reported by "available_mbm_cntrs" to
> change dynamically when the ABMC assignment mode is changed, but I
> think userspace are likely to expect the global "num_rmids" parameters
> to be fixed for the lifetime of the resctrl mount (and possibly fixed
> for all time on a given hardware platform -- at least, modulo CDP).
> 
> 
> I think it might be possible to tighten up the documentation of
> num_closids in particular in a way that doesn't conflict with x86 and
> may make it easier for MPAM to fit in with, but that feels like a
> separate conversation.
> 
> None of this should be considered a blocker for this series, either way.
> 
>>>
>>> Reporting number of counters per monitoring domain is a more natural
>>> fit for MPAM, as below:
>>>
>>>> c. Check how many ABMC counters are available in each domain.
>>>>
>>>> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
>>>> 	0=30;1=30
>>>
>>> For MPAM, this seems supportable.  Each monitoring domain will have
>>> some counters, and a well-defined number of them will be available for
>>> allocation at any one time.
> 
> [...]
> 
>>>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> [...]
> 
>>>>        Flags can be one of the following:
>>>>
>>>>         t  MBM total event is enabled.
>>>>         l  MBM local event is enabled.
>>>>         tl Both total and local MBM events are enabled.
>>>>         _  None of the MBM events are enabled
>>>>
>>>> 	Examples:
>>>
>>> [...]
>>>
>>> I think that this basically works for MPAM.
>>>
>>> The local/total distinction doesn't map in a consistent way onto MPAM,
>>> but this problem is not specific to ABMC.  It feels sensible for ABMC
>>> to be built around the same concepts that resctrl already has elsewhere
>>> in the interface.  MPAM will do its best to fit (as already).
>>>
>>> Regarding Peter's use case of assigning multiple counters to a
>>> monitoring group [1], I feel that it's probably good enough to make
>>> sure that the ABMC interface can be extended in future in a backwards
>>> compatible way so as to support this, without trying to support it
>>> immediately.
>>>
>>> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
>>>
>>
>> I do not think that resctrl's current support of the mbm_total_bytes and
>> mbm_local_bytes should be considered as the "only" two available "slots"
>> into which all possible events should be forced into. "mon_features" exists
>> to guide user space to which events are supported and as I see it new events
>> can be listed here to inform user space of their availability, with their
>> associated event files available in the resource groups.
> 
> That's fair.  I wasn't currently sure how (or if) the set of countable
> events was expected to grow / evolve via this route.
> 
> Either way, I think this confirms that there is at least one viable way
> to enable more counters for a single control group, on top of this
> series.
> 
> (If there is more than one way, that seems fine?)
> 
>>>
>>> For example, if we added new generic "letters" -- say, "0" to "9",
>>> combined with new counter files in resctrlfs, that feels like a
>>> possible approach.  ABMC (as in this series) should just reject such
>>> assignments, and the new counter files wouldn't exist.
>>>
>>> Availability of this feature could also be reported as a distinct mode
>>> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
>>>
>>>
>>> A _sketch_ of this follows.  This is NOT a proposal -- the key
>>> question is whether we are confident that we can extend the interface
>>> in this way in the future without breaking anything.
>>>
>>> If "yes", then the ABMC interface (as proposed by this series) works as
>>> a foundation to build on.
>>>
>>> --8<--
>>>
>>> [artist's impression]
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>  	mbm_cntr_generic
>>>  	[mbm_cntr_assign]
>>>  	default
>>>
>>> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type 
>>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type 
>>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type 
>>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type 
>>>
>>> ...
>>>
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
>>>
>>> etc.
>>>
>>
>> It is not clear to me what additional features such an interface enables. It
>> also looks like user space will need to track and manage counter IDs?
> 
> My idea was that for these generic counters, new files could be exposed
> to configure what they actually count (the ..._type files shown above;
> or possibly via the ..._config files that already exist).
> 
> The "IDs" were intended as abstract; the number only relates the
> assignments in mbm_assign_control to the files created elsewhere.  This
> wouldn't be related to IDs assigned by the hardware.

I see. Yes, this sounds related to and a generalization of the AMD
configurable event feature.

> 
> If there are multiple resctrl users then using numeric IDs might be
> problematic; though if we go eventually in the direction of making
> resctrlfs multi-mountable then each mount could have its own namespace.

I am not aware of a "multi-mountable" direction.

> 
> Allowing counters to be named and configured with a mkdir()-style
> interface might be possible too; that might make it easier for users to
> coexist within a single resctrl mount (if we think that's important
> enough).
> 
>> It sounds to me as though the issue starts with your statement
>> "The local/total distinction doesn't map in a consistent way onto MPAM". To
>> address this I expect that an MPAM system will not support nor list
>> mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
>> it would list the events that are appropriate to the system? Trying to match
>> with what Peter said [1] in the message you refer to, this may be possible:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> mbm_local_read_bytes
>> mbm_local_write_bytes
>> mbm_local_bytes
>>
>> (*) I am including mbm_local_bytes since it could be an event that can be software
>> defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
>> counted.
>>
>> I see the support for MPAM events distinct from the support of assignable counters.
>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>> Please help me understand if you see it differently.
>> 	
>> Doing so would need to come up with alphabetical letters for these events,
>> which seems to be needed for your proposal also? If we use possible flags of:
>>
>> mbm_local_read_bytes a
>> mbm_local_write_bytes b
>>
>> Then mbm_assign_control can be used as:
>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>> <value>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>
>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>
>> Reinette
>>
>> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
> 
> That approach would also work, where an MPAM system has events that are not
> a reasonable approximation of the generic "total" or "local".
> 
> For now we would probably stick with "total" and "local" anyway though,
> because the MPAM architecture doesn't natively allow the mapping onto
> the memory system topology to be discovered, and the information in
> ACPI / device tree is insufficient to tell us everything we'd need to
> know.  But I guess what counts as "local" in particular will be quite
> hardware and topology dependent even on x86, so perhaps we shouldn't
> worry about having the behaviour match exactly (?)
> 
> Regarding the code letters, my idea was that the event type might be
> configured by a separate file, instead of in mbm_assign_control
> directly, in which case running out of letters wouldn't be a problem.

This work started with individual files for counters but the issue was
raised that this will require a large number of filesystem calls when, for
example, a user wants to move a group of counters associated with the events
of one set of monitoring groups to another set of monitoring groups. This
is for the use case where there are a significant number of monitor groups
for which there are not sufficient counters. With mbm_assign_control this
can be done in a single write and such a monitoring transition can thus
be accomplished more efficiently.
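
As a rough user-space illustration of such a transition, the Python
sketch below builds one buffer that releases the counters of one group
and assigns them to another.  The "//0=...;1=..." form and the "_" flag
are taken from the cover letter; the monitor group name "mon_a" and the
use of one line per group are assumptions made for this example.

```python
def build_assign_request(assignments):
    """Join per-group assignments into one buffer for a single write().

    assignments maps a group path, e.g. "//" for the default group, to a
    {domain_id: flags} dict.
    """
    lines = []
    for group, domains in assignments.items():
        per_domain = ";".join(
            "%d=%s" % (dom, flags) for dom, flags in sorted(domains.items())
        )
        lines.append(group + per_domain)
    # Assumption: one group per line within the same write.
    return "\n".join(lines) + "\n"


request = build_assign_request({
    "//": {0: "_", 1: "_"},         # release counters from the default group
    "/mon_a/": {0: "tl", 1: "tl"},  # hypothetical monitor group
})
print(request, end="")
```

The resulting buffer would then be written to
/sys/fs/resctrl/info/L3_MON/mbm_assign_control in one call, instead of
one filesystem call per counter file.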

> 
> Alternatively, if we want to be able to expand beyond single letters,
> could we reserve one or more characters for extension purposes?
> 
> If braces are forbidden by the syntax today, could we add support for
> something like the following later on, without breaking anything?
> 
> # echo '//0={foo}{bar};1={bar}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 

Thank you for the suggestion. I think we may need something like this.
Babu, what do you think?

> 
> For now, my main concern would be whether this series prevents that
> sort of thing being added in a backwards compatible way later.
> 
> I don't really see anything that is a blocker.
> 
> What do you think?

I do not fully understand the MPAM counter feature. It almost sounds like
every counter could be configured independently, with the expectation to
configure and assign each counter independently to a domain. As I understand
it, these capabilities match AMD's ABMC feature, but the planned
implementation to support ABMC first configures events per domain and then
assigns these events to counters. Hmm ... but in your example a file like
"mbm_counter0_bytes_type" is global. Could you please elaborate on how, in
your example, writing a single letter to that file will be interpreted?


Reinette

[1] https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
[2] https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 12 months ago
Hi Dave/Reinette,

On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> Hi Dave,
> 
> On 2/13/25 9:37 AM, Dave Martin wrote:
>> Hi Reinette,
>>
>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>> Hi Dave,
>>>
>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>> Hi there,
>>>>
>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>
>>>>> This series adds the support for Assignable Bandwidth Monitoring Counters
>>>>> (ABMC). It is also called QoS RMID Pinning feature
>>>>>
>>>>> Series is written such that it is easier to support other assignable
>>>>> features supported from different vendors.
>>>>>
>>>>> The feature details are documented in the  APM listed below [1].
>>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>>> Monitoring (ABMC). The documentation is available at
>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>>>
>>>>> The patches are based on top of commit
>>>>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>
>> [...]
>>
>>>>> b. Check how many ABMC counters are available.
>>>>>
>>>>> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>>>> 	32
>>>>
>>>> Is this file needed?
>>>>
>>>> With MPAM, it is more difficult to promise that the same number of
>>>> counters will be available everywhere.
>>>>
>>>> Rather than lie, or report a "safe" value here that may waste some
>>>> counters, can we just allow the number of counters to be discovered
>>>> per domain via available_mbm_cntrs?
>>>
>>> This sounds reasonable to me. I think us having trouble with the
>>> user documentation of this file so late in development should also have been
>>> a sign to rethink its value.
>>>
>>> For a user to discover the number of counters supported via available_mbm_cntrs
>>> would require the file's contents to be captured right after mount. Since we've
>>> had scenarios where new userspace needs to discover an up-and-running system's
>>> configuration this may not be possible. I thus wonder instead of removing
>>> num_mbm_cntrs, it could be modified to return the per-domain supported counters
>>> instead of a single value?
>>
>> Is it actually useful to be able to discover the number of counters
>> that exist?  A counter that exists but is not available cannot be used,
>> so perhaps it is not useful to know about it in the first place.
> 
> An alternative perspective of what "available" means is "how many counters
> could I possibly get to do this new monitoring task". A user may be willing
> to re-assign counters if the new monitoring task is important. Knowing
> how many counters are already free and available for assignment would be
> easy from available_mbm_cntrs but to get an idea of how many counters
> could be re-assigned to help out with the new task would require
> some intricate parsing of mbm_assign_control.
> 
> 
>> But if we keep this file but make it report the number of counters for
>> each domain (similarly to available_mbm_cntrs), then I think the MPAM
>> driver should be able to work with that.
>>
>>>> num_closids and num_rmids are already problematic for MPAM, so it would
>>>> be good to avoid any more parameters of this sort from being reported
>>>> to userspace unless there is a clear understanding of why they are
>>>> needed.
>>>
>>> Yes. Appreciate your help in identifying what could be problematic for MPAM.
>>
>> For clarity: this is a background issue, mostly orthogonal to this
>> series.
>>
>> If this series is merged as-is, with a global per-resource
>> num_mbm_cntrs property, then this is not really worse than the current
>> situation -- it's just a bit annoying from the MPAM perspective.
>>
>>
>> In a nutshell, the num_closids / num_rmids parameters seem to expose
>> RDT-specific hardware semantics to userspace, implying a specific
>> allocation model for control group and monitoring group identifiers.
>>
>> The guarantees that userspace is entitled to assume when resctrl
>> reports particular values do not seem to be well described and are hard
>> to map onto the nearest-equivalent MPAM implementation.  A combination
>> of control and monitoring groups that can be created on x86 may not be
>> creatable on MPAM, even when the number of supportable control and
>> monitoring partitions is the same.
> 
> I understand. This interface was created almost a decade ago. It would have been
> wonderful if the user interface could have been created with a clear vision
> of all the use cases it would end up needing to support. I am trying to be
> very careful with this new user interface as I try to consider all the things I
> learned while working on resctrl. All help to get this new interface right is
> greatly appreciated.
> 
> Since you specifically mention issues that MPAM has with num_rmids, please
> note that we have been trying (see [1], but maybe start reading thread at [2])
> to find ways to make this work with MPAM but no word from MPAM side.
> I see that you were not cc'd on the discussion so this is not a criticism of
> you personally but I would like to highlight that we do try to make things
> work well for MPAM but so far this work seems ignored, yet criticized
> for not being done. I expect the more use cases are thrown at an interface
> as it is developed the better it would get and I would gladly work with MPAM
> folks to improve things.
> 
>> Even with the ABMC series, we may still be constrained on what we can
>> report for num_rmids: we can't know in advance whether or not the user
>> is going to use mbm_cntr_assign mode -- if not, we can't promise to
>> create more monitoring groups than the number of counters in the
>> hardware.
> 
> It is the architecture that decides which modes are supported and
> which is default.
> 
>> It seems natural for the counts reported by "available_mbm_cntrs" to
>> change dynamically when the ABMC assignment mode is changed, but I
>> think userspace are likely to expect the global "num_rmids" parameters
>> to be fixed for the lifetime of the resctrl mount (and possibly fixed
>> for all time on a given hardware platform -- at least, modulo CDP).
>>
>>
>> I think it might be possible to tighten up the documentation of
>> num_closids in particular in a way that doesn't conflict with x86 and
>> may make it easier for MPAM to fit in with, but that feels like a
>> separate conversation.
>>
>> None of this should be considered a blocker for this series, either way.
>>
>>>>
>>>> Reporting number of counters per monitoring domain is a more natural
>>>> fit for MPAM, as below:
>>>>
>>>>> c. Check how many ABMC counters are available in each domain.
>>>>>
>>>>> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>>>>> 	0=30;1=30
>>>>
>>>> For MPAM, this seems supportable.  Each monitoring domain will have
>>>> some counters, and a well-defined number of them will be available for
>>>> allocation at any one time.
>>
>> [...]
>>
>>>>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> [...]
>>
>>>>>         Flags can be one of the following:
>>>>>
>>>>>          t  MBM total event is enabled.
>>>>>          l  MBM local event is enabled.
>>>>>          tl Both total and local MBM events are enabled.
>>>>>          _  None of the MBM events are enabled
>>>>>
>>>>> 	Examples:
>>>>
>>>> [...]
>>>>
>>>> I think that this basically works for MPAM.
>>>>
>>>> The local/total distinction doesn't map in a consistent way onto MPAM,
>>>> but this problem is not specific to ABMC.  It feels sensible for ABMC
>>>> to be built around the same concepts that resctrl already has elsewhere
>>>> in the interface.  MPAM will do its best to fit (as already).
>>>>
>>>> Regarding Peter's use case of assigning multiple counters to a
>>>> monitoring group [1], I feel that it's probably good enough to make
>>>> sure that the ABMC interface can be extended in future in a backwards
>>>> compatible way so as to support this, without trying to support it
>>>> immediately.
>>>>
>>>> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
>>>>
>>>
>>> I do not think that resctrl's current support of the mbm_total_bytes and
>>> mbm_local_bytes should be considered as the "only" two available "slots"
>>> into which all possible events should be forced into. "mon_features" exists
>>> to guide user space to which events are supported and as I see it new events
>>> can be listed here to inform user space of their availability, with their
>>> associated event files available in the resource groups.
>>
>> That's fair.  I wasn't currently sure how (or if) the set of countable
>> events was expected to grow / evolve via this route.
>>
>> Either way, I think this confirms that there is at least one viable way
>> to enable more counters for a single control group, on top of this
>> series.
>>
>> (If there is more than one way, that seems fine?)
>>
>>>>
>>>> For example, if we added new generic "letters" -- say, "0" to "9",
>>>> combined with new counter files in resctrlfs, that feels like a
>>>> possible approach.  ABMC (as in this series) should just reject such
>>>> assignments, and the new counter files wouldn't exist.
>>>>
>>>> Availability of this feature could also be reported as a distinct mode
>>>> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
>>>>
>>>>
>>>> A _sketch_ of this follows.  This is NOT a proposal -- the key
>>>> question is whether we are confident that we can extend the interface
>>>> in this way in the future without breaking anything.
>>>>
>>>> If "yes", then the ABMC interface (as proposed by this series) works as
>>>> a foundation to build on.
>>>>
>>>> --8<--
>>>>
>>>> [artist's impression]
>>>>
>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>>   	mbm_cntr_generic
>>>>   	[mbm_cntr_assign]
>>>>   	default
>>>>
>>>> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type
>>>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type
>>>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type
>>>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type
>>>>
>>>> ...
>>>>
>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
>>>>
>>>> etc.
>>>>
>>>
>>> It is not clear to me what additional features such an interface enables. It
>>> also looks like user space will need to track and manage counter IDs?
>>
>> My idea was that for these generic counters, new files could be exposed
>> to configure what they actually count (the ..._type files shown above;
>> or possibly via the ..._config files that already exist).
>>
>> The "IDs" were intended as abstract; the number only relates the
>> assignments in mbm_assign_control to the files created elsewhere.  This
>> wouldn't be related to IDs assigned by the hardware.
> 
> I see. Yes, this sounds related to and a generalization of the AMD
> configurable event feature.
> 
>>
>> If there are multiple resctrl users then using numeric IDs might be
>> problematic; though if we go eventually in the direction of making
>> resctrlfs multi-mountable then each mount could have its own namespace.
> 
> I am not aware of a "multi-mountable" direction.
> 
>>
>> Allowing counters to be named and configured with a mkdir()-style
>> interface might be possible too; that might make it easier for users to
>> coexist within a single resctrl mount (if we think that's important
>> enough).
>>
>>> It sounds to me as though the issue starts with your statement
>>> "The local/total distinction doesn't map in a consistent way onto MPAM". To
>>> address this I expect that an MPAM system will not support nor list
>>> mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
>>> it would list the events that are appropriate to the system? Trying to match
>>> with what Peter said [1] in the message you refer to, this may be possible:
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>> mbm_local_read_bytes
>>> mbm_local_write_bytes
>>> mbm_local_bytes
>>>
>>> (*) I am including mbm_local_bytes since it could be an event that can be software
>>> defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
>>> counted.
>>>
>>> I see the support for MPAM events distinct from the support of assignable counters.
>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>> Please help me understand if you see it differently.
>>> 	
>>> Doing so would need to come up with alphabetical letters for these events,
>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>
>>> mbm_local_read_bytes a
>>> mbm_local_write_bytes b
>>>
>>> Then mbm_assign_control can be used as:
>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>> <value>
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>
>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>
>>> Reinette
>>>
>>> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
>>
>> That approach would also work, where an MPAM system has events that are
>> not a reasonable approximation of the generic "total" or "local".
>>
>> For now we would probably stick with "total" and "local" anyway though,
>> because the MPAM architecture doesn't natively allow the mapping onto
>> the memory system topology to be discovered, and the information in
>> ACPI / device tree is insufficient to tell us everything we'd need to
>> know.  But I guess what counts as "local" in particular will be quite
>> hardware and topology dependent even on x86, so perhaps we shouldn't
>> worry about having the behaviour match exactly (?)
>>
>> Regarding the code letters, my idea was that the event type might be
>> configured by a separate file, instead of in mbm_assign_control
>> directly, in which case running out of letters wouldn't be a problem.
> 
> This work started with individual files for counters but the issue was
> raised that this will require a large number of filesystem calls when, for
> example, a user wants to move a group of counters associated with the events
> of one set of monitoring groups to another set of monitoring groups. This
> is for the use case where there are a significant number of monitor groups
> for which there are not sufficient counters. With mbm_assign_control this
> can be done in a single write and such a monitoring transition can thus
> be accomplished more efficiently.
> 
>>
>> Alternatively, if we want to be able to expand beyond single letters,
>> could we reserve one or more characters for extension purposes?
>>
>> If braces are forbidden by the syntax today, could we add support for
>> something like the following later on, without breaking anything?
>>
>> # echo '//0={foo}{bar};1={bar}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
> 
> Thank you for the suggestion. I think we may need something like this.
> Babu, what do you think?

I'm not quite clear on this. Do we know what 'foo' and 'bar' refer to?
Is it random text?

In his example from
https://lore.kernel.org/lkml/Z643WdXYARTADSBy@e133380.arm.com/
--------------------------------------------------------------
The numbers are not supposed to have any hardware significance.

	'//0=6'

just "means assign some unused counter for domain 0, and create files
in resctrl so I can configure and read it".

The "6" is really just a tag for labelling the resulting resctrl
file names so that the user can tell them apart.  It's not supposed
to imply any specific hardware counter or event.
------------------------------------------------------------------

It seems that 'foo' and 'bar' are tags used to create files in 
/sys/fs/resctrl/info/L3_MON/.

Given that, it looks like we're discussing entirely different things.

> 
>>
>> For now, my main concern would be whether this series prevents that
>> sort of thing being added in a backwards compatible way later.
>>
>> I don't really see anything that is a blocker.
>>
>> What do you think?
> 
> I do not fully understand the MPAM counter feature. It almost sounds like
> every counter could be configured independently with the expectation to
> configure and assign each counter independently to a domain. As I understand it,
> these capabilities match AMD's ABMC feature, but the planned implementation
> to support ABMC first configures events per-domain and then assigns these
> events to counters. hmmm ... but in your example a file like
> "mbm_counter0_bytes_type" is global. Could you please elaborate how in
> your example writing a single letter to that file will be interpreted?
> 
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
> [2] https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
>
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 12 months ago
Hi Babu,

On 2/14/25 10:31 AM, Moger, Babu wrote:
> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:

(quoting relevant parts with goal to focus discussion on new possible syntax)

>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>> Please help me understand if you see it differently.
>>>>     
>>>> Doing so would need to come up with alphabetical letters for these events,
>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>
>>>> mbm_local_read_bytes a
>>>> mbm_local_write_bytes b
>>>>
>>>> Then mbm_assign_control can be used as:
>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>> <value>
>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>
>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?

As mentioned above, one possible issue with existing interface is that
it is limited to 26 events (assuming only lower case letters are used). The limit
is low enough to be of concern.
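
As a rough illustration of where that limit comes from, here is a minimal
Python sketch of a one-letter-per-event allocator. allocate_flags is a
hypothetical name used only for this example and is not part of the series;
the letters "a" and "b" match the quoted example above.

```python
import string

# Hypothetical allocator: hands out one lowercase letter per event name,
# in order, and fails once the alphabet is exhausted.
def allocate_flags(event_names):
    letters = string.ascii_lowercase
    if len(event_names) > len(letters):
        raise ValueError("only %d single-letter flags available" % len(letters))
    return dict(zip(event_names, letters))

flags = allocate_flags(["mbm_local_read_bytes", "mbm_local_write_bytes"])
```

A 27th event has no letter left to take, which is the case the interface
would need to handle.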

....

>>>
>>> Alternatively, if we want to be able to expand beyond single letters,
>>> could we reserve one or more characters for extension purposes?
>>>
>>> If braces are forbidden by the syntax today, could we add support for
>>> something like the following later on, without breaking anything?
>>>
>>> # echo '//0={foo}{bar};1={bar}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>
>>

Dave proposed a change in syntax that can (a) support unlimited events,
(b) be more intuitive than the one letter flags that may be hard to match
to the events they correspond to.

>> Thank you for the suggestion. I think we may need something like this.
>> Babu, what do you think?
> 
> I'm not quite clear on this. Do we know what 'foo' and 'bar' refer to?
> Is it random text?

Not random text. It refers to the events.

I do not know if braces are what will be settled on, but a slight change to the
example to make it match your series could be:

# echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control

With syntax like above there is no concern that we will run out of
flags and the events assigned are clear without needing to parse separate flags.
For a system with a lot of events and domains this will become quite a lot
to parse though.
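
For illustration only, a minimal Python sketch of how such a line might be
split into its parts. The brace syntax is not settled, and parse_assign_line
is a hypothetical helper, not code from this series; the
"<ctrl group>/<mon group>/<domain>=..." layout is assumed from the examples
above.

```python
import re

# Hypothetical helper: the brace syntax is only a proposal in this thread.
def parse_assign_line(line):
    """Split '<ctrl group>/<mon group>/<dom>={ev}{ev};<dom>={ev}' into parts."""
    ctrl_group, mon_group, assignments = line.split("/", 2)
    domains = {}
    for item in assignments.split(";"):
        dom, _, events = item.partition("=")
        names = re.findall(r"\{([^{}]+)\}", events)
        # Reject anything that is not a plain sequence of {name} tokens.
        if "".join("{%s}" % n for n in names) != events:
            raise ValueError("malformed event list: %r" % events)
        domains[int(dom)] = names
    return ctrl_group, mon_group, domains

parsed = parse_assign_line(
    "//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}")
```

Here the default group yields empty group names, and each domain ID maps to
the list of event names assigned in that domain.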

> 
> In his example from
> https://lore.kernel.org/lkml/Z643WdXYARTADSBy@e133380.arm.com/
> --------------------------------------------------------------
> The numbers are not supposed to have any hardware significance.
> 
>     '//0=6'
> 
> just "means assign some unused counter for domain 0, and create files
> in resctrl so I can configure and read it".

Thanks for pointing this out. I missed that the idea was that the
configuration files are dynamically created.

> 
> The "6" is really just a tag for labelling the resulting resctrl
> file names so that the user can tell them apart.  It's not supposed
> to imply any specific hardware counter or event.

Right.

> ------------------------------------------------------------------
> 
> It seems that 'foo' and 'bar' are tags used to create files in /sys/fs/resctrl/info/L3_MON/.
> 
> Given that, it looks like we're discussing entirely different things.

I am still trying to understand how MPAM counters can be supported.

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 4 weeks ago
Hi Reinette,

On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 2/14/25 10:31 AM, Moger, Babu wrote:
> > On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>
> (quoting relevant parts with goal to focus discussion on new possible syntax)
>
> >>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>> Please help me understand if you see it differently.
> >>>>
> >>>> Doing so would need to come up with alphabetical letters for these events,
> >>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>
> >>>> mbm_local_read_bytes a
> >>>> mbm_local_write_bytes b
> >>>>
> >>>> Then mbm_assign_control can be used as:
> >>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>> <value>
> >>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>
> >>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>
> As mentioned above, one possible issue with existing interface is that
> it is limited to 26 events (assuming only lower case letters are used). The limit
> is low enough to be of concern.

The events which can be monitored by a single counter on ABMC and MPAM
so far are combinable, so 26 counters per group today limits the
breakdown of each group's MBM traffic to 26 ways. If a user complained
that a 26-way breakdown of a group's MBM traffic was limiting their
investigation, I would question whether they know what they're looking
for.

-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Peter,

On 2/17/25 2:26 AM, Peter Newman wrote:
> Hi Reinette,
> 
> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Babu,
>>
>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>
>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>
>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>> Please help me understand if you see it differently.
>>>>>>
>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>
>>>>>> mbm_local_read_bytes a
>>>>>> mbm_local_write_bytes b
>>>>>>
>>>>>> Then mbm_assign_control can be used as:
>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>> <value>
>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>
>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>
>> As mentioned above, one possible issue with existing interface is that
>> it is limited to 26 events (assuming only lower case letters are used). The limit
>> is low enough to be of concern.
> 
> The events which can be monitored by a single counter on ABMC and MPAM
> so far are combinable, so 26 counters per group today means it limits
> breaking down MBM traffic for each group 26 ways. If a user complained
> that a 26-way breakdown of a group's MBM traffic was limiting their
> investigation, I would question whether they know what they're looking
> for.

The key here is "so far" as well as the focus on MBM only. 

It is impossible for me to predict what we will see in a couple of years
from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
to support their users. Just looking at the Intel RDT spec the event register
has space for 32 events for each "CPU agent" resource. That does not take into
account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
that he is working on patches [1] that will add new events and shared the idea
that we may be trending to support "perf" like events associated with RMID. I
expect AMD PQoS and Arm MPAM to provide related enhancements to support their
customers.
This all makes me think that resctrl should be ready to support more events than 26.

My goal is for resctrl to have a user interface that can as much as possible
be ready for whatever may be required from it years down the line. Of course,
I may be wrong and resctrl would never need to support more than 26 events per
resource (*). The risk is that resctrl *may* need to support more than 26 events
and how could resctrl support that?

What is the risk of supporting more than 26 events? As I highlighted earlier
the interface I used as demonstration may become unwieldy to parse on a system
with many domains that supports many events. This is a concern for me. Any suggestions
will be appreciated, especially from you since I know that you are very familiar with
issues related to large scale use of resctrl interfaces.

Reinette

[1] https://lore.kernel.org/lkml/SJ1PR11MB6083759CCE59FF2FE931471EFCFF2@SJ1PR11MB6083.namprd11.prod.outlook.com/

(*) There is also the scenario where combined between resources there may be
more than 26 events supported that will require the same one letter flag to be
used for different events of different resources. This may potentially be
confusing.
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 3 weeks ago
Hi Reinette,

On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 2/17/25 2:26 AM, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Babu,
> >>
> >> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>
> >> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>
> >>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>> Please help me understand if you see it differently.
> >>>>>>
> >>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>
> >>>>>> mbm_local_read_bytes a
> >>>>>> mbm_local_write_bytes b
> >>>>>>
> >>>>>> Then mbm_assign_control can be used as:
> >>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>> <value>
> >>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>
> >>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>
> >> As mentioned above, one possible issue with existing interface is that
> >> it is limited to 26 events (assuming only lower case letters are used). The limit
> >> is low enough to be of concern.
> >
> > The events which can be monitored by a single counter on ABMC and MPAM
> > so far are combinable, so 26 counters per group today means it limits
> > breaking down MBM traffic for each group 26 ways. If a user complained
> > that a 26-way breakdown of a group's MBM traffic was limiting their
> > investigation, I would question whether they know what they're looking
> > for.
>
> The key here is "so far" as well as the focus on MBM only.
>
> It is impossible for me to predict what we will see in a couple of years
> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> to support their users. Just looking at the Intel RDT spec the event register
> has space for 32 events for each "CPU agent" resource. That does not take into
> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> that he is working on patches [1] that will add new events and shared the idea
> that we may be trending to support "perf" like events associated with RMID. I
> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> customers.
> This all makes me think that resctrl should be ready to support more events than 26.

I was thinking of the letters as representing a reusable, user-defined
event-set for applying to a single counter rather than as individual
events, since MPAM and ABMC allow us to choose the set of events each
one counts. Wherever we define the letters, we could use more symbolic
event names.

In the letters as events model, choosing the events assigned to a
group wouldn't be enough information, since we would want to control
which events should share a counter and which should be counted by
separate counters. I think the amount of information that would need
to be encoded into mbm_assign_control to represent the level of
configurability supported by hardware would quickly get out of hand.

Maybe as an example, one counter for all reads, one counter for all
writes in ABMC would look like...

(L3_QOS_ABMC_CFG.BwType field names below)

(per domain)
group 0:
 counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
 counter 1: VictimBW,LclNTWr,RmtNTWr
group 1:
 counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
 counter 3: VictimBW,LclNTWr,RmtNTWr
...

I assume packing all of this info for a group's desired counter
configuration into a single line (with 32 domains per line on many
dual-socket AMD configurations I see) would be difficult to look at,
even if we could settle on a single letter to represent each
universally.
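
A rough Python sketch of that packing, to give a feel for the line length.
The '{a,b}{c}' notation (one brace pair per counter) is invented here purely
for illustration, as is the encode_group name; only the BwType field names
come from the example above.

```python
# Hypothetical model of the example above; event names are the
# L3_QOS_ABMC_CFG.BwType field names listed in this message.
READS = ("LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill")
WRITES = ("VictimBW", "LclNTWr", "RmtNTWr")

def encode_group(group_path, counter_sets, num_domains):
    """Encode one group's per-domain counter configuration as a single line.

    Each counter becomes one '{ev,ev,...}' token; the same set of counters
    is repeated for every domain, separated by ';'.
    """
    per_domain = "".join("{%s}" % ",".join(events) for events in counter_sets)
    return group_path + ";".join(
        "%d=%s" % (dom, per_domain) for dom in range(num_domains))

# Default group ("//"), two counters per domain, 32 domains.
line = encode_group("//", [READS, WRITES], num_domains=32)
print(len(line))
```

With spelled-out names, this one group's line already runs to more than two
thousand characters for 32 domains.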

>
> My goal is for resctrl to have a user interface that can as much as possible
> be ready for whatever may be required from it years down the line. Of course,
> I may be wrong and resctrl would never need to support more than 26 events per
> resource (*). The risk is that resctrl *may* need to support more than 26 events
> and how could resctrl support that?
>
> What is the risk of supporting more than 26 events? As I highlighted earlier
> the interface I used as demonstration may become unwieldy to parse on a system
> with many domains that supports many events. This is a concern for me. Any suggestions
> will be appreciated, especially from you since I know that you are very familiar with
> issues related to large scale use of resctrl interfaces.

It's mainly just the unwieldiness of all the information in one file.
It's already at the limit of what I can visually look through.

I believe that shared assignments will take care of all the
high-frequency and performance-intensive batch configuration updates I
was originally concerned about, so I no longer see much benefit in
finding ways to textually encode all this information in a single file
when it would be more manageable to distribute it around the
filesystem hierarchy.

-Peter


>
> Reinette
>
> [1] https://lore.kernel.org/lkml/SJ1PR11MB6083759CCE59FF2FE931471EFCFF2@SJ1PR11MB6083.namprd11.prod.outlook.com/
>
> (*) There is also the scenario where combined between resources there may be
> more than 26 events supported that will require the same one letter flag to be
> used for different events of different resources. This may potentially be
> confusing.
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Peter,

On 2/19/25 3:28 AM, Peter Newman wrote:
> Hi Reinette,
> 
> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Babu,
>>>>
>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>
>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>
>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>
>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>
>>>>>>>> mbm_local_read_bytes a
>>>>>>>> mbm_local_write_bytes b
>>>>>>>>
>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>> <value>
>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>
>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>
>>>> As mentioned above, one possible issue with existing interface is that
>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>> is low enough to be of concern.
>>>
>>> The events which can be monitored by a single counter on ABMC and MPAM
>>> so far are combinable, so 26 counters per group today means it limits
>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>> investigation, I would question whether they know what they're looking
>>> for.
>>
>> The key here is "so far" as well as the focus on MBM only.
>>
>> It is impossible for me to predict what we will see in a couple of years
>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>> to support their users. Just looking at the Intel RDT spec the event register
>> has space for 32 events for each "CPU agent" resource. That does not take into
>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>> that he is working on patches [1] that will add new events and shared the idea
>> that we may be trending to support "perf" like events associated with RMID. I
>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>> customers.
>> This all makes me think that resctrl should be ready to support more events than 26.
> 
> I was thinking of the letters as representing a reusable, user-defined
> event-set for applying to a single counter rather than as individual
> events, since MPAM and ABMC allow us to choose the set of events each
> one counts. Wherever we define the letters, we could use more symbolic
> event names.

Thank you for clarifying.

> 
> In the letters as events model, choosing the events assigned to a
> group wouldn't be enough information, since we would want to control
> which events should share a counter and which should be counted by
> separate counters. I think the amount of information that would need
> to be encoded into mbm_assign_control to represent the level of
> configurability supported by hardware would quickly get out of hand.
> 
> Maybe as an example, one counter for all reads, one counter for all
> writes in ABMC would look like...
> 
> (L3_QOS_ABMC_CFG.BwType field names below)
> 
> (per domain)
> group 0:
>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>  counter 1: VictimBW,LclNTWr,RmtNTWr
> group 1:
>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>  counter 3: VictimBW,LclNTWr,RmtNTWr
> ...
> 

I think this may also be what Dave was heading towards in [2] but in that
example and above the counter configuration appears to be global. You do mention
"configurability supported by hardware" so I wonder if per-domain counter
configuration is a requirement?

Until now I viewed counter configuration separate from counter assignment,
similar to how AMD's counters can be configured via mbm_total_bytes_config and
mbm_local_bytes_config before they are assigned. That is still per-domain
counter configuration though, not per-counter.

> I assume packing all of this info for a group's desired counter
> configuration into a single line (with 32 domains per line on many
> dual-socket AMD configurations I see) would be difficult to look at,
> even if we could settle on a single letter to represent each
> universally.
> 
>>
>> My goal is for resctrl to have a user interface that can as much as possible
>> be ready for whatever may be required from it years down the line. Of course,
>> I may be wrong and resctrl would never need to support more than 26 events per
>> resource (*). The risk is that resctrl *may* need to support more than 26 events
>> and how could resctrl support that?
>>
>> What is the risk of supporting more than 26 events? As I highlighted earlier
>> the interface I used as demonstration may become unwieldy to parse on a system
>> with many domains that supports many events. This is a concern for me. Any suggestions
>> will be appreciated, especially from you since I know that you are very familiar with
>> issues related to large scale use of resctrl interfaces.
> 
> It's mainly just the unwieldiness of all the information in one file.
> It's already at the limit of what I can visually look through.

I agree.

> 
> I believe that shared assignments will take care of all the
> high-frequency and performance-intensive batch configuration updates I
> was originally concerned about, so I no longer see much benefit in
> finding ways to textually encode all this information in a single file
> when it would be more manageable to distribute it around the
> filesystem hierarchy.

This is significant. The motivation for the single file was to support
the "high-frequency and performance-intensive" usage. Would "shared assignments"
not also depend on the same files that, if distributed, will require many
filesystem operations? 
Having the files distributed will be significantly simpler while also
avoiding the file size issue that Dave Martin exposed. 

Reinette

>> [1] https://lore.kernel.org/lkml/SJ1PR11MB6083759CCE59FF2FE931471EFCFF2@SJ1PR11MB6083.namprd11.prod.outlook.com/
>>
>> (*) There is also the scenario where combined between resources there may be
>> more than 26 events supported that will require the same one letter flag to be
>> used for different events of different resources. This may potentially be
>> confusing.

[2] https://lore.kernel.org/lkml/Z6zeXby8ajh0ax6i@e133380.arm.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
Hi,

On Wed, Feb 19, 2025 at 09:56:29AM -0800, Reinette Chatre wrote:
> Hi Peter,
> 
> On 2/19/25 3:28 AM, Peter Newman wrote:

[...]

> > In the letters as events model, choosing the events assigned to a
> > group wouldn't be enough information, since we would want to control
> > which events should share a counter and which should be counted by
> > separate counters. I think the amount of information that would need
> > to be encoded into mbm_assign_control to represent the level of
> > configurability supported by hardware would quickly get out of hand.
> > 
> > Maybe as an example, one counter for all reads, one counter for all
> > writes in ABMC would look like...
> > 
> > (L3_QOS_ABMC_CFG.BwType field names below)
> > 
> > (per domain)
> > group 0:
> >  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >  counter 1: VictimBW,LclNTWr,RmtNTWr
> > group 1:
> >  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >  counter 3: VictimBW,LclNTWr,RmtNTWr
> > ...
> > 
> 
> I think this may also be what Dave was heading towards in [2] but in that
> example and above the counter configuration appears to be global. You do mention
> "configurability supported by hardware" so I wonder if per-domain counter
> configuration is a requirement?
> 
> Until now I viewed counter configuration separate from counter assignment,
> similar to how AMD's counters can be configured via mbm_total_bytes_config and
> mbm_local_bytes_config before they are assigned. That is still per-domain
> counter configuration though, not per-counter.

I hadn't tried to work the design through in any detail: it wasn't
intended as a suggestion for something we should definitely do right
now; rather, it was just an incomplete sketch of one possible future
evolution of the interface.

Either way these feel like future concerns, if the first iteration of
ABMC is just to provide the basics so that ABMC hardware can implement
resctrl without userspace seeing counters randomly stopping and
resetting...

Peter, can you give a view on whether the ABMC as proposed in this series
is a useful stepping-stone?  Or are there things that you need that you
feel could not be added as a later extension without ABI breakage?

[...]

> > I believe that shared assignments will take care of all the
> > high-frequency and performance-intensive batch configuration updates I
> > was originally concerned about, so I no longer see much benefit in
> > finding ways to textually encode all this information in a single file
> > when it would be more manageable to distribute it around the
> > filesystem hierarchy.
> 
> This is significant. The motivation for the single file was to support
> the "high-frequency and performance-intensive" usage. Would "shared assignments"
> not also depend on the same files that, if distributed, will require many
> filesystem operations? 
> Having the files distributed will be significantly simpler while also
> avoiding the file size issue that Dave Martin exposed. 
> 
> Reinette

I still haven't fully understood the "shared assignments" proposal;
I need to go back and look at it.

If we split the file, it will be more closely aligned with the design
of the rest of the resctrlfs interface.

OTOH, the current interface seems workable and I think the file size
issue can be addressed without major re-engineering.

So, from my side, I would not consider the current interface design
a blocker.

[...]

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
Hi again,

On Thu, Feb 20, 2025 at 04:46:40PM +0000, Dave Martin wrote:
> Hi,
> 
> On Wed, Feb 19, 2025 at 09:56:29AM -0800, Reinette Chatre wrote:
> > Hi Peter,
> > 
> > On 2/19/25 3:28 AM, Peter Newman wrote:
> 
> [...]
> 
> > > In the letters as events model, choosing the events assigned to a
> > > group wouldn't be enough information, since we would want to control
> > > which events should share a counter and which should be counted by
> > > separate counters. I think the amount of information that would need
> > > to be encoded into mbm_assign_control to represent the level of
> > > configurability supported by hardware would quickly get out of hand.
> > > 
> > > Maybe as an example, one counter for all reads, one counter for all
> > > writes in ABMC would look like...
> > > 
> > > (L3_QOS_ABMC_CFG.BwType field names below)
> > > 
> > > (per domain)
> > > group 0:
> > >  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > >  counter 1: VictimBW,LclNTWr,RmtNTWr
> > > group 1:
> > >  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > >  counter 3: VictimBW,LclNTWr,RmtNTWr
> > > ...
> > > 
> > 
> > I think this may also be what Dave was heading towards in [2] but in that
> > example and above the counter configuration appears to be global. You do mention
> > "configurability supported by hardware" so I wonder if per-domain counter
> > configuration is a requirement?
> > 
> > Until now I viewed counter configuration separate from counter assignment,
> > similar to how AMD's counters can be configured via mbm_total_bytes_config and
> > mbm_local_bytes_config before they are assigned. That is still per-domain
> > counter configuration though, not per-counter.
> 
> I hadn't tried to work the design through in any detail: it wasn't
> intended as a suggestion for something we should definitely do right
> now; rather, it was just an incomplete sketch of one possible future
> evolution of the interface.
> 
> Either way these feel like future concerns, if the first iteration of
> ABMC is just to provide the basics so that ABMC hardware can implement
> resctrl without userspace seeing counters randomly stopping and
> resetting...
> 
> Peter, can you give a view on whether the ABMC as proposed in this series
> is a useful stepping-stone?  Or are there things that you need that you
> feel could not be added as a later extension without ABI breakage?
> 
> [...]
> 
> > > I believe that shared assignments will take care of all the
> > > high-frequency and performance-intensive batch configuration updates I
> > > was originally concerned about, so I no longer see much benefit in
> > > finding ways to textually encode all this information in a single file
> > > > when it would be more manageable to distribute it around the
> > > filesystem hierarchy.
> > 
> > This is significant. The motivation for the single file was to support
> > the "high-frequency and performance-intensive" usage. Would "shared assignments"
> > not also depend on the same files that, if distributed, will require many
> > filesystem operations? 
> > Having the files distributed will be significantly simpler while also
> > avoiding the file size issue that Dave Martin exposed. 
> > 
> > Reinette
> 
> I still haven't fully understood the "shared assignments" proposal;
> I need to go back and look at it.

Having taken a quick look at that now, this all seems to duplicate
perf's design journey (again).

"rate" events make some sense.  The perf equivalent is to keep an
accumulated count of the amount of time a counter has been assigned to
an event, and another accumulated count of the events counted by the
counter during assignment.  Only userspace knows what it wants to do
with this information: perf exposes the raw accumulated counts.
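
As a rough illustration of the accounting described above (a sketch only:
the names SharedCounterSample and rate_between are made up for this example
and are not perf or resctrl APIs), userspace could turn two raw samples into
a rate like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedCounterSample:
    """Raw accumulated values for one event (hypothetical layout)."""
    count: int             # events counted while a hardware counter was assigned
    time_assigned_ns: int  # total time a hardware counter was assigned


def rate_between(prev: SharedCounterSample,
                 curr: SharedCounterSample) -> Optional[float]:
    """Estimate events per second over the time the counter was assigned.

    Returns None if no counter was assigned between the two samples,
    since no rate can be derived in that case.
    """
    dt_ns = curr.time_assigned_ns - prev.time_assigned_ns
    if dt_ns <= 0:
        return None
    return (curr.count - prev.count) * 1_000_000_000 / dt_ns
```

The kernel side only has to expose the two accumulated values; whether to
report a rate, extrapolate a total, or discard short intervals stays a
userspace decision.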

Perf events can be also pinned so that they are prioritised for
assignment to counters; that sounds a lot like the regular, non-shared
resctrl counters.


Playing devil's advocate:

It does feel like we are doomed to reinvent perf if we go too far down
this road...

> If we split the file, it will be more closely aligned with the design
> of the rest of the resctrlfs interface.
> 
> OTOH, the current interface seems workable and I think the file size
> issue can be addressed without major re-engineering.
> 
> So, from my side, I would not consider the current interface design
> a blocker.

...so, drawing a hard line around the use cases that we intend to
address with this interface and avoiding feature creep seems desirable.

resctrlfs is already in the wild, so providing reasonable baseline
compatibility with that interface for ABMC hardware is a sensible goal.
The current series does that.

But I wonder how much additional functionality we should really be
adding via the mbm_assign_control interface, once this series is
settled.

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Dave,

On 2/20/25 9:46 AM, Dave Martin wrote:
> Hi again,
> 
> On Thu, Feb 20, 2025 at 04:46:40PM +0000, Dave Martin wrote:
>> Hi,
>>
>> On Wed, Feb 19, 2025 at 09:56:29AM -0800, Reinette Chatre wrote:
>>> Hi Peter,
>>>
>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>
>> [...]
>>
>>>> In the letters as events model, choosing the events assigned to a
>>>> group wouldn't be enough information, since we would want to control
>>>> which events should share a counter and which should be counted by
>>>> separate counters. I think the amount of information that would need
>>>> to be encoded into mbm_assign_control to represent the level of
>>>> configurability supported by hardware would quickly get out of hand.
>>>>
>>>> Maybe as an example, one counter for all reads, one counter for all
>>>> writes in ABMC would look like...
>>>>
>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>
>>>> (per domain)
>>>> group 0:
>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>> ...
>>>>
>>>
>>> I think this may also be what Dave was heading towards in [2] but in that
>>> example and above the counter configuration appears to be global. You do mention
>>> "configurability supported by hardware" so I wonder if per-domain counter
>>> configuration is a requirement?
>>>
>>> Until now I viewed counter configuration separate from counter assignment,
>>> similar to how AMD's counters can be configured via mbm_total_bytes_config and
>>> mbm_local_bytes_config before they are assigned. That is still per-domain
>>> counter configuration though, not per-counter.
>>
>> I hadn't tried to work the design through in any detail: it wasn't
>> intended as a suggestion for something we should definitely do right
>> now; rather, it was just an incomplete sketch of one possible future
>> evolution of the interface.
>>
>> Either way these feel like future concerns, if the first iteration of
>> ABMC is just to provide the basics so that ABMC hardware can implement
>> resctrl without userspace seeing counters randomly stopping and
>> resetting...
>>
>> Peter, can you give a view on whether the ABMC as proposed in this series
>> is a useful stepping-stone?  Or are there things that you need that you
>> feel could not be added as a later extension without ABI breakage?
>>
>> [...]
>>
>>>> I believe that shared assignments will take care of all the
>>>> high-frequency and performance-intensive batch configuration updates I
>>>> was originally concerned about, so I no longer see much benefit in
>>>> finding ways to textually encode all this information in a single file
>>>> when it would be more manageable to distribute it around the
>>>> filesystem hierarchy.
>>>
>>> This is significant. The motivation for the single file was to support
>>> the "high-frequency and performance-intensive" usage. Would "shared assignments"
>>> not also depend on the same files that, if distributed, will require many
>>> filesystem operations? 
>>> Having the files distributed will be significantly simpler while also
>>> avoiding the file size issue that Dave Martin exposed. 
>>>
>>> Reinette
>>
>> I still haven't fully understood the "shared assignments" proposal;
>> I need to go back and look at it.
> 
> Having taken a quick look at that now, this all seems to duplicate
> perf's design journey (again).
> 
> "rate" events make some sense.  The perf equivalent is to keep an
> accumulated count of the amount of time a counter has been assigned to
> an event, and another accumulated count of the events counted by the
> counter during assignment.  Only userspace knows what it wants to do
> with this information: perf exposes the raw accumulated counts.
> 
> Perf events can be also pinned so that they are prioritised for
> assignment to counters; that sounds a lot like the regular, non-shared
> resctrl counters.
> 
> 
> Playing devil's advocate:
> 
> It does feel like we are doomed to reinvent perf if we go too far down
> this road...
> 
>> If we split the file, it will be more closely aligned with the design
>> of the rest of the resctrlfs interface.
>>
>> OTOH, the current interface seems workable and I think the file size
>> issue can be addressed without major re-engineering.
>>
>> So, from my side, I would not consider the current interface design
>> a blocker.
> 
> ...so, drawing a hard line around the use cases that we intend to
> address with this interface and avoiding feature creep seems desirable.

This is exactly what I am trying to do ... to understand what use cases
the interface is expected to support.

You have mentioned a couple of times now that this interface is sufficient but
at the same time you hinted at some features from MPAM that I do not see
possible to accommodate with this interface.
 
> resctrlfs is already in the wild, so providing reasonable baseline
> compatibility with that interface for ABMC hardware is a sensible goal.
> The current series does that.
> 
> But I wonder how much additional functionality we should really be
> adding via the mbm_assign_control interface, once this series is
> settled.

Are you speculating that MPAM counters may not make use of this interface?

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
Hi Reinette,

On Thu, Feb 20, 2025 at 10:36:18AM -0800, Reinette Chatre wrote:
> Hi Dave,
> 
> On 2/20/25 9:46 AM, Dave Martin wrote:
> > Hi again,
> > 
> > On Thu, Feb 20, 2025 at 04:46:40PM +0000, Dave Martin wrote:

[...]

> > Having taken a quick look at that now, this all seems to duplicate
> > perf's design journey (again).
> > 
> > "rate" events make some sense.  The perf equivalent is to keep an
> > accumulated count of the amount of time a counter has been assigned to
> > an event, and another accumulated count of the events counted by the
> > counter during assignment.  Only userspace knows what it wants to do
> > with this information: perf exposes the raw accumulated counts.
> > 
> > Perf events can be also pinned so that they are prioritised for
> > assignment to counters; that sounds a lot like the regular, non-shared
> > resctrl counters.
> > 
> > 
> > Playing devil's advocate:
> > 
> > It does feel like we are doomed to reinvent perf if we go too far down
> > this road...
> > 
> >> If we split the file, it will be more closely aligned with the design
> >> of the rest of the resctrlfs interface.
> >>
> >> OTOH, the current interface seems workable and I think the file size
> >> issue can be addressed without major re-engineering.
> >>
> >> So, from my side, I would not consider the current interface design
> >> a blocker.
> > 
> > ...so, drawing a hard line around the use cases that we intend to
> > address with this interface and avoiding feature creep seems desirable.
> 
> This is exactly what I am trying to do ... to understand what use cases
> the interface is expected to support.
> 
> You have mentioned a couple of times now that this interface is sufficient but
> at the same time you hinted at some features from MPAM that I do not see
> possible to accommodate with this interface.

It's kind of both.

I think the interface is sufficient to be useful, and therefore has
value.

The problem being addressed here (shortage of counters) is fully
relevant to MPAM (at least on some hardware).

Any architecture may define new metrics and types of event that can be
counted, and they're not going to match up exactly between arches -- so
I don't think we can expect everything to fit perfectly within a
generic interface.  But having a generic interface is still useful for
making common features convenient to use.

So the interface is useful but not universal, but that doesn't feel
like a bug.

Hopefully that makes my position a bit clearer.

> > resctrlfs is already in the wild, so providing reasonable baseline
> > compatibility with that interface for ABMC hardware is a sensible goal.
> > The current series does that.
> > 
> > But I wonder how much additional functionality we should really be
> > adding via the mbm_assign_control interface, once this series is
> > settled.
> 
> Are you speculating that MPAM counters may not make use of this interface?
> 
> Reinette

No, I think it makes sense for MPAM to follow this interface, at least
as far as what has been proposed so far here.

I think James got his updated rebase working. [1]


perf support would be for the future if we do it, but the ABMC
interface may be a useful starting point anyway, because it allows
counters to be assigned explicitly -- that provides a natural way to
hand over some counters to perf, either because that interface may be a
more natural fit for what the user is trying to do, or perhaps to count
weird, platform-specific event types that do not merit the effort of
integration into resctrlfs proper.

Does that make sense?

Cheers
---Dave

[1] https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/abmc/v11
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Dave,

On 2/21/25 8:47 AM, Dave Martin wrote:
> Hi Reinette,
> 
> On Thu, Feb 20, 2025 at 10:36:18AM -0800, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 2/20/25 9:46 AM, Dave Martin wrote:
>>> Hi again,
>>>
>>> On Thu, Feb 20, 2025 at 04:46:40PM +0000, Dave Martin wrote:
> 
> [...]
> 
>>> Having taken a quick look at that now, this all seems to duplicate
>>> perf's design journey (again).
>>>
>>> "rate" events make some sense.  The perf equivalent is to keep an
>>> accumulated count of the amount of time a counter has been assigned to
>>> an event, and another accumulated count of the events counted by the
>>> counter during assignment.  Only userspace knows what it wants to do
>>> with this information: perf exposes the raw accumulated counts.
>>>
>>> Perf events can be also pinned so that they are prioritised for
>>> assignment to counters; that sounds a lot like the regular, non-shared
>>> resctrl counters.
>>>
>>>
>>> Playing devil's advocate:
>>>
>>> It does feel like we are doomed to reinvent perf if we go too far down
>>> this road...
>>>
>>>> If we split the file, it will be more closely aligned with the design
>>>> of the rest of the resctrlfs interface.
>>>>
>>>> OTOH, the current interface seems workable and I think the file size
>>>> issue can be addressed without major re-engineering.
>>>>
>>>> So, from my side, I would not consider the current interface design
>>>> a blocker.
>>>
>>> ...so, drawing a hard line around the use cases that we intend to
>>> address with this interface and avoiding feature creep seems desirable.
>>
>> This is exactly what I am trying to do ... to understand what use cases
>> the interface is expected to support.
>>
>> You have mentioned a couple of times now that this interface is sufficient but
>> at the same time you hinted at some features from MPAM that I do not see
>> possible to accommodate with this interface.
> 
> It's kind of both.
> 
> I think the interface is sufficient to be useful, and therefore has
> value.
> 
> The problem being addressed here (shortage of counters) is fully
> relevant to MPAM (at least on some hardware).
> 
> Any architecture may define new metrics and types of event that can be
> counted, and they're not going to match up exactly between arches -- so
> I don't think we can expect everything to fit perfectly within a
> generic interface.  But having a generic interface is still useful for
> making common features convenient to use.
> 
> So the interface is useful but not universal, but that doesn't feel
> like a bug.
> 
> Hopefully that makes my position a bit clearer.
> 
>>> resctrlfs is already in the wild, so providing reasonable baseline
>>> compatibility with that interface for ABMC hardware is a sensible goal.
>>> The current series does that.
>>>
>>> But I wonder how much additional functionality we should really be
>>> adding via the mbm_assign_control interface, once this series is
>>> settled.
>>
>> Are you speculating that MPAM counters may not make use of this interface?
>>
>> Reinette
> 
> No, I think it makes sense for MPAM to follow this interface, at least
> as far as what has been proposed so far here.
> 
> I think James got his updated rebase working. [1]
> 
> 
> perf support would be for the future if we do it, but the ABMC
> interface may be a useful starting point anyway, because it allows
> counters to be assigned explicitly -- that provides a natural way to
> hand over some counters to perf, either because that interface may be a
> more natural fit for what the user is trying to do, or perhaps to count
> weird, platform-specific event types that do not merit the effort of
> integration into resctrlfs proper.
> 
> Does that make sense?
> 

This is reasonable. You did state earlier that we should aim to draw
hard lines around the use cases we aim to address, and I think one
way this work does that is by being explicit in the user interface that
this is all about "memory bandwidth monitoring". This is not intended to
be a fully generic interface for all possible counters for all possible
resources.

Apart from that, time will tell how many blind spots there were while
creating this interface.

Thank you very much for all your very valuable insights.

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 3 weeks ago
Hi Reinette,

On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 2/19/25 3:28 AM, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>> Hi Reinette,
> >>>
> >>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>> <reinette.chatre@intel.com> wrote:
> >>>>
> >>>> Hi Babu,
> >>>>
> >>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>
> >>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>
> >>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>
> >>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>
> >>>>>>>> mbm_local_read_bytes a
> >>>>>>>> mbm_local_write_bytes b
> >>>>>>>>
> >>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>> <value>
> >>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>
> >>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>
> >>>> As mentioned above, one possible issue with existing interface is that
> >>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>> is low enough to be of concern.
> >>>
> >>> The events which can be monitored by a single counter on ABMC and MPAM
> >>> so far are combinable, so 26 counters per group today means it limits
> >>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>> investigation, I would question whether they know what they're looking
> >>> for.
> >>
> >> The key here is "so far" as well as the focus on MBM only.
> >>
> >> It is impossible for me to predict what we will see in a couple of years
> >> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >> to support their users. Just looking at the Intel RDT spec the event register
> >> has space for 32 events for each "CPU agent" resource. That does not take into
> >> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >> that he is working on patches [1] that will add new events and shared the idea
> >> that we may be trending to support "perf" like events associated with RMID. I
> >> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >> customers.
> >> This all makes me think that resctrl should be ready to support more events than 26.
> >
> > I was thinking of the letters as representing a reusable, user-defined
> > event-set for applying to a single counter rather than as individual
> > events, since MPAM and ABMC allow us to choose the set of events each
> > one counts. Wherever we define the letters, we could use more symbolic
> > event names.
>
> Thank you for clarifying.
>
> >
> > In the letters as events model, choosing the events assigned to a
> > group wouldn't be enough information, since we would want to control
> > which events should share a counter and which should be counted by
> > separate counters. I think the amount of information that would need
> > to be encoded into mbm_assign_control to represent the level of
> > configurability supported by hardware would quickly get out of hand.
> >
> > Maybe as an example, one counter for all reads, one counter for all
> > writes in ABMC would look like...
> >
> > (L3_QOS_ABMC_CFG.BwType field names below)
> >
> > (per domain)
> > group 0:
> >  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >  counter 1: VictimBW,LclNTWr,RmtNTWr
> > group 1:
> >  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >  counter 3: VictimBW,LclNTWr,RmtNTWr
> > ...
> >
>
> I think this may also be what Dave was heading towards in [2] but in that
> example and above the counter configuration appears to be global. You do mention
> "configurability supported by hardware" so I wonder if per-domain counter
> configuration is a requirement?

If it's global and we want a particular group to be watched by more
counters, I wouldn't want this to result in allocating more counters
for that group in all domains, or allocating counters in domains where
they're not needed. I want to encourage my users to avoid allocating
monitoring resources in domains where a job is not allowed to run so
there's less pressure on the counters.

In Dave's proposal it looks like global configuration means
globally-defined "named counter configurations", which works because
it's really per-domain assignment of the configurations to however
many counters the group needs in each domain.
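
To make that concrete, here is a minimal sketch of "global definition,
per-domain assignment". The configuration names ("reads", "writes"), the
group names, and the data layout are assumptions for illustration only; the
event names are the L3_QOS_ABMC_CFG.BwType fields from my earlier example.

```python
from collections import Counter

# Globally defined, named counter configurations (hypothetical names).
CONFIGS = {
    "reads": {"LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"},
    "writes": {"VictimBW", "LclNTWr", "RmtNTWr"},
}

# Per-domain assignment: group -> domain id -> configurations to count.
# A group only consumes counters in the domains where it asks for them.
ASSIGNMENTS = {
    "group0": {0: ["reads", "writes"], 1: ["reads"]},
    "group1": {0: ["reads", "writes"]},
}


def counters_per_domain(assignments):
    """Count how many hardware counters each domain would need."""
    used = Counter()
    for domains in assignments.values():
        for domain_id, names in domains.items():
            for name in names:
                if name not in CONFIGS:
                    raise KeyError(f"unknown counter configuration: {name}")
                used[domain_id] += 1
    return dict(used)
```

With the data above, domain 0 needs four counters and domain 1 needs one,
even though both configurations are defined only once.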

>
> Until now I viewed counter configuration separate from counter assignment,
> similar to how AMD's counters can be configured via mbm_total_bytes_config and
> mbm_local_bytes_config before they are assigned. That is still per-domain
> counter configuration though, not per-counter.
>
> > I assume packing all of this info for a group's desired counter
> > configuration into a single line (with 32 domains per line on many
> > dual-socket AMD configurations I see) would be difficult to look at,
> > even if we could settle on a single letter to represent each
> > universally.
> >
> >>
> >> My goal is for resctrl to have a user interface that can as much as possible
> >> be ready for whatever may be required from it years down the line. Of course,
> >> I may be wrong and resctrl would never need to support more than 26 events per
> >> resource (*). The risk is that resctrl *may* need to support more than 26 events
> >> and how could resctrl support that?
> >>
> >> What is the risk of supporting more than 26 events? As I highlighted earlier
> >> the interface I used as demonstration may become unwieldy to parse on a system
> >> with many domains that supports many events. This is a concern for me. Any suggestions
> >> will be appreciated, especially from you since I know that you are very familiar with
> >> issues related to large scale use of resctrl interfaces.
> >
> > It's mainly just the unwieldiness of all the information in one file.
> > It's already at the limit of what I can visually look through.
>
> I agree.
>
> >
> > I believe that shared assignments will take care of all the
> > high-frequency and performance-intensive batch configuration updates I
> > was originally concerned about, so I no longer see much benefit in
> > finding ways to textually encode all this information in a single file
> > when it would be more manageable to distribute it around the
> > filesystem hierarchy.
>
> This is significant. The motivation for the single file was to support
> the "high-frequency and performance-intensive" usage. Would "shared assignments"
> not also depend on the same files that, if distributed, will require many
> filesystem operations?
> Having the files distributed will be significantly simpler while also
> avoiding the file size issue that Dave Martin exposed.

The remaining filesystem operations will be assigning or removing
shared counter assignments in the applicable domains, which would
normally correspond to mkdir/rmdir of groups or changing their CPU
affinity. The shared assignments are more "program and forget", while
the exclusive assignment approach requires updates for every counter
(in every domain) every few seconds to cover a large number of groups.

When they want to pay extra attention to a particular group, I expect
they'll ask for exclusive counters and leave them assigned for a while
as they collect extra data.
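
For a sense of scale, a back-of-the-envelope sketch of the exclusive
rotation cost follows. The function name and the model are assumptions made
for this example: it assumes every group needs the same number of counters
in every domain, and that one pass gives each group one turn.

```python
import math


def exclusive_rotation_cost(groups, events_per_group,
                            counters_per_domain, domains):
    """Return (rounds, assignment_updates) for one pass over all groups.

    rounds: how many times the counters in a domain must be reprogrammed
            so that every group gets a turn.
    assignment_updates: total counter assignments written across all
            domains during that pass.
    """
    needed = groups * events_per_group          # counters wanted per domain
    rounds = math.ceil(needed / counters_per_domain)
    updates = needed * domains                  # one write per counter per domain
    return rounds, updates
```

As an illustrative (not measured) data point, 256 groups with 2 events each,
32 counters per domain and 32 domains gives 16 rounds and 16384 assignment
updates per pass. With shared assignments, updates would instead be tied to
group creation, removal, or affinity changes.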

-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Peter,

On 2/20/25 6:53 AM, Peter Newman wrote:
> Hi Reinette,
> 
> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>> Hi Reinette,
>>>>>
>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>
>>>>>> Hi Babu,
>>>>>>
>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>
>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>
>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>
>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>
>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>
>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>> <value>
>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>
>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>
>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>> is low enough to be of concern.
>>>>>
>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>> investigation, I would question whether they know what they're looking
>>>>> for.
>>>>
>>>> The key here is "so far" as well as the focus on MBM only.
>>>>
>>>> It is impossible for me to predict what we will see in a couple of years
>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>> that he is working on patches [1] that will add new events and shared the idea
>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>> customers.
>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>
>>> I was thinking of the letters as representing a reusable, user-defined
>>> event-set for applying to a single counter rather than as individual
>>> events, since MPAM and ABMC allow us to choose the set of events each
>>> one counts. Wherever we define the letters, we could use more symbolic
>>> event names.
>>
>> Thank you for clarifying.
>>
>>>
>>> In the letters as events model, choosing the events assigned to a
>>> group wouldn't be enough information, since we would want to control
>>> which events should share a counter and which should be counted by
>>> separate counters. I think the amount of information that would need
>>> to be encoded into mbm_assign_control to represent the level of
>>> configurability supported by hardware would quickly get out of hand.
>>>
>>> Maybe as an example, one counter for all reads, one counter for all
>>> writes in ABMC would look like...
>>>
>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>
>>> (per domain)
>>> group 0:
>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>> group 1:
>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>> ...
>>>
>>
>> I think this may also be what Dave was heading towards in [2] but in that
>> example and above the counter configuration appears to be global. You do mention
>> "configurability supported by hardware" so I wonder if per-domain counter
>> configuration is a requirement?
> 
> If it's global and we want a particular group to be watched by more
> counters, I wouldn't want this to result in allocating more counters
> for that group in all domains, or allocating counters in domains where
> they're not needed. I want to encourage my users to avoid allocating
> monitoring resources in domains where a job is not allowed to run so
> there's less pressure on the counters.
> 
> In Dave's proposal it looks like global configuration means
> globally-defined "named counter configurations", which works because
> it's really per-domain assignment of the configurations to however
> many counters the group needs in each domain.

I think I am becoming lost. Would a global configuration not break your
view of "event-set applied to a single counter"? If a counter is configured
globally then it would not be possible to support the full configurability
of the hardware.
Before I add more confusion, let me try with an example that builds on your
earlier example copied below:

>>> (per domain)
>>> group 0:
>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>> group 1:
>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>> ...

Since the above states "per domain" I rewrite the example to highlight that as
I understand it:

group 0:
 domain 0:
  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 1: VictimBW,LclNTWr,RmtNTWr
 domain 1:
  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 1: VictimBW,LclNTWr,RmtNTWr
group 1:
 domain 0:
  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 3: VictimBW,LclNTWr,RmtNTWr
 domain 1:
  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 3: VictimBW,LclNTWr,RmtNTWr

You mention that you do not want counters to be allocated in domains that they
are not needed in. So, let's say group 0 does not need counter 0 and counter 1
in domain 1, resulting in:

group 0:
 domain 0:
  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 1: VictimBW,LclNTWr,RmtNTWr
group 1:
 domain 0:
  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 3: VictimBW,LclNTWr,RmtNTWr
 domain 1:
  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 3: VictimBW,LclNTWr,RmtNTWr

With counter 0 and counter 1 available in domain 1, these counters could
theoretically be configured to give group 1 more data in domain 1:

group 0:
 domain 0:
  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 1: VictimBW,LclNTWr,RmtNTWr
group 1:
 domain 0:
  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
  counter 3: VictimBW,LclNTWr,RmtNTWr
 domain 1:
  counter 0: LclFill,RmtFill
  counter 1: LclNTWr,RmtNTWr
  counter 2: LclSlowFill,RmtSlowFill
  counter 3: VictimBW

The counters are shown with different per-domain configurations, which seems to
match the earlier goals of (a) choose events counted by each counter and
(b) do not allocate counters in domains where they are not needed. As I
understand it, the above does contradict global counter configuration though.
Or do you mean that only the *name* of the counter is global and then
that it is reconfigured as part of every assignment?
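
To make the question concrete, here is a minimal Python sketch (not resctrl
code) that encodes the last layout above and collects the distinct event sets
each counter ID ends up with across groups and domains. The dict layout and
the helper name configs_by_counter() are made up for illustration only.

```python
READS = {"LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"}
WRITES = {"VictimBW", "LclNTWr", "RmtNTWr"}

# group -> domain -> counter ID -> events counted (the last layout above)
LAYOUT = {
    "group 0": {
        0: {0: READS, 1: WRITES},
    },
    "group 1": {
        0: {2: READS, 3: WRITES},
        1: {
            0: {"LclFill", "RmtFill"},
            1: {"LclNTWr", "RmtNTWr"},
            2: {"LclSlowFill", "RmtSlowFill"},
            3: {"VictimBW"},
        },
    },
}


def configs_by_counter(layout):
    """Collect the distinct event sets seen for each counter ID."""
    seen = {}
    for domains in layout.values():
        for counters in domains.values():
            for counter_id, events in counters.items():
                seen.setdefault(counter_id, set()).add(frozenset(events))
    return seen


for counter_id, configs in sorted(configs_by_counter(LAYOUT).items()):
    print(f"counter {counter_id}: {len(configs)} distinct configurations")
```

Each counter ID ends up with two different event sets depending on the
domain, so a single configuration tied globally to a counter ID cannot
describe this layout.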

>> Until now I viewed counter configuration separate from counter assignment,
>> similar to how AMD's counters can be configured via mbm_total_bytes_config and
>> mbm_local_bytes_config before they are assigned. That is still per-domain
>> counter configuration though, not per-counter.
>>
>>> I assume packing all of this info for a group's desired counter
>>> configuration into a single line (with 32 domains per line on many
>>> dual-socket AMD configurations I see) would be difficult to look at,
>>> even if we could settle on a single letter to represent each
>>> universally.
>>>
>>>>
>>>> My goal is for resctrl to have a user interface that can as much as possible
>>>> be ready for whatever may be required from it years down the line. Of course,
>>>> I may be wrong and resctrl would never need to support more than 26 events per
>>>> resource (*). The risk is that resctrl *may* need to support more than 26 events
>>>> and how could resctrl support that?
>>>>
>>>> What is the risk of supporting more than 26 events? As I highlighted earlier
>>>> the interface I used as demonstration may become unwieldy to parse on a system
>>>> with many domains that supports many events. This is a concern for me. Any suggestions
>>>> will be appreciated, especially from you since I know that you are very familiar with
>>>> issues related to large scale use of resctrl interfaces.
>>>
>>> It's mainly just the unwieldiness of all the information in one file.
>>> It's already at the limit of what I can visually look through.
>>
>> I agree.
>>
>>>
>>> I believe that shared assignments will take care of all the
>>> high-frequency and performance-intensive batch configuration updates I
>>> was originally concerned about, so I no longer see much benefit in
>>> finding ways to textually encode all this information in a single file
>>> when it would be more manageable to distribute it around the
>>> filesystem hierarchy.
>>
>> This is significant. The motivation for the single file was to support
>> the "high-frequency and performance-intensive" usage. Would "shared assignments"
>> not also depend on the same files that, if distributed, will require many
>> filesystem operations?
>> Having the files distributed will be significantly simpler while also
>> avoiding the file size issue that Dave Martin exposed.
> 
> The remaining filesystem operations will be assigning or removing
> shared counter assignments in the applicable domains, which would
> normally correspond to mkdir/rmdir of groups or changing their CPU
> affinity. The shared assignments are more "program and forget", while
> the exclusive assignment approach requires updates for every counter
> (in every domain) every few seconds to cover a large number of groups.
> 
> When they want to pay extra attention to a particular group, I expect
> they'll ask for exclusive counters and leave them assigned for a while
> as they collect extra data.

The single file approach is already unwieldy. The demands that will be
placed on it to support the usages currently being discussed would make this
interface even harder to use and manage. If the single file is not required
then I think we should go back to smaller files distributed in resctrl.
This may not even be an either/or argument. One way to view mbm_assign_control
could be as a way for a user to interact with the distributed
counter-related files with a single file system operation. Although, without
knowing how counter configuration is expected to work this remains unclear.
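
As a rough illustration of that view, the Python sketch below expands one
mbm_assign_control line into the individual writes it could stand for. It is
a sketch under assumptions, not a proposal: the per-domain file name
"mbm_assign" is invented here and does not exist in resctrl, and the helpers
group_dir() and expand() are hypothetical. Only the mon_data/mon_L3_NN
directory naming is taken from the examples earlier in this thread.

```python
import posixpath

RESCTRL_ROOT = "/sys/fs/resctrl"
# Invented file name, used only to show the shape of a distributed layout.
HYPOTHETICAL_FILE = "mbm_assign"


def group_dir(ctrl_group, mon_group):
    """Directory of a group; empty names refer to the default group."""
    path = RESCTRL_ROOT
    if ctrl_group:
        path = posixpath.join(path, ctrl_group)
    if mon_group:
        path = posixpath.join(path, "mon_groups", mon_group)
    return path


def expand(line):
    """Turn '<ctrl>/<mon>/<dom>=<flags>;...' into (path, flags) pairs."""
    ctrl_group, mon_group, domains = line.strip().split("/")
    base = group_dir(ctrl_group, mon_group)
    writes = []
    for item in domains.split(";"):
        dom, flags = item.split("=")
        domain_dir = f"mon_L3_{int(dom):02d}"
        path = posixpath.join(base, "mon_data", domain_dir, HYPOTHETICAL_FILE)
        writes.append((path, flags))
    return writes


for path, flags in expand("//0=ab;1=b"):
    print(f"echo '{flags}' > {path}")
```

With one write per group and domain in the distributed form, the single file
would only batch these writes into one file system operation.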

Reinette


Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 3 weeks ago
Hi Reinette,

On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 2/20/25 6:53 AM, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>> Hi Reinette,
> >>>
> >>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>> <reinette.chatre@intel.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>> Hi Reinette,
> >>>>>
> >>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>
> >>>>>> Hi Babu,
> >>>>>>
> >>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>
> >>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>
> >>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>
> >>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>>>
> >>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>
> >>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>> <value>
> >>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>
> >>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>
> >>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>> is low enough to be of concern.
> >>>>>
> >>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>> investigation, I would question whether they know what they're looking
> >>>>> for.
> >>>>
> >>>> The key here is "so far" as well as the focus on MBM only.
> >>>>
> >>>> It is impossible for me to predict what we will see in a couple of years
> >>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>> has space for 32 events for each "CPU agent" resource. That does not take into
> >>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>> that he is working on patches [1] that will add new events and shared the idea
> >>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>> customers.
> >>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>
> >>> I was thinking of the letters as representing a reusable, user-defined
> >>> event-set for applying to a single counter rather than as individual
> >>> events, since MPAM and ABMC allow us to choose the set of events each
> >>> one counts. Wherever we define the letters, we could use more symbolic
> >>> event names.
> >>
> >> Thank you for clarifying.
> >>
> >>>
> >>> In the letters as events model, choosing the events assigned to a
> >>> group wouldn't be enough information, since we would want to control
> >>> which events should share a counter and which should be counted by
> >>> separate counters. I think the amount of information that would need
> >>> to be encoded into mbm_assign_control to represent the level of
> >>> configurability supported by hardware would quickly get out of hand.
> >>>
> >>> Maybe as an example, one counter for all reads, one counter for all
> >>> writes in ABMC would look like...
> >>>
> >>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>
> >>> (per domain)
> >>> group 0:
> >>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>> group 1:
> >>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>> ...
> >>>
> >>
> >> I think this may also be what Dave was heading towards in [2] but in that
> >> example and above the counter configuration appears to be global. You do mention
> >> "configurability supported by hardware" so I wonder if per-domain counter
> >> configuration is a requirement?
> >
> > If it's global and we want a particular group to be watched by more
> > counters, I wouldn't want this to result in allocating more counters
> > for that group in all domains, or allocating counters in domains where
> > they're not needed. I want to encourage my users to avoid allocating
> > monitoring resources in domains where a job is not allowed to run so
> > there's less pressure on the counters.
> >
> > In Dave's proposal it looks like global configuration means
> > globally-defined "named counter configurations", which works because
> > it's really per-domain assignment of the configurations to however
> > many counters the group needs in each domain.
>
> I think I am becoming lost. Would a global configuration not break your
> view of "event-set applied to a single counter"? If a counter is configured
> globally then it would not make it possible to support the full configurability
> of the hardware.
> Before I add more confusion, let me try with an example that builds on your
> earlier example copied below:
>
> >>> (per domain)
> >>> group 0:
> >>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>> group 1:
> >>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>> ...
>
> Since the above states "per domain" I rewrite the example to highlight that as
> I understand it:
>
> group 0:
>  domain 0:
>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 1: VictimBW,LclNTWr,RmtNTWr
>  domain 1:
>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 1: VictimBW,LclNTWr,RmtNTWr
> group 1:
>  domain 0:
>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 3: VictimBW,LclNTWr,RmtNTWr
>  domain 1:
>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 3: VictimBW,LclNTWr,RmtNTWr
>
> You mention that you do not want counters to be allocated in domains that they
> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> in domain 1, resulting in:
>
> group 0:
>  domain 0:
>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 1: VictimBW,LclNTWr,RmtNTWr
> group 1:
>  domain 0:
>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 3: VictimBW,LclNTWr,RmtNTWr
>  domain 1:
>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 3: VictimBW,LclNTWr,RmtNTWr
>
> With counter 0 and counter 1 available in domain 1, these counters could
> theoretically be configured to give group 1 more data in domain 1:
>
> group 0:
>  domain 0:
>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 1: VictimBW,LclNTWr,RmtNTWr
> group 1:
>  domain 0:
>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>   counter 3: VictimBW,LclNTWr,RmtNTWr
>  domain 1:
>   counter 0: LclFill,RmtFill
>   counter 1: LclNTWr,RmtNTWr
>   counter 2: LclSlowFill,RmtSlowFill
>   counter 3: VictimBW
>
> The counters are shown with different per-domain configurations that seems to
> match with earlier goals of (a) choose events counted by each counter and
> (b) do not allocate counters in domains where they are not needed. As I
> understand the above does contradict global counter configuration though.
> Or do you mean that only the *name* of the counter is global and then
> that it is reconfigured as part of every assignment?

Yes, I meant only the *name* is global. I assume based on a particular
system configuration, the user will settle on a handful of useful
groupings to count.

Perhaps mbm_assign_control syntax is the clearest way to express an example...

 # define global configurations (in ABMC terms), not necessarily in this
 # syntax and probably not in the mbm_assign_control file.

 r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
 w=VictimBW,LclNTWr,RmtNTWr

 # legacy "total" configuration, effectively r+w
 t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr

 /group0/0=t;1=t
 /group1/0=t;1=t
 /group2/0=_;1=t
 /group3/0=rw;1=_

- group2 is restricted to domain 1 (no counter assigned in domain 0)
- group3 is restricted to domain 0 (no counter assigned in domain 1)
- the rest are unrestricted
- In group3, we decided we need to separate read and write traffic

This consumes 4 counters in domain 0 and 3 counters in domain 1.
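
The counter totals can be checked with a short Python sketch. It assumes that
each configuration letter assigned in a domain consumes exactly one counter
there and that "_" means no counter is assigned. The CONFIGS table and the
helpers parse_line() and counters_per_domain() are illustrative names only,
not part of any existing or proposed interface.

```python
from collections import Counter

# Named counter configurations from the example above (ABMC BwType names).
CONFIGS = {
    "r": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"],
    "w": ["VictimBW", "LclNTWr", "RmtNTWr"],
    "t": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
          "VictimBW", "LclNTWr", "RmtNTWr"],
}

ASSIGNMENTS = """\
/group0/0=t;1=t
/group1/0=t;1=t
/group2/0=_;1=t
/group3/0=rw;1=_
"""


def parse_line(line):
    """Split '<ctrl>/<mon>/<dom>=<flags>;...' into its parts."""
    ctrl_group, mon_group, domains = line.strip().split("/")
    per_domain = {}
    for item in domains.split(";"):
        dom, flags = item.split("=")
        names = [] if flags == "_" else list(flags)
        for name in names:
            if name not in CONFIGS:
                raise ValueError(f"unknown configuration {name!r}")
        per_domain[int(dom)] = names
    return ctrl_group, mon_group, per_domain


def counters_per_domain(text):
    """Count counters used in each domain, one per assigned letter."""
    used = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        _, _, per_domain = parse_line(line)
        for dom, names in per_domain.items():
            used[dom] += len(names)
    return dict(used)


print(counters_per_domain(ASSIGNMENTS))
```

Under these assumptions, the sketch reports 4 counters in domain 0 and 3 in
domain 1 for the assignments above.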

>
> >> Until now I viewed counter configuration separate from counter assignment,
> >> similar to how AMD's counters can be configured via mbm_total_bytes_config and
> >> mbm_local_bytes_config before they are assigned. That is still per-domain
> >> counter configuration though, not per-counter.
> >>
> >>> I assume packing all of this info for a group's desired counter
> >>> configuration into a single line (with 32 domains per line on many
> >>> dual-socket AMD configurations I see) would be difficult to look at,
> >>> even if we could settle on a single letter to represent each
> >>> universally.
> >>>
> >>>>
> >>>> My goal is for resctrl to have a user interface that can as much as possible
> >>>> be ready for whatever may be required from it years down the line. Of course,
> >>>> I may be wrong and resctrl would never need to support more than 26 events per
> >>>> resource (*). The risk is that resctrl *may* need to support more than 26 events
> >>>> and how could resctrl support that?
> >>>>
> >>>> What is the risk of supporting more than 26 events? As I highlighted earlier
> >>>> the interface I used as demonstration may become unwieldy to parse on a system
> >>>> with many domains that supports many events. This is a concern for me. Any suggestions
> >>>> will be appreciated, especially from you since I know that you are very familiar with
> >>>> issues related to large scale use of resctrl interfaces.
> >>>
> >>> It's mainly just the unwieldiness of all the information in one file.
> >>> It's already at the limit of what I can visually look through.
> >>
> >> I agree.
> >>
> >>>
> >>> I believe that shared assignments will take care of all the
> >>> high-frequency and performance-intensive batch configuration updates I
> >>> was originally concerned about, so I no longer see much benefit in
> >>> finding ways to textually encode all this information in a single file
> >>> when it would be more manageable to distribute it around the
> >>> filesystem hierarchy.
> >>
> >> This is significant. The motivation for the single file was to support
> >> the "high-frequency and performance-intensive" usage. Would "shared assignments"
> >> not also depend on the same files that, if distributed, will require many
> >> filesystem operations?
> >> Having the files distributed will be significantly simpler while also
> >> avoiding the file size issue that Dave Martin exposed.
> >
> > The remaining filesystem operations will be assigning or removing
> > shared counter assignments in the applicable domains, which would
> > normally correspond to mkdir/rmdir of groups or changing their CPU
> > affinity. The shared assignments are more "program and forget", while
> > the exclusive assignment approach requires updates for every counter
> > (in every domain) every few seconds to cover a large number of groups.
> >
> > When they want to pay extra attention to a particular group, I expect
> > they'll ask for exclusive counters and leave them assigned for a while
> > as they collect extra data.
>
> The single file approach is already unwieldy. The demands that will be
> placed on it to support the usages currently being discussed would make this
> interface even harder to use and manage. If the single file is not required
> then I think we should go back to smaller files distributed in resctrl.
> This may not even be an either/or argument. One way to view mbm_assign_control
> could be as a way for user to interact with the distributed counter
> related files with a single file system operation. Although, without
> knowing how counter configuration is expected to work this remains unclear.

If we do both interfaces and the multi-file model gives us more
capability to express configurations, we could find situations where
there are configurations we cannot represent when reading back from
mbm_assign_control, or updates through mbm_assign_control have
ambiguous effects on existing configurations which were created with
other files.

However, the example I gave above seems to be adequately represented
by a minor extension to mbm_assign_control and we all seem to
understand it now, so maybe it's not broken yet. It's unfortunate that
work went into a requirement that's no longer relevant, but I don't
think that on its own is a blocker.

-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Peter,

On 2/21/25 5:12 AM, Peter Newman wrote:
> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>
>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>
>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>
>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>
>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>
>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>> <value>
>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>
>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>
>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>> is low enough to be of concern.
>>>>>>>
>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>> for.
>>>>>>
>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>
>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>> customers.
>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>
>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>> event-set for applying to a single counter rather than as individual
>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>> event names.
>>>>
>>>> Thank you for clarifying.
>>>>
>>>>>
>>>>> In the letters as events model, choosing the events assigned to a
>>>>> group wouldn't be enough information, since we would want to control
>>>>> which events should share a counter and which should be counted by
>>>>> separate counters. I think the amount of information that would need
>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>
>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>> writes in ABMC would look like...
>>>>>
>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>
>>>>> (per domain)
>>>>> group 0:
>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>> group 1:
>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>> ...
>>>>>
>>>>
>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>> example and above the counter configuration appears to be global. You do mention
>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>> configuration is a requirement?
>>>
>>> If it's global and we want a particular group to be watched by more
>>> counters, I wouldn't want this to result in allocating more counters
>>> for that group in all domains, or allocating counters in domains where
>>> they're not needed. I want to encourage my users to avoid allocating
>>> monitoring resources in domains where a job is not allowed to run so
>>> there's less pressure on the counters.
>>>
>>> In Dave's proposal it looks like global configuration means
>>> globally-defined "named counter configurations", which works because
>>> it's really per-domain assignment of the configurations to however
>>> many counters the group needs in each domain.
>>
>> I think I am becoming lost. Would a global configuration not break your
>> view of "event-set applied to a single counter"? If a counter is configured
>> globally then it would not make it possible to support the full configurability
>> of the hardware.
>> Before I add more confusion, let me try with an example that builds on your
>> earlier example copied below:
>>
>>>>> (per domain)
>>>>> group 0:
>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>> group 1:
>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>> ...
>>
>> Since the above states "per domain" I rewrite the example to highlight that as
>> I understand it:
>>
>> group 0:
>>  domain 0:
>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>  domain 1:
>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>> group 1:
>>  domain 0:
>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>  domain 1:
>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>
>> You mention that you do not want counters to be allocated in domains that they
>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>> in domain 1, resulting in:
>>
>> group 0:
>>  domain 0:
>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>> group 1:
>>  domain 0:
>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>  domain 1:
>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>
>> With counter 0 and counter 1 available in domain 1, these counters could
>> theoretically be configured to give group 1 more data in domain 1:
>>
>> group 0:
>>  domain 0:
>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>> group 1:
>>  domain 0:
>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>  domain 1:
>>   counter 0: LclFill,RmtFill
>>   counter 1: LclNTWr,RmtNTWr
>>   counter 2: LclSlowFill,RmtSlowFill
>>   counter 3: VictimBW
>>
>> The counters are shown with different per-domain configurations that seem to
>> match the earlier goals of (a) choose events counted by each counter and
>> (b) do not allocate counters in domains where they are not needed. As I
>> understand it, the above does contradict global counter configuration though.
>> Or do you mean that only the *name* of the counter is global and then
>> that it is reconfigured as part of every assignment?
> 
> Yes, I meant only the *name* is global. I assume based on a particular
> system configuration, the user will settle on a handful of useful
> groupings to count.
> 
> Perhaps mbm_assign_control syntax is the clearest way to express an example...
> 
>  # define global configurations (in ABMC terms), not necessarily in this
>  # syntax and probably not in the mbm_assign_control file.
> 
>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>  w=VictimBW,LclNTWr,RmtNTWr
> 
>  # legacy "total" configuration, effectively r+w
>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> 
>  /group0/0=t;1=t
>  /group1/0=t;1=t
>  /group2/0=_;1=t
>  /group3/0=rw;1=_
> 
> - group2 is restricted to domain 1
> - group3 is restricted to domain 0
> - the rest are unrestricted
> - In group3, we decided we need to separate read and write traffic
> 
> This consumes 4 counters in domain 0 and 3 counters in domain 1.
> 

I see. Thank you for the example.
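
The counter totals in the example can be checked mechanically. Below is a
minimal sketch, assuming each flag letter in an assignment consumes one
counter in that domain and that "_" means no counter is assigned. The
function name and the parsing are illustrative only and are not part of
resctrl or of this series.

```python
from collections import Counter


def counters_per_domain(lines):
    """Count counters consumed per domain from mbm_assign_control-style lines.

    Assumption: one counter per flag letter, "_" means nothing assigned.
    """
    used = Counter()
    for line in lines:
        # Everything after the last "/" is the list of domain assignments.
        _group, _, assignments = line.strip().rpartition("/")
        for item in assignments.split(";"):
            if not item:
                continue
            domain, flags = item.split("=")
            used[int(domain)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


example = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]
print(counters_per_domain(example))  # {0: 4, 1: 3}
```

This reproduces the totals of 4 counters in domain 0 and 3 counters in
domain 1.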

resctrl already supports per-domain configurations. The following is possible
when using mbm_total_bytes_config and mbm_local_bytes_config:

t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr

   /group0/0=t;1=t
   /group1/0=t;1=t

Even though the flags are identical in all domains, the assigned counters will
be configured differently in each domain.

With this supported by hardware and currently also supported by resctrl, it seems
reasonable to carry this forward to what will be supported next.
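
To illustrate, here is a minimal sketch of how the same "t" flag resolves to a
different event set in each domain. The dictionary stands in for the
per-domain contents of mbm_total_bytes_config; the names T_CONFIG and
resolve() are hypothetical and only model the example above.

```python
# Per-domain event sets behind the "t" flag, copied from the example above.
T_CONFIG = {
    0: {"LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
        "VictimBW", "LclNTWr", "RmtNTWr"},
    1: {"LclFill", "RmtFill", "VictimBW", "LclNTWr", "RmtNTWr"},
}


def resolve(assignment, config):
    """Map 'dom=flag;dom=flag' to the events each assigned counter would count."""
    resolved = {}
    for item in assignment.split(";"):
        domain, flag = item.split("=")
        if flag == "t":
            resolved[int(domain)] = config[int(domain)]
    return resolved


resolved = resolve("0=t;1=t", T_CONFIG)
# Events counted by "t" in domain 0 but not in domain 1.
missing_in_domain_1 = sorted(resolved[0] - resolved[1])
print(missing_in_domain_1)  # ['LclSlowFill', 'RmtSlowFill']
```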

>>
>>>> Until now I viewed counter configuration separate from counter assignment,
>>>> similar to how AMD's counters can be configured via mbm_total_bytes_config and
>>>> mbm_local_bytes_config before they are assigned. That is still per-domain
>>>> counter configuration though, not per-counter.
>>>>
>>>>> I assume packing all of this info for a group's desired counter
>>>>> configuration into a single line (with 32 domains per line on many
>>>>> dual-socket AMD configurations I see) would be difficult to look at,
>>>>> even if we could settle on a single letter to represent each
>>>>> universally.
>>>>>
>>>>>>
>>>>>> My goal is for resctrl to have a user interface that can as much as possible
>>>>>> be ready for whatever may be required from it years down the line. Of course,
>>>>>> I may be wrong and resctrl would never need to support more than 26 events per
>>>>>> resource (*). The risk is that resctrl *may* need to support more than 26 events
>>>>>> and how could resctrl support that?
>>>>>>
>>>>>> What is the risk of supporting more than 26 events? As I highlighted earlier
>>>>>> the interface I used as demonstration may become unwieldy to parse on a system
>>>>>> with many domains that supports many events. This is a concern for me. Any suggestions
>>>>>> will be appreciated, especially from you since I know that you are very familiar with
>>>>>> issues related to large scale use of resctrl interfaces.
>>>>>
>>>>> It's mainly just the unwieldiness of all the information in one file.
>>>>> It's already at the limit of what I can visually look through.
>>>>
>>>> I agree.
>>>>
>>>>>
>>>>> I believe that shared assignments will take care of all the
>>>>> high-frequency and performance-intensive batch configuration updates I
>>>>> was originally concerned about, so I no longer see much benefit in
>>>>> finding ways to textually encode all this information in a single file
>>>>> when it would be more manageable to distribute it around the
>>>>> filesystem hierarchy.
>>>>
>>>> This is significant. The motivation for the single file was to support
>>>> the "high-frequency and performance-intensive" usage. Would "shared assignments"
>>>> not also depend on the same files that, if distributed, will require many
>>>> filesystem operations?
>>>> Having the files distributed will be significantly simpler while also
>>>> avoiding the file size issue that Dave Martin exposed.
>>>
>>> The remaining filesystem operations will be assigning or removing
>>> shared counter assignments in the applicable domains, which would
>>> normally correspond to mkdir/rmdir of groups or changing their CPU
>>> affinity. The shared assignments are more "program and forget", while
>>> the exclusive assignment approach requires updates for every counter
>>> (in every domain) every few seconds to cover a large number of groups.
>>>
>>> When they want to pay extra attention to a particular group, I expect
>>> they'll ask for exclusive counters and leave them assigned for a while
>>> as they collect extra data.
>>
>> The single file approach is already unwieldy. The demands that will be
>> placed on it to support the usages currently being discussed would make this
>> interface even harder to use and manage. If the single file is not required
>> then I think we should go back to smaller files distributed in resctrl.
>> This may not even be an either/or argument. One way to view mbm_assign_control
>> could be as a way for user to interact with the distributed counter
>> related files with a single file system operation. Although, without
>> knowing how counter configuration is expected to work this remains unclear.
> 
> If we do both interfaces and the multi-file model gives us more
> capability to express configurations, we could find situations where
> there are configurations we cannot represent when reading back from
> mbm_assign_control, or updates through mbm_assign_control have
> ambiguous effects on existing configurations which were created with
> other files.

Right. My assumption was that the syntax would be identical.

> 
> However, the example I gave above seems to be adequately represented
> by a minor extension to mbm_assign_control and we all seem to

To confirm what you mean by "minor extension to mbm_assign_control":
is this where the flags are associated with counter configurations? At this
time this is done separately from mbm_assign_control, with the hardcoded "t"
and "l" flags configured via mbm_total_bytes_config and mbm_local_bytes_config
respectively. I think it would be simpler to keep these configurations
separate from mbm_assign_control. How this would look is not clear to me
without a better understanding of MPAM, unless the
requirement is to enhance support for ABMC and BMEC. I do see that
this can be added later to build on what is supported by mbm_assign_control
with the syntax in this version.

> understand it now, so maybe it's not broken yet. It's unfortunate that
> work went into a requirement that's no longer relevant, but I don't
> think that on its own is a blocker.

I understand that requirements may change as we get new information.
Digesting it now is significantly easier than trying to adapt after
the user interface is merged and essentially set in stone.

Reinette

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 2 weeks ago
Hi Reinette,

On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 2/21/25 5:12 AM, Peter Newman wrote:
> > On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >> On 2/20/25 6:53 AM, Peter Newman wrote:
> >>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> >>> <reinette.chatre@intel.com> wrote:
> >>>> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>>>> <reinette.chatre@intel.com> wrote:
> >>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>>>
> >>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>>>
> >>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>>>>>
> >>>>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>>>> <value>
> >>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>>>
> >>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>>>
> >>>>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>>>> is low enough to be of concern.
> >>>>>>>
> >>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>>>> investigation, I would question whether they know what they're looking
> >>>>>>> for.
> >>>>>>
> >>>>>> The key here is "so far" as well as the focus on MBM only.
> >>>>>>
> >>>>>> It is impossible for me to predict what we will see in a couple of years
> >>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
> >>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>>>> that he is working on patches [1] that will add new events and shared the idea
> >>>>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>>>> customers.
> >>>>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>>>
> >>>>> I was thinking of the letters as representing a reusable, user-defined
> >>>>> event-set for applying to a single counter rather than as individual
> >>>>> events, since MPAM and ABMC allow us to choose the set of events each
> >>>>> one counts. Wherever we define the letters, we could use more symbolic
> >>>>> event names.
> >>>>
> >>>> Thank you for clarifying.
> >>>>
> >>>>>
> >>>>> In the letters as events model, choosing the events assigned to a
> >>>>> group wouldn't be enough information, since we would want to control
> >>>>> which events should share a counter and which should be counted by
> >>>>> separate counters. I think the amount of information that would need
> >>>>> to be encoded into mbm_assign_control to represent the level of
> >>>>> configurability supported by hardware would quickly get out of hand.
> >>>>>
> >>>>> Maybe as an example, one counter for all reads, one counter for all
> >>>>> writes in ABMC would look like...
> >>>>>
> >>>>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>>>
> >>>>> (per domain)
> >>>>> group 0:
> >>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>> group 1:
> >>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>> ...
> >>>>>
> >>>>
> >>>> I think this may also be what Dave was heading towards in [2] but in that
> >>>> example and above the counter configuration appears to be global. You do mention
> >>>> "configurability supported by hardware" so I wonder if per-domain counter
> >>>> configuration is a requirement?
> >>>
> >>> If it's global and we want a particular group to be watched by more
> >>> counters, I wouldn't want this to result in allocating more counters
> >>> for that group in all domains, or allocating counters in domains where
> >>> they're not needed. I want to encourage my users to avoid allocating
> >>> monitoring resources in domains where a job is not allowed to run so
> >>> there's less pressure on the counters.
> >>>
> >>> In Dave's proposal it looks like global configuration means
> >>> globally-defined "named counter configurations", which works because
> >>> it's really per-domain assignment of the configurations to however
> >>> many counters the group needs in each domain.
> >>
> >> I think I am becoming lost. Would a global configuration not break your
> >> view of "event-set applied to a single counter"? If a counter is configured
> >> globally then it would not make it possible to support the full configurability
> >> of the hardware.
> >> Before I add more confusion, let me try with an example that builds on your
> >> earlier example copied below:
> >>
> >>>>> (per domain)
> >>>>> group 0:
> >>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>> group 1:
> >>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>> ...
> >>
> >> Since the above states "per domain" I rewrite the example to highlight that as
> >> I understand it:
> >>
> >> group 0:
> >>  domain 0:
> >>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>  domain 1:
> >>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >> group 1:
> >>  domain 0:
> >>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>  domain 1:
> >>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>
> >> You mention that you do not want counters to be allocated in domains that they
> >> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> >> in domain 1, resulting in:
> >>
> >> group 0:
> >>  domain 0:
> >>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >> group 1:
> >>  domain 0:
> >>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>  domain 1:
> >>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>
> >> With counter 0 and counter 1 available in domain 1, these counters could
> >> theoretically be configured to give group 1 more data in domain 1:
> >>
> >> group 0:
> >>  domain 0:
> >>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >> group 1:
> >>  domain 0:
> >>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>  domain 1:
> >>   counter 0: LclFill,RmtFill
> >>   counter 1: LclNTWr,RmtNTWr
> >>   counter 2: LclSlowFill,RmtSlowFill
> >>   counter 3: VictimBW
> >>
> >> The counters are shown with different per-domain configurations that seem to
> >> match the earlier goals of (a) choose events counted by each counter and
> >> (b) do not allocate counters in domains where they are not needed. As I
> >> understand it, the above does contradict global counter configuration though.
> >> Or do you mean that only the *name* of the counter is global and then
> >> that it is reconfigured as part of every assignment?
> >
> > Yes, I meant only the *name* is global. I assume based on a particular
> > system configuration, the user will settle on a handful of useful
> > groupings to count.
> >
> > Perhaps mbm_assign_control syntax is the clearest way to express an example...
> >
> >  # define global configurations (in ABMC terms), not necessarily in this
> >  # syntax and probably not in the mbm_assign_control file.
> >
> >  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >  w=VictimBW,LclNTWr,RmtNTWr
> >
> >  # legacy "total" configuration, effectively r+w
> >  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >
> >  /group0/0=t;1=t
> >  /group1/0=t;1=t
> >  /group2/0=_;1=t
> >  /group3/0=rw;1=_
> >
> > - group2 is restricted to domain 1
> > - group3 is restricted to domain 0
> > - the rest are unrestricted
> > - In group3, we decided we need to separate read and write traffic
> >
> > This consumes 4 counters in domain 0 and 3 counters in domain 1.
> >
>
> I see. Thank you for the example.
>
> resctrl supports per-domain configurations with the following possible when
> using mbm_total_bytes_config and mbm_local_bytes_config:
>
> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>
>    /group0/0=t;1=t
>    /group1/0=t;1=t
>
> Even though the flags are identical in all domains, the assigned counters will
> be configured differently in each domain.
>
> With this supported by hardware and currently also supported by resctrl it seems
> reasonable to carry this forward to what will be supported next.

The hardware supports both a per-domain mode, where all groups in a
domain use the same configurations and are limited to two events per
group, and a per-group mode, where every group can be configured and
assigned freely. This series uses the legacy counter access mode,
where only counters whose BwType matches an instance of QOS_EVT_CFG_n
in the domain can be read. If we chose to read the assigned counter
directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
rather than asking the hardware to find the counter by RMID, we would
not be limited to 2 counters per group/domain and the hardware would
have the same flexibility as on MPAM.

(I might have said something confusing in my last messages because I
had forgotten that I switched to the extended assignment mode when
prototyping with soft-ABMC and MPAM.)

Forcing all groups on a domain to share the same 2 counter
configurations would not be acceptable for us, as the example I gave
earlier is one I've already been asked about.

I'm worried about requiring support for domain-level
mbm_total_bytes_config and mbm_local_bytes_config files to be carried
forward, because this conflicts with the configuration being per
group/domain. (i.e., what would be read back from the domain files if
per-group customizations have already been applied?)
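
A toy model of that readback problem, assuming one event set per group in
a single domain. The helper domain_level_readback() and the group contents
are hypothetical and do not reflect how resctrl behaves today; the sketch
only shows that a domain-level file has a well-defined value while all
groups agree and none once a group has been customized.

```python
def domain_level_readback(per_group_configs):
    """Return the one event set shared by all groups in a domain, or None if they differ."""
    distinct = {frozenset(events) for events in per_group_configs.values()}
    if len(distinct) == 1:
        return set(next(iter(distinct)))
    return None  # no single value describes the domain


READS = {"LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"}
WRITES = {"VictimBW", "LclNTWr", "RmtNTWr"}

# All groups share the legacy "total" configuration.
uniform = {"group0": READS | WRITES, "group1": READS | WRITES}
# One group has been customized to count reads only.
customized = {"group0": READS | WRITES, "group3": READS}

print(domain_level_readback(uniform) == (READS | WRITES))  # True
print(domain_level_readback(customized))                   # None
```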

>
> >>
> >>>> Until now I viewed counter configuration separate from counter assignment,
> >>>> similar to how AMD's counters can be configured via mbm_total_bytes_config and
> >>>> mbm_local_bytes_config before they are assigned. That is still per-domain
> >>>> counter configuration though, not per-counter.
> >>>>
> >>>>> I assume packing all of this info for a group's desired counter
> >>>>> configuration into a single line (with 32 domains per line on many
> >>>>> dual-socket AMD configurations I see) would be difficult to look at,
> >>>>> even if we could settle on a single letter to represent each
> >>>>> universally.
> >>>>>
> >>>>>>
> >>>>>> My goal is for resctrl to have a user interface that can as much as possible
> >>>>>> be ready for whatever may be required from it years down the line. Of course,
> >>>>>> I may be wrong and resctrl would never need to support more than 26 events per
> >>>>>> resource (*). The risk is that resctrl *may* need to support more than 26 events
> >>>>>> and how could resctrl support that?
> >>>>>>
> >>>>>> What is the risk of supporting more than 26 events? As I highlighted earlier
> >>>>>> the interface I used as demonstration may become unwieldy to parse on a system
> >>>>>> with many domains that supports many events. This is a concern for me. Any suggestions
> >>>>>> will be appreciated, especially from you since I know that you are very familiar with
> >>>>>> issues related to large scale use of resctrl interfaces.
> >>>>>
> >>>>> It's mainly just the unwieldiness of all the information in one file.
> >>>>> It's already at the limit of what I can visually look through.
> >>>>
> >>>> I agree.
> >>>>
> >>>>>
> >>>>> I believe that shared assignments will take care of all the
> >>>>> high-frequency and performance-intensive batch configuration updates I
> >>>>> was originally concerned about, so I no longer see much benefit in
> >>>>> finding ways to textually encode all this information in a single file
> >>>>> when it would be more manageable to distribute it around the
> >>>>> filesystem hierarchy.
> >>>>
> >>>> This is significant. The motivation for the single file was to support
> >>>> the "high-frequency and performance-intensive" usage. Would "shared assignments"
> >>>> not also depend on the same files that, if distributed, will require many
> >>>> filesystem operations?
> >>>> Having the files distributed will be significantly simpler while also
> >>>> avoiding the file size issue that Dave Martin exposed.
> >>>
> >>> The remaining filesystem operations will be assigning or removing
> >>> shared counter assignments in the applicable domains, which would
> >>> normally correspond to mkdir/rmdir of groups or changing their CPU
> >>> affinity. The shared assignments are more "program and forget", while
> >>> the exclusive assignment approach requires updates for every counter
> >>> (in every domain) every few seconds to cover a large number of groups.
> >>>
> >>> When they want to pay extra attention to a particular group, I expect
> >>> they'll ask for exclusive counters and leave them assigned for a while
> >>> as they collect extra data.
> >>
> >> The single file approach is already unwieldy. The demands that will be
> >> placed on it to support the usages currently being discussed would make this
> >> interface even harder to use and manage. If the single file is not required
> >> then I think we should go back to smaller files distributed in resctrl.
> >> This may not even be an either/or argument. One way to view mbm_assign_control
> >> could be as a way for user to interact with the distributed counter
> >> related files with a single file system operation. Although, without
> >> knowing how counter configuration is expected to work this remains unclear.
> >
> > If we do both interfaces and the multi-file model gives us more
> > capability to express configurations, we could find situations where
> > there are configurations we cannot represent when reading back from
> > mbm_assign_control, or updates through mbm_assign_control have
> > ambiguous effects on existing configurations which were created with
> > other files.
>
> Right. My assumption was that the syntax would be identical.
>
> >
> > However, the example I gave above seems to be adequately represented
> > by a minor extension to mbm_assign_control and we all seem to
>
> To confirm what you mean with "minor extension to mbm_assign_control",
> is this where the flags are associated with counter configurations? At this
> time this is done separately from mbm_assign_control with the hardcoded "t"
> and "l" flags configured via mbm_total_bytes_config and mbm_local_bytes_config
> respectively. I think it would be simpler to keep these configurations
> separate from mbm_assign_control. How it would look without better
> understanding of MPAM is not clear to me at this time, unless if the
> requirement is to enhance support for ABMC and BMEC. I do see that
> this can be added later to build on what is supported by mbm_assign_control
> with the syntax in this version.

As I explained above, I was looking at this from the perspective of
the extended event assignment mode.

Thanks,
-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 2 weeks ago
Hi Peter,

On 2/25/25 9:11 AM, Peter Newman wrote:
> Hi Reinette,
> 
> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>
>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>
>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>
>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>
>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>> for.
>>>>>>>>
>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>
>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>> customers.
>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>
>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>> event names.
>>>>>>
>>>>>> Thank you for clarifying.
>>>>>>
>>>>>>>
>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>> which events should share a counter and which should be counted by
>>>>>>> separate counters. I think the amount of information that would need
>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>
>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>> writes in ABMC would look like...
>>>>>>>
>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>
>>>>>>> (per domain)
>>>>>>> group 0:
>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>> configuration is a requirement?
>>>>>
>>>>> If it's global and we want a particular group to be watched by more
>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>> for that group in all domains, or allocating counters in domains where
>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>> there's less pressure on the counters.
>>>>>
>>>>> In Dave's proposal it looks like global configuration means
>>>>> globally-defined "named counter configurations", which works because
>>>>> it's really per-domain assignment of the configurations to however
>>>>> many counters the group needs in each domain.
>>>>
>>>> I think I am becoming lost. Would a global configuration not break your
>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>> globally then it would not make it possible to support the full configurability
>>>> of the hardware.
>>>> Before I add more confusion, let me try with an example that builds on your
>>>> earlier example copied below:
>>>>
>>>>>>> (per domain)
>>>>>>> group 0:
>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>> ...
>>>>
>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>> I understand it:
>>>>
>>>> group 0:
>>>>  domain 0:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  domain 0:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>
>>>> You mention that you do not want counters to be allocated in domains that they
>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>> in domain 1, resulting in:
>>>>
>>>> group 0:
>>>>  domain 0:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  domain 0:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>
>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>
>>>> group 0:
>>>>  domain 0:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  domain 0:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 0: LclFill,RmtFill
>>>>   counter 1: LclNTWr,RmtNTWr
>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW
>>>>
>>>> The counters are shown with different per-domain configurations that seem to
>>>> match with earlier goals of (a) choose events counted by each counter and
>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>> understand the above does contradict global counter configuration though.
>>>> Or do you mean that only the *name* of the counter is global and then
>>>> that it is reconfigured as part of every assignment?
>>>
>>> Yes, I meant only the *name* is global. I assume based on a particular
>>> system configuration, the user will settle on a handful of useful
>>> groupings to count.
>>>
>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>
>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>  # syntax and probably not in the mbm_assign_control file.
>>>
>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>
>>>  # legacy "total" configuration, effectively r+w
>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>
>>>  /group0/0=t;1=t
>>>  /group1/0=t;1=t
>>>  /group2/0=_;1=t
>>>  /group3/0=rw;1=_
>>>
>>> - group2 is restricted to domain 1
>>> - group3 is restricted to domain 0
>>> - the rest are unrestricted
>>> - In group3, we decided we need to separate read and write traffic
>>>
>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>
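
The counter totals in the example can be checked mechanically. Below is a minimal userspace sketch, assuming each flag letter in a domain field consumes one counter and "_" consumes none. The helper name counters_per_domain is hypothetical and this is not part of any resctrl tooling:

```python
from collections import Counter

# Assignment lines copied from the example above.
EXAMPLE = """\
/group0/0=t;1=t
/group1/0=t;1=t
/group2/0=_;1=t
/group3/0=rw;1=_
"""


def counters_per_domain(text):
    """Count the counters each domain needs, assuming one per flag letter."""
    usage = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything after the last "/" is "<domain>=<flags>;<domain>=<flags>".
        _group, _, assignments = line.rpartition("/")
        for field in assignments.split(";"):
            domain, _, flags = field.partition("=")
            # "_" means no counter is assigned in this domain.
            usage[int(domain)] += len(flags.replace("_", ""))
    return dict(usage)


print(counters_per_domain(EXAMPLE))  # {0: 4, 1: 3}
```

The "rw" entry for group3 accounts for two of the four counters in domain 0, which is the cost of splitting read and write traffic for that one group.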
>>
>> I see. Thank you for the example.
>>
>> resctrl supports per-domain configurations with the following possible when
>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>
>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>
>>    /group0/0=t;1=t
>>    /group1/0=t;1=t
>>
>> Even though the flags are identical in all domains, the assigned counters will
>> be configured differently in each domain.
>>
>> With this supported by hardware and currently also supported by resctrl it seems
>> reasonable to carry this forward to what will be supported next.
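
To illustrate the point that one flag can resolve to different event sets per domain, here is a small sketch. The two "t" configurations are copied from the example above. The FLAG_CONFIGS table and the events_counted helper are hypothetical names used only for illustration:

```python
# Per-domain meaning of flag "t", copied from the example above.
FLAG_CONFIGS = {
    "t": {
        0: {"LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
            "VictimBW", "LclNTWr", "RmtNTWr"},
        1: {"LclFill", "RmtFill", "VictimBW", "LclNTWr", "RmtNTWr"},
    },
}


def events_counted(flag_configs, assignment):
    """Map each domain in e.g. "0=t;1=t" to the events its flags select."""
    result = {}
    for field in assignment.split(";"):
        domain, _, flags = field.partition("=")
        events = set()
        # "_" means nothing is assigned in this domain.
        for flag in flags.replace("_", ""):
            events |= flag_configs[flag][int(domain)]
        result[int(domain)] = events
    return result


per_domain = events_counted(FLAG_CONFIGS, "0=t;1=t")
# Events counted in domain 0 but not in domain 1, despite the same flag.
print(sorted(per_domain[0] - per_domain[1]))  # ['LclSlowFill', 'RmtSlowFill']
```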
> 
> The hardware supports both a per-domain mode, where all groups in a
> domain use the same configurations and are limited to two events per
> group and a per-group mode where every group can be configured and
> assigned freely. This series is using the legacy counter access mode
> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> in the domain can be read. If we chose to read the assigned counter
> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> rather than asking the hardware to find the counter by RMID, we would
> not be limited to 2 counters per group/domain and the hardware would
> have the same flexibility as on MPAM.
> 
> (I might have said something confusing in my last messages because I
> had forgotten that I switched to the extended assignment mode when
> prototyping with soft-ABMC and MPAM.)
> 
> Forcing all groups on a domain to share the same 2 counter
> configurations would not be acceptable for us, as the example I gave
> earlier is one I've already been asked about.

I am surprised to hear this at this point in this work. It sounds like
we need to go back a couple of steps to determine how best to support
user requirements that now include per-group counter assignment.

Have you perhaps looked into how users access the counter data as
part of your prototyping?

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 2 weeks ago
Hi Peter,

On 2/25/25 11:11, Peter Newman wrote:
> Hi Reinette,
> 
> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>
>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>
>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>
>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>
>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>> for.
>>>>>>>>
>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>
>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>> customers.
>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>
>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>> event names.
>>>>>>
>>>>>> Thank you for clarifying.
>>>>>>
>>>>>>>
>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>> which events should share a counter and which should be counted by
>>>>>>> separate counters. I think the amount of information that would need
>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>
>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>> writes in ABMC would look like...
>>>>>>>
>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>
>>>>>>> (per domain)
>>>>>>> group 0:
>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>> configuration is a requirement?
>>>>>
>>>>> If it's global and we want a particular group to be watched by more
>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>> for that group in all domains, or allocating counters in domains where
>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>> there's less pressure on the counters.
>>>>>
>>>>> In Dave's proposal it looks like global configuration means
>>>>> globally-defined "named counter configurations", which works because
>>>>> it's really per-domain assignment of the configurations to however
>>>>> many counters the group needs in each domain.
>>>>
>>>> I think I am becoming lost. Would a global configuration not break your
>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>> globally then it would not make it possible to support the full configurability
>>>> of the hardware.
>>>> Before I add more confusion, let me try with an example that builds on your
>>>> earlier example copied below:
>>>>
>>>>>>> (per domain)
>>>>>>> group 0:
>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>> ...
>>>>
>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>> I understand it:
>>>>
>>>> group 0:
>>>>  domain 0:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  domain 0:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>
>>>> You mention that you do not want counters to be allocated in domains that they
>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>> in domain 1, resulting in:
>>>>
>>>> group 0:
>>>>  domain 0:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  domain 0:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>
>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>
>>>> group 0:
>>>>  domain 0:
>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>> group 1:
>>>>  domain 0:
>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>  domain 1:
>>>>   counter 0: LclFill,RmtFill
>>>>   counter 1: LclNTWr,RmtNTWr
>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>   counter 3: VictimBW
>>>>
>>>> The counters are shown with different per-domain configurations that seem to
>>>> match with earlier goals of (a) choose events counted by each counter and
>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>> understand the above does contradict global counter configuration though.
>>>> Or do you mean that only the *name* of the counter is global and then
>>>> that it is reconfigured as part of every assignment?
>>>
>>> Yes, I meant only the *name* is global. I assume based on a particular
>>> system configuration, the user will settle on a handful of useful
>>> groupings to count.
>>>
>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>
>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>  # syntax and probably not in the mbm_assign_control file.
>>>
>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>
>>>  # legacy "total" configuration, effectively r+w
>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>
>>>  /group0/0=t;1=t
>>>  /group1/0=t;1=t
>>>  /group2/0=_;1=t
>>>  /group3/0=rw;1=_
>>>
>>> - group2 is restricted to domain 1
>>> - group3 is restricted to domain 0
>>> - the rest are unrestricted
>>> - In group3, we decided we need to separate read and write traffic
>>>
>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>
>>
>> I see. Thank you for the example.
>>
>> resctrl supports per-domain configurations with the following possible when
>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>
>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>
>>    /group0/0=t;1=t
>>    /group1/0=t;1=t
>>
>> Even though the flags are identical in all domains, the assigned counters will
>> be configured differently in each domain.
>>
>> With this supported by hardware and currently also supported by resctrl it seems
>> reasonable to carry this forward to what will be supported next.
> 
> The hardware supports both a per-domain mode, where all groups in a
> domain use the same configurations and are limited to two events per
> group and a per-group mode where every group can be configured and
> assigned freely. This series is using the legacy counter access mode
> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> in the domain can be read. If we chose to read the assigned counter
> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> rather than asking the hardware to find the counter by RMID, we would
> not be limited to 2 counters per group/domain and the hardware would
> have the same flexibility as on MPAM.

In extended mode, the contents of a specific counter can be read by
setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
[EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
QM_CTR will then return the contents of the specified counter.

This is documented in the AMD64 Architecture Programmer's Manual, Volume 2:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
 Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
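
As a rough illustration of the extended-mode read described above, the sketch below composes the value that would be written to QM_EVTSEL for a given counter ID and decodes a raw QM_CTR value. The bit positions and the EvtID value are assumptions made for illustration and should be checked against the APM section referenced above. The function names are hypothetical, and no MSR is actually accessed:

```python
# All field positions and values below are assumptions for illustration;
# verify them against the APM before relying on them.
EVTID_L3_CACHE_ABMC = 0x1   # assumed EvtID encoding for L3CacheABMC
EXTENDED_EVT_ID = 1 << 31   # assumed position of the ExtendedEvtID bit
RMID_SHIFT = 32             # assumed start of the RMID field

QM_CTR_ERROR = 1 << 63      # assumed "error" flag in QM_CTR
QM_CTR_UNAVAIL = 1 << 62    # assumed "unavailable" flag in QM_CTR


def evtsel_for_counter(cntr_id):
    """Build a QM_EVTSEL value that selects an ABMC counter by its ID.

    In extended mode the RMID field carries the counter ID, not an RMID.
    """
    return (cntr_id << RMID_SHIFT) | EXTENDED_EVT_ID | EVTID_L3_CACHE_ABMC


def decode_qm_ctr(raw):
    """Return the count from a raw QM_CTR value, or None if flagged."""
    if raw & (QM_CTR_ERROR | QM_CTR_UNAVAIL):
        return None
    # Everything below the two flag bits is treated as count data.
    return raw & (QM_CTR_UNAVAIL - 1)


print(hex(evtsel_for_counter(5)))  # 0x580000001
```

Because the counter is addressed directly, a read of this kind does not depend on the counter's BwType matching one of the per-domain QOS_EVT_CFG_n configurations.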

We previously discussed this with you (off the public list) and I
initially proposed the extended assignment mode.

Yes, the extended mode allows greater flexibility by enabling multiple
counters to be assigned to the same group, rather than being limited to
just two.

However, the challenge is that we currently lack the necessary interfaces
to configure multiple events per group. Without these interfaces, the
extended mode is not practical at this time.

Therefore, we ultimately agreed to use the legacy mode, as it works with
the existing interface without modification.

> 
> (I might have said something confusing in my last messages because I
> had forgotten that I switched to the extended assignment mode when
> prototyping with soft-ABMC and MPAM.)
> 
> Forcing all groups on a domain to share the same 2 counter
> configurations would not be acceptable for us, as the example I gave
> earlier is one I've already been asked about.

I don’t see this as a blocker. It should be considered an extension to the
current ABMC series. We can easily build on top of this series once we
finalize the interface for configuring multiple events per group.

> 
> I'm worried about requiring support for domain-level
> mbm_total_bytes_config and mbm_local_bytes_config files to be carried
> forward, because this conflicts with the configuration being per
> group/domain. (i.e., what would be read back from the domain files if
> per-group customizations have already been applied?)
> 
>>
>>>>
>>>>>> Until now I viewed counter configuration separate from counter assignment,
>>>>>> similar to how AMD's counters can be configured via mbm_total_bytes_config and
>>>>>> mbm_local_bytes_config before they are assigned. That is still per-domain
>>>>>> counter configuration though, not per-counter.
>>>>>>
>>>>>>> I assume packing all of this info for a group's desired counter
>>>>>>> configuration into a single line (with 32 domains per line on many
>>>>>>> dual-socket AMD configurations I see) would be difficult to look at,
>>>>>>> even if we could settle on a single letter to represent each
>>>>>>> universally.
>>>>>>>
>>>>>>>>
>>>>>>>> My goal is for resctrl to have a user interface that can as much as possible
>>>>>>>> be ready for whatever may be required from it years down the line. Of course,
>>>>>>>> I may be wrong and resctrl would never need to support more than 26 events per
>>>>>>>> resource (*). The risk is that resctrl *may* need to support more than 26 events
>>>>>>>> and how could resctrl support that?
>>>>>>>>
>>>>>>>> What is the risk of supporting more than 26 events? As I highlighted earlier
>>>>>>>> the interface I used as demonstration may become unwieldy to parse on a system
>>>>>>>> with many domains that supports many events. This is a concern for me. Any suggestions
>>>>>>>> will be appreciated, especially from you since I know that you are very familiar with
>>>>>>>> issues related to large scale use of resctrl interfaces.
>>>>>>>
>>>>>>> It's mainly just the unwieldiness of all the information in one file.
>>>>>>> It's already at the limit of what I can visually look through.
>>>>>>
>>>>>> I agree.
>>>>>>
>>>>>>>
>>>>>>> I believe that shared assignments will take care of all the
>>>>>>> high-frequency and performance-intensive batch configuration updates I
>>>>>>> was originally concerned about, so I no longer see much benefit in
>>>>>>> finding ways to textually encode all this information in a single file
>>>>>>> when it would be more manageable to distribute it around the
>>>>>>> filesystem hierarchy.
>>>>>>
>>>>>> This is significant. The motivation for the single file was to support
>>>>>> the "high-frequency and performance-intensive" usage. Would "shared assignments"
>>>>>> not also depend on the same files that, if distributed, will require many
>>>>>> filesystem operations?
>>>>>> Having the files distributed will be significantly simpler while also
>>>>>> avoiding the file size issue that Dave Martin exposed.
>>>>>
>>>>> The remaining filesystem operations will be assigning or removing
>>>>> shared counter assignments in the applicable domains, which would
>>>>> normally correspond to mkdir/rmdir of groups or changing their CPU
>>>>> affinity. The shared assignments are more "program and forget", while
>>>>> the exclusive assignment approach requires updates for every counter
>>>>> (in every domain) every few seconds to cover a large number of groups.
>>>>>
>>>>> When they want to pay extra attention to a particular group, I expect
>>>>> they'll ask for exclusive counters and leave them assigned for a while
>>>>> as they collect extra data.
>>>>
>>>> The single file approach is already unwieldy. The demands that will be
>>>> placed on it to support the usages currently being discussed would make this
>>>> interface even harder to use and manage. If the single file is not required
>>>> then I think we should go back to smaller files distributed in resctrl.
>>>> This may not even be an either/or argument. One way to view mbm_assign_control
>>>> could be as a way for user to interact with the distributed counter
>>>> related files with a single file system operation. Although, without
>>>> knowing how counter configuration is expected to work this remains unclear.
>>>
>>> If we do both interfaces and the multi-file model gives us more
>>> capability to express configurations, we could find situations where
>>> there are configurations we cannot represent when reading back from
>>> mbm_assign_control, or updates through mbm_assign_control have
>>> ambiguous effects on existing configurations which were created with
>>> other files.
>>
>> Right. My assumption was that the syntax would be identical.
>>
>>>
>>> However, the example I gave above seems to be adequately represented
>>> by a minor extension to mbm_assign_control and we all seem to
>>
>> To confirm what you mean with "minor extension to mbm_assign_control",
>> is this where the flags are associated with counter configurations? At this
>> time this is done separately from mbm_assign_control with the hardcoded "t"
>> and "l" flags configured via mbm_total_bytes_config and mbm_local_bytes_config
>> respectively. I think it would be simpler to keep these configurations
>> separate from mbm_assign_control. How it would look without better
>> understanding of MPAM is not clear to me at this time, unless the
>> requirement is to enhance support for ABMC and BMEC. I do see that
>> this can be added later to build on what is supported by mbm_assign_control
>> with the syntax in this version.
> 
> As I explained above, I was looking at this from the perspective of
> the extended event assignment mode.
> 
> Thanks,
> -Peter
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 2 weeks ago
Hi Babu,

On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>
> Hi Peter,
>
> On 2/25/25 11:11, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 2/21/25 5:12 AM, Peter Newman wrote:
> >>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> >>> <reinette.chatre@intel.com> wrote:
> >>>> On 2/20/25 6:53 AM, Peter Newman wrote:
> >>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> >>>>> <reinette.chatre@intel.com> wrote:
> >>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>>>>>
> >>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>>>>>
> >>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>>>>>> <value>
> >>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>>>>>
> >>>>>>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>>>>>> is low enough to be of concern.
> >>>>>>>>>
> >>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>>>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>>>>>> investigation, I would question whether they know what they're looking
> >>>>>>>>> for.
> >>>>>>>>
> >>>>>>>> The key here is "so far" as well as the focus on MBM only.
> >>>>>>>>
> >>>>>>>> It is impossible for me to predict what we will see in a couple of years
> >>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
> >>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>>>>>> that he is working on patches [1] that will add new events and shared the idea
> >>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>>>>>> customers.
> >>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>>>>>
> >>>>>>> I was thinking of the letters as representing a reusable, user-defined
> >>>>>>> event-set for applying to a single counter rather than as individual
> >>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
> >>>>>>> one counts. Wherever we define the letters, we could use more symbolic
> >>>>>>> event names.
> >>>>>>
> >>>>>> Thank you for clarifying.
> >>>>>>
> >>>>>>>
> >>>>>>> In the letters as events model, choosing the events assigned to a
> >>>>>>> group wouldn't be enough information, since we would want to control
> >>>>>>> which events should share a counter and which should be counted by
> >>>>>>> separate counters. I think the amount of information that would need
> >>>>>>> to be encoded into mbm_assign_control to represent the level of
> >>>>>>> configurability supported by hardware would quickly get out of hand.
> >>>>>>>
> >>>>>>> Maybe as an example, one counter for all reads, one counter for all
> >>>>>>> writes in ABMC would look like...
> >>>>>>>
> >>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>>>>>
> >>>>>>> (per domain)
> >>>>>>> group 0:
> >>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> group 1:
> >>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> ...
> >>>>>>>
> >>>>>>
> >>>>>> I think this may also be what Dave was heading towards in [2] but in that
> >>>>>> example and above the counter configuration appears to be global. You do mention
> >>>>>> "configurability supported by hardware" so I wonder if per-domain counter
> >>>>>> configuration is a requirement?
> >>>>>
> >>>>> If it's global and we want a particular group to be watched by more
> >>>>> counters, I wouldn't want this to result in allocating more counters
> >>>>> for that group in all domains, or allocating counters in domains where
> >>>>> they're not needed. I want to encourage my users to avoid allocating
> >>>>> monitoring resources in domains where a job is not allowed to run so
> >>>>> there's less pressure on the counters.
> >>>>>
> >>>>> In Dave's proposal it looks like global configuration means
> >>>>> globally-defined "named counter configurations", which works because
> >>>>> it's really per-domain assignment of the configurations to however
> >>>>> many counters the group needs in each domain.
> >>>>
> >>>> I think I am becoming lost. Would a global configuration not break your
> >>>> view of "event-set applied to a single counter"? If a counter is configured
> >>>> globally then it would not make it possible to support the full configurability
> >>>> of the hardware.
> >>>> Before I add more confusion, let me try with an example that builds on your
> >>>> earlier example copied below:
> >>>>
> >>>>>>> (per domain)
> >>>>>>> group 0:
> >>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> group 1:
> >>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>> ...
> >>>>
> >>>> Since the above states "per domain" I rewrite the example to highlight that as
> >>>> I understand it:
> >>>>
> >>>> group 0:
> >>>>  domain 0:
> >>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>  domain 1:
> >>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>> group 1:
> >>>>  domain 0:
> >>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>  domain 1:
> >>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>
> >>>> You mention that you do not want counters to be allocated in domains that they
> >>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> >>>> in domain 1, resulting in:
> >>>>
> >>>> group 0:
> >>>>  domain 0:
> >>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>> group 1:
> >>>>  domain 0:
> >>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>  domain 1:
> >>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>
> >>>> With counter 0 and counter 1 available in domain 1, these counters could
> >>>> theoretically be configured to give group 1 more data in domain 1:
> >>>>
> >>>> group 0:
> >>>>  domain 0:
> >>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>> group 1:
> >>>>  domain 0:
> >>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>  domain 1:
> >>>>   counter 0: LclFill,RmtFill
> >>>>   counter 1: LclNTWr,RmtNTWr
> >>>>   counter 2: LclSlowFill,RmtSlowFill
> >>>>   counter 3: VictimBW
> >>>>
> >>>> The counters are shown with different per-domain configurations that seems to
> >>>> match with earlier goals of (a) choose events counted by each counter and
> >>>> (b) do not allocate counters in domains where they are not needed. As I
> >>>> understand the above does contradict global counter configuration though.
> >>>> Or do you mean that only the *name* of the counter is global and then
> >>>> that it is reconfigured as part of every assignment?
> >>>
> >>> Yes, I meant only the *name* is global. I assume based on a particular
> >>> system configuration, the user will settle on a handful of useful
> >>> groupings to count.
> >>>
> >>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
> >>>
> >>>  # define global configurations (in ABMC terms), not necessarily in this
> >>>  # syntax and probably not in the mbm_assign_control file.
> >>>
> >>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>  w=VictimBW,LclNTWr,RmtNTWr
> >>>
> >>>  # legacy "total" configuration, effectively r+w
> >>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>
> >>>  /group0/0=t;1=t
> >>>  /group1/0=t;1=t
> >>>  /group2/0=_;1=t
> >>>  /group3/0=rw;1=_
> >>>
> >>> - group2 is restricted to domain 0
> >>> - group3 is restricted to domain 1
> >>> - the rest are unrestricted
> >>> - In group3, we decided we need to separate read and write traffic
> >>>
> >>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
> >>>
> >>
> >> I see. Thank you for the example.
> >>
> >> resctrl supports per-domain configurations with the following possible when
> >> using mbm_total_bytes_config and mbm_local_bytes_config:
> >>
> >> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
> >>
> >>    /group0/0=t;1=t
> >>    /group1/0=t;1=t
> >>
> >> Even though the flags are identical in all domains, the assigned counters will
> >> be configured differently in each domain.
> >>
> >> With this supported by hardware and currently also supported by resctrl it seems
> >> reasonable to carry this forward to what will be supported next.
> >
> > The hardware supports both a per-domain mode, where all groups in a
> > domain use the same configurations and are limited to two events per
> > group and a per-group mode where every group can be configured and
> > assigned freely. This series is using the legacy counter access mode
> > where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> > in the domain can be read. If we chose to read the assigned counter
> > directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> > rather than asking the hardware to find the counter by RMID, we would
> > not be limited to 2 counters per group/domain and the hardware would
> > have the same flexibility as on MPAM.
>
> In extended mode, the contents of a specific counter can be read by
> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
> QM_CTR will then return the contents of the specified counter.
>
> It is documented below.
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>
> We previously discussed this with you (off the public list) and I
> initially proposed the extended assignment mode.
>
> Yes, the extended mode allows greater flexibility by enabling multiple
> counters to be assigned to the same group, rather than being limited to
> just two.
>
> However, the challenge is that we currently lack the necessary interfaces
> to configure multiple events per group. Without these interfaces, the
> extended mode is not practical at this time.
>
> Therefore, we ultimately agreed to use the legacy mode, as it does not
> require modifications to the existing interface, allowing us to continue
> using it as is.
>
> >
> > (I might have said something confusing in my last messages because I
> > had forgotten that I switched to the extended assignment mode when
> > prototyping with soft-ABMC and MPAM.)
> >
> > Forcing all groups on a domain to share the same 2 counter
> > configurations would not be acceptable for us, as the example I gave
> > earlier is one I've already been asked about.
>
> I don’t see this as a blocker. It should be considered an extension to the
> current ABMC series. We can easily build on top of this series once we
> finalize how to configure the multiple event interface for each group.

I don't think it is, either. Only being able to use ABMC to assign
counters is fine for our use as an incremental step. My longer-term
concern is the domain-scoped mbm_total_bytes_config and
mbm_local_bytes_config files, but they were introduced with BMEC, so
there's already an expectation that the files are present when BMEC is
supported.

On ABMC hardware that also supports BMEC, I'm concerned about enabling
ABMC when only the BMEC-style event configuration interface exists.
The scope of my issue is just whether enabling "full" ABMC support
will require an additional opt-in, since that could remove the BMEC
interface. If it does, it's something we can live with.

-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 2 weeks ago
Hi Peter/Reinette,

On 2/26/25 07:27, Peter Newman wrote:
> Hi Babu,
> 
> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 2/25/25 11:11, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>
>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>
>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>> for.
>>>>>>>>>>
>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>
>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>> customers.
>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>
>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>> event names.
>>>>>>>>
>>>>>>>> Thank you for clarifying.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>
>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>
>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>
>>>>>>>>> (per domain)
>>>>>>>>> group 0:
>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> group 1:
>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>> configuration is a requirement?
>>>>>>>
>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>> there's less pressure on the counters.
>>>>>>>
>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>> many counters the group needs in each domain.
>>>>>>
>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>> globally then it would not make it possible to support the full configurability
>>>>>> of the hardware.
>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>> earlier example copied below:
>>>>>>
>>>>>>>>> (per domain)
>>>>>>>>> group 0:
>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> group 1:
>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> ...
>>>>>>
>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>> I understand it:
>>>>>>
>>>>>> group 0:
>>>>>>  domain 0:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>> group 1:
>>>>>>  domain 0:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>> in domain 1, resulting in:
>>>>>>
>>>>>> group 0:
>>>>>>  domain 0:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>> group 1:
>>>>>>  domain 0:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>
>>>>>> group 0:
>>>>>>  domain 0:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>> group 1:
>>>>>>  domain 0:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 0: LclFill,RmtFill
>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW
>>>>>>
>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>> understand the above does contradict global counter configuration though.
>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>> that it is reconfigured as part of every assignment?
>>>>>
>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>> system configuration, the user will settle on a handful of useful
>>>>> groupings to count.
>>>>>
>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>
>>>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>>>  # syntax and probably not in the mbm_assign_control file.
>>>>>
>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>>  # legacy "total" configuration, effectively r+w
>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>>  /group0/0=t;1=t
>>>>>  /group1/0=t;1=t
>>>>>  /group2/0=_;1=t
>>>>>  /group3/0=rw;1=_
>>>>>
>>>>> - group2 is restricted to domain 0
>>>>> - group3 is restricted to domain 1
>>>>> - the rest are unrestricted
>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>
>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>
>>>>
>>>> I see. Thank you for the example.
>>>>
>>>> resctrl supports per-domain configurations with the following possible when
>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>
>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>
>>>>    /group0/0=t;1=t
>>>>    /group1/0=t;1=t
>>>>
>>>> Even though the flags are identical in all domains, the assigned counters will
>>>> be configured differently in each domain.
>>>>
>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>> reasonable to carry this forward to what will be supported next.
>>>
>>> The hardware supports both a per-domain mode, where all groups in a
>>> domain use the same configurations and are limited to two events per
>>> group and a per-group mode where every group can be configured and
>>> assigned freely. This series is using the legacy counter access mode
>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>> in the domain can be read. If we chose to read the assigned counter
>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>> rather than asking the hardware to find the counter by RMID, we would
>>> not be limited to 2 counters per group/domain and the hardware would
>>> have the same flexibility as on MPAM.
>>
>> In extended mode, the contents of a specific counter can be read by
>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>> QM_CTR will then return the contents of the specified counter.
>>
>> It is documented below.
>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>
>> We previously discussed this with you (off the public list) and I
>> initially proposed the extended assignment mode.
>>
>> Yes, the extended mode allows greater flexibility by enabling multiple
>> counters to be assigned to the same group, rather than being limited to
>> just two.
>>
>> However, the challenge is that we currently lack the necessary interfaces
>> to configure multiple events per group. Without these interfaces, the
>> extended mode is not practical at this time.
>>
>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>> require modifications to the existing interface, allowing us to continue
>> using it as is.
>>
>>>
>>> (I might have said something confusing in my last messages because I
>>> had forgotten that I switched to the extended assignment mode when
>>> prototyping with soft-ABMC and MPAM.)
>>>
>>> Forcing all groups on a domain to share the same 2 counter
>>> configurations would not be acceptable for us, as the example I gave
>>> earlier is one I've already been asked about.
>>
>> I don’t see this as a blocker. It should be considered an extension to the
>> current ABMC series. We can easily build on top of this series once we
>> finalize how to configure the multiple event interface for each group.
> 
> I don't think it is, either. Only being able to use ABMC to assign
> counters is fine for our use as an incremental step. My longer-term
> concern is the domain-scoped mbm_total_bytes_config and
> mbm_local_bytes_config files, but they were introduced with BMEC, so
> there's already an expectation that the files are present when BMEC is
> supported.
> 
> On ABMC hardware that also supports BMEC, I'm concerned about enabling
> ABMC when only the BMEC-style event configuration interface exists.
> The scope of my issue is just whether enabling "full" ABMC support
> will require an additional opt-in, since that could remove the BMEC
> interface. If it does, it's something we can live with.

As you know, this series is currently blocked without further feedback.

I’d like to begin reworking these patches to incorporate Peter’s feedback.
Any input or suggestions would be appreciated.

Here’s what we’ve learned so far:

1. Assignments should be independent of BMEC.
2. We should be able to specify multiple event types for a counter (e.g.,
read, write, VictimBW, etc.). This is also called a shared counter.
3. There should be an option to assign events per domain.
4. Currently, only two counters can be assigned per group, but the design
should allow flexibility to assign more in the future as the interface
evolves.
5. Utilize the extended RMID read mode.


Here is my proposal using Peter's earlier example:

# define event configurations

====    ===========     ========================================================
Bits    Mnemonics       Description
====    ===========     ========================================================
6       VictimBW        Dirty Victims from all types of memory
5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
4       LclSlowFill     Reads to slow memory in the local NUMA domain
3       RmtNTWr         Non-temporal writes to non-local NUMA domain
2       LclNTWr         Non-temporal writes to local NUMA domain
1       RmtFill         Reads to memory in the non-local NUMA domain
0       LclFill         Reads to memory in the local NUMA domain
====    ===========     ========================================================

# Define flags based on combinations of the above event types.

t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
l = LclFill,LclNTWr,LclSlowFill
r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
w = VictimBW,LclNTWr,RmtNTWr
v = VictimBW
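
As a minimal sketch of how these flags could map to event bitmasks, the
snippet below combines the bit positions from the event table with the flag
definitions. The dictionary and function names are hypothetical and exist
only for illustration; nothing here is part of the series.

```python
# Bit positions as listed in the event table (bit 1 is RmtFill).
BW_TYPE_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag letters as defined in the proposal; this dict is illustrative only.
FLAGS = {
    "t": "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    "l": "LclFill,LclNTWr,LclSlowFill",
    "r": "LclFill,RmtFill,LclSlowFill,RmtSlowFill",
    "w": "VictimBW,LclNTWr,RmtNTWr",
    "v": "VictimBW",
}


def flag_to_mask(flag):
    """Return the event bitmask for one flag letter (hypothetical helper)."""
    mask = 0
    for name in FLAGS[flag].split(","):
        mask |= 1 << BW_TYPE_BITS[name.strip()]
    return mask


if __name__ == "__main__":
    for flag in FLAGS:
        print(f"{flag} = {flag_to_mask(flag):#04x}")
```

With these definitions, 'r' and 'w' cover disjoint bits and their union
equals 't', which matches the earlier description of 't' as effectively r+w.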

Peter suggested the following format earlier:

/group0/0=t;1=t
/group1/0=t;1=t
/group2/0=_;1=t
/group3/0=rw;1=_

Interpretation:
/group0/0=t;1=t  : Assign a counter with event configuration 't' to domains
0 and 1 of the resctrl group group0.
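
To make the counter cost of such assignments concrete, here is a rough
sketch that tallies counters per domain for the four lines in Peter's
format. It assumes that each flag letter in a domain consumes one counter,
that '_' means no assignment, and that each line has the layout
<CTRL_MON group>/<MON group>/<assignments>. The function name is
hypothetical.

```python
from collections import Counter


def counters_per_domain(lines):
    """Count counters used per domain, assuming one counter per flag letter."""
    usage = Counter()
    for line in lines:
        # "/group3/0=rw;1=_" -> ctrl group "", mon group "group3", "0=rw;1=_"
        _ctrl_group, _mon_group, assignments = line.strip().split("/", 2)
        for item in assignments.split(";"):
            domain, flags = item.split("=")
            # '_' marks "no counter assigned" in this domain.
            usage[int(domain)] += sum(1 for f in flags if f != "_")
    return dict(usage)


if __name__ == "__main__":
    example = [
        "/group0/0=t;1=t",
        "/group1/0=t;1=t",
        "/group2/0=_;1=t",
        "/group3/0=rw;1=_",
    ]
    print(counters_per_domain(example))
```

For the example above this gives 4 counters in domain 0 and 3 counters in
domain 1, the totals Peter quoted.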

This format does not indicate which index should be used for the
assignment. Based on the index, we can read the events from either
mbm_total_bytes or mbm_local_bytes.

Currently, we can assign two counters to a group and events can be read
from mon_data/mon_L3_00/mbm_total_bytes (index 0) and
mon_data/mon_L3_00/mbm_local_bytes (index 1).

To address this, we need to include the index in some form. One approach
is to incorporate this information into the group's name.

Like below:

/group0:0/0=t;1=t
/group0:1/0=l;1=l
/group1:0/0=t;1=t
/group2:1/0=_;1=t
/group3:0/0=rw;1=_


Interpretation:
/group0:0/0=t;1=t : Assign a counter with event configuration 't' to
domains 0 and 1 of the resctrl group group0 and use index 0. The events can
be read from group0/mon_data/mon_L3_00/mbm_total_bytes and
group0/mon_data/mon_L3_01/mbm_total_bytes.


/group0:1/0=l;1=l : Assign a counter with event configuration 'l' to
domains 0 and 1 of the resctrl group group0 and use index 1. The events can
be read from group0/mon_data/mon_L3_00/mbm_local_bytes and
group0/mon_data/mon_L3_01/mbm_local_bytes.
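
The mapping from an indexed assignment line to the files it would populate
can be sketched as follows. This is only an illustration of the proposed
"group:index" naming, assuming index 0 maps to mbm_total_bytes, index 1 to
mbm_local_bytes, and domain N to mon_L3_NN; the helper name and the lookup
table are hypothetical.

```python
# Assumed mapping from the proposed index to the existing event files.
INDEX_TO_FILE = {0: "mbm_total_bytes", 1: "mbm_local_bytes"}


def read_paths(line):
    """Return the mon_data files populated by one 'group:index' line."""
    _ctrl_group, mon_group, assignments = line.strip().split("/", 2)
    # "group0:1" -> name "group0", index "1"
    name, index = mon_group.rsplit(":", 1)
    event_file = INDEX_TO_FILE[int(index)]
    paths = []
    for item in assignments.split(";"):
        domain, flags = item.split("=")
        if flags == "_":
            continue  # no counter assigned in this domain
        paths.append(f"{name}/mon_data/mon_L3_{int(domain):02d}/{event_file}")
    return paths


if __name__ == "__main__":
    for line in ("/group0:0/0=t;1=t", "/group0:1/0=l;1=l", "/group2:1/0=_;1=t"):
        print(line, "->", read_paths(line))
```

Encoding the index in the group name keeps the per-domain part of the
syntax unchanged, at the cost of reserving ':' in that field.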


What are your thoughts?
-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 1 week ago
On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>
> Hi Peter/Reinette,
>
> On 2/26/25 07:27, Peter Newman wrote:
> > Hi Babu,
> >
> > On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 2/25/25 11:11, Peter Newman wrote:
> >>> Hi Reinette,
> >>>
> >>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> >>> <reinette.chatre@intel.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> On 2/21/25 5:12 AM, Peter Newman wrote:
> >>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> >>>>> <reinette.chatre@intel.com> wrote:
> >>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
> >>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> >>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>>>>>>>> <value>
> >>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>>>>>>>
> >>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>>>>>>>> is low enough to be of concern.
> >>>>>>>>>>>
> >>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>>>>>>>> investigation, I would question whether they know what they're looking
> >>>>>>>>>>> for.
> >>>>>>>>>>
> >>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
> >>>>>>>>>>
> >>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
> >>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
> >>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
> >>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>>>>>>>> customers.
> >>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>>>>>>>
> >>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
> >>>>>>>>> event-set for applying to a single counter rather than as individual
> >>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
> >>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
> >>>>>>>>> event names.
> >>>>>>>>
> >>>>>>>> Thank you for clarifying.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In the letters as events model, choosing the events assigned to a
> >>>>>>>>> group wouldn't be enough information, since we would want to control
> >>>>>>>>> which events should share a counter and which should be counted by
> >>>>>>>>> separate counters. I think the amount of information that would need
> >>>>>>>>> to be encoded into mbm_assign_control to represent the level of
> >>>>>>>>> configurability supported by hardware would quickly get out of hand.
> >>>>>>>>>
> >>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
> >>>>>>>>> writes in ABMC would look like...
> >>>>>>>>>
> >>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>>>>>>>
> >>>>>>>>> (per domain)
> >>>>>>>>> group 0:
> >>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>> group 1:
> >>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>> ...
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
> >>>>>>>> example and above the counter configuration appears to be global. You do mention
> >>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
> >>>>>>>> configuration is a requirement?
> >>>>>>>
> >>>>>>> If it's global and we want a particular group to be watched by more
> >>>>>>> counters, I wouldn't want this to result in allocating more counters
> >>>>>>> for that group in all domains, or allocating counters in domains where
> >>>>>>> they're not needed. I want to encourage my users to avoid allocating
> >>>>>>> monitoring resources in domains where a job is not allowed to run so
> >>>>>>> there's less pressure on the counters.
> >>>>>>>
> >>>>>>> In Dave's proposal it looks like global configuration means
> >>>>>>> globally-defined "named counter configurations", which works because
> >>>>>>> it's really per-domain assignment of the configurations to however
> >>>>>>> many counters the group needs in each domain.
> >>>>>>
> >>>>>> I think I am becoming lost. Would a global configuration not break your
> >>>>>> view of "event-set applied to a single counter"? If a counter is configured
> >>>>>> globally then it would not make it possible to support the full configurability
> >>>>>> of the hardware.
> >>>>>> Before I add more confusion, let me try with an example that builds on your
> >>>>>> earlier example copied below:
> >>>>>>
> >>>>>>>>> (per domain)
> >>>>>>>>> group 0:
> >>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>> group 1:
> >>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>> ...
> >>>>>>
> >>>>>> Since the above states "per domain" I rewrite the example to highlight that as
> >>>>>> I understand it:
> >>>>>>
> >>>>>> group 0:
> >>>>>>  domain 0:
> >>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>  domain 1:
> >>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>> group 1:
> >>>>>>  domain 0:
> >>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>  domain 1:
> >>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>
> >>>>>> You mention that you do not want counters to be allocated in domains that they
> >>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> >>>>>> in domain 1, resulting in:
> >>>>>>
> >>>>>> group 0:
> >>>>>>  domain 0:
> >>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>> group 1:
> >>>>>>  domain 0:
> >>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>  domain 1:
> >>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>
> >>>>>> With counter 0 and counter 1 available in domain 1, these counters could
> >>>>>> theoretically be configured to give group 1 more data in domain 1:
> >>>>>>
> >>>>>> group 0:
> >>>>>>  domain 0:
> >>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>> group 1:
> >>>>>>  domain 0:
> >>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>  domain 1:
> >>>>>>   counter 0: LclFill,RmtFill
> >>>>>>   counter 1: LclNTWr,RmtNTWr
> >>>>>>   counter 2: LclSlowFill,RmtSlowFill
> >>>>>>   counter 3: VictimBW
> >>>>>>
> >>>>>> The counters are shown with different per-domain configurations that seems to
> >>>>>> match with earlier goals of (a) choose events counted by each counter and
> >>>>>> (b) do not allocate counters in domains where they are not needed. As I
> >>>>>> understand the above does contradict global counter configuration though.
> >>>>>> Or do you mean that only the *name* of the counter is global and then
> >>>>>> that it is reconfigured as part of every assignment?
> >>>>>
> >>>>> Yes, I meant only the *name* is global. I assume based on a particular
> >>>>> system configuration, the user will settle on a handful of useful
> >>>>> groupings to count.
> >>>>>
> >>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
> >>>>>
> >>>>>  # define global configurations (in ABMC terms), not necessarily in this
> >>>>>  # syntax and probably not in the mbm_assign_control file.
> >>>>>
> >>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>  w=VictimBW,LclNTWr,RmtNTWr
> >>>>>
> >>>>>  # legacy "total" configuration, effectively r+w
> >>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>
> >>>>>  /group0/0=t;1=t
> >>>>>  /group1/0=t;1=t
> >>>>>  /group2/0=_;1=t
> >>>>>  /group3/0=rw;1=_
> >>>>>
> >>>>> - group2 is restricted to domain 0
> >>>>> - group3 is restricted to domain 1
> >>>>> - the rest are unrestricted
> >>>>> - In group3, we decided we need to separate read and write traffic
> >>>>>
> >>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
> >>>>>
> >>>>
> >>>> I see. Thank you for the example.
> >>>>
> >>>> resctrl supports per-domain configurations with the following possible when
> >>>> using mbm_total_bytes_config and mbm_local_bytes_config:
> >>>>
> >>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
> >>>>
> >>>>    /group0/0=t;1=t
> >>>>    /group1/0=t;1=t
> >>>>
> >>>> Even though the flags are identical in all domains, the assigned counters will
> >>>> be configured differently in each domain.
> >>>>
> >>>> With this supported by hardware and currently also supported by resctrl it seems
> >>>> reasonable to carry this forward to what will be supported next.
> >>>
> >>> The hardware supports both a per-domain mode, where all groups in a
> >>> domain use the same configurations and are limited to two events per
> >>> group and a per-group mode where every group can be configured and
> >>> assigned freely. This series is using the legacy counter access mode
> >>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> >>> in the domain can be read. If we chose to read the assigned counter
> >>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> >>> rather than asking the hardware to find the counter by RMID, we would
> >>> not be limited to 2 counters per group/domain and the hardware would
> >>> have the same flexibility as on MPAM.
> >>
> >> In extended mode, the contents of a specific counter can be read by
> >> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
> >> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
> >> QM_CTR will then return the contents of the specified counter.
> >>
> >> It is documented below.
> >> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
> >>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
> >>
> >> We previously discussed this with you (off the public list) and I
> >> initially proposed the extended assignment mode.
> >>
> >> Yes, the extended mode allows greater flexibility by enabling multiple
> >> counters to be assigned to the same group, rather than being limited to
> >> just two.
> >>
> >> However, the challenge is that we currently lack the necessary interfaces
> >> to configure multiple events per group. Without these interfaces, the
> >> extended mode is not practical at this time.
> >>
> >> Therefore, we ultimately agreed to use the legacy mode, as it does not
> >> require modifications to the existing interface, allowing us to continue
> >> using it as is.
> >>
> >>>
> >>> (I might have said something confusing in my last messages because I
> >>> had forgotten that I switched to the extended assignment mode when
> >>> prototyping with soft-ABMC and MPAM.)
> >>>
> >>> Forcing all groups on a domain to share the same 2 counter
> >>> configurations would not be acceptable for us, as the example I gave
> >>> earlier is one I've already been asked about.
> >>
> >> I don’t see this as a blocker. It should be considered an extension to the
> >> current ABMC series. We can easily build on top of this series once we
> >> finalize how to configure the multiple event interface for each group.
> >
> > I don't think it is, either. Only being able to use ABMC to assign
> > counters is fine for our use as an incremental step. My longer-term
> > concern is the domain-scoped mbm_total_bytes_config and
> > mbm_local_bytes_config files, but they were introduced with BMEC, so
> > there's already an expectation that the files are present when BMEC is
> > supported.
> >
> > On ABMC hardware that also supports BMEC, I'm concerned about enabling
> > ABMC when only the BMEC-style event configuration interface exists.
> > The scope of my issue is just whether enabling "full" ABMC support
> > will require an additional opt-in, since that could remove the BMEC
> > interface. If it does, it's something we can live with.
>
> As you know, this series is currently blocked without further feedback.
>
> I’d like to begin reworking these patches to incorporate Peter’s feedback.
> Any input or suggestions would be appreciated.
>
> Here’s what we’ve learned so far:
>
> 1. Assignments should be independent of BMEC.
> 2. We should be able to specify multiple event types to a counter (e.g.,
> read, write, VictimBW, etc.). This is also called a shared counter.
> 3. There should be an option to assign events per domain.
> 4. Currently, only two counters can be assigned per group, but the design
> should allow flexibility to assign more in the future as the interface
> evolves.
> 5. Utilize the extended RMID read mode.
>
>
> Here is my proposal using Peter's earlier example:
>
> # define event configurations
>
> ====    ===========     ========================================================
> Bits    Mnemonics       Description
> ====    ===========     ========================================================
> 6       VictimBW        Dirty Victims from all types of memory
> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
> 2       LclNTWr         Non-temporal writes to local NUMA domain
> 1       RmtFill         Reads to memory in the non-local NUMA domain
> 0       LclFill         Reads to memory in the local NUMA domain
> ====    ===========     ========================================================
>
> #Define flags based on combination of above event types.
>
> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> l = LclFill, LclNTWr, LclSlowFill
> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
> w = VictimBW,LclNTWr,RmtNTWr
> v = VictimBW
>
> Peter suggested the following format earlier :
>
> /group0/0=t;1=t
> /group1/0=t;1=t
> /group2/0=_;1=t
> /group3/0=rw;1=_

After some inquiries within Google, it sounds like nobody has invested
much into the current mbm_assign_control format yet, so it would be
best to drop it and distribute the configuration around the filesystem
hierarchy[1], which should allow us to produce something more flexible
and cleaner to implement.

Roughly what I had in mind:

Use mkdir in an info/<resource>_MON subdirectory to create free-form
names for the assignable configurations rather than being restricted
to single letters.  In the resulting directory, populate a file where
we can specify the set of events the config should represent. I think
we should use symbolic names for the events rather than raw BMEC field
values. Moving forward we could come up with portable names for common
events and only support the BMEC names on AMD machines for users who
want specific events and don't care about portability.

Next, put assignment-control file nodes in per-domain directories
(e.g., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
counter-configuration name into the file would then allocate a counter
in the domain, apply the named configuration, and monitor the parent
group-directory. We can also put a group/resource-scoped assign_* file
higher in the hierarchy to make it easier for users who want to
configure all domains the same for a group.

The configuration names listed in assign_* would result in files of
the same name in the appropriate mon_data domain directories from
which the count values can be read.

 # mkdir info/L3_MON/counter_configs/mbm_local_bytes
 # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
LclFill
LclNTWr
LclSlowFill

Note that we could also pre-populate info/L3_MON/counter_configs with
the expected configuration for mbm_local_bytes and mbm_total_bytes for
backwards compatibility.
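
As a rough illustration of how symbolic event names could map back to
the BMEC-style bit positions from the event table quoted earlier in
this thread, here is a minimal sketch. The helper names event_bit and
events_to_mask are made up for this example and are not part of resctrl
or of this series.

```shell
#!/bin/sh
# Sketch only: bit positions are taken from the event table quoted in
# this thread; the helper names are hypothetical.
event_bit() {
    case $1 in
        LclFill)     echo 0 ;;
        RmtFill)     echo 1 ;;
        LclNTWr)     echo 2 ;;
        RmtNTWr)     echo 3 ;;
        LclSlowFill) echo 4 ;;
        RmtSlowFill) echo 5 ;;
        VictimBW)    echo 6 ;;
        *)           return 1 ;;   # unknown event name
    esac
}

# Fold a list of event names into a single hexadecimal bitmask.
events_to_mask() {
    mask=0
    for ev in "$@"; do
        bit=$(event_bit "$ev") || return 1
        mask=$((mask | (1 << bit)))
    done
    printf '0x%x\n' "$mask"
}

local_mask=$(events_to_mask LclFill LclNTWr LclSlowFill)
total_mask=$(events_to_mask LclFill RmtFill LclNTWr RmtNTWr \
                            LclSlowFill RmtSlowFill VictimBW)
echo "mbm_local_bytes=$local_mask mbm_total_bytes=$total_mask"
```

With those bit positions this gives 0x15 for the three local events and
0x7f for all seven events.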

To manually allocate counters for "mbm_local_bytes":

 # mkdir test
 # echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
 # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
 # echo mbm_local_bytes > test/mon_data/mon_L3_02/assign_exclusive
[..]

This would result in the creation of test/mon_data/mon_L3_*/mbm_local_bytes.
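
The per-domain writes could also be wrapped in a loop. A minimal
sketch, assuming the assign_exclusive nodes proposed here exist (they
are not part of the current resctrl interface). It runs against a
scratch directory standing in for a resctrl mount, and assign_all is a
hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assign_exclusive is a node proposed in this thread, not an
# existing resctrl file. A scratch tree stands in for the resctrl mount,
# so the writes below only create ordinary files.
root=$(mktemp -d)
mkdir -p "$root/test/mon_data/mon_L3_00" \
         "$root/test/mon_data/mon_L3_01" \
         "$root/test/mon_data/mon_L3_02"

# Write one counter-configuration name into every L3 domain of a group.
assign_all() {
    group=$1
    config=$2
    for dom in "$group"/mon_data/mon_L3_*; do
        echo "$config" > "$dom/assign_exclusive" || return 1
    done
}

assign_all "$root/test" mbm_local_bytes
```

A group/resource-scoped assign_* file, as mentioned above, would make
such a loop unnecessary for users who configure all domains the same.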

For unassignment, we can just make an "unassign" node alongside
"assign_exclusive" and "assign_shared". These should provide enough
context to form resctrl_arch_config_cntr() calls.

-Peter

[1] https://lore.kernel.org/lkml/CALPaoCj1TH+GN6+dFnt5xuN406u=tB-8mj+UuMRSm5KWPJW2wg@mail.gmail.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 1 week ago
Hi Peter,

On 3/4/25 10:44, Peter Newman wrote:
> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>
>> Hi Peter/Reinette,
>>
>> On 2/26/25 07:27, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>> Hi Reinette,
>>>>>
>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>> for.
>>>>>>>>>>>>
>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>
>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>> customers.
>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>
>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>> event names.
>>>>>>>>>>
>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>
>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>
>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>
>>>>>>>>>>> (per domain)
>>>>>>>>>>> group 0:
>>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> group 1:
>>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>> configuration is a requirement?
>>>>>>>>>
>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>
>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>
>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>> of the hardware.
>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>> earlier example copied below:
>>>>>>>>
>>>>>>>>>>> (per domain)
>>>>>>>>>>> group 0:
>>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> group 1:
>>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> ...
>>>>>>>>
>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>> I understand it:
>>>>>>>>
>>>>>>>> group 0:
>>>>>>>>  domain 0:
>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>  domain 1:
>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>> group 1:
>>>>>>>>  domain 0:
>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>  domain 1:
>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>
>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>> in domain 1, resulting in:
>>>>>>>>
>>>>>>>> group 0:
>>>>>>>>  domain 0:
>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>> group 1:
>>>>>>>>  domain 0:
>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>  domain 1:
>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>
>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>
>>>>>>>> group 0:
>>>>>>>>  domain 0:
>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>> group 1:
>>>>>>>>  domain 0:
>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>  domain 1:
>>>>>>>>   counter 0: LclFill,RmtFill
>>>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>   counter 3: VictimBW
>>>>>>>>
>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>
>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>> groupings to count.
>>>>>>>
>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>
>>>>>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>  # syntax and probably not in the mbm_assign_control file.
>>>>>>>
>>>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>>  # legacy "total" configuration, effectively r+w
>>>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>>  /group0/0=t;1=t
>>>>>>>  /group1/0=t;1=t
>>>>>>>  /group2/0=_;1=t
>>>>>>>  /group3/0=rw;1=_
>>>>>>>
>>>>>>> - group2 is restricted to domain 0
>>>>>>> - group3 is restricted to domain 1
>>>>>>> - the rest are unrestricted
>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>
>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>
>>>>>>
>>>>>> I see. Thank you for the example.
>>>>>>
>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>
>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>>    /group0/0=t;1=t
>>>>>>    /group1/0=t;1=t
>>>>>>
>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>> be configured differently in each domain.
>>>>>>
>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>
>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>> domain use the same configurations and are limited to two events per
>>>>> group and a per-group mode where every group can be configured and
>>>>> assigned freely. This series is using the legacy counter access mode
>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>> have the same flexibility as on MPAM.
>>>>
>>>> In extended mode, the contents of a specific counter can be read by
>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>> QM_CTR will then return the contents of the specified counter.
>>>>
>>>> It is documented below.
>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>
>>>> We previously discussed this with you (off the public list) and I
>>>> initially proposed the extended assignment mode.
>>>>
>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>> counters to be assigned to the same group, rather than being limited to
>>>> just two.
>>>>
>>>> However, the challenge is that we currently lack the necessary interfaces
>>>> to configure multiple events per group. Without these interfaces, the
>>>> extended mode is not practical at this time.
>>>>
>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>> require modifications to the existing interface, allowing us to continue
>>>> using it as is.
>>>>
>>>>>
>>>>> (I might have said something confusing in my last messages because I
>>>>> had forgotten that I switched to the extended assignment mode when
>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>
>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>> configurations would not be acceptable for us, as the example I gave
>>>>> earlier is one I've already been asked about.
>>>>
>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>> current ABMC series. We can easily build on top of this series once we
>>>> finalize how to configure the multiple event interface for each group.
>>>
>>> I don't think it is, either. Only being able to use ABMC to assign
>>> counters is fine for our use as an incremental step. My longer-term
>>> concern is the domain-scoped mbm_total_bytes_config and
>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>> there's already an expectation that the files are present when BMEC is
>>> supported.
>>>
>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>> ABMC when only the BMEC-style event configuration interface exists.
>>> The scope of my issue is just whether enabling "full" ABMC support
>>> will require an additional opt-in, since that could remove the BMEC
>>> interface. If it does, it's something we can live with.
>>
>> As you know, this series is currently blocked without further feedback.
>>
>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>> Any input or suggestions would be appreciated.
>>
>> Here’s what we’ve learned so far:
>>
>> 1. Assignments should be independent of BMEC.
>> 2. We should be able to specify multiple event types to a counter (e.g.,
>> read, write, VictimBW, etc.). This is also called a shared counter.
>> 3. There should be an option to assign events per domain.
>> 4. Currently, only two counters can be assigned per group, but the design
>> should allow flexibility to assign more in the future as the interface
>> evolves.
>> 5. Utilize the extended RMID read mode.
>>
>>
>> Here is my proposal using Peter's earlier example:
>>
>> # define event configurations
>>
>> ====    ===========     ========================================================
>> Bits    Mnemonics       Description
>> ====    ===========     ========================================================
>> 6       VictimBW        Dirty Victims from all types of memory
>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>> 0       LclFill         Reads to memory in the local NUMA domain
>> ====    ===========     ========================================================
>>
>> #Define flags based on combination of above event types.
>>
>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>> l = LclFill, LclNTWr, LclSlowFill
>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>> w = VictimBW,LclNTWr,RmtNTWr
>> v = VictimBW
>>
>> Peter suggested the following format earlier :
>>
>> /group0/0=t;1=t
>> /group1/0=t;1=t
>> /group2/0=_;1=t
>> /group3/0=rw;1=_
> 
> After some inquiries within Google, it sounds like nobody has invested
> much into the current mbm_assign_control format yet, so it would be
> best to drop it and distribute the configuration around the filesystem
> hierarchy[1], which should allow us to produce something more flexible
> and cleaner to implement.
> 
> Roughly what I had in mind:
> 
> Use mkdir in an info/<resource>_MON subdirectory to create free-form
> names for the assignable configurations rather than being restricted
> to single letters.  In the resulting directory, populate a file where
> we can specify the set of events the config should represent. I think
> we should use symbolic names for the events rather than raw BMEC field
> values. Moving forward we could come up with portable names for common
> events and only support the BMEC names on AMD machines for users who
> want specific events and don't care about portability.


I’m still processing this. Let me start with some initial questions.

So, we are creating event configurations here, which seems reasonable.

Yes, we should use portable names and not be limited to BMEC names.

How many configurations should we allow? Do we know?

> 
> Next, put assignment-control file nodes in per-domain directories
> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
> counter-configuration name into the file would then allocate a counter
> in the domain, apply the named configuration, and monitor the parent
> group-directory. We can also put a group/resource-scoped assign_* file
> higher in the hierarchy to make it easier for users who want to
> configure all domains the same for a group.

What is the difference between shared and exclusive?

Having three files (assign_shared, assign_exclusive, and unassign) for each
domain of every group seems excessive. In a system with 32 groups and
12 domains, this results in 32 × 12 × 3 = 1152 files, which is quite large.
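
The file count above can be checked with a small sketch. The function name
and its default of three files per domain are illustrative assumptions, not
part of any proposed interface:

```python
def assignment_file_count(groups: int, domains: int, files_per_domain: int = 3) -> int:
    """Count per-domain control files across all monitor groups.

    files_per_domain defaults to 3 for the assign_shared, assign_exclusive
    and unassign nodes discussed in this thread (an assumption for this sketch).
    """
    return groups * domains * files_per_domain


# 32 groups, 12 domains, 3 files in each domain directory.
total_files = assignment_file_count(32, 12)
print(total_files)
```

The count grows linearly in both the number of groups and the number of
domains, which is why a group-scoped or resource-scoped file was also
suggested.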

There should be a more efficient way to handle this.

Initially, we started with a group-level file for this interface, but it
was rejected due to the high number of sysfs calls, making it inefficient.

Additionally, how can we list all assignments with a single sysfs call?

That is another problem we need to address.


> 
> The configuration names listed in assign_* would result in files of
> the same name in the appropriate mon_data domain directories from
> which the count values can be read.
> 
>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> LclFill
> LclNTWr
> LclSlowFill

I feel we can just have the config files. A separate event_filter file is
not required.

# cat info/L3_MON/counter_configs/mbm_local_bytes
LclFill      <- rename these to generic names
LclNTWr
LclSlowFill
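
As a rough sketch of how such a config file could be turned into a counter
event mask: the bit positions below come from the event table quoted earlier
in this thread (bit 1 is taken to be RmtFill). The function names and the
newline-separated file format are assumptions for illustration only, not an
existing resctrl interface.

```python
# Bit positions from the event table quoted earlier in the thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}


def events_to_mask(names):
    """OR together the bits for each named event; reject unknown names."""
    mask = 0
    for name in names:
        if name not in EVENT_BITS:
            raise ValueError(f"unknown event name: {name}")
        mask |= 1 << EVENT_BITS[name]
    return mask


def parse_config(text):
    """Parse a hypothetical config file body: one event name per line."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return events_to_mask(names)


# The mbm_local_bytes example: LclFill (bit 0), LclNTWr (bit 2), LclSlowFill (bit 4).
local_mask = parse_config("LclFill\nLclNTWr\nLclSlowFill\n")
print(hex(local_mask))
```

Portable names could be layered on top of this as aliases that expand to one
or more of the vendor-specific names before the mask is built.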


> 
> Note that we could also pre-populate info/L3_MON/counter_configs with
> the expected configuration for mbm_local_bytes and mbm_total_bytes for
> backwards compatibility.
> 
> To manually allocate counters for "mbm_local_bytes":
> 
>  # mkdir test
>  # echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>  # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>  # echo mbm_local_bytes > test/mon_data/mon_L3_02/assign_exclusive
> [..]
> 
> Which would result in the creation of test/mon_data/mon_L3_*/mbm_local_bytes
> 
> For unassignment, we can just make an "unassign" node alongside
> "assign_exclusive" and "assign_shared". These should provide enough
> context to form resctrl_arch_config_cntr() calls.
> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/CALPaoCj1TH+GN6+dFnt5xuN406u=tB-8mj+UuMRSm5KWPJW2wg@mail.gmail.com/
> 

Let's keep discussing.
-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months, 1 week ago
Hi Babu,

On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>
> Hi Peter,
>
> On 3/4/25 10:44, Peter Newman wrote:
> > On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
> >>
> >> Hi Peter/Reinette,
> >>
> >> On 2/26/25 07:27, Peter Newman wrote:
> >>> Hi Babu,
> >>>
> >>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> On 2/25/25 11:11, Peter Newman wrote:
> >>>>> Hi Reinette,
> >>>>>
> >>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> >>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
> >>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> >>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
> >>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> >>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>>>>>>>>>> <value>
> >>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>>>>>>>>>> is low enough to be of concern.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>>>>>>>>>> investigation, I would question whether they know what they're looking
> >>>>>>>>>>>>> for.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
> >>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
> >>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
> >>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>>>>>>>>>> customers.
> >>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>>>>>>>>>
> >>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
> >>>>>>>>>>> event-set for applying to a single counter rather than as individual
> >>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
> >>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
> >>>>>>>>>>> event names.
> >>>>>>>>>>
> >>>>>>>>>> Thank you for clarifying.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> In the letters as events model, choosing the events assigned to a
> >>>>>>>>>>> group wouldn't be enough information, since we would want to control
> >>>>>>>>>>> which events should share a counter and which should be counted by
> >>>>>>>>>>> separate counters. I think the amount of information that would need
> >>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
> >>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
> >>>>>>>>>>>
> >>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
> >>>>>>>>>>> writes in ABMC would look like...
> >>>>>>>>>>>
> >>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>>>>>>>>>
> >>>>>>>>>>> (per domain)
> >>>>>>>>>>> group 0:
> >>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>> group 1:
> >>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>> ...
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
> >>>>>>>>>> example and above the counter configuration appears to be global. You do mention
> >>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
> >>>>>>>>>> configuration is a requirement?
> >>>>>>>>>
> >>>>>>>>> If it's global and we want a particular group to be watched by more
> >>>>>>>>> counters, I wouldn't want this to result in allocating more counters
> >>>>>>>>> for that group in all domains, or allocating counters in domains where
> >>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
> >>>>>>>>> monitoring resources in domains where a job is not allowed to run so
> >>>>>>>>> there's less pressure on the counters.
> >>>>>>>>>
> >>>>>>>>> In Dave's proposal it looks like global configuration means
> >>>>>>>>> globally-defined "named counter configurations", which works because
> >>>>>>>>> it's really per-domain assignment of the configurations to however
> >>>>>>>>> many counters the group needs in each domain.
> >>>>>>>>
> >>>>>>>> I think I am becoming lost. Would a global configuration not break your
> >>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
> >>>>>>>> globally then it would not make it possible to support the full configurability
> >>>>>>>> of the hardware.
> >>>>>>>> Before I add more confusion, let me try with an example that builds on your
> >>>>>>>> earlier example copied below:
> >>>>>>>>
> >>>>>>>>>>> (per domain)
> >>>>>>>>>>> group 0:
> >>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>> group 1:
> >>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>> ...
> >>>>>>>>
> >>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
> >>>>>>>> I understand it:
> >>>>>>>>
> >>>>>>>> group 0:
> >>>>>>>>  domain 0:
> >>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>  domain 1:
> >>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>> group 1:
> >>>>>>>>  domain 0:
> >>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>  domain 1:
> >>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>
> >>>>>>>> You mention that you do not want counters to be allocated in domains that they
> >>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> >>>>>>>> in domain 1, resulting in:
> >>>>>>>>
> >>>>>>>> group 0:
> >>>>>>>>  domain 0:
> >>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>> group 1:
> >>>>>>>>  domain 0:
> >>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>  domain 1:
> >>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>
> >>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
> >>>>>>>> theoretically be configured to give group 1 more data in domain 1:
> >>>>>>>>
> >>>>>>>> group 0:
> >>>>>>>>  domain 0:
> >>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>> group 1:
> >>>>>>>>  domain 0:
> >>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>  domain 1:
> >>>>>>>>   counter 0: LclFill,RmtFill
> >>>>>>>>   counter 1: LclNTWr,RmtNTWr
> >>>>>>>>   counter 2: LclSlowFill,RmtSlowFill
> >>>>>>>>   counter 3: VictimBW
> >>>>>>>>
> >>>>>>>> The counters are shown with different per-domain configurations that seems to
> >>>>>>>> match with earlier goals of (a) choose events counted by each counter and
> >>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
> >>>>>>>> understand the above does contradict global counter configuration though.
> >>>>>>>> Or do you mean that only the *name* of the counter is global and then
> >>>>>>>> that it is reconfigured as part of every assignment?
> >>>>>>>
> >>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
> >>>>>>> system configuration, the user will settle on a handful of useful
> >>>>>>> groupings to count.
> >>>>>>>
> >>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
> >>>>>>>
> >>>>>>>  # define global configurations (in ABMC terms), not necessarily in this
> >>>>>>>  # syntax and probably not in the mbm_assign_control file.
> >>>>>>>
> >>>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>  w=VictimBW,LclNTWr,RmtNTWr
> >>>>>>>
> >>>>>>>  # legacy "total" configuration, effectively r+w
> >>>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>>>
> >>>>>>>  /group0/0=t;1=t
> >>>>>>>  /group1/0=t;1=t
> >>>>>>>  /group2/0=_;1=t
> >>>>>>>  /group3/0=rw;1=_
> >>>>>>>
> >>>>>>> - group2 is restricted to domain 0
> >>>>>>> - group3 is restricted to domain 1
> >>>>>>> - the rest are unrestricted
> >>>>>>> - In group3, we decided we need to separate read and write traffic
> >>>>>>>
> >>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
> >>>>>>>
> >>>>>>
> >>>>>> I see. Thank you for the example.
> >>>>>>
> >>>>>> resctrl supports per-domain configurations with the following possible when
> >>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
> >>>>>>
> >>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>>
> >>>>>>    /group0/0=t;1=t
> >>>>>>    /group1/0=t;1=t
> >>>>>>
> >>>>>> Even though the flags are identical in all domains, the assigned counters will
> >>>>>> be configured differently in each domain.
> >>>>>>
> >>>>>> With this supported by hardware and currently also supported by resctrl it seems
> >>>>>> reasonable to carry this forward to what will be supported next.
> >>>>>
> >>>>> The hardware supports both a per-domain mode, where all groups in a
> >>>>> domain use the same configurations and are limited to two events per
> >>>>> group and a per-group mode where every group can be configured and
> >>>>> assigned freely. This series is using the legacy counter access mode
> >>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> >>>>> in the domain can be read. If we chose to read the assigned counter
> >>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> >>>>> rather than asking the hardware to find the counter by RMID, we would
> >>>>> not be limited to 2 counters per group/domain and the hardware would
> >>>>> have the same flexibility as on MPAM.
> >>>>
> >>>> In extended mode, the contents of a specific counter can be read by
> >>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
> >>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
> >>>> QM_CTR will then return the contents of the specified counter.
> >>>>
> >>>> It is documented below.
> >>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
> >>>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
> >>>>
> >>>> We previously discussed this with you (off the public list) and I
> >>>> initially proposed the extended assignment mode.
> >>>>
> >>>> Yes, the extended mode allows greater flexibility by enabling multiple
> >>>> counters to be assigned to the same group, rather than being limited to
> >>>> just two.
> >>>>
> >>>> However, the challenge is that we currently lack the necessary interfaces
> >>>> to configure multiple events per group. Without these interfaces, the
> >>>> extended mode is not practical at this time.
> >>>>
> >>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
> >>>> require modifications to the existing interface, allowing us to continue
> >>>> using it as is.
> >>>>
> >>>>>
> >>>>> (I might have said something confusing in my last messages because I
> >>>>> had forgotten that I switched to the extended assignment mode when
> >>>>> prototyping with soft-ABMC and MPAM.)
> >>>>>
> >>>>> Forcing all groups on a domain to share the same 2 counter
> >>>>> configurations would not be acceptable for us, as the example I gave
> >>>>> earlier is one I've already been asked about.
> >>>>
> >>>> I don’t see this as a blocker. It should be considered an extension to the
> >>>> current ABMC series. We can easily build on top of this series once we
> >>>> finalize how to configure the multiple event interface for each group.
> >>>
> >>> I don't think it is, either. Only being able to use ABMC to assign
> >>> counters is fine for our use as an incremental step. My longer-term
> >>> concern is the domain-scoped mbm_total_bytes_config and
> >>> mbm_local_bytes_config files, but they were introduced with BMEC, so
> >>> there's already an expectation that the files are present when BMEC is
> >>> supported.
> >>>
> >>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
> >>> ABMC when only the BMEC-style event configuration interface exists.
> >>> The scope of my issue is just whether enabling "full" ABMC support
> >>> will require an additional opt-in, since that could remove the BMEC
> >>> interface. If it does, it's something we can live with.
> >>
> >> As you know, this series is currently blocked without further feedback.
> >>
> >> I’d like to begin reworking these patches to incorporate Peter’s feedback.
> >> Any input or suggestions would be appreciated.
> >>
> >> Here’s what we’ve learned so far:
> >>
> >> 1. Assignments should be independent of BMEC.
> >> 2. We should be able to specify multiple event types for a counter (e.g.,
> >> read, write, VictimBW, etc.). This is also called a shared counter.
> >> 3. There should be an option to assign events per domain.
> >> 4. Currently, only two counters can be assigned per group, but the design
> >> should allow flexibility to assign more in the future as the interface
> >> evolves.
> >> 5. Utilize the extended RMID read mode.
> >>
> >>
> >> Here is my proposal using Peter's earlier example:
> >>
> >> # define event configurations
> >>
> >> ====    ===========     ==================================================
> >> Bits    Mnemonics       Description
> >> ====    ===========     ==================================================
> >> 6       VictimBW        Dirty Victims from all types of memory
> >> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
> >> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
> >> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
> >> 2       LclNTWr         Non-temporal writes to local NUMA domain
> >> 1       RmtFill         Reads to memory in the non-local NUMA domain
> >> 0       LclFill         Reads to memory in the local NUMA domain
> >> ====    ===========     ==================================================
> >>
> >> #Define flags based on combination of above event types.
> >>
> >> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >> l = LclFill, LclNTWr, LclSlowFill
> >> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >> w = VictimBW,LclNTWr,RmtNTWr
> >> v = VictimBW
> >>
> >> Peter suggested the following format earlier :
> >>
> >> /group0/0=t;1=t
> >> /group1/0=t;1=t
> >> /group2/0=_;1=t
> >> /group3/0=rw;1=_
> >
> > After some inquiries within Google, it sounds like nobody has invested
> > much into the current mbm_assign_control format yet, so it would be
> > best to drop it and distribute the configuration around the filesystem
> > hierarchy[1], which should allow us to produce something more flexible
> > and cleaner to implement.
> >
> > Roughly what I had in mind:
> >
> > Use mkdir in a info/<resource>_MON subdirectory to create free-form
> > names for the assignable configurations rather than being restricted
> > to single letters.  In the resulting directory, populate a file where
> > we can specify the set of events the config should represent. I think
> > we should use symbolic names for the events rather than raw BMEC field
> > values. Moving forward we could come up with portable names for common
> > events and only support the BMEC names on AMD machines for users who
> > want specific events and don't care about portability.
>
>
> I’m still processing this. Let me start with some initial questions.
>
> So, we are creating event configurations here, which seems reasonable.
>
> Yes, we should use portable names and are not limited to BMEC names.
>
> How many configurations should we allow? Do we know?

Do we need an upper limit?

>
> >
> > Next, put assignment-control file nodes in per-domain directories
> > (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
> > counter-configuration name into the file would then allocate a counter
> > in the domain, apply the named configuration, and monitor the parent
> > group-directory. We can also put a group/resource-scoped assign_* file
> > higher in the hierarchy to make it easier for users who want to
> > configure all domains the same for a group.
>
> What is the difference between shared and exclusive?

Shared assignment[1] means that non-exclusively-assigned counters in
each domain will be scheduled round-robin to the groups requesting
shared access to a counter. In my tests, I assigned the counters long
enough to produce a single 1-second MB/s sample for the per-domain
aggregation files[2].
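
A minimal sketch of the round-robin idea, assuming a fixed pool of
non-exclusive counters in one domain and a fixed interval per turn. The
function name, the group names and the interval model are hypothetical and
only illustrate the scheduling order, not the actual implementation:

```python
from collections import deque


def round_robin_schedule(groups, num_counters, intervals):
    """Return, for each interval, the groups that hold a shared counter.

    groups:       groups requesting shared access in one domain
    num_counters: counters in the domain that are not exclusively assigned
    intervals:    number of sampling intervals to simulate
    """
    queue = deque(groups)
    schedule = []
    for _ in range(intervals):
        n = min(num_counters, len(queue))
        # The first n groups in the queue get a counter for this interval.
        schedule.append([queue[i] for i in range(n)])
        # Move the groups that were just served to the back of the queue.
        queue.rotate(-n)
    return schedule


# Five groups sharing two counters over three intervals.
sched = round_robin_schedule(["g0", "g1", "g2", "g3", "g4"], 2, 3)
for interval, active in enumerate(sched):
    print(interval, active)
```

With one-second intervals, each group in this example would produce a sample
roughly every 2.5 seconds on average, so the sampling rate degrades gradually
as more groups request shared access.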

These do not need to be implemented immediately, but knowing that they
work addresses the overhead and scalability concerns of reassigning
counters and reading their values.

>
> Having three files—assign_shared, assign_exclusive, and unassign—for each
> domain seems excessive. In a system with 32 groups and 12 domains, this
> results in 32 × 12 × 3 files, which is quite large.
>
> There should be a more efficient way to handle this.
>
> Initially, we started with a group-level file for this interface, but it
> was rejected due to the high number of sysfs calls, making it inefficient.

I had rejected it due to the high frequency of access to a large
number of files, which has since been addressed by shared assignment
(or automatic reassignment) and aggregated mbps files.

>
> Additionally, how can we list all assignments with a single sysfs call?
>
> That was another problem we need to address.

This is not a requirement I was aware of. If the user forgot where
they assigned counters (or forgot to disable auto-assignment), they
can read multiple sysfs nodes to remind themselves.
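
As a rough illustration of reading multiple nodes, the sketch below walks a
directory tree laid out like the proposal in this thread, where an assigned
configuration shows up as a file of the same name under
<group>/mon_data/mon_L3_*/. It runs against a temporary demo tree, not a real
resctrl mount. The helper names are hypothetical, and the default group,
whose mon_data sits at the mount root, is not covered:

```python
import tempfile
from pathlib import Path


def list_assignments(root, config_names):
    """Map (group, domain) to the config names that have a count file there."""
    found = {}
    for domain_dir in sorted(Path(root).glob("*/mon_data/mon_L3_*")):
        group = domain_dir.parent.parent.name
        names = sorted(p.name for p in domain_dir.iterdir()
                       if p.name in config_names)
        if names:
            found[(group, domain_dir.name)] = names
    return found


def build_demo_tree():
    """Create a temporary tree that imitates the proposed layout."""
    root = Path(tempfile.mkdtemp())
    demo = [
        ("test", "mon_L3_00", "mbm_local_bytes"),
        ("test", "mon_L3_01", "mbm_local_bytes"),
        ("other", "mon_L3_00", "mbm_total_bytes"),
    ]
    for group, domain, name in demo:
        domain_dir = root / group / "mon_data" / domain
        domain_dir.mkdir(parents=True, exist_ok=True)
        (domain_dir / name).write_text("0\n")
    return root


demo_root = build_demo_tree()
assignments = list_assignments(demo_root, {"mbm_local_bytes", "mbm_total_bytes"})
for key, names in sorted(assignments.items()):
    print(key, names)
```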

>
>
> >
> > The configuration names listed in assign_* would result in files of
> > the same name in the appropriate mon_data domain directories from
> > which the count values can be read.
> >
> >  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
> >  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> > LclFill
> > LclNTWr
> > LclSlowFill
>
> I feel we can just have the configs. event_filter file is not required.

That's right, I forgot that we can implement kernfs_ops::open(). I was
only looking at struct kernfs_syscall_ops.

>
> #cat info/L3_MON/counter_configs/mbm_local_bytes
> LclFill <-rename these to generic names.
> LclNTWr
> LclSlowFill
>

I think portable and non-portable event names should both be available
as options. There are simple bandwidth measurement mechanisms that
will be applied in general, but when they turn up an issue, it can
often lead to a more focused investigation, requiring more precise
events.

-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 1 week ago
Hi Peter,

On 3/5/25 04:40, Peter Newman wrote:
> Hi Babu,
> 
> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 3/4/25 10:44, Peter Newman wrote:
>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>
>>>> Hi Peter/Reinette,
>>>>
>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>> Hi Reinette,
>>>>>>>
>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>> event names.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>
>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>
>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>
>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>
>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>
>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>> of the hardware.
>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>> earlier example copied below:
>>>>>>>>>>
>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>> I understand it:
>>>>>>>>>>
>>>>>>>>>> group 0:
>>>>>>>>>>  domain 0:
>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>  domain 1:
>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  domain 0:
>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>  domain 1:
>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>
>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>
>>>>>>>>>> group 0:
>>>>>>>>>>  domain 0:
>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  domain 0:
>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>  domain 1:
>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>
>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>
>>>>>>>>>> group 0:
>>>>>>>>>>  domain 0:
>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  domain 0:
>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>  domain 1:
>>>>>>>>>>   counter 0: LclFill,RmtFill
>>>>>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>   counter 3: VictimBW
>>>>>>>>>>
>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>
>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>> groupings to count.
>>>>>>>>>
>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>
>>>>>>>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>  # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>
>>>>>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>
>>>>>>>>>  # legacy "total" configuration, effectively r+w
>>>>>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>
>>>>>>>>>  /group0/0=t;1=t
>>>>>>>>>  /group1/0=t;1=t
>>>>>>>>>  /group2/0=_;1=t
>>>>>>>>>  /group3/0=rw;1=_
>>>>>>>>>
>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>> - the rest are unrestricted
>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>
>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I see. Thank you for the example.
>>>>>>>>
>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>
>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>
>>>>>>>>    /group0/0=t;1=t
>>>>>>>>    /group1/0=t;1=t
>>>>>>>>
>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>> be configured differently in each domain.
>>>>>>>>
>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>
>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>> have the same flexibility as on MPAM.
>>>>>>
>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>
>>>>>> It is documented below.
>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>
>>>>>> We previously discussed this with you (off the public list) and I
>>>>>> initially proposed the extended assignment mode.
>>>>>>
>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>> just two.
>>>>>>
>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>> extended mode is not practical at this time.
>>>>>>
>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>> using it as is.
>>>>>>
>>>>>>>
>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>
>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>> earlier is one I've already been asked about.
>>>>>>
>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>
>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>> there's already an expectation that the files are present when BMEC is
>>>>> supported.
>>>>>
>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>> interface. If it does, it's something we can live with.
>>>>
>>>> As you know, this series is currently blocked without further feedback.
>>>>
>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>> Any input or suggestions would be appreciated.
>>>>
>>>> Here’s what we’ve learned so far:
>>>>
>>>> 1. Assignments should be independent of BMEC.
>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>> 3. There should be an option to assign events per domain.
>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>> should allow flexibility to assign more in the future as the interface
>>>> evolves.
>>>> 5. Utilize the extended RMID read mode.
>>>>
>>>>
>>>> Here is my proposal using Peter's earlier example:
>>>>
>>>> # define event configurations
>>>>
>>>> ====    ===========     ====================================================
>>>> Bits    Mnemonics       Description
>>>> ====    ===========     ====================================================
>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>> ====    ===========     ====================================================
>>>>
>>>> #Define flags based on combination of above event types.
>>>>
>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>> l = LclFill, LclNTWr, LclSlowFill
>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>> v = VictimBW
>>>>
>>>> Peter suggested the following format earlier :
>>>>
>>>> /group0/0=t;1=t
>>>> /group1/0=t;1=t
>>>> /group2/0=_;1=t
>>>> /group3/0=rw;1=_
>>>
>>> After some inquiries within Google, it sounds like nobody has invested
>>> much into the current mbm_assign_control format yet, so it would be
>>> best to drop it and distribute the configuration around the filesystem
>>> hierarchy[1], which should allow us to produce something more flexible
>>> and cleaner to implement.
>>>
>>> Roughly what I had in mind:
>>>
>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>> names for the assignable configurations rather than being restricted
>>> to single letters.  In the resulting directory, populate a file where
>>> we can specify the set of events the config should represent. I think
>>> we should use symbolic names for the events rather than raw BMEC field
>>> values. Moving forward we could come up with portable names for common
>>> events and only support the BMEC names on AMD machines for users who
>>> want specific events and don't care about portability.
>>
>>
>> I’m still processing this. Let me start with some initial questions.
>>
>> So, we are creating event configurations here, which seems reasonable.
>>
>> Yes, we should use portable names and are not limited to BMEC names.
>>
>> How many configurations should we allow? Do we know?
> 
> Do we need an upper limit?

I think so. This needs to be maintained in some data structure. We can
start with 2 default configurations for now.
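
As a rough illustration of the kind of data structure meant here, the
sketch below keeps each named configuration as a set of event names and
folds it into a bitmask. It is written in Python only to keep it short; the
kernel side would be C. The bit positions follow the event table quoted
earlier in this thread. The names EVENT_BITS, DEFAULT_CONFIGS and
config_mask() are hypothetical and not part of resctrl.

```python
# Hypothetical sketch: named counter configurations kept as event-name sets.
# Bit positions are taken from the event table quoted earlier in the thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# The two default configurations suggested as a starting point.
DEFAULT_CONFIGS = {
    "mbm_total_bytes": set(EVENT_BITS),
    "mbm_local_bytes": {"LclFill", "LclNTWr", "LclSlowFill"},
}


def config_mask(events):
    """Fold a collection of event names into a single bitmask."""
    mask = 0
    for name in events:
        if name not in EVENT_BITS:
            raise ValueError(f"unknown event: {name}")
        mask |= 1 << EVENT_BITS[name]
    return mask


if __name__ == "__main__":
    for name, events in DEFAULT_CONFIGS.items():
        print(f"{name}: {config_mask(events):#x}")
```

With the table above, the "total" set folds to 0x7f and the "local" set to
0x15.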

> 
>>
>>>
>>> Next, put assignment-control file nodes in per-domain directories
>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>> counter-configuration name into the file would then allocate a counter
>>> in the domain, apply the named configuration, and monitor the parent
>>> group-directory. We can also put a group/resource-scoped assign_* file
>>> higher in the hierarchy to make it easier for users who want to
>>> configure all domains the same for a group.
>>
>> What is the difference between shared and exclusive?
> 
> Shared assignment[1] means that non-exclusively-assigned counters in
> each domain will be scheduled round-robin to the groups requesting
> shared access to a counter. In my tests, I assigned the counters long
> enough to produce a single 1-second MB/s sample for the per-domain
> aggregation files[2].
> 
> These do not need to be implemented immediately, but knowing that they
> work addresses the overhead and scalability concerns of reassigning
> counters and reading their values.

OK. Let's focus on exclusive assignments for now.

> 
>>
>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>> domain seems excessive. In a system with 32 groups and 12 domains, this
>> results in 32 × 12 × 3 files, which is quite large.
>>
>> There should be a more efficient way to handle this.
>>
>> Initially, we started with a group-level file for this interface, but it
>> was rejected due to the high number of sysfs calls, making it inefficient.
> 
> I had rejected it due to the high-frequency of access of a large
> number of files, which has since been addressed by shared assignment
> (or automatic reassignment) and aggregated mbps files.

I think we should address this as well. Creating three extra files for
each group isn’t ideal when there are more efficient alternatives.

> 
>>
>> Additionally, how can we list all assignments with a single sysfs call?
>>
>> That was another problem we need to address.
> 
> This is not a requirement I was aware of. If the user forgot where
> they assigned counters (or forgot to disable auto-assignment), they
> can read multiple sysfs nodes to remind themselves.

I suggest we provide users with an option to list the assignments of all
groups in a single command. As the number of groups increases, it becomes
cumbersome to query each group individually.

To achieve this, we can reuse our existing mbm_assign_control interface.
More details on this below.

>>
>>
>>>
>>> The configuration names listed in assign_* would result in files of
>>> the same name in the appropriate mon_data domain directories from
>>> which the count values can be read.
>>>
>>>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> LclFill
>>> LclNTWr
>>> LclSlowFill
>>
>> I feel we can just have the configs. event_filter file is not required.
> 
> That's right, I forgot that we can implement kernfs_ops::open(). I was
> only looking at struct kernfs_syscall_ops
> 
>>
>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>> LclFill <-rename these to generic names.
>> LclNTWr
>> LclSlowFill
>>
> 
> I think portable and non-portable event names should both be available
> as options. There are simple bandwidth measurement mechanisms that
> will be applied in general, but when they turn up an issue, it can
> often lead to a more focused investigation, requiring more precise
> events.

I agree. We should provide both portable and non-portable event names.

Here is my draft proposal based on the discussion so far, reusing some of
the current interface. The idea is to start with a basic assignment feature
that can be enhanced in the future. Feel free to comment or suggest changes.

1. Event configurations will be in
   /sys/fs/resctrl/info/L3_MON/counter_configs/.

   There will be two pre-defined configurations by default.

   #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
   LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill

   #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
   LclFill, LclNTWr, LclSlowFill

2. Users will have the option to update these configurations.

   #echo "LclFill, LclNTWr, RmtFill" > \
      /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes

   #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
   LclFill, LclNTWr, RmtFill

3. The default configurations will be used when the user mounts resctrl.

   mount  -t resctrl resctrl /sys/fs/resctrl/
   mkdir /sys/fs/resctrl/test/

4. The resctrl group/domains can be in one of these assignment states.
   e: Exclusive
   s: Shared
   u: Unassigned

   Exclusive mode is supported now. Shared mode will be supported in the
   future.

5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   to list the assignment state of all the groups.

   Format:
   "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"

   # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   test//mbm_total_bytes:0=e;1=e
   test//mbm_local_bytes:0=e;1=e
   //mbm_total_bytes:0=e;1=e
   //mbm_local_bytes:0=e;1=e

6. Users can modify the assignment state by writing to mbm_assign_control.

   Format:
   "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"

   #echo "test//mbm_local_bytes:0=e;1=e" > \
      /sys/fs/resctrl/info/L3_MON/mbm_assign_control

   #echo "test//mbm_local_bytes:0=u;1=u" > \
      /sys/fs/resctrl/info/L3_MON/mbm_assign_control

   # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   test//mbm_total_bytes:0=e;1=e
   test//mbm_local_bytes:0=u;1=u
   //mbm_total_bytes:0=e;1=e
   //mbm_local_bytes:0=e;1=e

   The corresponding events can then be read from:

   /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
   /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
   /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
   /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
   /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
   /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
   /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
   /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes

7. In the first stage, only two configurations (mbm_total_bytes and
   mbm_local_bytes) will be supported.

8. In the future, there will be options to create more configurations,
   and a corresponding entry will be created at
   /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
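
To make the format in items 5 and 6 concrete, here is a small user-space
sketch in Python that parses one line of the proposed mbm_assign_control
output and derives the mon_data files where the counts would be read. It
is only an illustration of the draft syntax above, not an existing resctrl
API: parse_assign_line() and counter_files() are hypothetical helpers, and
the mon_groups/ path component for MON groups is assumed from the current
resctrl layout.

```python
# Hypothetical user-space parser for the proposed mbm_assign_control format:
#   <CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>
# with ';' separating the per-domain entries.
VALID_STATES = {"e", "s", "u"}  # exclusive, shared, unassigned


def parse_assign_line(line):
    """Return (ctrl_group, mon_group, config, {domain_id: state})."""
    path, sep, domains = line.strip().partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {line!r}")
    parts = path.split("/")
    if len(parts) != 3 or not parts[2]:
        raise ValueError(f"expected <ctrl>/<mon>/<config> in {line!r}")
    ctrl_group, mon_group, config = parts
    states = {}
    for item in domains.split(";"):
        dom, eq, state = item.partition("=")
        if not eq or state not in VALID_STATES:
            raise ValueError(f"bad domain assignment {item!r}")
        states[int(dom)] = state
    return ctrl_group, mon_group, config, states


def counter_files(ctrl_group, mon_group, config, states,
                  root="/sys/fs/resctrl"):
    """List the mon_data files for every domain that has a counter assigned."""
    base = root
    if ctrl_group:
        base += f"/{ctrl_group}"
    if mon_group:
        # Assumption: MON groups live under mon_groups/ as in resctrl today.
        base += f"/mon_groups/{mon_group}"
    return [
        f"{base}/mon_data/mon_L3_{dom:02d}/{config}"
        for dom, state in sorted(states.items())
        if state != "u"
    ]


if __name__ == "__main__":
    parsed = parse_assign_line("test//mbm_local_bytes:0=e;1=u")
    for path in counter_files(*parsed):
        print(path)
```

An empty group name on both sides ("//") refers to the default group, so
the derived paths start directly under /sys/fs/resctrl/mon_data/.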

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 1 week ago
Hi All,

On 3/5/2025 1:34 PM, Moger, Babu wrote:
> Hi Peter,
> 
> On 3/5/25 04:40, Peter Newman wrote:
>> Hi Babu,
>>
>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>
>>> Hi Peter,
>>>
>>> On 3/4/25 10:44, Peter Newman wrote:
>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>
>>>>> Hi Peter/Reinette,
>>>>>
>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>> Hi Babu,
>>>>>>
>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>> Hi Reinette,
>>>>>>>>
>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>
>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>
>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>
>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>> of the hardware.
>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>
>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>> I understand it:
>>>>>>>>>>>
>>>>>>>>>>> group 0:
>>>>>>>>>>>   domain 0:
>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>   domain 1:
>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> group 1:
>>>>>>>>>>>   domain 0:
>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>   domain 1:
>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>
>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>
>>>>>>>>>>> group 0:
>>>>>>>>>>>   domain 0:
>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> group 1:
>>>>>>>>>>>   domain 0:
>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>   domain 1:
>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>
>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>
>>>>>>>>>>> group 0:
>>>>>>>>>>>   domain 0:
>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> group 1:
>>>>>>>>>>>   domain 0:
>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>   domain 1:
>>>>>>>>>>>    counter 0: LclFill,RmtFill
>>>>>>>>>>>    counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>    counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>    counter 3: VictimBW
>>>>>>>>>>>
>>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>
>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>> groupings to count.
>>>>>>>>>>
>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>
>>>>>>>>>>   # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>   # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>
>>>>>>>>>>   r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>   w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>
>>>>>>>>>>   # legacy "total" configuration, effectively r+w
>>>>>>>>>>   t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>
>>>>>>>>>>   /group0/0=t;1=t
>>>>>>>>>>   /group1/0=t;1=t
>>>>>>>>>>   /group2/0=_;1=t
>>>>>>>>>>   /group3/0=rw;1=_
>>>>>>>>>>
>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>
>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>
>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>
>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>
>>>>>>>>>     /group0/0=t;1=t
>>>>>>>>>     /group1/0=t;1=t
>>>>>>>>>
>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>> be configured differently in each domain.
>>>>>>>>>
>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>
>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>
>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>
>>>>>>> It is documented below.
>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>   Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>
>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>> initially proposed the extended assignment mode.
>>>>>>>
>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>> just two.
>>>>>>>
>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>> extended mode is not practical at this time.
>>>>>>>
>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>> using it as is.
>>>>>>>
>>>>>>>>
>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>
>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>> earlier is one I've already been asked about.
>>>>>>>
>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>
>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>> supported.
>>>>>>
>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>> interface. If it does, it's something we can live with.
>>>>>
>>>>> As you know, this series is currently blocked without further feedback.
>>>>>
>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>> Any input or suggestions would be appreciated.
>>>>>
>>>>> Here’s what we’ve learned so far:
>>>>>
>>>>> 1. Assignments should be independent of BMEC.
>>>>> 2. We should be able to assign multiple event types to a counter (e.g.,
>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>> 3. There should be an option to assign events per domain.
>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>> should allow flexibility to assign more in the future as the interface
>>>>> evolves.
>>>>> 5. Utilize the extended RMID read mode.
>>>>>
>>>>>
>>>>> Here is my proposal using Peter's earlier example:
>>>>>
>>>>> # define event configurations
>>>>>
>>>>> ====    ===========     ===================================================
>>>>> Bits    Mnemonics       Description
>>>>> ====    ===========     ===================================================
>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>> ====    ===========     ===================================================
>>>>>
>>>>> # Define flags based on combinations of the above event types.
>>>>>
>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>> v = VictimBW
>>>>>
>>>>> Peter suggested the following format earlier:
>>>>>
>>>>> /group0/0=t;1=t
>>>>> /group1/0=t;1=t
>>>>> /group2/0=_;1=t
>>>>> /group3/0=rw;1=_
>>>>
>>>> After some inquiries within Google, it sounds like nobody has invested
>>>> much into the current mbm_assign_control format yet, so it would be
>>>> best to drop it and distribute the configuration around the filesystem
>>>> hierarchy[1], which should allow us to produce something more flexible
>>>> and cleaner to implement.
>>>>
>>>> Roughly what I had in mind:
>>>>
>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>> names for the assignable configurations rather than being restricted
>>>> to single letters.  In the resulting directory, populate a file where
>>>> we can specify the set of events the config should represent. I think
>>>> we should use symbolic names for the events rather than raw BMEC field
>>>> values. Moving forward we could come up with portable names for common
>>>> events and only support the BMEC names on AMD machines for users who
>>>> want specific events and don't care about portability.
>>>
>>>
>>> I’m still processing this. Let me start with some initial questions.
>>>
>>> So, we are creating event configurations here, which seems reasonable.
>>>
>>> Yes, we should use portable names and are not limited to BMEC names.
>>>
>>> How many configurations should we allow? Do we know?
>>
>> Do we need an upper limit?
> 
> I think so. This needs to be maintained in some data structure. We can
> start with 2 default configurations for now.
> 
>>
>>>
>>>>
>>>> Next, put assignment-control file nodes in per-domain directories
>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>> counter-configuration name into the file would then allocate a counter
>>>> in the domain, apply the named configuration, and monitor the parent
>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>> higher in the hierarchy to make it easier for users who want to
>>>> configure all domains the same for a group.
>>>
>>> What is the difference between shared and exclusive?
>>
>> Shared assignment[1] means that non-exclusively-assigned counters in
>> each domain will be scheduled round-robin to the groups requesting
>> shared access to a counter. In my tests, I assigned the counters long
>> enough to produce a single 1-second MB/s sample for the per-domain
>> aggregation files[2].
>>
>> These do not need to be implemented immediately, but knowing that they
>> work addresses the overhead and scalability concerns of reassigning
>> counters and reading their values.
> 
> OK. Let's focus on exclusive assignments for now.
> 
>>
>>>
>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>> results in 32 × 12 × 3 files, which is quite large.
>>>
>>> There should be a more efficient way to handle this.
>>>
>>> Initially, we started with a group-level file for this interface, but it
>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>
>> I had rejected it due to the high-frequency of access of a large
>> number of files, which has since been addressed by shared assignment
>> (or automatic reassignment) and aggregated mbps files.
> 
> I think we should address this as well. Creating three extra files for
> each group isn’t ideal when there are more efficient alternatives.
> 
>>
>>>
>>> Additionally, how can we list all assignments with a single sysfs call?
>>>
>>> That is another problem we need to address.
>>
>> This is not a requirement I was aware of. If the user forgot where
>> they assigned counters (or forgot to disable auto-assignment), they
>> can read multiple sysfs nodes to remind themselves.
> 
> I suggest we give users an option to list the assignments of all groups
> in a single command. As the number of groups increases, it becomes
> cumbersome to query each group individually.
> 
> We can reuse our existing mbm_assign_control interface for this. More
> details on this below.
> 
>>>
>>>
>>>>
>>>> The configuration names listed in assign_* would result in files of
>>>> the same name in the appropriate mon_data domain directories from
>>>> which the count values can be read.
>>>>
>>>>   # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>   # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>   # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>   # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>   # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>> LclFill
>>>> LclNTWr
>>>> LclSlowFill
>>>
>>> I feel we can just have the configs. event_filter file is not required.
>>
>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>> only looking at struct kernfs_syscall_ops
>>
>>>
>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>> LclFill <-rename these to generic names.
>>> LclNTWr
>>> LclSlowFill
>>>
>>
>> I think portable and non-portable event names should both be available
>> as options. There are simple bandwidth measurement mechanisms that
>> will be applied in general, but when they turn up an issue, it can
>> often lead to a more focused investigation, requiring more precise
>> events.
> 
> I agree. We should provide both portable and non-portable event names.
> 
> Here is my draft proposal based on the discussion so far and reusing some
> of the current interface. The idea is to start with a basic assignment
> feature, with options to enhance it in the future. Feel free to
> comment/suggest.
> 
> 1. Event configurations will be in
>     /sys/fs/resctrl/info/L3_MON/counter_configs/.
> 
>     There will be two pre-defined configurations by default.
> 
>     #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>     LclFill,RmtFill,LclNTWr,RmtNTWr,LclSlowFill,RmtSlowFill,VictimBW
> 
>     #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>     LclFill, LclNTWr, LclSlowFill
> 
> 2. Users will have options to update these configurations.
> 
>     #echo "LclFill, LclNTWr, RmtFill" >
>        /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> 
>     #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>     LclFill, LclNTWr, RmtFill
> 
> 3. The default configurations will be used when the user mounts resctrl.
> 
>     mount  -t resctrl resctrl /sys/fs/resctrl/
>     mkdir /sys/fs/resctrl/test/
> 
> 4. Each resctrl group/domain can be in one of these assignment states.
>     e: Exclusive
>     s: Shared
>     u: Unassigned
> 
>     Exclusive mode is supported now. Shared mode will be supported in the
> future.
> 
> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> to list the assignment state of all the groups.
> 
>     Format:
>     "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
> 
>    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>     test//mbm_total_bytes:0=e;1=e
>     test//mbm_local_bytes:0=e;1=e
>     //mbm_total_bytes:0=e;1=e
>     //mbm_local_bytes:0=e;1=e
> 
> 6. Users can modify the assignment state by writing to mbm_assign_control.
> 
>     Format:
>     "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
> 
>     #echo "test//mbm_local_bytes:0=e;1=e" >
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>     #echo "test//mbm_local_bytes:0=u;1=u" >
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>     test//mbm_total_bytes:0=e;1=e
>     test//mbm_local_bytes:0=u;1=u
>     //mbm_total_bytes:0=e;1=e
>     //mbm_local_bytes:0=e;1=e
> 
>     The corresponding event counts can be read from:
> 
>     /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>     /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>     /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>     /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>     /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>     /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>     /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>     /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
> 
> 7. In the first stage, only two configurations (mbm_total_bytes and
> mbm_local_bytes) will be supported.
> 
> 8. In the future, there will be options to create additional configurations,
> and the corresponding directory will be created at
> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
> 
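
For illustration only, here is a small userspace sketch (Python) of how a
tool might parse one line of the format proposed in items 5 and 6 above.
The format is still under discussion, so parse_assign_line() and the
Assignment tuple are hypothetical names and are not part of resctrl or of
this series.

```python
from typing import Dict, NamedTuple

# Assignment states from item 4: exclusive, shared, unassigned.
VALID_STATES = {"e", "s", "u"}


class Assignment(NamedTuple):
    ctrl_mon: str            # "" means the default control group
    mon: str                 # "" means no child MON group
    config: str              # e.g. "mbm_total_bytes"
    domains: Dict[int, str]  # domain id -> assignment state


def parse_assign_line(line: str) -> Assignment:
    """Parse '<CTRL_MON>/<MON>/<config>:<dom>=<state>;...' (proposed format)."""
    path, sep, states = line.strip().partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {line!r}")

    parts = path.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected two '/' separators in {path!r}")
    ctrl_mon, mon, config = parts

    domains: Dict[int, str] = {}
    for item in states.split(";"):
        dom, eq, state = item.partition("=")
        if not eq or state not in VALID_STATES:
            raise ValueError(f"bad domain assignment {item!r}")
        domains[int(dom)] = state

    return Assignment(ctrl_mon, mon, config, domains)
```

For example, parse_assign_line("test//mbm_local_bytes:0=e;1=u") gives a
ctrl_mon of "test", an empty MON group, and {0: "e", 1: "u"} for the
domains.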

I know you are all busy with multiple series going on in parallel. I am
still waiting for input on this. It would be great if you could spend
some time on it to see if we can find common ground on the interface.
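
To make the event-configuration side concrete, the bit table quoted
earlier in this thread can be turned into a small mask helper. This is
only a userspace sketch in Python: event_mask() and the READS/WRITES/TOTAL
names are made up for illustration, and the bit positions are simply
copied from that table.

```python
# Bit positions copied from the event table quoted earlier in this thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}


def event_mask(names: str) -> int:
    """Turn a comma-separated list of event names into a bitmask."""
    mask = 0
    for name in names.split(","):
        name = name.strip()
        if name not in EVENT_BITS:
            raise ValueError(f"unknown event {name!r}")
        mask |= 1 << EVENT_BITS[name]
    return mask


# The "r" and "w" groupings from the earlier example; "t" is their union.
READS = event_mask("LclFill,RmtFill,LclSlowFill,RmtSlowFill")
WRITES = event_mask("VictimBW,LclNTWr,RmtNTWr")
TOTAL = READS | WRITES
```

With these bit positions, the "r" grouping is 0x33, "w" is 0x4c, and the
legacy "total" configuration is their union, 0x7f.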

Thanks
Babu
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 11 months, 1 week ago
On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
> Hi All,
> 
> On 3/5/2025 1:34 PM, Moger, Babu wrote:
> > Hi Peter,
> > 
> > On 3/5/25 04:40, Peter Newman wrote:
> > > Hi Babu,
> > > 
> > > On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
> > > > 
> > > > Hi Peter,
> > > > 
> > > > On 3/4/25 10:44, Peter Newman wrote:
> > > > > On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
> > > > > > 
> > > > > > Hi Peter/Reinette,
> > > > > > 
> > > > > > On 2/26/25 07:27, Peter Newman wrote:
> > > > > > > Hi Babu,
> > > > > > > 
> > > > > > > On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
> > > > > > > > 
> > > > > > > > Hi Peter,
> > > > > > > > 
> > > > > > > > On 2/25/25 11:11, Peter Newman wrote:
> > > > > > > > > Hi Reinette,
> > > > > > > > > 
> > > > > > > > > On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> > > > > > > > > <reinette.chatre@intel.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi Peter,
> > > > > > > > > > 
> > > > > > > > > > On 2/21/25 5:12 AM, Peter Newman wrote:
> > > > > > > > > > > On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> > > > > > > > > > > <reinette.chatre@intel.com> wrote:
> > > > > > > > > > > > On 2/20/25 6:53 AM, Peter Newman wrote:
> > > > > > > > > > > > > On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> > > > > > > > > > > > > <reinette.chatre@intel.com> wrote:
> > > > > > > > > > > > > > On 2/19/25 3:28 AM, Peter Newman wrote:
> > > > > > > > > > > > > > > On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> > > > > > > > > > > > > > > <reinette.chatre@intel.com> wrote:
> > > > > > > > > > > > > > > > On 2/17/25 2:26 AM, Peter Newman wrote:
> > > > > > > > > > > > > > > > > On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> > > > > > > > > > > > > > > > > <reinette.chatre@intel.com> wrote:
> > > > > > > > > > > > > > > > > > On 2/14/25 10:31 AM, Moger, Babu wrote:
> > > > > > > > > > > > > > > > > > > On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> > > > > > > > > > > > > > > > > > > > On 2/13/25 9:37 AM, Dave Martin wrote:
> > > > > > > > > > > > > > > > > > > > > On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> > > > > > > > > > > > > > > > > > > > > > On 2/12/25 9:46 AM, Dave Martin wrote:
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > (quoting relevant parts with goal to focus discussion on new possible syntax)
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > I see the support for MPAM events distinct from the support of assignable counters.
> > > > > > > > > > > > > > > > > > > > > > Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> > > > > > > > > > > > > > > > > > > > > > Please help me understand if you see it differently.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Doing so would need to come up with alphabetical letters for these events,
> > > > > > > > > > > > > > > > > > > > > > which seems to be needed for your proposal also? If we use possible flags of:
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > mbm_local_read_bytes a
> > > > > > > > > > > > > > > > > > > > > > mbm_local_write_bytes b
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Then mbm_assign_control can be used as:
> > > > > > > > > > > > > > > > > > > > > > # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > > > > > > > > > > > > > > > > > > > > > # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> > > > > > > > > > > > > > > > > > > > > > <value>
> > > > > > > > > > > > > > > > > > > > > > # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> > > > > > > > > > > > > > > > > > > > > > <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > One issue would be when resctrl needs to support more than 26 events (no more flags available),
> > > > > > > > > > > > > > > > > > > > > > assuming that upper case would be used for "shared" counters (unless this interface is defined
> > > > > > > > > > > > > > > > > > > > > > differently and only few uppercase letters used for it). Would this be too low of a limit?
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > As mentioned above, one possible issue with existing interface is that
> > > > > > > > > > > > > > > > > > it is limited to 26 events (assuming only lower case letters are used). The limit
> > > > > > > > > > > > > > > > > > is low enough to be of concern.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > The events which can be monitored by a single counter on ABMC and MPAM
> > > > > > > > > > > > > > > > > so far are combinable, so 26 counters per group today means it limits
> > > > > > > > > > > > > > > > > breaking down MBM traffic for each group 26 ways. If a user complained
> > > > > > > > > > > > > > > > > that a 26-way breakdown of a group's MBM traffic was limiting their
> > > > > > > > > > > > > > > > > investigation, I would question whether they know what they're looking
> > > > > > > > > > > > > > > > > for.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > The key here is "so far" as well as the focus on MBM only.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > It is impossible for me to predict what we will see in a couple of years
> > > > > > > > > > > > > > > > from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> > > > > > > > > > > > > > > > to support their users. Just looking at the Intel RDT spec the event register
> > > > > > > > > > > > > > > > has space for 32 events for each "CPU agent" resource. That does not take into
> > > > > > > > > > > > > > > > account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> > > > > > > > > > > > > > > > that he is working on patches [1] that will add new events and shared the idea
> > > > > > > > > > > > > > > > that we may be trending to support "perf" like events associated with RMID. I
> > > > > > > > > > > > > > > > expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> > > > > > > > > > > > > > > > customers.
> > > > > > > > > > > > > > > > This all makes me think that resctrl should be ready to support more events than 26.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I was thinking of the letters as representing a reusable, user-defined
> > > > > > > > > > > > > > > event-set for applying to a single counter rather than as individual
> > > > > > > > > > > > > > > events, since MPAM and ABMC allow us to choose the set of events each
> > > > > > > > > > > > > > > one counts. Wherever we define the letters, we could use more symbolic
> > > > > > > > > > > > > > > event names.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Thank you for clarifying.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > In the letters as events model, choosing the events assigned to a
> > > > > > > > > > > > > > > group wouldn't be enough information, since we would want to control
> > > > > > > > > > > > > > > which events should share a counter and which should be counted by
> > > > > > > > > > > > > > > separate counters. I think the amount of information that would need
> > > > > > > > > > > > > > > to be encoded into mbm_assign_control to represent the level of
> > > > > > > > > > > > > > > configurability supported by hardware would quickly get out of hand.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Maybe as an example, one counter for all reads, one counter for all
> > > > > > > > > > > > > > > writes in ABMC would look like...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > (L3_QOS_ABMC_CFG.BwType field names below)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > (per domain)
> > > > > > > > > > > > > > > group 0:
> > > > > > > > > > > > > > >   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > > > > >   counter 1: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > > > > group 1:
> > > > > > > > > > > > > > >   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > > > > >   counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I think this may also be what Dave was heading towards in [2] but in that
> > > > > > > > > > > > > > example and above the counter configuration appears to be global. You do mention
> > > > > > > > > > > > > > "configurability supported by hardware" so I wonder if per-domain counter
> > > > > > > > > > > > > > configuration is a requirement?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > If it's global and we want a particular group to be watched by more
> > > > > > > > > > > > > counters, I wouldn't want this to result in allocating more counters
> > > > > > > > > > > > > for that group in all domains, or allocating counters in domains where
> > > > > > > > > > > > > they're not needed. I want to encourage my users to avoid allocating
> > > > > > > > > > > > > monitoring resources in domains where a job is not allowed to run so
> > > > > > > > > > > > > there's less pressure on the counters.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > In Dave's proposal it looks like global configuration means
> > > > > > > > > > > > > globally-defined "named counter configurations", which works because
> > > > > > > > > > > > > it's really per-domain assignment of the configurations to however
> > > > > > > > > > > > > many counters the group needs in each domain.
> > > > > > > > > > > > 
> > > > > > > > > > > > I think I am becoming lost. Would a global configuration not break your
> > > > > > > > > > > > view of "event-set applied to a single counter"? If a counter is configured
> > > > > > > > > > > > globally then it would not make it possible to support the full configurability
> > > > > > > > > > > > of the hardware.
> > > > > > > > > > > > Before I add more confusion, let me try with an example that builds on your
> > > > > > > > > > > > earlier example copied below:
> > > > > > > > > > > > 
> > > > > > > > > > > > > > > (per domain)
> > > > > > > > > > > > > > > group 0:
> > > > > > > > > > > > > > >   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > > > > >   counter 1: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > > > > group 1:
> > > > > > > > > > > > > > >   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > > > > >   counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > > > > ...
> > > > > > > > > > > > 
> > > > > > > > > > > > Since the above states "per domain" I rewrite the example to highlight that as
> > > > > > > > > > > > I understand it:
> > > > > > > > > > > > 
> > > > > > > > > > > > group 0:
> > > > > > > > > > > >   domain 0:
> > > > > > > > > > > >    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 1: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > >   domain 1:
> > > > > > > > > > > >    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 1: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > group 1:
> > > > > > > > > > > >   domain 0:
> > > > > > > > > > > >    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > >   domain 1:
> > > > > > > > > > > >    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > 
> > > > > > > > > > > > You mention that you do not want counters to be allocated in domains that they
> > > > > > > > > > > > are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> > > > > > > > > > > > in domain 1, resulting in:
> > > > > > > > > > > > 
> > > > > > > > > > > > group 0:
> > > > > > > > > > > >   domain 0:
> > > > > > > > > > > >    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 1: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > group 1:
> > > > > > > > > > > >   domain 0:
> > > > > > > > > > > >    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > >   domain 1:
> > > > > > > > > > > >    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > 
> > > > > > > > > > > > With counter 0 and counter 1 available in domain 1, these counters could
> > > > > > > > > > > > theoretically be configured to give group 1 more data in domain 1:
> > > > > > > > > > > > 
> > > > > > > > > > > > group 0:
> > > > > > > > > > > >   domain 0:
> > > > > > > > > > > >    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 1: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > > group 1:
> > > > > > > > > > > >   domain 0:
> > > > > > > > > > > >    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 3: VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > >   domain 1:
> > > > > > > > > > > >    counter 0: LclFill,RmtFill
> > > > > > > > > > > >    counter 1: LclNTWr,RmtNTWr
> > > > > > > > > > > >    counter 2: LclSlowFill,RmtSlowFill
> > > > > > > > > > > >    counter 3: VictimBW
> > > > > > > > > > > > 
> > > > > > > > > > > > > The counters are shown with different per-domain configurations that seem to
> > > > > > > > > > > > > match the earlier goals of (a) choose events counted by each counter and
> > > > > > > > > > > > > (b) do not allocate counters in domains where they are not needed. As I
> > > > > > > > > > > > > understand it, the above does contradict global counter configuration though.
> > > > > > > > > > > > Or do you mean that only the *name* of the counter is global and then
> > > > > > > > > > > > that it is reconfigured as part of every assignment?
> > > > > > > > > > > 
> > > > > > > > > > > Yes, I meant only the *name* is global. I assume based on a particular
> > > > > > > > > > > system configuration, the user will settle on a handful of useful
> > > > > > > > > > > groupings to count.
> > > > > > > > > > > 
> > > > > > > > > > > Perhaps mbm_assign_control syntax is the clearest way to express an example...
> > > > > > > > > > > 
> > > > > > > > > > >   # define global configurations (in ABMC terms), not necessarily in this
> > > > > > > > > > >   # syntax and probably not in the mbm_assign_control file.
> > > > > > > > > > > 
> > > > > > > > > > >   r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > > > > > >   w=VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > 
> > > > > > > > > > >   # legacy "total" configuration, effectively r+w
> > > > > > > > > > >   t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > > 
> > > > > > > > > > >   /group0/0=t;1=t
> > > > > > > > > > >   /group1/0=t;1=t
> > > > > > > > > > >   /group2/0=_;1=t
> > > > > > > > > > >   /group3/0=rw;1=_
> > > > > > > > > > > 
> > > > > > > > > > > > - group2 is restricted to domain 1
> > > > > > > > > > > > - group3 is restricted to domain 0
> > > > > > > > > > > - the rest are unrestricted
> > > > > > > > > > > - In group3, we decided we need to separate read and write traffic
> > > > > > > > > > > 
> > > > > > > > > > > This consumes 4 counters in domain 0 and 3 counters in domain 1.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I see. Thank you for the example.
> > > > > > > > > > 
> > > > > > > > > > resctrl supports per-domain configurations with the following possible when
> > > > > > > > > > using mbm_total_bytes_config and mbm_local_bytes_config:
> > > > > > > > > > 
> > > > > > > > > > t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
> > > > > > > > > > 
> > > > > > > > > >     /group0/0=t;1=t
> > > > > > > > > >     /group1/0=t;1=t
> > > > > > > > > > 
> > > > > > > > > > Even though the flags are identical in all domains, the assigned counters will
> > > > > > > > > > be configured differently in each domain.
> > > > > > > > > > 
> > > > > > > > > > With this supported by hardware and currently also supported by resctrl it seems
> > > > > > > > > > reasonable to carry this forward to what will be supported next.
> > > > > > > > > 
> > > > > > > > > The hardware supports both a per-domain mode, where all groups in a
> > > > > > > > > domain use the same configurations and are limited to two events per
> > > > > > > > > group and a per-group mode where every group can be configured and
> > > > > > > > > assigned freely. This series is using the legacy counter access mode
> > > > > > > > > where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> > > > > > > > > in the domain can be read. If we chose to read the assigned counter
> > > > > > > > > directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> > > > > > > > > rather than asking the hardware to find the counter by RMID, we would
> > > > > > > > > not be limited to 2 counters per group/domain and the hardware would
> > > > > > > > > have the same flexibility as on MPAM.
> > > > > > > > 
> > > > > > > > In extended mode, the contents of a specific counter can be read by
> > > > > > > > setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
> > > > > > > > [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
> > > > > > > > QM_CTR will then return the contents of the specified counter.
> > > > > > > > 
> > > > > > > > It is documented below.
> > > > > > > > https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
> > > > > > > >   Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
> > > > > > > > 
> > > > > > > > We previously discussed this with you (off the public list) and I
> > > > > > > > initially proposed the extended assignment mode.
> > > > > > > > 
> > > > > > > > Yes, the extended mode allows greater flexibility by enabling multiple
> > > > > > > > counters to be assigned to the same group, rather than being limited to
> > > > > > > > just two.
> > > > > > > > 
> > > > > > > > However, the challenge is that we currently lack the necessary interfaces
> > > > > > > > to configure multiple events per group. Without these interfaces, the
> > > > > > > > extended mode is not practical at this time.
> > > > > > > > 
> > > > > > > > Therefore, we ultimately agreed to use the legacy mode, as it does not
> > > > > > > > require modifications to the existing interface, allowing us to continue
> > > > > > > > using it as is.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > (I might have said something confusing in my last messages because I
> > > > > > > > > had forgotten that I switched to the extended assignment mode when
> > > > > > > > > prototyping with soft-ABMC and MPAM.)
> > > > > > > > > 
> > > > > > > > > Forcing all groups on a domain to share the same 2 counter
> > > > > > > > > configurations would not be acceptable for us, as the example I gave
> > > > > > > > > earlier is one I've already been asked about.
> > > > > > > > 
> > > > > > > > I don’t see this as a blocker. It should be considered an extension to the
> > > > > > > > current ABMC series. We can easily build on top of this series once we
> > > > > > > > finalize how to configure the multiple event interface for each group.
> > > > > > > 
> > > > > > > I don't think it is, either. Only being able to use ABMC to assign
> > > > > > > counters is fine for our use as an incremental step. My longer-term
> > > > > > > concern is the domain-scoped mbm_total_bytes_config and
> > > > > > > mbm_local_bytes_config files, but they were introduced with BMEC, so
> > > > > > > there's already an expectation that the files are present when BMEC is
> > > > > > > supported.
> > > > > > > 
> > > > > > > On ABMC hardware that also supports BMEC, I'm concerned about enabling
> > > > > > > ABMC when only the BMEC-style event configuration interface exists.
> > > > > > > The scope of my issue is just whether enabling "full" ABMC support
> > > > > > > will require an additional opt-in, since that could remove the BMEC
> > > > > > > interface. If it does, it's something we can live with.
> > > > > > 
> > > > > > As you know, this series is currently blocked without further feedback.
> > > > > > 
> > > > > > I’d like to begin reworking these patches to incorporate Peter’s feedback.
> > > > > > Any input or suggestions would be appreciated.
> > > > > > 
> > > > > > Here’s what we’ve learned so far:
> > > > > > 
> > > > > > 1. Assignments should be independent of BMEC.
> > > > > > > 2. We should be able to specify multiple event types for a counter (e.g.,
> > > > > > > read, write, VictimBW, etc.). This is also called a shared counter.
> > > > > > 3. There should be an option to assign events per domain.
> > > > > > 4. Currently, only two counters can be assigned per group, but the design
> > > > > > should allow flexibility to assign more in the future as the interface
> > > > > > evolves.
> > > > > > 5. Utilize the extended RMID read mode.
> > > > > > 
> > > > > > 
> > > > > > Here is my proposal using Peter's earlier example:
> > > > > > 
> > > > > > # define event configurations
> > > > > > 
> > > > > > ========================================================
> > > > > > Bits    Mnemonics       Description
> > > > > > ====   ========================================================
> > > > > > 6       VictimBW        Dirty Victims from all types of memory
> > > > > > 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
> > > > > > 4       LclSlowFill     Reads to slow memory in the local NUMA domain
> > > > > > 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
> > > > > > 2       LclNTWr         Non-temporal writes to local NUMA domain
> > > > > > 1       mtFill          Reads to memory in the non-local NUMA domain
> > > > > > 0       LclFill         Reads to memory in the local NUMA domain
> > > > > > ====    ========================================================
> > > > > > 
> > > > > > #Define flags based on combination of above event types.
> > > > > > 
> > > > > > t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> > > > > > l = LclFill, LclNTWr, LclSlowFill
> > > > > > r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
> > > > > > w = VictimBW,LclNTWr,RmtNTWr
> > > > > > v = VictimBW
> > > > > > 
> > > > > > Peter suggested the following format earlier :
> > > > > > 
> > > > > > /group0/0=t;1=t
> > > > > > /group1/0=t;1=t
> > > > > > /group2/0=_;1=t
> > > > > > /group3/0=rw;1=_
> > > > > 
> > > > > After some inquiries within Google, it sounds like nobody has invested
> > > > > much into the current mbm_assign_control format yet, so it would be
> > > > > best to drop it and distribute the configuration around the filesystem
> > > > > hierarchy[1], which should allow us to produce something more flexible
> > > > > and cleaner to implement.
> > > > > 
> > > > > Roughly what I had in mind:
> > > > > 
> > > > > Use mkdir in a info/<resource>_MON subdirectory to create free-form
> > > > > names for the assignable configurations rather than being restricted
> > > > > to single letters.  In the resulting directory, populate a file where
> > > > > we can specify the set of events the config should represent. I think
> > > > > we should use symbolic names for the events rather than raw BMEC field
> > > > > values. Moving forward we could come up with portable names for common
> > > > > events and only support the BMEC names on AMD machines for users who
> > > > > want specific events and don't care about portability.
> > > > 
> > > > 
> > > > I’m still processing this. Let me start with some initial questions.
> > > > 
> > > > So, we are creating event configurations here, which seems reasonable.
> > > > 
> > > > Yes, we should use portable names and are not limited to BMEC names.
> > > > 
> > > > How many configurations should we allow? Do we know?
> > > 
> > > Do we need an upper limit?
> > 
> > I think so. This needs to be maintained in some data structure. We can
> > start with 2 default configurations for now.
> > 
> > > 
> > > > 
> > > > > 
> > > > > Next, put assignment-control file nodes in per-domain directories
> > > > > (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
> > > > > counter-configuration name into the file would then allocate a counter
> > > > > in the domain, apply the named configuration, and monitor the parent
> > > > > group-directory. We can also put a group/resource-scoped assign_* file
> > > > > higher in the hierarchy to make it easier for users who want to
> > > > > configure all domains the same for a group.
> > > > 
> > > > What is the difference between shared and exclusive?
> > > 
> > > Shared assignment[1] means that non-exclusively-assigned counters in
> > > each domain will be scheduled round-robin to the groups requesting
> > > shared access to a counter. In my tests, I assigned the counters long
> > > enough to produce a single 1-second MB/s sample for the per-domain
> > > aggregation files[2].
> > > 
> > > These do not need to be implemented immediately, but knowing that they
> > > work addresses the overhead and scalability concerns of reassigning
> > > counters and reading their values.
> > 
> > Ok. Let's focus on exclusive assignments for now.
> > 
> > > 
> > > > 
> > > > Having three files—assign_shared, assign_exclusive, and unassign—for each
> > > > domain seems excessive. In a system with 32 groups and 12 domains, this
> > > > results in 32 × 12 × 3 files, which is quite large.
> > > > 
> > > > There should be a more efficient way to handle this.
> > > > 
> > > > Initially, we started with a group-level file for this interface, but it
> > > > was rejected due to the high number of sysfs calls, making it inefficient.
> > > 
> > > I had rejected it due to the high-frequency of access of a large
> > > number of files, which has since been addressed by shared assignment
> > > (or automatic reassignment) and aggregated mbps files.
> > 
> > I think we should address this as well. Creating three extra files for
> > each group isn’t ideal when there are more efficient alternatives.
> > 
> > > 
> > > > 
> > > > Additionally, how can we list all assignments with a single sysfs call?
> > > > 
> > > > That was another problem we need to address.
> > > 
> > > This is not a requirement I was aware of. If the user forgot where
> > > they assigned counters (or forgot to disable auto-assignment), they
> > > can read multiple sysfs nodes to remind themselves.
> > 
> > I suggest we provide users with an option to list the assignments
> > of all groups in a single command. As the number of groups increases, it
> > becomes cumbersome to query each group individually.
> > 
> > To achieve this, we can reuse our existing mbm_assign_control interface
> > for this purpose. More details on this below.
> > 
> > > > 
> > > > 
> > > > > 
> > > > > The configuration names listed in assign_* would result in files of
> > > > > the same name in the appropriate mon_data domain directories from
> > > > > which the count values can be read.
> > > > > 
> > > > >   # mkdir info/L3_MON/counter_configs/mbm_local_bytes
> > > > >   # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> > > > >   # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> > > > >   # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> > > > >   # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> > > > > LclFill
> > > > > LclNTWr
> > > > > LclSlowFill
> > > > 
> > > > I feel we can just have the configs. event_filter file is not required.
> > > 
> > > That's right, I forgot that we can implement kernfs_ops::open(). I was
> > > only looking at struct kernfs_syscall_ops
> > > 
> > > > 
> > > > #cat info/L3_MON/counter_configs/mbm_local_bytes
> > > > LclFill <-rename these to generic names.
> > > > LclNTWr
> > > > LclSlowFill
> > > > 
> > > 
> > > I think portable and non-portable event names should both be available
> > > as options. There are simple bandwidth measurement mechanisms that
> > > will be applied in general, but when they turn up an issue, it can
> > > often lead to a more focused investigation, requiring more precise
> > > events.
> > 
> > I agree. We should provide both portable and non-portable event names.
> > 
> > Here is my draft proposal based on the discussion so far and reusing some
> > of the current interface. The idea here is to start with a basic assignment
> > feature with options to enhance it in the future. Feel free to
> > comment/suggest.
> > 
> > 1. Event configurations will be in
> >     /sys/fs/resctrl/info/L3_MON/counter_configs/.
> > 
> >     There will be two pre-defined configurations by default.
> > 
> >     #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> >     LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
> > 
> >     #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >     LclFill, LclNTWr, LclSlowFill
> > 
> > 2. Users will have options to update these configurations.
> > 
> >     #echo "LclFill, LclNTWr, RmtFill" >
> >        /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes

This part seems odd to me. Now the "mbm_local_bytes" files aren't
reporting "local_bytes" any more. They report something different,
and users only know if they come to check the options currently
configured in this file. Changing the contents without changing
the name seems confusing to me.
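
To make the concern concrete, here is a minimal sketch (Python, purely
illustrative, not part of the series) of the check a user would effectively
have to do by hand. The event names come from the table quoted earlier in the
thread; the split into LOCAL_EVENTS and REMOTE_EVENTS and the helper names are
my own assumptions.

```python
# Event names taken from the BwType table quoted in this thread. Grouping
# them into "local" and "remote" sets is an assumption for illustration;
# VictimBW covers all memory types, so it belongs to neither set.
LOCAL_EVENTS = {"LclFill", "LclNTWr", "LclSlowFill"}
REMOTE_EVENTS = {"RmtFill", "RmtNTWr", "RmtSlowFill"}


def parse_events(text):
    """Split a comma-separated event list, ignoring surrounding whitespace."""
    return [name.strip() for name in text.split(",") if name.strip()]


def is_local_only(text):
    """Return True only if every listed event is a local-memory event."""
    events = parse_events(text)
    return bool(events) and all(e in LOCAL_EVENTS for e in events)


# The default contents proposed for mbm_local_bytes.
print(is_local_only("LclFill, LclNTWr, LclSlowFill"))  # True
# The contents after the update in step 2 of the proposal.
print(is_local_only("LclFill, LclNTWr, RmtFill"))      # False
```

After the update in step 2 the file is still called mbm_local_bytes, but the
check above no longer holds for its contents.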

> > 
> >     #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >     LclFill, LclNTWr, RmtFill
> > 
> > 3. The default configurations will be used when the user mounts resctrl.
> > 
> >     mount  -t resctrl resctrl /sys/fs/resctrl/
> >     mkdir /sys/fs/resctrl/test/
> > 
> > 4. The resctrl group/domains can be in one of these assignment states.
> >     e: Exclusive
> >     s: Shared
> >     u: Unassigned
> > 
> >     Exclusive mode is supported now. Shared mode will be supported in the
> > future.
> > 
> > 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > to list the assignment state of all the groups.
> > 
> >     Format:
> >     "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
> > 
> >    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >     test//mbm_total_bytes:0=e;1=e
> >     test//mbm_local_bytes:0=e;1=e
> >     //mbm_total_bytes:0=e;1=e
> >     //mbm_local_bytes:0=e;1=e
> > 
> > 6. Users can modify the assignment state by writing to mbm_assign_control.
> > 
> >     Format:
> >     "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
> > 
> >     #echo "test//mbm_local_bytes:0=e;1=e" >
> > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > 
> >     #echo "test//mbm_local_bytes:0=u;1=u" >
> > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > 
> >     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >     test//mbm_total_bytes:0=u;1=u
> >     test//mbm_local_bytes:0=u;1=u
> >     //mbm_total_bytes:0=e;1=e
> >     //mbm_local_bytes:0=e;1=e
> > 
> >     The corresponding events will be read in
> > 
> >     /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> >     /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
> >     /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >     /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
> >     /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
> >     /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
> >     /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
> >     /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
> > 
> > 7. In the first stage, only two configurations (mbm_total_bytes and
> > mbm_local_bytes) will be supported.
> > 
> > 8. In the future, there will be options to create multiple configurations
> > and the corresponding directory will be created in
> > /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.

Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
directory? Like this:

# echo "LclFill, LclNTWr, RmtFill" >
        /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff

This seems OK (dependent on the user picking meaningful names for
the set of attributes picked ... but if they want to name this
monitor file "brian" then they have to live with any confusion
that they bring on themselves).

Would this involve an extension to kernfs? I don't see a function
pointer callback for file creation in kernfs_syscall_ops.
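
However the file ends up being created, the event list written to it has to be
turned into a counter configuration. As a rough sketch only (Python for
brevity, not kernel code), the names could map onto the bit positions from the
table quoted earlier in the thread. EVENT_BITS and events_to_mask() are
hypothetical names, and rejecting unknown names is my assumption about the
desired behaviour.

```python
# Bit positions as listed in the BwType table quoted earlier in the thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}


def events_to_mask(text):
    """Convert a comma-separated list of event names into a bit mask."""
    mask = 0
    for name in text.split(","):
        name = name.strip()
        if not name:
            continue  # tolerate stray commas and whitespace
        if name not in EVENT_BITS:
            # Assumption: an unknown name is an error rather than ignored.
            raise ValueError(f"unknown event name: {name!r}")
        mask |= 1 << EVENT_BITS[name]
    return mask


# The "cache_stuff" example above sets bits 0, 2 and 1.
print(hex(events_to_mask("LclFill, LclNTWr, RmtFill")))  # 0x7
```

With the same mapping, the proposed default mbm_local_bytes list (LclFill,
LclNTWr, LclSlowFill) comes out as 0x15 and all seven events as 0x7f.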

> > 
> 
> I know you are all busy with multiple series going on in parallel. I am still
> waiting for input on this. It would be great if you could spend some time
> on this to see if we can find common ground on the interface.
> 
> Thanks
> Babu

-Tony
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 1 week ago
Hi Tony,

On 3/10/2025 6:22 PM, Luck, Tony wrote:
> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>> Hi All,
>>
>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>> Hi Peter,
>>>
>>> On 3/5/25 04:40, Peter Newman wrote:
>>>> Hi Babu,
>>>>
>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>
>>>>>>> Hi Peter/Reinette,
>>>>>>>
>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>> Hi Babu,
>>>>>>>>
>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>> Hi Reinette,
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>
>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>
>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>
>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>
>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>
>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>
>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>
>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>
>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>
>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>
>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>
>>>>>>>>>>>> - group2 is restricted to domain 0
>>>>>>>>>>>> - group3 is restricted to domain 1
>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>
>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>
>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>
>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>
>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>
>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>
>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>
>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>
>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>
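As a rough illustration of the extended read described above, the sketch below composes the value that would be written to QM_EVTSEL before reading QM_CTR. The bit positions (EvtID in bits 7:0, ExtendedEvtID at bit 31, RMID starting at bit 32) and the L3CacheABMC event ID of 1 are assumptions for illustration and should be checked against APM section 19.3.3.3; the helper name is hypothetical.

```python
# Assumed QM_EVTSEL layout; verify against the APM before relying on it.
EVT_ID_MASK = 0xFF          # EvtID, bits 7:0 (assumed)
EXTENDED_EVT_ID = 1 << 31   # ExtendedEvtID flag (assumed position)
RMID_SHIFT = 32             # RMID field starts at bit 32 (assumed)
L3_CACHE_ABMC = 1           # EvtID value for L3CacheABMC (assumed)


def qm_evtsel_for_counter(counter_id: int) -> int:
    """Build a QM_EVTSEL value that selects an ABMC counter directly.

    In extended mode the RMID field carries the counter ID rather than
    an RMID, so the hardware does not have to look the counter up.
    """
    if counter_id < 0:
        raise ValueError("counter ID must be non-negative")
    return (
        (counter_id << RMID_SHIFT)
        | EXTENDED_EVT_ID
        | (L3_CACHE_ABMC & EVT_ID_MASK)
    )


if __name__ == "__main__":
    print(hex(qm_evtsel_for_counter(5)))
```

Because the counter is addressed by ID, nothing in this read path ties a group to exactly two events, which is the flexibility the extended mode is credited with above.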
>>>>>>>>> It is documented below.
>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>
>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>
>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>> just two.
>>>>>>>>>
>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>
>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>> using it as is.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>
>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>
>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>
>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>> supported.
>>>>>>>>
>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>
>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>
>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>
>>>>>>> Here’s what we’ve learned so far:
>>>>>>>
>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>> evolves.
>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>
>>>>>>>
>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>
>>>>>>> # define event configurations
>>>>>>>
>>>>>>> ====    ===========     =================================================
>>>>>>> Bits    Mnemonics       Description
>>>>>>> ====    ===========     =================================================
>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>> ====    ===========     =================================================
>>>>>>>
>>>>>>> #Define flags based on combination of above event types.
>>>>>>>
>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>> v = VictimBW
>>>>>>>
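The flag definitions above map directly onto bitmasks. The sketch below derives the mask for each flag from the bit table, taking bit 1 to be RmtFill. The dictionary and function names are hypothetical and only illustrate the arithmetic.

```python
# Bit positions copied from the event table in the proposal.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag definitions as written in the proposal.
FLAGS = {
    "t": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
          "VictimBW", "LclNTWr", "RmtNTWr"],
    "l": ["LclFill", "LclNTWr", "LclSlowFill"],
    "r": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"],
    "w": ["VictimBW", "LclNTWr", "RmtNTWr"],
    "v": ["VictimBW"],
}


def event_mask(names):
    """OR together the bit for every named event."""
    mask = 0
    for name in names:
        mask |= 1 << EVENT_BITS[name]
    return mask


FLAG_MASKS = {flag: event_mask(names) for flag, names in FLAGS.items()}

if __name__ == "__main__":
    for flag, mask in FLAG_MASKS.items():
        print(f"{flag} = {mask:#04x}")
```

With these definitions, r and w do not overlap and together cover exactly the events in t, which is why t can be described as "effectively r+w".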
>>>>>>> Peter suggested the following format earlier :
>>>>>>>
>>>>>>> /group0/0=t;1=t
>>>>>>> /group1/0=t;1=t
>>>>>>> /group2/0=_;1=t
>>>>>>> /group3/0=rw;1=_
>>>>>>
>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>> and cleaner to implement.
>>>>>>
>>>>>> Roughly what I had in mind:
>>>>>>
>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>> names for the assignable configurations rather than being restricted
>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>> we can specify the set of events the config should represent. I think
>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>> values. Moving forward we could come up with portable names for common
>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>> want specific events and don't care about portability.
>>>>>
>>>>>
>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>
>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>
>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>
>>>>> How many configurations should we allow? Do we know?
>>>>
>>>> Do we need an upper limit?
>>>
>>> I think so. This needs to be maintained in some data structure. We can
>>> start with 2 default configurations for now.
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>> configure all domains the same for a group.
>>>>>
>>>>> What is the difference between shared and exclusive?
>>>>
>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>> each domain will be scheduled round-robin to the groups requesting
>>>> shared access to a counter. In my tests, I assigned the counters long
>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>> aggregation files[2].
>>>>
>>>> These do not need to be implemented immediately, but knowing that they
>>>> work addresses the overhead and scalability concerns of reassigning
>>>> counters and reading their values.
>>>
>>> Ok. Let's focus on exclusive assignments for now.
>>>
>>>>
>>>>>
>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>
>>>>> There should be a more efficient way to handle this.
>>>>>
>>>>> Initially, we started with a group-level file for this interface, but it
>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>
>>>> I had rejected it due to the high-frequency of access of a large
>>>> number of files, which has since been addressed by shared assignment
>>>> (or automatic reassignment) and aggregated mbps files.
>>>
>>> I think we should address this as well. Creating three extra files for
>>> each group isn’t ideal when there are more efficient alternatives.
>>>
>>>>
>>>>>
>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>
>>>>> That was another problem we need to address.
>>>>
>>>> This is not a requirement I was aware of. If the user forgot where
>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>> can read multiple sysfs nodes to remind themselves.
>>>
>>> I suggest we provide users with an option to list the assignments
>>> of all groups in a single command. As the number of groups increases, it
>>> becomes cumbersome to query each group individually.
>>>
>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>> for this purpose. More details on this below.
>>>
>>>>>
>>>>>
>>>>>>
>>>>>> The configuration names listed in assign_* would result in files of
>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>> which the count values can be read.
>>>>>>
>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>> LclFill
>>>>>> LclNTWr
>>>>>> LclSlowFill
>>>>>
>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>
>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>> only looking at struct kernfs_syscall_ops
>>>>
>>>>>
>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>> LclFill <-rename these to generic names.
>>>>> LclNTWr
>>>>> LclSlowFill
>>>>>
>>>>
>>>> I think portable and non-portable event names should both be available
>>>> as options. There are simple bandwidth measurement mechanisms that
>>>> will be applied in general, but when they turn up an issue, it can
>>>> often lead to a more focused investigation, requiring more precise
>>>> events.
>>>
>>> I agree. We should provide both portable and non-portable event names.
>>>
>>> Here is my draft proposal based on the discussion so far and reusing some
>>> of the current interface. The idea here is to start with a basic assignment
>>> feature with options to enhance it in the future. Feel free to
>>> comment/suggest.
>>>
>>> 1. Event configurations will be in
>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>
>>>      There will be two pre-defined configurations by default.
>>>
>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>
>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>      LclFill, LclNTWr, LclSlowFill
>>>
>>> 2. Users will have options to update these configurations.
>>>
>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> 
> This part seems odd to me. Now the "mbm_local_bytes" files aren't
> reporting "local_bytes" any more. They report something different,
> and users only know if they come to check the options currently
> configured in this file. Changing the contents without changing
> the name seems confusing to me.

It is the same behaviour right now with BMEC. It is configurable.
By default it is mbm_local_bytes, but users can configure whatever they 
want to monitor using /info/L3_MON/mbm_local_bytes_config.

We can continue the same behaviour with ABMC, but the configuration will 
be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
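To make the naming concern concrete, the sketch below parses a comma-separated event list like the ones proposed for the counter_configs files and checks whether it still counts only local traffic. The parser, its whitespace handling, and the LOCAL_EVENTS set are assumptions for illustration; none of this is an existing resctrl interface.

```python
# Event names used throughout this thread (BMEC/ABMC BwType mnemonics).
KNOWN_EVENTS = {
    "LclFill", "RmtFill", "LclNTWr", "RmtNTWr",
    "LclSlowFill", "RmtSlowFill", "VictimBW",
}

# The default mbm_local_bytes event set from the proposal.
LOCAL_EVENTS = {"LclFill", "LclNTWr", "LclSlowFill"}


def parse_config(text):
    """Split a comma-separated event list, tolerating stray spaces."""
    events = [name.strip() for name in text.split(",") if name.strip()]
    unknown = [name for name in events if name not in KNOWN_EVENTS]
    if unknown:
        raise ValueError(f"unknown event(s): {', '.join(unknown)}")
    return events


def is_local_only(events):
    """True when every event in the list is one of the local events."""
    return set(events) <= LOCAL_EVENTS


if __name__ == "__main__":
    for text in ("LclFill, LclNTWr, LclSlowFill",
                 "LclFill, LclNTWr, RmtFill"):
        print(f"{text!r}: local only = {is_local_only(parse_config(text))}")
```

The second list is the reconfigured mbm_local_bytes from step 2. It includes RmtFill, so the file name no longer describes what is counted, which is the same situation that exists today with mbm_local_bytes_config under BMEC.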

> 
>>>
>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>      LclFill, LclNTWr, RmtFill
>>>
>>> 3. The default configurations will be used when the user mounts resctrl.
>>>
>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>      mkdir /sys/fs/resctrl/test/
>>>
>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>      e: Exclusive
>>>      s: Shared
>>>      u: Unassigned
>>>
>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>> future.
>>>
>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>> to list the assignment state of all the groups.
>>>
>>>      Format:
>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>
>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>      test//mbm_total_bytes:0=e;1=e
>>>      test//mbm_local_bytes:0=e;1=e
>>>      //mbm_total_bytes:0=e;1=e
>>>      //mbm_local_bytes:0=e;1=e
>>>
>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>
>>>      Format:
>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>
>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>
>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>
>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>      test//mbm_total_bytes:0=u;1=u
>>>      test//mbm_local_bytes:0=u;1=u
>>>      //mbm_total_bytes:0=e;1=e
>>>      //mbm_local_bytes:0=e;1=e
>>>
>>>      The corresponding events will be read in
>>>
>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>
>>> 7. In the first stage, only two configurations(mbm_total_bytes and
>>> mbm_local_bytes) will be supported.
>>>
>>> 8. In the future, there will be options to create multiple configurations,
>>> and a corresponding directory will be created in
>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
> 
> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
> directory? Like this:
> 
> # echo "LclFill, LclNTWr, RmtFill" >
>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
> 
> This seems OK (dependent on the user picking meaningful names for
> the set of attributes picked ... but if they want to name this
> monitor file "brian" then they have to live with any confusion
> that they bring on themselves).
> 
> Would this involve an extension to kernfs? I don't see a function
> pointer callback for file creation in kernfs_syscall_ops.
> 
>>>
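The mbm_assign_control line format proposed in items 5 and 6 is regular enough to parse mechanically. The sketch below splits one line into its group, configuration, and per-domain states, and then derives the mon_data files where the assigned events would be read. The type and function names are hypothetical, the "L3" resource name is hard-coded as in the examples, and the mon_groups path component follows the existing resctrl layout for MON groups.

```python
from typing import Dict, List, NamedTuple

VALID_STATES = {"e", "s", "u"}  # exclusive, shared, unassigned


class Assignment(NamedTuple):
    ctrl_mon: str            # "" means the default group
    mon: str                 # "" means no MON sub-group
    config: str              # e.g. "mbm_total_bytes"
    domains: Dict[int, str]  # domain ID -> assignment state


def parse_assign_line(line: str) -> Assignment:
    """Parse '<ctrl>/<mon>/<config>:<dom>=<state>[;<dom>=<state>...]'."""
    path, sep, domain_part = line.strip().rpartition(":")
    if not sep:
        raise ValueError(f"missing ':' in {line!r}")
    parts = path.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected <ctrl>/<mon>/<config> in {line!r}")
    ctrl_mon, mon, config = parts
    domains = {}
    for item in domain_part.split(";"):
        dom, eq, state = item.partition("=")
        if not eq or state not in VALID_STATES:
            raise ValueError(f"bad domain entry {item!r}")
        domains[int(dom)] = state
    return Assignment(ctrl_mon, mon, config, domains)


def mon_data_paths(a: Assignment, root: str = "/sys/fs/resctrl") -> List[str]:
    """List the event files for every domain that has a counter assigned."""
    parts = [root]
    if a.ctrl_mon:
        parts.append(a.ctrl_mon)
    if a.mon:
        parts += ["mon_groups", a.mon]
    return [
        "/".join(parts + ["mon_data", f"mon_L3_{dom:02d}", a.config])
        for dom, state in sorted(a.domains.items())
        if state != "u"
    ]


if __name__ == "__main__":
    for path in mon_data_paths(parse_assign_line("test//mbm_total_bytes:0=e;1=e")):
        print(path)
```

Keeping the configuration name in the line is what lets the same parser work unchanged once more than two configurations exist.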
>>
>> I know you are all busy with multiple series going on in parallel. I am still
>> waiting for input on this. It would be great if you could spend some time
>> on this to see if we can find common ground on the interface.
>>
>> Thanks
>> Babu
> 
> -Tony
> 


thanks
Babu
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 1 week ago

On 3/10/25 6:44 PM, Moger, Babu wrote:
> Hi Tony,
> 
> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>> Hi All,
>>>
>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>> Hi Peter,
>>>>
>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter/Reinette,
>>>>>>>>
>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>> Hi Babu,
>>>>>>>>>
>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
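The single-letter scheme quoted above can be sketched as follows: each lowercase letter names one event, and a line such as '//0=ab;1=b' lists the letters assigned in each domain. The flag table uses the two example events from the quoted text; the parser itself is a hypothetical illustration, not the kernel's implementation.

```python
import string

# Example flags from the quoted proposal.
FLAG_EVENTS = {
    "a": "mbm_local_read_bytes",
    "b": "mbm_local_write_bytes",
}

# With one lowercase letter per event, this is the hard ceiling.
MAX_LOWERCASE_FLAGS = len(string.ascii_lowercase)


def parse_flag_assignment(line):
    """Parse '<ctrl>/<mon>/<dom>=<flags>[;<dom>=<flags>...]'."""
    ctrl_mon, mon, domain_part = line.split("/", 2)
    per_domain = {}
    for item in domain_part.split(";"):
        dom, _, flags = item.partition("=")
        # "_" marks a domain with nothing assigned.
        per_domain[int(dom)] = [FLAG_EVENTS[f] for f in flags if f != "_"]
    return ctrl_mon, mon, per_domain


if __name__ == "__main__":
    print(parse_flag_assignment("//0=ab;1=b"))
    print("events expressible with lowercase flags:", MAX_LOWERCASE_FLAGS)
```

The ceiling comes purely from the encoding: reserving uppercase letters for shared counters leaves the 26 lowercase letters as the only event identifiers.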
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>
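In the last layout above, group 1 uses all four counters in domain 1, each with a narrower event set. The sketch below checks that those four sets are disjoint and together cover the seven BwType events, so that each event is counted exactly once. The data structure and function name are hypothetical.

```python
# The seven BwType events used in the examples.
ALL_EVENTS = {
    "LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
    "VictimBW", "LclNTWr", "RmtNTWr",
}

# Group 1, domain 1, from the final layout: counter ID -> events counted.
GROUP1_DOMAIN1 = {
    0: {"LclFill", "RmtFill"},
    1: {"LclNTWr", "RmtNTWr"},
    2: {"LclSlowFill", "RmtSlowFill"},
    3: {"VictimBW"},
}


def is_partition(counter_events, universe):
    """True when no event is counted twice and none is left out."""
    seen = set()
    for events in counter_events.values():
        if seen & events:
            return False  # an event is counted by two counters
        seen |= events
    return seen == universe


if __name__ == "__main__":
    print(is_partition(GROUP1_DOMAIN1, ALL_EVENTS))
```

When the check passes, summing the four counters should give the same figure a single "total" counter would report, assuming the hardware event types themselves do not overlap.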
>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>
>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>
>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>
>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>
>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>
>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>>>>
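The counter totals quoted above can be reproduced by counting one counter per configuration letter per domain. The sketch below does that for the four example lines. The function name is hypothetical, and "one letter consumes one counter" is the reading of the example assumed here.

```python
from collections import Counter

# Assignment lines from the example.
ASSIGNMENTS = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]


def counters_per_domain(lines):
    """Count how many counters each domain needs.

    Every configuration letter takes one counter in its domain;
    "_" means nothing is assigned there.
    """
    used = Counter()
    for line in lines:
        domain_part = line.rsplit("/", 1)[1]
        for item in domain_part.split(";"):
            dom, _, flags = item.partition("=")
            used[int(dom)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


if __name__ == "__main__":
    print(counters_per_domain(ASSIGNMENTS))
```

Domain 0 needs four counters because group3 splits its traffic into separate r and w counters there, while domain 1 needs three, one each for group0, group1, and group2.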
>>>>>>>>>>>>
>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>
>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>
>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>
>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>
>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>
>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>
>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>
>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>
>>>>>>>>>> It is documented below.
>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>
>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>
>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>> just two.
>>>>>>>>>>
>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>
>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>> using it as is.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>
>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>
>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>
>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>> supported.
>>>>>>>>>
>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>
>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>
>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>
>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>
>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>> evolves.
>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>
>>>>>>>> # define event configurations
>>>>>>>>
>>>>>>>> ====    ============    ====================================================
>>>>>>>> Bits    Mnemonics       Description
>>>>>>>> ====    ============    ====================================================
>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>> ====    ============    ====================================================
>>>>>>>>
>>>>>>>> # Define flags based on combinations of the above event types.
>>>>>>>>
>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>> v = VictimBW
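
To make the flag definitions above concrete, here is a minimal Python sketch that turns each flag into a bitmask using the bit positions from the table. The names EVENT_BITS, FLAGS and mask() are made up for illustration only; none of this is resctrl code.

```python
# Bit positions taken from the event table above.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag letters as defined above (hypothetical container, illustration only).
FLAGS = {
    "t": "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    "l": "LclFill,LclNTWr,LclSlowFill",
    "r": "LclFill,RmtFill,LclSlowFill,RmtSlowFill",
    "w": "VictimBW,LclNTWr,RmtNTWr",
    "v": "VictimBW",
}


def mask(events: str) -> int:
    """Return the bitmask for a comma-separated list of event names."""
    value = 0
    for name in events.split(","):
        value |= 1 << EVENT_BITS[name.strip()]
    return value


if __name__ == "__main__":
    for flag, events in FLAGS.items():
        print(f"{flag} = {mask(events):#04x}")
```

In this sketch r and w together cover exactly the events of t, which is why t can be read as "r+w".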
>>>>>>>>
>>>>>>>> Peter suggested the following format earlier:
>>>>>>>>
>>>>>>>> /group0/0=t;1=t
>>>>>>>> /group1/0=t;1=t
>>>>>>>> /group2/0=_;1=t
>>>>>>>> /group3/0=rw;1=_
>>>>>>>
>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>> and cleaner to implement.
>>>>>>>
>>>>>>> Roughly what I had in mind:
>>>>>>>
>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>> want specific events and don't care about portability.
>>>>>>
>>>>>>
>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>
>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>
>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>
>>>>>> How many configurations should we allow? Do we know?
>>>>>
>>>>> Do we need an upper limit?
>>>>
>>>> I think so. This needs to be maintained in some data structure. We can
>>>> start with 2 default configurations for now.

There is a big difference between no upper limit and 2. The hardware is
capable of supporting per-domain configurations so more flexibility is
certainly possible. Consider the example presented by Peter in:
https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/

>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>> configure all domains the same for a group.
>>>>>>
>>>>>> What is the difference between shared and exclusive?
>>>>>
>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>> aggregation files[2].
>>>>>
>>>>> These do not need to be implemented immediately, but knowing that they
>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>> counters and reading their values.
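
The round-robin idea described above can be sketched roughly as follows. This is only an illustration of the scheduling policy as I read it, not the prototype's code: round_robin_schedule() and its parameters are hypothetical, and one "interval" stands for the time a counter stays assigned (for example long enough for one 1-second sample).

```python
from collections import deque


def round_robin_schedule(groups, num_counters, intervals):
    """Return, for each interval, the groups that hold a shared counter.

    groups:       groups that requested shared access in one domain
    num_counters: counters in the domain that are not exclusively assigned
    intervals:    number of scheduling intervals to simulate
    """
    queue = deque(groups)
    schedule = []
    for _ in range(intervals):
        # The groups at the front of the queue get a counter this interval.
        active = [queue[i] for i in range(min(num_counters, len(queue)))]
        schedule.append(active)
        # Move the groups that were just served to the back of the queue.
        queue.rotate(-len(active))
    return schedule


if __name__ == "__main__":
    for i, active in enumerate(round_robin_schedule(["a", "b", "c"], 2, 3)):
        print(i, active)
```

With three groups sharing two counters, every group is measured in two out of every three intervals.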
>>>>
>>>> Ok. Let's focus on exclusive assignments for now.
>>>>
>>>>>
>>>>>>
>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>
>>>>>> There should be a more efficient way to handle this.
>>>>>>
>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>
>>>>> I had rejected it due to the high-frequency of access of a large
>>>>> number of files, which has since been addressed by shared assignment
>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>
>>>> I think we should address this as well. Creating three extra files for
>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>
>>>>>
>>>>>>
>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>
>>>>>> That was another problem we need to address.
>>>>>
>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>> can read multiple sysfs nodes to remind themselves.
>>>>
>>>> I suggest we provide users with an option to list the assignments
>>>> of all groups in a single command. As the number of groups increases, it
>>>> becomes cumbersome to query each group individually.
>>>>
>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>> for this purpose. More details on this below.
>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>> which the count values can be read.
>>>>>>>
>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>> LclFill
>>>>>>> LclNTWr
>>>>>>> LclSlowFill
>>>>>>
>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>
>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>> only looking at struct kernfs_syscall_ops
>>>>>
>>>>>>
>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>> LclFill <-rename these to generic names.
>>>>>> LclNTWr
>>>>>> LclSlowFill
>>>>>>
>>>>>
>>>>> I think portable and non-portable event names should both be available
>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>> will be applied in general, but when they turn up an issue, it can
>>>>> often lead to a more focused investigation, requiring more precise
>>>>> events.
>>>>
>>>> I agree. We should provide both portable and non-portable event names.
>>>>
>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>> of the current interface. The idea here is to start with a basic assignment
>>>> feature with options to enhance it in the future. Feel free to
>>>> comment/suggest.
>>>>
>>>> 1. Event configurations will be in
>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>
>>>>      There will be two pre-defined configurations by default.
>>>>
>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>
>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>      LclFill, LclNTWr, LclSlowFill
>>>>
>>>> 2. Users will have options to update these configurations.
>>>>
>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>
>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>> reporting "local_bytes" any more. They report something different,
>> and users only know if they come to check the options currently
>> configured in this file. Changing the contents without changing
>> the name seems confusing to me.
> 
> It is the same behaviour right now with BMEC. It is configurable.
> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
> 
> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.

This could be supported by following Peter's original proposal where the name
of the counter configuration is provided by the user via a mkdir:
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/

As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.

> 
>>
>>>>
>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>      LclFill, LclNTWr, RmtFill
>>>>
>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>
>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>      mkdir /sys/fs/resctrl/test/
>>>>
>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>      e: Exclusive
>>>>      s: Shared
>>>>      u: Unassigned
>>>>
>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>> future.
>>>>
>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> to list the assignment state of all the groups.
>>>>
>>>>      Format:
>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>
>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>      test//mbm_total_bytes:0=e;1=e
>>>>      test//mbm_local_bytes:0=e;1=e
>>>>      //mbm_total_bytes:0=e;1=e
>>>>      //mbm_local_bytes:0=e;1=e

This would make mbm_assign_control even more unwieldy and quicker to exceed a
page of data (these examples never seem to reflect those AMD systems with the many
L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
and solved when/if going this route.

There seem to be two opinions about this file at the moment. Would it be possible to
summarize the discussion with pros/cons raised to make an informed selection?
I understand that Google as represented by Peter no longer requires/requests this
file but the motivation for this change seems new and does not seem to reduce the
original motivation for this file. We may also want to separate requirements for reading
from and writing to this file.
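
To give a rough feel for the size concern, here is a small Python sketch that builds a listing in the proposed format and measures it against a page. PAGE_SIZE, assign_control_listing() and the group names are assumptions for illustration only; the real file contents depend on the final format.

```python
PAGE_SIZE = 4096  # assumed 4KB page


def assign_control_listing(groups, configs, num_domains, state="e"):
    """Build a listing in the proposed
    "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<state>" format.

    groups is a list of (ctrl_mon_group, mon_group) name pairs; the default
    group is ("", "").
    """
    domains = ";".join(f"{d}={state}" for d in range(num_domains))
    lines = [f"{ctrl}/{mon}/{cfg}:{domains}"
             for ctrl, mon in groups
             for cfg in configs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    configs = ["mbm_total_bytes", "mbm_local_bytes"]
    groups = [(f"g{i}", "") for i in range(32)]
    listing = assign_control_listing(groups, configs, num_domains=16)
    print(len(listing), "bytes,", "exceeds" if len(listing) > PAGE_SIZE
          else "fits in", "one page")
```

In this sketch 32 groups with two configurations on a 16-domain system already come out well above 4096 bytes.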

>>>>
>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>
>>>>      Format:
>>>>      “<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>”
>>>>
>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>      test//mbm_total_bytes:0=u;1=u
>>>>      test//mbm_local_bytes:0=u;1=u
>>>>      //mbm_total_bytes:0=e;1=e
>>>>      //mbm_local_bytes:0=e;1=e
>>>>
>>>>      The corresponding events will be read in
>>>>
>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
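
As an illustration of the format in items 5 and 6, the following Python sketch parses one assignment line and lists the mon_data files where the events would be read. parse_assignment() and event_paths() are hypothetical helpers, and skipping unassigned ("u") domains is a choice made for this sketch, not something the proposal specifies.

```python
def parse_assignment(line):
    """Parse "<ctrl>/<mon>/<config>:<dom>=<state>;..." into its parts."""
    target, _, domain_part = line.strip().partition(":")
    ctrl, mon, config = target.split("/")
    states = {}
    for item in domain_part.split(";"):
        dom, _, state = item.partition("=")
        if state not in ("e", "s", "u"):
            raise ValueError(f"unknown assignment state: {state!r}")
        states[int(dom)] = state
    return ctrl, mon, config, states


def event_paths(ctrl, mon, config, states, root="/sys/fs/resctrl"):
    """Return the mon_data files for every domain that is not unassigned."""
    parts = [root]
    if ctrl:
        parts.append(ctrl)
    if mon:
        parts += ["mon_groups", mon]
    base = "/".join(parts)
    return [f"{base}/mon_data/mon_L3_{dom:02d}/{config}"
            for dom, state in sorted(states.items()) if state != "u"]


if __name__ == "__main__":
    parsed = parse_assignment("test//mbm_local_bytes:0=e;1=e")
    for path in event_paths(*parsed):
        print(path)
```

An empty CTRL_MON group name refers to the default group, so "//mbm_total_bytes:0=e;1=e" maps to the files directly under /sys/fs/resctrl/mon_data.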
>>>>
>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>> mbm_local_bytes) will be supported.
>>>>
>>>> 8. In the future, there will be options to create multiple configurations
>>>> and a corresponding directory will be created in
>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>
>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>> directory? Like this:
>>
>> # echo "LclFill, LclNTWr, RmtFill" >
>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>
>> This seems OK (dependent on the user picking meaningful names for
>> the set of attributes picked ... but if they want to name this
>> monitor file "brian" then they have to live with any confusion
>> that they bring on themselves).
>>
>> Would this involve an extension to kernfs? I don't see a function
>> pointer callback for file creation in kernfs_syscall_ops.
>>
>>>>
>>>
>>> I know you are all busy with multiple series going on in parallel. I am still
>>> waiting for input on this. It would be great if you could spend some time
>>> on this to see if we can find common ground on the interface.
>>>
>>> Thanks
>>> Babu
>>
>> -Tony
>>
> 
> 
> thanks
> Babu

Reinette

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi All,

On 3/10/25 22:51, Reinette Chatre wrote:
> 
> 
> On 3/10/25 6:44 PM, Moger, Babu wrote:
>> Hi Tony,
>>
>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>> Hi All,
>>>>
>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>> Hi Peter,
>>>>>
>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>> Hi Babu,
>>>>>>
>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>
>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>> Hi Babu,
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
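
The counter totals above can be checked with a short Python sketch. It assumes, as in the example, that each flag letter assigned to a group in a domain uses one counter and that "_" assigns nothing; counters_per_domain() is a made-up helper name.

```python
from collections import Counter


def counters_per_domain(assignments):
    """Count the counters used in each domain.

    Each flag letter assigned to a group in a domain consumes one counter;
    "_" means nothing is assigned there.
    """
    used = Counter()
    for line in assignments:
        _group, _, domain_part = line.strip("/").partition("/")
        for item in domain_part.split(";"):
            dom, _, flags = item.partition("=")
            used[int(dom)] += len(flags.replace("_", ""))
    return dict(used)


if __name__ == "__main__":
    example = [
        "/group0/0=t;1=t",
        "/group1/0=t;1=t",
        "/group2/0=_;1=t",
        "/group3/0=rw;1=_",
    ]
    print(counters_per_domain(example))
```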
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>
>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>
>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>
>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>> be configured differently in each domain.
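
A small Python sketch of what "same flag, different configuration per domain" means in practice. The bit positions follow the BMEC event table quoted earlier in this thread (LclFill is bit 0 up to VictimBW at bit 6); T_CONFIG and resolve() are illustrative names only.

```python
# Bit positions as in the event table quoted earlier in the thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Per-domain definition of the "t" flag from the example above.
T_CONFIG = {
    0: "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    1: "LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr",
}


def resolve(flag_config, domain):
    """Return the event bitmask a flag stands for in the given domain."""
    value = 0
    for name in flag_config[domain].split(","):
        value |= 1 << EVENT_BITS[name]
    return value


if __name__ == "__main__":
    for domain in sorted(T_CONFIG):
        print(f"t(domain {domain}) = {resolve(T_CONFIG, domain):#04x}")
```

Both groups are assigned "t" in both domains, yet a counter in domain 1 would not count the slow-memory fills that a counter in domain 0 does.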
>>>>>>>>>>>>>
>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>
>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>
>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>
>>>>>>>>>>> It is documented below.
>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>
>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>
>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>> just two.
>>>>>>>>>>>
>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>
>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>> using it as is.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>
>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>
>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>
>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>> supported.
>>>>>>>>>>
>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>
>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>
>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>
>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>
>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>> evolves.
>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>
>>>>>>>>> # define event configurations
>>>>>>>>>
>>>>>>>>> ====    ============    ==================================================
>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>> ====    ============    ==================================================
>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>> ====    ============    ==================================================
>>>>>>>>>
>>>>>>>>> #Define flags based on combination of above event types.
>>>>>>>>>
>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> v = VictimBW
>>>>>>>>>
>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>
>>>>>>>>> /group0/0=t;1=t
>>>>>>>>> /group1/0=t;1=t
>>>>>>>>> /group2/0=_;1=t
>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>
>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>> and cleaner to implement.
>>>>>>>>
>>>>>>>> Roughly what I had in mind:
>>>>>>>>
>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>> want specific events and don't care about portability.
>>>>>>>
>>>>>>>
>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>
>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>
>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>
>>>>>>> How many configurations should we allow? Do we know?
>>>>>>
>>>>>> Do we need an upper limit?
>>>>>
>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>> start with 2 default configurations for now.
> 
> There is a big difference between no upper limit and 2. The hardware is
> capable of supporting per-domain configurations so more flexibility is
> certainly possible. Consider the example presented by Peter in:
> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
> 
>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>> configure all domains the same for a group.
>>>>>>>
>>>>>>> What is the difference between shared and exclusive?
>>>>>>
>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>> aggregation files[2].
>>>>>>
>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>> counters and reading their values.
>>>>>
>>>>> OK. Let's focus on exclusive assignments for now.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>
>>>>>>> There should be a more efficient way to handle this.
>>>>>>>
>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>
>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>> number of files, which has since been addressed by shared assignment
>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>
>>>>> I think we should address this as well. Creating three extra files for
>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>
>>>>>>> That was another problem we need to address.
>>>>>>
>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>
>>>>> I suggest we provide users with an option to list the assignments
>>>>> of all groups in a single command. As the number of groups increases, it
>>>>> becomes cumbersome to query each group individually.
>>>>>
>>>>> To achieve this, we can reuse our existing mbm_assign_control interface.
>>>>> More details on this below.
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>> which the count values can be read.
>>>>>>>>
>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>> LclFill
>>>>>>>> LclNTWr
>>>>>>>> LclSlowFill
>>>>>>>
>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>
>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>
>>>>>>>
>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>> LclFill <-rename these to generic names.
>>>>>>> LclNTWr
>>>>>>> LclSlowFill
>>>>>>>
>>>>>>
>>>>>> I think portable and non-portable event names should both be available
>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>> events.
>>>>>
>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>
>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>> of the current interface. The idea is to start with a basic assignment
>>>>> feature, with options to enhance it in the future. Feel free to
>>>>> comment/suggest.
>>>>>
>>>>> 1. Event configurations will be in
>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>
>>>>>      There will be two pre-defined configurations by default.
>>>>>
>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>
>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>
>>>>> 2. Users will have options to update these configurations.
>>>>>
>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>
>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>> reporting "local_bytes" any more. They report something different,
>>> and users only know if they come to check the options currently
>>> configured in this file. Changing the contents without changing
>>> the name seems confusing to me.
>>
>> It is the same behaviour right now with BMEC. It is configurable.
>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>
>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
> 
> This could be supported by following Peter's original proposal where the name
> of the counter configuration is provided by the user via a mkdir:
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.

Sure. We can do that. My thinking was that the first phase would provide
only the default pre-defined configurations and the option to update them.

We can add the mkdir support later. That way we can provide basic ABMC
support without the added code complexity of mkdir support.
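
For illustration only, the two pre-defined configurations could be modelled
as sets of event names mapped onto the bit positions and flag definitions
quoted earlier in this thread. This is a user-space Python sketch, not
kernel code, and the names EVENT_BITS, parse_config, format_config and
DEFAULT_CONFIGS are hypothetical.

```python
# Bit positions taken from the event table quoted earlier in this thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}


def parse_config(text):
    """Turn a comma-separated event list into a bitmask.

    Whitespace around names is ignored; unknown names raise ValueError.
    """
    mask = 0
    for name in text.split(","):
        name = name.strip()
        if not name:
            continue
        if name not in EVENT_BITS:
            raise ValueError(f"unknown event: {name}")
        mask |= 1 << EVENT_BITS[name]
    return mask


def format_config(mask):
    """Turn a bitmask back into a comma-separated list, lowest bit first."""
    ordered = sorted(EVENT_BITS.items(), key=lambda kv: kv[1])
    return ",".join(name for name, bit in ordered if mask & (1 << bit))


# Hypothetical defaults, following the "t" and "l" flag definitions above.
DEFAULT_CONFIGS = {
    "mbm_total_bytes": parse_config(
        "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr"
    ),
    "mbm_local_bytes": parse_config("LclFill, LclNTWr, LclSlowFill"),
}
```

Updating a configuration would then amount to replacing the stored mask,
for example with parse_config("LclFill, LclNTWr, RmtFill").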

> 
>>
>>>
>>>>>
>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>      LclFill, LclNTWr, RmtFill
>>>>>
>>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>>
>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>
>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>      e: Exclusive
>>>>>      s: Shared
>>>>>      u: Unassigned
>>>>>
>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>> future.
>>>>>
>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>> to list the assignment state of all the groups.
>>>>>
>>>>>      Format:
>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>
>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>      //mbm_local_bytes:0=e;1=e
> 
> This would make mbm_assign_control even more unwieldy and quicker to exceed a
> page of data (these examples never seem to reflect those AMD systems with the many
> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
> and solved when/if going this route.

This problem is not specific to this series. I feel it is a generic problem
shared by many similar interfaces. I don't know how it is addressed
elsewhere, so I may have to investigate. Any pointers would be helpful.


> 
> There seem to be two opinions about this file at the moment. Would it be possible to
> summarize the discussion with pros/cons raised to make an informed selection?
> I understand that Google as represented by Peter no longer requires/requests this
> file but the motivation for this change seems new and does not seem to reduce the
> original motivation for this file. We may also want to separate requirements for reading
> from and writing to this file.

Yea. We can just use mbm_assign_control for reading the assignment states.

Summary: We have two proposals.

First one from Peter:

https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/


Pros:
a.  Allows flexible creation of free-form names for assignable
configurations, stored in info/L3_MON/counter_configs/.

b.  Events can be accessed using corresponding free-form names in the
mon_data directory, making it clear to users what each event represents.


Cons:
a. Requires three separate files for assignment in each group
(assign_exclusive, assign_shared, unassign), which might be excessive.

b. No built-in listing support, meaning users must query each group
individually to check assignment states.


Second Proposal (Mine)

https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/

Pros:

a. Maintains the flexibility of free-form names for assignable
configurations (info/L3_MON/counter_configs/).

b. Events remain accessible via free-form names in mon_data, ensuring
clarity on their purpose.

c. Adds the ability to list assignment states for all groups in a single
command.

Cons:
a.  Potential buffer overflow issues when handling a large number of
groups and domains, plus the code complexity needed to fix them.


Third Option: A Hybrid Approach

We could combine elements from both proposals:

a. Retain the free-form naming approach for assignable configurations in
info/L3_MON/counter_configs/.

b. Use the assignment method from the first proposal:
   $mkdir test
   $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive

c. Introduce listing support via the info/L3_MON/mbm_assign_control
interface, enabling users to read assignment states for all groups in one
place. The file would support reads only.
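
If mbm_assign_control is used only for listing, a user-space reader of the
proposed format could look roughly like the sketch below. It assumes the
"<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
format and the e/s/u states from the proposal quoted in this thread, none of
which is a merged interface. parse_assign_line is a hypothetical name.

```python
VALID_STATES = {"e", "s", "u"}  # exclusive, shared, unassigned


def parse_assign_line(line):
    """Parse one listing line into (ctrl_group, mon_group, config, states).

    states maps each domain id (int) to its one-letter assignment state.
    Malformed lines raise ValueError.
    """
    path, sep, assignments = line.strip().rpartition(":")
    if not sep:
        raise ValueError(f"missing ':' in line: {line!r}")
    # An empty CTRL_MON or MON group name refers to the default group.
    ctrl_group, mon_group, config = path.split("/")
    states = {}
    for item in assignments.split(";"):
        dom, _, state = item.partition("=")
        if state not in VALID_STATES:
            raise ValueError(f"bad assignment state in: {item!r}")
        states[int(dom)] = state
    return ctrl_group, mon_group, config, states
```

Reading the whole file then reduces to applying parse_assign_line to each
line, which keeps per-group queries out of the common path.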


> 
>>>>>
>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>
>>>>>      Format:
>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>
>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>
>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>
>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>
>>>>>      The corresponding events will be read in
>>>>>
>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>
>>>>> 7. In the first stage, only two configurations(mbm_total_bytes and
>>>>> mbm_local_bytes) will be supported.
>>>>>
>>>>> 8. In the future, there will be options to create multiple configurations,
>>>>> and a corresponding directory will be created in
>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>
>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>> directory? Like this:
>>>
>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>
>>> This seems OK (dependent on the user picking meaningful names for
>>> the set of attributes picked ... but if they want to name this
>>> monitor file "brian" then they have to live with any confusion
>>> that they bring on themselves).
>>>
>>> Would this involve an extension to kernfs? I don't see a function
>>> pointer callback for file creation in kernfs_syscall_ops.
>>>
>>>>>
>>>>
>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>> waiting for input on this. It would be great if you could spend some time
>>>> on this to see if we can find common ground on the interface.
>>>>
>>>> Thanks
>>>> Babu
>>>
>>> -Tony
>>>
>>
>>
>> thanks
>> Babu
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months ago
Hi Babu,

On 3/11/25 1:35 PM, Moger, Babu wrote:
> Hi All,
> 
> On 3/10/25 22:51, Reinette Chatre wrote:
>>
>>
>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>> Hi Tony,
>>>
>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>> Hi All,
>>>>>
>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>
>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
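
The counter totals quoted above can be checked with a small sketch. It assumes, purely for illustration, that each flag letter other than "_" occupies one counter in the domain it is assigned to; the function name is hypothetical and is not part of resctrl.

```python
from collections import Counter


def counters_per_domain(lines):
    """Count the counters consumed in each domain.

    Each line looks like "/group3/0=rw;1=_": a group path followed by
    semicolon-separated "<domain>=<flags>" entries. Assumption: every
    flag letter other than "_" occupies one counter in that domain.
    """
    used = Counter()
    for line in lines:
        # Everything after the last "/" is the per-domain assignment list.
        _group, _, assignments = line.strip().rpartition("/")
        for entry in assignments.split(";"):
            domain, flags = entry.split("=")
            used[int(domain)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


example = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]
print(counters_per_domain(example))
```

Under that assumption the sketch reports 4 counters in domain 0 and 3 in domain 1, which agrees with the totals in the message.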
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>
>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>
>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>
>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>> just two.
>>>>>>>>>>>>
>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>
>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>
>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>
>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>> supported.
>>>>>>>>>>>
>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>
>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>
>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>
>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>
>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>> evolves.
>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>
>>>>>>>>>> # define event configurations
>>>>>>>>>>
>>>>>>>>>> ====    ===========     ==================================================
>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>> ====    ===========     ==================================================
>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>> ====    ===========     ==================================================
>>>>>>>>>>
>>>>>>>>>> # Define flags based on combinations of the above event types.
>>>>>>>>>>
>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> v = VictimBW
>>>>>>>>>>
>>>>>>>>>> Peter suggested the following format earlier:
>>>>>>>>>>
>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>> /group3/0=rw;1=_
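
As a rough illustration of how the flag letters above relate to the event bits, the sketch below turns each flag's event list into a bitmask. The bit positions are taken from the table in the message (with bit 1 read as RmtFill); the helper name and the dictionary layout are assumptions made for this example only.

```python
# Bit positions from the event table in the message (bit 1 read as RmtFill).
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag definitions as proposed in the message.
FLAGS = {
    "t": "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    "l": "LclFill, LclNTWr, LclSlowFill",
    "r": "LclFill,RmtFill,LclSlowFill,RmtSlowFill",
    "w": "VictimBW,LclNTWr,RmtNTWr",
    "v": "VictimBW",
}


def event_mask(events):
    """Return the bitmask for a comma-separated list of event mnemonics."""
    mask = 0
    for name in events.split(","):
        mask |= 1 << EVENT_BITS[name.strip()]
    return mask


masks = {flag: event_mask(events) for flag, events in FLAGS.items()}
for flag, mask in masks.items():
    print(f"{flag} = {mask:#04x}")
```

With these definitions, "r" and "w" do not overlap and together cover the same bits as "t", which is consistent with describing "t" as effectively r+w.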
>>>>>>>>>
>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>> and cleaner to implement.
>>>>>>>>>
>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>
>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>
>>>>>>>>
>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>
>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>
>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>
>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>
>>>>>>> Do we need an upper limit?
>>>>>>
>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>> start with 2 default configurations for now.
>>
>> There is a big difference between no upper limit and 2. The hardware is
>> capable of supporting per-domain configurations so more flexibility is
>> certainly possible. Consider the example presented by Peter in:
>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>
>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>> configure all domains the same for a group.
>>>>>>>>
>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>
>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>> aggregation files[2].
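
The round-robin idea described above can be pictured with a toy scheduler. This is only a sketch of the policy, not the kernel implementation: the function name, the group names, and the notion of a fixed "round" are assumptions for illustration.

```python
from collections import deque


def round_robin_schedule(groups, num_counters, rounds):
    """Yield, for each round, the groups that hold a shared counter.

    Assumption: in every round the first `num_counters` groups in the
    queue get a counter, then move to the back of the queue.
    """
    queue = deque(groups)
    for _ in range(rounds):
        holders = [queue[i] for i in range(min(num_counters, len(queue)))]
        yield holders
        # Rotate so the groups that just held a counter wait their turn.
        queue.rotate(-len(holders))


for holders in round_robin_schedule(["g0", "g1", "g2", "g3", "g4"], 2, 3):
    print(holders)
```

In the message's terms, each round would last long enough to produce one bandwidth sample for the groups currently holding a counter.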
>>>>>>>
>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>> counters and reading their values.
>>>>>>
>>>>>> OK. Let's focus on exclusive assignments for now.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>
>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>
>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>
>>>>>>> I had rejected it due to the high frequency of access of a large
>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>
>>>>>> I think we should address this as well. Creating three extra files for
>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>
>>>>>>>> That is another problem we need to address.
>>>>>>>
>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>
>>>>>> I suggest we provide users with an option to list the assignments
>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>> becomes cumbersome to query each group individually.
>>>>>>
>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>> for this purpose. More details on this below.
>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>> which the count values can be read.
>>>>>>>>>
>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>> LclFill
>>>>>>>>> LclNTWr
>>>>>>>>> LclSlowFill
>>>>>>>>
>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>
>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>> only looking at struct kernfs_syscall_ops.
>>>>>>>
>>>>>>>>
>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>> LclNTWr
>>>>>>>> LclSlowFill
>>>>>>>>
>>>>>>>
>>>>>>> I think portable and non-portable event names should both be available
>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>> events.
>>>>>>
>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>
>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>> feature with options to enhance it in the future. Feel free to
>>>>>> comment/suggest.
>>>>>>
>>>>>> 1. Event configurations will be in
>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>
>>>>>>      There will be two pre-defined configurations by default.
>>>>>>
>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>
>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>
>>>>>> 2. Users will have options to update these configurations.
>>>>>>
>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>
>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>> reporting "local_bytes" any more. They report something different,
>>>> and users only know if they come to check the options currently
>>>> configured in this file. Changing the contents without changing
>>>> the name seems confusing to me.
>>>
>>> It is the same behaviour right now with BMEC. It is configurable.
>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>
>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>
>> This could be supported by following Peter's original proposal where the name
>> of the counter configuration is provided by the user via a mkdir:
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
> 
> Sure. We can do that. I was thinking in the first phase, just provide the
> default pre-defined configuration and option to update the configuration.
> 
> We can add the mkdir support later. That way we can provide basic ABMC
> support without too much code complexity with mkdir support.

It is not clear to me how you envision the "first phase". Is it what you
proposed above, for example:
      #echo "LclFill, LclNTWr, RmtFill" >
         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes

In the above, the counter configuration name is a file.

How could mkdir support be added to this later if there are already files present?

> 
>>
>>>
>>>>
>>>>>>
>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>
>>>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>>>
>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>
>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>      e: Exclusive
>>>>>>      s: Shared
>>>>>>      u: Unassigned
>>>>>>
>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>> future.
>>>>>>
>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>> to list the assignment state of all the groups.
>>>>>>
>>>>>>      Format:
>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>
>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>      //mbm_local_bytes:0=e;1=e
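
To make the proposed listing format concrete, here is a minimal parsing sketch. It assumes group and configuration names contain neither "/" nor ":", and that an empty field denotes the default group; the function name and the returned dictionary layout are hypothetical.

```python
def parse_assign_line(line):
    """Parse one line of the proposed listing format:

        <CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<state>;...

    Assumptions: names contain neither "/" nor ":", and an empty group
    field refers to the default group.
    """
    path, _, domains = line.strip().rpartition(":")
    ctrl_mon, mon, config = path.split("/")
    states = {}
    for entry in domains.split(";"):
        domain_id, state = entry.split("=")
        states[int(domain_id)] = state
    return {"ctrl_mon": ctrl_mon, "mon": mon, "config": config, "domains": states}


for text in ("test//mbm_total_bytes:0=e;1=e", "//mbm_local_bytes:0=e;1=u"):
    print(parse_assign_line(text))
```

The state letters are carried through unchanged, so the same sketch handles "e", "s", and "u" as listed in item 4.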
>>
>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>> page of data (these examples never seem to reflect those AMD systems with the many
>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>> and solved when/if going this route.
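
The size concern can be estimated with a back-of-the-envelope sketch. It assumes the listing format proposed in item 5, CTRL_MON groups named "group0" to "group31" with no MON subgroups, two configurations, and 12 domains (the group and domain counts mentioned earlier in the thread); all of these are illustrative choices, not measurements.

```python
PAGE_SIZE = 4096  # the 4KB limit discussed above


def listing_size(groups, configs, num_domains, state="e"):
    """Return the byte length of a listing in the proposed format.

    `groups` is a list of (ctrl_mon, mon) name pairs; one line is
    emitted per group and configuration.
    """
    total = 0
    for ctrl_mon, mon in groups:
        for config in configs:
            domains = ";".join(f"{d}={state}" for d in range(num_domains))
            total += len(f"{ctrl_mon}/{mon}/{config}:{domains}\n")
    return total


groups = [(f"group{i}", "") for i in range(32)]
configs = ["mbm_total_bytes", "mbm_local_bytes"]
size = listing_size(groups, configs, num_domains=12)
print(size, "bytes;", "exceeds" if size > PAGE_SIZE else "fits in", "one page")
```

Under these assumptions the listing is already larger than one page, and each additional configuration or domain makes every line longer.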
> 
> This problem is not specific to this series. I feel it is a generic problem
> for many similar interfaces. I don't know how it is addressed. I may
> have to investigate this. Any pointers would be helpful.

Dave Martin already did a lot of analysis here. What other pointers do you need?

> 
> 
>>
>> There seem to be two opinions about this file at the moment. Would it be possible to
>> summarize the discussion with pros/cons raised to make an informed selection?
>> I understand that Google as represented by Peter no longer requires/requests this
>> file but the motivation for this change seems new and does not seem to reduce the
>> original motivation for this file. We may also want to separate requirements for reading
>> from and writing to this file.
> 
> Yea. We can just use mbm_assign_control for reading the assignment states.
> 
> Summary: We have two proposals.
> 
> First one from Peter:
> 
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> 
> Pros:
> a.  Allows flexible creation of free-form names for assignable
> configurations, stored in info/L3_MON/counter_configs/.
> 
> b.  Events can be accessed using corresponding free-form names in the
> mon_data directory, making it clear to users what each event represents.
> 
> 
> Cons:
> a. Requires three separate files for assignment in each group
> (assign_exclusive, assign_shared, unassign), which might be excessive.
> 
> b. No built-in listing support, meaning users must query each group
> individually to check assignment states.
> 
> 
> Second Proposal (Mine)
> 
> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
> 
> Pros:
> 
> a. Maintains the flexibility of free-form names for assignable
> configurations (info/L3_MON/counter_configs/).
> 
> b. Events remain accessible via free-form names in mon_data, ensuring
> clarity on their purpose.
> 
> c. Adds the ability to list assignment states for all groups in a single
> command.
> 
> Cons:
> a.  Potential buffer overflow issues when handling a large number of
> groups and domains and code complexity to fix the issue.
> 
> 
> Third Option: A Hybrid Approach
> 
> We could combine elements from both proposals:
> 
> a. Retain the free-form naming approach for assignable configurations in
> info/L3_MON/counter_configs/.
> 
> b. Use the assignment method from the first proposal:
>    $mkdir test
>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
> 
> c. Introduce listing support via the info/L3_MON/mbm_assign_control
> interface, enabling users to read assignment states for all groups in one
> place. Only reading support.
> 
> 
>>
>>>>>>
>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>
>>>>>>      Format:
>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>
>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>
>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>
>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>
>>>>>>      The corresponding events will be read in
>>>>>>
>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>
>>>>>> 7. In the first stage, only two configurations(mbm_total_bytes and
>>>>>> mbm_local_bytes) will be supported.
>>>>>>
>>>>>> 8. In the future, there will be options to create multiple configurations,
>>>>>> and a corresponding directory will be created in
>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>
>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>> directory? Like this:
>>>>
>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>
>>>> This seems OK (dependent on the user picking meaningful names for
>>>> the set of attributes picked ... but if they want to name this
>>>> monitor file "brian" then they have to live with any confusion
>>>> that they bring on themselves).
>>>>
>>>> Would this involve an extension to kernfs? I don't see a function
>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>
>>>>>>
>>>>>
>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>> waiting for input on this. It would be great if you could spend some time
>>>>> on this to see if we can find common ground on the interface.
>>>>>
>>>>> Thanks
>>>>> Babu
>>>>
>>>> -Tony
>>>>
>>>
>>>
>>> thanks
>>> Babu
>>
>> Reinette
>>
>>
> 

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi Reinette,

On 3/12/25 10:07, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/11/25 1:35 PM, Moger, Babu wrote:
>> Hi All,
>>
>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>
>>>
>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>> Hi Tony,
>>>>
>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>> Hi Babu,
>>>>>>>>
>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>
>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>> understand it, the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>>>>>>>
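As a rough illustration only (not part of any proposed kernel interface), the
counter usage in the example above can be tallied with a short Python sketch.
It assumes, as the example implies, that each flag letter consumes one counter
in its domain and that "_" consumes none. The function name is made up for
this sketch.

```python
from collections import Counter

# Assignment lines from the example: /<group>/<domain>=<flags>;...
# Each flag letter names one counter configuration; "_" means unassigned.
EXAMPLE = """\
/group0/0=t;1=t
/group1/0=t;1=t
/group2/0=_;1=t
/group3/0=rw;1=_
"""

def counters_per_domain(text):
    """Count counters consumed in each domain, one per assigned flag."""
    used = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # "/group0/0=t;1=t" -> "", "group0", "0=t;1=t"
        _, _group, assignments = line.split("/", 2)
        for item in assignments.split(";"):
            domain, flags = item.split("=")
            used[int(domain)] += sum(1 for f in flags if f != "_")
    return dict(used)

usage = counters_per_domain(EXAMPLE)
print(usage)
```

The "rw" assignment for group3 is what costs domain 0 its extra counter.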
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>
>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>> supported.
>>>>>>>>>>>>
>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>
>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>
>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>
>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>> evolves.
>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>
>>>>>>>>>>> # define event configurations
>>>>>>>>>>>
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>>
>>>>>>>>>>> #Define flags based on combination of above event types.
>>>>>>>>>>>
>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> v = VictimBW
>>>>>>>>>>>
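For illustration, the flag definitions above can be turned into event bitmasks
using the bit positions from the table. This is only a Python sketch; the
dictionary and function names are assumptions made for the example and do not
correspond to anything in the series.

```python
# Bit positions taken from the event table above.
EVENT_BITS = {
    "LclFill": 0, "RmtFill": 1, "LclNTWr": 2, "RmtNTWr": 3,
    "LclSlowFill": 4, "RmtSlowFill": 5, "VictimBW": 6,
}

# Flag definitions as proposed above.
FLAGS = {
    "t": "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    "l": "LclFill, LclNTWr, LclSlowFill",
    "r": "LclFill,RmtFill,LclSlowFill,RmtSlowFill",
    "w": "VictimBW,LclNTWr,RmtNTWr",
    "v": "VictimBW",
}

def event_mask(names):
    """OR together the bits of a comma-separated list of event mnemonics."""
    mask = 0
    for name in names.split(","):
        mask |= 1 << EVENT_BITS[name.strip()]
    return mask

masks = {flag: event_mask(names) for flag, names in FLAGS.items()}
for flag, mask in masks.items():
    print(f"{flag} = {mask:#04x}")
```

With these definitions "r" and "w" do not overlap and together cover the same
events as "t".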
>>>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>>>
>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>
>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>
>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>
>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>
>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>
>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>
>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>
>>>>>>>> Do we need an upper limit?
>>>>>>>
>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>> start with 2 default configurations for now.
>>>
>>> There is a big difference between no upper limit and 2. The hardware is
>>> capable of supporting per-domain configurations so more flexibility is
>>> certainly possible. Consider the example presented by Peter in:
>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>
>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>
>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>
>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>> aggregation files[2].
>>>>>>>>
>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>> counters and reading their values.
>>>>>>>
>>>>>>> Ok. Let's focus on exclusive assignments for now.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>
>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>
>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>
>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>
>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>
>>>>>>>>> That was another problem we need to address.
>>>>>>>>
>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>
>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>
>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>>> for this purpose. More details on this below.
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>> which the count values can be read.
>>>>>>>>>>
>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>> LclFill
>>>>>>>>>> LclNTWr
>>>>>>>>>> LclSlowFill
>>>>>>>>>
>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>
>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>>>
>>>>>>>>>
>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>> LclNTWr
>>>>>>>>> LclSlowFill
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>> events.
>>>>>>>
>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>
>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>> feature with options to enhance it in the future. Feel free to
>>>>>>> comment/suggest.
>>>>>>>
>>>>>>> 1. Event configurations will be in
>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>
>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>
>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>
>>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>
>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>> reporting "local_bytes" any more. They report something different,
>>>>> and users only know if they come to check the options currently
>>>>> configured in this file. Changing the contents without changing
>>>>> the name seems confusing to me.
>>>>
>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>
>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>
>>> This could be supported by following Peter's original proposal where the name
>>> of the counter configuration is provided by the user via a mkdir:
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>
>> Sure. We can do that. I was thinking in the first phase, just provide the
>> default pre-defined configuration and option to update the configuration.
>>
>> We can add the mkdir support later. That way we can provide basic ABMC
>> support without too much code complexity with mkdir support.
> 
> It is not clear to me how you envision the "first phase". Is it what you
> proposed above, for example:
>       #echo "LclFill, LclNTWr, RmtFill" >
>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> 
> In above the counter configuration name is a file. 

Yes. That is correct.

There will be two configuration files by default when resctrl is mounted
with ABMC enabled.
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes

> 
> How could mkdir support be added to this later if there are already files present?

We already have these directories when resctrl is mounted.
/sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
/sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
/sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
/sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes

We don't need "mkdir" support for the default configurations.

My plan was to support only the default configurations in the first phase.
That way there is no difference in the usage model with ABMC when mounted.


> 
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>
>>>>>>> 3. The default configurations will be used when user mounts the resctrl.
>>>>>>>
>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>
>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>      e: Exclusive
>>>>>>>      s: Shared
>>>>>>>      u: Unassigned
>>>>>>>
>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>> future.
>>>>>>>
>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>> to list the assignment state of all the groups.
>>>>>>>
>>>>>>>      Format:
>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>
>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>
>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>> page of data (these examples never seem to reflect those AMD systems with the many
>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>> and solved when/if going this route.
>>
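To get a feel for the size concern, here is a small Python sketch that
generates a listing in the format proposed in item 5 and compares it with a
4KB page. The "32 groups, 12 domains" figures are borrowed from the earlier
file-count example in this thread; the group names and the helper function are
hypothetical.

```python
PAGE_SIZE = 4096  # bytes; the 4KB limit mentioned above

def listing(groups, num_domains, configs, state="e"):
    """Build lines of the form
    <CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"""
    domains = ";".join(f"{d}={state}" for d in range(num_domains))
    lines = [f"{ctrl}/{mon}/{cfg}:{domains}"
             for ctrl, mon in groups for cfg in configs]
    return "\n".join(lines) + "\n"

configs = ["mbm_total_bytes", "mbm_local_bytes"]

# The two-domain example from the proposal: "test" plus the default group.
small = listing([("test", ""), ("", "")], 2, configs)

# Hypothetical larger system: 32 CTRL_MON groups and 12 domains.
big = listing([(f"group{i}", "") for i in range(32)], 12, configs)

print(len(small), len(big), len(big) > PAGE_SIZE)
```

Even with only the two default configurations, the larger case does not fit in
a single page.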
>> This problem is not specific to this series. I feel it is a generic problem
>> for many similar interfaces. I don't know how it is addressed. I may
>> have to investigate this. Any pointers would be helpful.
> 
> Dave Martin already did a lot of analysis here. What other pointers do you need?
> 
>>
>>
>>>
>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>> summarize the discussion with pros/cons raised to make an informed selection?
>>> I understand that Google as represented by Peter no longer requires/requests this
>>> file but the motivation for this change seems new and does not seem to reduce the
>>> original motivation for this file. We may also want to separate requirements for reading
>>> from and writing to this file.
>>
>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>
>> Summary: We have two proposals.
>>
>> First one from Peter:
>>
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>>
>> Pros
>> a.  Allows flexible creation of free-form names for assignable
>> configurations, stored in info/L3_MON/counter_configs/.
>>
>> b.  Events can be accessed using corresponding free-form names in the
>> mon_data directory, making it clear to users what each event represents.
>>
>>
>> Cons:
>> a. Requires three separate files for assignment in each group
>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>
>> b. No built-in listing support, meaning users must query each group
>> individually to check assignment states.
>>
>>
>> Second Proposal (Mine)
>>
>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>
>> Pros:
>>
>> a. Maintains the flexibility of free-form names for assignable
>> configurations (info/L3_MON/counter_configs/).
>>
>> b. Events remain accessible via free-form names in mon_data, ensuring
>> clarity on their purpose.
>>
>> c. Adds the ability to list assignment states for all groups in a single
>> command.
>>
>> Cons:
>> a.  Potential buffer overflow issues when handling a large number of
>> groups and domains and code complexity to fix the issue.
>>
>>
>> Third Option: A Hybrid Approach
>>
>> We could combine elements from both proposals:
>>
>> a. Retain the free-form naming approach for assignable configurations in
>> info/L3_MON/counter_configs/.
>>
>> b. Use the assignment method from the first proposal:
>>    $mkdir test
>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>
>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>> interface, enabling users to read assignment states for all groups in one
>> place. Only reading support.
>>
>>
>>>
>>>>>>>
>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>
>>>>>>>      Format:
>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>
>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>
>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>
>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>
>>>>>>>      The corresponding events will be read in
>>>>>>>
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>
>>>>>>> 7. In the first stage, only two configurations(mbm_total_bytes and
>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>
>>>>>>> 8. In the future, there will be options to create multiple configurations
>>>>>>> and a corresponding directory will be created in
>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>
>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>> directory? Like this:
>>>>>
>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>
>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>> the set of attributes picked ... but if they want to name this
>>>>> monitor file "brian" then they have to live with any confusion
>>>>> that they bring on themselves).
>>>>>
>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>
>>>>>>>
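The mbm_assign_control line format described in items 5 and 6 of the proposal
could be parsed roughly as in the Python sketch below. This is only meant to
make the field layout concrete; the function name and the error handling are
assumptions, and the kernel-side parsing would of course be done in C.

```python
VALID_STATES = {"e", "s", "u"}  # exclusive, shared, unassigned

def parse_assign_line(line):
    """Split '<CTRL_MON group>/<MON group>/<configuration>:<dom>=<state>;...'
    into (ctrl_group, mon_group, configuration, {domain_id: state})."""
    ctrl, mon, rest = line.strip().split("/", 2)
    config, sep, assignments = rest.partition(":")
    if not sep or not config:
        raise ValueError(f"missing configuration name in {line!r}")
    states = {}
    for item in assignments.split(";"):
        domain, state = item.split("=")
        if state not in VALID_STATES:
            raise ValueError(f"unknown assignment state {state!r}")
        states[int(domain)] = state
    return ctrl, mon, config, states

# Empty group names refer to the default group, as in the listings above.
print(parse_assign_line("test//mbm_local_bytes:0=e;1=e"))
print(parse_assign_line("//mbm_total_bytes:0=e;1=u"))
```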
>>>>>>
>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>> waiting for input on this. It would be great if you could spend some time
>>>>>> on this to see if we can find common ground on the interface.
>>>>>>
>>>>>> Thanks
>>>>>> Babu
>>>>>
>>>>> -Tony
>>>>>
>>>>
>>>>
>>>> thanks
>>>> Babu
>>>
>>> Reinette
>>>
>>>
>>
> 
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months ago
Hi Babu,

On 3/12/25 9:03 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 3/12/25 10:07, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 3/11/25 1:35 PM, Moger, Babu wrote:
>>> Hi All,
>>>
>>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>>
>>>>
>>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>>> Hi Tony,
>>>>>
>>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>>> Hi Babu,
>>>>>>>>>
>>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - group2 is restricted to domain 0
>>>>>>>>>>>>>>>>> - group3 is restricted to domain 1
>>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
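
To double-check the counter arithmetic in the example above, here is a rough Python sketch (not part of the series). It assumes the line format shown in the example, that every flag letter costs one counter in that domain, and that "_" means no counter is assigned. The helper name counters_per_domain is made up for illustration.

```python
from collections import Counter

def counters_per_domain(lines):
    """Count counters consumed in each domain.

    Assumes lines look like "/<group>/<domain>=<flags>;<domain>=<flags>",
    one counter per flag letter, and "_" meaning "nothing assigned".
    """
    used = Counter()
    for line in lines:
        # Everything after the last "/" is the per-domain assignment list.
        _, _, assignments = line.strip().rpartition("/")
        for item in assignments.split(";"):
            domain, _, flags = item.partition("=")
            used[int(domain)] += sum(1 for flag in flags if flag != "_")
    return dict(used)

example = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]
print(counters_per_domain(example))
```

With the four example lines this yields 4 counters for domain 0 and 3 for domain 1, matching the totals stated above.
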
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>>
>>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>>> evolves.
>>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>>
>>>>>>>>>>>> # define event configurations
>>>>>>>>>>>>
>>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>>>
>>>>>>>>>>>> #Define flags based on combination of above event types.
>>>>>>>>>>>>
>>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>> v = VictimBW
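
For reference, the flag definitions above can be turned into bitmasks with a few lines of Python. This is only an illustrative sketch: it uses the bit positions from the quoted table (taking bit 1 to be RmtFill), and the names EVENT_BITS, FLAGS and event_mask are hypothetical, not anything from the patches.

```python
# Bit positions as listed in the quoted table (bit 1 assumed to be RmtFill).
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag letters and their event lists, copied from the proposal above.
FLAGS = {
    "t": "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    "l": "LclFill, LclNTWr, LclSlowFill",
    "r": "LclFill,RmtFill,LclSlowFill,RmtSlowFill",
    "w": "VictimBW,LclNTWr,RmtNTWr",
    "v": "VictimBW",
}

def event_mask(names):
    """OR together the bits for a comma-separated list of mnemonics."""
    mask = 0
    for name in names.split(","):
        mask |= 1 << EVENT_BITS[name.strip()]
    return mask

for flag, names in FLAGS.items():
    print(f"{flag} = {event_mask(names):#04x}")
```

One thing this makes visible is that r and w do not overlap and together cover the same bits as t, which is consistent with the "effectively r+w" remark earlier in the thread.
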
>>>>>>>>>>>>
>>>>>>>>>>>> Peter suggested the following format earlier:
>>>>>>>>>>>>
>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>>
>>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>>
>>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>>
>>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>>
>>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>>
>>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>>
>>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>>
>>>>>>>>> Do we need an upper limit?
>>>>>>>>
>>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>>> start with 2 default configurations for now.
>>>>
>>>> There is a big difference between no upper limit and 2. The hardware is
>>>> capable of supporting per-domain configurations so more flexibility is
>>>> certainly possible. Consider the example presented by Peter in:
>>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>>
>>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>>
>>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>>
>>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>>> aggregation files[2].
>>>>>>>>>
>>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>>> counters and reading their values.
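
As a toy model of the round-robin idea described above, the following Python sketch rotates a small pool of shared counters across the groups that requested shared access. It is an assumption-laden illustration only: the thread does not specify the actual scheduling policy beyond "round-robin", a "slot" here simply stands for one sampling interval, and round_robin_schedule is a made-up name.

```python
from collections import deque

def round_robin_schedule(groups, num_counters, num_slots):
    """Return, for each time slot, the groups that hold a shared counter.

    Toy model: in every slot the first num_counters groups in the queue
    get a counter, then the queue rotates so the next groups get a turn.
    """
    queue = deque(groups)
    schedule = []
    for _ in range(num_slots):
        n = min(num_counters, len(queue))
        schedule.append([queue[i] for i in range(n)])
        queue.rotate(-n)
    return schedule

for slot, holders in enumerate(round_robin_schedule("abcde", 2, 3)):
    print(slot, holders)
```

With five groups and two shared counters, every group gets a sample within three slots, at the cost of each group only being measured part of the time.
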
>>>>>>>>
>>>>>>>> OK. Let's focus on exclusive assignments for now.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>>
>>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>>
>>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>>
>>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>>
>>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>>
>>>>>>>>>> That was another problem we need to address.
>>>>>>>>>
>>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>>
>>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>>
>>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>>>> for this purpose. More details on this below.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>>> which the count values can be read.
>>>>>>>>>>>
>>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>> LclFill
>>>>>>>>>>> LclNTWr
>>>>>>>>>>> LclSlowFill
>>>>>>>>>>
>>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>>
>>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>>> LclNTWr
>>>>>>>>>> LclSlowFill
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>>> events.
>>>>>>>>
>>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>>
>>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>>> feature with options to enhance it in the future. Feel free to
>>>>>>>> comment/suggest.
>>>>>>>>
>>>>>>>> 1. Event configurations will be in
>>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>>
>>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>>
>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>>>
>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>>
>>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>>
>>>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>
>>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>>> reporting "local_bytes" any more. They report something different,
>>>>>> and users only know if they come to check the options currently
>>>>>> configured in this file. Changing the contents without changing
>>>>>> the name seems confusing to me.
>>>>>
>>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>>
>>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>>
>>>> This could be supported by following Peter's original proposal where the name
>>>> of the counter configuration is provided by the user via a mkdir:
>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>
>>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>>
>>> Sure. We can do that. I was thinking in the first phase, just provide the
>>> default pre-defined configuration and option to update the configuration.
>>>
>>> We can add the mkdir support later. That way we can provide basic ABMC
>>> support without too much code complexity with mkdir support.
>>
>> It is not clear to me how you envision the "first phase". Is it what you
>> proposed above, for example:
>>       #echo "LclFill, LclNTWr, RmtFill" >
>>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>
>> In the above, the counter configuration name is a file.
> 
> Yes. That is correct.
> 
> There will be two configuration files by default when resctrl is mounted
> when ABMC is enabled.
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> 
>>
>> How could mkdir support be added to this later if there are already files present?
> 
> We already have these directories when resctrl is mounted.
> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
> 
> We don't need "mkdir" support for the default configurations.

I was referring to the "mkdir" support for additional configurations that
I understood you are thinking about adding later. For example,
(copied from Peter's message
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/):


 # mkdir info/L3_MON/counter_configs/mbm_local_bytes
 # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
LclFill
LclNTWr
LclSlowFill

Any "later" work needs to be backward compatible with the first phase.

If the first phase starts with a file:
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
... I do not see how the second phase can be backward compatible when that work
needs a directory with the same name that contains a file for configuration:
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
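
The name clash can be reproduced on any ordinary filesystem. The Python sketch below is only an analogy: it uses a temporary directory as a stand-in for info/L3_MON/counter_configs and does not touch resctrl or kernfs at all.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    configs = os.path.join(root, "counter_configs")
    os.mkdir(configs)
    name = os.path.join(configs, "mbm_local_bytes")

    # Phase one: the configuration name is a regular file.
    with open(name, "w") as f:
        f.write("LclFill,LclNTWr,LclSlowFill\n")

    # Phase two would need a directory of the same name to hold
    # event_filter, but the name is already taken by the file.
    try:
        os.mkdir(name)
        conflict = False
    except FileExistsError:
        conflict = True

    print("conflict:", conflict)
```

A path cannot be a regular file and a directory at the same time, so one of the two layouts would have to change, which is exactly the user-visible break in question.
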

Sidenote: I think interactions with the "event_filter" file need more
description since it is not clear from the provided example how user space
would interact with the file when adding vs. replacing event configurations.
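
To illustrate the ambiguity, here is a small Python model of two possible write semantics for such a file. Both functions are hypothetical and neither is claimed to be what the proposal intends; they only show that the same three writes from the example give different results.

```python
def write_replace(current, written):
    """Model A: each write replaces the whole event set."""
    return [written]

def write_append(current, written):
    """Model B: each write adds one event and keeps the earlier ones."""
    return current if written in current else current + [written]

writes = ["LclFill", "LclNTWr", "LclSlowFill"]

replaced, appended = [], []
for event in writes:
    replaced = write_replace(replaced, event)
    appended = write_append(appended, event)

print("replace semantics:", replaced)
print("append semantics: ", appended)
```

The three-line "cat" output in the example only matches the append model, which then leaves open how user space would remove an event or start over.
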

> 
> My plan was to support only the default configurations in the first phase.
> That way there is no difference in the usage model with ABMC when mounted.
> 
> 
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>>
>>>>>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>>>>>
>>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>>
>>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>>      e: Exclusive
>>>>>>>>      s: Shared
>>>>>>>>      u: Unassigned
>>>>>>>>
>>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>>> future.
>>>>>>>>
>>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> to list the assignment state of all the groups.
>>>>>>>>
>>>>>>>>      Format:
>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>
>>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>
>>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>>> page of data (these examples never seem to reflect those AMD systems with the many
>>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>>> and solved when/if going this route.
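
To make the page-size concern concrete, the following sketch estimates how large such a listing becomes as groups and domains are added. It assumes one line per group/configuration pair in the format shown in the examples; the group names and the 32-group, 12-domain system are hypothetical inputs, and `listing_size` is an illustrative helper rather than kernel code.

```python
PAGE_SIZE = 4096  # bytes; the 4KB limit mentioned above


def listing_size(groups, configs, num_domains, state="e"):
    """Return the byte length of a listing with one line per group/config pair."""
    domain_part = ";".join(f"{d}={state}" for d in range(num_domains))
    total = 0
    for group in groups:
        for config in configs:
            # "<CTRL_MON group>//<configuration>:<domain list>" plus newline
            total += len(f"{group}//{config}:{domain_part}\n")
    return total


configs = ["mbm_total_bytes", "mbm_local_bytes"]

# The two-group, two-domain layout used in the examples.
small = listing_size(["", "test"], configs, num_domains=2)

# Hypothetical larger system: 32 groups and 12 domains.
large = listing_size([f"group{i:02d}" for i in range(32)], configs, num_domains=12)

print(small, small > PAGE_SIZE)
print(large, large > PAGE_SIZE)
```

Under these assumptions the small example stays far below a page, while the larger system already exceeds it with only the two default configurations.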
>>>
>>> This problem is not specific to this series. I feel it is a generic problem
>>> with many similar interfaces. I don't know how it is addressed. I may
>>> have to investigate this. Any pointers would be helpful.
>>
>> Dave Martin already did a lot of analysis here. What other pointers do you need?
>>
>>>
>>>
>>>>
>>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>>> summarize the discussion with pros/cons raised to make an informed selection?
>>>> I understand that Google as represented by Peter no longer requires/requests this
>>>> file but the motivation for this change seems new and does not seem to reduce the
>>>> original motivation for this file. We may also want to separate requirements for reading
>>>> from and writing to this file.
>>>
>>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>>
>>> Summary: We have two proposals.
>>>
>>> First one from Peter:
>>>
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>>
>>> Pros:
>>> a.  Allows flexible creation of free-form names for assignable
>>> configurations, stored in info/L3_MON/counter_configs/.
>>>
>>> b.  Events can be accessed using corresponding free-form names in the
>>> mon_data directory, making it clear to users what each event represents.
>>>
>>>
>>> Cons:
>>> a. Requires three separate files for assignment in each group
>>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>>
>>> b. No built-in listing support, meaning users must query each group
>>> individually to check assignment states.
>>>
>>>
>>> Second Proposal (Mine)
>>>
>>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>>
>>> Pros:
>>>
>>> a. Maintains the flexibility of free-form names for assignable
>>> configurations (info/L3_MON/counter_configs/).
>>>
>>> b. Events remain accessible via free-form names in mon_data, ensuring
>>> clarity on their purpose.
>>>
>>> c. Adds the ability to list assignment states for all groups in a single
>>> command.
>>>
>>> Cons:
>>> a.  Potential buffer overflow issues when handling a large number of
>>> groups and domains and code complexity to fix the issue.
>>>
>>>
>>> Third Option: A Hybrid Approach
>>>
>>> We could combine elements from both proposals:
>>>
>>> a. Retain the free-form naming approach for assignable configurations in
>>> info/L3_MON/counter_configs/.
>>>
>>> b. Use the assignment method from the first proposal:
>>>    $mkdir test
>>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>
>>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>>> interface, enabling users to read assignment states for all groups in one
>>> place. Only read support would be provided.
>>>
>>>
>>>>
>>>>>>>>
>>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>>
>>>>>>>>      Format:
>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>
>>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>
>>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>
>>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>>
>>>>>>>>      The corresponding events will be read in
>>>>>>>>
>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>
>>>>>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>>
>>>>>>>> 8. In the future, there will be options to create multiple configurations,
>>>>>>>> and a corresponding directory will be created in
>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>>
>>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>>> directory? Like this:
>>>>>>
>>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>>
>>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>>> the set of attributes picked ... but if they want to name this
>>>>>> monitor file "brian" then they have to live with any confusion
>>>>>> that they bring on themselves).
>>>>>>
>>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>>> waiting for input on this. It would be great if you could spend some time
>>>>>>> on this to see if we can find common ground on the interface.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Babu
>>>>>>
>>>>>> -Tony
>>>>>>
>>>>>
>>>>>
>>>>> thanks
>>>>> Babu
>>>>
>>>> Reinette
>>>>
>>>>
>>>
>>
>>
> 

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi Reinette,

On 3/12/25 12:14, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/12/25 9:03 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 3/12/25 10:07, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 3/11/25 1:35 PM, Moger, Babu wrote:
>>>> Hi All,
>>>>
>>>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>>>
>>>>>
>>>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>>>> Hi Tony,
>>>>>>
>>>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>>>> Hi Babu,
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
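
The counter totals in this example can be checked with a short sketch. It assumes that each flag letter in a domain's assignment consumes one counter and that `_` means no counter is assigned in that domain; `counters_per_domain` is an illustrative helper, not an interface from the series.

```python
from collections import Counter

# Per-group, per-domain flag strings copied from the example above.
assignments = {
    "group0": {0: "t", 1: "t"},
    "group1": {0: "t", 1: "t"},
    "group2": {0: "_", 1: "t"},
    "group3": {0: "rw", 1: "_"},
}


def counters_per_domain(assignments):
    """Count counters used in each domain: one per flag letter, none for '_'."""
    used = Counter()
    for domains in assignments.values():
        for dom, flags in domains.items():
            used[dom] += sum(1 for flag in flags if flag != "_")
    return dict(used)


usage = counters_per_domain(assignments)
print(usage)  # {0: 4, 1: 3}
```

The `rw` assignment is what costs group3 two counters in domain 0: reads and writes are tracked by separate counters rather than combined in one.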
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>>>> evolves.
>>>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # define event configurations
>>>>>>>>>>>>>
>>>>>>>>>>>>> ====    ===========     ========================================================
>>>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>>>> ====    ===========     ========================================================
>>>>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>>>> ====    ===========     ========================================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Define flags based on combinations of the above event types.
>>>>>>>>>>>>>
>>>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>> v = VictimBW
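
The relationship between the bit table and the flag definitions above can be sketched as bitmask arithmetic. The bit positions and flag letters are taken from this message; the `event_mask` helper and the dictionary names are hypothetical and do not reflect how the kernel stores these configurations.

```python
# Bit position of each event type, from the table above.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag letters and the event types each one combines.
FLAGS = {
    "t": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
          "VictimBW", "LclNTWr", "RmtNTWr"],
    "l": ["LclFill", "LclNTWr", "LclSlowFill"],
    "r": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"],
    "w": ["VictimBW", "LclNTWr", "RmtNTWr"],
    "v": ["VictimBW"],
}


def event_mask(names):
    """OR together the bits for the given event mnemonics."""
    mask = 0
    for name in names:
        mask |= 1 << EVENT_BITS[name]
    return mask


masks = {flag: event_mask(names) for flag, names in FLAGS.items()}
for flag, mask in masks.items():
    print(f"{flag} = {mask:#04x}")
```

With these definitions the read and write masks are disjoint and together cover the same bits as the "total" flag, which matches the earlier description of `t` as effectively r+w.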
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter suggested the following format earlier:
>>>>>>>>>>>>>
>>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>>>
>>>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>>>
>>>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>>>
>>>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>>>
>>>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>>>
>>>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>>>
>>>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>>>
>>>>>>>>>> Do we need an upper limit?
>>>>>>>>>
>>>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>>>> start with 2 default configurations for now.
>>>>>
>>>>> There is a big difference between no upper limit and 2. The hardware is
>>>>> capable of supporting per-domain configurations so more flexibility is
>>>>> certainly possible. Consider the example presented by Peter in:
>>>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>>>
>>>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>>>
>>>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>>>
>>>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>>>> aggregation files[2].
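
The round-robin behaviour described for shared assignment can be sketched as follows. This is a simplified model only: it assumes a fixed pool of non-exclusive counters in one domain and one sampling interval per turn, and `schedule_shared` is a hypothetical helper, not the prototype's implementation.

```python
from collections import deque


def schedule_shared(groups, num_counters, num_intervals):
    """Return, for each interval, the groups holding a shared counter."""
    queue = deque(groups)
    schedule = []
    for _ in range(num_intervals):
        # The groups at the front of the queue get the available counters.
        active = [queue[i] for i in range(min(num_counters, len(queue)))]
        schedule.append(active)
        # Move the groups that just ran to the back of the queue.
        queue.rotate(-len(active))
    return schedule


# Five groups requesting shared access to two counters in one domain.
plan = schedule_shared(["g0", "g1", "g2", "g3", "g4"],
                       num_counters=2, num_intervals=5)
for interval, active in enumerate(plan):
    print(interval, active)
```

In this model every group gets the same number of turns over a full cycle, so each one produces samples at a reduced but uniform rate.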
>>>>>>>>>>
>>>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>>>> counters and reading their values.
>>>>>>>>>
>>>>>>>>> OK. Let's focus on exclusive assignments for now.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>>>
>>>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>>>
>>>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>>>
>>>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>>>
>>>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>>>
>>>>>>>>>>> That was another problem we need to address.
>>>>>>>>>>
>>>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>>>
>>>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>>>
>>>>>>>>> To achieve this, we can reuse our existing mbm_assign_control
>>>>>>>>> interface. More details on this below.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>>>> which the count values can be read.
>>>>>>>>>>>>
>>>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>> LclFill
>>>>>>>>>>>> LclNTWr
>>>>>>>>>>>> LclSlowFill
>>>>>>>>>>>
>>>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>>>
>>>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>>>> LclNTWr
>>>>>>>>>>> LclSlowFill
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>>>> events.
>>>>>>>>>
>>>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>>>
>>>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>>>> feature, with options to enhance it in the future. Feel free to
>>>>>>>>> comment/suggest.
>>>>>>>>>
>>>>>>>>> 1. Event configurations will be in
>>>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>>>
>>>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>>>
>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>>>>
>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>>>
>>>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>>>
>>>>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>
>>>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>>>> reporting "local_bytes" any more. They report something different,
>>>>>>> and users only know if they come to check the options currently
>>>>>>> configured in this file. Changing the contents without changing
>>>>>>> the name seems confusing to me.
>>>>>>
>>>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>>>
>>>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>>>
>>>>> This could be supported by following Peter's original proposal where the name
>>>>> of the counter configuration is provided by the user via a mkdir:
>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>>
>>>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>>>
>>>> Sure. We can do that. I was thinking that in the first phase we just
>>>> provide the default pre-defined configurations and an option to update them.
>>>>
>>>> We can add the mkdir support later. That way we can provide basic ABMC
>>>> support without the extra code complexity that mkdir support brings.
>>>
>>> It is not clear to me how you envision the "first phase". Is it what you
>>> proposed above, for example:
>>>       #echo "LclFill, LclNTWr, RmtFill" >
>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>
>>> In the above, the counter configuration name is a file.
>>
>> Yes. That is correct.
>>
>> There will be two configuration files by default when resctrl is mounted
>> with ABMC enabled.
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>
>>>
>>> How could mkdir support be added to this later if there are already files present?
>>
>> We already have these directories when resctrl is mounted.
>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>
>> We don't need "mkdir" support for default configurations.
> 
> I was referring to the "mkdir" support for additional configurations that
> I understood you are thinking about adding later. For example,
> (copied from Peter's message
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/):
> 
> 
>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> LclFill
> LclNTWr
> LclSlowFill
> 
> Any "later" work needs to be backward compatible with the first phase.

Actually, we don't need the extra "event_filter" file.
This was discussed here:
https://lore.kernel.org/lkml/CALPaoChLL8p49eANYgQ0dJiFs7G=223fGae+LJyx3DwEhNeR8A@mail.gmail.com/

# echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes
# echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes
# echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes
# cat info/L3_MON/counter_configs/mbm_local_bytes
 LclFill
 LclNTWr
 LclSlowFill

In the future, we can add mkdir support.

# mkdir info/L3_MON/counter_configs/mbm_read_only
# echo LclFill > info/L3_MON/counter_configs/mbm_read_only
# cat info/L3_MON/counter_configs/mbm_read_only
  LclFill

#echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive

Which would result in the creation of test/mon_data/mon_L3_*/mbm_read_only

So, there is no breakage of backward compatibility.

> 
> If the first phase starts with a file:
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> ... I do not see how second phase can be backward compatible when that work
> needs a directory with the same name that contains a file for configuration:
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> sidenote: I think interactions with the "event_filter" file need more
> description since it is not clear from the provided example how user space
> may want to interact with the file when adding vs replacing event configurations.
> 
>>
>> My plan was to support only the default configurations in the first phase.
>> That way there is no difference in the usage model with ABMC when mounted.
>>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>>>
>>>>>>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>>>>>>
>>>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>>>
>>>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>>>      e: Exclusive
>>>>>>>>>      s: Shared
>>>>>>>>>      u: Unassigned
>>>>>>>>>
>>>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>>>> future.
>>>>>>>>>
>>>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>> to list the assignment state of all the groups.
>>>>>>>>>
>>>>>>>>>      Format:
>>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>>
>>>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>
>>>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>>>> page of data (these examples never seem to reflect those AMD systems with the many
>>>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>>>> and solved when/if going this route.
>>>>
>>>> This problem is not specific to this series. I feel it is a generic problem
>>>> for many similar interfaces. I don't know how it is addressed. I may
>>>> have to investigate this. Any pointers would be helpful.
>>>
>>> Dave Martin already did a lot of analysis here. What other pointers do you need?

Yeah, he did. I still need a little more detail on the implementation of that.
I will come back to it when we decide which way to go.
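
For a rough sense of how quickly the listing can outgrow a page, here is a
small Python sketch. It is only an estimate under stated assumptions: it uses
the listing format proposed in item 5 quoted above, the "32 groups and 12
domains" figures mentioned earlier in this thread, a 4KB page, and made-up
group names (g00..g31). The helper names are hypothetical and are not part of
resctrl.

```python
PAGE_SIZE = 4096  # assumption: 4KB page, the limit discussed in this thread


def format_line(ctrl_group, mon_group, config, states):
    """Render "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<state>;..."."""
    domains = ";".join(f"{dom}={state}" for dom, state in sorted(states.items()))
    return f"{ctrl_group}/{mon_group}/{config}:{domains}"


def listing_size(ctrl_groups, configs, num_domains, state="e"):
    """Total bytes of a listing with one line per (group, configuration) pair."""
    states = {dom: state for dom in range(num_domains)}
    lines = [format_line(g, "", c, states) for g in ctrl_groups for c in configs]
    return sum(len(line) + 1 for line in lines)  # +1 for each trailing newline


if __name__ == "__main__":
    groups = [f"g{i:02d}" for i in range(32)]  # hypothetical group names
    configs = ["mbm_total_bytes", "mbm_local_bytes"]
    size = listing_size(groups, configs, num_domains=12)
    print(f"{size} bytes, exceeds one page: {size > PAGE_SIZE}")
```

Longer group names, MON groups under each CTRL_MON group, or additional
configurations would only increase the size further.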

>>>
>>>>
>>>>
>>>>>
>>>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>>>> summarize the discussion with pros/cons raised to make an informed selection?
>>>>> I understand that Google as represented by Peter no longer requires/requests this
>>>>> file but the motivation for this change seems new and does not seem to reduce the
>>>>> original motivation for this file. We may also want to separate requirements for reading
>>>>> from and writing to this file.
>>>>
>>>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>>>
>>>> Summary: We have two proposals.
>>>>
>>>> First one from Peter:
>>>>
>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>
>>>>
>>>> Pros
>>>> a.  Allows flexible creation of free-form names for assignable
>>>> configurations, stored in info/L3_MON/counter_configs/.
>>>>
>>>> b.  Events can be accessed using corresponding free-form names in the
>>>> mon_data directory, making it clear to users what each event represents.
>>>>
>>>>
>>>> Cons:
>>>> a. Requires three separate files for assignment in each group
>>>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>>>
>>>> b. No built-in listing support, meaning users must query each group
>>>> individually to check assignment states.
>>>>
>>>>
>>>> Second Proposal (Mine)
>>>>
>>>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>>>
>>>> Pros:
>>>>
>>>> a. Maintains the flexibility of free-form names for assignable
>>>> configurations (info/L3_MON/counter_configs/).
>>>>
>>>> b. Events remain accessible via free-form names in mon_data, ensuring
>>>> clarity on their purpose.
>>>>
>>>> c. Adds the ability to list assignment states for all groups in a single
>>>> command.
>>>>
>>>> Cons:
>>>> a.  Potential buffer overflow issues when handling a large number of
>>>> groups and domains and code complexity to fix the issue.
>>>>
>>>>
>>>> Third Option: A Hybrid Approach
>>>>
>>>> We could combine elements from both proposals:
>>>>
>>>> a. Retain the free-form naming approach for assignable configurations in
>>>> info/L3_MON/counter_configs/.
>>>>
>>>> b. Use the assignment method from the first proposal:
>>>>    $mkdir test
>>>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>>
>>>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>>>> interface, enabling users to read assignment states for all groups in one
>>>> place. Only reading support.
>>>>
>>>>
>>>>>
>>>>>>>>>
>>>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>>>
>>>>>>>>>      Format:
>>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>>
>>>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>
>>>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>
>>>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>>>
>>>>>>>>>      The corresponding events will be read in
>>>>>>>>>
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>>
>>>>>>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>>>
>>>>>>>>> 8. In the future, there will be options to create multiple configurations,
>>>>>>>>> and a corresponding directory will be created in
>>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>>>
>>>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>>>> directory? Like this:
>>>>>>>
>>>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>>>
>>>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>>>> the set of attributes picked ... but if they want to name this
>>>>>>> monitor file "brian" then they have to live with any confusion
>>>>>>> that they bring on themselves).
>>>>>>>
>>>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>>>> waiting for input on this. It would be great if you could spend some time
>>>>>>>> on this to see if we can find common ground on the interface.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Babu
>>>>>>>
>>>>>>> -Tony
>>>>>>>
>>>>>>
>>>>>>
>>>>>> thanks
>>>>>> Babu
>>>>>
>>>>> Reinette
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months ago
Hi Babu,

On 3/12/25 11:14 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 3/12/25 12:14, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 3/11/25 1:35 PM, Moger, Babu wrote:
>>>>> Hi All,
>>>>>
>>>>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>>>>
>>>>>>
>>>>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>>>>> Hi Tony,
>>>>>>>
>>>>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>>>>> understand it, the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
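
The counter totals in the example above can be checked with a short Python
sketch. It assumes, as the example implies, that every flag letter in a
"<domain>=<flags>" entry takes one counter in that domain and that "_" means
nothing is assigned there. The function name is hypothetical and the parsing
is only as strict as this one example needs.

```python
from collections import Counter


def counters_per_domain(assignments):
    """Count counters used in each domain.

    Each line looks like "/group3/0=rw;1=_": every flag letter is assumed to
    take one counter in that domain, and "_" means nothing is assigned.
    """
    used = Counter()
    for line in assignments:
        # Everything after the last "/" is the per-domain assignment list.
        _group, _, domain_part = line.strip().rpartition("/")
        for entry in domain_part.split(";"):
            domain, flags = entry.split("=")
            used[int(domain)] += len(flags.replace("_", ""))
    return dict(used)


if __name__ == "__main__":
    example = [
        "/group0/0=t;1=t",
        "/group1/0=t;1=t",
        "/group2/0=_;1=t",
        "/group3/0=rw;1=_",
    ]
    print(counters_per_domain(example))
```

Splitting "rw" into two counters is what makes group3 cost two counters in
its domain, while each "t" costs one.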
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>>>>> evolves.
>>>>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # define event configurations
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ====    ===========     =================================================
>>>>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>>>>> ====    ===========     =================================================
>>>>>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>>>>> ====    ===========     =================================================
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #Define flags based on combination of above event types.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>> v = VictimBW
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>>>>
>>>>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>>>>
>>>>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>>>>
>>>>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>>>>
>>>>>>>>>>> Do we need an upper limit?
>>>>>>>>>>
>>>>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>>>>> start with 2 default configurations for now.
>>>>>>
>>>>>> There is a big difference between no upper limit and 2. The hardware is
>>>>>> capable of supporting per-domain configurations so more flexibility is
>>>>>> certainly possible. Consider the example presented by Peter in:
>>>>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>>>>
>>>>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>>>>
>>>>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>>>>
>>>>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>>>>> aggregation files[2].
>>>>>>>>>>>
>>>>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>>>>> counters and reading their values.
>>>>>>>>>>
>>>>>>>>>> Ok. Let's focus on exclusive assignments for now.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>>>>
>>>>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>>>>
>>>>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>>>>
>>>>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>>>>
>>>>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>>>>
>>>>>>>>>>>> That was another problem we need to address.
>>>>>>>>>>>
>>>>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>>>>
>>>>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>>>>
>>>>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>>>>>> for this purpose. More details on this below.
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>>>>> which the count values can be read.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>>>> LclFill
>>>>>>>>>>>>> LclNTWr
>>>>>>>>>>>>> LclSlowFill
>>>>>>>>>>>>
>>>>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>>>>
>>>>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>>>>> LclNTWr
>>>>>>>>>>>> LclSlowFill
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>>>>> events.
>>>>>>>>>>
>>>>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>>>>
>>>>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>>>>> feature with options to enhance it in the future. Feel free to
>>>>>>>>>> comment/suggest.
>>>>>>>>>>
>>>>>>>>>> 1. Event configurations will be in
>>>>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>>>>
>>>>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>>>>
>>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>>>>>
>>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>
>>>>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>>>>
>>>>>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>
>>>>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>>>>> reporting "local_bytes" any more. They report something different,
>>>>>>>> and users only know if they come to check the options currently
>>>>>>>> configured in this file. Changing the contents without changing
>>>>>>>> the name seems confusing to me.
>>>>>>>
>>>>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>>>>
>>>>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>>>>
>>>>>> This could be supported by following Peter's original proposal where the name
>>>>>> of the counter configuration is provided by the user via a mkdir:
>>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>>>
>>>>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>>>>
>>>>> Sure. We can do that. I was thinking in the first phase, just provide the
>>>>> default pre-defined configuration and option to update the configuration.
>>>>>
>>>>> We can add the mkdir support later. That way we can provide basic ABMC
>>>>> support without too much code complexity with mkdir support.
>>>>
>>>> It is not clear to me how you envision the "first phase". Is it what you
>>>> proposed above, for example:
>>>>       #echo "LclFill, LclNTWr, RmtFill" >
>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>
>>>> In above the counter configuration name is a file. 
>>>
>>> Yes. That is correct.
>>>
>>> There will be two configuration files by default when resctrl is mounted
>>> with ABMC enabled.
>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>
>>>>
>>>> How could mkdir support be added to this later if there are already files present?
>>>
>>> We already have these directories when resctrl is mounted.
>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>
>>> We don't need "mkdir" support for the default configurations.
>>
>> I was referring to the "mkdir" support for additional configurations that
>> I understood you are thinking about adding later. For example,
>> (copied from Peter's message
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/):
>>
>>
>>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> LclFill
>> LclNTWr
>> LclSlowFill
>>
>> Any "later" work needs to be backward compatible with the first phase.
> 
> Actually, we don't need the extra file "event_filter".
> This was discussed here.
> https://lore.kernel.org/lkml/CALPaoChLL8p49eANYgQ0dJiFs7G=223fGae+LJyx3DwEhNeR8A@mail.gmail.com/

I understand from that exchange that it is possible to read/write from
an *existing* kernfs file but it is not obvious to me how that file is
planned to be created.

My understanding of the motivation behind support for "mkdir" is to enable
user space to create custom counter configurations.

I understand that ABMC support aims to start with existing mbm_total_bytes/mbm_local_bytes
configurations but I believe the consensus is that custom configurations need
to be supported in the future.
If resctrl starts with support where counter configuration is
managed with a *file*, for example:
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
how will user space create future custom configurations?
As I understand that is only possible with mkdir.
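
To illustrate the concern with a sketch: on an ordinary filesystem, a name
that already exists as a regular file cannot later become a directory that
holds further files. The example below uses a temporary directory as a
stand-in for info/L3_MON/counter_configs (an assumption for illustration
only; it does not touch resctrl, and the function name is hypothetical).

```python
import os
import tempfile


def file_then_directory_conflict():
    """Show that a name created as a file cannot later act as a directory."""
    errors = []
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for info/L3_MON/counter_configs (hypothetical layout).
        configs = os.path.join(root, "counter_configs")
        os.mkdir(configs)

        # "First phase": the configuration is a plain file.
        config_file = os.path.join(configs, "mbm_local_bytes")
        with open(config_file, "w") as f:
            f.write("LclFill\nLclNTWr\nLclSlowFill\n")

        # "Second phase": the same name is now expected to be a directory ...
        try:
            os.mkdir(config_file)
        except FileExistsError:
            errors.append("mkdir")

        # ... that contains an event_filter file.
        try:
            open(os.path.join(config_file, "event_filter"), "w").close()
        except (NotADirectoryError, FileNotFoundError):
            errors.append("event_filter")
    return errors


print(file_then_directory_conflict())
```

Both operations fail here, which is why the choice between "configuration
is a file" and "configuration is a directory" has to be made up front.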

> 
> # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes
> # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes
> # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes
> # cat info/L3_MON/counter_configs/mbm_local_bytes
>  LclFill
>  LclNTWr
>  LclSlowFill
> 
> In the future, we can add mkdir support.
> 
> # mkdir info/L3_MON/counter_configs/mbm_read_only

This is exactly my concern. resctrl should not start with a user space interface
where a counter configuration is a file (mbm_local_bytes/mbm_total_bytes) and
then switch the user space interface to have counter configuration done with
directories.

> # echo LclFill > info/L3_MON/counter_configs/mbm_read_only
> # cat info/L3_MON/counter_configs/mbm_read_only
>   LclFill

... wait ... user space writes to the directory?
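
For reference, a directory cannot be opened for writing the way a file can.
A quick check on a temporary directory (a stand-in for
info/L3_MON/counter_configs/mbm_read_only, not resctrl itself; the function
name is hypothetical):

```python
import tempfile


def can_write_to_directory():
    """Return True if a directory can be opened for writing like a file."""
    with tempfile.TemporaryDirectory() as path:
        try:
            with open(path, "w") as f:
                f.write("LclFill\n")
        except OSError:
            # On Linux the open fails with EISDIR (IsADirectoryError).
            return False
    return True


print(can_write_to_directory())
```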



> 
> #echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
> 
> Which would result in the creation of test/mon_data/mon_L3_*/mbm_read_only
> 
> So, there is no breakage of backward compatibility.

The way I understand it I am seeing many incompatibilities. Perhaps I am missing
something. Could you please provide detailed steps of how first phase and
second phase would look?


> 
>>
>> If the first phase starts with a file:
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>> ... I do not see how second phase can be backward compatible when that work
>> needs a directory with the same name that contains a file for configuration:
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> sidenote: I think interactions with the "event_filter" file need more
>> description since it is not clear from the provided example how user space
>> may want to interact with the file when adding vs replacing event configurations.
>>
>>>
>>> My plan was to support only the default configurations in the first phase.
>>> That way there is no difference in the usage model with ABMC when mounted.
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>>>>
>>>>>>>>>> 3. The default configurations will be used when user mounts the resctrl.
>>>>>>>>>>
>>>>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>>>>
>>>>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>>>>      e: Exclusive
>>>>>>>>>>      s: Shared
>>>>>>>>>>      u: Unassigned
>>>>>>>>>>
>>>>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>>>>> future.
>>>>>>>>>>
>>>>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>> to list the assignment state of all the groups.
>>>>>>>>>>
>>>>>>>>>>      Format:
>>>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>>>
>>>>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>
>>>>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>>>>> page of data (these examples never seem to reflect those AMD systems with the many
>>>>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>>>>> and solved when/if going this route.
>>>>>
>>>>> This problem is not specific to this series. I feel it is a generic problem
>>>>> for many similar interfaces. I don't know how it is addressed. I may
>>>>> have to investigate this. Any pointers would be helpful.
>>>>
>>>> Dave Martin already did a lot of analysis here. What other pointers do you need?
> 
> Yea. He did. I still need a little more detail on the implementation of that.
> Will come back to that when we decide which way to go.
> 
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>>>>> summarize the discussion with pros/cons raised to make an informed selection?
>>>>>> I understand that Google as represented by Peter no longer requires/requests this
>>>>>> file but the motivation for this change seems new and does not seem to reduce the
>>>>>> original motivation for this file. We may also want to separate requirements for reading
>>>>>> from and writing to this file.
>>>>>
>>>>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>>>>
>>>>> Summary: We have two proposals.
>>>>>
>>>>> First one from Peter:
>>>>>
>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>>
>>>>>
>>>>> Pros:
>>>>> a.  Allows flexible creation of free-form names for assignable
>>>>> configurations, stored in info/L3_MON/counter_configs/.
>>>>>
>>>>> b.  Events can be accessed using corresponding free-form names in the
>>>>> mon_data directory, making it clear to users what each event represents.
>>>>>
>>>>>
>>>>> Cons:
>>>>> a. Requires three separate files for assignment in each group
>>>>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>>>>
>>>>> b. No built-in listing support, meaning users must query each group
>>>>> individually to check assignment states.
>>>>>
>>>>>
>>>>> Second Proposal (Mine)
>>>>>
>>>>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>>>>
>>>>> Pros:
>>>>>
>>>>> a. Maintains the flexibility of free-form names for assignable
>>>>> configurations (info/L3_MON/counter_configs/).
>>>>>
>>>>> b. Events remain accessible via free-form names in mon_data, ensuring
>>>>> clarity on their purpose.
>>>>>
>>>>> c. Adds the ability to list assignment states for all groups in a single
>>>>> command.
>>>>>
>>>>> Cons:
>>>>> a.  Potential buffer overflow issues when handling a large number of
>>>>> groups and domains and code complexity to fix the issue.
>>>>>
>>>>>
>>>>> Third Option: A Hybrid Approach
>>>>>
>>>>> We could combine elements from both proposals:
>>>>>
>>>>> a. Retain the free-form naming approach for assignable configurations in
>>>>> info/L3_MON/counter_configs/.
>>>>>
>>>>> b. Use the assignment method from the first proposal:
>>>>>    $mkdir test
>>>>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>>>
>>>>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>>>>> interface, enabling users to read assignment states for all groups in one
>>>>> place. Only reading support.
>>>>>
>>>>>
>>>>>>
>>>>>>>>>>
>>>>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>>>>
>>>>>>>>>>      Format:
>>>>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>>>
>>>>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>
>>>>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>
>>>>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>>>>
>>>>>>>>>>      The corresponding events will be read in
>>>>>>>>>>
>>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>>>
>>>>>>>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>>>>
>>>>>>>>>> 8. In the future, there will be options to create multiple configurations
>>>>>>>>>> and a corresponding directory will be created in
>>>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>>>>
>>>>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>>>>> directory? Like this:
>>>>>>>>
>>>>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>>>>
>>>>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>>>>> the set of attributes picked ... but if they want to name this
>>>>>>>> monitor file "brian" then they have to live with any confusion
>>>>>>>> that they bring on themselves).
>>>>>>>>
>>>>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>>>>> waiting for input on this. It would be great if you could spend some time
>>>>>>>>> on this to see if we can find common ground on the interface.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Babu
>>>>>>>>
>>>>>>>> -Tony
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> thanks
>>>>>>> Babu
>>>>>>
>>>>>> Reinette
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi Reinette,

On 3/13/25 11:08, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/12/25 11:14 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 3/12/25 12:14, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>> Hi Reinette,
>>>>
>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
..

>>>>>> We can add the mkdir support later. That way we can provide basic ABMC
>>>>>> support without too much code complexity with mkdir support.
>>>>>
>>>>> It is not clear to me how you envision the "first phase". Is it what you
>>>>> proposed above, for example:
>>>>>       #echo "LclFill, LclNTWr, RmtFill" >
>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>
>>>>> In above the counter configuration name is a file. 
>>>>
>>>> Yes. That is correct.
>>>>
>>>> There will be two configuration files by default when resctrl is mounted
>>>> with ABMC enabled.
>>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>
>>>>>
>>>>> How could mkdir support be added to this later if there are already files present?
>>>>
>>>> We already have these directories when resctrl is mounted.
>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>
>>>> We don't need "mkdir" support for the default configurations.
>>>
>>> I was referring to the "mkdir" support for additional configurations that
>>> I understood you are thinking about adding later. For example,
>>> (copied from Peter's message
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/):
>>>
>>>
>>>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> LclFill
>>> LclNTWr
>>> LclSlowFill
>>>
>>> Any "later" work needs to be backward compatible with the first phase.
>>
>> Actually, we don't need the extra file "event_filter".
>> This was discussed here.
>> https://lore.kernel.org/lkml/CALPaoChLL8p49eANYgQ0dJiFs7G=223fGae+LJyx3DwEhNeR8A@mail.gmail.com/
> 
> I understand from that exchange that it is possible to read/write from
> an *existing* kernfs file but it is not obvious to me how that file is
> planned to be created.

My bad. I misspoke here. We need an "event_filter" file under each
configuration.
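
As a rough sketch of how the contents of an "event_filter" file could map
onto a counter configuration, the snippet below converts event mnemonics
into a bitmask using the bit positions from the event table quoted earlier
in this thread (bit 1 read as RmtFill). The function name and the parsing
rules (names separated by newlines or commas) are assumptions for
illustration; the actual file format is still under discussion.

```python
# Bit positions taken from the event table quoted earlier in this thread.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}


def event_filter_to_mask(text):
    """Convert event names (newline or comma separated) into a bitmask."""
    mask = 0
    for name in text.replace(",", "\n").split():
        if name not in EVENT_BITS:
            raise ValueError(f"unknown event: {name}")
        mask |= 1 << EVENT_BITS[name]
    return mask


# Hypothetical default contents for the two pre-defined configurations.
total = event_filter_to_mask(
    "VictimBW\nRmtSlowFill\nRmtNTWr\nRmtFill\nLclFill\nLclNTWr\nLclSlowFill\n"
)
local = event_filter_to_mask("LclFill\nLclNTWr\nLclSlowFill\n")
print(hex(total), hex(local))
```

With all seven events set the mask is 0x7f, and the three local events give
0x15.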


> 
> My understanding of the motivation behind support for "mkdir" is to enable
> user space to create custom counter configurations.
> 

That is correct.

> I understand that ABMC support aims to start with existing mbm_total_bytes/mbm_local_bytes
> configurations but I believe the consensus is that custom configurations need
> to be supported in the future.
> If resctrl starts with support where counter configuration is
> managed with a *file*, for example:
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> how will user space create future custom configurations?
> As I understand that is only possible with mkdir.
> 
>>
>> # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes
>> # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes
>> # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes
>> # cat info/L3_MON/counter_configs/mbm_local_bytes
>>  LclFill
>>  LclNTWr
>>  LclSlowFill
>>
>> In the future, we can add mkdir support.
>>
>> # mkdir info/L3_MON/counter_configs/mbm_read_only
> 
> This is exactly my concern. resctrl should not start with a user space interface
> where a counter configuration is a file (mbm_local_bytes/mbm_total_bytes) and
> then switch the user space interface to have counter configuration done with
> directories.
> 
>> # echo LclFill > info/L3_MON/counter_configs/mbm_read_only
>> # cat info/L3_MON/counter_configs/mbm_read_only
>>   LclFill
> 
> ... wait ... user space writes to the directory?
> 

My bad. This is wrong. Let me rewrite the steps below.

> 
> 
>>
>> #echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
>>
>> Which would result in the creation of test/mon_data/mon_L3_*/mbm_read_only
>>
>> So, there is no breakage of backward compatibility.
> 
> The way I understand it I am seeing many incompatibilities. Perhaps I am missing
> something. Could you please provide detailed steps of how first phase and
> second phase would look?

No, you didn't miss anything. I misspoke on a few steps.

Here are the steps, copied from Peter's proposal.
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/


1. Mount the resctrl
   mount -t resctrl resctrl /sys/fs/resctrl

2. When ABMC is supported, two default configurations will be created.

  a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
  b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter

  These files will be populated with default total and local events
  # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
    VictimBW
    RmtSlowFill
    RmtNTWr
    RmtFill
    LclFill
    LclNTWr
    LclSlowFill

  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
   LclFill
   LclNTWr
   LclSlowFill

3. Users will have the option to update the event configuration.
   echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

4. As usual, the events can be read from the mon_data directories.
   # mkdir /sys/fs/resctrl/test
   # cd /sys/fs/resctrl
   # cat test/mon_data/mon_L3_00/mbm_total_bytes
   101010
   # cat test/mon_data/mon_L3_00/mbm_local_bytes
   32323

5. There will be 3 files created in each group's mon_data directory when
ABMC is supported.

   a. test/mon_data/mon_L3_00/assign_exclusive
   b. test/mon_data/mon_L3_00/assign_shared
   c. test/mon_data/mon_L3_00/unassign


6. Events can be assigned/unassigned by these commands

 # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
 # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
 # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign


Note:
I feel 3 files are excessive here. We can probably achieve everything in
just one file.

I am not sure about the mbm_assign_control interface, as there are concerns
about the group listing holding the lock for a long time.

-----------------------------------------------------------------------
Second phase, we can add support for "mkdir"

1. mkdir info/L3_MON/counter_configs/mbm_read_only

2. The mkdir operation will create the "event_filter" file.
   info/L3_MON/counter_configs/mbm_read_only/event_filter

3. Users can modify event configuration.
   echo LclFill > info/L3_MON/counter_configs/mbm_read_only/event_filter

4. Users can assign the events

  echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive

5. Events can be read in

   test/mon_data/mon_L3_00/mbm_read_only

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months ago
Hi Babu,

On 3/13/25 1:13 PM, Moger, Babu wrote:
> On 3/13/25 11:08, Reinette Chatre wrote:
>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>> On 3/12/25 10:07, Reinette Chatre wrote:


> Here are the steps. Just copying steps from Peters proposal.
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/

Thank you very much for detailing the steps. It is starting to fall into place
for me.

> 
> 
> 1. Mount the resctrl
>    mount -t resctrl resctrl /sys/fs/resctrl

I assume that on an ABMC system the plan remains to have ABMC enabled by default, which
will continue to depend on BMEC.

How would the existing BMEC implementation be impacted in this case?

Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
files will remain and user space may continue to use them to change the event
configurations with confusing expectations/results on an ABMC system.

One possibility may be that a user may see below on an ABMC system even if BMEC is supported:
# cat /sys/fs/resctrl/info/L3_MON/mon_features
llc_occupancy
mbm_total_bytes
mbm_local_bytes

With the above, a user would not be expected to interact with mbm_total_bytes_config
and mbm_local_bytes_config, which may be the simplest approach.

To follow that, we should also consider how "mon_features" will change with this
implementation.

> 
> 2. When ABMC is supported two default configurations will be created.
> 
>   a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>   b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
>   These files will be populated with default total and local events
>   # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>     VictimBW
>     RmtSlowFill
>     RmtNTWr
>     RmtFill
>     LclFill
>     LclNTWr
>     LclSlowFill

Looks good. Here we could perhaps start nitpicking about naming and line separation.
I think it may be easier if the fields are separated by comma, but more on that
below ...

> 
>   # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>    LclFill,
>    LclNTWr
>    LclSlowFill
> 
> 3. Users will have options to update the event configuration.
>    echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

We need to be clear on how user space interacts with this file. For example,
can user space "append" configurations? Specifically, if the file has
contents like your earlier example:
# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 LclFill
 LclNTWr
 LclSlowFill

Should the above be created with (note "append" needed for second and third):
echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter

Is it possible to set multiple configurations in one write like below?
echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

(Note that it may be easier for user space to use a comma (or some other field separator)
when providing multiple configurations at a time. To match, comma-separated output
may also be preferable since it makes copy&paste in the user interface easier.)

If the file has content like:
# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 LclNTWr
 LclSlowFill

What is the impact of the following:
echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

Is it (append):
# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 LclFill
 LclNTWr
 LclSlowFill

or (overwrite):
# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
 LclFill

I do think the interface will be more intuitive if it follows regular file
operations wrt "append" and such. I have not looked into how kernfs supports
"append".

As an alternative, we can try to work the previous mbm_assign_control syntax in here (use + and -).

For example:

# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
LclNTWr
# echo "+LclFill,-LclNTWr,+LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
LclFill,LclSlowFill

With something like above resctrl just deals with file writes as before.
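
As a rough illustration of the +/- semantics in the example above, the
following user-space shell sketch applies such an update string to a
comma-separated event list. The function name apply_filter_update is
hypothetical and none of this is resctrl code; it only models what a write
handler would have to compute.

```shell
# apply_filter_update CURRENT UPDATE
#   CURRENT: comma-separated events currently in the filter
#   UPDATE:  comma-separated tokens, each prefixed with '+' (add) or '-' (remove)
# Prints the resulting comma-separated list. Hypothetical helper, not resctrl code.
# Note: plain POSIX sh, so the working variables are global.
apply_filter_update() {
    events=$(printf '%s' "$1" | tr ',' ' ')
    for tok in $(printf '%s' "$2" | tr ',' ' '); do
        name=${tok#[+-]}
        case $tok in
            +*)
                # Add the event only if it is not already present.
                case " $events " in
                    *" $name "*) ;;
                    *) events="$events $name" ;;
                esac
                ;;
            -*)
                # Rebuild the list without the named event.
                new=""
                for e in $events; do
                    if [ "$e" != "$name" ]; then
                        new="$new $e"
                    fi
                done
                events=$new
                ;;
            *)
                echo "invalid token: $tok" >&2
                return 1
                ;;
        esac
    done
    # Join the remaining events with commas.
    set -- $events
    (IFS=,; printf '%s\n' "$*")
}
```

With the inputs from the example, apply_filter_update "LclNTWr"
"+LclFill,-LclNTWr,+LclSlowFill" prints LclFill,LclSlowFill. A token without
a leading + or - is rejected here, which is one possible policy among several.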


> 
> 4. As usual the events can be read from the mon_data directories.
>    #mkdir /sys/fs/resctrl/test
>    #cd   /sys/fs/resctr/test
>    #cat  test/mon_data/mon_data/mon_L3_00/mbm_tota_bytes
>    101010
>    #cat   test/mon_data/mon_data/mon_L3_00/mbm_local_bytes
>    32323
> 
> 5. There will be 3 files created in each group's mon_data directory when
> ABMC is supported.
> 
>    a. test/mon_data/mon_L3_00/assign_exclusive
>    b. test/mon_data/mon_L3_00/assign_shared
>    c. test/mon_data/mon_L3_00/unassign
> 
> 
> 6. Events can be assigned/unassigned by these commands
> 
>  # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
>  # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>  # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign
> 
> 
> Note:
> I feel 3 files are excessive here. We can probably achieve everything in
> just one file.

Could you please elaborate what your concern is? You mention that it is
excessive but it is not clear to me what issues may arise by
having three files instead of one.

I do think, and Peter also mentioned [1] this, that it may be useful,
to "put a group/resource-scoped assign_* file higher in the hierarchy
to make it easier for users who want to configure all domains the
same for a group."

Placing *additional* files higher in hierarchy (used to manage counters in all
domains) may be more useful than trying to provide the shared/exclusive/unassign
in one file per domain.

> 
> Not sure about mbm_assign_control interface as there are concerns with
> group listing holding the lock for long.
> 
> -----------------------------------------------------------------------
> Second phase, we can add support for "mkdir"
> 
> 1. mkdir info/L3_MON/counter_configs/mbm_read_only
> 
> 2. mkdir option will create "event_filter" file.
>    info/L3_MON/counter_configs/mbm_read_only/event_filter
> 

Got it!

> 3. Users can modify event configuration.
>    echo LclFill > info/L3_MON/counter_configs/mbm_read_only/event_filter
> 
> 4. Users can assign the events
> 
>   echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
> 
> 5. Events can be read in
> 
>    test/mon_data/mon_data/mon_L3_00/mbm_read_only
> 

Related to the comment from Tony [2] about rmdir, please also consider that
original mbm_local_bytes/mbm_total_bytes could also be removed because at this
point they should not appear different from other counter configurations ... apart
from being pre-populated for backward compatibility.

Thank you.

Reinette


[1] https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
[2] https://lore.kernel.org/lkml/Z9NB0wd8ZewLjNAd@agluck-desk3/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 11 months ago
Hi Reinette,

On Thu, Mar 13, 2025 at 10:22 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Babu,
>
> On 3/13/25 1:13 PM, Moger, Babu wrote:
> > On 3/13/25 11:08, Reinette Chatre wrote:
> >> On 3/12/25 11:14 AM, Moger, Babu wrote:
> >>> On 3/12/25 12:14, Reinette Chatre wrote:
> >>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
> >>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>
>
> > Here are the steps. Just copying steps from Peters proposal.
> > https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>
> Thank you very much for detailing the steps. It is starting the fall into place
> for me.
>
> >
> >
> > 1. Mount the resctrl
> >    mount -t resctrl resctrl /sys/fs/resctrl
>
> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
> will continue to depend on BMEC.
>
> How would the existing BMEC implementation be impacted in this case?
>
> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
> files will remain and user space may continue to use them to change the event
> configurations with confusing expecations/results on an ABMC system.
>
> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
> llc_occupancy
> mbm_total_bytes
> mbm_local_bytes
>
> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
> and mbm_local_bytes_config, which may be the simplest to do.

How about making mbm_local_bytes and mbm_total_bytes always be
configured using mbm_{local,total}_bytes_config and only allowing the
full ABMC configurability on user-defined configurations? This could
resolve the issue of backwards compatibility with the BMEC files and
remove the need for the user to opt in to ABMC mode.

It will be less clean implementation-wise, since there will be two
classes of event configuration to deal with, but I think it seems
logical from the user's side.

>
> To follow that, we should also consider how "mon_features" will change with this
> implementation.
>
> >
> > 2. When ABMC is supported two default configurations will be created.
> >
> >   a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> >   b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >
> >   These files will be populated with default total and local events
> >   # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> >     VictimBW
> >     RmtSlowFill
> >     RmtNTWr
> >     RmtFill
> >     LclFill
> >     LclNTWr
> >     LclSlowFill
>
> Looks good. Here we could perhaps start nitpicking about naming and line separation.
> I think it may be easier if the fields are separated by comma, but more on that
> below ...
>
> >
> >   # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >    LclFill,
> >    LclNTWr
> >    LclSlowFill
> >
> > 3. Users will have options to update the event configuration.
> >    echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>
> We need to be clear on how user space interacts with this file. For example,
> can user space "append" configurations? Specifically, if the file has
> contents like your earlier example:
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  LclFill
>  LclNTWr
>  LclSlowFill
>
> Should above be created with (note "append" needed for second and third):
> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>
> Is it possible to set multiple configurations in one write like below?
> echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>
> (note above where it may be easier for user space to use comma (or some other field separator)
> when providing multiple configurations at a time, with this, to match, having output in
> commas may be easier since it makes user interface copy&paste easier)
>
> If file has content like:
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  LclNTWr
>  LclSlowFill
>
> What is impact of the following:
> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>
> Is it (append):
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  LclFill
>  LclNTWr
>  LclSlowFill
>
> or (overwrite):
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>  LclFill
>
> I do think the interface will be more intuitive it if follows regular file
> operations wrt "append" and such. I have not looked into how kernfs supports
> "append".

I expect specifying counter_configs to be a rare or one-time
operation, so I think ease-of-use is the only concern. I think
multiple appending writes are the most straightforward to implement
and invoke (for a shell user), but I think commas are easy enough to
support as well, even though it would look better when reading back to
see the entries on separate lines.

I believe you can inspect the file descriptor's flags from the
kernfs_open_file reference: of->file->f_flags & O_APPEND

I haven't tried this, though.
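
For reference, the user-space side of this can be checked without touching
kernfs: the shell's ">>" redirection opens the file with O_APPEND, and the
flag is visible in /proc/<pid>/fdinfo/<fd>. The sketch below assumes a Linux
/proc and that O_APPEND is 02000 (octal), which is the asm-generic value used
on x86. has_o_append is a hypothetical helper name. It only shows that the
flag is set on the open file; it says nothing about what a kernfs write
handler does with it.

```shell
# has_o_append PATH MODE
#   MODE is "append" (open like '>>') or "truncate" (open like '>').
# Prints "yes" if the resulting open file carries O_APPEND, "no" otherwise.
# Assumptions: Linux /proc, O_APPEND == 02000 (x86 / asm-generic).
has_o_append() {
    if [ "$2" = "append" ]; then
        exec 3>>"$1"
    else
        exec 3>"$1"
    fi
    # awk inherits fd 3, so its /proc/self/fdinfo/3 describes the same open file.
    flags=$(awk '/^flags:/ { print $2 }' /proc/self/fdinfo/3)
    exec 3>&-
    # The flags field is octal with a leading 0, which shell arithmetic accepts.
    if [ $(( flags & 02000 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}
```

Calling has_o_append on a scratch file with "append" and then "truncate"
shows the difference between the two redirection styles that a write handler
would have to tell apart.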

-Peter
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi Peter,

On 3/17/2025 11:27 AM, Peter Newman wrote:
> Hi Reinette,
> 
> On Thu, Mar 13, 2025 at 10:22 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Babu,
>>
>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>
>>
>>> Here are the steps. Just copying steps from Peters proposal.
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>> Thank you very much for detailing the steps. It is starting the fall into place
>> for me.
>>
>>>
>>>
>>> 1. Mount the resctrl
>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>
>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>> will continue to depend on BMEC.
>>
>> How would the existing BMEC implementation be impacted in this case?
>>
>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>> files will remain and user space may continue to use them to change the event
>> configurations with confusing expecations/results on an ABMC system.
>>
>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> llc_occupancy
>> mbm_total_bytes
>> mbm_local_bytes
>>
>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>> and mbm_local_bytes_config, which may be the simplest to do.
> 
> How about making mbm_local_bytes and mbm_total_bytes always be
> configured using mbm_{local,total}_bytes_config and only allowing the
> full ABMC configurability on user-defined configurations. This could
> resolve the issue of backwards compatibility with the BMEC files and
> remove the need for the user opting-in to ABMC mode.

There is no opt-in mode. ABMC will be enabled by default if supported.
Users will have the option to go back to legacy mode.

The default configurations will be used for total (0x7f, equivalent to
enabling all events) and local (0x15, equivalent to all local events).

Same thing will show up at
a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
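
To make the two default masks concrete, here is a small shell sketch that
expands a mask into event names. The bit order is an assumption: it follows
the BMEC event table in Documentation/arch/x86/resctrl.rst (bit 0 = reads to
local memory through bit 6 = dirty victims), paired with the short names used
in this thread. decode_event_filter is a hypothetical helper, not an existing
tool.

```shell
# decode_event_filter MASK
# Prints the comma-separated event names whose bits are set in MASK.
# Assumed bit order (bit 0 first), taken from the BMEC table in resctrl.rst:
#   LclFill RmtFill LclNTWr RmtNTWr LclSlowFill RmtSlowFill VictimBW
decode_event_filter() {
    mask=$(( $1 ))
    out=""
    bit=0
    for name in LclFill RmtFill LclNTWr RmtNTWr LclSlowFill RmtSlowFill VictimBW; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            # Append with a comma unless this is the first name.
            out="${out:+$out,}$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}
```

Under that assumed bit order, decode_event_filter 0x15 prints
LclFill,LclNTWr,LclSlowFill and decode_event_filter 0x7f prints all seven
names, which matches the two default event_filter listings in the steps.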

> 
> It will be less clean implementation-wise, since there will be two
> classes of event configuration to deal with, but I think it seems
> logical from the user's side.
> 
>>
>> To follow that, we should also consider how "mon_features" will change with this
>> implementation.
>>
>>>
>>> 2. When ABMC is supported two default configurations will be created.
>>>
>>>    a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>    b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>>    These files will be populated with default total and local events
>>>    # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>      VictimBW
>>>      RmtSlowFill
>>>      RmtNTWr
>>>      RmtFill
>>>      LclFill
>>>      LclNTWr
>>>      LclSlowFill
>>
>> Looks good. Here we could perhaps start nitpicking about naming and line separation.
>> I think it may be easier if the fields are separated by comma, but more on that
>> below ...
>>
>>>
>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>     LclFill,
>>>     LclNTWr
>>>     LclSlowFill
>>>
>>> 3. Users will have options to update the event configuration.
>>>     echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> We need to be clear on how user space interacts with this file. For example,
>> can user space "append" configurations? Specifically, if the file has
>> contents like your earlier example:
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill
>>   LclNTWr
>>   LclSlowFill
>>
>> Should above be created with (note "append" needed for second and third):
>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> Is it possible to set multiple configurations in one write like below?
>> echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> (note above where it may be easier for user space to use comma (or some other field separator)
>> when providing multiple configurations at a time, with this, to match, having output in
>> commas may be easier since it makes user interface copy&paste easier)
>>
>> If file has content like:
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclNTWr
>>   LclSlowFill
>>
>> What is impact of the following:
>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> Is it (append):
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill
>>   LclNTWr
>>   LclSlowFill
>>
>> or (overwrite):
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill
>>
>> I do think the interface will be more intuitive it if follows regular file
>> operations wrt "append" and such. I have not looked into how kernfs supports
>> "append".
> 
> I expect specifying counter_configs to be a rare or one-time
> operation, so I think ease-of-use is the only concern. I think
> multiple, appending writes is the most straightforward to implement
> and invoke (for a shell user), but I think commas are easy enough to
> support as well, even though it would look better when reading back to
> see the entries on separate lines.
> 
> I believe you can inspect the file descriptor's flags from the
> kernfs_open_file reference: of->file->f_flags & O_APPEND
> 
> I haven't tried this, though.
> 
> -Peter
> 

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 10 months, 4 weeks ago
Hi Babu and Peter,

On 3/17/25 4:00 PM, Moger, Babu wrote:
> Hi Peter,
> 
> On 3/17/2025 11:27 AM, Peter Newman wrote:
>> Hi Reinette,
>>
>> On Thu, Mar 13, 2025 at 10:22 PM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>>
>>> Hi Babu,
>>>
>>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>
>>>
>>>> Here are the steps. Just copying steps from Peters proposal.
>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>> Thank you very much for detailing the steps. It is starting the fall into place
>>> for me.
>>>
>>>>
>>>>
>>>> 1. Mount the resctrl
>>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>>
>>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>>> will continue to depend on BMEC.
>>>
>>> How would the existing BMEC implementation be impacted in this case?
>>>
>>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>>> files will remain and user space may continue to use them to change the event
>>> configurations with confusing expecations/results on an ABMC system.
>>>
>>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>> llc_occupancy
>>> mbm_total_bytes
>>> mbm_local_bytes
>>>
>>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>>> and mbm_local_bytes_config, which may be the simplest to do.
>>
>> How about making mbm_local_bytes and mbm_total_bytes always be
>> configured using mbm_{local,total}_bytes_config and only allowing the
>> full ABMC configurability on user-defined configurations. This could
>> resolve the issue of backwards compatibility with the BMEC files and
>> remove the need for the user opting-in to ABMC mode.

Hmmm, yes, backward compatibility is a big issue with an earlier suggestion
from me. Users with scripts/tools using mbm_{local,total}_bytes_config
would expect that to continue to work on systems that support BMEC.
resctrl could continue to use mbm_{local,total}_bytes_config
even though the inconsistent interface is not ideal.

> 
> There is no opt-in mode. ABMC will be enabled by default if supported.
> Users will have option to go back to legacy mode.

I assume there will still be the opt-in for automatic counter assignment
on monitor group creation (mkdir)?

> 
> The default configurations will be used for total(0x7f equivalent to enable all) and local(0x15 equivalent to all local events).
> 
> Same thing will show up at
> a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter

These files could possibly be read-only but the moment user space uses
mbm_{local,total}_bytes_config to change the configurations between domains
this will be invalid. In this case the file could also perhaps
read "Configured using <path to>mbm_{local,total}_bytes_config". It is
not clear to me what would be most intuitive to user space.

> 
>>
>> It will be less clean implementation-wise, since there will be two
>> classes of event configuration to deal with, but I think it seems
>> logical from the user's side.
>>
>>>
>>> To follow that, we should also consider how "mon_features" will change with this
>>> implementation.
>>>
>>>>
>>>> 2. When ABMC is supported two default configurations will be created.
>>>>
>>>>    a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>>    b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>
>>>>    These files will be populated with default total and local events
>>>>    # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>>      VictimBW
>>>>      RmtSlowFill
>>>>      RmtNTWr
>>>>      RmtFill
>>>>      LclFill
>>>>      LclNTWr
>>>>      LclSlowFill
>>>
>>> Looks good. Here we could perhaps start nitpicking about naming and line separation.
>>> I think it may be easier if the fields are separated by comma, but more on that
>>> below ...
>>>
>>>>
>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>     LclFill,
>>>>     LclNTWr
>>>>     LclSlowFill
>>>>
>>>> 3. Users will have options to update the event configuration.
>>>>     echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> We need to be clear on how user space interacts with this file. For example,
>>> can user space "append" configurations? Specifically, if the file has
>>> contents like your earlier example:
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclFill
>>>   LclNTWr
>>>   LclSlowFill
>>>
>>> Should above be created with (note "append" needed for second and third):
>>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> Is it possible to set multiple configurations in one write like below?
>>> echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> (note above where it may be easier for user space to use comma (or some other field separator)
>>> when providing multiple configurations at a time, with this, to match, having output in
>>> commas may be easier since it makes user interface copy&paste easier)
>>>
>>> If file has content like:
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclNTWr
>>>   LclSlowFill
>>>
>>> What is impact of the following:
>>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> Is it (append):
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclFill
>>>   LclNTWr
>>>   LclSlowFill
>>>
>>> or (overwrite):
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclFill
>>>
>>> I do think the interface will be more intuitive it if follows regular file
>>> operations wrt "append" and such. I have not looked into how kernfs supports
>>> "append".
>>
>> I expect specifying counter_configs to be a rare or one-time
>> operation, so I think ease-of-use is the only concern. I think
>> multiple, appending writes is the most straightforward to implement
>> and invoke (for a shell user), but I think commas are easy enough to
>> support as well, even though it would look better when reading back to
>> see the entries on separate lines.

When the counter configuration consists of multiple events then it may
be convenient to just write it all in one go, and having a shell user use
newline as field separator does not seem convenient. Appending writes sound
good no matter the field separator.
Reading back we may have to consider both what looks good to user space and
what is easy to parse by a script.
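
Whichever output format is chosen, converting between the two in user space
is cheap. A minimal sketch using standard tools, with hypothetical helper
names lines_to_csv and csv_to_lines:

```shell
# Newline-separated listing (possibly indented) on stdin -> one comma-separated line.
lines_to_csv() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' | paste -s -d, -
}

# Comma-separated line on stdin -> one event per line.
csv_to_lines() {
    tr ',' '\n'
}
```

For example, piping the three-line mbm_local_bytes listing through
lines_to_csv yields LclFill,LclNTWr,LclSlowFill, a string in the same form
as the single-write example discussed earlier.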

>>
>> I believe you can inspect the file descriptor's flags from the
>> kernfs_open_file reference: of->file->f_flags & O_APPEND
>>
>> I haven't tried this, though.

Thanks for looking this up.

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 10 months, 3 weeks ago
Hi Reinette,

On 3/19/25 15:53, Reinette Chatre wrote:
> Hi Babu and Peter,
> 
> On 3/17/25 4:00 PM, Moger, Babu wrote:
>> Hi Peter,
>>
>> On 3/17/2025 11:27 AM, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Thu, Mar 13, 2025 at 10:22 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Babu,
>>>>
>>>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>>
>>>>
>>>>> Here are the steps. Just copying steps from Peters proposal.
>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>
>>>> Thank you very much for detailing the steps. It is starting the fall into place
>>>> for me.
>>>>
>>>>>
>>>>>
>>>>> 1. Mount the resctrl
>>>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>>>
>>>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>>>> will continue to depend on BMEC.
>>>>
>>>> How would the existing BMEC implementation be impacted in this case?
>>>>
>>>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>>>> files will remain and user space may continue to use them to change the event
>>>> configurations with confusing expecations/results on an ABMC system.
>>>>
>>>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>>>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>>> llc_occupancy
>>>> mbm_total_bytes
>>>> mbm_local_bytes
>>>>
>>>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>>>> and mbm_local_bytes_config, which may be the simplest to do.
>>>
>>> How about making mbm_local_bytes and mbm_total_bytes always be
>>> configured using mbm_{local,total}_bytes_config and only allowing the
>>> full ABMC configurability on user-defined configurations. This could
>>> resolve the issue of backwards compatibility with the BMEC files and
>>> remove the need for the user opting-in to ABMC mode.
> 
> hmmm, yes, backward compatibility is a big issue with an earlier suggestion
> from me. Users with scripts/tools using mbm_{local,total}_bytes_config
> would expect that to continue to work on systems that support BMEC.
> resctrl could continue to use mbm_{local,total}_bytes_config
> even though the inconsistent interface is not ideal
> 
>>
>> There is no opt-in mode. ABMC will be enabled by default if supported.
>> Users will have option to go back to legacy mode.
> 
> I assume there will still be the opt-in for automatic counter assignment
> on monitor group creation (mkdir)?

Yes. It will be available.
-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi Reinette,

On 3/13/2025 4:21 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/13/25 1:13 PM, Moger, Babu wrote:
>> On 3/13/25 11:08, Reinette Chatre wrote:
>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
> 
> 
>> Here are the steps. Just copying steps from Peter's proposal.
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> Thank you very much for detailing the steps. It is starting to fall into place
> for me.
> 
>>
>>
>> 1. Mount the resctrl
>>     mount -t resctrl resctrl /sys/fs/resctrl
> 
> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
> will continue to depend on BMEC.

Yes. ABMC will be enabled by default. ABMC will use the configurations 
from info/L3_MON/counter_configs. ABMC will not depend on BMEC.

> How would the existing BMEC implementation be impacted in this case?

BMEC will only work with pre-ABMC (or default) mode.


> 
> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
> files will remain and user space may continue to use them to change the event
> configurations with confusing expectations/results on an ABMC system.
> 
> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
> llc_occupancy
> mbm_total_bytes
> mbm_local_bytes
> 
> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
> and mbm_local_bytes_config, which may be the simplest to do.

yes.

> 
> To follow that, we should also consider how "mon_features" will change with this
> implementation.

May be

# cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes
  counter_configs/mbm_total_bytes/event_filter
  counter_configs/mbm_local_bytes/event_filter

> 
>>
>> 2. When ABMC is supported two default configurations will be created.
>>
>>    a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>    b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>>    These files will be populated with default total and local events
>>    # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>      VictimBW
>>      RmtSlowFill
>>      RmtNTWr
>>      RmtFill
>>      LclFill
>>      LclNTWr
>>      LclSlowFill
> 
> Looks good. Here we could perhaps start nitpicking about naming and line separation.
> I think it may be easier if the fields are separated by comma, but more on that
> below ...
> 
>>
>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>     LclFill,
>>     LclNTWr
>>     LclSlowFill
>>
>> 3. Users will have options to update the event configuration.
>>     echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> We need to be clear on how user space interacts with this file. For example,
> can user space "append" configurations? Specifically, if the file has
> contents like your earlier example:
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclFill
>   LclNTWr
>   LclSlowFill
> 
> Should above be created with (note "append" needed for second and third):
> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> Is it possible to set multiple configurations in one write like below?
> echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

Yes. We should support that.

> 
> (note above where it may be easier for user space to use comma (or some other field separator)
> when providing multiple configurations at a time, with this, to match, having output in
> commas may be easier since it makes user interface copy&paste easier)
> 
> If file has content like:
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclNTWr
>   LclSlowFill
> 
> What is impact of the following:
> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> Is it (append):
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclFill
>   LclNTWr
>   LclSlowFill
> 
> or (overwrite):
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclFill
> 
> I do think the interface will be more intuitive if it follows regular file
> operations wrt "append" and such. I have not looked into how kernfs supports
> "append".

Just searching quickly, I have not seen any append operations on kernfs.


> As alternative, we can try to work the previous mbm_assign_control syntax in here (use + and -).
> 
> For example:
> 
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> LclNTWr
> # echo "+LclFill,-LclNTWr,+LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> LclFill,LclSlowFill
> 
> With something like above resctrl just deals with file writes as before.

Or without complicating much we can just support basic operations.

# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
   LclFill, LclNTWr, LclSlowFill

# echo "LclFill, LclNTWr, LclSlowFill, VictimBW" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
   LclFill, LclNTWr, LclSlowFill, VictimBW

# echo "LclFill, LclNTWr" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
   LclFill, LclNTWr
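
To make the "basic operations" above concrete, here is a minimal user-space
sketch of the overwrite semantics: each write carries the complete
comma-separated list and replaces whatever was there before. This is only a
model of the proposal, not kernel code. The function names and the
KNOWN_EVENTS tuple are hypothetical; the event names are the ones used in the
examples in this thread.

```python
# Event names as they appear in the event_filter examples in this thread.
KNOWN_EVENTS = ("LclFill", "LclNTWr", "LclSlowFill",
                "RmtFill", "RmtNTWr", "RmtSlowFill", "VictimBW")


def parse_event_filter(buf):
    """Parse one write: a comma-separated list that replaces the old contents."""
    events = []
    for token in buf.split(","):
        name = token.strip()
        if not name:
            continue  # tolerate "a,,b" and a trailing comma
        if name not in KNOWN_EVENTS:
            raise ValueError("unknown event: " + name)
        if name not in events:
            events.append(name)  # drop duplicates, keep first-seen order
    if not events:
        raise ValueError("empty event list")
    return events


def show_event_filter(events):
    """Render the list in the comma-separated form shown in the examples."""
    return ", ".join(events)
```

Since the output format equals the input format, whatever "cat" prints can be
pasted back into "echo" unchanged.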

> 
> 
>>
>> 4. As usual the events can be read from the mon_data directories.
>>     #mkdir /sys/fs/resctrl/test
>>     #cd   /sys/fs/resctrl/test
>>     #cat  test/mon_data/mon_data/mon_L3_00/mbm_total_bytes
>>     101010
>>     #cat   test/mon_data/mon_data/mon_L3_00/mbm_local_bytes
>>     32323
>>
>> 5. There will be 3 files created in each group's mon_data directory when
>> ABMC is supported.
>>
>>     a. test/mon_data/mon_L3_00/assign_exclusive
>>     b. test/mon_data/mon_L3_00/assign_shared
>>     c. test/mon_data/mon_L3_00/unassign
>>
>>
>> 6. Events can be assigned/unassigned by these commands
>>
>>   # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign
>>
>>
>> Note:
>> I feel 3 files are excessive here. We can probably achieve everything in
>> just one file.
> 
> Could you please elaborate what your concern is? You mention that it is
> excessive but it is not clear to me what issues may arise by
> having three files instead of one.

All these 3 properties are mutually exclusive. Only one can be true at a
time. Example:
#cat assign_exclusive
0
#cat assign_shared
0
#cat unassign
1

Three operations to find out the assign state.

Instead of that
#cat mon_l3_assignments
unassigned
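
The same point as a small user-space sketch: because the three values are
mutually exclusive, they collapse into one state that a single read can
return. AssignState and state_from_three_files are hypothetical names used
only for illustration.

```python
from enum import Enum


class AssignState(Enum):
    UNASSIGNED = "unassigned"
    EXCLUSIVE = "exclusive"
    SHARED = "shared"


def state_from_three_files(assign_exclusive, assign_shared, unassigned):
    """Collapse the three 0/1 values into one state; exactly one must be 1."""
    flags = {
        AssignState.EXCLUSIVE: assign_exclusive,
        AssignState.SHARED: assign_shared,
        AssignState.UNASSIGNED: unassigned,
    }
    active = [state for state, value in flags.items() if value == 1]
    if len(active) != 1:
        raise ValueError("exactly one of the three values must be 1")
    return active[0]
```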


> 
> I do think, and Peter also mentioned [1] this, that it may be useful,
> to "put a group/resource-scoped assign_* file higher in the hierarchy
> to make it easier for users who want to configure all domains the
> same for a group."
> 
> Placing *additional* files higher in hierarchy (used to manage counters in all
> domains) may be more useful than trying to provide the shared/exclusive/unassign
> in one file per domain.

Yea. To make it better we can add "mon_l3_assignments" in the group's main
directory. We can do all the operations in just one file.

https://lore.kernel.org/lkml/efb5293f-b0ef-4c94-bf10-9ca7ebb3b53f@amd.com/


> 
>>
>> Not sure about mbm_assign_control interface as there are concerns with
>> group listing holding the lock for long.
>>
>> -----------------------------------------------------------------------
>> Second phase, we can add support for "mkdir"
>>
>> 1. mkdir info/L3_MON/counter_configs/mbm_read_only
>>
>> 2. mkdir option will create "event_filter" file.
>>     info/L3_MON/counter_configs/mbm_read_only/event_filter
>>
> 
> Got it!
> 
>> 3. Users can modify event configuration.
>>     echo LclFill > info/L3_MON/counter_configs/mbm_read_only/event_filter
>>
>> 4. Users can assign the events
>>
>>    echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
>>
>> 5. Events can be read in
>>
>>     test/mon_data/mon_data/mon_L3_00/mbm_read_only
>>
> 
> Related to comment from Tony [2] about rmdir, please also consider that
> original mbm_local_bytes/mbm_total_bytes could also be removed because at this
> point they should not appear different from other counter configurations ... apart
> from being pre-populated for backward compatibility.

Sure.

> 
> Thank you.
> 
> Reinette
> 
> 
> [1] https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> [2] https://lore.kernel.org/lkml/Z9NB0wd8ZewLjNAd@agluck-desk3/
> 

Thanks
Babu
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 10 months, 4 weeks ago
Hi Babu,

On 3/14/25 9:18 AM, Moger, Babu wrote:
> On 3/13/2025 4:21 PM, Reinette Chatre wrote:
>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>
>>
>>> Here are the steps. Just copying steps from Peter's proposal.
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>> Thank you very much for detailing the steps. It is starting to fall into place
>> for me.
>>
>>>
>>>
>>> 1. Mount the resctrl
>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>
>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>> will continue to depend on BMEC.
> 
> Yes. ABMC will be enabled by default. ABMC will use the configurations from info/L3_MON/counter_configs. ABMC will not depend on BMEC.

I see. The previous dependency was thus just something enforced by the OS to support the
chosen implementation?
Looks like the two features share some registers.

> 
>> How would the existing BMEC implementation be impacted in this case?
> 
> BMEC will only work with pre-ABMC (or default) mode.

ok. Does this mean that if a user boots kernel with "rdt=!bmec" then ABMC will keep working?


>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>> files will remain and user space may continue to use them to change the event
>> configurations with confusing expectations/results on an ABMC system.
>>
>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> llc_occupancy
>> mbm_total_bytes
>> mbm_local_bytes
>>
>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>> and mbm_local_bytes_config, which may be the simplest to do.
> 
> yes.
> 
>>
>> To follow that, we should also consider how "mon_features" will change with this
>> implementation.
> 
> May be
> 
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>  llc_occupancy
>  mbm_total_bytes
>  mbm_local_bytes
>  counter_configs/mbm_total_bytes/event_filter
>  counter_configs/mbm_local_bytes/event_filter
> 

I read the docs again to understand what kind of flexibility we have here. As I interpret it
the "mon_features" is associated with "events" and there is a clear documented association
between the "events" listed in this file and which files a user can expect to exist in the
"mon_data" directory. Considering this I think it may be helpful to provide the
counter configurations in this file. This matches well with mbm_total_bytes/mbm_local_bytes
being treated as "counter configurations".

Whether counter configuration is supported could be determined by existence of
the "counter_configs" directory?

For example,
# cat /sys/fs/resctrl/info/L3_MON/mon_features
 llc_occupancy
 mbm_total_bytes
 mbm_local_bytes

# mkdir /sys/fs/resctrl/info/L3_MON/counter_configs/only_read_fills

# cat /sys/fs/resctrl/info/L3_MON/mon_features
 llc_occupancy
 mbm_total_bytes
 mbm_local_bytes
 only_read_fills

This could possibly be a way to support user interface when configuring the
counter. For example, a user may easily create a new counter configuration
by creating a directory, but there may be some requirements wrt its configuration
that need to be met before that configuration/event may appear in the
"mon_features" file.

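A rough user-space model of the idea above, where "mon_features" is derived
from the contents of "counter_configs". The gating rule used here (a
configuration is listed only once its event_filter is non-empty) is purely an
assumption for illustration; the actual requirement is not decided in this
thread. BASE_EVENTS and mon_features() are hypothetical names.

```python
# Events that exist independently of counter configurations.
BASE_EVENTS = ["llc_occupancy"]


def mon_features(counter_configs):
    """Return the event list a user would see in mon_features.

    counter_configs maps a configuration name (a directory under
    counter_configs/) to the list of events in its event_filter.
    Assumed rule: a configuration with an empty filter is not listed.
    """
    visible = [name for name, events in counter_configs.items() if events]
    return BASE_EVENTS + visible
```

With this model a freshly created "only_read_fills" directory stays hidden
until its event_filter is written, after which it shows up next to
mbm_total_bytes and mbm_local_bytes.
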
>>> 2. When ABMC is supported two default configurations will be created.
>>>
>>>    a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>    b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>>    These files will be populated with default total and local events
>>>    # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>      VictimBW
>>>      RmtSlowFill
>>>      RmtNTWr
>>>      RmtFill
>>>      LclFill
>>>      LclNTWr
>>>      LclSlowFill
>>
>> Looks good. Here we could perhaps start nitpicking about naming and line separation.
>> I think it may be easier if the fields are separated by comma, but more on that
>> below ...
>>
>>>
>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>     LclFill,
>>>     LclNTWr
>>>     LclSlowFill
>>>
>>> 3. Users will have options to update the event configuration.
>>>     echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> We need to be clear on how user space interacts with this file. For example,
>> can user space "append" configurations? Specifically, if the file has
>> contents like your earlier example:
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill
>>   LclNTWr
>>   LclSlowFill
>>
>> Should above be created with (note "append" needed for second and third):
>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> Is it possible to set multiple configurations in one write like below?
>> echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> Yes. We should support that.

Reading Peter's response (https://lore.kernel.org/lkml/CALPaoCj7aSVxHisQTdKQ5KN0-aNzN8rRkRPVc7pjGMLSxfPvrA@mail.gmail.com/)
it sounds as though this part is now in the fine-tuning phase.
If there are other formats that are more convenient for user space then we should surely
consider them.

> 
>>
>> (note above where it may be easier for user space to use comma (or some other field separator)
>> when providing multiple configurations at a time, with this, to match, having output in
>> commas may be easier since it makes user interface copy&paste easier)
>>
>> If file has content like:
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclNTWr
>>   LclSlowFill
>>
>> What is impact of the following:
>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> Is it (append):
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill
>>   LclNTWr
>>   LclSlowFill
>>
>> or (overwrite):
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill
>>
>> I do think the interface will be more intuitive if it follows regular file
>> operations wrt "append" and such. I have not looked into how kernfs supports
>> "append".
> 
> Just searching quickly, I have not seen any append operations on kernfs.
> 
> 
>> As alternative, we can try to work the previous mbm_assign_control syntax in here (use + and -).
>>
>> For example:
>>
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> LclNTWr
>> # echo "+LclFill,-LclNTWr,+LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> LclFill,LclSlowFill
>>
>> With something like above resctrl just deals with file writes as before.
> 
> Or without complicating much we can just support basic operations.
> 
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclFill, LclNTWr, LclSlowFill
> 
> # echo "LclFill, LclNTWr, LclSlowFill, VictimBW" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclFill, LclNTWr, LclSlowFill, VictimBW
> 
> # echo "LclFill, LclNTWr" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>   LclFill, LclNTWr
> 

Looks good to me. As I see it this could be expanded to support "append" if needed.
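
As a sketch of how such an expansion could sit on top of the plain overwrite
form: a write whose entries all carry a "+" or "-" prefix edits the current
list, while a write without prefixes replaces it. This is a user-space model
only; apply_filter_update() is a hypothetical name, and rejecting writes that
mix both forms is an assumption made here, not something settled in this
thread.

```python
def apply_filter_update(current, buf):
    """Return the new event list after writing buf to event_filter."""
    tokens = [t.strip() for t in buf.split(",") if t.strip()]
    if not tokens:
        raise ValueError("empty write")

    incremental = [t[0] in "+-" for t in tokens]
    if any(incremental) and not all(incremental):
        raise ValueError("cannot mix plain and +/- entries")

    if not any(incremental):
        # Plain list: overwrite, dropping duplicates but keeping order.
        result = []
        for name in tokens:
            if name not in result:
                result.append(name)
        return result

    # Incremental form: start from the current contents and edit in place.
    result = list(current)
    for token in tokens:
        op, name = token[0], token[1:]
        if not name:
            raise ValueError("missing event name after " + op)
        if op == "+":
            if name not in result:
                result.append(name)
        elif name in result:
            result.remove(name)
    return result
```

Starting from "LclNTWr", the write "+LclFill,-LclNTWr,+LclSlowFill" leaves
LclFill and LclSlowFill, which matches the earlier +/- example.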

>>
>>
>>>
>>> 4. As usual the events can be read from the mon_data directories.
>>>     #mkdir /sys/fs/resctrl/test
>>>     #cd   /sys/fs/resctrl/test
>>>     #cat  test/mon_data/mon_data/mon_L3_00/mbm_total_bytes
>>>     101010
>>>     #cat   test/mon_data/mon_data/mon_L3_00/mbm_local_bytes
>>>     32323
>>>
>>> 5. There will be 3 files created in each group's mon_data directory when
>>> ABMC is supported.
>>>
>>>     a. test/mon_data/mon_L3_00/assign_exclusive
>>>     b. test/mon_data/mon_L3_00/assign_shared
>>>     c. test/mon_data/mon_L3_00/unassign
>>>
>>>
>>> 6. Events can be assigned/unassigned by these commands
>>>
>>>   # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign
>>>
>>>
>>> Note:
>>> I feel 3 files are excessive here. We can probably achieve everything in
>>> just one file.
>>
>> Could you please elaborate what your concern is? You mention that it is
>> excessive but it is not clear to me what issues may arise by
>> having three files instead of one.
> 
> All these 3 properties are mutually exclusive. Only one can be true at a time. Example:
> #cat assign_exclusive
> 0
> #cat assign_shared
> 0
> #cat unassign
> 1
> 
> Three operations to find out the assign state.

ah - good point.

> 
> Instead of that
> #cat mon_l3_assignments
> unassigned
> 
> 
>>
>> I do think, and Peter also mentioned [1] this, that it may be useful,
>> to "put a group/resource-scoped assign_* file higher in the hierarchy
>> to make it easier for users who want to configure all domains the
>> same for a group."
>>
>> Placing *additional* files higher in hierarchy (used to manage counters in all
>> domains) may be more useful than trying to provide the shared/exclusive/unassign
>> in one file per domain.
> 
> Yea. To make it better we can add "mon_l3_assignments" in the group's main directory. We can do all the operations in just one file.
> 
> https://lore.kernel.org/lkml/efb5293f-b0ef-4c94-bf10-9ca7ebb3b53f@amd.com/

I am hesitant to respond to that message considering the corporate preamble that
sneaked in so I'll just add some thoughts here:

Having the file higher in hierarchy does seem more useful. It may also help to reduce the
amount of parsing needed to get to the information. Compare the two examples below, which
can be used to convey the same information:

# cat /sys/fs/resctrl/test/mon_L3_assignments
mbm_total_bytes: 0=unassigned; 1=unassigned
mbm_local_bytes: 0=unassigned; 1=unassigned

#cat /sys/fs/resctrl/test/mon_L3_assignments
0=_; 1=_

We need to take care that it is always clear what "0" or "1" means ...
Tony has been mentioning a lot of interesting things about scope
changes. I assume the "L3" in mon_L3_assignments will dictate the scope?

With a syntax like above the needed information can be presented in one line.
For example,

#cat /sys/fs/resctrl/test/mon_L3_assignments
0=mbm_total_bytes; 1=mbm_local_bytes

The caveat is that this only covers exclusively assigned counters, not shared ones, so it
needs to change.

To support shared assignment ...  I wonder if it will be useful to users to
get the information on which other monitor groups the counter is shared _with_?

Some examples:

a) Just indicate whether a counter is shared or dedicated. (Introduce flags).
#cat /sys/fs/resctrl/test/mon_L3_assignments
0=mbm_total_bytes:s; 1=mbm_local_bytes:d

b) Indicate which groups a counter is shared with:
#cat /sys/fs/resctrl/testA/mon_L3_assignments
0=mbm_total_bytes:s(testB); 1=mbm_local_bytes:d(not needed but perhaps empty for consistent interface?)
#cat /sys/fs/resctrl/testB/mon_L3_assignments
0=mbm_total_bytes:s(testA); 1=mbm_local_bytes:d(?)

... (b) may just be overkill and we should instead follow Tony's
guideline (see https://lore.kernel.org/lkml/Z9CiwLrhuTODruCj@agluck-desk3/ )
that users should be able to keep track themselves.
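
For reference, a minimal user-space sketch of parsing the one-line format from
example (a), including the "_" placeholder for unassigned entries. The format
itself is only a proposal at this point. parse_assignments() is a hypothetical
helper, and the key on the left of "=" is kept as an opaque string because
what "0" and "1" mean is still an open question above.

```python
def parse_assignments(line):
    """Parse e.g. "0=mbm_total_bytes:s; 1=mbm_local_bytes:d".

    Returns a dict mapping each key to None (unassigned, written as "_")
    or to a (config_name, flag) tuple, where flag is "s" (shared),
    "d" (dedicated) or None when no flag was given.
    """
    result = {}
    for field in line.split(";"):
        field = field.strip()
        if not field:
            continue
        key, sep, value = field.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError("malformed field: " + field)
        if value == "_":
            result[key] = None
            continue
        config, colon, flag = value.partition(":")
        if colon and flag not in ("s", "d"):
            raise ValueError("unknown flag in: " + field)
        result[key] = (config, flag if colon else None)
    return result
```

The parenthesised group list from example (b) is deliberately not handled.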

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 10 months, 3 weeks ago
Hi Reinette,

On 3/19/25 13:36, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/14/25 9:18 AM, Moger, Babu wrote:
>> On 3/13/2025 4:21 PM, Reinette Chatre wrote:
>>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>
>>>
>>>> Here are the steps. Just copying steps from Peter's proposal.
>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>> Thank you very much for detailing the steps. It is starting to fall into place
>>> for me.
>>>
>>>>
>>>>
>>>> 1. Mount the resctrl
>>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>>
>>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>>> will continue to depend on BMEC.
>>
>> Yes. ABMC will be enabled by default. ABMC will use the configurations from info/L3_MON/counter_configs. ABMC will not depend on BMEC.
> 
> I see. The previous dependency was thus just something enforced by OS to support the
> chosen implementation? 

Yes. That is correct. We went that route mainly to avoid changing the
rmid_read operation.

With ABMC, we need to set Extended EVTID and the ABMC bit in the QM_EVTSEL
register while reading the cntr_id events. I will add those patches in the next
version to make it clear.

> Looks like the two features share some registers.
> 
>>
>>> How would the existing BMEC implementation be impacted in this case?
>>
>> BMEC will only work with pre-ABMC (or default) mode.
> 
> ok. Does this mean that if a user boots kernel with "rdt=!bmec" then ABMC will keep working?

Yes. That is correct.

> 
> 
>>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>>> files will remain and user space may continue to use them to change the event
>>> configurations with confusing expectations/results on an ABMC system.
>>>
>>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>> llc_occupancy
>>> mbm_total_bytes
>>> mbm_local_bytes
>>>
>>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>>> and mbm_local_bytes_config, which may be the simplest to do.
>>
>> yes.
>>
>>>
>>> To follow that, we should also consider how "mon_features" will change with this
>>> implementation.
>>
>> May be
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>  llc_occupancy
>>  mbm_total_bytes
>>  mbm_local_bytes
>>  counter_configs/mbm_total_bytes/event_filter
>>  counter_configs/mbm_local_bytes/event_filter
>>
> 
> I read the docs again to understand what kind of flexibility we have here. As I interpret it
> the "mon_features" is associated with "events" and there is a clear documented association
> between the "events" listed in this file and which files a user can expect to exist in the
> "mon_data" directory. Considering this I think it may be helpful to provide the
> counter configurations in this file. This matches well with mbm_total_bytes/mbm_local_bytes
> being treated as "counter configurations".
> 
> Whether counter configuration is supported could be determined by existence of
> the "counter_configs" directory?
> 
> For example,
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>  llc_occupancy
>  mbm_total_bytes
>  mbm_local_bytes
> 
> # mkdir /sys/fs/resctrl/info/L3_MON/counter_configs/only_read_fills
> 
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>  llc_occupancy
>  mbm_total_bytes
>  mbm_local_bytes
>  only_read_fills
> 
> This could possibly be a way to support user interface when configuring the
> counter. For example, a user may easily create a new counter configuration
> by creating a directory, but there may be some requirements wrt its configuration
> that need to be met before that configuration/event may appear in the
> "mon_features" file.

Yes. I am fine with this approach.

> 
>>>> 2. When ABMC is supported two default configurations will be created.
>>>>
>>>>    a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>>    b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>
>>>>    These files will be populated with default total and local events
>>>>    # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>>>      VictimBW
>>>>      RmtSlowFill
>>>>      RmtNTWr
>>>>      RmtFill
>>>>      LclFill
>>>>      LclNTWr
>>>>      LclSlowFill
>>>
>>> Looks good. Here we could perhaps start nitpicking about naming and line separation.
>>> I think it may be easier if the fields are separated by comma, but more on that
>>> below ...
>>>
>>>>
>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>     LclFill,
>>>>     LclNTWr
>>>>     LclSlowFill
>>>>
>>>> 3. Users will have options to update the event configuration.
>>>>     echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> We need to be clear on how user space interacts with this file. For example,
>>> can user space "append" configurations? Specifically, if the file has
>>> contents like your earlier example:
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclFill
>>>   LclNTWr
>>>   LclSlowFill
>>>
>>> Should above be created with (note "append" needed for second and third):
>>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> echo LclNTWr >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> echo LclSlowFill >> info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> Is it possible to set multiple configurations in one write like below?
>>> echo "LclFill,LclNTWr,LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> Yes. We should support that.
> 
> Reading Peter's response (https://lore.kernel.org/lkml/CALPaoCj7aSVxHisQTdKQ5KN0-aNzN8rRkRPVc7pjGMLSxfPvrA@mail.gmail.com/)
> it sounds as though this part is now in the fine-tuning phase.
> If there are other formats that are more convenient for user space then we should surely
> consider that.

I agree. We can revise it further as we review.

> 
>>
>>>
>>> (note above where it may be easier for user space to use comma (or some other field separator)
>>> when providing multiple configurations at a time, with this, to match, having output in
>>> commas may be easier since it makes user interface copy&paste easier)
>>>
>>> If file has content like:
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclNTWr
>>>   LclSlowFill
>>>
>>> What is impact of the following:
>>> echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>
>>> Is it (append):
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclFill
>>>   LclNTWr
>>>   LclSlowFill
>>>
>>> or (overwrite):
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>   LclFill
>>>
>>> I do think the interface will be more intuitive if it follows regular file
>>> operations wrt "append" and such. I have not looked into how kernfs supports
>>> "append".
>>
>> Just searching quickly, I have not seen any append operations on kernfs.
>>
>>
>>> As alternative, we can try to work the previous mbm_assign_control syntax in here (use + and -).
>>>
>>> For example:
>>>
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> LclNTWr
>>> # echo "+LclFill,-LclNTWr,+LclSlowFill" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>> LclFill,LclSlowFill
>>>
>>> With something like above resctrl just deals with file writes as before.
>>
>> Or without complicating much we can just support basic operations.
>>
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill, LclNTWr, LclSlowFill
>>
>> # echo "LclFill, LclNTWr, LclSlowFill, VictimBW" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill, LclNTWr, LclSlowFill, VictimBW
>>
>> # echo "LclFill, LclNTWr" > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>   LclFill, LclNTWr
>>
> 
> Looks good to me. As I see it this could be expanded to support "append" if needed.

thanks.

> 
>>>
>>>
>>>>
>>>> 4. As usual the events can be read from the mon_data directories.
>>>>     #mkdir /sys/fs/resctrl/test
>>>>     #cd   /sys/fs/resctrl/test
>>>>     #cat  test/mon_data/mon_data/mon_L3_00/mbm_total_bytes
>>>>     101010
>>>>     #cat   test/mon_data/mon_data/mon_L3_00/mbm_local_bytes
>>>>     32323
>>>>
>>>> 5. There will be 3 files created in each group's mon_data directory when
>>>> ABMC is supported.
>>>>
>>>>     a. test/mon_data/mon_L3_00/assign_exclusive
>>>>     b. test/mon_data/mon_L3_00/assign_shared
>>>>     c. test/mon_data/mon_L3_00/unassign
>>>>
>>>>
>>>> 6. Events can be assigned/unassigned by these commands
>>>>
>>>>   # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>>>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign
>>>>
>>>>
>>>> Note:
>>>> I feel 3 files are excessive here. We can probably achieve everything in
>>>> just one file.
>>>
>>> Could you please elaborate what your concern is? You mention that it is
>>> excessive but it is not clear to me what issues may arise by
>>> having three files instead of one.
>>
>> All these 3 properties are mutually exclusive. Only one can be true at a time. Example:
>> #cat assign_exclusive
>> 0
>> #cat assign_shared
>> 0
>> #cat unassign
>> 1
>>
>> Three operations to find out the assign state.
> 
> ah - good point.
> 
>>
>> Instead of that
>> #cat mon_l3_assignments
>> unassigned
>>
>>
>>>
>>> I do think, and Peter also mentioned [1] this, that it may be useful,
>>> to "put a group/resource-scoped assign_* file higher in the hierarchy
>>> to make it easier for users who want to configure all domains the
>>> same for a group."
>>>
>>> Placing *additional* files higher in hierarchy (used to manage counters in all
>>> domains) may be more useful than trying to provide the shared/exclusive/unassign
>>> in one file per domain.
>>
>> Yea. To make it better we can add "mon_l3_assignments" in the group's main directory. We can do all the operations in just one file.
>>
>> https://lore.kernel.org/lkml/efb5293f-b0ef-4c94-bf10-9ca7ebb3b53f@amd.com/
> 
> I am hesitant to respond to that message considering the corporate preamble that
> sneaked in so I'll just add some thoughts here:

Yea. I noticed it later. Will take care next time.

> 
> Having the file higher in hierarchy does seem more useful. It may be useful to reduce
> amount of parsing to get to needed information. Compare with below two examples that can
> be used to convey the same information:
> 
> # cat /sys/fs/resctrl/test/mon_L3_assignments
> mbm_total_bytes: 0=unassigned; 1=unassigned
> mbm_local_bytes: 0=unassigned; 1=unassigned
> 
> #cat /sys/fs/resctrl/test/mon_L3_assignments
> 0=_; 1=_
> 
> We need to take care that it is always clear what "0" or "1" means ...
> Tony has been mentioning a lot of interesting things about scope
> changes. I assume the "L3" in mon_L3_assignments will dictate the scope?

I didn't think about the scope here. I was thinking of changing it to
"mbm_assignments".

> 
> With a syntax like above the needed information can be presented in one line.
> For example,
> 
> #cat /sys/fs/resctrl/test/mon_L3_assignments
> 0=mbm_total_bytes; 1=mbm_local_bytes
> 
> The caveat is that is only for assigned counters, not shared, so this needs
> to change.
> 
> To support shared assignment ...  I wonder if it will be useful to users to
> get the information on which other monitor groups the counter is shared _with_?
> 
> Some examples:
> 
> a) Just indicate whether a counter is shared or dedicated. (Introduce flags).
> #cat /sys/fs/resctrl/test/mon_L3_assignments
> 0=mbm_total_bytes:s; 1=mbm_local_bytes:d
> 
> b) Indicate which groups a counter is shared with:
> #cat /sys/fs/resctrl/testA/mon_L3_assignments
> 0=mbm_total_bytes:s(testB); 1=mbm_local_bytes:d(not needed but perhaps empty for consistent interface?)
> #cat /sys/fs/resctrl/testB/mon_L3_assignments
> 0=mbm_total_bytes:s(testA); 1=mbm_local_bytes:d(?)

This format does not tell what is going on with all other domains. We need
to display all the domains. I think that is important because users need
to know what to expect reading the events on specific domains.

I think we need to convey all the following information to the user.

1. Event Configuration: Which event configuration is applied here?

2. Domains: To which domains is the configuration applied?
   This is useful in multi-domain configurations.
   We also need to know which domains are assigned or unassigned.

3. Type of assignment: Exclusive(e or d) or Shared(s) or Unassigned(_)

4. Finally: Where to read the events?
   This becomes important when we add "mkdir" support in the future, because
   mon_data/mon_L3_*/config_name will then be created.


With that in mind this might be helpful.

# cat /sys/fs/resctrl/test/mon_L3_assignments
mbm_total_bytes: 0=e; 1=_
mbm_local_bytes: 0=_; 1=s

This format gives the user all of the information listed above.
The mbm_total_bytes and mbm_local_bytes configurations are applied, and the
configurations come from counter_configs.

Users can read the events in
mon_data/mon_L3_*/mbm_total_bytes
mon_data/mon_L3_*/mbm_local_bytes

mbm_total_bytes is assigned on domain 0 and not on domain 1.
Reading the mbm_total_bytes on domain 1 will report "unassigned".

mbm_local_bytes is not assigned on domain 0 and assigned on domain 1.
Reading the mbm_local_bytes on domain 0 will report "unassigned".
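
To illustrate how user space might consume this layout, here is a minimal
parsing sketch. It assumes the file contents look exactly like the example
above ("<event>: <domain>=<flag>; ...") and that the flags are e/d, s and _ as
listed in item 3. The function name and the returned structure are
hypothetical; nothing here is an existing resctrl interface.

```python
# Hypothetical helper: parse the proposed "<event>: <domain>=<flag>; ..." layout.
# Flag letters follow item 3 above: e or d = exclusive, s = shared, _ = unassigned.
STATE_NAMES = {"e": "exclusive", "d": "exclusive", "s": "shared", "_": "unassigned"}


def parse_assignments(text):
    """Return {event: {domain_id: state}} for the proposed file contents."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        event, _, domains = line.partition(":")
        per_domain = {}
        for entry in domains.split(";"):
            entry = entry.strip()
            if not entry:
                continue
            dom, _, flag = entry.partition("=")
            per_domain[int(dom)] = STATE_NAMES[flag.strip()]
        result[event.strip()] = per_domain
    return result


# Sample text copied from the example above (not read from a real system).
sample = "mbm_total_bytes: 0=e; 1=_\nmbm_local_bytes: 0=_; 1=s\n"
parsed = parse_assignments(sample)
```

With this, a tool can tell up front which domains will report "unassigned"
for an event before it reads mon_data/mon_L3_*/<event>.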

I don't have much information on shared assignment at this point. I don't know
if we can display the shared group.

> 
> ... (b) may just be overkill and we should instead follow Tony's
> guideline (see https://lore.kernel.org/lkml/Z9CiwLrhuTODruCj@agluck-desk3/ )
> that users should be able to keep track themselves.
> 
> Reinette
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 10 months, 3 weeks ago
Hi Babu,

On 3/20/25 11:12 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 3/19/25 13:36, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 3/14/25 9:18 AM, Moger, Babu wrote:
>>> On 3/13/2025 4:21 PM, Reinette Chatre wrote:
>>>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>>
>>>>
>>>>> Here are the steps. Just copying steps from Peters proposal.
>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>
>>>> Thank you very much for detailing the steps. It is starting to fall into place
>>>> for me.
>>>>
>>>>>
>>>>>
>>>>> 1. Mount the resctrl
>>>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>>>
>>>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>>>> will continue to depend on BMEC.
>>>
>>> Yes. ABMC will be enabled by default. ABMC will use the configurations from info/L3_MON/counter_configs. ABMC will not depend on BMEC.
>>
>> I see. The previous dependency was thus just something enforced by OS to support the
>> chosen implementation? 
> 
> Yes. That is correct. We went that route mainly not to change the
> rmid_read operation.
> 
> With ABMC, we need to set Extended EVTID and ABMC bit in QM_EVTSEL
> register while reading the cntr_id events. Will add those patches in next
> version to make it clear.

Thank you.

> 
>> Looks like the two features share some registers.
>>
>>>
>>>> How would the existing BMEC implementation be impacted in this case?
>>>
>>> BMEC will only work with pre-ABMC(or default) mode.
>>
>> ok. Does this mean that if a user boots kernel with "rdt=!bmec" then ABMC will keep working?
> 
> Yes. That is correct.

Just to confirm and bring the two email threads together ... it sounds like
existing users of BMEC are expected to use mon_features to
know if mbm_{total,local}_bytes_config are supported. If the system supports ABMC
then BMEC will not be available and thus mon_features will not contain
mbm_{total,local}_bytes_config. Existing users that rely on
mbm_{total,local}_bytes_config will experience failures and are expected
to switch to ABMC?
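
As a rough illustration of that check, the sketch below tests whether the
contents of info/L3_MON/mon_features list the two BMEC config files. The
function name is hypothetical, and it takes the file contents as a string
(for example, as read from /sys/fs/resctrl/info/L3_MON/mon_features) so that
it does not depend on a mounted resctrl.

```python
# Hypothetical helper: decide from mon_features contents whether the BMEC
# config files are advertised. Pass in the text read from
# /sys/fs/resctrl/info/L3_MON/mon_features.
BMEC_CONFIG_FILES = {"mbm_total_bytes_config", "mbm_local_bytes_config"}


def has_bmec_config(mon_features_text):
    """Return True if both mbm_{total,local}_bytes_config are listed."""
    listed = {line.strip() for line in mon_features_text.splitlines() if line.strip()}
    return BMEC_CONFIG_FILES <= listed


# Contents as they might appear on an ABMC system, per the example in this thread.
abmc_features = "llc_occupancy\nmbm_total_bytes\nmbm_local_bytes\n"
abmc_has_bmec = has_bmec_config(abmc_features)
```

A user that finds the config entries missing would then know not to touch
mbm_{total,local}_bytes_config and to use the ABMC interface instead.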


> 
>>
>>
>>>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>>>> files will remain and user space may continue to use them to change the event
>>>> configurations with confusing expectations/results on an ABMC system.
>>>>
>>>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>>>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>>> llc_occupancy
>>>> mbm_total_bytes
>>>> mbm_local_bytes
>>>>
>>>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>>>> and mbm_local_bytes_config, which may be the simplest to do.
>>>
>>> yes.


...

>>>>
>>>> I do think, and Peter also mentioned [1] this, that it may be useful,
>>>> to "put a group/resource-scoped assign_* file higher in the hierarchy
>>>> to make it easier for users who want to configure all domains the
>>>> same for a group."
>>>>
>>>> Placing *additional* files higher in hierarchy (used to manage counters in all
>>>> domains) may be more useful that trying to provide the shared/exclusive/unassign
>>>> in one file per domain.
>>>
>>> Yea. To make it better we can add "mon_l3_assignments" in groups main directory. We can do all the operation in just one file.
>>>
>>> https://lore.kernel.org/lkml/efb5293f-b0ef-4c94-bf10-9ca7ebb3b53f@amd.com/
>>
>> I am hesitant to respond to that message considering the corporate preamble that
>> sneaked in so I'll just add some thoughts here:
> 
> Yea. I noticed it later. Will take care next time.
> 
>>
>> Having the file higher in hierarchy does seem more useful. It may be useful to reduce
>> amount of parsing to get to needed information. Compare with below two examples that can
>> be used to convey the same information:
>>
>> # cat /sys/fs/resctrl/test/mon_L3_assignments
>> mbm_total_bytes: 0=unassigned; 1=unassigned
>> mbm_local_bytes: 0=unassigned; 1=unassigned
>>
>> #cat /sys/fs/resctrl/test/mon_L3_assignments
>> 0=_; 1=_
>>
>> We need to take care that it is always clear what "0" or "1" means ...
>> Tony has been mentioning a lot of interesting things about scope
>> changes. I assume the "L3" in mon_L3_assignments will dictate the scope?
> 
> I didnt think about the scope here. I was thinking of changing it to
> "mbm_assignments".

ah, I see, not a general "monitoring" file but specific to MBM. This still
may encounter difficulty if AMD does something like SNC where MBM could
be done per NUMA node. Perhaps we could constrain this even more with a
"mbm_L3_assignments". If anything ever shows up that needs to do MBM
counter assignment at some other scope then at least we have the option
to create another file "mbm_?_assignments".

> 
>>
>> With a syntax like above the needed information can be presented in one line.
>> For example,
>>
>> #cat /sys/fs/resctrl/test/mon_L3_assignments
>> 0=mbm_total_bytes; 1=mbm_local_bytes
>>
>> The caveat is that is only for assigned counters, not shared, so this needs
>> to change.
>>
>> To support shared assignment ...  I wonder if it will be useful to users to
>> get the information on which other monitor groups the counter is shared _with_?
>>
>> Some examples:
>>
>> a) Just indicate whether a counter is shared or dedicated. (Introduce flags).
>> #cat /sys/fs/resctrl/test/mon_L3_assignments
>> 0=mbm_total_bytes:s; 1=mbm_local_bytes:d
>>
>> b) Indicate which groups a counter is shared with:
>> #cat /sys/fs/resctrl/testA/mon_L3_assignments
>> 0=mbm_total_bytes:s(testB); 1=mbm_local_bytes:d(not needed but perhaps empty for consistent interface?)
>> #cat /sys/fs/resctrl/testB/mon_L3_assignments
>> 0=mbm_total_bytes:s(testA); 1=mbm_local_bytes:d(?)
> 
> This format does not tell what is going on with all other domains. We need
> to display all the domains. I think that is important because users need
> to know what to expect reading the events on specific domains.
> 
> I think we need to convey all the following information to the user.
> 
> 1. Event Configuation: What is event configuration applied here?
> 
> 2. Domains: To which all the domains the configaration is applied?
>    This is useful in multi-domain configuration.
>    We also need to know if which domains are assigned or unassigned.
> 
> 3. Type of assignment: Exclusive(e or d) or Shared(s) or Unassigned(_)
> 
> 4. Finally: Where to read the events?
>  This is important when we add "mkdir" support in the future,
> mon_data/mon_l3_*/config_name will be created.
> 
> 
> With that in mind this might be helpful.
> 
> # cat /sys/fs/resctrl/test/mon_L3_assignments
> mbm_total_bytes: 0=e; 1=_
> mbm_local_bytes: 0=_; 1=s
> 
> This format tells the user all the information.
> mbm_total_bytes and mbm_local_bytes configurations are applied and
> configuration are coming from counter_configs.
> 
> User can read the events in
> mon_data/mon_L3_*/mbm_total_bytes
> mon_data/mon_L3_*/mbm_local_bytes
> 
> mbm_total_bytes is assigned on domain 0 and not on domain 1.
> Reading the mbm_total_bytes on domain 1 will report "unassigned".
> 
> mbm_local_bytes is not assigned on domain 0 and assigned on domain 1.
> Reading the mbm_local_bytes on domain 0 will report "unassigned".

Thank you very much for spelling it out. Much appreciated. This looks good to me.
Please include your list of requirements for interface in the cover-letter and/or
patch that introduces the interface.

> 
> I dont have much information on shared assignment at this point. Dont know
> if we can display shared group.

The proposed interface accommodates shared counters. The expectation is that
users can keep track themselves and if not, then the information can be
obtained with a read of every group's counter assignment. The issue here is
that this may, in the worst case, need a large number of file operations if the
expectation is that it will still be possible to create as many monitoring
groups as there are RMIDs.

Using files inside the monitor group for this information may actually not be ideal.
If this information is needed then we could perhaps add a new file. For
example:
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/<file reporting which monitor groups share this counter configuration in different domains>
Of course, I do not know if this will be required and this seems manageable as
a later enhancement if needed.
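
For reference, a sketch of the "read every group" approach described above.
It assumes the per-group file uses the "<event>: <domain>=<flag>; ..." layout
proposed earlier in this thread, with "s" marking a shared counter. The
function name and the input dictionary (group name to file contents) are
hypothetical; the caller would need one read per monitor group to build that
dictionary, which is the cost being discussed.

```python
from collections import defaultdict


def shared_with(assignments_by_group):
    """Map (event, domain) to the sorted list of groups holding it as shared."""
    sharing = defaultdict(list)
    for group, text in assignments_by_group.items():
        for line in text.splitlines():
            event, sep, domains = line.partition(":")
            if not sep:
                continue  # skip blank or malformed lines
            for entry in domains.split(";"):
                dom, sep, flag = entry.partition("=")
                if sep and flag.strip() == "s":
                    sharing[(event.strip(), int(dom))].append(group)
    return {key: sorted(groups) for key, groups in sharing.items()}


# Hypothetical contents of two groups' assignment files.
reads = {
    "testA": "mbm_total_bytes: 0=s; 1=_\nmbm_local_bytes: 0=_; 1=e\n",
    "testB": "mbm_total_bytes: 0=s; 1=_\nmbm_local_bytes: 0=_; 1=_\n",
}
result = shared_with(reads)
```

The number of reads grows linearly with the number of monitor groups, which is
why a single file under counter_configs may be preferable if this information
turns out to be needed.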

> 
>>
>> ... (b) may just be overkill and we should instead follow Tony's
>> guideline (see https://lore.kernel.org/lkml/Z9CiwLrhuTODruCj@agluck-desk3/ )
>> that users should be able to keep track themselves.
>>
>> Reinette
>>
> 

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 10 months, 3 weeks ago
Hi Reinette,

On 3/20/2025 5:35 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/20/25 11:12 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 3/19/25 13:36, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 3/14/25 9:18 AM, Moger, Babu wrote:
>>>> On 3/13/2025 4:21 PM, Reinette Chatre wrote:
>>>>> On 3/13/25 1:13 PM, Moger, Babu wrote:
>>>>>> On 3/13/25 11:08, Reinette Chatre wrote:
>>>>>>> On 3/12/25 11:14 AM, Moger, Babu wrote:
>>>>>>>> On 3/12/25 12:14, Reinette Chatre wrote:
>>>>>>>>> On 3/12/25 9:03 AM, Moger, Babu wrote:
>>>>>>>>>> On 3/12/25 10:07, Reinette Chatre wrote:
>>>>>
>>>>>
>>>>>> Here are the steps. Just copying steps from Peters proposal.
>>>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>>
>>>>> Thank you very much for detailing the steps. It is starting the fall into place
>>>>> for me.
>>>>>
>>>>>>
>>>>>>
>>>>>> 1. Mount the resctrl
>>>>>>      mount -t resctrl resctrl /sys/fs/resctrl
>>>>>
>>>>> I assume that on ABMC system the plan remains to have ABMC enabled by default, which
>>>>> will continue to depend on BMEC.
>>>>
>>>> Yes. ABMC will be enabled by default. ABMC will use the configurations from info/L3_MON/counter_configs. ABMC will not depend on BMEC.
>>>
>>> I see. The previous dependency was thus just something enforced by OS to support the
>>> chosen implementation?
>>
>> Yes. That is correct. We went that route mainly not to change the
>> rmid_read operation.
>>
>> With ABMC, we need to set Extended EVTID and ABMC bit in QM_EVTSEL
>> register while reading the cntr_id events. Will add those patches in next
>> version to make it clear.
> 
> Thank you.
> 
>>
>>> Looks like the two features share some registers.
>>>
>>>>
>>>>> How would the existing BMEC implementation be impacted in this case?
>>>>
>>>> BMEC will only work with pre-ABMC(or default) mode.
>>>
>>> ok. Does this mean that if a user boots kernel with "rdt=!bmec" then ABMC will keep working?
>>
>> Yes. That is correct.
> 
> Just to confirm and bring the two email threads together ... it sounds like the
> expectation is that existing users of BMEC are expected to use mon_features to
> know if mbm_{total,local}_bytes_config are supported. If system supports ABMC
> then BMEC will not be available and thus mon_features will not contain
> mbm_{total,local}_bytes_config. Existing users that rely on
> mbm_{total,local}_bytes_config will experience failures and are expected
> to switch to ABMC?

Yes. Exactly.

> 
> 
>>
>>>
>>>
>>>>> Without any changes to BMEC support the mbm_total_bytes_config and mbm_local_bytes_config
>>>>> files will remain and user space may continue to use them to change the event
>>>>> configurations with confusing expecations/results on an ABMC system.
>>>>>
>>>>> One possibility may be that a user may see below on ABMC system even if BMEC is supported:
>>>>> # cat /sys/fs/resctrl/info/L3_MON/mon_features
>>>>> llc_occupancy
>>>>> mbm_total_bytes
>>>>> mbm_local_bytes
>>>>>
>>>>> With the above a user cannot be expected to want to interact with mbm_total_bytes_config
>>>>> and mbm_local_bytes_config, which may be the simplest to do.
>>>>
>>>> yes.
> 
> 
> ...
> 
>>>>>
>>>>> I do think, and Peter also mentioned [1] this, that it may be useful,
>>>>> to "put a group/resource-scoped assign_* file higher in the hierarchy
>>>>> to make it easier for users who want to configure all domains the
>>>>> same for a group."
>>>>>
>>>>> Placing *additional* files higher in hierarchy (used to manage counters in all
>>>>> domains) may be more useful that trying to provide the shared/exclusive/unassign
>>>>> in one file per domain.
>>>>
>>>> Yea. To make it better we can add "mon_l3_assignments" in groups main directory. We can do all the operation in just one file.
>>>>
>>>> https://lore.kernel.org/lkml/efb5293f-b0ef-4c94-bf10-9ca7ebb3b53f@amd.com/
>>>
>>> I am hesitant to respond to that message considering the corporate preamble that
>>> sneaked in so I'll just add some thoughts here:
>>
>> Yea. I noticed it later. Will take care next time.
>>
>>>
>>> Having the file higher in hierarchy does seem more useful. It may be useful to reduce
>>> amount of parsing to get to needed information. Compare with below two examples that can
>>> be used to convey the same information:
>>>
>>> # cat /sys/fs/resctrl/test/mon_L3_assignments
>>> mbm_total_bytes: 0=unassigned; 1=unassigned
>>> mbm_local_bytes: 0=unassigned; 1=unassigned
>>>
>>> #cat /sys/fs/resctrl/test/mon_L3_assignments
>>> 0=_; 1=_
>>>
>>> We need to take care that it is always clear what "0" or "1" means ...
>>> Tony has been mentioning a lot of interesting things about scope
>>> changes. I assume the "L3" in mon_L3_assignments will dictate the scope?
>>
>> I didnt think about the scope here. I was thinking of changing it to
>> "mbm_assignments".
> 
> ah, I see, not a general "monitoring" file but specific to MBM. This still
> may encounter difficulty if AMD does something like SNC where MBM could
> be done per numa node. Perhaps we could constrain this even more with a
> "mbm_L3_assignments". If anything ever shows up that need to do MBM
> counter assignment at some other scope then at least we have the option
> to create another file "mbm_?_assignments".

Yes. Sounds good to me.

> 
>>
>>>
>>> With a syntax like above the needed information can be presented in one line.
>>> For example,
>>>
>>> #cat /sys/fs/resctrl/test/mon_L3_assignments
>>> 0=mbm_total_bytes; 1=mbm_local_bytes
>>>
>>> The caveat is that is only for assigned counters, not shared, so this needs
>>> to change.
>>>
>>> To support shared assignment ...  I wonder if it will be useful to users to
>>> get the information on which other monitor groups the counter is shared _with_?
>>>
>>> Some examples:
>>>
>>> a) Just indicate whether a counter is shared or dedicated. (Introduce flags).
>>> #cat /sys/fs/resctrl/test/mon_L3_assignments
>>> 0=mbm_total_bytes:s; 1=mbm_local_bytes:d
>>>
>>> b) Indicate which groups a counter is shared with:
>>> #cat /sys/fs/resctrl/testA/mon_L3_assignments
>>> 0=mbm_total_bytes:s(testB); 1=mbm_local_bytes:d(not needed but perhaps empty for consistent interface?)
>>> #cat /sys/fs/resctrl/testB/mon_L3_assignments
>>> 0=mbm_total_bytes:s(testA); 1=mbm_local_bytes:d(?)
>>
>> This format does not tell what is going on with all other domains. We need
>> to display all the domains. I think that is important because users need
>> to know what to expect reading the events on specific domains.
>>
>> I think we need to convey all the following information to the user.
>>
>> 1. Event Configuation: What is event configuration applied here?
>>
>> 2. Domains: To which all the domains the configaration is applied?
>>     This is useful in multi-domain configuration.
>>     We also need to know if which domains are assigned or unassigned.
>>
>> 3. Type of assignment: Exclusive(e or d) or Shared(s) or Unassigned(_)
>>
>> 4. Finally: Where to read the events?
>>   This is important when we add "mkdir" support in the future,
>> mon_data/mon_l3_*/config_name will be created.
>>
>>
>> With that in mind this might be helpful.
>>
>> # cat /sys/fs/resctrl/test/mon_L3_assignments
>> mbm_total_bytes: 0=e; 1=_
>> mbm_local_bytes: 0=_; 1=s
>>
>> This format tells the user all the information.
>> mbm_total_bytes and mbm_local_bytes configurations are applied and
>> configuration are coming from counter_configs.
>>
>> User can read the events in
>> mon_data/mon_L3_*/mbm_total_bytes
>> mon_data/mon_L3_*/mbm_local_bytes
>>
>> mbm_total_bytes is assigned on domain 0 and not on domain 1.
>> Reading the mbm_total_bytes on domain 1 will report "unassigned".
>>
>> mbm_local_bytes is not assigned on domain 0 and assigned on domain 1.
>> Reading the mbm_local_bytes on domain 0 will report "unassigned".
> 
> Thank you very much for spelling it out. Much appreciated. This looks good to me.
> Please include your list of requirements for interface in the cover-letter and/or
> patch that introduces the interface.

Sure. Will do.

> 
>>
>> I dont have much information on shared assignment at this point. Dont know
>> if we can display shared group.
> 
> The proposed interface accommodates shared counters. The expectation is that
> users can keep track themselves and if not, then the information can be
> obtained with a read of every group's counter assignment. The issue here is
> that this may worst case need a large number of file operations if expectation
> is that it will still be possible to create num RMID monitoring groups.
> 
> Using files inside monitor group for this information may actually not be ideal.
> If this information is needed then we could perhaps add a new file. For
> example:
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/<file reporting which monitor groups share this counter configuration in different domains>
> Of course, I do not know if this will be required and this seems manageable as
> a later enhancement if needed.
> 

Yes. It can be done this way.

Thanks
Babu

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 11 months ago
On Thu, Mar 13, 2025 at 03:13:32PM -0500, Moger, Babu wrote:
> Hi Reinette,
> 
> On 3/13/25 11:08, Reinette Chatre wrote:
> > Hi Babu,
> > 
> > On 3/12/25 11:14 AM, Moger, Babu wrote:
> >> Hi Reinette,
> >>
> >> On 3/12/25 12:14, Reinette Chatre wrote:
> >>> Hi Babu,
> >>>
> >>> On 3/12/25 9:03 AM, Moger, Babu wrote:
> >>>> Hi Reinette,
> >>>>
> >>>> On 3/12/25 10:07, Reinette Chatre wrote:
> >>>>> Hi Babu,
> >>>>>
> ..
> 
> >>>>>> We can add the mkdir support later. That way we can provide basic ABMC
> >>>>>> support without too much code complexity with mkdir support.
> >>>>>
> >>>>> This is not clear to me how you envision the "first phase". Is it what you
> >>>>> proposed above, for example:
> >>>>>       #echo "LclFill, LclNTWr, RmtFill" >
> >>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >>>>>
> >>>>> In above the counter configuration name is a file. 
> >>>>
> >>>> Yes. That is correct.
> >>>>
> >>>> There will be two configuration files by default when resctrl is mounted
> >>>> when ABMC is enabled.
> >>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> >>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >>>>
> >>>>>
> >>>>> How could mkdir support be added to this later if there are already files present?
> >>>>
> >>>> We already have these directories when resctrl is mounted.
> >>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
> >>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
> >>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
> >>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
> >>>>
> >>>> We dont need "mkdir" support for default  configurations.
> >>>
> >>> I was referring to the "mkdir" support for additional configurations that
> >>> I understood you are thinking about adding later. For example,
> >>> (copied from Peter's message
> >>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/):
> >>>
> >>>
> >>>  # mkdir info/L3_MON/counter_configs/mbm_local_bytes
> >>>  # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>  # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>  # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>  # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>> LclFill
> >>> LclNTWr
> >>> LclSlowFill
> >>>
> >>> Any "later" work needs to be backward compatible with the first phase.
> >>
> >> Actually, we dont need extra file "event_filter".
> >> This was discussed here.
> >> https://lore.kernel.org/lkml/CALPaoChLL8p49eANYgQ0dJiFs7G=223fGae+LJyx3DwEhNeR8A@mail.gmail.com/
> > 
> > I understand from that exchange that it is possible to read/write from
> > an *existing* kernfs file but it is not obvious to me how that file is
> > planned to be created.
> 
> My bad.. I misspoke here. We need "event_filter" file under each
> configuration.
> 
> 
> > 
> > My understanding of the motivation behind support for "mkdir" is to enable
> > user space to create custom counter configurations.
> > 
> 
> That is correct.
> 
> > I understand that ABMC support aims to start with existing mbm_total_bytes/mbm_local_bytes
> > configurations but I believe the consensus is that custom configurations need
> > to be supported in the future.
> > If resctrl starts with support where counter configuration as
> > managed with a *file*, for example:
> > /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> > how will user space create future custom configurations?
> > As I understand that is only possible with mkdir.
> > 
> >>
> >> # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes
> >> # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes
> >> # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes
> >> # cat info/L3_MON/counter_configs/mbm_local_bytes
> >>  LclFill
> >>  LclNTWr
> >>  LclSlowFill
> >>
> >> In the future, we can add mkdir support.
> >>
> >> # mkdir info/L3_MON/counter_configs/mbm_read_only
> > 
> > This is exactly my concern. resctrl should not start with a user space where
> > a counter configuration is a file (mbm_local_bytes/mbm_total_bytes) and then
> > switch user space interface to have counter configuration be done with
> > directories.
> > 
> >> # echo LclFill > info/L3_MON/counter_configs/mbm_read_only
> >> # cat info/L3_MON/counter_configs/mbm_read_only
> >>   LclFill
> > 
> > ... wait ... user space writes to the directory?
> > 
> 
> My bad. This is wrong. Let me rewrite the steps below.
> 
> > 
> > 
> >>
> >> #echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
> >>
> >> Which would result in the creation of test/mon_data/mon_L3_*/mbm_read_only
> >>
> >> So, there is no breakage of backward compatibility.
> > 
> > The way I understand it I am seeing many incompatibilities. Perhaps I am missing
> > something. Could you please provide detailed steps of how first phase and
> > second phase would look?
> 
> No. You didn't miss anything. I misspoke on few steps.
> 
> Here are the steps. Just copying steps from Peters proposal.
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> 
> 1. Mount the resctrl
>    mount -t resctrl resctrl /sys/fs/resctrl
> 
> 2. When ABMC is supported two default configurations will be created.
> 
>   a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>   b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> 
>   These files will be populated with default total and local events
>   # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>     VictimBW
>     RmtSlowFill
>     RmtNTWr
>     RmtFill
>     LclFill
>     LclNTWr
>     LclSlowFill
> 
>   # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>    LclFill
>    LclNTWr
>    LclSlowFill
> 
> 3. Users will have options to update the event configuration.
>    echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter

Once the "mkdir" support described below is implemented users will not
need to redefine these legacy event file names. That makes me happy.

> 
> 4. As usual the events can be read from the mon_data directories.
>    #mkdir /sys/fs/resctrl/test
>    #cd   /sys/fs/resctrl
>    #cat  test/mon_data/mon_L3_00/mbm_total_bytes
>    101010
>    #cat   test/mon_data/mon_L3_00/mbm_local_bytes
>    32323
> 
> 5. There will be 3 files created in each group's mon_data directory when
> ABMC is supported.
> 
>    a. test/mon_data/mon_L3_00/assign_exclusive
>    b. test/mon_data/mon_L3_00/assign_shared
>    c. test/mon_data/mon_L3_00/unassign
> 
> 
> 6. Events can be assigned/unassigned by these commands
> 
>  # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
>  # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>  # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign
> 
> 
> Note:
> I feel 3 files are excessive here. We can probably achieve everything in
> just one file.

Maybe the one file could look like:

# cat mon_L3_assignments
mbm_total_bytes: exclusive
mbm_local_bytes: shared
mbm_read_only: unassigned

with new lines appearing when mkdir creates new events, and the obvious
write semantics:

# echo "mbm_total_bytes: unassigned" > mon_L3_assignments

to make updates.

> Not sure about mbm_assign_control interface as there are concerns with
> group listing holding the lock for long.
> 
> -----------------------------------------------------------------------
> Second phase, we can add support for "mkdir"
> 
> 1. mkdir info/L3_MON/counter_configs/mbm_read_only
> 
> 2. mkdir option will create "event_filter" file.
>    info/L3_MON/counter_configs/mbm_read_only/event_filter
> 
> 3. Users can modify event configuration.
>    echo LclFill > info/L3_MON/counter_configs/mbm_read_only/event_filter
> 
> 4. Users can assign the events
> 
>   echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
> 
> 5. Events can be read in
> 
>    test/mon_data/mon_L3_00/mbm_read_only

Is there a matching "rmdir" to make this go away again?

> -- 
> Thanks
> Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
[AMD Official Use Only - AMD Internal Distribution Only]

Hi Tony,

On 3/13/2025 3:36 PM, Luck, Tony wrote:
> On Thu, Mar 13, 2025 at 03:13:32PM -0500, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 3/13/25 11:08, Reinette Chatre wrote:
>>> Hi Babu,
>>>

>>>> #echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
>>>>
>>>> Which would result in the creation of test/mon_data/mon_L3_*/mbm_read_only
>>>>
>>>> So, there is not breakage of backword compatibility.
>>>
>>> The way I understand it I am seeing many incompatibilities. Perhaps I am missing
>>> something. Could you please provide detailed steps of how first phase and
>>> second phase would look?
>>
>> No. You didn't miss anything. I misspoke on few steps.
>>
>> Here are the steps. Just copying steps from Peters proposal.
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>>
>> 1. Mount the resctrl
>>     mount -t resctrl resctrl /sys/fs/resctrl
>>
>> 2. When ABMC is supported two default configurations will be created.
>>
>>    a. info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>    b. info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>
>>    These files will be populated with default total and local events
>>    # cat info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>>      VictimBW
>>      RmtSlowFill
>>      RmtNTWr
>>      RmtFill
>>      LclFill
>>      LclNTWr
>>      LclSlowFill
>>
>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>     LclFill
>>     LclNTWr
>>     LclSlowFill
>>
>> 3. Users will have options to update the event configuration.
>>     echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>
> Once the "mkdir" support described below is implemented users will not
> need to redefine these legacy event file names. That makes me happy.

Yea. That is correct.

>
>>
>> 4. As usual the events can be read from the mon_data directories.
>>     #mkdir /sys/fs/resctrl/test
>>     #cd   /sys/fs/resctrl
>>     #cat  test/mon_data/mon_L3_00/mbm_total_bytes
>>     101010
>>     #cat   test/mon_data/mon_L3_00/mbm_local_bytes
>>     32323
>>
>> 5. There will be 3 files created in each group's mon_data directory when
>> ABMC is supported.
>>
>>     a. test/mon_data/mon_L3_00/assign_exclusive
>>     b. test/mon_data/mon_L3_00/assign_shared
>>     c. test/mon_data/mon_L3_00/unassign
>>
>>
>> 6. Events can be assigned/unassigned by these commands
>>
>>   # echo mbm_total_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/assign_exclusive
>>   # echo mbm_local_bytes > test/mon_data/mon_L3_01/unassign
>>
>>
>> Note:
>> I feel 3 files are excessive here. We can probably achieve everything in
>> just one file.
>
> Maybe the one file could look like:
>
> # cat mon_L3_assignments
> mbm_total_bytes: exclusive
> mbm_local_bytes: shared
> mbm_read_only: unassigned
>
> with new lines appearing when mkdir creates new events, and the obvious
> write semantics:
>
> # echo "mbm_total_bytes: unassigned" > mon_L3_assignments
>
> to make updates.

Yes. That would work. Also, we could move the file to the group's main
directory, alongside the other per-group files we already have there.

#cat /sys/fs/resctrl/test/mon_L3_assignments
 mbm_total_bytes: 0=unassigned; 1=unassigned
 mbm_local_bytes: 0=unassigned; 1=unassigned

To assign mbm_total_bytes config on domain 0.

$echo "mbm_total_bytes: 0=exclusive " > mon_L3_assignments

To assign mbm_total_bytes config on all the domains.

$echo "mbm_total_bytes: *=exclusive " > mon_L3_assignments

#cat /sys/fs/resctrl/test/mon_L3_assignments
 mbm_total_bytes: 0=exclusive; 1=exclusive
 mbm_local_bytes: 0=unassigned; 1=unassigned
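
To make the proposed write semantics concrete, here is a minimal user-space
sketch of how a "config: domain=state" line could be parsed and applied,
including the "*" wildcard. It is only a model of the file format discussed
above, not kernel code; the helper names (parse_line, apply_write, render)
are hypothetical.

```python
def parse_line(line):
    """Split 'config: dom=state; dom=state' into (config, {dom: state})."""
    config, _, rest = line.partition(":")
    states = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        dom, _, state = item.partition("=")
        states[dom.strip()] = state.strip()
    return config.strip(), states


def apply_write(table, line):
    """Apply one write to table[config][domain]; '*' addresses every domain."""
    config, updates = parse_line(line)
    current = table[config]
    for dom, state in updates.items():
        targets = list(current) if dom == "*" else [dom]
        for target in targets:
            current[target] = state


def render(table):
    """Produce the lines a read of the file would return in this model."""
    return [
        f"{config}: " + "; ".join(f"{d}={s}" for d, s in doms.items())
        for config, doms in table.items()
    ]


# Model of a two-domain system with both default configurations unassigned.
table = {
    "mbm_total_bytes": {"0": "unassigned", "1": "unassigned"},
    "mbm_local_bytes": {"0": "unassigned", "1": "unassigned"},
}
apply_write(table, "mbm_total_bytes: *=exclusive ")
print("\n".join(render(table)))
```

In this model a write only touches the configuration and domains it names,
so other lines keep their previous state.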

>
>> Not sure about mbm_assign_control interface as there are concerns with
>> group listing holding the lock for long.
>>
>> -----------------------------------------------------------------------
>> Second phase, we can add support for "mkdir"
>>
>> 1. mkdir info/L3_MON/counter_configs/mbm_read_only
>>
>> 2. mkdir option will create "event_filter" file.
>>     info/L3_MON/counter_configs/mbm_read_only/event_filter
>>
>> 3. Users can modify event configuration.
>>     echo LclFill > info/L3_MON/counter_configs/mbm_read_only/event_filter
>>
>> 4. Users can assign the events
>>
>>    echo mbm_read_only > test/mon_data/mon_L3_00/assign_exclusive
>>
>> 5. Events can be read in
>>
>>     test/mon_data/mon_L3_00/mbm_read_only
>
> Is there a matching "rmdir" to make this go away again?
>

I would think so.

Thanks
Babu
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 11 months ago
On Tue, Mar 11, 2025 at 03:35:28PM -0500, Moger, Babu wrote:
> Hi All,
> 
> On 3/10/25 22:51, Reinette Chatre wrote:
> > 
> > 
> > On 3/10/25 6:44 PM, Moger, Babu wrote:
> >> Hi Tony,
> >>
> >> On 3/10/2025 6:22 PM, Luck, Tony wrote:
> >>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
> >>>> Hi All,
> >>>>
> >>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> On 3/5/25 04:40, Peter Newman wrote:
> >>>>>> Hi Babu,
> >>>>>>
> >>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> On 3/4/25 10:44, Peter Newman wrote:
> >>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Peter/Reinette,
> >>>>>>>>>
> >>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
> >>>>>>>>>> Hi Babu,
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>
> >>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
> >>>>>>>>>>>> Hi Reinette,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
> >>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
> >>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
> >>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
> >>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
> >>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
> >>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> >>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
> >>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> >>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
> >>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
> >>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
> >>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>>>>>>>>>>>>>>>>>>>>> <value>
> >>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
> >>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
> >>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
> >>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
> >>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
> >>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
> >>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
> >>>>>>>>>>>>>>>>>>>> for.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
> >>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> >>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
> >>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
> >>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> >>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
> >>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
> >>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> >>>>>>>>>>>>>>>>>>> customers.
> >>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
> >>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
> >>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
> >>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
> >>>>>>>>>>>>>>>>>> event names.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thank you for clarifying.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
> >>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
> >>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
> >>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
> >>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
> >>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
> >>>>>>>>>>>>>>>>>> writes in ABMC would look like...
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> (per domain)
> >>>>>>>>>>>>>>>>>> group 0:
> >>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>>>> group 1:
> >>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
> >>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
> >>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
> >>>>>>>>>>>>>>>>> configuration is a requirement?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
> >>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
> >>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
> >>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
> >>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
> >>>>>>>>>>>>>>>> there's less pressure on the counters.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
> >>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
> >>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
> >>>>>>>>>>>>>>>> many counters the group needs in each domain.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
> >>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
> >>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
> >>>>>>>>>>>>>>> of the hardware.
> >>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
> >>>>>>>>>>>>>>> earlier example copied below:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> (per domain)
> >>>>>>>>>>>>>>>>>> group 0:
> >>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>>>> group 1:
> >>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
> >>>>>>>>>>>>>>> I understand it:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> group 0:
> >>>>>>>>>>>>>>>    domain 0:
> >>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>    domain 1:
> >>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>> group 1:
> >>>>>>>>>>>>>>>    domain 0:
> >>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>    domain 1:
> >>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
> >>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
> >>>>>>>>>>>>>>> in domain 1, resulting in:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> group 0:
> >>>>>>>>>>>>>>>    domain 0:
> >>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>> group 1:
> >>>>>>>>>>>>>>>    domain 0:
> >>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>    domain 1:
> >>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
> >>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> group 0:
> >>>>>>>>>>>>>>>    domain 0:
> >>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>> group 1:
> >>>>>>>>>>>>>>>    domain 0:
> >>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>    domain 1:
> >>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
> >>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>>     counter 3: VictimBW
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seems to
> >>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
> >>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
> >>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
> >>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
> >>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
> >>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
> >>>>>>>>>>>>>> groupings to count.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
> >>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
> >>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    /group0/0=t;1=t
> >>>>>>>>>>>>>>    /group1/0=t;1=t
> >>>>>>>>>>>>>>    /group2/0=_;1=t
> >>>>>>>>>>>>>>    /group3/0=rw;1=_
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - group2 is restricted to domain 0
> >>>>>>>>>>>>>> - group3 is restricted to domain 1
> >>>>>>>>>>>>>> - the rest are unrestricted
> >>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
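
The counter totals quoted above can be checked with a short sketch. It
assumes, as in the example, that each flag letter in a domain costs one
counter and that "_" means no counter; the function name is hypothetical.

```python
from collections import Counter

ASSIGNMENTS = """\
/group0/0=t;1=t
/group1/0=t;1=t
/group2/0=_;1=t
/group3/0=rw;1=_
"""


def counters_per_domain(text):
    """Count how many counters each domain needs for the given assignments."""
    usage = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # '/group3/0=rw;1=_' -> ctrl '', mon 'group3', domains '0=rw;1=_'
        ctrl, mon, domains = line.split("/", 2)
        for item in domains.split(";"):
            dom, _, flags = item.partition("=")
            # '_' allocates nothing; every other letter is one counter.
            usage[int(dom)] += sum(1 for flag in flags if flag != "_")
    return dict(usage)


print(counters_per_domain(ASSIGNMENTS))
```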
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I see. Thank you for the example.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
> >>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      /group0/0=t;1=t
> >>>>>>>>>>>>>      /group1/0=t;1=t
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
> >>>>>>>>>>>>> be configured differently in each domain.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
> >>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
> >>>>>>>>>>>> domain use the same configurations and are limited to two events per
> >>>>>>>>>>>> group and a per-group mode where every group can be configured and
> >>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
> >>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
> >>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
> >>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
> >>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
> >>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
> >>>>>>>>>>>> have the same flexibility as on MPAM.
> >>>>>>>>>>>
> >>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
> >>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
> >>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
> >>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
> >>>>>>>>>>>
> >>>>>>>>>>> It is documented below.
> >>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
> >>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
> >>>>>>>>>>>
> >>>>>>>>>>> We previously discussed this with you (off the public list) and I
> >>>>>>>>>>> initially proposed the extended assignment mode.
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
> >>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
> >>>>>>>>>>> just two.
> >>>>>>>>>>>
> >>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
> >>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
> >>>>>>>>>>> extended mode is not practical at this time.
> >>>>>>>>>>>
> >>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
> >>>>>>>>>>> require modifications to the existing interface, allowing us to continue
> >>>>>>>>>>> using it as is.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> (I might have said something confusing in my last messages because I
> >>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
> >>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
> >>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
> >>>>>>>>>>>> earlier is one I've already been asked about.
> >>>>>>>>>>>
> >>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
> >>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
> >>>>>>>>>>> finalize how to configure the multiple event interface for each group.
> >>>>>>>>>>
> >>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
> >>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
> >>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
> >>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
> >>>>>>>>>> there's already an expectation that the files are present when BMEC is
> >>>>>>>>>> supported.
> >>>>>>>>>>
> >>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
> >>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
> >>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
> >>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
> >>>>>>>>>> interface. If it does, it's something we can live with.
> >>>>>>>>>
> >>>>>>>>> As you know, this series is currently blocked without further feedback.
> >>>>>>>>>
> >>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
> >>>>>>>>> Any input or suggestions would be appreciated.
> >>>>>>>>>
> >>>>>>>>> Here’s what we’ve learned so far:
> >>>>>>>>>
> >>>>>>>>> 1. Assignments should be independent of BMEC.
> >>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
> >>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
> >>>>>>>>> 3. There should be an option to assign events per domain.
> >>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
> >>>>>>>>> should allow flexibility to assign more in the future as the interface
> >>>>>>>>> evolves.
> >>>>>>>>> 5. Utilize the extended RMID read mode.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Here is my proposal using Peter's earlier example:
> >>>>>>>>>
> >>>>>>>>> # define event configurations
> >>>>>>>>>
> >>>>>>>>> ====    ============    ==================================================
> >>>>>>>>> Bits    Mnemonics       Description
> >>>>>>>>> ====    ============    ==================================================
> >>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
> >>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
> >>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
> >>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
> >>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
> >>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
> >>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
> >>>>>>>>> ====    ============    ==================================================
> >>>>>>>>>
> >>>>>>>>> #Define flags based on combination of above event types.
> >>>>>>>>>
> >>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
> >>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
> >>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
> >>>>>>>>> v = VictimBW
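
As a worked example, the flag definitions above can be turned into bitmasks
using the bit positions from the event table. This is only an illustrative
sketch; the EVENT_BITS and FLAGS names are made up here and are not part of
any kernel interface.

```python
# Bit positions taken from the event table above.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag letters as defined in the proposal.
FLAGS = {
    "t": "LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr",
    "l": "LclFill, LclNTWr, LclSlowFill",
    "r": "LclFill,RmtFill,LclSlowFill,RmtSlowFill",
    "w": "VictimBW,LclNTWr,RmtNTWr",
    "v": "VictimBW",
}


def event_mask(names):
    """OR together the bits for a comma-separated list of mnemonics."""
    mask = 0
    for name in names.split(","):
        mask |= 1 << EVENT_BITS[name.strip()]
    return mask


for flag, events in FLAGS.items():
    print(f"{flag} = {event_mask(events):#04x}")
```

With these definitions r and w do not overlap and together cover the same
bits as t, which is what "effectively r+w" relies on.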
> >>>>>>>>>
> >>>>>>>>> Peter suggested the following format earlier :
> >>>>>>>>>
> >>>>>>>>> /group0/0=t;1=t
> >>>>>>>>> /group1/0=t;1=t
> >>>>>>>>> /group2/0=_;1=t
> >>>>>>>>> /group3/0=rw;1=_
> >>>>>>>>
> >>>>>>>> After some inquiries within Google, it sounds like nobody has invested
> >>>>>>>> much into the current mbm_assign_control format yet, so it would be
> >>>>>>>> best to drop it and distribute the configuration around the filesystem
> >>>>>>>> hierarchy[1], which should allow us to produce something more flexible
> >>>>>>>> and cleaner to implement.
> >>>>>>>>
> >>>>>>>> Roughly what I had in mind:
> >>>>>>>>
> >>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
> >>>>>>>> names for the assignable configurations rather than being restricted
> >>>>>>>> to single letters.  In the resulting directory, populate a file where
> >>>>>>>> we can specify the set of events the config should represent. I think
> >>>>>>>> we should use symbolic names for the events rather than raw BMEC field
> >>>>>>>> values. Moving forward we could come up with portable names for common
> >>>>>>>> events and only support the BMEC names on AMD machines for users who
> >>>>>>>> want specific events and don't care about portability.
> >>>>>>>
> >>>>>>>
> >>>>>>> I’m still processing this. Let me start with some initial questions.
> >>>>>>>
> >>>>>>> So, we are creating event configurations here, which seems reasonable.
> >>>>>>>
> >>>>>>> Yes, we should use portable names and are not limited to BMEC names.
> >>>>>>>
> >>>>>>> How many configurations should we allow? Do we know?
> >>>>>>
> >>>>>> Do we need an upper limit?
> >>>>>
> >>>>> I think so. This needs to be maintained in some data structure. We can
> >>>>> start with 2 default configurations for now.
> > 
> > There is a big difference between no upper limit and 2. The hardware is
> > capable of supporting per-domain configurations so more flexibility is
> > certainly possible. Consider the example presented by Peter in:
> > https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
> > 
> >>>>>>>> Next, put assignment-control file nodes in per-domain directories
> >>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
> >>>>>>>> counter-configuration name into the file would then allocate a counter
> >>>>>>>> in the domain, apply the named configuration, and monitor the parent
> >>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
> >>>>>>>> higher in the hierarchy to make it easier for users who want to
> >>>>>>>> configure all domains the same for a group.
> >>>>>>>
> >>>>>>> What is the difference between shared and exclusive?
> >>>>>>
> >>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
> >>>>>> each domain will be scheduled round-robin to the groups requesting
> >>>>>> shared access to a counter. In my tests, I assigned the counters long
> >>>>>> enough to produce a single 1-second MB/s sample for the per-domain
> >>>>>> aggregation files[2].
> >>>>>>
> >>>>>> These do not need to be implemented immediately, but knowing that they
> >>>>>> work addresses the overhead and scalability concerns of reassigning
> >>>>>> counters and reading their values.
> >>>>>
> >>>>> Ok. Let's focus on exclusive assignments for now.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
> >>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
> >>>>>>> results in 32 × 12 × 3 files, which is quite large.
> >>>>>>>
> >>>>>>> There should be a more efficient way to handle this.
> >>>>>>>
> >>>>>>> Initially, we started with a group-level file for this interface, but it
> >>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
> >>>>>>
> >>>>>> I had rejected it due to the high-frequency of access of a large
> >>>>>> number of files, which has since been addressed by shared assignment
> >>>>>> (or automatic reassignment) and aggregated mbps files.
> >>>>>
> >>>>> I think we should address this as well. Creating three extra files for
> >>>>> each group isn’t ideal when there are more efficient alternatives.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Additionally, how can we list all assignments with a single sysfs call?
> >>>>>>>
> >>>>>>> That was another problem we need to address.
> >>>>>>
> >>>>>> This is not a requirement I was aware of. If the user forgot where
> >>>>>> they assigned counters (or forgot to disable auto-assignment), they
> >>>>>> can read multiple sysfs nodes to remind themselves.
> >>>>>
> >>>>> I suggest we provide users with an option to list the assignments
> >>>>> of all groups in a single command. As the number of groups increases, it
> >>>>> becomes cumbersome to query each group individually.
> >>>>>
> >>>>> To achieve this, we can reuse our existing mbm_assign_control interface
> >>>>> for this purpose. More details on this below.
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> The configuration names listed in assign_* would result in files of
> >>>>>>>> the same name in the appropriate mon_data domain directories from
> >>>>>>>> which the count values can be read.
> >>>>>>>>
> >>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
> >>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> >>>>>>>> LclFill
> >>>>>>>> LclNTWr
> >>>>>>>> LclSlowFill
> >>>>>>>
> >>>>>>> I feel we can just have the configs. event_filter file is not required.
> >>>>>>
> >>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
> >>>>>> only looking at struct kernfs_syscall_ops
> >>>>>>
> >>>>>>>
> >>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
> >>>>>>> LclFill <-rename these to generic names.
> >>>>>>> LclNTWr
> >>>>>>> LclSlowFill
> >>>>>>>
> >>>>>>
> >>>>>> I think portable and non-portable event names should both be available
> >>>>>> as options. There are simple bandwidth measurement mechanisms that
> >>>>>> will be applied in general, but when they turn up an issue, it can
> >>>>>> often lead to a more focused investigation, requiring more precise
> >>>>>> events.
> >>>>>
> >>>>> I agree. We should provide both portable and non-portable event names.
> >>>>>
> >>>>> Here is my draft proposal based on the discussion so far and reusing some
> >>>>> of the current interface. The idea here is to start with a basic assignment
> >>>>> feature with options to enhance it in the future. Feel free to
> >>>>> comment/suggest.
> >>>>>
> >>>>> 1. Event configurations will be in
> >>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
> >>>>>
> >>>>>      There will be two pre-defined configurations by default.
> >>>>>
> >>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> >>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
> >>>>>
> >>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >>>>>      LclFill, LclNTWr, LclSlowFill
> >>>>>
> >>>>> 2. Users will have options to update these configurations.
> >>>>>
> >>>>>      #echo "LclFill, LclNTWr, RmtFill" >
> >>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >>>
> >>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
> >>> reporting "local_bytes" any more. They report something different,
> >>> and users only know if they come to check the options currently
> >>> configured in this file. Changing the contents without changing
> >>> the name seems confusing to me.
> >>
> >> It is the same behaviour right now with BMEC. It is configurable.
> >> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
> >>
> >> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
> > 
> > This could be supported by following Peter's original proposal where the name
> > of the counter configuration is provided by the user via a mkdir:
> > https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> > 
> > As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
> 
> Sure. We can do that. I was thinking in the first phase, just provide the
> default pre-defined configuration and option to update the configuration.
> 
> We can add the mkdir support later. That way we can provide basic ABMC
> support without too much code complexity with mkdir support.
> 
> > 
> >>
> >>>
> >>>>>
> >>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
> >>>>>      LclFill, LclNTWr, RmtFill
> >>>>>
> >>>>> 3. The default configurations will be used when user mounts the resctrl.
> >>>>>
> >>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
> >>>>>      mkdir /sys/fs/resctrl/test/
> >>>>>
> >>>>> 4. The resctrl group/domains can be in one of these assignment states.
> >>>>>      e: Exclusive
> >>>>>      s: Shared
> >>>>>      u: Unassigned
> >>>>>
> >>>>>      Exclusive mode is supported now. Shared mode will be supported in the
> >>>>> future.
> >>>>>
> >>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>> to list the assignment state of all the groups.
> >>>>>
> >>>>>      Format:
> >>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
> >>>>>
> >>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>      test//mbm_total_bytes:0=e;1=e
> >>>>>      test//mbm_local_bytes:0=e;1=e
> >>>>>      //mbm_total_bytes:0=e;1=e
> >>>>>      //mbm_local_bytes:0=e;1=e
> > 
> > This would make mbm_assign_control even more unwieldy and quicker to exceed a
> > page of data (these examples never seem to reflect those AMD systems with the many
> > L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
> > and solved when/if going this route.
> 
> This problem is not specific to this series. I feel it is a generic problem
> for many similar interfaces. I don't know how it is addressed elsewhere. I may
> have to investigate this. Any pointers would be helpful.
> 
> 
> > 
> > There seem to be two opinions about this file at the moment. Would it be possible to
> > summarize the discussion with pros/cons raised to make an informed selection?
> > I understand that Google as represented by Peter no longer requires/requests this
> > file but the motivation for this change seems new and does not seem to reduce the
> > original motivation for this file. We may also want to separate requirements for reading
> > from and writing to this file.
> 
> Yea. We can just use mbm_assign_control for reading the assignment states.
> 
> Summary: We have two proposals.
> 
> First one from Peter:
> 
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> 
> Pros
> a.  Allows flexible creation of free-form names for assignable
> configurations, stored in info/L3_MON/counter_configs/.
> 
> b.  Events can be accessed using corresponding free-form names in the
> mon_data directory, making it clear to users what each event represents.
> 
> 
> Cons:
> a. Requires three separate files for assignment in each group
> (assign_exclusive, assign_shared, unassign), which might be excessive.
> 
> b. No built-in listing support, meaning users must query each group
> individually to check assignment states.

How big of a problem is this in reality? I'd assume that users of this
feature would only reassign counter attributes at some slow rate (set
up counters, measure for at least a few seconds, then set up for next
measurement). Cost to open/read/close a few hundred kernfs files isn't
very high. Biggest cost might be hogging the resctrl mutex which would
cause jitter in the tasks reading data from resctrl monitors.
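
To put a rough shape on the "open/read/close a few hundred files" case, here
is a minimal Python sketch. It assumes the per-domain `assign_exclusive` file
from the first proposal exists and that reading it returns the assigned
configuration names separated by whitespace; neither is settled in this
thread. The function name and the handling of only top-level groups are
likewise just for illustration.

```python
from pathlib import Path


def list_assignments(root, filename="assign_exclusive"):
    """Collect {(group, domain): [config names]} with one read per group/domain.

    Assumption: each <group>/mon_data/<domain>/<filename> file lists the
    configurations assigned in that domain, whitespace separated.
    Only the default group and top-level groups are visited.
    """
    root = Path(root)
    # The default group is the root itself; other groups are the
    # subdirectories that contain their own mon_data directory.
    groups = [root] + sorted(
        p for p in root.iterdir() if p.is_dir() and (p / "mon_data").is_dir()
    )
    result = {}
    for group in groups:
        mon_data = group / "mon_data"
        if not mon_data.is_dir():
            continue
        for domain in sorted(mon_data.iterdir()):
            state_file = domain / filename
            if state_file.is_file():
                name = "" if group == root else group.name
                result[(name, domain.name)] = state_file.read_text().split()
    return result
```

Each group/domain pair costs one open/read/close, so the total work grows
with groups times domains.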

Anyone doing this at scale should be able to keep track of what they set,
so wouldn't need to read at all. I'm not a big believer in "multiple
agents independently tweaking resctrl without knowledge of each other".
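
As a sketch of what "keep track of what they set" could look like, the
snippet below parses one line of the listing format quoted earlier
("<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>")
into a record an agent could hold in memory. The function name is
hypothetical, and the format is still only a proposal; the sketch assumes
group and configuration names contain neither "/" nor ":".

```python
def parse_assign_line(line):
    """Parse one proposed mbm_assign_control line into its parts.

    Returns (ctrl_mon_group, mon_group, configuration, {domain_id: state}).
    States follow the draft: 'e' exclusive, 's' shared, 'u' unassigned.
    """
    path, _, domains = line.strip().partition(":")
    ctrl_mon, mon, config = path.split("/")
    states = {}
    for item in domains.split(";"):
        dom, state = item.split("=")
        if state not in ("e", "s", "u"):
            raise ValueError(f"unknown assignment state: {state!r}")
        states[int(dom)] = state
    return ctrl_mon, mon, config, states
```

For example, "test//mbm_total_bytes:0=e;1=e" yields the group "test", an
empty MON group, the configuration "mbm_total_bytes", and exclusive
assignment in domains 0 and 1.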

> 
> Second Proposal (Mine)
> 
> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
> 
> Pros:
> 
> a. Maintains the flexibility of free-form names for assignable
> configurations (info/L3_MON/counter_configs/).
> 
> b. Events remain accessible via free-form names in mon_data, ensuring
> clarity on their purpose.
> 
> c. Adds the ability to list assignment states for all groups in a single
> command.
> 
> Cons:
> a.  Potential buffer overflow issues when handling a large number of
> groups and domains and code complexity to fix the issue.
> 
> 
> Third Option: A Hybrid Approach
> 
> We could combine elements from both proposals:
> 
> a. Retain the free-form naming approach for assignable configurations in
> info/L3_MON/counter_configs/.
> 
> b. Use the assignment method from the first proposal:
>    $mkdir test
>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
> 
> c. Introduce listing support via the info/L3_MON/mbm_assign_control
> interface, enabling users to read assignment states for all groups in one
> place. Only reading support.
> 
> 
> > 
> >>>>>
> >>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
> >>>>>
> >>>>>      Format:
> >>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
> >>>>>
> >>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
> >>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>
> >>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
> >>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>
> >>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>      test//mbm_total_bytes:0=u;1=u
> >>>>>      test//mbm_local_bytes:0=u;1=u
> >>>>>      //mbm_total_bytes:0=e;1=e
> >>>>>      //mbm_local_bytes:0=e;1=e
> >>>>>
> >>>>>      The corresponding events will be read in
> >>>>>
> >>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> >>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
> >>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
> >>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
> >>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
> >>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
> >>>>>
> >>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
> >>>>> mbm_local_bytes) will be supported.
> >>>>>
> >>>>> 8. In the future, there will be options to create multiple configurations,
> >>>>> and a corresponding directory will be created in
> >>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
> >>>
> >>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
> >>> directory? Like this:
> >>>
> >>> # echo "LclFill, LclNTWr, RmtFill" >
> >>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
> >>>
> >>> This seems OK (dependent on the user picking meaningful names for
> >>> the set of attributes picked ... but if they want to name this
> >>> monitor file "brian" then they have to live with any confusion
> >>> that they bring on themselves).
> >>>
> >>> Would this involve an extension to kernfs? I don't see a function
> >>> pointer callback for file creation in kernfs_syscall_ops.
> >>>
> >>>>>
> >>>>
> >>>> I know you are all busy with multiple series going on in parallel. I am still
> >>>> waiting for input on this. It would be great if you could spend some time
> >>>> on this to see if we can find common ground on the interface.
> >>>>
> >>>> Thanks
> >>>> Babu
> >>>
> >>> -Tony
> >>>
> >>
> >>
> >> thanks
> >> Babu
> > 
> > Reinette
> > 
> > 
> 
> -- 
> Thanks
> Babu Moger

-Tony
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months ago
Hi Tony,

On 3/11/25 1:53 PM, Luck, Tony wrote:
> On Tue, Mar 11, 2025 at 03:35:28PM -0500, Moger, Babu wrote:
>> Hi All,
>>
>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>
>>>
>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>> Hi Tony,
>>>>
>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>> Hi Babu,
>>>>>>>>
>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>
>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>> understand it, the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>
>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>> supported.
>>>>>>>>>>>>
>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>
>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>
>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>
>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>> evolves.
>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>
>>>>>>>>>>> # define event configurations
>>>>>>>>>>>
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>>
>>>>>>>>>>> #Define flags based on combination of above event types.
>>>>>>>>>>>
>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> v = VictimBW
>>>>>>>>>>>
>>>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>>>
>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>
>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>
>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>
>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>
>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>
>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>
>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>
>>>>>>>> Do we need an upper limit?
>>>>>>>
>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>> start with 2 default configurations for now.
>>>
>>> There is a big difference between no upper limit and 2. The hardware is
>>> capable of supporting per-domain configurations so more flexibility is
>>> certainly possible. Consider the example presented by Peter in:
>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>
>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>
>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>
>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>> aggregation files[2].
>>>>>>>>
>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>> counters and reading their values.
>>>>>>>
>>>>>>> Ok. Let's focus on exclusive assignments for now.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>
>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>
>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>
>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>
>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>
>>>>>>>>> That was another problem we need to address.
>>>>>>>>
>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>
>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>
>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>>> for this purpose. More details on this below.
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>> which the count values can be read.
>>>>>>>>>>
>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>> LclFill
>>>>>>>>>> LclNTWr
>>>>>>>>>> LclSlowFill
>>>>>>>>>
>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>
>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>>>
>>>>>>>>>
>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>> LclNTWr
>>>>>>>>> LclSlowFill
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>> events.
>>>>>>>
>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>
>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>> of the current interface. The idea is to start with a basic assignment
>>>>>>> feature, with options to enhance it in the future. Feel free to
>>>>>>> comment/suggest.
>>>>>>>
>>>>>>> 1. Event configurations will be in
>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>
>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>
>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>
>>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>
>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>> reporting "local_bytes" any more. They report something different,
>>>>> and users only know if they come to check the options currently
>>>>> configured in this file. Changing the contents without changing
>>>>> the name seems confusing to me.
>>>>
>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>
>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>
>>> This could be supported by following Peter's original proposal where the name
>>> of the counter configuration is provided by the user via a mkdir:
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>
>> Sure. We can do that. I was thinking that in the first phase we just provide
>> the default pre-defined configurations and an option to update them.
>>
>> We can add the mkdir support later. That way we can provide basic ABMC
>> support without the added code complexity of mkdir support.
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>
>>>>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>>>>
>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>
>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>      e: Exclusive
>>>>>>>      s: Shared
>>>>>>>      u: Unassigned
>>>>>>>
>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>> future.
>>>>>>>
>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>> to list the assignment state of all the groups.
>>>>>>>
>>>>>>>      Format:
>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>
>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>
>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>> page of data (these examples never seem to reflect those AMD systems with the many
>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>> and solved when/if going this route.
>>
>> This problem is not specific to this series. I feel it is a generic problem
>> for many similar interfaces. I don't know how it is addressed. I may
>> have to investigate this. Any pointers would be helpful.
>>
>>
>>>
>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>> summarize the discussion with pros/cons raised to make an informed selection?
>>> I understand that Google as represented by Peter no longer requires/requests this
>>> file but the motivation for this change seems new and does not seem to reduce the
>>> original motivation for this file. We may also want to separate requirements for reading
>>> from and writing to this file.
>>
>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>
>> Summary: We have two proposals.
>>
>> First one from Peter:
>>
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>>
>> Pros
>> a.  Allows flexible creation of free-form names for assignable
>> configurations, stored in info/L3_MON/counter_configs/.
>>
>> b.  Events can be accessed using corresponding free-form names in the
>> mon_data directory, making it clear to users what each event represents.
>>
>>
>> Cons:
>> a. Requires three separate files for assignment in each group
>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>
>> b. No built-in listing support, meaning users must query each group
>> individually to check assignment states.
> 
> How big of a problem is this in reality? I'd assume that users of this
> feature would only reassign counter attributes at some slow rate (set
> up counters, measure for at least a few seconds, then set up for next
> measurement). Cost to open/read/close a few hundred kernfs files isn't
> very high. Biggest cost might be hogging the resctrl mutex which would
> cause jitter in the tasks reading data from resctrl monitors.

Good point. How long the resctrl mutex is held should also be
considered when exploring the mbm_assign_control file. If a user attempts
to make many changes using a single file like that, then holding the resctrl
mutex during the entire configuration may also have a big impact. This may be
of more concern with the additional automation being added to resctrl, for
example the upcoming "shared assignment" that does automatic assignment of
counters.
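
As a rough illustration of the size concern raised for mbm_assign_control,
the sketch below estimates how large a full listing would be in the proposed
"<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
format. The group names, the 32-group and 12-domain figures (borrowed from
the earlier file-count example), and the 4096-byte page size are assumptions
for illustration only; this is not code from the series.

```python
PAGE_SIZE = 4096  # assumed single-page limit for one resctrl file read


def listing_line(ctrl_group, mon_group, config, states):
    """Format one listing line in the proposed mbm_assign_control syntax."""
    domains = ";".join(f"{dom}={state}" for dom, state in sorted(states.items()))
    return f"{ctrl_group}/{mon_group}/{config}:{domains}"


def listing_size(groups, configs, num_domains, state="e"):
    """Return the byte count of a listing with every domain in one state."""
    states = {dom: state for dom in range(num_domains)}
    lines = [listing_line(g, "", c, states) for g in groups for c in configs]
    return sum(len(line) + 1 for line in lines)  # +1 for each newline


groups = [f"g{i}" for i in range(32)]  # hypothetical CTRL_MON group names
configs = ["mbm_total_bytes", "mbm_local_bytes"]
size = listing_size(groups, configs, num_domains=12)
print(size, size > PAGE_SIZE)
```

With these assumed numbers the listing is already larger than one page, and
longer group names or more domains only make it larger.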

> 
> Anyone doing this at scale should be able to keep track of what they set,
> so wouldn't need to read at all. I'm not a big believer in "multiple
> agents independently tweaking resctrl without knowledge of each other".
> 
>>
>> Second Proposal (Mine)
>>
>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>
>> Pros:
>>
>> a. Maintains the flexibility of free-form names for assignable
>> configurations (info/L3_MON/counter_configs/).
>>
>> b. Events remain accessible via free-form names in mon_data, ensuring
>> clarity on their purpose.
>>
>> c. Adds the ability to list assignment states for all groups in a single
>> command.
>>
>> Cons:
>> a.  Potential buffer overflow issues when handling a large number of
>> groups and domains and code complexity to fix the issue.
>>
>>
>> Third Option: A Hybrid Approach
>>
>> We could combine elements from both proposals:
>>
>> a. Retain the free-form naming approach for assignable configurations in
>> info/L3_MON/counter_configs/.
>>
>> b. Use the assignment method from the first proposal:
>>    $mkdir test
>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>
>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>> interface, enabling users to read assignment states for all groups in one
>> place. Only reading support.
>>
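
A minimal sketch of how a user-space tool might consume a read-only
mbm_assign_control listing, assuming the
"<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
format proposed in this thread. The function name and the state-name mapping
are hypothetical, and the sketch assumes configuration names contain no ':'.

```python
# Assignment states as described in the proposal: e, s, u.
STATES = {"e": "exclusive", "s": "shared", "u": "unassigned"}


def parse_assign_line(line):
    """Split one listing line into (ctrl_group, mon_group, config, states)."""
    # The last ':' separates the group/config path from the per-domain states.
    path, _, domain_part = line.strip().rpartition(":")
    ctrl_group, mon_group, config = path.split("/")
    states = {}
    for entry in domain_part.split(";"):
        domain, state = entry.split("=")
        states[int(domain)] = STATES[state]
    return ctrl_group, mon_group, config, states


for line in ["test//mbm_total_bytes:0=e;1=e", "//mbm_local_bytes:0=u;1=u"]:
    print(parse_assign_line(line))
```

An empty CTRL_MON group name denotes the default group, matching the
"//mbm_total_bytes:..." lines in the examples.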
>>
>>>
>>>>>>>
>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>
>>>>>>>      Format:
>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>
>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>
>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>
>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>
>>>>>>>      The corresponding events will be read in
>>>>>>>
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>
>>>>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>
>>>>>>> 8. In the future, there will be options to create multiple configurations
>>>>>>> and a corresponding directory will be created in
>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>
>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>> directory? Like this:
>>>>>
>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>
>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>> the set of attributes picked ... but if they want to name this
>>>>> monitor file "brian" then they have to live with any confusion
>>>>> that they bring on themselves).
>>>>>
>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>
>>>>>>>
>>>>>>
>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>> waiting for input on this. It would be great if you could spend some time
>>>>>> on this to see if we can find common ground on the interface.
>>>>>>
>>>>>> Thanks
>>>>>> Babu
>>>>>
>>>>> -Tony
>>>>>
>>>>
>>>>
>>>> thanks
>>>> Babu
>>>
>>> Reinette
>>>
>>>
>>
>> -- 
>> Thanks
>> Babu Moger
> 
> -Tony

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months ago
Hi Tony,

On 3/11/25 15:53, Luck, Tony wrote:
> On Tue, Mar 11, 2025 at 03:35:28PM -0500, Moger, Babu wrote:
>> Hi All,
>>
>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>
>>>
>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>> Hi Tony,
>>>>
>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>> Hi Babu,
>>>>>>>>
>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>
>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>    counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>    counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>    counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>    domain 0:
>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>    domain 1:
>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>    # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>    w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>    t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    /group0/0=t;1=t
>>>>>>>>>>>>>>>>    /group1/0=t;1=t
>>>>>>>>>>>>>>>>    /group2/0=_;1=t
>>>>>>>>>>>>>>>>    /group3/0=rw;1=_
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - group2 is restricted to domain 1
>>>>>>>>>>>>>>>> - group3 is restricted to domain 0
>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>>>>>>>
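
The counter totals in the quoted example can be checked with a short sketch.
It takes the four assignment lines as written, treats every flag letter as
one named counter configuration consuming one counter, and treats "_" as no
assignment. The function name is hypothetical and the syntax is only the
illustrative one used in that example, not a settled interface.

```python
from collections import Counter


def counters_per_domain(assignments):
    """Count how many counters each domain needs.

    Each line looks like '/group3/0=rw;1=_': every flag letter names one
    counter configuration (one counter each); '_' means nothing assigned.
    """
    used = Counter()
    for line in assignments:
        _, domain_part = line.rsplit("/", 1)
        for entry in domain_part.split(";"):
            domain, flags = entry.split("=")
            used[int(domain)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


example = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]
print(counters_per_domain(example))
```

This gives 4 counters in domain 0 and 3 in domain 1, matching the totals
stated in the example.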
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      /group0/0=t;1=t
>>>>>>>>>>>>>>>      /group1/0=t;1=t
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>    Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>
>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>> supported.
>>>>>>>>>>>>
>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>
>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>
>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>
>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>> read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>> evolves.
>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>
>>>>>>>>>>> # define event configurations
>>>>>>>>>>>
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>> Bits    Mnemonics       Description
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>> 6       VictimBW        Dirty Victims from all types of memory
>>>>>>>>>>> 5       RmtSlowFill     Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>> 4       LclSlowFill     Reads to slow memory in the local NUMA domain
>>>>>>>>>>> 3       RmtNTWr         Non-temporal writes to non-local NUMA domain
>>>>>>>>>>> 2       LclNTWr         Non-temporal writes to local NUMA domain
>>>>>>>>>>> 1       RmtFill         Reads to memory in the non-local NUMA domain
>>>>>>>>>>> 0       LclFill         Reads to memory in the local NUMA domain
>>>>>>>>>>> ====    ========================================================
>>>>>>>>>>>
>>>>>>>>>>> # Define flags based on combinations of the above event types.
>>>>>>>>>>>
>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>> v = VictimBW
>>>>>>>>>>>
>>>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>>>
>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>> /group3/0=rw;1=_
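
The flag definitions quoted above can be turned into bit masks with a few lines of user-space code. The following is only a sketch: the bit positions are taken from the quoted table, while the names EVENT_BITS, FLAGS and events_to_mask() are made up for illustration and are not part of any resctrl interface.

```python
# Bit positions as listed in the event table quoted above.
EVENT_BITS = {
    "LclFill": 0,
    "RmtFill": 1,
    "LclNTWr": 2,
    "RmtNTWr": 3,
    "LclSlowFill": 4,
    "RmtSlowFill": 5,
    "VictimBW": 6,
}

# Flag letters as defined in the quoted proposal (hypothetical container name).
FLAGS = {
    "t": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill",
          "VictimBW", "LclNTWr", "RmtNTWr"],
    "l": ["LclFill", "LclNTWr", "LclSlowFill"],
    "r": ["LclFill", "RmtFill", "LclSlowFill", "RmtSlowFill"],
    "w": ["VictimBW", "LclNTWr", "RmtNTWr"],
    "v": ["VictimBW"],
}


def events_to_mask(events):
    """OR together the bit of every named event."""
    mask = 0
    for name in events:
        mask |= 1 << EVENT_BITS[name]
    return mask


FLAG_MASKS = {flag: events_to_mask(events) for flag, events in FLAGS.items()}

for flag, mask in FLAG_MASKS.items():
    print(f"{flag} = {mask:#04x}")
```

With these definitions the "r" and "w" masks together cover the same bits as "t", which matches the description of "t" as the legacy total configuration.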
>>>>>>>>>>
>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>
>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>
>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>> to single letters.  In the resulting directory, populate a file where
>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>
>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>
>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>
>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>
>>>>>>>> Do we need an upper limit?
>>>>>>>
>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>> start with 2 default configurations for now.
>>>
>>> There is a big difference between no upper limit and 2. The hardware is
>>> capable of supporting per-domain configurations so more flexibility is
>>> certainly possible. Consider the example presented by Peter in:
>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>
>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>
>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>
>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>> aggregation files[2].
>>>>>>>>
>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>> counters and reading their values.
>>>>>>>
>>>>>>> OK. Let's focus on exclusive assignments for now.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>
>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>
>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>
>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>
>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>
>>>>>>>>> That was another problem we need to address.
>>>>>>>>
>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>
>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>
>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface.
>>>>>>> More details on this below.
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>> which the count values can be read.
>>>>>>>>>>
>>>>>>>>>>    # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>    # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>    # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>> LclFill
>>>>>>>>>> LclNTWr
>>>>>>>>>> LclSlowFill
>>>>>>>>>
>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>
>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>> only looking at struct kernfs_syscall_ops.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>> LclNTWr
>>>>>>>>> LclSlowFill
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>> events.
>>>>>>>
>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>
>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>> feature with options to enhance it in the future. Feel free to
>>>>>>> comment/suggest.
>>>>>>>
>>>>>>> 1. Event configurations will be in
>>>>>>>      /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>
>>>>>>>      There will be two pre-defined configurations by default.
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>      LclFill, LclNTWr, LclSlowFill, VictimBW, RmtSlowFill, RmtNTWr, RmtFill
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>      LclFill, LclNTWr, LclSlowFill
>>>>>>>
>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>
>>>>>>>      #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>         /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>
>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>> reporting "local_bytes" any more. They report something different,
>>>>> and users only know if they come to check the options currently
>>>>> configured in this file. Changing the contents without changing
>>>>> the name seems confusing to me.
>>>>
>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>
>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>
>>> This could be supported by following Peter's original proposal where the name
>>> of the counter configuration is provided by the user via a mkdir:
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>
>> Sure. We can do that. I was thinking that, in the first phase, we just
>> provide the default pre-defined configurations and an option to update them.
>>
>> We can add the mkdir support later. That way we can provide basic ABMC
>> support without the code complexity that mkdir support would add.
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>>      #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>      LclFill, LclNTWr, RmtFill
>>>>>>>
>>>>>>> 3. The default configurations will be used when the user mounts resctrl.
>>>>>>>
>>>>>>>      mount  -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>      mkdir /sys/fs/resctrl/test/
>>>>>>>
>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>      e: Exclusive
>>>>>>>      s: Shared
>>>>>>>      u: Unassigned
>>>>>>>
>>>>>>>      Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>> future.
>>>>>>>
>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>> to list the assignment state of all the groups.
>>>>>>>
>>>>>>>      Format:
>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>
>>>>>>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>      test//mbm_total_bytes:0=e;1=e
>>>>>>>      test//mbm_local_bytes:0=e;1=e
>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>
>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>> page of data (these examples never seem to reflect those AMD systems with the many
>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>> and solved when/if going this route.
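
To put a rough number on the page-size concern above, the listing size can be estimated from user space. This is only a back-of-the-envelope sketch: it assumes the line format proposed earlier in the thread, contiguous domain IDs starting at 0, no MON sub-groups, and hypothetical group names "grp00".."grp31"; listing_line() and listing_size() are illustrative helpers, not kernel code.

```python
PAGE_SIZE = 4096  # the 4KB limit mentioned above

CONFIGS = ["mbm_total_bytes", "mbm_local_bytes"]


def listing_line(ctrl_group, mon_group, config, num_domains, state="e"):
    """Build one '<CTRL_MON group>/<MON group>/<configuration>:...' line."""
    domains = ";".join(f"{d}={state}" for d in range(num_domains))
    return f"{ctrl_group}/{mon_group}/{config}:{domains}\n"


def listing_size(ctrl_groups, configs, num_domains):
    """Total bytes needed to list every CTRL_MON group and configuration."""
    return sum(
        len(listing_line(group, "", config, num_domains))
        for group in ctrl_groups
        for config in configs
    )


# The two-domain example from the proposal: default group plus "test".
small = listing_size(["", "test"], CONFIGS, 2)

# 32 groups and 12 domains, the numbers used earlier in this thread,
# plus the default group.
groups = [""] + [f"grp{i:02d}" for i in range(32)]
large = listing_size(groups, CONFIGS, 12)

print(small, large, large > PAGE_SIZE)
```

Under these assumptions the two-domain example needs 112 bytes, while the 32-group, 12-domain case needs 4808 bytes and no longer fits in a single page, even with short group names and only two configurations.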
>>
>> This problem is not specific to this series. I feel it is a generic problem
>> for many similar interfaces. I don't know how it is addressed. I may
>> have to investigate this. Any pointers would be helpful.
>>
>>
>>>
>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>> summarize the discussion with pros/cons raised to make an informed selection?
>>> I understand that Google as represented by Peter no longer requires/requests this
>>> file but the motivation for this change seems new and does not seem to reduce the
>>> original motivation for this file. We may also want to separate requirements for reading
>>> from and writing to this file.
>>
>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>
>> Summary: We have two proposals.
>>
>> First one from Peter:
>>
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>>
>> Pros:
>> a.  Allows flexible creation of free-form names for assignable
>> configurations, stored in info/L3_MON/counter_configs/.
>>
>> b.  Events can be accessed using corresponding free-form names in the
>> mon_data directory, making it clear to users what each event represents.
>>
>>
>> Cons:
>> a. Requires three separate files for assignment in each group
>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>
>> b. No built-in listing support, meaning users must query each group
>> individually to check assignment states.
> 
> How big of a problem is this in reality? I'd assume that users of this
> feature would only reassign counter attributes at some slow rate (set
> up counters, measure for at least a few seconds, then set up for next
> measurement). Cost to open/read/close a few hundred kernfs files isn't
> very high. Biggest cost might be hogging the resctrl mutex which would
> cause jitter in the tasks reading data from resctrl monitors.

Yes. That is a good point. I don't know how big a problem it is.

But we all need to agree that group listing is not a requirement. If so, we
can go ahead with that route.

Let's hear from all the parties.

> 
> Anyone doing this at scale should be able to keep track of what they set,
> so wouldn't need to read at all. I'm not a big believer in "multiple
> agents independently tweaking resctrl without knowledge of each other".
> 
>>
>> Second Proposal (Mine)
>>
>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>
>> Pros:
>>
>> a. Maintains the flexibility of free-form names for assignable
>> configurations (info/L3_MON/counter_configs/).
>>
>> b. Events remain accessible via free-form names in mon_data, ensuring
>> clarity on their purpose.
>>
>> c. Adds the ability to list assignment states for all groups in a single
>> command.
>>
>> Cons:
>> a.  Potential buffer overflow issues when handling a large number of
>> groups and domains and code complexity to fix the issue.
>>
>>
>> Third Option: A Hybrid Approach
>>
>> We could combine elements from both proposals:
>>
>> a. Retain the free-form naming approach for assignable configurations in
>> info/L3_MON/counter_configs/.
>>
>> b. Use the assignment method from the first proposal:
>>    $mkdir test
>>    $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>
>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>> interface, enabling users to read assignment states for all groups in one
>> place. Only reading support.
>>
>>
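
For the read-only listing in option (c), a user-space consumer could be quite small. The sketch below assumes the "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>" format from my proposal, which is not a settled interface; parse_assign_line() and unassigned_domains() are hypothetical helper names.

```python
def parse_assign_line(line):
    """Split one listing line into (ctrl_group, mon_group, config, states).

    states maps each domain ID to its state letter ('e', 's' or 'u').
    """
    path, _, domain_part = line.strip().rpartition(":")
    ctrl_group, mon_group, config = path.split("/")
    states = {}
    for item in domain_part.split(";"):
        domain_id, _, state = item.partition("=")
        states[int(domain_id)] = state
    return ctrl_group, mon_group, config, states


def unassigned_domains(lines):
    """Yield (ctrl_group, mon_group, config, domain_id) for every 'u' entry."""
    for line in lines:
        ctrl_group, mon_group, config, states = parse_assign_line(line)
        for domain_id, state in sorted(states.items()):
            if state == "u":
                yield ctrl_group, mon_group, config, domain_id


# Sample text in the proposed format; a real tool would read it from
# info/L3_MON/mbm_assign_control instead.
sample = [
    "test//mbm_total_bytes:0=u;1=u",
    "test//mbm_local_bytes:0=e;1=u",
    "//mbm_total_bytes:0=e;1=e",
]

for entry in unassigned_domains(sample):
    print(entry)
```

An empty first field denotes the default CTRL_MON group, as in the "//mbm_total_bytes" lines of the examples.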
>>>
>>>>>>>
>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>
>>>>>>>      Format:
>>>>>>>      "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>
>>>>>>>      #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>
>>>>>>>      #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>
>>>>>>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>      test//mbm_total_bytes:0=u;1=u
>>>>>>>      test//mbm_local_bytes:0=u;1=u
>>>>>>>      //mbm_total_bytes:0=e;1=e
>>>>>>>      //mbm_local_bytes:0=e;1=e
>>>>>>>
>>>>>>>      The corresponding events will be read in
>>>>>>>
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>      /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>
>>>>>>> 7. In the first stage, only two configurations (mbm_total_bytes and
>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>
>>>>>>> 8. In the future, there will be options to create multiple configurations,
>>>>>>> and a corresponding directory will be created in
>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>
>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>> directory? Like this:
>>>>>
>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>          /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>
>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>> the set of attributes picked ... but if they want to name this
>>>>> monitor file "brian" then they have to live with any confusion
>>>>> that they bring on themselves).
>>>>>
>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>
>>>>>>>
>>>>>>
>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>> waiting for input on this. It would be great if you could spend some time
>>>>>> on this to see if we can find common ground on the interface.
>>>>>>
>>>>>> Thanks
>>>>>> Babu
>>>>>
>>>>> -Tony
>>>>>
>>>>
>>>>
>>>> thanks
>>>> Babu
>>>
>>> Reinette
>>>
>>>
>>
>> -- 
>> Thanks
>> Babu Moger
> 
> -Tony
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 2 weeks ago
Hi Peter,

On 2/26/25 5:27 AM, Peter Newman wrote:
> Hi Babu,
> 
> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 2/25/25 11:11, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>> <reinette.chatre@intel.com> wrote:
>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>
>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>
>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>> for.
>>>>>>>>>>
>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>
>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>> customers.
>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>
>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>> event names.
>>>>>>>>
>>>>>>>> Thank you for clarifying.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>
>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>
>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>
>>>>>>>>> (per domain)
>>>>>>>>> group 0:
>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> group 1:
>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>> configuration is a requirement?
>>>>>>>
>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>> there's less pressure on the counters.
>>>>>>>
>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>> many counters the group needs in each domain.
>>>>>>
>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>> globally then it would not make it possible to support the full configurability
>>>>>> of the hardware.
>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>> earlier example copied below:
>>>>>>
>>>>>>>>> (per domain)
>>>>>>>>> group 0:
>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> group 1:
>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>> ...
>>>>>>
>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>> I understand it:
>>>>>>
>>>>>> group 0:
>>>>>>  domain 0:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>> group 1:
>>>>>>  domain 0:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>> in domain 1, resulting in:
>>>>>>
>>>>>> group 0:
>>>>>>  domain 0:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>> group 1:
>>>>>>  domain 0:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>
>>>>>> group 0:
>>>>>>  domain 0:
>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>> group 1:
>>>>>>  domain 0:
>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>  domain 1:
>>>>>>   counter 0: LclFill,RmtFill
>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>   counter 3: VictimBW
>>>>>>
>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>> understand the above does contradict global counter configuration though.
>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>> that it is reconfigured as part of every assignment?
>>>>>
>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>> system configuration, the user will settle on a handful of useful
>>>>> groupings to count.
>>>>>
>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>
>>>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>>>  # syntax and probably not in the mbm_assign_control file.
>>>>>
>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>>  # legacy "total" configuration, effectively r+w
>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>>  /group0/0=t;1=t
>>>>>  /group1/0=t;1=t
>>>>>  /group2/0=_;1=t
>>>>>  /group3/0=rw;1=_
>>>>>
>>>>> - group2 is restricted to domain 0
>>>>> - group3 is restricted to domain 1
>>>>> - the rest are unrestricted
>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>
>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>
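
The counter totals in the example above follow directly from the assignment strings. As a sketch, assuming that each flag letter assigned in a domain consumes exactly one counter there and that "_" consumes none (counters_per_domain() is a hypothetical helper, not an existing tool):

```python
from collections import Counter

# Assignment lines copied from the example quoted above.
ASSIGNMENTS = [
    "/group0/0=t;1=t",
    "/group1/0=t;1=t",
    "/group2/0=_;1=t",
    "/group3/0=rw;1=_",
]


def counters_per_domain(lines):
    """Count how many counters each domain needs for the given assignments."""
    used = Counter()
    for line in lines:
        # Everything after the last '/' is the per-domain assignment list.
        domain_part = line.rsplit("/", 1)[1]
        for item in domain_part.split(";"):
            domain_id, flags = item.split("=")
            used[int(domain_id)] += sum(1 for flag in flags if flag != "_")
    return dict(used)


print(counters_per_domain(ASSIGNMENTS))
```

For the four lines above this gives 4 counters in domain 0 and 3 counters in domain 1, matching the totals in the quoted example.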
>>>>
>>>> I see. Thank you for the example.
>>>>
>>>> resctrl supports per-domain configurations with the following possible when
>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>
>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>
>>>>    /group0/0=t;1=t
>>>>    /group1/0=t;1=t
>>>>
>>>> Even though the flags are identical in all domains, the assigned counters will
>>>> be configured differently in each domain.
>>>>
>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>> reasonable to carry this forward to what will be supported next.
>>>
>>> The hardware supports both a per-domain mode, where all groups in a
>>> domain use the same configurations and are limited to two events per
>>> group and a per-group mode where every group can be configured and
>>> assigned freely. This series is using the legacy counter access mode
>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>> in the domain can be read. If we chose to read the assigned counter
>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>> rather than asking the hardware to find the counter by RMID, we would
>>> not be limited to 2 counters per group/domain and the hardware would
>>> have the same flexibility as on MPAM.
>>
>> In extended mode, the contents of a specific counter can be read by
>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>> QM_CTR will then return the contents of the specified counter.
>>
>> It is documented below.
>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>
>> We previously discussed this with you (off the public list) and I
>> initially proposed the extended assignment mode.
>>
>> Yes, the extended mode allows greater flexibility by enabling multiple
>> counters to be assigned to the same group, rather than being limited to
>> just two.
>>
>> However, the challenge is that we currently lack the necessary interfaces
>> to configure multiple events per group. Without these interfaces, the
>> extended mode is not practical at this time.
>>
>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>> require modifications to the existing interface, allowing us to continue
>> using it as is.
>>
>>>
>>> (I might have said something confusing in my last messages because I
>>> had forgotten that I switched to the extended assignment mode when
>>> prototyping with soft-ABMC and MPAM.)
>>>
>>> Forcing all groups on a domain to share the same 2 counter
>>> configurations would not be acceptable for us, as the example I gave
>>> earlier is one I've already been asked about.
>>
>> I don’t see this as a blocker. It should be considered an extension to the
>> current ABMC series. We can easily build on top of this series once we
>> finalize how to configure the multiple event interface for each group.
> 
> I don't think it is, either. Only being able to use ABMC to assign
> counters is fine for our use as an incremental step. My longer-term
> concern is the domain-scoped mbm_total_bytes_config and
> mbm_local_bytes_config files, but they were introduced with BMEC, so
> there's already an expectation that the files are present when BMEC is
> supported.
> 
> On ABMC hardware that also supports BMEC, I'm concerned about enabling
> ABMC when only the BMEC-style event configuration interface exists.

ABMC currently depends on BMEC. Does that make the current implementation
the one you are concerned about?
https://lore.kernel.org/lkml/e4111779ebb0e7004dbedc258eeae2677f578ab1.1737577229.git.babu.moger@amd.com/

> The scope of my issue is just whether enabling "full" ABMC support
> will require an additional opt-in, since that could remove the BMEC
> interface. If it does, it's something we can live with.


Reinette

Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 2 weeks ago
Hi Peter/Reinette,

On 2/26/25 10:25, Reinette Chatre wrote:
> Hi Peter,
> 
> On 2/26/25 5:27 AM, Peter Newman wrote:
>> Hi Babu,
>>
>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@amd.com> wrote:
>>>
>>> Hi Peter,
>>>
>>> On 2/25/25 11:11, Peter Newman wrote:
>>>> Hi Reinette,
>>>>
>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>> <reinette.chatre@intel.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>> <reinette.chatre@intel.com> wrote:
>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>
>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>> for.
>>>>>>>>>>>
>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>
>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>> customers.
>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>
>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>> event names.
>>>>>>>>>
>>>>>>>>> Thank you for clarifying.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>
>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>
>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>
>>>>>>>>>> (per domain)
>>>>>>>>>> group 0:
>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>> configuration is a requirement?
>>>>>>>>
>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>> there's less pressure on the counters.
>>>>>>>>
>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>> many counters the group needs in each domain.
>>>>>>>
>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>> of the hardware.
>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>> earlier example copied below:
>>>>>>>
>>>>>>>>>> (per domain)
>>>>>>>>>> group 0:
>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> ...
>>>>>>>
>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>> I understand it:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>> in domain 1, resulting in:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 0: LclFill,RmtFill
>>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW
>>>>>>>
>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>
>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>> system configuration, the user will settle on a handful of useful
>>>>>> groupings to count.
>>>>>>
>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>
>>>>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>>>>  # syntax and probably not in the mbm_assign_control file.
>>>>>>
>>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>>  # legacy "total" configuration, effectively r+w
>>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>>  /group0/0=t;1=t
>>>>>>  /group1/0=t;1=t
>>>>>>  /group2/0=_;1=t
>>>>>>  /group3/0=rw;1=_
>>>>>>
>>>>>> - group2 is restricted to domain 0
>>>>>> - group3 is restricted to domain 1
>>>>>> - the rest are unrestricted
>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>
>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>
>>>>>
>>>>> I see. Thank you for the example.
>>>>>
>>>>> resctrl supports per-domain configurations with the following possible when
>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>
>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>>    /group0/0=t;1=t
>>>>>    /group1/0=t;1=t
>>>>>
>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>> be configured differently in each domain.
>>>>>
>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>> reasonable to carry this forward to what will be supported next.
>>>>
>>>> The hardware supports both a per-domain mode, where all groups in a
>>>> domain use the same configurations and are limited to two events per
>>>> group and a per-group mode where every group can be configured and
>>>> assigned freely. This series is using the legacy counter access mode
>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>> in the domain can be read. If we chose to read the assigned counter
>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>> rather than asking the hardware to find the counter by RMID, we would
>>>> not be limited to 2 counters per group/domain and the hardware would
>>>> have the same flexibility as on MPAM.
>>>
>>> In extended mode, the contents of a specific counter can be read by
>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>> QM_CTR will then return the contents of the specified counter.
>>>
>>> It is documented below.
>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>
>>> We previously discussed this with you (off the public list) and I
>>> initially proposed the extended assignment mode.
>>>
>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>> counters to be assigned to the same group, rather than being limited to
>>> just two.
>>>
>>> However, the challenge is that we currently lack the necessary interfaces
>>> to configure multiple events per group. Without these interfaces, the
>>> extended mode is not practical at this time.
>>>
>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>> require modifications to the existing interface, allowing us to continue
>>> using it as is.
>>>
>>>>
>>>> (I might have said something confusing in my last messages because I
>>>> had forgotten that I switched to the extended assignment mode when
>>>> prototyping with soft-ABMC and MPAM.)
>>>>
>>>> Forcing all groups on a domain to share the same 2 counter
>>>> configurations would not be acceptable for us, as the example I gave
>>>> earlier is one I've already been asked about.
>>>
>>> I don’t see this as a blocker. It should be considered an extension to the
>>> current ABMC series. We can easily build on top of this series once we
>>> finalize how to configure the multiple event interface for each group.
>>
>> I don't think it is, either. Only being able to use ABMC to assign
>> counters is fine for our use as an incremental step. My longer-term
>> concern is the domain-scoped mbm_total_bytes_config and
>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>> there's already an expectation that the files are present when BMEC is
>> supported.

It's good that we at least know about this concern now. Let's take a step
back and figure out how we can address it.

>>
>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>> ABMC when only the BMEC-style event configuration interface exists.
> 
> ABMC currently depends on BMEC making the current implementation the
> one you are concerned about?
> https://lore.kernel.org/lkml/e4111779ebb0e7004dbedc258eeae2677f578ab1.1737577229.git.babu.moger@amd.com/

I think it is more than that.

The ABMC feature allows event configuration by writing to L3_QOS_ABMC_CFG,
where we can set cntr_id, RMID, and event configuration. Currently, we
derive event configuration from BMEC settings (either
mbm_total_bytes_config or mbm_local_bytes_config).

If we don’t use BMEC values, we would need to require users to manually
specify event configuration settings.

Currently, the counter configuration records only the event type and the
rdtgroup, from which we determine the RMID, while event configuration relies
on BMEC:

struct mbm_cntr_cfg {
        enum resctrl_event_id   evtid;
        struct rdtgroup         *rdtgrp;
};

To make event configuration independent of BMEC, we can include an
explicit event configuration field:

struct mbm_cntr_cfg {
        enum resctrl_event_id   evtid;
        u32                     evt_cfg;  /* User-provided config value */
        struct rdtgroup         *rdtgrp;
};

Key Considerations

1. Counter Management: Managing counters globally (like CLOSID
management) would be simpler than handling them at the domain level,
though domain-level management is feasible.

2. User Input: Users will need to specify event configuration when
assigning events.
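
To make the trade-off in consideration 1 a bit more concrete, here is a rough
userspace sketch in Python (not kernel code). It takes the group/domain
assignments from the earlier example in this thread and compares how many
counters are consumed when counters are tracked per domain versus when a
counter ID is reserved in every domain. The function names are made up for
illustration, and "one letter costs one counter" is an assumption.

```python
# Assignments from the earlier example in this thread: group -> {domain: flags}.
# "_" means no counter is assigned in that domain; every other letter names one
# counter configuration (r, w or t) and is assumed to cost one hardware counter.
ASSIGNMENTS = {
    "group0": {0: "t", 1: "t"},
    "group1": {0: "t", 1: "t"},
    "group2": {0: "_", 1: "t"},
    "group3": {0: "rw", 1: "_"},
}


def counters_needed(flags):
    """Number of counters one domain entry consumes."""
    return 0 if flags == "_" else len(flags)


def per_domain_usage(assignments):
    """Counters consumed in each domain when every domain has its own pool."""
    usage = {}
    for domains in assignments.values():
        for dom, flags in domains.items():
            usage[dom] = usage.get(dom, 0) + counters_needed(flags)
    return usage


def global_usage(assignments):
    """Counter IDs consumed when an ID is reserved in every domain at once."""
    return sum(
        max(counters_needed(flags) for flags in domains.values())
        for domains in assignments.values()
    )


print(per_domain_usage(ASSIGNMENTS))  # {0: 4, 1: 3}
print(global_usage(ASSIGNMENTS))      # 5
```

With per-domain pools the example needs 4 counters in domain 0 and 3 in
domain 1. With global IDs it needs 5 IDs, and each of them is held in both
domains.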


Here is a quick example using our current interface:

a. List the group:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=t:0x1F,l:0x15;1=t:0x1F,l:0x15

b. Unassign an event:

# echo "//0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=t:0x1F;1=t:0x1F,l:0x15

c. Assign an event:

# echo "//0+l:0x15" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
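
For anyone who wants to experiment with the syntax in steps a to c, here is a
small userspace model in Python. It is only a sketch of how the
"<flag>:<config>" entries could be parsed and applied. The helper names
(parse_line, apply_ops, format_line) are hypothetical, and the model assumes
that "=" replaces a domain's events, "+" adds events, "-" removes them, and
"_" means no events. None of this is taken from the kernel implementation.

```python
import re

# One "<domain><op><events>" entry, e.g. "0=t:0x1F,l:0x15", "0-l" or "1=_".
ENTRY = re.compile(r"(\d+)([=+-])(.*)")


def parse_line(line):
    """Split "<ctrl group>/<mon group>/<entries>" into its parts."""
    ctrl, mon, entries = line.strip().split("/", 2)
    ops = []
    for entry in entries.split(";"):
        match = ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"bad entry: {entry!r}")
        dom, op, body = match.groups()
        events = {}
        if body != "_":  # "_" is assumed to mean "no events in this domain"
            for item in body.split(","):
                flag, _, cfg = item.partition(":")
                if not flag:
                    raise ValueError(f"missing event flag in {entry!r}")
                # Unassigning needs no config value, so cfg may be empty.
                events[flag] = int(cfg, 16) if cfg else None
        ops.append((int(dom), op, events))
    return ctrl, mon, ops


def apply_ops(state, ops):
    """Apply parsed operations to state, a {domain: {flag: config}} dict."""
    for dom, op, events in ops:
        current = state.setdefault(dom, {})
        if op == "-":
            for flag in events:
                current.pop(flag, None)
            continue
        if any(cfg is None for cfg in events.values()):
            raise ValueError("assignment needs an explicit config value")
        if op == "=":
            current.clear()
        current.update(events)


def format_line(ctrl, mon, state):
    """Render state back into the listing format used in step a."""
    parts = []
    for dom, events in sorted(state.items()):
        body = ",".join(f"{flag}:0x{cfg:X}" for flag, cfg in events.items())
        parts.append(f"{dom}={body or '_'}")
    return f"{ctrl}/{mon}/" + ";".join(parts)


state = {}
apply_ops(state, parse_line("//0=t:0x1F,l:0x15;1=t:0x1F,l:0x15")[2])
apply_ops(state, parse_line("//0-l")[2])
print(format_line("", "", state))  # //0=t:0x1F;1=t:0x1F,l:0x15
```

The printed line matches the listing shown after step b.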

Note that I don't want to rush here.

Peter, can you please spend some time and propose the interface you are
thinking of, based on both ABMC and MPAM?

> 
>> The scope of my issue is just whether enabling "full" ABMC support
>> will require an additional opt-in, since that could remove the BMEC
>> interface. If it does, it's something we can live with.
> 
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
Hi,

On Wed, Feb 19, 2025 at 12:28:16PM +0100, Peter Newman wrote:
> Hi Reinette,
> 
> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
> >
> > Hi Peter,
> >
> > On 2/17/25 2:26 AM, Peter Newman wrote:
> > > Hi Reinette,
> > >
> > > On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> > > <reinette.chatre@intel.com> wrote:

[...]

> > >> As mentioned above, one possible issue with existing interface is that
> > >> it is limited to 26 events (assuming only lower case letters are used). The limit
> > >> is low enough to be of concern.
> > >
> > > The events which can be monitored by a single counter on ABMC and MPAM
> > > so far are combinable, so 26 counters per group today means it limits
> > > breaking down MBM traffic for each group 26 ways. If a user complained
> > > that a 26-way breakdown of a group's MBM traffic was limiting their
> > > investigation, I would question whether they know what they're looking
> > > for.
> >
> > The key here is "so far" as well as the focus on MBM only.
> >
> > It is impossible for me to predict what we will see in a couple of years
> > from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
> > to support their users. Just looking at the Intel RDT spec the event register
> > has space for 32 events for each "CPU agent" resource. That does not take into
> > account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
> > that he is working on patches [1] that will add new events and shared the idea
> > that we may be trending to support "perf" like events associated with RMID. I
> > expect AMD PQoS and Arm MPAM to provide related enhancements to support their
> > customers.
> > This all makes me think that resctrl should be ready to support more events than 26.
> 
> I was thinking of the letters as representing a reusable, user-defined
> event-set for applying to a single counter rather than as individual
> events, since MPAM and ABMC allow us to choose the set of events each
> one counts. Wherever we define the letters, we could use more symbolic
> event names.
> 
> In the letters as events model, choosing the events assigned to a
> group wouldn't be enough information, since we would want to control
> which events should share a counter and which should be counted by
> separate counters. I think the amount of information that would need
> to be encoded into mbm_assign_control to represent the level of
> configurability supported by hardware would quickly get out of hand.
> 
> Maybe as an example, one counter for all reads, one counter for all
> writes in ABMC would look like...
> 
> (L3_QOS_ABMC_CFG.BwType field names below)
> 
> (per domain)
> group 0:
>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>  counter 1: VictimBW,LclNTWr,RmtNTWr
> group 1:
>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>  counter 3: VictimBW,LclNTWr,RmtNTWr
> ...
> 
> I assume packing all of this info for a group's desired counter
> configuration into a single line (with 32 domains per line on many
> dual-socket AMD configurations I see) would be difficult to look at,
> even if we could settle on a single letter to represent each
> universally.
> 
> >
> > My goal is for resctrl to have a user interface that can as much as possible
> > be ready for whatever may be required from it years down the line. Of course,
> > I may be wrong and resctrl would never need to support more than 26 events per
> > resource (*). The risk is that resctrl *may* need to support more than 26 events
> > and how could resctrl support that?
> >
> > What is the risk of supporting more than 26 events? As I highlighted earlier
> > the interface I used as demonstration may become unwieldy to parse on a system
> > with many domains that supports many events. This is a concern for me. Any suggestions
> > will be appreciated, especially from you since I know that you are very familiar with
> > issues related to large scale use of resctrl interfaces.
> 
> It's mainly just the unwieldiness of all the information in one file.
> It's already at the limit of what I can visually look through.
> 
> I believe that shared assignments will take care of all the
> high-frequency and performance-intensive batch configuration updates I
> was originally concerned about, so I no longer see much benefit in
> finding ways to textually encode all this information in a single file
> when it would be more manageable to distribute it around the
> filesystem hierarchy.
> 
> -Peter

This was sort of what I had in my mind.

I think it may make some sense to support "t" and "l" out of the box,
as intuitively backwards-compatible event names, but provide a way to
create new "letters" as needed, with a well-defined way (customisable or
not) of mapping these to event names visible in resctrlfs.  I just used
the digits for this purpose, but we could have an explicit interface
for it.

In order for this series to stabilise though, does it make sense to put
this out of scope just for now?

The current series provides a way to provide the mbm_total_bytes and
mbm_local_bytes counters on ABMC and MPAM systems, without having to
limit the total number of monitoring groups (MPAM's current approach)
or overcommit the counters so that they may not be continuously
reliable when there are too many groups (AMD?).

That seems immediately useful.

The ability to assign arbitrarily many counters to a group is a new
feature however.  Does it make sense to consider this on its own merits
when the baseline ABMC interface has been settled?

My main concern right now (from the Arm side) is to be confident that
the initial ABMC interface definition doesn't paint us into a corner.

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 4 weeks ago
Hi All,

On 2/17/25 04:26, Peter Newman wrote:
> Hi Reinette,
> 
> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Babu,
>>
>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>
>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>
>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>> Please help me understand if you see it differently.
>>>>>>
>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>
>>>>>> mbm_local_read_bytes a
>>>>>> mbm_local_write_bytes b
>>>>>>
>>>>>> Then mbm_assign_control can be used as:
>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>> <value>
>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>
>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>
>> As mentioned above, one possible issue with existing interface is that
>> it is limited to 26 events (assuming only lower case letters are used). The limit
>> is low enough to be of concern.
> 
> The events which can be monitored by a single counter on ABMC and MPAM
> so far are combinable, so 26 counters per group today means it limits
> breaking down MBM traffic for each group 26 ways. If a user complained
> that a 26-way breakdown of a group's MBM traffic was limiting their
> investigation, I would question whether they know what they're looking
> for.

Based on the discussion so far, it feels like this is not a group-level
breakdown but more of a global-level breakdown. I could be wrong here.

My understanding so far is that MPAM has a number of global counters. They
can be assigned to any domain in the system to monitor events.

MPAM also has a way to configure the events (read, write or both).

Both of these features are in line with the current resctrl implementation
and can be easily adapted.

One thing I am not clear about is why the MPAM implementation plans to create
separate files (dynamically) in the /sys/fs/resctrl/info/L3_MON/ directory to
read the events. We already have files in each group to read the events.

# ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
total 0
-r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
-r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
-r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes

-- 
Thanks
Babu Moger
RE: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 11 months, 3 weeks ago
> Based on the discussion so far, it felt like it is not a group level
> breakdown. It is kind of global level breakdown. I could be wrong here.
>
> My understanding so far, MPAM has a number of global counters. It can be
> assigned to any domain in the system and monitor events.
>
> They also have a way to configure the events (read, write or both).
>
> Both these feature are inline with current resctrl implementation and can
> be easily adapted.
>
> One thing I am not clear why MPAM implementation plans to create separate
> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
> events. We already have files in each group to read the events.
>
> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
> total 0
> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes

It would be nice if the filenames here reflected the reconfigured
events. From what I can tell on AMD with BMEC it is possible to change the
underlying events so that local b/w is reported in the mbm_total_bytes
file, and vice versa. Or an event like:

   6       Dirty Victims from the QOS domain to all types of memory

is counted.

Though maybe we'd need to create a lot of filenames for the 2**7
combinations of bits (bits 0 through 6).
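
As a rough illustration of what a reconfigured file is really counting, here
is a small Python sketch that decodes a BMEC configuration value into the
event descriptions. The bit table is copied from my reading of the
mbm_total_bytes_config section of Documentation/arch/x86/resctrl.rst, so
please check it against the current documentation. The decode() helper is
hypothetical and not part of any resctrl tool.

```python
# Bit positions and descriptions as listed for mbm_total_bytes_config and
# mbm_local_bytes_config in Documentation/arch/x86/resctrl.rst (assumed to be
# current; verify before relying on it).
BMEC_BITS = {
    0: "Reads to memory in the local NUMA domain",
    1: "Reads to memory in the non-local NUMA domain",
    2: "Non-temporal writes to local NUMA domain",
    3: "Non-temporal writes to non-local NUMA domain",
    4: "Reads to slow memory in the local NUMA domain",
    5: "Reads to slow memory in the non-local NUMA domain",
    6: "Dirty Victims from the QOS domain to all types of memory",
}


def decode(mask):
    """Return the descriptions of every event bit set in mask."""
    return [BMEC_BITS[bit] for bit in sorted(BMEC_BITS) if mask & (1 << bit)]


# 0x15 sets bits 0, 2 and 4, which are all local-domain events.
for description in decode(0x15):
    print(description)
```

Writing 0x15 into mbm_total_bytes_config would therefore make the "total"
file report only local traffic, which is the kind of mismatch between
filename and contents I mean.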

-Tony
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Tony,

On 2/18/25 8:51 AM, Luck, Tony wrote:
>> Based on the discussion so far, it felt like it is not a group level
>> breakdown. It is kind of global level breakdown. I could be wrong here.
>>
>> My understanding so far, MPAM has a number of global counters. It can be
>> assigned to any domain in the system and monitor events.
>>
>> They also have a way to configure the events (read, write or both).
>>
>> Both these feature are inline with current resctrl implementation and can
>> be easily adapted.
>>
>> One thing I am not clear why MPAM implementation plans to create separate
>> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
>> events. We already have files in each group to read the events.
>>
>> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
>> total 0
>> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
>> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
>> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes
> 
> It would be nice if the filenames here reflected the reconfigured
> events. From what I can tell on AMD with BMEC it is possible to change the
> underlying events so that local b/w is reported in the mbm_total_bytes
> file, and vice versa. Or an event like:
> 
>    6       Dirty Victims from the QOS domain to all types of memory
> 
> is counted.
> 
> Though maybe we'd need to create a lot of filenames for the 2**6
> combinations of bits.

Instead of accommodating all possible names, resctrl could support
"generic" names as hinted in Dave Martin's proposal [1].

The complication with BMEC is that its configuration was built on top of
the existing mbm_local_bytes and mbm_total_bytes events. Specifically, by
default and at hardware reset mbm_local_bytes counts exactly that. The event
is fixed if BMEC is not supported and configurable if it is.

Reinette

[1] https://lore.kernel.org/lkml/Z6zeXby8ajh0ax6i@e133380.arm.com/
RE: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 11 months, 3 weeks ago
> >> Based on the discussion so far, it felt like it is not a group level
> >> breakdown. It is kind of global level breakdown. I could be wrong here.
> >>
> >> My understanding so far, MPAM has a number of global counters. It can be
> >> assigned to any domain in the system and monitor events.
> >>
> >> They also have a way to configure the events (read, write or both).
> >>
> >> Both these feature are inline with current resctrl implementation and can
> >> be easily adapted.
> >>
> >> One thing I am not clear why MPAM implementation plans to create separate
> >> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
> >> events. We already have files in each group to read the events.
> >>
> >> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
> >> total 0
> >> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
> >> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
> >> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes
> >
> > It would be nice if the filenames here reflected the reconfigured
> > events. From what I can tell on AMD with BMEC it is possible to change the
> > underlying events so that local b/w is reported in the mbm_total_bytes
> > file, and vice versa. Or an event like:
> >
> >    6       Dirty Victims from the QOS domain to all types of memory
> >
> > is counted.
> >
> > Though maybe we'd need to create a lot of filenames for the 2**6
> > combinations of bits.
>
> Instead of accommodating all possible names resctrl could support
> "generic" names as hinted in Dave Martin's proposal.
>
> The complication with BMEC is that these are the underlying
> mbm_local_bytes and mbm_total_bytes events on which configuration
> was built. Specifically, by default and at hardware reset mbm_local_bytes
> counts exactly that. The event is fixed if BMEC is not supported and
> configurable if it is.

Would it be possible to rename the files if the config changed?

I.e. initially they are named mbm_local_bytes and mbm_total_bytes.

But when the user changes the config for mbm_total_bytes using the
BMEC config file, that file is renamed everywhere to "user_config1"

-Tony
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Tony,

On 2/18/25 11:08 AM, Luck, Tony wrote:
>>>> Based on the discussion so far, it felt like it is not a group level
>>>> breakdown. It is kind of global level breakdown. I could be wrong here.
>>>>
>>>> My understanding so far, MPAM has a number of global counters. It can be
>>>> assigned to any domain in the system and monitor events.
>>>>
>>>> They also have a way to configure the events (read, write or both).
>>>>
>>>> Both these feature are inline with current resctrl implementation and can
>>>> be easily adapted.
>>>>
>>>> One thing I am not clear why MPAM implementation plans to create separate
>>>> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
>>>> events. We already have files in each group to read the events.
>>>>
>>>> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
>>>> total 0
>>>> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
>>>> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
>>>> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes
>>>
>>> It would be nice if the filenames here reflected the reconfigured
>>> events. From what I can tell on AMD with BMEC it is possible to change the
>>> underlying events so that local b/w is reported in the mbm_total_bytes
>>> file, and vice versa. Or an event like:
>>>
>>>    6       Dirty Victims from the QOS domain to all types of memory
>>>
>>> is counted.
>>>
>>> Though maybe we'd need to create a lot of filenames for the 2**6
>>> combinations of bits.
>>
>> Instead of accommodating all possible names resctrl could support
>> "generic" names as hinted in Dave Martin's proposal.
>>
>> The complication with BMEC is that these are the underlying
>> mbm_local_bytes and mbm_total_bytes events on which configuration
>> was built. Specifically, by default and at hardware reset mbm_local_bytes
>> counts exactly that. The event is fixed if BMEC is not supported and
>> configurable if it is.
> 
> Would it be possible to rename the files if the config changed?
> 
> I.e. initially they are named mbm_local_bytes and mbm_total_bytes.
> 
> But when the user changes the config for mbm_total_bytes using the
> BMEC config file, that file is renamed everywhere to "user_config1"
> 

The motivation for doing this to an existing interface is not clear. On
its own I think it will add confusion. It sounds to me as though there is
some future feature (similar to BMEC) that needs to be supported, for which
such a change would make things compatible. For this I think it would be
easier to discuss that future feature and ensure everybody is clear on what
interface would work for it before changing the existing feature to be
compatible with it.

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
On Mon, Feb 17, 2025 at 10:45:29AM -0600, Moger, Babu wrote:
> Hi All,
> 
> On 2/17/25 04:26, Peter Newman wrote:
> > Hi Reinette,
> > 
> > On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Babu,
> >>
> >> On 2/14/25 10:31 AM, Moger, Babu wrote:
> >>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
> >>>> On 2/13/25 9:37 AM, Dave Martin wrote:
> >>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
> >>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
> >>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>
> >> (quoting relevant parts with goal to focus discussion on new possible syntax)
> >>
> >>>>>> I see the support for MPAM events distinct from the support of assignable counters.
> >>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> >>>>>> Please help me understand if you see it differently.
> >>>>>>
> >>>>>> Doing so would need to come up with alphabetical letters for these events,
> >>>>>> which seems to be needed for your proposal also? If we use possible flags of:
> >>>>>>
> >>>>>> mbm_local_read_bytes a
> >>>>>> mbm_local_write_bytes b
> >>>>>>
> >>>>>> Then mbm_assign_control can be used as:
> >>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> >>>>>> <value>
> >>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
> >>>>>>
> >>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> >>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
> >>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> >>
> >> As mentioned above, one possible issue with existing interface is that
> >> it is limited to 26 events (assuming only lower case letters are used). The limit
> >> is low enough to be of concern.
> > 
> > The events which can be monitored by a single counter on ABMC and MPAM
> > so far are combinable, so 26 counters per group today means it limits
> > breaking down MBM traffic for each group 26 ways. If a user complained
> > that a 26-way breakdown of a group's MBM traffic was limiting their
> > investigation, I would question whether they know what they're looking
> > for.
> 
> Based on the discussion so far, it felt like it is not a group level
> breakdown. It is kind of global level breakdown. I could be wrong here.
> 
> My understanding so far, MPAM has a number of global counters. It can be
> assigned to any domain in the system and monitor events.
> 
> They also have a way to configure the events (read, write or both).
> 
> Both these feature are inline with current resctrl implementation and can
> be easily adapted.
> 
> One thing I am not clear why MPAM implementation plans to create separate
> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
> events. We already have files in each group to read the events.
> 
> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
> total 0
> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes


To be clear, we have no current plan to do this from the Arm side.

My sketch was just a thought experiment to test whether we would have
difficulties _if_ a decision were made to extend the interface in that
direction.

But it looks OK to me: the interface proposed in this series seems to
leave enough possibilities for extension open that we could do
something like what I described later on if we decide to.


Overall, the interface proposed in this series seems a reasonable way
to support ABMC systems while keeping the consumer-side interface
(i.e., reading the mbm_total_bytes files etc.) as similar to the
classic / Intel RDT situation as possible.

MPAM can fit in with this approach, as demonstrated by James' past
branches porting the MPAM driver on top of previous versions of the
ABMC series.

As I understand it, he's almost done with porting onto this v11,
with no significant issues.

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 3 weeks ago
Hi All,

On 2/18/25 06:30, Dave Martin wrote:
> On Mon, Feb 17, 2025 at 10:45:29AM -0600, Moger, Babu wrote:
>> Hi All,
>>
>> On 2/17/25 04:26, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Babu,
>>>>
>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>
>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>
>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>
>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>
>>>>>>>> mbm_local_read_bytes a
>>>>>>>> mbm_local_write_bytes b
>>>>>>>>
>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>> <value>
>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>
>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>
>>>> As mentioned above, one possible issue with existing interface is that
>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>> is low enough to be of concern.
>>>
>>> The events which can be monitored by a single counter on ABMC and MPAM
>>> so far are combinable, so 26 counters per group today means it limits
>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>> investigation, I would question whether they know what they're looking
>>> for.
>>
>> Based on the discussion so far, it felt like it is not a group level
>> breakdown. It is kind of global level breakdown. I could be wrong here.
>>
>> My understanding so far, MPAM has a number of global counters. It can be
>> assigned to any domain in the system and monitor events.
>>
>> They also have a way to configure the events (read, write or both).
>>
>> Both these feature are inline with current resctrl implementation and can
>> be easily adapted.
>>
>> One thing I am not clear why MPAM implementation plans to create separate
>> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
>> events. We already have files in each group to read the events.
>>
>> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
>> total 0
>> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
>> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
>> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes
> 
> 
> To be clear, we have no current plan to do this from the Arm side.
> 
> My sketch was just a thought experiment to test whether we would have
> difficulties _if_ a decision were made to extend the interface in that
> direction.
> 
> But it looks OK to me: the interface proposed in this series seems to
> leave enough possibilities for extension open that we could do
> something like what I described later on if we decide to.
> 
> 
> Overall, the interface proposed in this series seems a reasonable way
> to support ABMC systems while keeping the consumer-side interface
> (i.e., reading the mbm_total_bytes files etc.) as similar to the
> classic / Intel RDT situation as possible.
> 
> MPAM can fit in with this approach, as demonstrated by James' past
> branches porting the MPAM driver on top of previous versions of the
> ABMC series.

Thanks Dave.
> 
> As I understand it, he's almost done with porting onto this v11,
> with no significant issues.
> 
Good to know. Thanks

I am working on v12 of ABMC with a few changes from Reinette's earlier
review comments.

Most of the changes are related to commit message and user
documentation updates.

I introduced a couple of new functions, resctrl_reset_rmid_all() and
mbm_cntr_free_all(), to organize the code better based on this comment:
https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@intel.com/


On top of that I have a few comments from Dave.

1.  Change "mbm_cntr_assign" to "mbm_counter_assign".

This will require me to search and replace in a lot of places. There are
variables and names like num_mbm_cntrs, mbm_cntr_assignable,
resctrl_arch_mbm_cntr_assign_enabled, resctrl_arch_mbm_cntr_assign_set,
mbm_cntr_assign_enabled, resctrl_num_mbm_cntrs_show, mbm_cntr_cfg, and the
list goes on.

 This is mostly cosmetic and does not add much value. I will drop this
change if Dave has no objections.

2. Change /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs to display per-domain
supported counters instead of a single value.


3. Use the actual events instead of flags based on the below comment.

https://lore.kernel.org/lkml/a07fca4c-c8fa-41a6-b126-59815b9a58f9@intel.com/

 Something like this.
 # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control

 Are we ready to go with this approach? I am still not clear on this.

 Reinette, What do you think?


-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
Hi there,

On Tue, Feb 18, 2025 at 09:39:43AM -0600, Moger, Babu wrote:
> Hi All,
> 
> On 2/18/25 06:30, Dave Martin wrote:
> > On Mon, Feb 17, 2025 at 10:45:29AM -0600, Moger, Babu wrote:
> >> Hi All,

[...]

> >> One thing I am not clear why MPAM implementation plans to create separate
> >> files(dynamically) in /sys/fs/resctrl/info/L3_MON/ directory to read the
> >> events. We already have files in each group to read the events.
> >>
> >> # ls -l /sys/fs/resctrl/mon_data/mon_L3_00/
> >> total 0
> >> -r--r--r--. 1 root root 0 Feb 17 08:16 llc_occupancy
> >> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_local_bytes
> >> -r--r--r--. 1 root root 0 Feb 17 08:16 mbm_total_bytes
> > 
> > 
> > To be clear, we have no current plan to do this from the Arm side.
> > 
> > My sketch was just a thought experiment to test whether we would have
> > difficulties _if_ a decision were made to extend the interface in that
> > direction.
> > 
> > But it looks OK to me: the interface proposed in this series seems to
> > leave enough possibilities for extension open that we could do
> > something like what I described later on if we decide to.
> > 
> > 
> > Overall, the interface proposed in this series seems a reasonable way
> > to support ABMC systems while keeping the consumer-side interface
> > (i.e., reading the mbm_total_bytes files etc.) as similar to the
> > classic / Intel RDT situation as possible.
> > 
> > MPAM can fit in with this approach, as demonstrated by James' past
> > branches porting the MPAM driver on top of previous versions of the
> > ABMC series.
> 
> Thanks Dave.
> > 
> > As I understand it, he's almost done with porting onto this v11,
> > with no significant issues.
> > 
> Good to know. Thanks
> 
> I am working on v12 of ABMC with few changes from Reinette's earlier
> review comments.
> 
> Most of the changes are related to commit message update and user
> documentation update.
> 
> Introduced couple of new functions resctrl_reset_rmid_all() and
> mbm_cntr_free_all() to organize the code better based on the comment.
> https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@intel.com/
> 
> 
> On top of that I have a few comments from Dave.
> 
> 1.  Change "mbm_cntr_assign" to "mbm_counter_assign".
> 
> This will require me to search and replace lot of places. There are
> variables, names like num_mbm_cntrs, mbm_cntr_assignable,
> resctrl_arch_mbm_cntr_assign_enabled, resctrl_arch_mbm_cntr_assign_set,
> mbm_cntr_assign_enabled, resctrl_num_mbm_cntrs_show, mbm_cntr_cfg and list
> goes on.
> 
>  This is mostly cosmetic and not much value add. Will drop this change if
> Dave has no objections.

There is no need to change the names of kernel symbols -- this was just
about the interface presented to userspace.

So, if you rename only the affected file names in resctrlfs (I think
there weren't any others) then I'm happy with that.

But if you prefer to avoid this inconsistency, the file name can stay
as-is.  It's not a huge deal.


> 2. Change /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs to display per-domain
> supported counters instead of a single value.

Ack; thanks (we could always add it back in later without an ABI break,
if people feel strongly about it and it looks feasible).


> 3. Use the actual events instead of flags based on the below comment.
> 
> https://lore.kernel.org/lkml/a07fca4c-c8fa-41a6-b126-59815b9a58f9@intel.com/
> 
>  Something like this.
>  # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>  Are we ready to go with this approach? I am still not clear on this.

[...]

> -- 
> Thanks
> Babu Moger

On this point, I'll defer to discussions elsewhere on the thread.


I have a few other minor comments pending to post, but it looks like
there may be a more serious issue with how the mbm_assign_control file
is handled in the kernel -- I'll try to post comments on that today.

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Babu,

On 2/18/25 7:39 AM, Moger, Babu wrote:
 
> 3. Use the actual events instead of flags based on the below comment.
> 
> https://lore.kernel.org/lkml/a07fca4c-c8fa-41a6-b126-59815b9a58f9@intel.com/
> 
>  Something like this.
>  # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>  Are we ready to go with this approach? I am still not clear on this.
> 
>  Reinette, What do you think?

I was actually expecting some push back or at least discussion on this interface
because the braces seem difficult to parse when compared to, for example, using
commas to separate the events of a domain. Peter [1] has some reservations about
going this direction and since he would end up using this interface significantly
I would prefer to resolve that first.

Reinette


[1] https://lore.kernel.org/lkml/CALPaoCh7WpohzpXhSAbumjSZBv1_+1bXON7_V1pwG4bdEBr52Q@mail.gmail.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 11 months, 3 weeks ago
Hi Reinette,

On 2/18/25 12:14, Reinette Chatre wrote:
> Hi Babu,
> 
> On 2/18/25 7:39 AM, Moger, Babu wrote:
>  
>> 3. Use the actual events instead of flags based on the below comment.
>>
>> https://lore.kernel.org/lkml/a07fca4c-c8fa-41a6-b126-59815b9a58f9@intel.com/
>>
>>  Something like this.
>>  # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>>  Are we ready to go with this approach? I am still not clear on this.
>>
>>  Reinette, What do you think?
> 
> I was actually expecting some push back or at least discussion on this interface
> because the braces seem difficult to parse when compared to, for example, using

I have yet to work on it. I will work on it after confirmation.

Here is the output from a system with 12 domains. I created one "test" group.

Output is definitely harder to parse for human eyes.

# cat info/L3_MON/mbm_assign_control
test//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_total_bytes}{mbm_local_bytes};2={mbm_total_bytes}{mbm_local_bytes};3={mbm_total_bytes}{mbm_local_bytes};4={mbm_total_bytes}{mbm_local_bytes};5={mbm_total_bytes}{mbm_local_bytes};6={mbm_total_bytes}{mbm_local_bytes};7={mbm_total_bytes}{mbm_local_bytes};8={mbm_total_bytes}{mbm_local_bytes};9={mbm_total_bytes}{mbm_local_bytes};10={mbm_total_bytes}{mbm_local_bytes};11={mbm_total_bytes}{mbm_local_bytes}
//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_total_bytes}{mbm_local_bytes};2={mbm_total_bytes}{mbm_local_bytes};3={mbm_total_bytes}{mbm_local_bytes};4={mbm_total_bytes}{mbm_local_bytes};5={mbm_total_bytes}{mbm_local_bytes};6={mbm_total_bytes}{mbm_local_bytes};7={mbm_total_bytes}{mbm_local_bytes};8={mbm_total_bytes}{mbm_local_bytes};9={mbm_total_bytes}{mbm_local_bytes};10={mbm_total_bytes}{mbm_local_bytes};11={mbm_total_bytes}{mbm_local_bytes}

It is also harder to parse in code. We should consider it only if this
format adds value.
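
For reference, a minimal user-space sketch of what parsing one line of the brace form might look like. It assumes the line layout <ctrl group>/<mon group>/<domain>=<events>;... seen in the output above, and that event names consist of word characters only. parse_brace_line is a hypothetical helper, not kernel code.

```python
import re

# One "{event_name}" token; event names are assumed to be word characters only.
_EVENT_RE = re.compile(r"\{(\w+)\}")


def parse_brace_line(line: str) -> tuple[str, str, dict[int, list[str]]]:
    """Parse '<ctrl>/<mon>/<dom>={ev}{ev};<dom>={ev}' into its parts."""
    ctrl, mon, assignments = line.strip().split("/", 2)
    domains: dict[int, list[str]] = {}
    for item in assignments.split(";"):
        dom, _, events = item.partition("=")
        names = _EVENT_RE.findall(events)
        # Reject anything left over once the well-formed {name} tokens are removed.
        if _EVENT_RE.sub("", events):
            raise ValueError(f"malformed event list: {events!r}")
        domains[int(dom)] = names
    return ctrl, mon, domains
```

Even in this small form the parser needs a tokenizer and a leftover check, which is the extra work the brace syntax brings.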

Otherwise I prefer our current flag format.

# cat info/L3_MON/mbm_assign_control
test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl
//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl
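
A sketch of reading one line of the flag form from user space, assuming the layout <ctrl group>/<mon group>/<domain>=<flags>;... shown above. The FLAG_EVENTS mapping is an assumption inferred from the two outputs in this mail, and parse_flag_line is a hypothetical helper, not kernel code. It handles only the 't' and 'l' flags that appear here.

```python
# Flags seen in the output above; the mapping to event names is an assumption
# based on the brace form of the same output.
FLAG_EVENTS = {"t": "mbm_total_bytes", "l": "mbm_local_bytes"}


def parse_flag_line(line: str) -> tuple[str, str, dict[int, list[str]]]:
    """Parse '<ctrl>/<mon>/<dom>=<flags>;<dom>=<flags>' into its parts."""
    ctrl, mon, assignments = line.strip().split("/", 2)
    domains: dict[int, list[str]] = {}
    for item in assignments.split(";"):
        dom, _, flags = item.partition("=")
        # Each character is one flag; an unknown flag raises KeyError.
        domains[int(dom)] = [FLAG_EVENTS[flag] for flag in flags]
    return ctrl, mon, domains
```

Each flag is a single character, so no tokenizer is needed beyond splitting on ';' and '='.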


> commas to separate the events of a domain. Peter [1] has some reservations about

Yes. I would like to hear from Peter.

> going this direction and since he would end up using this interface significantly
> I would prefer to resolve that first.
> 
> Reinette
> 
> 
> [1] https://lore.kernel.org/lkml/CALPaoCh7WpohzpXhSAbumjSZBv1_+1bXON7_V1pwG4bdEBr52Q@mail.gmail.com/
> 
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi Babu,

On 2/18/25 11:32 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 2/18/25 12:14, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 2/18/25 7:39 AM, Moger, Babu wrote:
>>  
>>> 3. Use the actual events instead of flags based on the below comment.
>>>
>>> https://lore.kernel.org/lkml/a07fca4c-c8fa-41a6-b126-59815b9a58f9@intel.com/
>>>
>>>  Something like this.
>>>  # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>
>>>  Are we ready to go with this approach? I am still not clear on this.
>>>
>>>  Reinette, What do you think?
>>
>> I was actually expecting some push back or at least discussion on this interface
>> because the braces seem difficult to parse when compared to, for example, using
> 
> I am yet to work on it. Will work on it after confirmation.
> 
> Here is the output from a system with 12 domains. I created one "test" group.
> 
> Output is definitely harder to parse for human eyes.
> 
> #cat info/L3_MON/mbm_assign_control
> test//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_total_bytes}{mbm_local_bytes};2={mbm_total_bytes}{mbm_local_bytes};3={mbm_total_bytes}{mbm_local_bytes};4={mbm_total_bytes}{mbm_local_bytes};5={mbm_total_bytes}{mbm_local_bytes};6={mbm_total_bytes}{mbm_local_bytes};7={mbm_total_bytes}{mbm_local_bytes};8={mbm_total_bytes}{mbm_local_bytes};9={mbm_total_bytes}{mbm_local_bytes};10={mbm_total_bytes}{mbm_local_bytes};11={mbm_total_bytes}{mbm_local_bytes}
> //0={mbm_total_bytes}{mbm_local_bytes};1={mbm_total_bytes}{mbm_local_bytes};2={mbm_total_bytes}{mbm_local_bytes};3={mbm_total_bytes}{mbm_local_bytes};4={mbm_total_bytes}{mbm_local_bytes};5={mbm_total_bytes}{mbm_local_bytes};6={mbm_total_bytes}{mbm_local_bytes};7={mbm_total_bytes}{mbm_local_bytes};8={mbm_total_bytes}{mbm_local_bytes};9={mbm_total_bytes}{mbm_local_bytes};10={mbm_total_bytes}{mbm_local_bytes};11={mbm_total_bytes}{mbm_local_bytes}
> 
> It is harder to parse in code also. We should consider only if there is a
> value-add with this format.

Please see my comments in [2] for some motivations.

> 
> Otherwise I prefer our current flag format.
> 
> # cat info/L3_MON/mbm_assign_control
> test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl
> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl

We could possibly consider some middle ground where flags are separated by
commas and when the number of used flags reaches 26 the interface can use
"two letter flags" or "longer names" or "the actual event name" or ....
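
As a sketch of that middle ground (purely illustrative: the comma-separated syntax and the EVENT_ALIASES table are assumptions, not anything resctrl implements today), a parser could resolve each comma-separated token through one table so that single letters and full event names coexist:

```python
# Hypothetical alias table: short flags and full names resolve to the same event.
EVENT_ALIASES = {
    "t": "mbm_total_bytes",
    "l": "mbm_local_bytes",
    "mbm_total_bytes": "mbm_total_bytes",
    "mbm_local_bytes": "mbm_local_bytes",
}


def parse_domain_events(spec: str) -> list[str]:
    """Resolve a comma-separated event list such as 't,l' or 't,mbm_local_bytes'."""
    events: list[str] = []
    for token in spec.split(","):
        token = token.strip()
        if token not in EVENT_ALIASES:
            raise ValueError(f"unknown event or flag: {token!r}")
        event = EVENT_ALIASES[token]
        if event not in events:  # ignore duplicates such as 't,mbm_total_bytes'
            events.append(event)
    return events
```

With a separator in place, adding longer tokens later only means adding table entries; the existing single-letter tokens keep their meaning.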

> 
> 
>> commas to separate the events of a domain. Peter [1] has some reservations about
> 
> Yes. I would like to hear from Peter.
> 

Reinette


[2] https://lore.kernel.org/lkml/ccd9c5d7-0266-4054-879e-e084b6972ad5@intel.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 11 months, 3 weeks ago
Hi Reinette,

On Tue, Feb 18, 2025 at 01:29:09PM -0800, Reinette Chatre wrote:
> Hi Babu,
> 
> On 2/18/25 11:32 AM, Moger, Babu wrote:
> > Hi Reinette,
> > 
> > On 2/18/25 12:14, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 2/18/25 7:39 AM, Moger, Babu wrote:
> >>  
> >>> 3. Use the actual events instead of flags based on the below comment.
> >>>
> >>> https://lore.kernel.org/lkml/a07fca4c-c8fa-41a6-b126-59815b9a58f9@intel.com/
> >>>
> >>>  Something like this.
> >>>  # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>
> >>>  Are we ready to go with this approach? I am still not clear on this.
> >>>
> >>>  Reinette, What do you think?
> >>
> >> I was actually expecting some push back or at least discussion on this interface
> >> because the braces seem difficult to parse when compared to, for example, using
> > 
> > I am yet to work on it. Will work on it after confirmation.
> > 
> > Here is the output from a system with 12 domains. I created one "test" group.
> > 
> > Output is definitely harder to parse for human eyes.
> > 
> > #cat info/L3_MON/mbm_assign_control
> > test//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_total_bytes}{mbm_local_bytes};2={mbm_total_bytes}{mbm_local_bytes};3={mbm_total_bytes}{mbm_local_bytes};4={mbm_total_bytes}{mbm_local_bytes};5={mbm_total_bytes}{mbm_local_bytes};6={mbm_total_bytes}{mbm_local_bytes};7={mbm_total_bytes}{mbm_local_bytes};8={mbm_total_bytes}{mbm_local_bytes};9={mbm_total_bytes}{mbm_local_bytes};10={mbm_total_bytes}{mbm_local_bytes};11={mbm_total_bytes}{mbm_local_bytes}
> > //0={mbm_total_bytes}{mbm_local_bytes};1={mbm_total_bytes}{mbm_local_bytes};2={mbm_total_bytes}{mbm_local_bytes};3={mbm_total_bytes}{mbm_local_bytes};4={mbm_total_bytes}{mbm_local_bytes};5={mbm_total_bytes}{mbm_local_bytes};6={mbm_total_bytes}{mbm_local_bytes};7={mbm_total_bytes}{mbm_local_bytes};8={mbm_total_bytes}{mbm_local_bytes};9={mbm_total_bytes}{mbm_local_bytes};10={mbm_total_bytes}{mbm_local_bytes};11={mbm_total_bytes}{mbm_local_bytes}
> > 
> > It is harder to parse in code also. We should consider only if there is a
> > value-add with this format.
> 
> Please see my comments in [2] for some motivations.
> 
> > 
> > Otherwise I prefer our current flag format.
> > 
> > # cat info/L3_MON/mbm_assign_control
> > test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl
> > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl
> 
> We could possibly consider some middle ground where flags are separated by
> commas and when the amount of used flags reach 26 the interface can use
> "two letter flags" or "longer names" or "the actual event name" or ....
> 
> > 
> > 
> >> commas to separate the events of a domain. Peter [1] has some reservations about
> > 
> > Yes. I would like to hear from Peter.
> > 
> 
> Reinette

Ack; see also my reply to Peter on the other subthread.

I think the single-letter names provide a much less cumbersome
interface.

From the Arm side, I'd be happy to see just "t" and "l" for now, with
their current fixed mappings to event names, provided that we are
confident that we can add flexibility later without breaking the ABI.

In case this has got lost in the noise, I still think that the v11
proposal for the ABMC interface looks fine as a first step -- I just
wanted to kick the tires re extensibility.

Cheers
---Dave
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 12 months ago
Hi Reinette,

On 2/14/2025 1:18 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 2/14/25 10:31 AM, Moger, Babu wrote:
>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> 
> (quoting relevant parts with goal to focus discussion on new possible syntax)
> 
>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>> Please help me understand if you see it differently.
>>>>>      
>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>
>>>>> mbm_local_read_bytes a
>>>>> mbm_local_write_bytes b
>>>>>
>>>>> Then mbm_assign_control can be used as:
>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>> <value>
>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>
>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
> 
> As mentioned above, one possible issue with existing interface is that
> it is limited to 26 events (assuming only lower case letters are used). The limit
> is low enough to be of concern.

Yes. Agree.

> 
> ....
> 
>>>>
>>>> Alternatively, if we want to be able to expand beyond single letters,
>>>> could we reserve one or more characters for extension purposes?
>>>>
>>>> If braces are forbidden by the syntax today, could we add support for
>>>> something like the following later on, without breaking anything?
>>>>
>>>> # echo '//0={foo}{bar};1={bar}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>
> 
> Dave proposed a change in syntax that can (a) support unlimited events,
> (b) be more intuitive than the one letter flags that may be hard to match
> to the events they correspond to.

Yea. Sounds good.

> 
>>> Thank you for the suggestion. I think we may need something like this.
>>> Babu, what do you think?
>>
>> I'm not quite clear on this. Do we know what 'foo' and 'bar' refer to?
>> Is it random text?
> 
> Not random text. It refers to the events.
> 
> I do not know if braces is what will be settled on but a slight change in
> example to make it match your series can be:
> 
> # echo '//0={mbm_total_bytes}{mbm_local_bytes};1={mbm_local_bytes}' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> With syntax like above there is no concern that we will run out of
> flags and the events assigned are clear without needing to parse separate flags.

Yes. We need to change our current "flag parsing". It should not be a 
problem.
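
A minimal Python sketch of what such a parsing change could look like from
user space, assuming the braced form is accepted alongside the single-letter
flags. The helper name parse_events and the letter-to-event table are
illustrative assumptions, not part of this series:

```python
import re

# Single-letter flags from this series, mapped to event names.
# The mapping table itself is an illustrative assumption.
LETTER_EVENTS = {"t": "mbm_total_bytes", "l": "mbm_local_bytes"}

def parse_events(spec):
    """Return the event names requested by one <flags> field."""
    if not spec:
        raise ValueError("empty flags field")
    if spec == "_":
        return []                      # no events assigned
    if spec.startswith("{"):
        # Braced form: a run of {event_name} tokens and nothing else.
        names = re.findall(r"\{([A-Za-z0-9_]+)\}", spec)
        if "".join("{" + n + "}" for n in names) != spec:
            raise ValueError("malformed braced list: " + spec)
        return names
    # Letter form: every character is one flag.
    unknown = [c for c in spec if c not in LETTER_EVENTS]
    if unknown:
        raise ValueError("unknown flag(s): " + "".join(unknown))
    return [LETTER_EVENTS[c] for c in spec]

print(parse_events("{mbm_total_bytes}{mbm_local_bytes}"))
print(parse_events("tl"))
```

Telling the two forms apart by a leading "{" relies on braces not being valid
flag characters today, which is the premise of Dave's suggestion.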

> For a system with a lot of events and domains this will become quite a lot
> to parse though.
> 
>>
>> In his example from
>> https://lore.kernel.org/lkml/Z643WdXYARTADSBy@e133380.arm.com/
>> --------------------------------------------------------------
>> The numbers are not supposed to have any hardware significance.
>>
>>      '//0=6'
>>
>> just means "assign some unused counter for domain 0, and create files
>> in resctrl so I can configure and read it".
> 
> Thanks for pointing this out. I missed that the idea was that the
> configuration files are dynamically created.
> 
>>
>> The "6" is really just a tag for labelling the resulting resctrl
>> file names so that the user can tell them apart.  It's not supposed
>> to imply any specific hardware counter or event.
> 
> Right.
> 
>> ------------------------------------------------------------------
>>
>> It seems that 'foo' and 'bar' are tags used to create files in /sys/fs/resctrl/info/L3_MON/.
>>
>> Given that, it looks like we're discussing entirely different things.
> 
> I am still trying to understand how MPAM counters can be supported.
> 
> Reinette


Thanks
Babu


Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 12 months ago
Hi Dave,

Thanks for your help. Reinette has already asked a few questions. I have a few
more questions on top of that.

On 2/12/25 11:46, Dave Martin wrote:
> Hi there,
> 
> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>
>> # Introduction
> 
> [...]
> 
>> # Examples
>>
>> a. Check if ABMC support is available
>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> 	[mbm_cntr_assign]
>> 	default
> 
> (Nit: can this be called "mbm_counter_assign"?  The name is already
> long, so I wonder whether anything is gained by using a cryptic
> abbreviation for "counter".  Same with all the "cntrs" elsewhere.
> This is purely cosmetic, though -- the interface works either way.)

Yes. We can do that.

> 
>> 	ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available. 
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
>> 	32
> 
> Is this file needed?
> 
> With MPAM, it is more difficult to promise that the same number of
> counters will be available everywhere.
> 
> Rather than lie, or report a "safe" value here that may waste some
> counters, can we just allow the number of counters to be discovered
> per domain via available_mbm_cntrs?

As Reinette suggested (link below), we can display the supported counters per
domain here.
https://lore.kernel.org/lkml/9e849476-7c4b-478b-bd2a-185024def3a3@intel.com/

> 
> num_closids and num_rmids are already problematic for MPAM, so it would
> be good to avoid any more parameters of this sort from being reported
> to userspace unless there is a clear understanding of why they are
> needed.
> 
> Reporting number of counters per monitoring domain is a more natural
> fit for MPAM, as below:
> 
>> c. Check how many ABMC counters are available in each domain.
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
>> 	0=30;1=30
> 
> For MPAM, this seems supportable.  Each monitoring domain will have
> some counters, and a well-defined number of them will be available for
> allocation at any one time.
> 
>> d. Create few resctrl groups.
>>
>> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>    to list and modify any group's monitoring states. File provides single place
>>    to list monitoring states of all the resctrl groups. It makes it easier for
>>    user space to learn about the used counters without needing to traverse all
>>    the groups thus reducing the number of file system calls.
>>
>> 	The list follows the following format:
>>
>> 	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> 	Format for specific type of groups:
>>
>> 	* Default CTRL_MON group:
>> 	 "//<domain_id>=<flags>"
> 
> [...]
> 
>>        Flags can be one of the following:
>>
>>         t  MBM total event is enabled.
>>         l  MBM local event is enabled.
>>         tl Both total and local MBM events are enabled.
>>         _  None of the MBM events are enabled
>>
>> 	Examples:
> 
> [...]
> 
> I think that this basically works for MPAM.
> 
> The local/total distinction doesn't map in a consistent way onto MPAM,
> but this problem is not specific to ABMC.  It feels sensible for ABMC
> to be built around the same concepts that resctrl already has elsewhere
> in the interface.  MPAM will do its best to fit (as already).
> 
> Regarding Peter's use case of assigning multiple counters to a
> monitoring group [1], I feel that it's probably good enough to make
> sure that the ABMC interface can be extended in future in a backwards
> compatible way so as to support this, without trying to support it
> immediately.
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
> 
> 
> For example, if we added new generic "letters" -- say, "0" to "9",
> combined with new counter files in resctrlfs, that feels like a
> possible approach.  ABMC (as in this series) should just reject
> such assignments, and the new counter files wouldn't exist.

What does "combined with new counter files" mean? Is MPAM going to add new
files to support counter assignment on ARM?

Also, what are "0" to "9"? Are these counter IDs?


> 
> Availability of this feature could also be reported as a distinct mode
> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.

Yes. That should be fine.

> 
> 
> A _sketch_ of this follows.  This is NOT a proposal -- the key
> question is whether we are confident that we can extend the interface
> in this way in the future without breaking anything.
> 
> If "yes", then the ABMC interface (as proposed by this series) works as
> a foundation to build on.
> 
> --8<--
> 
> [artist's impression]
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>  	mbm_cntr_generic
>  	[mbm_cntr_assign]
>  	default

Yes. This looks good.


> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control

It looks like you are assigning counter IDs to domains here. That is
different from ABMC. In ABMC, we assign events (local or total) to the
domain. We handle the counter IDs internally, based on availability.

Can MPAM follow the same concept? Is that possible?
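
To make the comparison concrete, here is a rough Python sketch of how one line
in the series' format "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
breaks down. The function name parse_assign_line is hypothetical. Note that
the user only names events per domain, never a counter ID:

```python
VALID_FLAGS = {"t", "l"}   # total and local, as listed in the cover letter

def parse_assign_line(line):
    """Split '<ctrl>/<mon>/<dom>=<flags>;...' into its three parts."""
    # Group names cannot contain '/', so the first two slashes delimit them.
    ctrl, mon, domains = line.strip().split("/", 2)
    assignments = {}
    for item in domains.split(";"):
        dom, flags = item.split("=", 1)
        events = set() if flags == "_" else set(flags)
        if not flags or not events <= VALID_FLAGS:
            raise ValueError("bad flags %r for domain %s" % (flags, dom))
        assignments[int(dom)] = events
    return ctrl, mon, assignments

# Default CTRL_MON group: both group fields are empty.
print(parse_assign_line("//0=tl;1=_"))
```

Which hardware counter backs each event is left to the kernel, so the same
input line stays valid regardless of how many counters a domain has free.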


> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type 
> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type 
> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type 
> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type 

This also looks different from what we have right now in resctrl fs.

Are you creating a separate file for each counter ID in
/sys/fs/resctrl/info/L3_MON/?


> 
> ...
> 
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
> 
> etc.
> 
> -->8--
> 
> Any thoughts on this, Peter?
> 
> [...]
> 
> Cheers
> ---Dave
> 

-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 12 months ago
Hi,

On Thu, Feb 13, 2025 at 10:19:29AM -0600, Moger, Babu wrote:
> Hi Dave,
> 
> Thanks for your help. Reinette has already asked a few questions. I have a few
> more questions on top of that.
> 
> On 2/12/25 11:46, Dave Martin wrote:
> > Hi there,
> > 
> > On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
> >>
> >> This series adds the support for Assignable Bandwidth Monitoring Counters
> >> (ABMC). It is also called QoS RMID Pinning feature

[...]

> >> a. Check if ABMC support is available
> >> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> >>
> >> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> >> 	[mbm_cntr_assign]
> >> 	default
> > 
> > (Nit: can this be called "mbm_counter_assign"?  The name is already
> > long, so I wonder whether anything is gained by using a cryptic
> > abbreviation for "counter".  Same with all the "cntrs" elsewhere.
> > This is purely cosmetic, though -- the interface works either way.)
> 
> Yes. We can do that.

Thanks (note, I'm also happy without this change, if you aren't
planning to do a substantial respin of the series.)

[...]

> >> b. Check how many ABMC counters are available. 
> >>
> >> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> >> 	32
> > 
> > Is this file needed?
> > 
> > With MPAM, it is more difficult to promise that the same number of
> > counters will be available everywhere.
> > 
> > Rather than lie, or report a "safe" value here that may waste some
> > counters, can we just allow the number of counters to be discovered
> > per domain via available_mbm_cntrs?
> 
> As Reinette suggested (link below), we can display the supported counters per
> domain here.
> https://lore.kernel.org/lkml/9e849476-7c4b-478b-bd2a-185024def3a3@intel.com/

Although I'm still not convinced that this file is necessary, MPAM
should be able to work with this.

(I'm assuming that ABMC hardware has a set of counters for each
monitoring domain, of course -- otherwise this doesn't make sense.)

[...]

> >> c. Check how many ABMC counters are available in each domain.
> >>
> >> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
> >> 	0=30;1=30
> > 
> > For MPAM, this seems supportable.  Each monitoring domain will have
> > some counters, and a well-defined number of them will be available for
> > allocation at any one time.

[...]

> >>        Flags can be one of the following:
> >>
> >>         t  MBM total event is enabled.
> >>         l  MBM local event is enabled.
> >>         tl Both total and local MBM events are enabled.
> >>         _  None of the MBM events are enabled
> >>
> >> 	Examples:
> > 
> > [...]
> > 
> > I think that this basically works for MPAM.
> > 
> > The local/total distinction doesn't map in a consistent way onto MPAM,
> > but this problem is not specific to ABMC.  It feels sensible for ABMC
> > to be built around the same concepts that resctrl already has elsewhere
> > in the interface.  MPAM will do its best to fit (as already).
> > 
> > Regarding Peter's use case of assigning multiple counters to a
> > monitoring group [1], I feel that it's probably good enough to make
> > sure that the ABMC interface can be extended in future in a backwards
> > compatible way so as to support this, without trying to support it
> > immediately.
> > 
> > [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
> > 
> > 
> > For example, if we added new generic "letters" -- say, "0" to "9",
> > combined with new counter files in resctrlfs, that feels like a
> > possible approach.  ABMC (as in this series) should just reject
> > such assignments, and the new counter files wouldn't exist.
> 
> What does "combined with new counter files" mean? Is MPAM going to add new
> files to support counter assignment on ARM?
> 
> Also, what are "0" to "9"? Are these counter IDs?
> 
> 
> > 
> > Availability of this feature could also be reported as a distinct mode
> > in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
> 
> Yes. That should be fine.
> 
> > 
> > 
> > A _sketch_ of this follows.  This is NOT a proposal -- the key
> > question is whether we are confident that we can extend the interface
> > in this way in the future without breaking anything.
> > 
> > If "yes", then the ABMC interface (as proposed by this series) works as
> > a foundation to build on.
> > 
> > --8<--
> > 
> > [artist's impression]
> > 
> > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> >  	mbm_cntr_generic
> >  	[mbm_cntr_assign]
> >  	default
> 
> Yes. This looks good.

Good to know, thanks.  (Just to be clear, I am *not* suggesting adding
anything like this just now -- just checking whether the idea works
at all.)


> > # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> > # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
> It looks like you are assigning counter IDs to domains here. That is
> different from ABMC. In ABMC, we assign events (local or total) to the
> domain. We handle the counter IDs internally, based on availability.

The numbers are not supposed to have any hardware significance.

	'//0=6'

just means "assign some unused counter for domain 0, and create files
in resctrl so I can configure and read it".

The "6" is really just a tag for labelling the resulting resctrl
file names so that the user can tell them apart.  It's not supposed
to imply any specific hardware counter or event.
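
As a rough model of that behaviour (Python, purely illustrative): the tag
chosen by the user only names the resulting files, while the hardware counter
is picked from whatever is free in the domain. The class, its methods, and
the mbm_counter<tag>_bytes naming follow the sketch quoted above and are not
an existing interface:

```python
class DomainCounters:
    """Toy model of one monitoring domain's pool of counters."""

    def __init__(self, num_counters):
        self.free = list(range(num_counters))   # hardware counter indices
        self.by_tag = {}                        # user tag -> hardware index

    def assign(self, tag):
        """Bind a user-chosen tag to any free counter; return the file name."""
        if tag in self.by_tag:
            raise ValueError("tag %r already in use" % tag)
        if not self.free:
            raise RuntimeError("no free counter in this domain")
        self.by_tag[tag] = self.free.pop()
        return "mbm_counter%s_bytes" % tag

    def unassign(self, tag):
        """Release the counter behind a tag back to the pool."""
        self.free.append(self.by_tag.pop(tag))

dom0 = DomainCounters(num_counters=2)
print(dom0.assign("6"))    # the tag "6" is valid even with only 2 counters
```

The point of the model is that the tag space and the hardware index space are
independent, so a tag such as "6" carries no meaning beyond the file name.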

> Can MPAM follow the same concept? Is that possible?

[...]

> Thanks
> Babu Moger

Yes, although there is some hard-to-avoid fuzz about the precise
meaning of "local" and "total".

As Reinette pointed out, there is also the possibility of adding
new named events other than "local" and "total" if we find that some
kinds of event don't fit these categories.

Cheers
---Dave
RE: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 12 months ago
> Yes, although there is some hard-to-avoid fuzz about the precise
> meaning of "local" and "total".

Things are only getting fuzzier with mixed DDR and CXL memory.

> As Reinette pointed out, there is also the possibility of adding
> new named events other than "local" and "total" if we find that some
> kinds of event don't fit these categories.

Not just new names, new scopes too. Patches coming later this year
that would present:

$ cd /sys/fs/resctrl
$ cat mon_data/mon_PKG_00/llc_stalls
779762866739

I.e. a way to cheaply collect some "perf" like events across
all CPUs on a package that executed jobs with a specific RMID.

Of course this can be done with perf today, but the cost to collect
this data from heavily multi-threaded workloads that context switch
rapidly is very high.

-Tony
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 12 months ago
Hi Tony,

On 2/13/25 10:39 AM, Luck, Tony wrote:
>> Yes, although there is some hard-to-avoid fuzz about the precise
>> meaning of "local" and "total".
> 
> Things are only getting fuzzier with mixed DDR and CXL memory.
> 
>> As Reinette pointed out, there is also the possibility of adding
>> new named events other than "local" and "total" if we find that some
>> kinds of event don't fit these categories.
> 
> Not just new names, new scopes too. Patches coming later this year
> that would present:
> 
> $ cd /sys/fs/resctrl
> $ cat mon_data/mon_PKG_00/llc_stalls
> 779762866739

Thank you for catching this. Supporting this would not be possible with
the current plan for mbm_assign_control, since it has no way
to distinguish domain X of the PKG resource from domain X of the L3 resource.
Sounds like we need to include the resource name in the mbm_assign_control
syntax?

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 12 months ago

On 2/13/25 10:34 PM, Reinette Chatre wrote:
> Hi Tony,
> 
> On 2/13/25 10:39 AM, Luck, Tony wrote:
>>> Yes, although there is some hard-to-avoid fuzz about the precise
>>> meaning of "local" and "total".
>>
>> Things are only getting fuzzier with mixed DDR and CXL memory.
>>
>>> As Reinette pointed out, there is also the possibility of adding
>>> new named events other than "local" and "total" if we find that some
>>> kinds of event don't fit these categories.
>>
>> Not just new names, new scopes too. Patches coming later this year
>> that would present:
>>
>> $ cd /sys/fs/resctrl
>> $ cat mon_data/mon_PKG_00/llc_stalls
>> 779762866739
> 
> Thank you for catching this. Supporting this would not be possible with
> the current plan for mbm_assign_control, since it has no way
> to distinguish domain X of the PKG resource from domain X of the L3 resource.
> Sounds like we need to include the resource name in the mbm_assign_control
> syntax?

ugh ... please ignore this message. This is not needed since mbm_assign_control
is already associated with the resource.
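
Put differently, the resource is implied by which info directory the file
lives in, so a (resource, domain) pair is always recoverable. A small Python
sketch of that idea follows. The function name is made up, and the PKG_MON
path is hypothetical, extrapolated only from the mon_PKG_00 example above:

```python
from pathlib import PurePosixPath

def assign_key(control_path, domain_id):
    """Return (resource, domain_id) for a write to mbm_assign_control."""
    # Layout assumed: .../info/<RESOURCE>_MON/mbm_assign_control
    mon_dir = PurePosixPath(control_path).parent.name
    if not mon_dir.endswith("_MON"):
        raise ValueError("not a monitoring info directory: " + mon_dir)
    return (mon_dir[:-len("_MON")], domain_id)

l3 = assign_key("/sys/fs/resctrl/info/L3_MON/mbm_assign_control", 0)
# Hypothetical path, for illustration only.
pkg = assign_key("/sys/fs/resctrl/info/PKG_MON/mbm_assign_control", 0)
print(l3, pkg, l3 != pkg)
```

Domain 0 of each resource therefore maps to a distinct key without any
change to the syntax inside the file.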

Reinette
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year ago
Hi Babu,

On Wed, Jan 22, 2025 at 9:20 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
>
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
> to monitor a group or set of groups for certain period of time without
> worrying about counter being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
>
> # Linux Implementation
>
> Create a generic interface aimed to support user space assignment
> of scarce counters used for monitoring. First usage of interface
> is by ABMC with option to expand usage to "soft-ABMC" and MPAM
> counters in future.

As a reminder of the work related to this, please take a look at the
thread where Reinette proposed a "shared counters" mode in
mbm_assign_control[1]. I am currently working to demonstrate that this
combined with the mbm_*_bytes_per_second events discussed earlier in
the same thread will address my users' concerns about the overhead of
reading a large number of MBM counters, resulting from a maximal
number of monitoring groups whose jobs are not isolated to any L3
monitoring domain.

ABMC will add to the number of registers which need to be programmed
in each domain, so I will need to demonstrate that ABMC combined with
these additional features addresses their performance concerns and
that the resulting interface is user-friendly enough that they will
not need a detailed understanding of the implementation to avoid an
unacceptable performance degradation (i.e., needing to understand what
conditions will increase the number of IPIs required).

If all goes well, soft-ABMC will try to extend this usage model to the
existing, pre-ABMC, AMD platforms I support.

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/7ee63634-3b55-4427-8283-8e3d38105f41@intel.com/
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year ago
Hi Peter,

On 2/3/25 08:54, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Jan 22, 2025 at 9:20 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>
>> # Introduction
>>
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware.
>> The counters of any other RMIDs which are no longer being tracked will be
>> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
>> that are not tracked by hardware. So, there can be only limited number of
>> groups that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups
>> are being tracked for certain point of time. Users do not have the option
>> to monitor a group or set of groups for certain period of time without
>> worrying about counter being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned.  The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask identifying
>> the specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current 'default' mode without
>> assignment option.
>>
>> # Linux Implementation
>>
>> Create a generic interface aimed to support user space assignment
>> of scarce counters used for monitoring. First usage of interface
>> is by ABMC with option to expand usage to "soft-ABMC" and MPAM
>> counters in future.
> 
> As a reminder of the work related to this, please take a look at the
> thread where Reinette proposed a "shared counters" mode in
> mbm_assign_control[1]. I am currently working to demonstrate that this
> combined with the mbm_*_bytes_per_second events discussed earlier in
> the same thread will address my users' concerns about the overhead of
> reading a large number of MBM counters, resulting from a maximal
> number of monitoring groups whose jobs are not isolated to any L3
> monitoring domain.
> 
> ABMC will add to the number of registers which need to be programmed
> in each domain, so I will need to demonstrate that ABMC combined with
> these additional features addresses their performance concerns and
> that the resulting interface is user-friendly enough that they will
> not need a detailed understanding of the implementation to avoid an
> unacceptable performance degradation (i.e., needing to understand what
> conditions will increase the number of IPIs required).
> 
> If all goes well, soft-ABMC will try to extend this usage model to the
> existing, pre-ABMC, AMD platforms I support.
> 
> Thanks,
> -Peter
> 
> [1] https://lore.kernel.org/lkml/7ee63634-3b55-4427-8283-8e3d38105f41@intel.com/
> 

Thanks for the heads-up. I understand what's going on and have an idea of
the plan. Please keep us updated on the progress. Also, if any changes are
needed in this series to meet your requirements, feel free to share your
feedback.
-- 
Thanks
Babu Moger
Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 12 months ago
On Mon, Feb 03, 2025 at 02:49:27PM -0600, Moger, Babu wrote:
> Hi Peter,
> 
> On 2/3/25 08:54, Peter Newman wrote:

[...]

> >> # Linux Implementation
> >>
> >> Create a generic interface aimed to support user space assignment
> >> of scarce counters used for monitoring. First usage of interface
> >> is by ABMC with option to expand usage to "soft-ABMC" and MPAM
> >> counters in future.
> > 
> > As a reminder of the work related to this, please take a look at the
> > thread where Reinette proposed a "shared counters" mode in
> > mbm_assign_control[1]. I am currently working to demonstrate that this
> > combined with the mbm_*_bytes_per_second events discussed earlier in
> > the same thread will address my users' concerns about the overhead of
> > reading a large number of MBM counters, resulting from a maximal
> > number of monitoring groups whose jobs are not isolated to any L3
> > monitoring domain.
> > 
> > ABMC will add to the number of registers which need to be programmed
> > in each domain, so I will need to demonstrate that ABMC combined with
> > these additional features addresses their performance concerns and
> > that the resulting interface is user-friendly enough that they will
> > not need a detailed understanding of the implementation to avoid an
> > unacceptable performance degradation (i.e., needing to understand what
> > conditions will increase the number of IPIs required).
> > 
> > If all goes well, soft-ABMC will try to extend this usage model to the
> > existing, pre-ABMC, AMD platforms I support.
> > 
> > Thanks,
> > -Peter
> > 
> > [1] https://lore.kernel.org/lkml/7ee63634-3b55-4427-8283-8e3d38105f41@intel.com/
> > 
> 
> Thanks for the heads-up. I understand what's going on and have an idea of
> the plan. Please keep us updated on the progress. Also, if any changes are
> needed in this series to meet your requirements, feel free to share your
> feedback.

Playing devil's advocate, I wonder whether there is a point beyond
which it would be better to have an interface to hand over some of the
counters to perf?

The logic for round-robin scheduling of events onto counters, dealing
with overflows etc. has already been invented over there, and it's
fiddly to get right.  Ideally resctrl wouldn't have its own special
implementation of that kind of stuff.

(Said by someone who has never tried to hack up an uncore event source
in perf.)

Cheers
---Dave
RE: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Luck, Tony 12 months ago
> Playing devil's advocate, I wonder whether there is a point beyond
> which it would be better to have an interface to hand over some of the
> counters to perf?
>
> The logic for round-robin scheduling of events onto counters, dealing
> with overflows etc. has already been invented over there, and it's
> fiddly to get right.  Ideally resctrl wouldn't have its own special
> implementation of that kind of stuff.
>
> (Said by someone who has never tried to hack up an uncore event source
> in perf.)

Initial implementation on Intel RDT tried to use perf ... it all went badly and
was reverted.

There are some very un-perf-like properties that we couldn't find a
workaround for at the time.

E.g.

1) Cache occupancy counters. These change even when your workload
isn't running (downward due to evictions).

2) Counters based on RMIDs show the aggregated values from multiple
CPUs as tasks are scheduled on cores.

But maybe you meant "don't let resctrl use all those counters" ... hand some
of them to perf to use in some other way?

-Tony