[RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

Babu Moger posted 17 patches 1 year, 10 months ago
There is a newer version of this series
.../admin-guide/kernel-parameters.txt         |   2 +-
Documentation/arch/x86/resctrl.rst            | 144 ++++
arch/x86/include/asm/cpufeatures.h            |   1 +
arch/x86/include/asm/msr-index.h              |   2 +
arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
arch/x86/kernel/cpu/resctrl/core.c            |  25 +-
arch/x86/kernel/cpu/resctrl/internal.h        |  56 +-
arch/x86/kernel/cpu/resctrl/monitor.c         |  24 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 714 +++++++++++++++++-
arch/x86/kernel/cpu/scattered.c               |   1 +
include/linux/resctrl.h                       |  12 +
11 files changed, 964 insertions(+), 20 deletions(-)
[RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Babu Moger 1 year, 10 months ago
This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC). It is also called the QoS RMID Pinning feature.

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)

# Introduction

AMD hardware can support 256 or more RMIDs. However, the bandwidth monitoring
feature only guarantees that RMIDs currently assigned to a processor will
be tracked by hardware. The counters of any other RMIDs that are no longer
being tracked will be reset to zero. The MBM event counters return
"Unavailable" for the RMIDs that are not active.
    
Users can create 256 or more monitor groups, but only a limited number of
groups can give guaranteed monitoring numbers. With ever-changing
configurations, there is no way to know definitively which of these groups
will be active at a given point in time. Users do not have the option to
monitor a group or set of groups for a certain period of time without
worrying about the RMID being reset in between.
    
The ABMC feature provides an option to the user to assign an RMID to the
hardware counter and monitor the bandwidth for a longer duration.
The assigned RMID will be active until the user unassigns it manually.
There is no need to worry about counters being reset during this period.
Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring will work in the current mode without the
assignment option.

# Linux Implementation

The Linux resctrl subsystem provides an interface to count a maximum of two
memory bandwidth events per group, from a combination of the available total
and local events. Keeping the current interface, users can assign a maximum
of 2 ABMC counters per group. Users will also have the option to assign only
one counter to a group. If the system runs out of assignable ABMC
counters, the kernel will display an error. Users need to unassign an already
assigned counter to make space for new assignments.


# Examples

a. Check if ABMC support is available
	#mount -t resctrl resctrl /sys/fs/resctrl/

	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
	[abmc] 
	legacy_mbm

	Linux kernel detected ABMC feature and it is enabled.
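
The active mode is the entry shown in square brackets. A minimal sketch of extracting it in a script, assuming the file keeps the one-mode-per-line layout shown above; `active_mbm_mode` is a hypothetical helper name and is not part of this series:

```shell
#!/bin/sh
# Hypothetical helper (not from the series): read mbm_assign-style text on
# stdin and print the mode that is wrapped in square brackets.
active_mbm_mode() {
    # Keep only lines of the form "[name]" (optional surrounding blanks),
    # strip the brackets, and print the first such name.
    sed -n 's/^[[:space:]]*\[\([^]]*\)\][[:space:]]*$/\1/p' | head -n 1
}

# Usage on a live system (path taken from the cover letter):
#   active_mbm_mode < /sys/fs/resctrl/info/L3_MON/mbm_assign
```

The helper only reads text, so it can be exercised with sample input on a machine without resctrl mounted.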

b. Check how many ABMC counters are available. 

	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs 
	32

c. Create a few resctrl groups.

	# mkdir /sys/fs/resctrl/mon_groups/default_mon1
	# mkdir /sys/fs/resctrl/non_defult_group
	# mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1

d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
   to list and modify the group's assignment states.

   The list follows the following format:

       * Default CTRL_MON group:
               "//<domain_id>=<assignment_flags>"

       * Non-default CTRL_MON group:
               "<CTRL_MON group>//<domain_id>=<assignment_flags>"

       * Child MON group of default CTRL_MON group:
               "/<MON group>/<domain_id>=<assignment_flags>"

       * Child MON group of non-default CTRL_MON group:
               "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"

       Assignment flags can be one of the following:

        t  MBM total event is assigned
        l  MBM local event is assigned
        tl Both total and local MBM events are assigned
        _  None of the MBM events are assigned

	Examples:

	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
	non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
	non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
	//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
	/default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;

	There are four groups and all the groups have local and total events assigned.

	"//" - This is the default CTRL_MON group

	"non_defult_group//" - This is a non-default CTRL_MON group

	"/default_mon1/"  - This is a child MON group of the default group

	"non_defult_group/non_default_mon1/" - This is a child MON group of the non-default group

	=tl means both total and local events are assigned.
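
The list format above can be split with plain shell parameter expansion. This is a sketch under the assumption that group names never contain "/" and that domain entries are separated by ";" exactly as in the sample output; `parse_assign_line` and the `<default>` / `<none>` placeholders are illustrative names, not part of the interface:

```shell
#!/bin/sh
# Hypothetical parser for one line of mbm_assign_control output, e.g.
#   non_defult_group/non_default_mon1/0=tl;1=tl;
parse_assign_line() {
    line=$1
    ctrl=${line%%/*}        # text before the first "/" (empty for default group)
    rest=${line#*/}         # drop "<CTRL_MON group>/"
    mon=${rest%%/*}         # text before the second "/" (empty if no MON group)
    domains=${rest#*/}      # "<domain_id>=<flags>;<domain_id>=<flags>;..."

    printf 'ctrl=%s mon=%s\n' "${ctrl:-<default>}" "${mon:-<none>}"

    # One "domain=<id> flags=<flags>" line per domain entry.
    printf '%s\n' "$domains" | tr ';' '\n' | while IFS='=' read -r dom flags; do
        if [ -n "$dom" ]; then
            printf 'domain=%s flags=%s\n' "$dom" "$flags"
        fi
    done
}

# Usage on a live system (path taken from the cover letter):
#   while read -r l; do parse_assign_line "$l"; done \
#       < /sys/fs/resctrl/info/L3_MON/mbm_assign_control
```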

e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.

	The write format is similar to the above list format with addition of
	op-code for the assignment operation.

        * Default CTRL_MON group:
                "//<domain_id><op-code><assignment_flags>"

        * Non-default CTRL_MON group:
                "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"

        * Child MON group of default CTRL_MON group:
                "/<MON group>/<domain_id><op-code><assignment_flags>"

        * Child MON group of non-default CTRL_MON group:
                "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"

       Op-code can be one of the following:

        = Update the assignment to match the flags
        + Assign a new state
        - Unassign a new state
        _ Unassign all the states


        Initial group status:

        # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
        non_default_ctrl_mon_grp//0=tl;1=tl;
        non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
        //0=tl;1=tl;
        /child_default_mon_grp/0=tl;1=tl;


        To update the default group to assign only total event.
        # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

        Assignment status after the update:
        # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
        non_default_ctrl_mon_grp//0=tl;1=tl;
        non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
        //0=t;1=t;
        /child_default_mon_grp/0=tl;1=tl;

        To update the MON group child_default_mon_grp to remove local event:
        # echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

        Assignment status after the update:
        # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
        //0=t;1=t;
        /child_default_mon_grp/0=t;1=t;
        non_default_ctrl_mon_grp//0=tl;1=tl;
        non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;

        To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
        remove both local and total events:
        # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" > \
                      /sys/fs/resctrl/info/L3_MON/mbm_assign_control

        Assignment status after the update:
        # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
        //0=t;1=t;
        /child_default_mon_grp/0=t;1=t;
        non_default_ctrl_mon_grp//0=tl;1=tl;
        non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;

	
f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
   There is no change in reading the evetns with ABMC. If the event is unassigned
   when reading, then the read will come back as Unavailable.
	
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	779247936
	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	765207488
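
Because a read can return "Unavailable" instead of a byte count, a script that samples these files needs to check the value before doing arithmetic. A minimal sketch, assuming the file contains either a decimal number or non-numeric text; `mbm_delta` and `is_number` are hypothetical helpers, and counter wraparound is not handled:

```shell
#!/bin/sh
# Return success only for a non-empty string of decimal digits.
is_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Hypothetical helper: print the byte delta between two samples of
# mbm_total_bytes or mbm_local_bytes, or "n/a" if either sample is not a
# number (for example "Unavailable" when the event is unassigned).
mbm_delta() {
    if is_number "$1" && is_number "$2"; then
        echo $(( $2 - $1 ))
    else
        echo "n/a"
    fi
}

# Usage on a live system (path taken from the example above):
#   f=/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
#   a=$(cat "$f"); sleep 1; b=$(cat "$f"); mbm_delta "$a" "$b"
```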
	
g. Users will have the option to go back to legacy_mbm mode if required.
   This can be done using the following command.

	# echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
        abmc
        [legacy_mbm]

	
h. Check the bandwidth configuration for the group. Note that the bandwidth
   configuration has domain scope. The total event defaults to 0x7F (to
   count all the events) and the local event defaults to 0x15 (to count all
   the local NUMA events). The event bitmap decoding is available at
   https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in sections "mbm_total_bytes_config" and "mbm_local_bytes_config":
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x7f;1=0x7f
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
	0=0x15;1=0x15
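
The configuration values are bitmasks. The sketch below decodes one, using the bit meanings as described in the resctrl documentation referenced above (bit 0 is reads to local memory, bit 6 is dirty victims); the short labels and the `decode_mbm_config` name are my own and should be checked against that document:

```shell
#!/bin/sh
# Hypothetical decoder for an mbm_*_bytes_config value such as 0x7f or 0x15.
# Labels paraphrase the bit table in the resctrl documentation.
decode_mbm_config() {
    mask=$(( $1 ))      # shell arithmetic accepts hex constants like 0x15
    bit=0
    for name in \
        "reads to local NUMA memory" \
        "reads to non-local NUMA memory" \
        "non-temporal writes to local NUMA domain" \
        "non-temporal writes to non-local NUMA domain" \
        "reads to slow memory, local NUMA domain" \
        "reads to slow memory, non-local NUMA domain" \
        "dirty victims to all types of memory"
    do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            printf 'bit %d: %s\n' "$bit" "$name"
        fi
        bit=$((bit + 1))
    done
}

# Usage: decode_mbm_config 0x15
```

Decoding 0x15 lists only local-domain entries (bits 0, 2 and 4), which matches the description of the local event default.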
	
j. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change affects total events on domain 0.
	
	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x33;1=0x7F
	
k. Now read the total event again. The mbm_total_bytes should display
   only the read events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	314101
	
l. Unmount the resctrl
	 
	#umount /sys/fs/resctrl/

---
v3:
   This series adds support for the global assignment mode discussed in
   the thread: https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
   Removed the individual assignment mode and included the global assignment interface.
   Added the following interface files.
   a. /sys/fs/resctrl/info/L3_MON/mbm_assign
      Used for displaying the current assignment mode and switching between
      ABMC and legacy mode.
   b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
      Used for listing the groups' assignment mode and modifying the assignment states.
   c. Most of the changes are related to the new interface.
   d. Addressed the comments from Reinette, James and Peter.
   e. Hope I have addressed most of the major feedback discussed. If I missed
      something, it is not intentional. Please feel free to comment.
   f. Sending this as an RFC as per Reinette's comment. So, this is still open
      for discussion.

v2:
   a. The major change is the way ABMC is enabled. Earlier, the user needed to remount
      with -o abmc to enable the ABMC feature. Removed that option now.
      Now users can enable ABMC with "echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
     
   b. Added new word 21 to x86/cpufeatures.h.

   c. Display unsupported if user attempts to read the events when ABMC is enabled
      and event is not assigned.

   d. Display monitor_state as "Unsupported" when ABMC is disabled.
  
   e. Text updates and rebase to latest tip tree (as of Jan 18).
 
   f. This series is still work in progress. I am yet to hear from ARM developers. 

v2:
  https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

v1 :
   https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/


Babu Moger (17):
  x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
    (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
  x86/resctrl: Introduce resctrl_file_fflags_init
  x86/resctrl: Introduce the interface to display the assignment state
  x86/resctrl: Introduce interface to display number of ABMC counters
  x86/resctrl: Add support to enable/disable ABMC feature
  x86/resctrl: Initialize assignable counters bitmap
  x86/resctrl: Introduce assign state for the mon group
  x86/resctrl: Add data structures for ABMC assignment
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  x86/resctrl: Add the functionality to assign the RMID
  x86/resctrl: Add the functionality to unassign the RMID
  x86/resctrl: Enable ABMC by default on resctrl mount
  x86/resctrl: Introduce the interface switch between ABMC and
    legacy_mbm
  x86/resctrl: Introduce interface to list assignment states of all the
    groups
  x86/resctrl: Introduce interface to modify assignment states of the
    groups

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            | 144 ++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
 arch/x86/kernel/cpu/resctrl/core.c            |  25 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  56 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  24 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 714 +++++++++++++++++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |  12 +
 11 files changed, 964 insertions(+), 20 deletions(-)

-- 
2.34.1


Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Babu,

On 3/28/2024 6:06 PM, Babu Moger wrote:

> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> 	[abmc] 
> 	legacy_mbm
> 
> 	Linux kernel detected ABMC feature and it is enabled.

Please note that this adds the "abmc" feature to the resctrl
*filesystem* that supports more architectures than just AMD. Calling the
resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
other architectures calling their features that are
similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
a new generic name.

> b. Check how many ABMC counters are available. 
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs 
> 	32
> 
> c. Create few resctrl groups.
> 
> 	# mkdir /sys/fs/resctrl/mon_groups/default_mon1
> 	# mkdir /sys/fs/resctrl/non_defult_group

Can this be non_default_group instead? Seems like non_defult_group is used
consistently but its spelling is unexpected.

> 	# mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
> 
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>    to list and modify the group's assignment states.
> 
>    The list follows the following format:
> 
>        * Default CTRL_MON group:
>                "//<domain_id>=<assignment_flags>"
> 
>        * Non-default CTRL_MON group:
>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> 
>        * Child MON group of default CTRL_MON group:
>                "/<MON group>/<domain_id>=<assignment_flags>"
> 
>        * Child MON group of non-default CTRL_MON group:
>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> 
>        Assignment flags can be one of the following:
> 
>         t  MBM total event is assigned
>         l  MBM local event is assigned
>         tl Both total and local MBM events are assigned
>         _  None of the MBM events are assigned
> 
> 	Examples:
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
> 	non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 	non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 	//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 	/default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 
> 	There are four groups and all the groups have local and total event assigned.
> 
> 	"//" - This is a default CONTROL MON group
> 
> 	"non_defult_group//" - This is non default CONTROL MON group
> 
> 	"/default_mon1/"  - This is Child MON group of the defult group
> 
> 	"non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> 
> 	=tl means both total and local events are assigned.
> 
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
> 
> 	The write format is similar to the above list format with addition of
> 	op-code for the assignment operation.
> 
>         * Default CTRL_MON group:
>                 "//<domain_id><op-code><assignment_flags>"
> 
>         * Non-default CTRL_MON group:
>                 "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
> 
>         * Child MON group of default CTRL_MON group:
>                 "/<MON group>/<domain_id><op-code><assignment_flags>"
> 
>         * Child MON group of non-default CTRL_MON group:
>                 "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
> 
>        Op-code can be one of the following:
> 
>         = Update the assignment to match the flags
>         + Assign a new state
>         - Unassign a new state
>         _ Unassign all the states

As mentioned in https://lore.kernel.org/lkml/ZjO9hpuLz%2FjJYqvT@e133380.arm.com/
the "_" is not an operator but is instead viewed as a part of <assignment_flags>.
It is expected to be used with "="; to unset flags it will be used as below:

echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_" ...

> 
> 
>         Initial group status:
> 
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>         non_default_ctrl_mon_grp//0=tl;1=tl;
>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>         //0=tl;1=tl;
>         /child_default_mon_grp/0=tl;1=tl;
> 
> 
>         To update the default group to assign only total event.
>         # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>         Assignment status after the update:
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>         non_default_ctrl_mon_grp//0=tl;1=tl;
>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>         //0=t;1=t;
>         /child_default_mon_grp/0=tl;1=tl;

As mentioned in https://lore.kernel.org/lkml/330e3391-b917-4a88-bae3-bdcbb8cfd6f4@intel.com/
using "0=t" is expected to only impact domain #0, not all domains. Similar for
other examples below.
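
The two review points above (treat "_" as a flag used with "=", and apply a write only to the named domain) can be sketched as a small model of the expected semantics. This is an illustration only, not the kernel parser; `apply_assign_op` and `update_domain` are hypothetical names:

```shell
#!/bin/sh
# Apply one op-code to the current flags of a single domain.
#   $1 = current flags ("t", "l", "tl" or "_"), $2 = op-code, $3 = flags
apply_assign_op() {
    cur=$1; op=$2; new=$3
    # "_" is a flag meaning "no events", so model it as the empty set.
    if [ "$cur" = "_" ]; then cur=""; fi
    if [ "$new" = "_" ]; then new=""; fi
    has_t=0; has_l=0
    case $cur in *t*) has_t=1 ;; esac
    case $cur in *l*) has_l=1 ;; esac
    case $op in
        "=") has_t=0; has_l=0
             case $new in *t*) has_t=1 ;; esac
             case $new in *l*) has_l=1 ;; esac ;;
        "+") case $new in *t*) has_t=1 ;; esac
             case $new in *l*) has_l=1 ;; esac ;;
        "-") case $new in *t*) has_t=0 ;; esac
             case $new in *l*) has_l=0 ;; esac ;;
        *)   echo "unknown op-code: $op" >&2; return 1 ;;
    esac
    out=""
    if [ "$has_t" -eq 1 ]; then out="${out}t"; fi
    if [ "$has_l" -eq 1 ]; then out="${out}l"; fi
    printf '%s\n' "${out:-_}"
}

# Update only the named domain in a "<id>=<flags>;<id>=<flags>;" list.
#   $1 = domain list, $2 = target domain id, $3 = op-code, $4 = flags
update_domain() {
    list=$1; target=$2; opcode=$3; req=$4
    result=""
    old_ifs=$IFS
    IFS=';'
    for entry in $list; do
        dom=${entry%%=*}
        val=${entry#*=}
        if [ "$dom" = "$target" ]; then
            val=$(apply_assign_op "$val" "$opcode" "$req")
        fi
        result="${result}${dom}=${val};"
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

Under this model, `update_domain "0=tl;1=tl;" 0 = t` leaves domain 1 untouched, and clearing a domain is written as `=` with the `_` flag.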

> 
>         To update the MON group child_default_mon_grp to remove local event:
>         # echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>         Assignment status after the update:
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>         //0=t;1=t;
>         /child_default_mon_grp/0=t;1=t;
>         non_default_ctrl_mon_grp//0=tl;1=tl;
>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 
>         To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
>         remove both local and total events:
>         # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" >
>                       /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> 
>         Assignment status after the update:
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>         //0=t;1=t;
>         /child_default_mon_grp/0=t;1=t;
>         non_default_ctrl_mon_grp//0=tl;1=tl;
>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;
> 
> 	
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the evetns with ABMC. If the event is unassigned

evetns -> events

>    when reading, then the read will come back as Unavailable.
> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	779247936
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
> 	765207488
> 	
> g. Users will have the option to go back to legacy_mbm mode if required.
>    This can be done using the following command.
> 
> 	# echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>         abmc
>         [legacy_mbm]
> 

This needs a mention about how state is impacted when a user makes this
switch. For example, if switching from "legacy" to abmc ... if there
are fewer than "num counters" monitor groups, will they get counters
assigned dynamically? What happens to feature specific resctrl files?
What happens to the counters themselves, are they reset? What else
happens during this switch?

> 	
> h. Check the bandwidth configuration for the group. Note that bandwidth
>    configuration has a domain scope. Total event defaults to 0x7F (to
>    count all the events) and local event defaults to 0x15 (to count all
>    the local numa events). The event bitmap decoding is available at
>    https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 	0=0x7f;1=0x7f
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
> 	0=0x15;1=0x15
> 	
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>    Note that this change effects total events on the domain 0.
> 	
> 	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 	0=0x33;1=0x7F
> 	
> k. Now read the total event again. The mbm_total_bytes should display
>    only the read events.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	314101
> 	
> l. Unmount the resctrl
> 	 
> 	#umount /sys/fs/resctrl/
> 
> ---

Reinette
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Reinette,

On 5/3/24 18:24, Reinette Chatre wrote:
> Hi Babu,
> 
> On 3/28/2024 6:06 PM, Babu Moger wrote:
> 
>> a. Check if ABMC support is available
>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> 	[abmc] 
>> 	legacy_mbm
>>
>> 	Linux kernel detected ABMC feature and it is enabled.
> 
> Please note that this adds the "abmc" feature to the resctrl
> *filesystem* that supports more architectures than just AMD. Calling the
> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
> other architectures calling their features that are
> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
> a new generic name.

It should not a problem if other architecture calling abmc for similar
feature. But generic name is always better if there is a suggestion.


> 
>> b. Check how many ABMC counters are available. 
>>
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs 
>> 	32
>>
>> c. Create few resctrl groups.
>>
>> 	# mkdir /sys/fs/resctrl/mon_groups/default_mon1
>> 	# mkdir /sys/fs/resctrl/non_defult_group
> 
> Can this be non_default_group instead? Seems like non_defult_group is used
> consistently but its spelling is unexpected.

Ok. Will correct it. Thanks

> 
>> 	# mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>>
>> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>    to list and modify the group's assignment states.
>>
>>    The list follows the following format:
>>
>>        * Default CTRL_MON group:
>>                "//<domain_id>=<assignment_flags>"
>>
>>        * Non-default CTRL_MON group:
>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>>        * Child MON group of default CTRL_MON group:
>>                "/<MON group>/<domain_id>=<assignment_flags>"
>>
>>        * Child MON group of non-default CTRL_MON group:
>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>>        Assignment flags can be one of the following:
>>
>>         t  MBM total event is assigned
>>         l  MBM local event is assigned
>>         tl Both total and local MBM events are assigned
>>         _  None of the MBM events are assigned
>>
>> 	Examples:
>>
>> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
>> 	non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>> 	non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>> 	//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>> 	/default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>
>> 	There are four groups and all the groups have local and total event assigned.
>>
>> 	"//" - This is a default CONTROL MON group
>>
>> 	"non_defult_group//" - This is non default CONTROL MON group
>>
>> 	"/default_mon1/"  - This is Child MON group of the defult group
>>
>> 	"non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>
>> 	=tl means both total and local events are assigned.
>>
>> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>>
>> 	The write format is similar to the above list format with addition of
>> 	op-code for the assignment operation.
>>
>>         * Default CTRL_MON group:
>>                 "//<domain_id><op-code><assignment_flags>"
>>
>>         * Non-default CTRL_MON group:
>>                 "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>>
>>         * Child MON group of default CTRL_MON group:
>>                 "/<MON group>/<domain_id><op-code><assignment_flags>"
>>
>>         * Child MON group of non-default CTRL_MON group:
>>                 "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>>
>>        Op-code can be one of the following:
>>
>>         = Update the assignment to match the flags
>>         + Assign a new state
>>         - Unassign a new state
>>         _ Unassign all the states
> 
> As mentioned in https://lore.kernel.org/lkml/ZjO9hpuLz%2FjJYqvT@e133380.arm.com/
> the "_" is not an operator but instead viewed as an part of <assignment_flags>.
> It is expected to be used with "=", to unset flags it will be used as below:
> 
> echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_" ...

Oh.. ok.

Will correct it. I also need to verify the parsing.

> 
>>
>>
>>         Initial group status:
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>         non_default_ctrl_mon_grp//0=tl;1=tl;
>>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>         //0=tl;1=tl;
>>         /child_default_mon_grp/0=tl;1=tl;
>>
>>
>>         To update the default group to assign only total event.
>>         # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>>         Assignment status after the update:
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>         non_default_ctrl_mon_grp//0=tl;1=tl;
>>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>         //0=t;1=t;
>>         /child_default_mon_grp/0=tl;1=tl;
> 
> As mentioned in https://lore.kernel.org/lkml/330e3391-b917-4a88-bae3-bdcbb8cfd6f4@intel.com/
> using "0=t" is expected to only impact domain #0, not all domains. Similar for
> other examples below.

Ok. Sure

> 
>>
>>         To update the MON group child_default_mon_grp to remove local event:
>>         # echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>>         Assignment status after the update:
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>         //0=t;1=t;
>>         /child_default_mon_grp/0=t;1=t;
>>         non_default_ctrl_mon_grp//0=tl;1=tl;
>>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>
>>         To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
>>         remove both local and total events:
>>         # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" >
>>                       /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>>         Assignment status after the update:
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>         //0=t;1=t;
>>         /child_default_mon_grp/0=t;1=t;
>>         non_default_ctrl_mon_grp//0=tl;1=tl;
>>         non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;
>>
>> 	
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>>    There is no change in reading the evetns with ABMC. If the event is unassigned
> 
> evetns -> events

Sure.

> 
>>    when reading, then the read will come back as Unavailable.
>> 	
>> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	779247936
>> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
>> 	765207488
>> 	
>> g. Users will have the option to go back to legacy_mbm mode if required.
>>    This can be done using the following command.
>>
>> 	# echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>         abmc
>>         [legacy_mbm]
>>
> 
> This needs a mention about how state is impacted when a user makes this
> switch. For example, if switching from "legacy" to abmc ... if there
> are fewer than "num counters" monitor groups, will they get counters
> assigned dynamically? What happens to feature specific resctrl files?
> What happens to the counters themselves, are they reset? What else
> happens during this switch?

Sure. Will add the explanation.

When switching from "legacy" to abmc, events in already created resctrl
groups will be in the Unassigned state. Users need to assign the monitors to
each group to read the events.
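
A sketch of what that reassignment step could look like from user space, assuming domain ids are contiguous starting at 0 and that one request per domain is written in the "<group><domain_id>=<flags>" form from the cover letter. `build_assign_requests` is a hypothetical helper that only builds the request strings:

```shell
#!/bin/sh
# Hypothetical helper: print one assignment request per domain for a group.
#   $1 = group prefix as used by mbm_assign_control, e.g. "//" or "grp/mon1/"
#   $2 = number of domains (assumed to be numbered 0..N-1)
#   $3 = flags to assign, e.g. "tl"
build_assign_requests() {
    group=$1; ndomains=$2; flags=$3
    d=0
    while [ "$d" -lt "$ndomains" ]; do
        printf '%s%s=%s\n' "$group" "$d" "$flags"
        d=$((d + 1))
    done
}

# Usage on a live system after switching to abmc (path from the cover letter):
#   build_assign_requests "//" 2 tl | while read -r req; do
#       echo "$req" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
#   done
```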

> 
>> 	
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>>    configuration has a domain scope. Total event defaults to 0x7F (to
>>    count all the events) and local event defaults to 0x15 (to count all
>>    the local numa events). The event bitmap decoding is available at
>>    https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>> 	
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	0=0x7f;1=0x7f
>> 	
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
>> 	0=0x15;1=0x15
>> 	
>> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>>    Note that this change effects total events on the domain 0.
>> 	
>> 	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	0=0x33;1=0x7F
>> 	
>> k. Now read the total event again. The mbm_total_bytes should display
>>    only the read events.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	314101
>> 	
>> l. Unmount the resctrl
>> 	 
>> 	#umount /sys/fs/resctrl/
>>
>> ---
> 
> Reinette

-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Babu,

On 5/6/2024 10:18 AM, Moger, Babu wrote:
> On 5/3/24 18:24, Reinette Chatre wrote:
>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>
>>> a. Check if ABMC support is available
>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>> 	[abmc] 
>>> 	legacy_mbm
>>>
>>> 	Linux kernel detected ABMC feature and it is enabled.
>>
>> Please note that this adds the "abmc" feature to the resctrl
>> *filesystem* that supports more architectures than just AMD. Calling the
>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>> other architectures calling their features that are
>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>> a new generic name.
> 
> It should not a problem if other architecture calling abmc for similar
> feature. But generic name is always better if there is a suggestion.

"should not a problem" does not instill confidence that AMD is
actually ok with this.

Reinette
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Reinette,

On 5/7/24 15:26, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>> On 5/3/24 18:24, Reinette Chatre wrote:
>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>
>>>> a. Check if ABMC support is available
>>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>> 	[abmc] 
>>>> 	legacy_mbm
>>>>
>>>> 	Linux kernel detected ABMC feature and it is enabled.
>>>
>>> Please note that this adds the "abmc" feature to the resctrl
>>> *filesystem* that supports more architectures than just AMD. Calling the
>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>> other architectures calling their features that are
>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>> a new generic name.
>>
>> It should not a problem if other architecture calling abmc for similar
>> feature. But generic name is always better if there is a suggestion.
> 
> "should not a problem" does not instill confidence that AMD is
> actually ok with this.

The name "ABMC" is already used in a public document to refer to this
feature.
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf

If a conflict comes up, we can change it to amd_abmc. I don't see any
conflict at this point.
-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Babu,

On 5/8/2024 1:07 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 5/7/24 15:26, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>
>>>>> a. Check if ABMC support is available
>>>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>
>>>>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>> 	[abmc] 
>>>>> 	legacy_mbm
>>>>>
>>>>> 	Linux kernel detected ABMC feature and it is enabled.
>>>>
>>>> Please note that this adds the "abmc" feature to the resctrl
>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>> other architectures calling their features that are
>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>> a new generic name.
>>>
>>> It should not a problem if other architecture calling abmc for similar
>>> feature. But generic name is always better if there is a suggestion.
>>
>> "should not a problem" does not instill confidence that AMD is
>> actually ok with this.
> 
> The feature "ABMC" has been used in the public document already to refer
> this feature.
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf

It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
name of the AMD feature. The question is whether users can use the 
same name to interact with "similar but maybe not identical" features from other
architectures, which is what this series enables.

> If there comes a conflict then we can change it to amd_abmc. Didn't see
> any conflict at this pint.

How do you envision this? The resctrl filesystem interface is intended to be
architecture neutral so it is not obvious to me how "amd_abmc" is expected
to look? Why would it be necessary to have different architecture specific names
for a similar feature from different architectures that users interact with in
the same way? Sounds to me as though this just needs a new non-AMD marketing name. 

Reinette
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Reinette,

On 5/8/24 15:41, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/8/2024 1:07 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/7/24 15:26, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>>
>>>>>> a. Check if ABMC support is available
>>>>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>
>>>>>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>>> 	[abmc] 
>>>>>> 	legacy_mbm
>>>>>>
>>>>>> 	Linux kernel detected ABMC feature and it is enabled.
>>>>>
>>>>> Please note that this adds the "abmc" feature to the resctrl
>>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>>> other architectures calling their features that are
>>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>>> a new generic name.
>>>>
>>>> It should not a problem if other architecture calling abmc for similar
>>>> feature. But generic name is always better if there is a suggestion.
>>>
>>> "should not a problem" does not instill confidence that AMD is
>>> actually ok with this.
>>
>> The feature "ABMC" has been used in the public document already to refer
>> this feature.
>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf
> 
> It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
> name of the AMD feature. The question is whether users can use the 
> same name to interact with "similar but maybe not identical" features from other
> architectures, which is what this series enables.
> 
>> If there comes a conflict then we can change it to amd_abmc. Didn't see
>> any conflict at this pint.
> 
> How do you envision this? The resctrl filesystem interface is intended to be
> architecture neutral so it is not obvious to me how "amd_abmc" is expected
> to look? Why would it be necessary to have different architecture specific names
> for a similar feature from different architectures that users interact with in
> the same way? Sounds to me as though this just needs a new non-AMD marketing name. 

I think I misunderstood it.
It is not a concern to have the same name ("abmc") for a similar feature
across architectures.

ABMC is also fairly generic. I am open to other generic suggestions. I
think the name should contain the words "assign" and "monitor".
-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Babu,

On 5/8/2024 4:29 PM, Moger, Babu wrote:
> On 5/8/24 15:41, Reinette Chatre wrote:
>> On 5/8/2024 1:07 PM, Moger, Babu wrote:
>>> On 5/7/24 15:26, Reinette Chatre wrote:
>>>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>>>
>>>>>>> a. Check if ABMC support is available
>>>>>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>
>>>>>>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>>>> 	[abmc] 
>>>>>>> 	legacy_mbm
>>>>>>>
>>>>>>> 	Linux kernel detected ABMC feature and it is enabled.
>>>>>>
>>>>>> Please note that this adds the "abmc" feature to the resctrl
>>>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>>>> other architectures calling their features that are
>>>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>>>> a new generic name.
>>>>>
>>>>> It should not a problem if other architecture calling abmc for similar
>>>>> feature. But generic name is always better if there is a suggestion.
>>>>
>>>> "should not a problem" does not instill confidence that AMD is
>>>> actually ok with this.
>>>
>>> The feature "ABMC" has been used in the public document already to refer
>>> this feature.
>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf
>>
>> It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
>> name of the AMD feature. The question is whether users can use the 
>> same name to interact with "similar but maybe not identical" features from other
>> architectures, which is what this series enables.
>>
>>> If there comes a conflict then we can change it to amd_abmc. Didn't see
>>> any conflict at this pint.
>>
>> How do you envision this? The resctrl filesystem interface is intended to be
>> architecture neutral so it is not obvious to me how "amd_abmc" is expected
>> to look? Why would it be necessary to have different architecture specific names
>> for a similar feature from different architectures that users interact with in
>> the same way? Sounds to me as though this just needs a new non-AMD marketing name. 
> 
> I think I misunderstood it.
> It is not a concern to have a same name("abmc") for similar feature across
> the architectures.

Thank you for confirming. This joins BMEC and SMBA in this regard.

> ABMC is also kind of generic. I am open to other generic suggestions. I
> think we should have "assign" and "monitor" words in them.

One thing we can consider is to move to a simple "enable"/"disable"
interface for events. Users do not really need to know that hardware
needs to "assign a counter" to an event for it to measure. Yes,
user space can infer some of this by the number of events that
can be "enabled" at a time, but the concept of "assign a hardware counter"
is abstract and does not directly map to (as I understand) the soft-RMID
approach for other AMD hardware. Peter reminded us a while back [1] that
"assign" has a variety of meanings, even among AMD, so we should aim to
avoid any confusion. 

Reinette

[1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Reinette,

On 5/9/24 13:07, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/8/2024 4:29 PM, Moger, Babu wrote:
>> On 5/8/24 15:41, Reinette Chatre wrote:
>>> On 5/8/2024 1:07 PM, Moger, Babu wrote:
>>>> On 5/7/24 15:26, Reinette Chatre wrote:
>>>>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>>>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>>>>
>>>>>>>> a. Check if ABMC support is available
>>>>>>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>
>>>>>>>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>>>>> 	[abmc] 
>>>>>>>> 	legacy_mbm
>>>>>>>>
>>>>>>>> 	Linux kernel detected ABMC feature and it is enabled.
>>>>>>>
>>>>>>> Please note that this adds the "abmc" feature to the resctrl
>>>>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>>>>> other architectures calling their features that are
>>>>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>>>>> a new generic name.
>>>>>>
>>>>>> It should not a problem if other architecture calling abmc for similar
>>>>>> feature. But generic name is always better if there is a suggestion.
>>>>>
>>>>> "should not a problem" does not instill confidence that AMD is
>>>>> actually ok with this.
>>>>
>>>> The feature "ABMC" has been used in the public document already to refer
>>>> this feature.
>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf
>>>
>>> It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
>>> name of the AMD feature. The question is whether users can use the 
>>> same name to interact with "similar but maybe not identical" features from other
>>> architectures, which is what this series enables.
>>>
>>>> If there comes a conflict then we can change it to amd_abmc. Didn't see
>>>> any conflict at this pint.
>>>
>>> How do you envision this? The resctrl filesystem interface is intended to be
>>> architecture neutral so it is not obvious to me how "amd_abmc" is expected
>>> to look? Why would it be necessary to have different architecture specific names
>>> for a similar feature from different architectures that users interact with in
>>> the same way? Sounds to me as though this just needs a new non-AMD marketing name. 
>>
>> I think I misunderstood it.
>> It is not a concern to have a same name("abmc") for similar feature across
>> the architectures.
> 
> Thank you for confirming. This joins BMEC and SMBA in this regard.

Yes. Sure.
> 
>> ABMC is also kind of generic. I am open to other generic suggestions. I
>> think we should have "assign" and "monitor" words in them.
> 
> One thing we can consider is to move to a simple "enable"/"disable"
> interface for events. Users do not really need to know that hardware
> needs to "assign a counter" to an event for it to measure. Yes,
> user space can infer some of this by the number of events that
> can be "enabled" at a time, but the concept of "assign a hardware counter"
> is abstract and does not directly map to (as I understand) the soft-RMID
> approach for other AMD hardware. Peter reminded us a while back [1] that
> "assign" has a variety of meanings, even among AMD, so we should aim to
> avoid any confusion.

The documentation will refer to the events as enabled or disabled instead
of assigned or unassigned.

For example:
Event flags can be one of the following:

        t  MBM total event is enabled
        l  MBM local event is enabled
        tl MBM total and local events are enabled
        _  None of the MBM events are enabled

It is fine.
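
To make the flag semantics concrete, here is a minimal user-space sketch of
how a flags string could be turned into an event mask. It is not code from
this series: parse_event_flags() and the MBM_EVT_* bit values are
hypothetical names chosen only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical event bits; these values are not taken from the series. */
#define MBM_EVT_TOTAL (1u << 0)
#define MBM_EVT_LOCAL (1u << 1)

/*
 * Parse a flags string such as "t", "l", "tl" or "_" into a bitmask.
 * Returns true on success and stores the result in *mask.
 * Returns false, leaving *mask untouched, for empty or unknown input.
 */
static bool parse_event_flags(const char *flags, unsigned int *mask)
{
	unsigned int m = 0;

	if (!flags || !*flags)
		return false;

	/* "_" on its own means that no MBM event is enabled. */
	if (strcmp(flags, "_") == 0) {
		*mask = 0;
		return true;
	}

	for (; *flags; flags++) {
		switch (*flags) {
		case 't':
			m |= MBM_EVT_TOTAL;
			break;
		case 'l':
			m |= MBM_EVT_LOCAL;
			break;
		default:
			/* Unknown flag, including '_' mixed with others. */
			return false;
		}
	}

	*mask = m;
	return true;
}
```

This sketch also accepts "lt" and repeated letters, which the list above
does not mention; a real parser may want to be stricter.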

> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
> 
> 

-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 9 months ago
Hi Babu,

On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
>
> # Introduction
>
> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
> feature only guarantees that RMIDs currently assigned to a processor will
> be tracked by hardware. The counters of any other RMIDs which are no longer
> being tracked will be reset to zero. The MBM event counters return
> "Unavailable" for the RMIDs that are not active.
>
> Users can create 256 or more monitor groups. But there can be only limited
> number of groups that can give guaranteed monitoring numbers. With ever
> changing configurations there is no way to definitely know which of these
> groups will be active for certain point of time. Users do not have the
> option to monitor a group or set of groups for certain period of time
> without worrying about RMID being reset in between.
>
> The ABMC feature provides an option to the user to assign an RMID to the
> hardware counter and monitor the bandwidth for a longer duration.
> The assigned RMID will be active until the user unassigns it manually.
> There is no need to worry about counters being reset during this period.
> Additionally, the user can specify a bitmask identifying the specific
> bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
>
> # Linux Implementation
>
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can assign a maximum
> of 2 ABMC counters per group. User will also have the option to assign only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to unassign an already
> assigned counter to make space for new assignments.
>
>
> # Examples
>
> a. Check if ABMC support is available
>         #mount -t resctrl resctrl /sys/fs/resctrl/
>
>         #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>         [abmc]
>         legacy_mbm
>
>         Linux kernel detected ABMC feature and it is enabled.
>
> b. Check how many ABMC counters are available.
>
>         #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
>         32
>
> c. Create few resctrl groups.
>
>         # mkdir /sys/fs/resctrl/mon_groups/default_mon1
>         # mkdir /sys/fs/resctrl/non_defult_group
>         # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>    to list and modify the group's assignment states.
>
>    The list follows the following format:
>
>        * Default CTRL_MON group:
>                "//<domain_id>=<assignment_flags>"
>
>        * Non-default CTRL_MON group:
>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
>        * Child MON group of default CTRL_MON group:
>                "/<MON group>/<domain_id>=<assignment_flags>"
>
>        * Child MON group of non-default CTRL_MON group:
>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
>        Assignment flags can be one of the following:
>
>         t  MBM total event is assigned
>         l  MBM local event is assigned
>         tl Both total and local MBM events are assigned
>         _  None of the MBM events are assigned
>

I was able to successfully build a kernel where this interface is
adapted to work with both real ABMC on hardware that supports it and
my software workaround for older hardware.

My prototype is based on a refactored version of the codebase
supporting MPAM, but the capabilities of the MPAM hardware look
similar enough to ABMC that I'm not concerned about the feasibility.

The FS layer is informed by the arch layer (through rdt_resource
fields) how many assignable monitors are available and whether a
monitor is assigned to an entire group or a single event in a group.
Also, the FS layer can assume that monitors are indexed contiguously,
allowing it to host the data structures managing FS-level view of
monitor usage.

I used the following resctrl_arch-interfaces to propagate assignments
to the implementation:

void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id,
                                 u32 closid, u32 rmid, int evtid);
void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);

I chose to allow reassigning an assigned monitor without calling
unassign first. This is important when monitors are unassigned and
assigned in a single write to mbm_assign_control, as it allows all
updates to be performed in a single round of parallel IPIs to the
domains.
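
The reassign semantics can be illustrated with a small user-space mock. This
is only a sketch under my own assumptions, not the prototype code: struct
mock_domain, struct mon_slot and the mock_*() helpers are hypothetical
stand-ins for the per-domain state behind the two resctrl_arch interfaces
above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_NUM_MONITORS 32	/* example value, as in mbm_assign_cntrs */

/* Hypothetical per-monitor state kept for one domain. */
struct mon_slot {
	bool     enabled;
	uint32_t closid;
	uint32_t rmid;
	int      evtid;
	uint64_t count;		/* stands in for per-counter arch state */
};

struct mock_domain {
	struct mon_slot mon[MOCK_NUM_MONITORS];
};

/*
 * Assign monitor mon_id to (closid, rmid, evtid). An already assigned
 * monitor is simply overwritten, so no unassign call is needed first.
 * The per-counter state is always reset on assignment.
 */
static int mock_assign_monitor(struct mock_domain *d, uint32_t mon_id,
			       uint32_t closid, uint32_t rmid, int evtid)
{
	struct mon_slot *s;

	if (mon_id >= MOCK_NUM_MONITORS)
		return -1;

	s = &d->mon[mon_id];
	s->closid = closid;
	s->rmid = rmid;
	s->evtid = evtid;
	s->count = 0;
	s->enabled = true;
	return 0;
}

/* Unassign only disables the monitor; the other fields are left stale. */
static int mock_unassign_monitor(struct mock_domain *d, uint32_t mon_id)
{
	if (mon_id >= MOCK_NUM_MONITORS)
		return -1;

	d->mon[mon_id].enabled = false;
	return 0;
}
```

Because assign overwrites in place, a single write that moves a monitor from
one group to another needs only one update per domain.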


>
> g. Users will have the option to go back to legacy_mbm mode if required.
>    This can be done using the following command.
>
>         # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>         abmc
>         [legacy_mbm]

I chose to make this a mount option to simplify the management of the
monitor tracking data structures. They are simply allocated at mount
time and deallocated at unmount.

I called the option "mon_assign": The mount option parser calls
resctrl_arch_mon_assign_enable() to determine whether the
implementation supports assignment in some form. If it returns an
error, the mount fails. When successful, the assignable monitor count
is made non-zero in the appropriate rdt_resource, triggering the
behavior change in the FS layer.

I'm still not sure if it's a good idea to enable monitor assignment by
default. This would be a major disruption in the MBM usage model
triggered by moving software between AMD CPU models. I thought the
safest option was to disallow creating more monitoring groups than
monitors unless the option is selected. Given that nobody else
complained about monitoring HW limitations on the mailing list, I
assumed few users create enough monitoring groups to be impacted.

Thanks!
-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Peter,

On 5/1/24 12:48, Peter Newman wrote:
> Hi Babu,
> 
> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> The feature details are documented in the  APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
>>
>> # Introduction
>>
>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>> feature only guarantees that RMIDs currently assigned to a processor will
>> be tracked by hardware. The counters of any other RMIDs which are no longer
>> being tracked will be reset to zero. The MBM event counters return
>> "Unavailable" for the RMIDs that are not active.
>>
>> Users can create 256 or more monitor groups. But there can be only limited
>> number of groups that can give guaranteed monitoring numbers. With ever
>> changing configurations there is no way to definitely know which of these
>> groups will be active for certain point of time. Users do not have the
>> option to monitor a group or set of groups for certain period of time
>> without worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign an RMID to the
>> hardware counter and monitor the bandwidth for a longer duration.
>> The assigned RMID will be active until the user unassigns it manually.
>> There is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can assign a maximum
>> of 2 ABMC counters per group. User will also have the option to assign only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to unassign an already
>> assigned counter to make space for new assignments.
>>
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>>         #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>>         #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>         [abmc]
>>         legacy_mbm
>>
>>         Linux kernel detected ABMC feature and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>>         #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
>>         32
>>
>> c. Create few resctrl groups.
>>
>>         # mkdir /sys/fs/resctrl/mon_groups/default_mon1
>>         # mkdir /sys/fs/resctrl/non_defult_group
>>         # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>>
>> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>    to list and modify the group's assignment states.
>>
>>    The list follows the following format:
>>
>>        * Default CTRL_MON group:
>>                "//<domain_id>=<assignment_flags>"
>>
>>        * Non-default CTRL_MON group:
>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>>        * Child MON group of default CTRL_MON group:
>>                "/<MON group>/<domain_id>=<assignment_flags>"
>>
>>        * Child MON group of non-default CTRL_MON group:
>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>>        Assignment flags can be one of the following:
>>
>>         t  MBM total event is assigned
>>         l  MBM local event is assigned
>>         tl Both total and local MBM events are assigned
>>         _  None of the MBM events are assigned
>>
> 
> I was able to successfully build a kernel where this interface is
> adapted to work with both real ABMC on hardware that supports it and
> my software workaround for older hardware.

Thanks for trying that out. Good to know.

> 
> My prototype is based on a refactored version of the codebase
> supporting MPAM, but the capabilities of the MPAM hardware look
> similar enough to ABMC that I'm not concerned about the feasibility.

That is good.

> 
> The FS layer is informed by the arch layer (through rdt_resource
> fields) how many assignable monitors are available and whether a
> monitor is assigned to an entire group or a single event in a group.
> Also, the FS layer can assume that monitors are indexed contiguously,
> allowing it to host the data structures managing FS-level view of
> monitor usage.
> 
> I used the following resctrl_arch-interfaces to propagate assignments
> to the implementation:
> 
> void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
> closid, u32 rmid, int evtid);

Sure. I can add these in the next version.

A few comments:

AMD does not need closid for assignment. I assume ARM requires closid.

What is mon_id here?


> void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);

We need rmid and evtid for the unassign interface here.


> 
> I chose to allow reassigning an assigned monitor without calling
> unassign first. This is important when monitors are unassigned and
> assigned in a single write to mbm_assign_control, as it allows all
> updates to be performed in a single round of parallel IPIs to the
> domains.

Yes. It is not required to call unassign before assign. The AMD hardware
supports it.

But we only have 32 counters. We need to know which counter we are going
to use for the assignment. If all the counters are already assigned, then
we can't figure out the counter id without calling unassign first. Using a
random counter would overwrite an already assigned counter.

> 
> 
>>
>> g. Users will have the option to go back to legacy_mbm mode if required.
>>    This can be done using the following command.
>>
>>         # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>         abmc
>>         [legacy_mbm]
> 
> I chose to make this a mount option to simplify the management of the
> monitor tracking data structures. They are simply allocated at mount
> time and deallocated and unmount.

Initially I added it as a mount option.
Based on our earlier discussion, we decided to use the assign feature by
default if the hardware supports it. Users don't have to worry about the
details.
> 
> I called the option "mon_assign": The mount option parser calls
> resctrl_arch_mon_assign_enable() to determine whether the
> implementation supports assignment in some form. If it returns an
> error, the mount fails. When successful, the assignable monitor count
> is made non-zero in the appropriate rdt_resource, triggering the
> behavior change in the FS layer.
> 
> I'm still not sure if it's a good idea to enable monitor assignment by
> default. This would be a major disruption in the MBM usage model
> triggered by moving software between AMD CPU models. I thought the

Why will it be a disruption? Why do you think a mount option will solve the
problem? As always, there will be an option to go back to legacy mode, right?

> safest option was to disallow creating more monitoring groups than
> monitors unless the option is selected. Given that nobody else

The current code allows creating more groups, but it will report "Monitor
assignment failed" when it runs out of monitors.

> complained about monitoring HW limitations on the mailing list, I
> assumed few users create enough monitoring groups to be impacted.
> 
> Thanks!
> -Peter

-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 9 months ago
Hi Babu,

On Thu, May 2, 2024 at 9:25 AM Moger, Babu <babu.moger@amd.com> wrote:
> On 5/1/24 12:48, Peter Newman wrote:
> > The FS layer is informed by the arch layer (through rdt_resource
> > fields) how many assignable monitors are available and whether a
> > monitor is assigned to an entire group or a single event in a group.
> > Also, the FS layer can assume that monitors are indexed contiguously,
> > allowing it to host the data structures managing FS-level view of
> > monitor usage.
> >
> > I used the following resctrl_arch-interfaces to propagate assignments
> > to the implementation:
> >
> > void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
> > closid, u32 rmid, int evtid);
>
> Sure. I can add these in next version.
>
> Few comments..
>
> AMD does not need closid for assignment. I assume ARM requires closid.

Correct, MPAM needs a CLOSID+RMID (PARTID+PMG) to identify a
monitoring group. The CLOSID parameter is ignored on x86.

>
> What is mon_id here?

On ABMC, the value is programmed into L3_QOS_ABMC_CFG.CtrID.


>
> > void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);
>
> We need rmid and evtid for unassign interface here.

From my reading of the ABMC specification, it does not look necessary
to program BwSrc or BwType when changing L3_QOS_ABMC_CFG.CtrEn to 0
for a particular CtrID. This interface only disables a counter, so it
should not need to know about how it was previously used when assign
is able to reassign, as assign will always reset the arch_mbm data.

I do not see any harm in the arch_mbm data being stale while the
counter is unassigned, because the data is not accessed when reading
the hardware counter fails. In general, resctrl_arch_rmid_read()
cannot return any information if the hardware counter is not readable
at the time it is called.
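
To make the unassign semantics above concrete, here is a minimal
user-space sketch. It is not the code from my prototype or from this
series: the struct, the sketch_* function names and the software-only
model of the counter are all assumptions for illustration. It only shows
that unassign needs nothing but the counter ID, that stale state is
harmless because reads fail while the counter is disabled, and that
assign always starts from a clean slate.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_MONITORS 32	/* counter count mentioned in this thread */

/* Hypothetical per-counter software state, standing in for arch_mbm data. */
struct mon_state {
	bool     enabled;	/* models L3_QOS_ABMC_CFG.CtrEn */
	uint32_t rmid;		/* models what BwSrc would select */
	int      evtid;		/* models the event chosen via BwType */
	uint64_t prev_count;	/* models the running arch_mbm state */
};

static struct mon_state monitors[NUM_MONITORS];

/* Assign (or reassign) a counter; always resets the software state. */
static int sketch_assign_monitor(uint32_t mon_id, uint32_t closid,
				 uint32_t rmid, int evtid)
{
	(void)closid;	/* ignored on x86; MPAM would use it */
	if (mon_id >= NUM_MONITORS)
		return -1;
	memset(&monitors[mon_id], 0, sizeof(monitors[mon_id]));
	monitors[mon_id].rmid = rmid;
	monitors[mon_id].evtid = evtid;
	monitors[mon_id].enabled = true;
	return 0;
}

/* Unassign needs only the counter ID; stale fields are left in place. */
static int sketch_unassign_monitor(uint32_t mon_id)
{
	if (mon_id >= NUM_MONITORS)
		return -1;
	monitors[mon_id].enabled = false;
	return 0;
}

/* Reads fail while the counter is disabled, so stale data is never used. */
static int sketch_read_monitor(uint32_t mon_id, uint64_t *val)
{
	if (mon_id >= NUM_MONITORS || !monitors[mon_id].enabled)
		return -1;
	*val = monitors[mon_id].prev_count;
	return 0;
}
```

A real implementation would program the MSR on the CPUs of the domain
rather than update an array, but the calling convention would look the
same from the FS side.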

>
>
> >
> > I chose to allow reassigning an assigned monitor without calling
> > unassign first. This is important when monitors are unassigned and
> > assigned in a single write to mbm_assign_control, as it allows all
> > updates to be performed in a single round of parallel IPIs to the
> > domains.
>
> Yes. It is not required to call unassign before assign. Hardware(AMD)
> supports it.
>
> But, we only have 32 counters. We need to know which counter we are going
> to use for assignment. If all the counters already assigned, then we can't
> figure out the counter id without calling unassign first. Using the random
> counter will overwrite the already assigned counter.

I made the caller of resctrl_arch_assign_monitor() responsible for
selecting which monitor to assign. As long as the user orders the
unassign operations before the assign operations in a write to
mbm_assign_control, the FS code will be able to find an available
monitor ID.
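
As a rough illustration of why the ordering matters, the sketch below
models the FS-side bookkeeping with a 32-bit free mask. The names
(sketch_mon_alloc, sketch_apply_batch, free_mask) are hypothetical and
this is not the allocator from this series; it only assumes 32 monitors
and a batch in which all unassigns are handled before any assigns.

```c
#include <stdint.h>

#define NUM_MONITORS 32

/* Bit n set means monitor ID n is free. All 32 start out free. */
static uint32_t free_mask = 0xffffffffu;

/* Return the lowest free monitor ID, or -1 if none is available. */
static int sketch_mon_alloc(void)
{
	for (int id = 0; id < NUM_MONITORS; id++) {
		if (free_mask & (1u << id)) {
			free_mask &= ~(1u << id);
			return id;
		}
	}
	return -1;
}

/* Mark a monitor ID as free again; out-of-range IDs are ignored. */
static void sketch_mon_free(int id)
{
	if (id >= 0 && id < NUM_MONITORS)
		free_mask |= 1u << id;
}

/*
 * Process one batch: release every ID in to_free[] first, then allocate
 * up to n_alloc new IDs into out[]. Returns the number of IDs allocated.
 */
static int sketch_apply_batch(const int *to_free, int n_free,
			      int *out, int n_alloc)
{
	int done = 0;

	for (int i = 0; i < n_free; i++)
		sketch_mon_free(to_free[i]);

	for (int i = 0; i < n_alloc; i++) {
		int id = sketch_mon_alloc();

		if (id < 0)
			break;
		out[done++] = id;
	}
	return done;
}
```

With every monitor in use, a batch that frees one ID and requests two
gets exactly one: the ID that was just released. If the assigns were
processed first, the same batch would get none.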


> > I chose to make this a mount option to simplify the management of the
> > monitor tracking data structures. They are simply allocated at mount
> > time and deallocated at unmount.
>
> Initially I added it as an mount option.
> Based on our earlier discussion, we decided to use the assign feature by
> default if hardware supports it. Users don't have to worry about the details.
> >
> > I called the option "mon_assign": The mount option parser calls
> > resctrl_arch_mon_assign_enable() to determine whether the
> > implementation supports assignment in some form. If it returns an
> > error, the mount fails. When successful, the assignable monitor count
> > is made non-zero in the appropriate rdt_resource, triggering the
> > behavior change in the FS layer.
> >
> > I'm still not sure if it's a good idea to enable monitor assignment by
> > default. This would be a major disruption in the MBM usage model
> > triggered by moving software between AMD CPU models. I thought the
>
> Why will it be a disruption? Why do you think mount option will solve the
> problem? As always, there will be option to go back to legacy mode. right?
>
> > safest option was to disallow creating more monitoring groups than
> > monitors unless the option is selected. Given that nobody else
>
> Current code allows to create more groups, but it will report "Monitor
> assignment failed" when it runs out of monitors.

Ok that should be fine then.

However, I don't think it's necessary to support dynamically changing
the usage model of monitoring groups without remounting. I believe it
makes it more difficult for the FS code to generically manage monitor
assignment.

-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Peter,

On 5/2/24 12:50, Peter Newman wrote:
> Hi Babu,
> 
> On Thu, May 2, 2024 at 9:25 AM Moger, Babu <babu.moger@amd.com> wrote:
>> On 5/1/24 12:48, Peter Newman wrote:
>>> The FS layer is informed by the arch layer (through rdt_resource
>>> fields) how many assignable monitors are available and whether a
>>> monitor is assigned to an entire group or a single event in a group.
>>> Also, the FS layer can assume that monitors are indexed contiguously,
>>> allowing it to host the data structures managing FS-level view of
>>> monitor usage.
>>>
>>> I used the following resctrl_arch-interfaces to propagate assignments
>>> to the implementation:
>>>
>>> void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
>>> closid, u32 rmid, int evtid);
>>
>> Sure. I can add these in next version.
>>
>> Few comments..
>>
>> AMD does not need closid for assignment. I assume ARM requires closid.
> 
> Correct, MPAM needs a CLOSID+RMID (PARTID+PMG) to identify a
> monitoring group. The CLOSID parameter is ignored on x86.
> 
>>
>> What is mon_id here?
> 
> On ABMC, the value is programmed into L3_QOS_ABMC_CFG.CtrID

ok.

> 
> 
>>
>>> void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);
>>
>> We need rmid and evtid for unassign interface here.
> 
> From my reading of the ABMC specification, it does not look necessary
> to program BwSrc or BwType when changing L3_QOS_ABMC_CFG.CtrEn to 0
> for a particular CtrID. This interface only disables a counter, so it
> should not need to know about how it was previously used when assign
> is able to reassign, as assign will always reset the arch_mbm data.

Yes. That is correct. We may not need to set BwSrc or BwType for unassign.

But, we need evtid to update the monitor state of the rdtgroup.
> 
> I do not see any harm in the arch_mbm data being stale while the
> counter is unassigned, because the data is not accessed when reading
> the hardware counter fails. In general, resctrl_arch_rmid_read()
> cannot return any information if the hardware counter is not readable
> at the time it is called.

Ok. Let me check about keeping the stale arch_mbm data after unassign.
It may be okay.


> 
>>
>>
>>>
>>> I chose to allow reassigning an assigned monitor without calling
>>> unassign first. This is important when monitors are unassigned and
>>> assigned in a single write to mbm_assign_control, as it allows all
>>> updates to be performed in a single round of parallel IPIs to the
>>> domains.
>>
>> Yes. It is not required to call unassign before assign. Hardware(AMD)
>> supports it.
>>
>> But, we only have 32 counters. We need to know which counter we are going
>> to use for assignment. If all the counters already assigned, then we can't
>> figure out the counter id without calling unassign first. Using the random
>> counter will overwrite the already assigned counter.
> 
> I made the caller of resctrl_arch_assign_monitor() responsible for
> selecting which monitor to assign. As long as the user orders the
> unassign operations before the assign operations in a write to
> mbm_assign_control, the FS code will be able to find an available
> monitor ID.

How does the caller of resctrl_arch_assign_monitor() select the monitor id
(or counter id) if all of them are already assigned?

In this series the monitor ids are allocated using assign_cntrs_alloc().
rdtgroup_assign_abmc() calls assign_cntrs_alloc() to get a monitor id and
reports an error if no free monitor id is available.

The expectation is that the user unassigns an event from another group (or
the same group) before calling assign.

Are you expecting something else here?

> 
> 
>>> I chose to make this a mount option to simplify the management of the
>>> monitor tracking data structures. They are simply allocated at mount
>>> time and deallocated at unmount.
>>
>> Initially I added it as an mount option.
>> Based on our earlier discussion, we decided to use the assign feature by
>> default if hardware supports it. Users don't have to worry about the details.
>>>
>>> I called the option "mon_assign": The mount option parser calls
>>> resctrl_arch_mon_assign_enable() to determine whether the
>>> implementation supports assignment in some form. If it returns an
>>> error, the mount fails. When successful, the assignable monitor count
>>> is made non-zero in the appropriate rdt_resource, triggering the
>>> behavior change in the FS layer.
>>>
>>> I'm still not sure if it's a good idea to enable monitor assignment by
>>> default. This would be a major disruption in the MBM usage model
>>> triggered by moving software between AMD CPU models. I thought the
>>
>> Why will it be a disruption? Why do you think mount option will solve the
>> problem? As always, there will be option to go back to legacy mode. right?
>>
>>> safest option was to disallow creating more monitoring groups than
>>> monitors unless the option is selected. Given that nobody else
>>
>> Current code allows to create more groups, but it will report "Monitor
>> assignment failed" when it runs out of monitors.
> 
> Ok that should be fine then.
> 
> However, I don't think it's necessary to support dynamically changing
> the usage model of monitoring groups without remounting. I believe it
> makes it more difficult for the FS code to generically manage monitor
> assignment.

Are you suggesting to enable ABMC by default when available?

And then provide a mount option to switch back to legacy mode?
I am fine with that if we all agree on that.
-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Peter and Babu,

On 5/2/2024 1:14 PM, Moger, Babu wrote:
> On 5/2/24 12:50, Peter Newman wrote:
>> On Thu, May 2, 2024 at 9:25 AM Moger, Babu <babu.moger@amd.com> wrote:
>>> On 5/1/24 12:48, Peter Newman wrote:
...

>>>> I chose to make this a mount option to simplify the management of the
>>>> monitor tracking data structures. They are simply allocated at mount
>>>> time and deallocated at unmount.
>>>
>>> Initially I added it as an mount option.
>>> Based on our earlier discussion, we decided to use the assign feature by
>>> default if hardware supports it. Users don't have to worry about the details.
>>>>
>>>> I called the option "mon_assign": The mount option parser calls
>>>> resctrl_arch_mon_assign_enable() to determine whether the
>>>> implementation supports assignment in some form. If it returns an
>>>> error, the mount fails. When successful, the assignable monitor count
>>>> is made non-zero in the appropriate rdt_resource, triggering the
>>>> behavior change in the FS layer.
>>>>
>>>> I'm still not sure if it's a good idea to enable monitor assignment by
>>>> default. This would be a major disruption in the MBM usage model
>>>> triggered by moving software between AMD CPU models. I thought the
>>>
>>> Why will it be a disruption? Why do you think mount option will solve the
>>> problem? As always, there will be option to go back to legacy mode. right?
>>>
>>>> safest option was to disallow creating more monitoring groups than
>>>> monitors unless the option is selected. Given that nobody else
>>>
>>> Current code allows to create more groups, but it will report "Monitor
>>> assignment failed" when it runs out of monitors.
>>
>> Ok that should be fine then.
>>
>> However, I don't think it's necessary to support dynamically changing
>> the usage model of monitoring groups without remounting. I believe it
>> makes it more difficult for the FS code to generically manage monitor
>> assignment.
> 
> Are you suggesting to enable ABMC by default when available?

I do think ABMC should be enabled by default when available and it looks
to be what this series aims to do [1]. The way I reason about this is
that legacy user space gets more reliable monitoring behavior without
needing to change behavior.

I thought there was discussion about communicating to user space
when an attempt is made to read data from an event that does not
have a counter assigned. Something like below but I did not notice this
in this series.

# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
Unassigned
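
As a sketch of what I have in mind (hypothetical names, not code from
this series), the read path could distinguish the new state from the
existing "Unavailable" case like this:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical read outcomes; the names are illustrative only. */
enum sketch_read_status {
	SKETCH_READ_OK,
	SKETCH_READ_UNAVAILABLE,	/* existing "Unavailable" case */
	SKETCH_READ_UNASSIGNED,		/* no counter assigned to the event */
};

/* Format the text a reader of mbm_total_bytes would see. */
static int sketch_format_event(char *buf, size_t len,
			       enum sketch_read_status st, uint64_t bytes)
{
	switch (st) {
	case SKETCH_READ_UNASSIGNED:
		return snprintf(buf, len, "Unassigned\n");
	case SKETCH_READ_UNAVAILABLE:
		return snprintf(buf, len, "Unavailable\n");
	default:
		return snprintf(buf, len, "%llu\n",
				(unsigned long long)bytes);
	}
}
```

The point is only that user space can tell "no counter is assigned"
apart from "the hardware could not provide a value".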

> 
> Then provide the mount option switch back to legacy mode?
> I am fine with that if we all agree on that.

Why is a mount option needed? I think we should avoid requiring a remount
unless required and I would like to understand why it is required here.

Peter: could you please elaborate on what you mean when you say it makes
it more difficult for the FS code to generically manage monitor assignment?

Why would user space be required to recreate all control and monitor
groups if wanting to change how memory bandwidth monitoring is done?

From this implementation it has been difficult to understand the impact
of switching between ABMC and legacy.

Reinette

[1] https://lore.kernel.org/lkml/e898059f3c182886b1c16353be7db76d9b852b02.1711674410.git.babu.moger@amd.com/
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 9 months ago
Hi Reinette,

On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter and Babu,
>
> On 5/2/2024 1:14 PM, Moger, Babu wrote:
> > Are you suggesting to enable ABMC by default when available?
>
> I do think ABMC should be enabled by default when available and it looks
> to be what this series aims to do [1]. The way I reason about this is
> that legacy user space gets more reliable monitoring behavior without
> needing to change behavior.

I don't like that, for a monitor assignment-aware user, there will be
fewer monitors available for assignment after new monitoring groups are
created. If the user wants precise control over where monitors are
allocated, they would need to manually unassign the
automatically-assigned monitor after creating new groups.

It's an annoyance, but I'm not sure if it would break any realistic
usage model. Maybe if the monitoring agent operates independently of
whoever creates monitoring groups it could result in brief periods
where fewer monitors than expected are available because whoever just
created a new monitoring group hasn't given the automatically-assigned
monitors back yet.

>
> I thought there was discussion about communicating to user space
> when an attempt is made to read data from an event that does not
> have a counter assigned. Something like below but I did not notice this
> in this series.
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> Unassigned
>
> >
> > Then provide the mount option switch back to legacy mode?
> > I am fine with that if we all agree on that.
>
> Why is a mount option needed? I think we should avoid requiring a remount
> unless required and I would like to understand why it is required here.
>
> Peter: could you please elaborate what you mean with it makes it more
> difficult for the FS code to generically manage monitor assignment?
>
> Why would user space be required to recreate all control and monitor
> groups if wanting to change how memory bandwidth monitoring is done?

I was looking at this more from the perspective of whether it's
necessary to support the live transition of the groups' configuration
back and forth between programming models.  I find it very unlikely
for the userspace controller software to change its mind about the
programming model for monitoring in a running system, so I thought
this would be in the same category as choosing at mount time whether
or not to use CDP or the MBA software controller.

Also, in the software implementation of monitor assignment for older
AMD processors, which is based on allocating a subset of RMIDs, I'm
concerned that the context switch handler would want to read the
monitors associated with the incoming thread's current group to
determine whether it should use one of the tracked RMIDs. I believe it
would be cleaner if the lifetime of the generic monitor-tracking
structures would last until the static branches gating
__resctrl_sched_in() could be disabled.

>
> From this implementation it has been difficult to understand the impact
> of switching between ABMC and legacy.

I'll see if there's a good way to share my software monitor assignment
prototype so it's clearer how the user interface would interact with
diverse implementations. Unfortunately, it's difficult to see the
required abstraction boundaries without the fs/resctrl refactoring
changes[1] applied. It would also require my changes[2] for reading a
thread's RMID from the FS structures to prevent monitor assignments
from forcing an update of all task_structs in the system.

-Peter

[1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
[2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Peter,

On 5/2/2024 5:57 PM, Peter Newman wrote:
> Hi Reinette,
> 
> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter and Babu,
>>
>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>> Are you suggesting to enable ABMC by default when available?
>>
>> I do think ABMC should be enabled by default when available and it looks
>> to be what this series aims to do [1]. The way I reason about this is
>> that legacy user space gets more reliable monitoring behavior without
>> needing to change behavior.
> 
> I don't like that for a monitor assignment-aware user, following the
> creation of new monitoring groups, there will be less monitors
> available for assignment. If the user wants precise control over where
> monitors are allocated, they would need to manually unassign the
> automatically-assigned monitor after creating new groups.
> 
> It's an annoyance, but I'm not sure if it would break any realistic
> usage model. Maybe if the monitoring agent operates independently of
> whoever creates monitoring groups it could result in brief periods
> where less monitors than expected are available because whoever just
> created a new monitoring group hasn't given the automatically-assigned
> monitors back yet.
> 

I will respond in other thread.


>>
>> I thought there was discussion about communicating to user space
>> when an attempt is made to read data from an event that does not
>> have a counter assigned. Something like below but I did not notice this
>> in this series.
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unassigned
>>
>>>
>>> Then provide the mount option switch back to legacy mode?
>>> I am fine with that if we all agree on that.
>>
>> Why is a mount option needed? I think we should avoid requiring a remount
>> unless required and I would like to understand why it is required here.
>>
>> Peter: could you please elaborate what you mean with it makes it more
>> difficult for the FS code to generically manage monitor assignment?
>>
>> Why would user space be required to recreate all control and monitor
>> groups if wanting to change how memory bandwidth monitoring is done?
> 
> I was looking at this more from the perspective of whether it's
> necessary to support the live transition of the groups' configuration
> back and forth between programming models.  I find it very unlikely
> for the userspace controller software to change its mind about the
> programming model for monitoring in a running system, so I thought
> this would be in the same category as choosing at mount time whether
> or not to use CDP or the MBA software controller.

This seems reasonable to me if only considering ABMC and legacy. When
also taking into account soft-RMID it is no longer obvious to me. I do
still have an impression that the soft-RMID solution impacts context switch
duration so I am considering the scenario where user space may want to
use soft-RMID for portions of time to get an idea of workload behavior and
then dynamically move to less accurate measurements to not impact the
workloads all the time.

This case is perhaps more like how user space can dynamically change the
power saving mode based on responsiveness requirements.


> Also, in the software implementation of monitor assignment for older
> AMD processors, which is based on allocating a subset of RMIDs, I'm
> concerned that the context switch handler would want to read the
> monitors associated with the incoming thread's current group to
> determine whether it should use one of the tracked RMIDs. I believe it
> would be cleaner if the lifetime of the generic monitor-tracking
> structures would last until the static branches gating
> __resctrl_sched_in() could be disabled.

Yes, this falls under the umbrella of needing to understand the impact
of switching between mechanisms that is not obvious to me.

> 
>>
>> From this implementation it has been difficult to understand the impact
>> of switching between ABMC and legacy.
> 
> I'll see if there's a good way to share my software monitor assignment
> prototype so it's clearer how the user interface would interact with
> diverse implementations. Unfortunately, it's difficult to see the
> required abstraction boundaries without the fs/resctrl refactoring
> changes[1] applied. It would also require my changes[2] for reading a
> thread's RMID from the FS structures to prevent monitor assignments
> from forcing an update of all task_structs in the system.
> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
> [2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Peter,

On 5/2/2024 7:57 PM, Peter Newman wrote:
> Hi Reinette,
> 
> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter and Babu,
>>
>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>> Are you suggesting to enable ABMC by default when available?
>>
>> I do think ABMC should be enabled by default when available and it looks
>> to be what this series aims to do [1]. The way I reason about this is
>> that legacy user space gets more reliable monitoring behavior without
>> needing to change behavior.
> 
> I don't like that for a monitor assignment-aware user, following the
> creation of new monitoring groups, there will be less monitors
> available for assignment. If the user wants precise control over where
> monitors are allocated, they would need to manually unassign the
> automatically-assigned monitor after creating new groups.
> 
> It's an annoyance, but I'm not sure if it would break any realistic
> usage model. Maybe if the monitoring agent operates independently of

Yes, it is an annoyance.

But if you think about it, normal users don't create too many groups.
They won't have to worry about the assign/unassign headache if we enable
monitor assignment automatically. Also, the pqos tool uses this interface.
It does not have to know about assign/unassign.


> whoever creates monitoring groups it could result in brief periods
> where less monitors than expected are available because whoever just
> created a new monitoring group hasn't given the automatically-assigned
> monitors back yet.
> 
>>
>> I thought there was discussion about communicating to user space
>> when an attempt is made to read data from an event that does not
>> have a counter assigned. Something like below but I did not notice this
>> in this series.
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unassigned
>>
>>>
>>> Then provide the mount option switch back to legacy mode?
>>> I am fine with that if we all agree on that.
>>
>> Why is a mount option needed? I think we should avoid requiring a remount
>> unless required and I would like to understand why it is required here.
>>
>> Peter: could you please elaborate what you mean with it makes it more
>> difficult for the FS code to generically manage monitor assignment?
>>
>> Why would user space be required to recreate all control and monitor
>> groups if wanting to change how memory bandwidth monitoring is done?
> 
> I was looking at this more from the perspective of whether it's
> necessary to support the live transition of the groups' configuration
> back and forth between programming models.  I find it very unlikely
> for the userspace controller software to change its mind about the
> programming model for monitoring in a running system, so I thought
> this would be in the same category as choosing at mount time whether
> or not to use CDP or the MBA software controller.

One good point about the mount option is that we don't create extra files
for monitor assignment in /sys/fs/resctrl when we mount with the legacy
option.

> 
> Also, in the software implementation of monitor assignment for older
> AMD processors, which is based on allocating a subset of RMIDs, I'm
> concerned that the context switch handler would want to read the
> monitors associated with the incoming thread's current group to
> determine whether it should use one of the tracked RMIDs. I believe it
> would be cleaner if the lifetime of the generic monitor-tracking
> structures would last until the static branches gating
> __resctrl_sched_in() could be disabled.
> 
>>
>>  From this implementation it has been difficult to understand the impact
>> of switching between ABMC and legacy.
> 
> I'll see if there's a good way to share my software monitor assignment
> prototype so it's clearer how the user interface would interact with
> diverse implementations. Unfortunately, it's difficult to see the
> required abstraction boundaries without the fs/resctrl refactoring
> changes[1] applied. It would also require my changes[2] for reading a
> thread's RMID from the FS structures to prevent monitor assignments
> from forcing an update of all task_structs in the system.
> 
> -Peter
> 
> [1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
> [2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
> 

-- 
- Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Peter/Reinette,

On 5/3/2024 3:44 PM, Moger, Babu wrote:
> Hi Peter,
> 
> On 5/2/2024 7:57 PM, Peter Newman wrote:
>> Hi Reinette,
>>
>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>>
>>> Hi Peter and Babu,
>>>
>>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>>> Are you suggesting to enable ABMC by default when available?
>>>
>>> I do think ABMC should be enabled by default when available and it looks
>>> to be what this series aims to do [1]. The way I reason about this is
>>> that legacy user space gets more reliable monitoring behavior without
>>> needing to change behavior.
>>
>> I don't like that for a monitor assignment-aware user, following the
>> creation of new monitoring groups, there will be less monitors
>> available for assignment. If the user wants precise control over where
>> monitors are allocated, they would need to manually unassign the
>> automatically-assigned monitor after creating new groups.
>>
>> It's an annoyance, but I'm not sure if it would break any realistic
>> usage model. Maybe if the monitoring agent operates independently of
> 
> Yes. Its annoyance.
> 
> But if you think about it, normal users don't create too many groups.
> They wont have to worry about assign/unassign headache if we enable 
> monitor assignment automatically. Also there is pqos tool which uses 
> this interface. It does not have to know about assign/unassign stuff.
> 
> 
>> whoever creates monitoring groups it could result in brief periods
>> where less monitors than expected are available because whoever just
>> created a new monitoring group hasn't given the automatically-assigned
>> monitors back yet.
>>
>>>
>>> I thought there was discussion about communicating to user space
>>> when an attempt is made to read data from an event that does not
>>> have a counter assigned. Something like below but I did not notice this
>>> in this series.
>>>
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> Unassigned
>>>
>>>>
>>>> Then provide the mount option switch back to legacy mode?
>>>> I am fine with that if we all agree on that.
>>>
>>> Why is a mount option needed? I think we should avoid requiring a 
>>> remount
>>> unless required and I would like to understand why it is required here.
>>>
>>> Peter: could you please elaborate what you mean with it makes it more
>>> difficult for the FS code to generically manage monitor assignment?
>>>
>>> Why would user space be required to recreate all control and monitor
>>> groups if wanting to change how memory bandwidth monitoring is done?
>>
>> I was looking at this more from the perspective of whether it's
>> necessary to support the live transition of the groups' configuration
>> back and forth between programming models.  I find it very unlikely
>> for the userspace controller software to change its mind about the
>> programming model for monitoring in a running system, so I thought
>> this would be in the same category as choosing at mount time whether
>> or not to use CDP or the MBA software controller.
> 
> Good point about the mount option is, we don't create extra files for 
> monitor assignment in /sys/fs/resctrl when we mount with legacy option.

I think we still have not decided on the "mount" option for switching
to legacy monitoring. A mount option seems safe at this point: with it we
don't have to deal with the extra files in the resctrl filesystem that
dynamic switching would require.
> 
>>
>> Also, in the software implementation of monitor assignment for older
>> AMD processors, which is based on allocating a subset of RMIDs, I'm
>> concerned that the context switch handler would want to read the
>> monitors associated with the incoming thread's current group to
>> determine whether it should use one of the tracked RMIDs. I believe it
>> would be cleaner if the lifetime of the generic monitor-tracking
>> structures would last until the static branches gating
>> __resctrl_sched_in() could be disabled.
>>
>>>
>>>  From this implementation it has been difficult to understand the impact
>>> of switching between ABMC and legacy.
>>
>> I'll see if there's a good way to share my software monitor assignment
>> prototype so it's clearer how the user interface would interact with
>> diverse implementations. Unfortunately, it's difficult to see the
>> required abstraction boundaries without the fs/resctrl refactoring
>> changes[1] applied. It would also require my changes[2] for reading a
>> thread's RMID from the FS structures to prevent monitor assignments
>> from forcing an update of all task_structs in the system.
>>
>> -Peter
>>
>> [1] 
>> https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
>> [2] 
>> https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/ 
>>
>>
> 

-- 
- Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago

On 5/9/2024 5:57 PM, Moger, Babu wrote:
> On 5/3/2024 3:44 PM, Moger, Babu wrote:
>> On 5/2/2024 7:57 PM, Peter Newman wrote:
>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>>>> Are you suggesting to enable ABMC by default when available?
>>>>
>>>> I do think ABMC should be enabled by default when available and it looks
>>>> to be what this series aims to do [1]. The way I reason about this is
>>>> that legacy user space gets more reliable monitoring behavior without
>>>> needing to change behavior.
>>>
>>> I don't like that for a monitor assignment-aware user, following the
>>> creation of new monitoring groups, there will be less monitors
>>> available for assignment. If the user wants precise control over where
>>> monitors are allocated, they would need to manually unassign the
>>> automatically-assigned monitor after creating new groups.
>>>
>>> It's an annoyance, but I'm not sure if it would break any realistic
>>> usage model. Maybe if the monitoring agent operates independently of
>>
>> Yes. Its annoyance.
>>
>> But if you think about it, normal users don't create too many groups.
>> They wont have to worry about assign/unassign headache if we enable monitor assignment automatically. Also there is pqos tool which uses this interface. It does not have to know about assign/unassign stuff.
>>
>>
>>> whoever creates monitoring groups it could result in brief periods
>>> where less monitors than expected are available because whoever just
>>> created a new monitoring group hasn't given the automatically-assigned
>>> monitors back yet.
>>>
>>>>
>>>> I thought there was discussion about communicating to user space
>>>> when an attempt is made to read data from an event that does not
>>>> have a counter assigned. Something like below but I did not notice this
>>>> in this series.
>>>>
>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>> Unassigned
>>>>
>>>>>
>>>>> Then provide the mount option switch back to legacy mode?
>>>>> I am fine with that if we all agree on that.
>>>>
>>>> Why is a mount option needed? I think we should avoid requiring a remount
>>>> unless required and I would like to understand why it is required here.
>>>>
>>>> Peter: could you please elaborate what you mean with it makes it more
>>>> difficult for the FS code to generically manage monitor assignment?
>>>>
>>>> Why would user space be required to recreate all control and monitor
>>>> groups if wanting to change how memory bandwidth monitoring is done?
>>>
>>> I was looking at this more from the perspective of whether it's
>>> necessary to support the live transition of the groups' configuration
>>> back and forth between programming models.  I find it very unlikely
>>> for the userspace controller software to change its mind about the
>>> programming model for monitoring in a running system, so I thought
>>> this would be in the same category as choosing at mount time whether
>>> or not to use CDP or the MBA software controller.
>>
>> Good point about the mount option is, we don't create extra files for monitor assignment in /sys/fs/resctrl when we mount with legacy option.
> 
> I think we still have not decided about the "mount" option for
> switching to legacy monitoring. Mount option seems safe at this
> point. 

I have not heard back after sending [1] so I do still believe that users
may want a way to not have soft-RMID running all the time without impacting
monitor and control groups.


> We don't have to deal with extra files in resctrl filesystem
> with dynamic switching.
Reinette

[1] https://lore.kernel.org/lkml/ea56c630-80f4-4564-beb3-2b61e810a558@intel.com/
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 9 months ago
Hi Babu,

On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
>
> Hi Peter,
>
> On 5/2/2024 7:57 PM, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> >> I do think ABMC should be enabled by default when available and it looks
> >> to be what this series aims to do [1]. The way I reason about this is
> >> that legacy user space gets more reliable monitoring behavior without
> >> needing to change behavior.
> >
> > I don't like that for a monitor assignment-aware user, following the
> > creation of new monitoring groups, there will be less monitors
> > available for assignment. If the user wants precise control over where
> > monitors are allocated, they would need to manually unassign the
> > automatically-assigned monitor after creating new groups.
> >
> > It's an annoyance, but I'm not sure if it would break any realistic
> > usage model. Maybe if the monitoring agent operates independently of
>
> Yes. Its annoyance.
>
> But if you think about it, normal users don't create too many groups.
> They wont have to worry about assign/unassign headache if we enable
> monitor assignment automatically. Also there is pqos tool which uses
> this interface. It does not have to know about assign/unassign stuff.

Thinking about this again, I don't think it's much of a concern
because the automatic assignment on mongroup creation behavior can be
trivially disabled using a boolean flag.

-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago
Hi Peter,

On 5/3/2024 2:00 PM, Peter Newman wrote:
> Hi Babu,
> 
> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 5/2/2024 7:57 PM, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>>>> I do think ABMC should be enabled by default when available and it looks
>>>> to be what this series aims to do [1]. The way I reason about this is
>>>> that legacy user space gets more reliable monitoring behavior without
>>>> needing to change behavior.
>>>
>>> I don't like that for a monitor assignment-aware user, following the
>>> creation of new monitoring groups, there will be less monitors
>>> available for assignment. If the user wants precise control over where
>>> monitors are allocated, they would need to manually unassign the
>>> automatically-assigned monitor after creating new groups.
>>>
>>> It's an annoyance, but I'm not sure if it would break any realistic
>>> usage model. Maybe if the monitoring agent operates independently of
>>
>> Yes. Its annoyance.
>>
>> But if you think about it, normal users don't create too many groups.
>> They wont have to worry about assign/unassign headache if we enable
>> monitor assignment automatically. Also there is pqos tool which uses
>> this interface. It does not have to know about assign/unassign stuff.
> 
> Thinking about this again, I don't think it's much of a concern
> because the automatic assignment on mongroup creation behavior can be
> trivially disabled using a boolean flag.

This could be a config option.

Reinette

Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 8 months ago
Hi Reinette, Babu,

On Fri, May 3, 2024 at 2:15 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 5/3/2024 2:00 PM, Peter Newman wrote:
> > Hi Babu,
> >
> > On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 5/2/2024 7:57 PM, Peter Newman wrote:
> >>> Hi Reinette,
> >>>
> >>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> >>>> I do think ABMC should be enabled by default when available and it looks
> >>>> to be what this series aims to do [1]. The way I reason about this is
> >>>> that legacy user space gets more reliable monitoring behavior without
> >>>> needing to change behavior.
> >>>
> >>> I don't like that for a monitor assignment-aware user, following the
> >>> creation of new monitoring groups, there will be less monitors
> >>> available for assignment. If the user wants precise control over where
> >>> monitors are allocated, they would need to manually unassign the
> >>> automatically-assigned monitor after creating new groups.
> >>>
> >>> It's an annoyance, but I'm not sure if it would break any realistic
> >>> usage model. Maybe if the monitoring agent operates independently of
> >>
> >> Yes. Its annoyance.
> >>
> >> But if you think about it, normal users don't create too many groups.
> >> They wont have to worry about assign/unassign headache if we enable
> >> monitor assignment automatically. Also there is pqos tool which uses
> >> this interface. It does not have to know about assign/unassign stuff.
> >
> > Thinking about this again, I don't think it's much of a concern
> > because the automatic assignment on mongroup creation behavior can be
> > trivially disabled using a boolean flag.
>
> This could be a config option.

I'd like to work out the details of this option.

info/L3_MON/mbm_assign_on_mkdir?

boolean (parsed with kstrtobool()), defaulting to true?
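
As a rough illustration of what "parsed with kstrtobool()" would mean for a user writing to such a file, the sketch below mimics a conservative subset of the spellings kstrtobool() is known to accept (y/n, 1/0, on/off, decided by the leading characters only). It is a userspace approximation, not the kernel code; the function name is hypothetical, and real kernels may accept further spellings depending on version.

```python
def parse_bool_like_kstrtobool(text):
    """Approximate kstrtobool(): decide from the leading character(s) only.

    Only a conservative subset of accepted spellings is modelled here;
    this is an illustrative assumption, not the kernel implementation.
    """
    if not text:
        raise ValueError("empty input")
    first = text[0].lower()
    if first in ("y", "1"):
        return True
    if first in ("n", "0"):
        return False
    if first == "o" and len(text) > 1:
        second = text[1].lower()
        if second == "n":   # "on"
            return True
        if second == "f":   # "off"
            return False
    raise ValueError(f"not a boolean: {text!r}")


# A trailing newline (as produced by `echo 0 > file`) does not matter,
# because only the leading characters are inspected.
print(parse_bool_like_kstrtobool("0\n"))
```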

-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 8 months ago
Hi Peter,

On 5/17/24 16:51, Peter Newman wrote:
> Hi Reinette, Babu,
> 
> On Fri, May 3, 2024 at 2:15 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 5/3/2024 2:00 PM, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 5/2/2024 7:57 PM, Peter Newman wrote:
>>>>> Hi Reinette,
>>>>>
>>>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>>>>>> I do think ABMC should be enabled by default when available and it looks
>>>>>> to be what this series aims to do [1]. The way I reason about this is
>>>>>> that legacy user space gets more reliable monitoring behavior without
>>>>>> needing to change behavior.
>>>>>
>>>>> I don't like that for a monitor assignment-aware user, following the
>>>>> creation of new monitoring groups, there will be less monitors
>>>>> available for assignment. If the user wants precise control over where
>>>>> monitors are allocated, they would need to manually unassign the
>>>>> automatically-assigned monitor after creating new groups.
>>>>>
>>>>> It's an annoyance, but I'm not sure if it would break any realistic
>>>>> usage model. Maybe if the monitoring agent operates independently of
>>>>
>>>> Yes. Its annoyance.
>>>>
>>>> But if you think about it, normal users don't create too many groups.
>>>> They wont have to worry about assign/unassign headache if we enable
>>>> monitor assignment automatically. Also there is pqos tool which uses
>>>> this interface. It does not have to know about assign/unassign stuff.
>>>
>>> Thinking about this again, I don't think it's much of a concern
>>> because the automatic assignment on mongroup creation behavior can be
>>> trivially disabled using a boolean flag.
>>
>> This could be a config option.
> 
> I'd like to work out the details of this option.
> 
> info/L3_MON/mbm_assign_on_mkdir?
> 
> boolean (parsed with kstrtobool()), defaulting to true?

I am thinking it is not a big concern. We only have a limited number (32)
of counters. Automatic monitor assignment works only for the first 16
groups (2 counters for each group). When the counters are exhausted, auto
assignment does not work. In your case (with more than 16 groups) the auto
assignment does not work. I feel having a config option is really not
necessary.
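
The counter arithmetic above can be sketched as a small simulation. This is only an illustration, assuming 32 counters and two counters (total and local) consumed per group as stated above; the function name and the group names are hypothetical and not part of the series.

```python
def auto_assign_on_mkdir(group_names, num_counters=32, counters_per_group=2):
    """Simulate automatic counter assignment as groups are created.

    Returns (assigned, unassigned): groups that received counters and
    groups that were created after the counters ran out.
    """
    free = num_counters
    assigned, unassigned = [], []
    for name in group_names:
        if free >= counters_per_group:
            free -= counters_per_group
            assigned.append(name)
        else:
            # No counters left: the group gets no automatic assignment.
            unassigned.append(name)
    return assigned, unassigned


groups = [f"grp{i}" for i in range(20)]
assigned, unassigned = auto_assign_on_mkdir(groups)
print(len(assigned), len(unassigned))  # prints: 16 4
```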

 --
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 8 months ago
Hi Babu,

On Mon, May 20, 2024 at 7:25 AM Moger, Babu <babu.moger@amd.com> wrote:
>
> Hi Peter,
>
> On 5/17/24 16:51, Peter Newman wrote:
> > Hi Reinette, Babu,
> >
> > On Fri, May 3, 2024 at 2:15 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 5/3/2024 2:00 PM, Peter Newman wrote:
> >>> Hi Babu,
> >>>
> >>> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> On 5/2/2024 7:57 PM, Peter Newman wrote:
> >>>>> Hi Reinette,
> >>>>>
> >>>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> >>>>>> I do think ABMC should be enabled by default when available and it looks
> >>>>>> to be what this series aims to do [1]. The way I reason about this is
> >>>>>> that legacy user space gets more reliable monitoring behavior without
> >>>>>> needing to change behavior.
> >>>>>
> >>>>> I don't like that for a monitor assignment-aware user, following the
> >>>>> creation of new monitoring groups, there will be less monitors
> >>>>> available for assignment. If the user wants precise control over where
> >>>>> monitors are allocated, they would need to manually unassign the
> >>>>> automatically-assigned monitor after creating new groups.
> >>>>>
> >>>>> It's an annoyance, but I'm not sure if it would break any realistic
> >>>>> usage model. Maybe if the monitoring agent operates independently of
> >>>>
> >>>> Yes. Its annoyance.
> >>>>
> >>>> But if you think about it, normal users don't create too many groups.
> >>>> They wont have to worry about assign/unassign headache if we enable
> >>>> monitor assignment automatically. Also there is pqos tool which uses
> >>>> this interface. It does not have to know about assign/unassign stuff.
> >>>
> >>> Thinking about this again, I don't think it's much of a concern
> >>> because the automatic assignment on mongroup creation behavior can be
> >>> trivially disabled using a boolean flag.
> >>
> >> This could be a config option.
> >
> > I'd like to work out the details of this option.
> >
> > info/L3_MON/mbm_assign_on_mkdir?
> >
> > boolean (parsed with kstrtobool()), defaulting to true?
>
> I am thinking is not a big concern. We only have limited (32) counters.
> Automatic monitor assignment works only for first 16 groups(2 counters for
> each group). When the counters are exhausted auto assignment does not
> work. In your case(with more than 16 groups) the auto assignment does not
> work. I feel having a config option is really not necessary.

I'm not sure I follow the argument you're trying to make because it
doesn't sound like an argument against adding a config option. What
exactly do you mean by "work" vs "not work"?

Also it doesn't address my original concern about needing to manually
(and non-atomically) undo the auto assignment in order to account for
where the monitors are assigned or ensure that creating a new
monitoring group will succeed.

-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 8 months ago
Hi Peter,

On 5/20/24 11:00, Peter Newman wrote:
> Hi Babu,
> 
> On Mon, May 20, 2024 at 7:25 AM Moger, Babu <babu.moger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 5/17/24 16:51, Peter Newman wrote:
>>> Hi Reinette, Babu,
>>>
>>> On Fri, May 3, 2024 at 2:15 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 5/3/2024 2:00 PM, Peter Newman wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 5/2/2024 7:57 PM, Peter Newman wrote:
>>>>>>> Hi Reinette,
>>>>>>>
>>>>>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>>>>>>>> I do think ABMC should be enabled by default when available and it looks
>>>>>>>> to be what this series aims to do [1]. The way I reason about this is
>>>>>>>> that legacy user space gets more reliable monitoring behavior without
>>>>>>>> needing to change behavior.
>>>>>>>
>>>>>>> I don't like that for a monitor assignment-aware user, following the
>>>>>>> creation of new monitoring groups, there will be less monitors
>>>>>>> available for assignment. If the user wants precise control over where
>>>>>>> monitors are allocated, they would need to manually unassign the
>>>>>>> automatically-assigned monitor after creating new groups.
>>>>>>>
>>>>>>> It's an annoyance, but I'm not sure if it would break any realistic
>>>>>>> usage model. Maybe if the monitoring agent operates independently of
>>>>>>
>>>>>> Yes. Its annoyance.
>>>>>>
>>>>>> But if you think about it, normal users don't create too many groups.
>>>>>> They wont have to worry about assign/unassign headache if we enable
>>>>>> monitor assignment automatically. Also there is pqos tool which uses
>>>>>> this interface. It does not have to know about assign/unassign stuff.
>>>>>
>>>>> Thinking about this again, I don't think it's much of a concern
>>>>> because the automatic assignment on mongroup creation behavior can be
>>>>> trivially disabled using a boolean flag.
>>>>
>>>> This could be a config option.
>>>
>>> I'd like to work out the details of this option.
>>>
>>> info/L3_MON/mbm_assign_on_mkdir?
>>>
>>> boolean (parsed with kstrtobool()), defaulting to true?
>>
>> I am thinking is not a big concern. We only have limited (32) counters.
>> Automatic monitor assignment works only for first 16 groups(2 counters for
>> each group). When the counters are exhausted auto assignment does not
>> work. In your case(with more than 16 groups) the auto assignment does not
>> work. I feel having a config option is really not necessary.
> 
> I'm not sure I follow the argument you're trying to make because it
> doesn't sound like an argument against adding a config option. What
> exactly do you mean by "work" vs "not work"?
> 
> Also it doesn't address my original concern about needing to manually
> (and non-atomically) undo the auto assignment in order to account for
> where the monitors are assigned or ensure that creating a new
> monitoring group will succeed.
> 

Sorry for the confusion.

Auto monitor assignment works only for a small number of groups (15 or
fewer).

After that point the user can create more groups, but auto assignment will
not work because the hw counters are all exhausted. You need to manually
unassign a counter from another group and use that counter for the new
assignment.

I assume that you are dealing with more than 16 groups. In that case, you
have to manually assign/unassign anyway.

Having a config option "info/L3_MON/mbm_assign_on_mkdir" will not be of
much help to you.

-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 1 year, 9 months ago
Hi Babu,

On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
> 
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
> 
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)

A few very general comments from me here, since I'm not very familiar
with this topic...


> # Introduction
> 
> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
> feature only guarantees that RMIDs currently assigned to a processor will
> be tracked by hardware. The counters of any other RMIDs which are no longer
> being tracked will be reset to zero. The MBM event counters return
> "Unavailable" for the RMIDs that are not active.
>     
> Users can create 256 or more monitor groups. But there can be only limited
> number of groups that can give guaranteed monitoring numbers. With ever
> changing configurations there is no way to definitely know which of these
> groups will be active for certain point of time. Users do not have the
> option to monitor a group or set of groups for certain period of time
> without worrying about RMID being reset in between.
>     
> The ABMC feature provides an option to the user to assign an RMID to the
> hardware counter and monitor the bandwidth for a longer duration.
> The assigned RMID will be active until the user unassigns it manually.
> There is no need to worry about counters being reset during this period.
> Additionally, the user can specify a bitmask identifying the specific
> bandwidth types from the given source to track with the counter.
> 
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
> 
> # Linux Implementation
> 
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can assign a maximum
> of 2 ABMC counters per group. User will also have the option to assign only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to unassign an already
> assigned counter to make space for new assignments.
> 
> 
> # Examples
> 
> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> 	[abmc] 
> 	legacy_mbm
> 
> 	Linux kernel detected ABMC feature and it is enabled.
> 
> b. Check how many ABMC counters are available. 
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs 
> 	32
> 
> c. Create few resctrl groups.
> 
> 	# mkdir /sys/fs/resctrl/mon_groups/default_mon1
> 	# mkdir /sys/fs/resctrl/non_defult_group
> 	# mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
> 
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>    to list and modify the group's assignment states.
> 
>    The list follows the following format:

This section looks possibly inconsistent with (e.)

Is (d.) the userspace read format, with (e.) being the format written by
userspace?


>        * Default CTRL_MON group:
>                "//<domain_id>=<assignment_flags>"
> 
>        * Non-default CTRL_MON group:
>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> 
>        * Child MON group of default CTRL_MON group:
>                "/<MON group>/<domain_id>=<assignment_flags>"
> 
>        * Child MON group of non-default CTRL_MON group:
>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> 
>        Assignment flags can be one of the following:
> 
>         t  MBM total event is assigned

With my MPAM hat on this looks a bit weird, although I suppose it
follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
already exposed in resctrlfs.

From an abstract point of view, "total" and "local" are just event
selection criteria, additional to those in mbm_cfg_mask.  The different
way they are treated in the hardware feels like an x86 implementation
detail.

For MPAM we don't currently distinguish local from non-local traffic, so
I guess this just reduces to a simple on-off (i.e., "t" or nothing),
which I guess is tolerable.

This might want more thought if there is an expectation that more
categories will be added here, though (?)

>         l  MBM local event is assigned
>         tl Both total and local MBM events are assigned
>         _  None of the MBM events are assigned

This use of '_' seems unusual.  Can we not just have the empty string
for "nothing assigned"?

Since every assignment is terminated by ';' or end-of-line, I don't
think that there would be any parsing ambiguity (?)
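
To illustrate the point about parsing ambiguity, here is a minimal userspace sketch of a parser for the proposed list format that accepts both '_' and the empty string as "nothing assigned". The function name is hypothetical, and the sketch assumes group names never contain '/', ';' or '='.

```python
def parse_assign_line(line):
    """Parse '<ctrl>/<mon>/<dom>=<flags>;...' into (ctrl, mon, {dom: flags})."""
    # The first two '/' separate the CTRL_MON name, the MON name and the
    # per-domain list; either name may be empty (default group / no child).
    ctrl, mon, domains = line.strip().split("/", 2)
    states = {}
    for item in domains.split(";"):
        if not item:
            # A trailing ';' leaves an empty final item; skip it.
            continue
        dom, flags = item.split("=", 1)
        # Both '_' and the empty string mean "no events assigned".
        states[int(dom)] = set() if flags in ("_", "") else set(flags)
    return ctrl, mon, states


print(parse_assign_line("non_defult_group//0=tl;1=_;2=;"))
```

Because each item is delimited by ';' and split at the first '=', an empty flags field is as easy to recognise as '_'.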

> 
> 	Examples:
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control 
> 	non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 	non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 	//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 	/default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> 
> 	There are four groups and all the groups have local and total event assigned.
> 
> 	"//" - This is a default CONTROL MON group
> 
> 	"non_defult_group//" - This is non default CONTROL MON group
> 
> 	"/default_mon1/"  - This is Child MON group of the defult group
> 
> 	"non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> 
> 	=tl means both total and local events are assigned.
> 
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
> 
> 	The write format is similar to the above list format with addition of
> 	op-code for the assignment operation.

With my resctrl newbie hat on:

It feels a bit complex (for the kernel) to have userspace needing to
write a script into a magic file that we need to parse, specifying
updates to a bunch of controls already visible as objects in resctrlfs
in their own right.

What's the expected use case here?

If userspace really does need to switch lots of events simultaneously
then I guess the overhead of enumerating and poking lots of individual
files might be unacceptable though, and we would still need some global
interfaces for operations such as "unassign everything"...


OTOH, the proposed approach is not so different from the way the
schemata files already work.

> 
>         * Default CTRL_MON group:
>                 "//<domain_id><op-code><assignment_flags>"
> 
>         * Non-default CTRL_MON group:
>                 "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
> 
>         * Child MON group of default CTRL_MON group:
>                 "/<MON group>/<domain_id><op-code><assignment_flags>"
> 
>         * Child MON group of non-default CTRL_MON group:
>                 "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
> 
>        Op-code can be one of the following:
> 
>         = Update the assignment to match the flags
>         + Assign a new state
>         - Unassign a new state
>         _ Unassign all the states

If we adopt "empty string" to mean "no events", then

	<foo>/<bar>/<domain>=

would unassign all events, so '_' would not be needed as a separate
syntax.
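
A small sketch of the op-code semantics as listed in the cover letter, showing that '=' with an empty flag string would give the same result as the separate '_' op-code. The function name is hypothetical and this models only the resulting flag set, not the kernel's parsing or counter allocation.

```python
def apply_op(current, op, flags):
    """Return the new set of assigned events after applying one op-code."""
    current, flags = set(current), set(flags)
    if op == "=":
        return flags            # replace the assignment with exactly `flags`
    if op == "+":
        return current | flags  # assign additional events
    if op == "-":
        return current - flags  # unassign the listed events
    if op == "_":
        return set()            # unassign everything
    raise ValueError(f"unknown op-code: {op!r}")


# '=' with no flags and '_' both leave nothing assigned.
print(apply_op("tl", "=", "") == apply_op("tl", "_", ""))  # prints: True
```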

[...]

Cheers
---Dave
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 9 months ago
Hi Dave,

On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote:
>
> Hi Babu,
>
> On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
> >        Assignment flags can be one of the following:
> >
> >         t  MBM total event is assigned
>
> With my MPAM hat on this looks a bit weird, although I suppose it
> follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
> already exposed in resctrlfs.
>
> From an abstract point of view, "total" and "local" are just event
> selection criteria, additional to those in mbm_cfg_mask.  The different
> way they are treated in the hardware feels like an x86 implementation
> detail.
>
> For MPAM we don't currently distinguish local from non-local traffic, so
> I guess this just reduces to a simple on-off (i.e., "t" or nothing),
> which I guess is tolerable.
>
> This might want more thought if there is an expectation that more
> categories will be added here, though (?)

There should be a path forward whenever we start supporting
user-configured counter classes. I assume the letters a-z will be
enough to cover all the counter classes which could be used at once.

>
> >         l  MBM local event is assigned
> >         tl Both total and local MBM events are assigned
> >         _  None of the MBM events are assigned
>
> This use of '_' seems unusual.  Can we not just have the empty string
> for "nothing assigned"?
>
> Since every assignment is terminated by ';' or end-of-line, I don't
> think that there would be any parsing ambiguity (?)
>
> >
> >       Examples:
> >
> >       # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >       non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >       non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >       //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >       /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >
> >       There are four groups and all the groups have local and total event assigned.
> >
> >       "//" - This is a default CONTROL MON group
> >
> >       "non_defult_group//" - This is non default CONTROL MON group
> >
> >       "/default_mon1/"  - This is Child MON group of the defult group
> >
> >       "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >
> >       =tl means both total and local events are assigned.
> >
> > e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
> >
> >       The write format is similar to the above list format with addition of
> >       op-code for the assignment operation.
>
> With by resctrl newbie hat on:
>
> It feels a bit complex (for the kernel) to have userspace needing to
> write a script into a magic file that we need to parse, specifying
> updates to a bunch of controls already visible as objects in resctrlfs
> in their own right.
>
> What's the expected use case here?

I went over the use case of iterating a small number of monitors over
a much larger number of monitoring groups here:

https://lore.kernel.org/lkml/CALPaoCi=PCWr6U5zYtFPmyaFHU_iqZtZL-LaHC2mYxbETXk3ig@mail.gmail.com/

>
> If userspace really does need to switch lots of events simultaneously
> then I guess the overhead of enumerating and poking lots of individual
> files might be unacceptable though, and we would still need some global
> interfaces for operations such as "unassign everything"...

My main goal is for the number of parallel IPI batches to all the
domains (or write syscalls) to be O(num_rmids / num_monitors) rather
than O(num_rmids * num_monitors) as I need to know how frequently we
can afford to sample the current memory bandwidth of the maximum
number of monitoring groups supported.
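
The batched side of that comparison can be sketched as follows. This is only an illustration under several assumptions: one event ('t') is assigned per group so that a batch covers num_monitors groups, all groups are CTRL_MON groups in a single domain, and one write may carry several newline-separated lines in the proposed "<group>//<domain><op-code><flags>" syntax. The function name is hypothetical.

```python
import math


def sweep_batches(groups, num_monitors, domain=0):
    """Yield one write payload per batch while sweeping counters over groups.

    Each payload unassigns the previous batch of groups and assigns the
    next one, so a full sweep needs ceil(len(groups) / num_monitors) writes.
    """
    previous = []
    for start in range(0, len(groups), num_monitors):
        batch = groups[start:start + num_monitors]
        lines = [f"{g}//{domain}-t" for g in previous]
        lines += [f"{g}//{domain}+t" for g in batch]
        previous = batch
        yield "\n".join(lines) + "\n"


groups = [f"g{i}" for i in range(256)]
payloads = list(sweep_batches(groups, num_monitors=32))
print(len(payloads), math.ceil(len(groups) / 32))  # prints: 8 8
```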

Thanks!
-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 1 year, 9 months ago
Hi Peter,

On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote:
> Hi Dave,
> 
> On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote:
> >
> > Hi Babu,
> >
> > On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
> > >        Assignment flags can be one of the following:
> > >
> > >         t  MBM total event is assigned
> >
> > With my MPAM hat on this looks a bit weird, although I suppose it
> > follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
> > already exposed in resctrlfs.
> >
> > From an abstract point of view, "total" and "local" are just event
> > selection criteria, additional to those in mbm_cfg_mask.  The different
> > way they are treated in the hardware feels like an x86 implementation
> > detail.
> >
> > For MPAM we don't currently distinguish local from non-local traffic, so
> > I guess this just reduces to a simple on-off (i.e., "t" or nothing),
> > which I guess is tolerable.
> >
> > This might want more thought if there is an expectation that more
> > categories will be added here, though (?)
> 
> There should be a path forward whenever we start supporting
> user-configured counter classes. I assume the letters a-z will be
> enough to cover all the counter classes which could be used at once.

Ack, though I'd appreciate a response on the point about "_" below in
case people missed it.

> 
> >
> > >         l  MBM local event is assigned
> > >         tl Both total and local MBM events are assigned
> > >         _  None of the MBM events are assigned
> >
> > This use of '_' seems unusual.  Can we not just have the empty string
> > for "nothing assigned"?
> >
> > Since every assignment is terminated by ';' or end-of-line, I don't
> > think that there would be any parsing ambiguity (?)
> >
> > >
> > >       Examples:
> > >
> > >       # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> > >       non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >       non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >       //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >       /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> > >
> > >       There are four groups and all the groups have local and total event assigned.
> > >
> > >       "//" - This is a default CONTROL MON group
> > >
> > >       "non_defult_group//" - This is non default CONTROL MON group
> > >
> > >       "/default_mon1/"  - This is Child MON group of the defult group
> > >
> > >       "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> > >
> > >       =tl means both total and local events are assigned.
> > >
> > > e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
> > >
> > >       The write format is similar to the above list format with addition of
> > >       op-code for the assignment operation.
> >
> > With by resctrl newbie hat on:
> >
> > It feels a bit complex (for the kernel) to have userspace needing to
> > write a script into a magic file that we need to parse, specifying
> > updates to a bunch of controls already visible as objects in resctrlfs
> > in their own right.
> >
> > What's the expected use case here?
> 
> I went over the use case of iterating a small number of monitors over
> a much larger number of monitoring groups here:
> 
> https://lore.kernel.org/lkml/CALPaoCi=PCWr6U5zYtFPmyaFHU_iqZtZL-LaHC2mYxbETXk3ig@mail.gmail.com/
> 
> >
> > If userspace really does need to switch lots of events simultaneously
> > then I guess the overhead of enumerating and poking lots of individual
> > files might be unacceptable though, and we would still need some global
> > interfaces for operations such as "unassign everything"...
> 
> My main goal is for the number of parallel IPI batches to all the
> domains (or write syscalls) to be O(num_rmids / num_monitors) rather
> than O(num_rmids * num_monitors) as I need to know how frequently we
> can afford to sample the current memory bandwidth of the maximum
> number of monitoring groups supported.

Fair enough; I wasn't fully aware of the background discussions.

Cheers
---Dave
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Dave,

On 4/23/24 07:38, Dave Martin wrote:
> Hi Peter,
> 
> On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote:
>> Hi Dave,
>>
>> On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote:
>>>
>>> Hi Babu,
>>>
>>> On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
>>>>        Assignment flags can be one of the following:
>>>>
>>>>         t  MBM total event is assigned
>>>
>>> With my MPAM hat on this looks a bit weird, although I suppose it
>>> follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
>>> already exposed in resctrlfs.
>>>
>>> From an abstract point of view, "total" and "local" are just event
>>> selection criteria, additional to those in mbm_cfg_mask.  The different
>>> way they are treated in the hardware feels like an x86 implementation
>>> detail.
>>>
>>> For MPAM we don't currently distinguish local from non-local traffic, so
>>> I guess this just reduces to a simple on-off (i.e., "t" or nothing),
>>> which I guess is tolerable.
>>>
>>> This might want more thought if there is an expectation that more
>>> categories will be added here, though (?)
>>
>> There should be a path forward whenever we start supporting
>> user-configured counter classes. I assume the letters a-z will be
>> enough to cover all the counter classes which could be used at once.
> 
> Ack, though I'd appreciate a response on the point about "_" below in
> case people missed it.

It was based on the dynamic debug interface and also Reinette's suggestion
as well.
https://www.kernel.org/doc/html/v4.10/admin-guide/dynamic-debug-howto.html
(Look for "No flags are set").

We tried to use that similar interface.
-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 1 year, 9 months ago
Hi Babu,

On Tue, Apr 23, 2024 at 10:43:25AM -0500, Moger, Babu wrote:
> Hi Dave,
> 
> On 4/23/24 07:38, Dave Martin wrote:
> > Hi Peter,
> > 
> > On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote:
> >> Hi Dave,
> >>
> >> On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote:
> >>>
> >>> Hi Babu,
> >>>
> >>> On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
> >>>>        Assignment flags can be one of the following:
> >>>>
> >>>>         t  MBM total event is assigned
> >>>
> >>> With my MPAM hat on this looks a bit weird, although I suppose it
> >>> follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
> >>> already exposed in resctrlfs.
> >>>
> >>> From an abstract point of view, "total" and "local" are just event
> >>> selection criteria, additional to those in mbm_cfg_mask.  The different
> >>> way they are treated in the hardware feels like an x86 implementation
> >>> detail.
> >>>
> >>> For MPAM we don't currently distinguish local from non-local traffic, so
> >>> I guess this just reduces to a simple on-off (i.e., "t" or nothing),
> >>> which I guess is tolerable.
> >>>
> >>> This might want more thought if there is an expectation that more
> >>> categories will be added here, though (?)
> >>
> >> There should be a path forward whenever we start supporting
> >> user-configured counter classes. I assume the letters a-z will be
> >> enough to cover all the counter classes which could be used at once.
> > 
> > Ack, though I'd appreciate a response on the point about "_" below in
> > case people missed it.
> 
> It was based on the dynamic debug interface and also Reinette's suggestion
> as well.
> https://www.kernel.org/doc/html/v4.10/admin-guide/dynamic-debug-howto.html
> (Look for "No flags are set").
> 
> We tried to use that similar interface.

Fair enough; I haven't touched dynamic debug for quite a while and had
forgotten about this convention being used there.

Apologies for the noise on that!

Cheers
---Dave
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Peter Newman 1 year, 10 months ago
Hi Babu,

On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>    The list follows the following format:
>
>        * Default CTRL_MON group:
>                "//<domain_id>=<assignment_flags>"
>
>        * Non-default CTRL_MON group:
>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
>        * Child MON group of default CTRL_MON group:
>                "/<MON group>/<domain_id>=<assignment_flags>"
>
>        * Child MON group of non-default CTRL_MON group:
>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
>        Assignment flags can be one of the following:
>
>         t  MBM total event is assigned
>         l  MBM local event is assigned
>         tl Both total and local MBM events are assigned
>         _  None of the MBM events are assigned
>
>         Examples:
>
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>
>         There are four groups and all the groups have local and total event assigned.
>
>         "//" - This is a default CONTROL MON group
>
>         "non_defult_group//" - This is non default CONTROL MON group
>
>         "/default_mon1/"  - This is Child MON group of the defult group
>
>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>
>         =tl means both total and local events are assigned.

I recall there was supposed to be a way to perform the same update on
all domains together so that it isn't tedious to not do per-domain
customizations. (And also to avoid serializing programming all the
domains the same way.)


>
>  .../admin-guide/kernel-parameters.txt         |   2 +-
>  Documentation/arch/x86/resctrl.rst            | 144 ++++
>  arch/x86/include/asm/cpufeatures.h            |   1 +
>  arch/x86/include/asm/msr-index.h              |   2 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
>  arch/x86/kernel/cpu/resctrl/core.c            |  25 +-
>  arch/x86/kernel/cpu/resctrl/internal.h        |  56 +-
>  arch/x86/kernel/cpu/resctrl/monitor.c         |  24 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 714 +++++++++++++++++-
>  arch/x86/kernel/cpu/scattered.c               |   1 +
>  include/linux/resctrl.h                       |  12 +
>  11 files changed, 964 insertions(+), 20 deletions(-)
>
> --
> 2.34.1
>

This should be fine for me to get started with. I'll see if I can work
backwards from the patches adding the parsing code to see how I'll
work the software implementation in.

Thanks!
-Peter
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 10 months ago
Hi Peter,


On 4/4/24 14:08, Peter Newman wrote:
> Hi Babu,
> 
> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>    The list follows the following format:
>>
>>        * Default CTRL_MON group:
>>                "//<domain_id>=<assignment_flags>"
>>
>>        * Non-default CTRL_MON group:
>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>>        * Child MON group of default CTRL_MON group:
>>                "/<MON group>/<domain_id>=<assignment_flags>"
>>
>>        * Child MON group of non-default CTRL_MON group:
>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>>        Assignment flags can be one of the following:
>>
>>         t  MBM total event is assigned
>>         l  MBM local event is assigned
>>         tl Both total and local MBM events are assigned
>>         _  None of the MBM events are assigned
>>
>>         Examples:
>>
>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>
>>         There are four groups and all the groups have local and total event assigned.
>>
>>         "//" - This is a default CONTROL MON group
>>
>>         "non_defult_group//" - This is non default CONTROL MON group
>>
>>         "/default_mon1/"  - This is Child MON group of the defult group
>>
>>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>
>>         =tl means both total and local events are assigned.
> 
> I recall there was supposed to be a way to perform the same update on
> all domains together so that it isn't tedious to not do per-domain

Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.

Example:

Initial list:
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_def_ctrl_mon_grep//0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_;
//0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_;

Two groups and no events assigned.


To assign the total event on all the domains, the command will look like this:

$ echo "//=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Parsing becomes ugly here. I look for the domain number after the name. Now I
have to add some ugly checks there.


I also thought about something like this:

$ echo "//FFFF=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

FFFF means all the domains. But there could also be a domain numbered FFFF.

So, I dropped the idea.
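
For illustration only, the "no domain means all domains" form can be modelled
in userspace roughly as below. This is a sketch under stated assumptions, not
the parser from this series: the names parse_flags() and parse_assign_line()
are hypothetical, domain ids are assumed to be decimal, and group names are
assumed not to contain "/".

```python
# Hypothetical userspace model of one mbm_assign_control line.
# It is NOT the kernel parser; names and error handling are assumptions.

def parse_flags(text):
    """Map 't', 'l', 'tl' or '_' to the set of assigned events."""
    if text == "_":
        return set()
    events = set()
    for ch in text:
        if ch == "t":
            events.add("total")
        elif ch == "l":
            events.add("local")
        else:
            raise ValueError(f"unknown assignment flag {ch!r}")
    if not events:
        raise ValueError("empty assignment flags")
    return events


def parse_assign_line(line, domain_ids):
    """Return (ctrl_group, mon_group, {domain_id: events}) for one line.

    An empty domain field ("//=t") is treated as "all domains".
    """
    # "<CTRL_MON group>/<MON group>/<domain>=<flags>;..."; either name may be empty.
    ctrl, mon, rest = line.strip().split("/", 2)
    assignments = {}
    for token in rest.split(";"):
        if not token:
            continue  # tolerate the trailing ';' seen in the read-side output
        dom, sep, flags = token.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {token!r}")
        events = parse_flags(flags)
        if dom == "":
            # "no domain" form: apply the same flags to every known domain
            for d in domain_ids:
                assignments[d] = set(events)
        else:
            d = int(dom)  # decimal only, so a string such as "FFFF" is rejected
            if d not in domain_ids:
                raise ValueError(f"unknown domain {d}")
            assignments[d] = events
    return ctrl, mon, assignments
```

With this model, parse_assign_line("//=t", range(8)) selects the default
CTRL_MON group and assigns the total event in domains 0 to 7. The extra branch
for the empty domain field is the kind of special case referred to above.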


> customizations. (And also to avoid serializing programming all the
> domains the same way.)

One more thing with respect to domains:

This series updates all the domains when an assignment is requested,
which makes it easy to implement.

For example:

$ echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

This command will assign the total event on all the domains of the default
group even though the user passed only domain 0.

I am looking at supporting domain-specific assignment right now.
If your use case is specific to each domain, then I can add that support
in the next revision.

> 
> 
>>
>>  .../admin-guide/kernel-parameters.txt         |   2 +-
>>  Documentation/arch/x86/resctrl.rst            | 144 ++++
>>  arch/x86/include/asm/cpufeatures.h            |   1 +
>>  arch/x86/include/asm/msr-index.h              |   2 +
>>  arch/x86/kernel/cpu/cpuid-deps.c              |   3 +
>>  arch/x86/kernel/cpu/resctrl/core.c            |  25 +-
>>  arch/x86/kernel/cpu/resctrl/internal.h        |  56 +-
>>  arch/x86/kernel/cpu/resctrl/monitor.c         |  24 +-
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 714 +++++++++++++++++-
>>  arch/x86/kernel/cpu/scattered.c               |   1 +
>>  include/linux/resctrl.h                       |  12 +
>>  11 files changed, 964 insertions(+), 20 deletions(-)
>>
>> --
>> 2.34.1
>>
> 
> This should be fine for me to get started with. I'll see if I can work
> backwards from the patches adding the parsing code to see how I'll
> work the software implementation in.
> 
> Thanks!
> -Peter

-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 1 year, 9 months ago
Hi Babu,

On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
> Hi Peter,
> 
> 
> On 4/4/24 14:08, Peter Newman wrote:
> > Hi Babu,
> > 
> > On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
> >>    The list follows the following format:
> >>
> >>        * Default CTRL_MON group:
> >>                "//<domain_id>=<assignment_flags>"
> >>
> >>        * Non-default CTRL_MON group:
> >>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> >>
> >>        * Child MON group of default CTRL_MON group:
> >>                "/<MON group>/<domain_id>=<assignment_flags>"
> >>
> >>        * Child MON group of non-default CTRL_MON group:
> >>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> >>
> >>        Assignment flags can be one of the following:
> >>
> >>         t  MBM total event is assigned
> >>         l  MBM local event is assigned
> >>         tl Both total and local MBM events are assigned
> >>         _  None of the MBM events are assigned
> >>
> >>         Examples:
> >>
> >>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>
> >>         There are four groups and all the groups have local and total event assigned.
> >>
> >>         "//" - This is a default CONTROL MON group
> >>
> >>         "non_defult_group//" - This is non default CONTROL MON group
> >>
> >>         "/default_mon1/"  - This is Child MON group of the defult group
> >>
> >>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >>
> >>         =tl means both total and local events are assigned.
> > 
> > I recall there was supposed to be a way to perform the same update on
> > all domains together so that it isn't tedious to not do per-domain
> 
> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.

Would "*" be more intuitive?

Whatever is done here to describe the "wildcard node", would it be worth
having the node field parse the same way in the "schemata" files?

Is there any merit in having range match expressions, e.g. something like

	0-3,8-11=foo;4-7,12-*=bar

(The latter is obvious feature creep though, so a real use case for this
would be needed to justify it.  I don't have one right now...)

[...]

Cheers
---Dave
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Dave,

On 4/22/24 11:34, Dave Martin wrote:
> Hi Babu,
> 
> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
>> Hi Peter,
>>
>>
>> On 4/4/24 14:08, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>    The list follows the following format:
>>>>
>>>>        * Default CTRL_MON group:
>>>>                "//<domain_id>=<assignment_flags>"
>>>>
>>>>        * Non-default CTRL_MON group:
>>>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>>>
>>>>        * Child MON group of default CTRL_MON group:
>>>>                "/<MON group>/<domain_id>=<assignment_flags>"
>>>>
>>>>        * Child MON group of non-default CTRL_MON group:
>>>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>>>
>>>>        Assignment flags can be one of the following:
>>>>
>>>>         t  MBM total event is assigned
>>>>         l  MBM local event is assigned
>>>>         tl Both total and local MBM events are assigned
>>>>         _  None of the MBM events are assigned
>>>>
>>>>         Examples:
>>>>
>>>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>
>>>>         There are four groups and all the groups have local and total event assigned.
>>>>
>>>>         "//" - This is a default CONTROL MON group
>>>>
>>>>         "non_defult_group//" - This is non default CONTROL MON group
>>>>
>>>>         "/default_mon1/"  - This is Child MON group of the defult group
>>>>
>>>>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>>>
>>>>         =tl means both total and local events are assigned.
>>>
>>> I recall there was supposed to be a way to perform the same update on
>>> all domains together so that it isn't tedious to not do per-domain
>>
>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
> 
> Would "*" be more intuitive?

We could. But I don't see the need for wildcard ("*") or ranges and
complexity that comes with that.

Even in schemata processing we don't use the wildcard or ranges and also
there is no mention of that in documentation.
https://www.kernel.org/doc/Documentation/x86/resctrl.rst

Domains(or nodes) are processed one by one. Some examples.

 # cat schemata
    SMBA:0=2048;1=2048;2=2048;3=2048
      MB:0=2048;1=2048;2=2048;3=2048
      L3:0=ffff;1=ffff;2=ffff;3=ffff

  # echo "SMBA:1=64" > schemata
  # cat schemata
    SMBA:0=2048;1=  64;2=2048;3=2048
      MB:0=2048;1=2048;2=2048;3=2048
      L3:0=ffff;1=ffff;2=ffff;3=ffff



> 
> Whatever is done here to describe the "wildcard node", would it be worth
> having the node field parse the same way in the "schemata" files?
> 
> Is there any merit in having range match expressions, e.g. something like
> 
> 	0-3,8-11=foo;4-7,12-*=bar
> 
> (The latter is obvious feature creep though, so a real use case for this
> would be needed to justify it.  I don't have one right now...)
> 
> [...]
> 
> Cheers
> ---Dave

-- 
Thanks
Babu Moger
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 1 year, 9 months ago
On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
> Hi Dave,
> 
> On 4/22/24 11:34, Dave Martin wrote:
> > Hi Babu,
> > 
> > On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
> >> Hi Peter,
> >>
> >>
> >> On 4/4/24 14:08, Peter Newman wrote:
> >>> Hi Babu,
> >>>
> >>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
> >>>>    The list follows the following format:
> >>>>
> >>>>        * Default CTRL_MON group:
> >>>>                "//<domain_id>=<assignment_flags>"
> >>>>
> >>>>        * Non-default CTRL_MON group:
> >>>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> >>>>
> >>>>        * Child MON group of default CTRL_MON group:
> >>>>                "/<MON group>/<domain_id>=<assignment_flags>"
> >>>>
> >>>>        * Child MON group of non-default CTRL_MON group:
> >>>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> >>>>
> >>>>        Assignment flags can be one of the following:
> >>>>
> >>>>         t  MBM total event is assigned
> >>>>         l  MBM local event is assigned
> >>>>         tl Both total and local MBM events are assigned
> >>>>         _  None of the MBM events are assigned
> >>>>
> >>>>         Examples:
> >>>>
> >>>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>
> >>>>         There are four groups and all the groups have local and total event assigned.
> >>>>
> >>>>         "//" - This is a default CONTROL MON group
> >>>>
> >>>>         "non_defult_group//" - This is non default CONTROL MON group
> >>>>
> >>>>         "/default_mon1/"  - This is Child MON group of the defult group
> >>>>
> >>>>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >>>>
> >>>>         =tl means both total and local events are assigned.
> >>>
> >>> I recall there was supposed to be a way to perform the same update on
> >>> all domains together so that it isn't tedious to not do per-domain
> >>
> >> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
> > 
> > Would "*" be more intuitive?
> 
> We could. But I don't see the need for wildcard ("*") or ranges and
> complexity that comes with that.

For "*", I mean that this would just stand for "all cpus", not a generic
string match; apologies if I didn't make that clear.

I think that an explicit "*" is still a less surprising way to say
"everything" than "" (which if it means anything at all, usually means
"nothing").

I may have misunderstood the intention here: _if_ the intention is to
provide a way to enable/disable an event in all domains without having
to enumerate them all one by one, then I think "*" is preferable syntax
to "".  That was my only real suggestion here.

> 
> Even in schemata processing we don't use the wildcard or ranges and also
> there is no mention of that in documentation.
> https://www.kernel.org/doc/Documentation/x86/resctrl.rst

I know, though writing the schemata files can be tedious and annoying,
since their content is often very repetitive, so ...

> 
> Domains(or nodes) are processed one by one. Some examples.
> 
>  # cat schemata
>     SMBA:0=2048;1=2048;2=2048;3=2048
>       MB:0=2048;1=2048;2=2048;3=2048
>       L3:0=ffff;1=ffff;2=ffff;3=ffff
> 
>   # echo "SMBA:1=64" > schemata
>   # cat schemata
>     SMBA:0=2048;1=  64;2=2048;3=2048
>       MB:0=2048;1=2048;2=2048;3=2048
>       L3:0=ffff;1=ffff;2=ffff;3=ffff

... it would be convenient to be able to do something like

# echo "SMBA:*=64" >schemata
# grep SMBA: schemata
SMBA:0=  64;1=  64;2=  64;3=  64

Anyway, this is nothing directly to do with this series; just a
thought.


> > Whatever is done here to describe the "wildcard node", would it be worth
> > having the node field parse the same way in the "schemata" files?
> > 
> > Is there any merit in having range match expressions, e.g. something like
> > 
> > 	0-3,8-11=foo;4-7,12-*=bar
> > 
> > (The latter is obvious feature creep though, so a real use case for this
> > would be needed to justify it.  I don't have one right now...)

[...]

> Thanks
> Babu Moger

I do agree that unless someone jumps up and down saying this would
help their use case, this is probably a step too far.

Just thinking aloud (and this kind of feature could be added later in a
backwards compatible way if someone really needs it).

Cheers
---Dave
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Reinette Chatre 1 year, 9 months ago

On 4/23/2024 5:37 AM, Dave Martin wrote:
> On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 4/22/24 11:34, Dave Martin wrote:
>>> Hi Babu,
>>>
>>> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
>>>> Hi Peter,
>>>>
>>>>
>>>> On 4/4/24 14:08, Peter Newman wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>    The list follows the following format:
>>>>>>
>>>>>>        * Default CTRL_MON group:
>>>>>>                "//<domain_id>=<assignment_flags>"
>>>>>>
>>>>>>        * Non-default CTRL_MON group:
>>>>>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>>>>>
>>>>>>        * Child MON group of default CTRL_MON group:
>>>>>>                "/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>
>>>>>>        * Child MON group of non-default CTRL_MON group:
>>>>>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>
>>>>>>        Assignment flags can be one of the following:
>>>>>>
>>>>>>         t  MBM total event is assigned
>>>>>>         l  MBM local event is assigned
>>>>>>         tl Both total and local MBM events are assigned
>>>>>>         _  None of the MBM events are assigned
>>>>>>
>>>>>>         Examples:
>>>>>>
>>>>>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>
>>>>>>         There are four groups and all the groups have local and total event assigned.
>>>>>>
>>>>>>         "//" - This is a default CONTROL MON group
>>>>>>
>>>>>>         "non_defult_group//" - This is non default CONTROL MON group
>>>>>>
>>>>>>         "/default_mon1/"  - This is Child MON group of the defult group
>>>>>>
>>>>>>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>>>>>
>>>>>>         =tl means both total and local events are assigned.
>>>>>
>>>>> I recall there was supposed to be a way to perform the same update on
>>>>> all domains together so that it isn't tedious to not do per-domain
>>>>
>>>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
>>>
>>> Would "*" be more intuitive?
>>
>> We could. But I don't see the need for wildcard ("*") or ranges and
>> complexity that comes with that.
> 
> For "*", I mean that this would just stand for "all cpus", not a generic
> string match; apologies if I didn't make that clear.

(reading this by replacing "all cpus" with "all domains")

This sounds reasonable to me. It may indeed make the parsing simpler by
not needing the ugly checks Babu mentioned in [1].

Reinette

[1] https://lore.kernel.org/lkml/7ccd59b8-9fe3-4d1f-82f5-f33d96dbf5ac@amd.com/
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Dave Martin 1 year, 9 months ago
On Tue, Apr 23, 2024 at 09:15:07PM -0700, Reinette Chatre wrote:
> 
> 
> On 4/23/2024 5:37 AM, Dave Martin wrote:
> > On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
> >> Hi Dave,
> >>
> >> On 4/22/24 11:34, Dave Martin wrote:
> >>> Hi Babu,
> >>>
> >>> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
> >>>> Hi Peter,
> >>>>
> >>>>
> >>>> On 4/4/24 14:08, Peter Newman wrote:
> >>>>> Hi Babu,
> >>>>>
> >>>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>    The list follows the following format:
> >>>>>>
> >>>>>>        * Default CTRL_MON group:
> >>>>>>                "//<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>>        * Non-default CTRL_MON group:
> >>>>>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>>        * Child MON group of default CTRL_MON group:
> >>>>>>                "/<MON group>/<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>>        * Child MON group of non-default CTRL_MON group:
> >>>>>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>>        Assignment flags can be one of the following:
> >>>>>>
> >>>>>>         t  MBM total event is assigned
> >>>>>>         l  MBM local event is assigned
> >>>>>>         tl Both total and local MBM events are assigned
> >>>>>>         _  None of the MBM events are assigned
> >>>>>>
> >>>>>>         Examples:
> >>>>>>
> >>>>>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>>
> >>>>>>         There are four groups and all the groups have local and total event assigned.
> >>>>>>
> >>>>>>         "//" - This is a default CONTROL MON group
> >>>>>>
> >>>>>>         "non_defult_group//" - This is non default CONTROL MON group
> >>>>>>
> >>>>>>         "/default_mon1/"  - This is Child MON group of the defult group
> >>>>>>
> >>>>>>         "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >>>>>>
> >>>>>>         =tl means both total and local events are assigned.
> >>>>>
> >>>>> I recall there was supposed to be a way to perform the same update on
> >>>>> all domains together so that it isn't tedious to not do per-domain
> >>>>
> >>>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
> >>>
> >>> Would "*" be more intuitive?
> >>
> >> We could. But I don't see the need for wildcard ("*") or ranges and
> >> complexity that comes with that.
> > 
> > For "*", I mean that this would just stand for "all cpus", not a generic
> > string match; apologies if I didn't make that clear.
> 
> (reading this by replacing "all cpus" with "all domains")
> 
> This sounds reasonable to me. It may indeed make the parsing simpler by
> not needing the ugly checks Babu mentioned in [1].
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/7ccd59b8-9fe3-4d1f-82f5-f33d96dbf5ac@amd.com/

Ack, I meant "all domains", sorry!

Note, should we try to detect things like:

	<resource>:0=fee;1=fie;*=foe;0=fum

..?

Either we treat conflicting assignments as an error, or we do them all
in the order specified, so that assignments on the right override those
on the left (which is what the schemata parsing in ctrlmondata.c:
parse_line() seems to do today if I understand the code correctly).

In the latter case,

	<resource>:*=fee;1=fie

would set all nodes except 1 to "fee", and node 1 to "fie", which might
be useful (or at least, convenient).

If we're worried about that being exposed as ABI and used by userspace,
we might want to disallow it explicitly.
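
To make the two options concrete, here is a small userspace sketch. It is an
illustration under assumptions, not the resctrl code: the helper name
apply_domain_assignments() and its "strict" switch are hypothetical. By default
it applies assignments left to right so that later entries override earlier
ones, with "*" standing for every domain; with strict=True a repeated domain is
rejected instead.

```python
# Hypothetical model of "<domain>=<value>;..." handling with a "*" wildcard.
# Not kernel code; the function name and the "strict" switch are assumptions.

def apply_domain_assignments(spec, domain_ids, strict=False):
    """Apply assignments left to right; '*' means every known domain.

    strict=False: later assignments override earlier ones.
    strict=True:  assigning the same domain twice raises ValueError.
    """
    result = {}
    for token in filter(None, spec.split(";")):
        dom, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {token!r}")
        targets = list(domain_ids) if dom == "*" else [int(dom)]
        for d in targets:
            if d not in domain_ids:
                raise ValueError(f"unknown domain {d}")
            if strict and d in result:
                raise ValueError(f"conflicting assignment for domain {d}")
            result[d] = value
    return result
```

In the override mode, "*=fee;1=fie" over domains 0 to 3 leaves domain 1 at
"fie" and every other domain at "fee". In strict mode the same string is
rejected, because "*" has already covered domain 1.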

Cheers
---Dave
Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Posted by Moger, Babu 1 year, 9 months ago
Hi Dave,

On 4/24/24 09:16, Dave Martin wrote:
> On Tue, Apr 23, 2024 at 09:15:07PM -0700, Reinette Chatre wrote:
>>
>>
>> On 4/23/2024 5:37 AM, Dave Martin wrote:
>>> On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
>>>> Hi Dave,
>>>>
>>>> On 4/22/24 11:34, Dave Martin wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>>
>>>>>> On 4/4/24 14:08, Peter Newman wrote:
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>>    The list follows the following format:
>>>>>>>>
>>>>>>>>        * Default CTRL_MON group:
>>>>>>>>                "//<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>>        * Non-default CTRL_MON group:
>>>>>>>>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>>        * Child MON group of default CTRL_MON group:
>>>>>>>>                "/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>>        * Child MON group of non-default CTRL_MON group:
>>>>>>>>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>>        Assignment flags can be one of the following:
>>>>>>>>
>>>>>>>>         t  MBM total event is assigned
>>>>>>>>         l  MBM local event is assigned
>>>>>>>>         tl Both total and local MBM events are assigned
>>>>>>>>         _  None of the MBM events are assigned
>>>>>>>>
>>>>>>>>         Examples:
>>>>>>>>
>>>>>>>>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>         non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>>         non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>>         //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>>         /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>>
>>>>>>>>         There are four groups, and all of them have both the local and total events assigned.
>>>>>>>>
>>>>>>>>         "//" - This is the default CTRL_MON group
>>>>>>>>
>>>>>>>>         "non_defult_group//" - This is a non-default CTRL_MON group
>>>>>>>>
>>>>>>>>         "/default_mon1/" - This is a child MON group of the default CTRL_MON group
>>>>>>>>
>>>>>>>>         "non_defult_group/non_default_mon1/" - This is a child MON group of the non-default CTRL_MON group
>>>>>>>>
>>>>>>>>         =tl means both total and local events are assigned.
>>>>>>>
>>>>>>> I recall there was supposed to be a way to perform the same update on
>>>>>>> all domains together so that it isn't tedious to not do per-domain
>>>>>>
>>>>>> Yes. Correct. Reinette suggested that "no domains" should mean ALL the domains.
>>>>>
>>>>> Would "*" be more intuitive?
>>>>
>>>> We could. But I don't see the need for a wildcard ("*") or ranges, or
>>>> for the complexity that comes with them.
>>>
>>> For "*", I mean that this would just stand for "all cpus", not a generic
>>> string match; apologies if I didn't make that clear.
>>
>> (reading this by replacing "all cpus" with "all domains")
>>
>> This sounds reasonable to me. It may indeed make the parsing simpler by
>> not needing the ugly checks Babu mentioned in [1].

Sure. I will plan to address the "all domains" (*) option in the next revision.

>>
>> Reinette
>>
>> [1] https://lore.kernel.org/lkml/7ccd59b8-9fe3-4d1f-82f5-f33d96dbf5ac@amd.com/
> 
> Ack, I meant "all domains", sorry!
> 
> Note, should we try to detect things like:
> 
> 	<resource>:0=fee;1=fie;*=foe;0=fum
> 
> ..?
> 
> Either we treat conflicting assignments as an error, or we do them all
> in the order specified, so that assignments on the right override those
> on the left (which is what the schemata parsing in ctrlmondata.c:
> parse_line() seems to do today if I understand the code correctly).
> 
> In the latter case,
> 
> 	<resource>:*=fee;1=fie
> 
> would set all nodes except 1 to "fee", and node 1 to "fie", which might
> be useful (or at least, convenient).
> 
> If we're worried about that being exposed as ABI and used by userspace,
> we might want to disallow it explicitly.
> 

Sure. Right now we are not planning to support domain-specific
assignments, but we will keep the option open for future support.
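
For reference, the "apply in the order specified" behaviour described
above could be sketched in user-space C roughly as follows. This is only
an illustration and not the resctrl parser: apply_assignments(),
NUM_DOMAINS and FLAGS_LEN are hypothetical names, the domain count is
fixed at 4 for the sketch, and the flag strings are stored without being
validated.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_DOMAINS 4	/* hypothetical domain count for this sketch */
#define FLAGS_LEN   8	/* room for a short flag string plus NUL */

/*
 * Apply "<domain>=<flags>;..." to out[] from left to right, so that an
 * entry on the right overrides an earlier one. "*" stands for all
 * domains. Returns 0 on success, -1 on a malformed or out-of-range
 * entry. Entries before a bad one have already been applied.
 */
static int apply_assignments(const char *spec,
			     char out[NUM_DOMAINS][FLAGS_LEN])
{
	const char *p = spec;

	while (*p) {
		const char *end = strchr(p, ';');
		const char *eq;
		size_t flen;
		int first, last, d;

		if (!end)
			end = p + strlen(p);
		if (end == p) {		/* empty entry, e.g. trailing ';' */
			p = *end ? end + 1 : end;
			continue;
		}

		eq = memchr(p, '=', (size_t)(end - p));
		if (!eq || eq == p)
			return -1;
		flen = (size_t)(end - eq - 1);
		if (flen == 0 || flen >= FLAGS_LEN)
			return -1;

		if (eq - p == 1 && *p == '*') {
			/* wildcard: every domain gets these flags */
			first = 0;
			last = NUM_DOMAINS - 1;
		} else {
			char *stop;
			long id = strtol(p, &stop, 10);

			if (stop != eq || id < 0 || id >= NUM_DOMAINS)
				return -1;
			first = last = (int)id;
		}

		for (d = first; d <= last; d++) {
			memcpy(out[d], eq + 1, flen);
			out[d][flen] = '\0';
		}
		p = *end ? end + 1 : end;
	}
	return 0;
}
```

With this ordering, "*=tl;1=l" leaves every domain at "tl" except
domain 1, which ends up with "l". Treating conflicting assignments as an
error instead would need a per-domain "already set" check before each
write.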
-- 
Thanks
Babu Moger