This series adds support for Assignable Bandwidth Monitoring Counters
(ABMC), also called the QoS RMID Pinning feature.
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
The patches are based on top of commit
cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
# Introduction
AMD hardware can support 256 or more RMIDs. However, the bandwidth monitoring
feature only guarantees that RMIDs currently assigned to a processor will
be tracked by hardware. The counters of any other RMIDs which are no longer
being tracked will be reset to zero. The MBM event counters return
"Unavailable" for the RMIDs that are not active.
Users can create 256 or more monitor groups, but only a limited number of
groups can give guaranteed monitoring numbers. With ever-changing
configurations there is no way to know for certain which of these groups
will be active at a given point in time. Users do not have the option to
monitor a group or set of groups for a certain period of time without
worrying about the RMID being reset in between.
The ABMC feature lets the user assign an RMID to a hardware counter and
monitor the bandwidth for a longer duration.
The assigned RMID will be active until the user unassigns it manually.
There is no need to worry about counters being reset during this period.
Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.
Without ABMC enabled, monitoring works in the current mode, without
the assignment option.
# Linux Implementation
The Linux resctrl subsystem provides an interface to count a maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can assign a maximum
of 2 ABMC counters per group. Users also have the option to assign only
one counter to the group. If the system runs out of assignable ABMC
counters, the kernel will display an error. Users need to unassign an already
assigned counter to make space for new assignments.
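The allocation rule described above can be sketched in userspace terms. This is
only an illustration under assumptions: the CounterPool class, its method names
and the flat, domain-agnostic pool are hypothetical and are not the data
structures used by the patches.

```python
class CounterPool:
    """Hypothetical model of a fixed pool of assignable counters."""

    EVENTS = ("total", "local")  # at most two events, hence two counters, per group

    def __init__(self, num_counters):
        self.free = list(range(num_counters))
        self.assigned = {}  # (group, event) -> counter id

    def assign(self, group, event):
        if event not in self.EVENTS:
            raise ValueError(f"unknown event: {event}")
        key = (group, event)
        if key in self.assigned:
            # Already assigned: keep the existing counter.
            return self.assigned[key]
        if not self.free:
            # Mirrors the "kernel will display an error" case.
            raise RuntimeError("out of assignable counters; unassign one first")
        counter = self.free.pop(0)
        self.assigned[key] = counter
        return counter

    def unassign(self, group, event):
        counter = self.assigned.pop((group, event), None)
        if counter is not None:
            self.free.append(counter)


if __name__ == "__main__":
    pool = CounterPool(32)
    print(pool.assign("default_mon1", "total"))
    print(pool.assign("default_mon1", "local"))
```

A group that only needs one event simply never calls assign() for the other,
which leaves that counter free for another group.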
# Examples
a. Check if ABMC support is available
#mount -t resctrl resctrl /sys/fs/resctrl/
#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
[abmc]
legacy_mbm
Linux kernel detected ABMC feature and it is enabled.
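A script can tell which mode is active by looking for the bracketed entry.
The sketch below works on the file contents as a string; the function name
parse_mbm_assign is hypothetical, and the one-mode-per-line layout is assumed
from the output shown above.

```python
def parse_mbm_assign(text):
    """Return (active_mode, all_modes) from the contents of mbm_assign.

    The active mode is the entry wrapped in square brackets.
    """
    active = None
    modes = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        modes.append(name)
    return active, modes


if __name__ == "__main__":
    # Contents copied from the example output above.
    print(parse_mbm_assign("[abmc]\nlegacy_mbm\n"))
```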
b. Check how many ABMC counters are available.
#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
32
c. Create a few resctrl groups.
# mkdir /sys/fs/resctrl/mon_groups/default_mon1
# mkdir /sys/fs/resctrl/non_defult_group
# mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
to list and modify the group's assignment states.
The list uses the following format:
* Default CTRL_MON group:
"//<domain_id>=<assignment_flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<assignment_flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<assignment_flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
Assignment flags can be one of the following:
t MBM total event is assigned
l MBM local event is assigned
tl Both total and local MBM events are assigned
_ None of the MBM events are assigned
Examples:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
/default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
There are four groups and all the groups have local and total events assigned.
"//" - This is the default CTRL_MON group
"non_defult_group//" - This is a non-default CTRL_MON group
"/default_mon1/" - This is a child MON group of the default group
"non_defult_group/non_default_mon1/" - This is a child MON group of the non-default group
=tl means both total and local events are assigned.
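The list format can be parsed by splitting on the first two "/" characters.
The sketch below is a userspace illustration only: parse_assign_line is a
hypothetical helper, and it assumes group names never contain "/" and that
every line has the shape shown above.

```python
def parse_assign_line(line):
    """Split one mbm_assign_control line into (ctrl_group, mon_group, domains).

    An empty ctrl_group means the default CTRL_MON group; an empty mon_group
    means the line describes the CTRL_MON group itself. domains maps each
    domain id to the set of assigned event flags ('t', 'l').
    """
    ctrl_group, mon_group, rest = line.strip().split("/", 2)
    domains = {}
    for item in rest.split(";"):
        if not item:
            continue  # trailing ';' leaves an empty item
        domain_id, flags = item.split("=")
        # "_" means that no event is assigned.
        domains[int(domain_id)] = set() if flags == "_" else set(flags)
    return ctrl_group, mon_group, domains


if __name__ == "__main__":
    for sample in ("//0=tl;1=tl;", "non_defult_group/non_default_mon1/0=tl;1=tl;"):
        print(parse_assign_line(sample))
```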
e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
The write format is similar to the above list format with addition of
op-code for the assignment operation.
* Default CTRL_MON group:
"//<domain_id><op-code><assignment_flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><op-code><assignment_flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
Op-code can be one of the following:
= Update the assignment to match the flags
+ Assign the listed events
- Unassign the listed events
_ Unassign all the states
Initial group status:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the default group to assign only the total event:
# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=t;
/child_default_mon_grp/0=tl;1=tl;
To update the MON group child_default_mon_grp to remove local event:
# echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=t;1=t;
/child_default_mon_grp/0=t;1=t;
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
remove both local and total events:
# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" > \
/sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=t;1=t;
/child_default_mon_grp/0=t;1=t;
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;
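The op-code semantics can be modelled as set operations on the flags of one
domain. This is a sketch under assumptions, not the kernel parser:
parse_request and apply_op are hypothetical names, only a single
<domain_id><op-code><flags> item per write is handled, and accepting "_" as a
flag after "=" in addition to the bare "_" op-code is an assumption of this
sketch.

```python
import re

VALID_FLAGS = {"t", "l"}


def parse_request(request):
    """Split a write request into (ctrl_group, mon_group, domain, op, flags)."""
    ctrl_group, mon_group, rest = request.strip().split("/", 2)
    match = re.fullmatch(r"(\d+)([=+\-_])([tl_]*)", rest)
    if match is None:
        raise ValueError(f"malformed assignment: {rest!r}")
    domain, op, flags = match.groups()
    return ctrl_group, mon_group, int(domain), op, flags


def apply_op(current, op, flags=""):
    """Return the new flag set for one domain after applying op."""
    requested = set(flags) - {"_"}
    if not requested <= VALID_FLAGS:
        raise ValueError(f"unknown flags: {flags!r}")
    if op == "=":
        return requested            # match the flags exactly
    if op == "+":
        return current | requested  # assign the listed events
    if op == "-":
        return current - requested  # unassign the listed events
    if op == "_":
        return set()                # unassign everything
    raise ValueError(f"unknown op-code: {op!r}")


if __name__ == "__main__":
    _, mon, domain, op, flags = parse_request("/child_default_mon_grp/0-l")
    print(mon, domain, sorted(apply_op({"t", "l"}, op, flags)))
```

Because apply_op() works on one domain's flag set at a time, the caller decides
which domains a request touches.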
f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
There is no change in reading the evetns with ABMC. If the event is unassigned
when reading, then the read will come back as Unavailable.
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
765207488
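A monitoring script has to cope with both a byte count and "Unavailable". The
sketch below is illustrative only: the helper names are hypothetical, the
readings are passed in as strings rather than read from resctrl, and any
other non-numeric text is simply left to raise ValueError.

```python
def parse_event_read(text):
    """Return the byte count as an int, or None if the event is unavailable."""
    value = text.strip()
    if value == "Unavailable":
        return None
    return int(value)


def bandwidth_bytes_per_sec(prev_text, cur_text, seconds):
    """Average bandwidth between two readings taken `seconds` apart.

    Returns None if either reading was unavailable.
    """
    prev = parse_event_read(prev_text)
    cur = parse_event_read(cur_text)
    if prev is None or cur is None:
        return None
    return (cur - prev) / seconds


if __name__ == "__main__":
    print(parse_event_read("779247936\n"))
    print(bandwidth_bytes_per_sec("Unavailable\n", "779247936\n", 1.0))
```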
g. Users will have the option to go back to legacy_mbm mode if required.
This can be done using the following command.
# echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
abmc
[legacy_mbm]
h. Check the bandwidth configuration for the group. Note that bandwidth
configuration has a domain scope. Total event defaults to 0x7F (to
count all the events) and local event defaults to 0x15 (to count all
the local NUMA events). The event bitmap decoding is available at
https://www.kernel.org/doc/Documentation/x86/resctrl.rst
in section "mbm_total_bytes_config", "mbm_local_bytes_config":
#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x7f;1=0x7f
#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x15;1=0x15
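The masks are easier to read once decoded bit by bit. The sketch below uses the
bit meanings from the resctrl documentation sections named above; the helper
names parse_config and decode are hypothetical.

```python
# Bit meanings as listed for mbm_total_bytes_config / mbm_local_bytes_config.
EVENT_BITS = {
    0: "Reads to memory in the local NUMA domain",
    1: "Reads to memory in the non-local NUMA domain",
    2: "Non-temporal writes to local NUMA domain",
    3: "Non-temporal writes to non-local NUMA domain",
    4: "Reads to slow memory in the local NUMA domain",
    5: "Reads to slow memory in the non-local NUMA domain",
    6: "Dirty Victims from the QOS domain to all types of memory",
}


def parse_config(text):
    """Parse '0=0x7f;1=0x7f' into {domain_id: mask}."""
    result = {}
    for item in text.strip().split(";"):
        if item:
            domain_id, mask = item.split("=")
            result[int(domain_id)] = int(mask, 16)
    return result


def decode(mask):
    """Return the (bit, description) pairs that are set in mask."""
    return [(bit, desc) for bit, desc in EVENT_BITS.items() if mask & (1 << bit)]


if __name__ == "__main__":
    for domain_id, mask in parse_config("0=0x15;1=0x15").items():
        print(domain_id, hex(mask), [bit for bit, _ in decode(mask)])
```

Decoding 0x15 gives bits 0, 2 and 4, which are the three local NUMA events.
Decoding 0x33 gives bits 0, 1, 4 and 5, which are the four read events.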
j. Change the bandwidth source for domain 0 for the total event to count only reads.
Note that this change affects total events on domain 0.
#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x33;1=0x7F
k. Now read the total event again. The mbm_total_bytes should display
only the read events.
#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
314101
l. Unmount the resctrl
#umount /sys/fs/resctrl/
---
v3:
This series adds the support for global assignment mode discussed in
the thread. https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
Removed the individual assignment mode and included the global assignment interface.
Added following interface files.
a. /sys/fs/resctrl/info/L3_MON/mbm_assign
Used for displaying the current assignment mode and switching between
ABMC and legacy mode.
b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Used for listing the groups' assignment mode and modifying the assignment states.
c. Most of the changes are related to the new interface.
d. Addressed the comments from Reinette, James and Peter.
e. Hope I have addressed most of the major feedback discussed. If I missed
something, it was not intentional. Please feel free to comment.
f. Sending this as an RFC as per Reinette's comment. So, this is still open
for discussion.
v2:
a. Major change is the way ABMC is enabled. Earlier, user needed to remount
with -o abmc to enable ABMC feature. Removed that option now.
Now users can enable ABMC by "echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
b. Added new word 21 to x86/cpufeatures.h.
c. Display unsupported if user attempts to read the events when ABMC is enabled
and event is not assigned.
d. Display monitor_state as "Unsupported" when ABMC is disabled.
e. Text updates and rebase to latest tip tree (as of Jan 18).
f. This series is still work in progress. I am yet to hear from ARM developers.
v2:
https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
v1 :
https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/
Babu Moger (17):
x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
(ABMC)
x86/resctrl: Add ABMC feature in the command line options
x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
x86/resctrl: Introduce resctrl_file_fflags_init
x86/resctrl: Introduce the interface to display the assignment state
x86/resctrl: Introduce interface to display number of ABMC counters
x86/resctrl: Add support to enable/disable ABMC feature
x86/resctrl: Initialize assignable counters bitmap
x86/resctrl: Introduce assign state for the mon group
x86/resctrl: Add data structures for ABMC assignment
x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
x86/resctrl: Add the functionality to assign the RMID
x86/resctrl: Add the functionality to unassign the RMID
x86/resctrl: Enable ABMC by default on resctrl mount
x86/resctrl: Introduce the interface switch between ABMC and
legacy_mbm
x86/resctrl: Introduce interface to list assignment states of all the
groups
x86/resctrl: Introduce interface to modify assignment states of the
groups
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 144 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 25 +-
arch/x86/kernel/cpu/resctrl/internal.h | 56 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 24 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 714 +++++++++++++++++-
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/resctrl.h | 12 +
11 files changed, 964 insertions(+), 20 deletions(-)
--
2.34.1
Hi Babu,

On 3/28/2024 6:06 PM, Babu Moger wrote:

> a. Check if ABMC support is available
> #mount -t resctrl resctrl /sys/fs/resctrl/
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> [abmc]
> legacy_mbm
>
> Linux kernel detected ABMC feature and it is enabled.

Please note that this adds the "abmc" feature to the resctrl
*filesystem* that supports more architectures than just AMD. Calling the
resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
other architectures calling their features that are
similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
a new generic name.

> b. Check how many ABMC counters are available.
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
> 32
>
> c. Create few resctrl groups.
>
> # mkdir /sys/fs/resctrl/mon_groups/default_mon1
> # mkdir /sys/fs/resctrl/non_defult_group

Can this be non_default_group instead? Seems like non_defult_group is used
consistently but its spelling is unexpected.

> # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> to list and modify the group's assignment states.
>
> The list follows the following format:
>
> * Default CTRL_MON group:
> "//<domain_id>=<assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id>=<assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
> Assignment flags can be one of the following:
>
> t MBM total event is assigned
> l MBM local event is assigned
> tl Both total and local MBM events are assigned
> _ None of the MBM events are assigned
>
> Examples:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>
> There are four groups and all the groups have local and total event assigned.
>
> "//" - This is a default CONTROL MON group
>
> "non_defult_group//" - This is non default CONTROL MON group
>
> "/default_mon1/" - This is Child MON group of the defult group
>
> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>
> =tl means both total and local events are assigned.
>
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>
> The write format is similar to the above list format with addition of
> op-code for the assignment operation.
>
> * Default CTRL_MON group:
> "//<domain_id><op-code><assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id><op-code><assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>
> Op-code can be one of the following:
>
> = Update the assignment to match the flags
> + Assign a new state
> - Unassign a new state
> _ Unassign all the states

As mentioned in https://lore.kernel.org/lkml/ZjO9hpuLz%2FjJYqvT@e133380.arm.com/
the "_" is not an operator but instead viewed as an part of <assignment_flags>.
It is expected to be used with "=", to unset flags it will be used as below:

echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_" ...

>
>
> Initial group status:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
>
> To update the default group to assign only total event.
> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=t;
> /child_default_mon_grp/0=tl;1=tl;

As mentioned in https://lore.kernel.org/lkml/330e3391-b917-4a88-bae3-bdcbb8cfd6f4@intel.com/
using "0=t" is expected to only impact domain #0, not all domains. Similar for
other examples below.

>
> To update the MON group child_default_mon_grp to remove local event:
> # echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=t;1=t;
> /child_default_mon_grp/0=t;1=t;
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>
> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> remove both local and total events:
> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" >
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> //0=t;1=t;
> /child_default_mon_grp/0=t;1=t;
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;
>
>
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
> There is no change in reading the evetns with ABMC. If the event is unassigned

evetns -> events

> when reading, then the read will come back as Unavailable.
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 779247936
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 765207488
>
> g. Users will have the option to go back to legacy_mbm mode if required.
> This can be done using the following command.
>
> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> abmc
> [legacy_mbm]
>

This needs a mention about how state is impacted when a user makes this
switch. For example, if switching from "legacy" to abmc ... if there
are fewer than "num counters" monitor groups, will they get counters
assigned dynamically? What happens to feature specific resctrl files?
What happens to the counters themselves, are they reset? What else
happens during this switch?

>
> h. Check the bandwidth configuration for the group. Note that bandwidth
> configuration has a domain scope. Total event defaults to 0x7F (to
> count all the events) and local event defaults to 0x15 (to count all
> the local numa events). The event bitmap decoding is available at
> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x7f;1=0x7f
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x15;1=0x15
>
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
> Note that this change effects total events on the domain 0.
>
> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x33;1=0x7F
>
> k. Now read the total event again. The mbm_total_bytes should display
> only the read events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 314101
>
> l. Unmount the resctrl
>
> #umount /sys/fs/resctrl/
>
> ---

Reinette
Hi Reinette,

On 5/3/24 18:24, Reinette Chatre wrote:
> Hi Babu,
>
> On 3/28/2024 6:06 PM, Babu Moger wrote:
>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> [abmc]
>> legacy_mbm
>>
>> Linux kernel detected ABMC feature and it is enabled.
>
> Please note that this adds the "abmc" feature to the resctrl
> *filesystem* that supports more architectures than just AMD. Calling the
> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
> other architectures calling their features that are
> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
> a new generic name.

It should not a problem if other architecture calling abmc for similar
feature. But generic name is always better if there is a suggestion.

>
>> b. Check how many ABMC counters are available.
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
>> 32
>>
>> c. Create few resctrl groups.
>>
>> # mkdir /sys/fs/resctrl/mon_groups/default_mon1
>> # mkdir /sys/fs/resctrl/non_defult_group
>
> Can this be non_default_group instead? Seems like non_defult_group is used
> consistently but its spelling is unexpected.

Ok. Will correct it. Thanks

>
>> # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>>
>> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> to list and modify the group's assignment states.
>>
>> The list follows the following format:
>>
>> * Default CTRL_MON group:
>> "//<domain_id>=<assignment_flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id>=<assignment_flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>> Assignment flags can be one of the following:
>>
>> t MBM total event is assigned
>> l MBM local event is assigned
>> tl Both total and local MBM events are assigned
>> _ None of the MBM events are assigned
>>
>> Examples:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>
>> There are four groups and all the groups have local and total event assigned.
>>
>> "//" - This is a default CONTROL MON group
>>
>> "non_defult_group//" - This is non default CONTROL MON group
>>
>> "/default_mon1/" - This is Child MON group of the defult group
>>
>> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>
>> =tl means both total and local events are assigned.
>>
>> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>>
>> The write format is similar to the above list format with addition of
>> op-code for the assignment operation.
>>
>> * Default CTRL_MON group:
>> "//<domain_id><op-code><assignment_flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id><op-code><assignment_flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>>
>> Op-code can be one of the following:
>>
>> = Update the assignment to match the flags
>> + Assign a new state
>> - Unassign a new state
>> _ Unassign all the states
>
> As mentioned in https://lore.kernel.org/lkml/ZjO9hpuLz%2FjJYqvT@e133380.arm.com/
> the "_" is not an operator but instead viewed as an part of <assignment_flags>.
> It is expected to be used with "=", to unset flags it will be used as below:
>
> echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_" ...

Oh.. ok. Will correct it. I also need to verify the parsing..

>
>>
>>
>> Initial group status:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>>
>> To update the default group to assign only total event.
>> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=t;
>> /child_default_mon_grp/0=tl;1=tl;
>
> As mentioned in https://lore.kernel.org/lkml/330e3391-b917-4a88-bae3-bdcbb8cfd6f4@intel.com/
> using "0=t" is expected to only impact domain #0, not all domains. Similar for
> other examples below.

Ok. Sure

>
>>
>> To update the MON group child_default_mon_grp to remove local event:
>> # echo "/child_default_mon_grp/0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=t;1=t;
>> /child_default_mon_grp/0=t;1=t;
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>
>> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
>> remove both local and total events:
>> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/0_" >
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> //0=t;1=t;
>> /child_default_mon_grp/0=t;1=t;
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=_;1=_;
>>
>>
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>> There is no change in reading the evetns with ABMC. If the event is unassigned
>
> evetns -> events

Sure.

>
>> when reading, then the read will come back as Unavailable.
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 779247936
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 765207488
>>
>> g. Users will have the option to go back to legacy_mbm mode if required.
>> This can be done using the following command.
>>
>> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> abmc
>> [legacy_mbm]
>>
>
> This needs a mention about how state is impacted when a user makes this
> switch. For example, if switching from "legacy" to abmc ... if there
> are fewer than "num counters" monitor groups, will they get counters
> assigned dynamically? What happens to feature specific resctrl files?
> What happens to the counters themselves, are they reset? What else
> happens during this switch?

Sure. Will add the explanation.

When switching from "legacy" to abmc, events in already created resctrl
groups will be in Unassigned states. Users need to assign the monitors to
each group to read the events.

>
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>> configuration has a domain scope. Total event defaults to 0x7F (to
>> count all the events) and local event defaults to 0x15 (to count all
>> the local numa events). The event bitmap decoding is available at
>> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x7f;1=0x7f
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x15;1=0x15
>>
>> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>> Note that this change effects total events on the domain 0.
>>
>> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x33;1=0x7F
>>
>> k. Now read the total event again. The mbm_total_bytes should display
>> only the read events.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 314101
>>
>> l. Unmount the resctrl
>>
>> #umount /sys/fs/resctrl/
>>
>> ---
>
> Reinette

--
Thanks
Babu Moger
Hi Babu,

On 5/6/2024 10:18 AM, Moger, Babu wrote:
> On 5/3/24 18:24, Reinette Chatre wrote:
>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>
>>> a. Check if ABMC support is available
>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>> [abmc]
>>> legacy_mbm
>>>
>>> Linux kernel detected ABMC feature and it is enabled.
>>
>> Please note that this adds the "abmc" feature to the resctrl
>> *filesystem* that supports more architectures than just AMD. Calling the
>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>> other architectures calling their features that are
>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>> a new generic name.
>
> It should not a problem if other architecture calling abmc for similar
> feature. But generic name is always better if there is a suggestion.

"should not a problem" does not instill confidence that AMD is
actually ok with this.

Reinette
Hi Reinette,

On 5/7/24 15:26, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>> On 5/3/24 18:24, Reinette Chatre wrote:
>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>
>>>> a. Check if ABMC support is available
>>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>> [abmc]
>>>> legacy_mbm
>>>>
>>>> Linux kernel detected ABMC feature and it is enabled.
>>>
>>> Please note that this adds the "abmc" feature to the resctrl
>>> *filesystem* that supports more architectures than just AMD. Calling the
>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>> other architectures calling their features that are
>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>> a new generic name.
>>
>> It should not a problem if other architecture calling abmc for similar
>> feature. But generic name is always better if there is a suggestion.
>
> "should not a problem" does not instill confidence that AMD is
> actually ok with this.

The feature "ABMC" has been used in the public document already to refer
this feature.
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf

If there comes a conflict then we can change it to amd_abmc. Didn't see
any conflict at this pint.

--
Thanks
Babu Moger
Hi Babu,

On 5/8/2024 1:07 PM, Moger, Babu wrote:
> Hi Reinette,
>
> On 5/7/24 15:26, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>
>>>>> a. Check if ABMC support is available
>>>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>
>>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>> [abmc]
>>>>> legacy_mbm
>>>>>
>>>>> Linux kernel detected ABMC feature and it is enabled.
>>>>
>>>> Please note that this adds the "abmc" feature to the resctrl
>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>> other architectures calling their features that are
>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>> a new generic name.
>>>
>>> It should not a problem if other architecture calling abmc for similar
>>> feature. But generic name is always better if there is a suggestion.
>>
>> "should not a problem" does not instill confidence that AMD is
>> actually ok with this.
>
> The feature "ABMC" has been used in the public document already to refer
> this feature.
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf

It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
name of the AMD feature. The question is whether users can use the
same name to interact with "similar but maybe not identical" features from other
architectures, which is what this series enables.

> If there comes a conflict then we can change it to amd_abmc. Didn't see
> any conflict at this pint.

How do you envision this? The resctrl filesystem interface is intended to be
architecture neutral so it is not obvious to me how "amd_abmc" is expected to
look? Why would it be necessary to have different architecture specific names
for a similar feature from different architectures that users interact with
in the same way? Sounds to me as though this just needs a new non-AMD
marketing name.

Reinette
Hi Reinette,
On 5/8/24 15:41, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/8/2024 1:07 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 5/7/24 15:26, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>>
>>>>>> a. Check if ABMC support is available
>>>>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>
>>>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>>> [abmc]
>>>>>> legacy_mbm
>>>>>>
>>>>>> Linux kernel detected ABMC feature and it is enabled.
>>>>>
>>>>> Please note that this adds the "abmc" feature to the resctrl
>>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>>> other architectures calling their features that are
>>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>>> a new generic name.
>>>>
>>>> It should not a problem if other architecture calling abmc for similar
>>>> feature. But generic name is always better if there is a suggestion.
>>>
>>> "should not a problem" does not instill confidence that AMD is
>>> actually ok with this.
>>
>> The feature "ABMC" has been used in the public document already to refer
>> this feature.
>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf
>
> It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
> name of the AMD feature. The question is whether users can use the
> same name to interact with "similar but maybe not identical" features from other
> architectures, which is what this series enables.
>
>> If there comes a conflict then we can change it to amd_abmc. Didn't see
>> any conflict at this pint.
>
> How do you envision this? The resctrl filesystem interface is intended to be
> architecture neutral so it is not obvious to me how "amd_abmc" is expected
> to look? Why would it be necessary to have different architecture specific names
> for a similar feature from different architectures that users interact with in
> the same way? Sounds to me as though this just needs a new non-AMD marketing name.
I think I misunderstood it.
It is not a concern to have the same name ("abmc") for similar features
across architectures.
ABMC is also fairly generic. I am open to other generic suggestions. I
think the name should include the words "assign" and "monitor".
--
Thanks
Babu Moger
Hi Babu,
On 5/8/2024 4:29 PM, Moger, Babu wrote:
> On 5/8/24 15:41, Reinette Chatre wrote:
>> On 5/8/2024 1:07 PM, Moger, Babu wrote:
>>> On 5/7/24 15:26, Reinette Chatre wrote:
>>>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>>>
>>>>>>> a. Check if ABMC support is available
>>>>>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>
>>>>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>>>> [abmc]
>>>>>>> legacy_mbm
>>>>>>>
>>>>>>> Linux kernel detected ABMC feature and it is enabled.
>>>>>>
>>>>>> Please note that this adds the "abmc" feature to the resctrl
>>>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>>>> other architectures calling their features that are
>>>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>>>> a new generic name.
>>>>>
>>>>> It should not a problem if other architecture calling abmc for similar
>>>>> feature. But generic name is always better if there is a suggestion.
>>>>
>>>> "should not a problem" does not instill confidence that AMD is
>>>> actually ok with this.
>>>
>>> The feature "ABMC" has been used in the public document already to refer
>>> this feature.
>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf
>>
>> It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
>> name of the AMD feature. The question is whether users can use the
>> same name to interact with "similar but maybe not identical" features from other
>> architectures, which is what this series enables.
>>
>>> If there comes a conflict then we can change it to amd_abmc. Didn't see
>>> any conflict at this pint.
>>
>> How do you envision this? The resctrl filesystem interface is intended to be
>> architecture neutral so it is not obvious to me how "amd_abmc" is expected
>> to look? Why would it be necessary to have different architecture specific names
>> for a similar feature from different architectures that users interact with in
>> the same way? Sounds to me as though this just needs a new non-AMD marketing name.
>
> I think I misunderstood it.
> It is not a concern to have a same name("abmc") for similar feature across
> the architectures.
Thank you for confirming. This joins BMEC and SMBA in this regard.
> ABMC is also kind of generic. I am open to other generic suggestions. I
> think we should have "assign" and "monitor" words in them.
One thing we can consider is to move to a simple "enable"/"disable"
interface for events. Users do not really need to know that hardware
needs to "assign a counter" to an event for it to measure. Yes,
user space can infer some of this by the number of events that
can be "enabled" at a time, but the concept of "assign a hardware counter"
is abstract and does not directly map to (as I understand) the soft-RMID
approach for other AMD hardware. Peter reminded us a while back [1] that
"assign" has a variety of meanings, even among AMD, so we should aim to
avoid any confusion.
Reinette
[1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
Hi Reinette,
On 5/9/24 13:07, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/8/2024 4:29 PM, Moger, Babu wrote:
>> On 5/8/24 15:41, Reinette Chatre wrote:
>>> On 5/8/2024 1:07 PM, Moger, Babu wrote:
>>>> On 5/7/24 15:26, Reinette Chatre wrote:
>>>>> On 5/6/2024 10:18 AM, Moger, Babu wrote:
>>>>>> On 5/3/24 18:24, Reinette Chatre wrote:
>>>>>>> On 3/28/2024 6:06 PM, Babu Moger wrote:
>>>>>>>
>>>>>>>> a. Check if ABMC support is available
>>>>>>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>
>>>>>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>>>>>>> [abmc]
>>>>>>>> legacy_mbm
>>>>>>>>
>>>>>>>> Linux kernel detected ABMC feature and it is enabled.
>>>>>>>
>>>>>>> Please note that this adds the "abmc" feature to the resctrl
>>>>>>> *filesystem* that supports more architectures than just AMD. Calling the
>>>>>>> resctrl filesystem feature "abmc" means that (a) AMD needs to be ok with
>>>>>>> other architectures calling their features that are
>>>>>>> similar-but-maybe-not-identical-to-AMD-ABMC "abmc", or (b) this needs
>>>>>>> a new generic name.
>>>>>>
>>>>>> It should not a problem if other architecture calling abmc for similar
>>>>>> feature. But generic name is always better if there is a suggestion.
>>>>>
>>>>> "should not a problem" does not instill confidence that AMD is
>>>>> actually ok with this.
>>>>
>>>> The feature "ABMC" has been used in the public document already to refer
>>>> this feature.
>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf
>>>
>>> It is clear to me that Assignable Bandwidth Monitoring Counters (ABMC) is the
>>> name of the AMD feature. The question is whether users can use the
>>> same name to interact with "similar but maybe not identical" features from other
>>> architectures, which is what this series enables.
>>>
>>>> If there comes a conflict then we can change it to amd_abmc. Didn't see
>>>> any conflict at this pint.
>>>
>>> How do you envision this? The resctrl filesystem interface is intended to be
>>> architecture neutral so it is not obvious to me how "amd_abmc" is expected
>>> to look? Why would it be necessary to have different architecture specific names
>>> for a similar feature from different architectures that users interact with in
>>> the same way? Sounds to me as though this just needs a new non-AMD marketing name.
>>
>> I think I misunderstood it.
>> It is not a concern to have a same name("abmc") for similar feature across
>> the architectures.
>
> Thank you for confirming. This joins BMEC and SMBA in this regard.
Yes. Sure.
>
>> ABMC is also kind of generic. I am open to other generic suggestions. I
>> think we should have "assign" and "monitor" words in them.
>
> One thing we can consider is to move to a simple "enable"/"disable"
> interface for events. Users do not really need to know that hardware
> needs to "assign a counter" to an event for it to measure. Yes,
> user space can infer some of this by the number of events that
> can be "enabled" at a time, but the concept of "assign a hardware counter"
> is abstract and does not directly map to (as I understand) the soft-RMID
> approach for other AMD hardware. Peter reminded us a while back [1] that
> "assign" has a variety of meanings, even among AMD, so we should aim to
> avoid any confusion.
The documentation will refer to the events as enabled or disabled instead
of assigned or unassigned.
For example:
Event flags can be one of the following:
t MBM total event is enabled
l MBM local event is enabled
tl MBM total and local events are enabled
_ None of the MBM events are enabled
It is fine.
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
>
>
--
Thanks
Babu Moger
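The event-flag notation in the message above ("t", "l", "tl", "_") maps naturally onto a small bitmask. The following is only a minimal sketch of that mapping, not code from the series: the function name parse_mbm_flags() and the MBM_EVT_* bit values are hypothetical, and it accepts exactly the four strings listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values, chosen for illustration only. */
#define MBM_EVT_TOTAL 0x1u
#define MBM_EVT_LOCAL 0x2u

/*
 * Translate one of the four documented flag strings into an event mask.
 * Returns 0 and stores the mask in *mask on success, -1 on any other input.
 */
int parse_mbm_flags(const char *s, unsigned int *mask)
{
	assert(s != NULL && mask != NULL);

	if (strcmp(s, "t") == 0)
		*mask = MBM_EVT_TOTAL;
	else if (strcmp(s, "l") == 0)
		*mask = MBM_EVT_LOCAL;
	else if (strcmp(s, "tl") == 0)
		*mask = MBM_EVT_TOTAL | MBM_EVT_LOCAL;
	else if (strcmp(s, "_") == 0)
		*mask = 0;		/* no MBM event enabled */
	else
		return -1;		/* not one of the documented strings */

	return 0;
}
```

With this shape, "enabled" versus "assigned" is purely a documentation choice: the parsed mask is the same either way.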
Hi Babu,
On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
>
> # Introduction
>
> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
> feature only guarantees that RMIDs currently assigned to a processor will
> be tracked by hardware. The counters of any other RMIDs which are no longer
> being tracked will be reset to zero. The MBM event counters return
> "Unavailable" for the RMIDs that are not active.
>
> Users can create 256 or more monitor groups. But there can be only limited
> number of groups that can give guaranteed monitoring numbers. With ever
> changing configurations there is no way to definitely know which of these
> groups will be active for certain point of time. Users do not have the
> option to monitor a group or set of groups for certain period of time
> without worrying about RMID being reset in between.
>
> The ABMC feature provides an option to the user to assign an RMID to the
> hardware counter and monitor the bandwidth for a longer duration.
> The assigned RMID will be active until the user unassigns it manually.
> There is no need to worry about counters being reset during this period.
> Additionally, the user can specify a bitmask identifying the specific
> bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
>
> # Linux Implementation
>
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can assign a maximum
> of 2 ABMC counters per group. User will also have the option to assign only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to unassign an already
> assigned counter to make space for new assignments.
>
>
> # Examples
>
> a. Check if ABMC support is available
> #mount -t resctrl resctrl /sys/fs/resctrl/
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> [abmc]
> legacy_mbm
>
> Linux kernel detected ABMC feature and it is enabled.
>
> b. Check how many ABMC counters are available.
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
> 32
>
> c. Create few resctrl groups.
>
> # mkdir /sys/fs/resctrl/mon_groups/default_mon1
> # mkdir /sys/fs/resctrl/non_defult_group
> # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> to list and modify the group's assignment states.
>
> The list follows the following format:
>
> * Default CTRL_MON group:
> "//<domain_id>=<assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id>=<assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
> Assignment flags can be one of the following:
>
> t MBM total event is assigned
> l MBM local event is assigned
> tl Both total and local MBM events are assigned
> _ None of the MBM events are assigned
>
I was able to successfully build a kernel where this interface is
adapted to work with both real ABMC on hardware that supports it and
my software workaround for older hardware.

My prototype is based on a refactored version of the codebase
supporting MPAM, but the capabilities of the MPAM hardware look
similar enough to ABMC that I'm not concerned about the feasibility.

The FS layer is informed by the arch layer (through rdt_resource
fields) how many assignable monitors are available and whether a
monitor is assigned to an entire group or a single event in a group.
Also, the FS layer can assume that monitors are indexed contiguously,
allowing it to host the data structures managing FS-level view of
monitor usage.

I used the following resctrl_arch interfaces to propagate assignments
to the implementation:

void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
closid, u32 rmid, int evtid);
void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);

I chose to allow reassigning an assigned monitor without calling
unassign first. This is important when monitors are unassigned and
assigned in a single write to mbm_assign_control, as it allows all
updates to be performed in a single round of parallel IPIs to the
domains.
>
> g. Users will have the option to go back to legacy_mbm mode if required.
> This can be done using the following command.
>
> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> abmc
> [legacy_mbm]
I chose to make this a mount option to simplify the management of the
monitor tracking data structures. They are simply allocated at mount
time and deallocated at unmount.

I called the option "mon_assign": The mount option parser calls
resctrl_arch_mon_assign_enable() to determine whether the
implementation supports assignment in some form. If it returns an
error, the mount fails. When successful, the assignable monitor count
is made non-zero in the appropriate rdt_resource, triggering the
behavior change in the FS layer.

I'm still not sure if it's a good idea to enable monitor assignment by
default. This would be a major disruption in the MBM usage model
triggered by moving software between AMD CPU models. I thought the
safest option was to disallow creating more monitoring groups than
monitors unless the option is selected. Given that nobody else
complained about monitoring HW limitations on the mailing list, I
assumed few users create enough monitoring groups to be impacted.

Thanks!
-Peter
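For readers following the mbm_assign_control format quoted in the message above, here is a rough user-space sketch of how one entry could be split into its parts. It is not the parser from the series or from the prototype: struct assign_entry and parse_assign_line() are hypothetical names, the buffer sizes are arbitrary, and it assumes one "<domain_id>=<assignment_flags>" pair per line, as in the quoted format.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed mbm_assign_control entry. */
struct assign_entry {
	char ctrl[64];	/* CTRL_MON group name, "" for the default group */
	char mon[64];	/* MON group name, "" when the entry names the CTRL_MON group */
	int domain_id;
	char flags[4];	/* "t", "l", "tl" or "_" (not validated here) */
};

/*
 * Split "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>".
 * Group names cannot contain '/', so the first two slashes are delimiters.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_assign_line(const char *line, struct assign_entry *e)
{
	const char *s1, *s2, *eq;
	char *end;
	long dom;
	size_t n;

	assert(line != NULL && e != NULL);

	s1 = strchr(line, '/');
	if (!s1)
		return -1;
	s2 = strchr(s1 + 1, '/');
	if (!s2)
		return -1;
	eq = strchr(s2 + 1, '=');
	if (!eq || !isdigit((unsigned char)s2[1]))
		return -1;

	n = (size_t)(s1 - line);
	if (n >= sizeof(e->ctrl))
		return -1;
	memcpy(e->ctrl, line, n);
	e->ctrl[n] = '\0';

	n = (size_t)(s2 - s1 - 1);
	if (n >= sizeof(e->mon))
		return -1;
	memcpy(e->mon, s1 + 1, n);
	e->mon[n] = '\0';

	dom = strtol(s2 + 1, &end, 10);
	if (end != eq || dom > INT_MAX)
		return -1;
	e->domain_id = (int)dom;

	n = strcspn(eq + 1, "\n");	/* tolerate a trailing newline */
	if (n == 0 || n >= sizeof(e->flags))
		return -1;
	memcpy(e->flags, eq + 1, n);
	e->flags[n] = '\0';
	return 0;
}
```

An entry such as "//1=_" parses to empty group names, which corresponds to the default CTRL_MON group case in the quoted format.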
Hi Peter,
On 5/1/24 12:48, Peter Newman wrote:
> Hi Babu,
>
> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
>>
>> # Introduction
>>
>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>> feature only guarantees that RMIDs currently assigned to a processor will
>> be tracked by hardware. The counters of any other RMIDs which are no longer
>> being tracked will be reset to zero. The MBM event counters return
>> "Unavailable" for the RMIDs that are not active.
>>
>> Users can create 256 or more monitor groups. But there can be only limited
>> number of groups that can give guaranteed monitoring numbers. With ever
>> changing configurations there is no way to definitely know which of these
>> groups will be active for certain point of time. Users do not have the
>> option to monitor a group or set of groups for certain period of time
>> without worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign an RMID to the
>> hardware counter and monitor the bandwidth for a longer duration.
>> The assigned RMID will be active until the user unassigns it manually.
>> There is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can assign a maximum
>> of 2 ABMC counters per group. User will also have the option to assign only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to unassign an already
>> assigned counter to make space for new assignments.
>>
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> [abmc]
>> legacy_mbm
>>
>> Linux kernel detected ABMC feature and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
>> 32
>>
>> c. Create few resctrl groups.
>>
>> # mkdir /sys/fs/resctrl/mon_groups/default_mon1
>> # mkdir /sys/fs/resctrl/non_defult_group
>> # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>>
>> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> to list and modify the group's assignment states.
>>
>> The list follows the following format:
>>
>> * Default CTRL_MON group:
>> "//<domain_id>=<assignment_flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id>=<assignment_flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>> Assignment flags can be one of the following:
>>
>> t MBM total event is assigned
>> l MBM local event is assigned
>> tl Both total and local MBM events are assigned
>> _ None of the MBM events are assigned
>>
>
> I was able to successfully build a kernel where this interface is
> adapted to work with both real ABMC on hardware that supports it and
> my software workaround for older hardware.
Thanks for trying that out. Good to know.
>
> My prototype is based on a refactored version of the codebase
> supporting MPAM, but the capabilities of the MPAM hardware look
> similar enough to ABMC that I'm not concerned about the feasibility.
That is good.
>
> The FS layer is informed by the arch layer (through rdt_resource
> fields) how many assignable monitors are available and whether a
> monitor is assigned to an entire group or a single event in a group.
> Also, the FS layer can assume that monitors are indexed contiguously,
> allowing it to host the data structures managing FS-level view of
> monitor usage.
>
> I used the following resctrl_arch-interfaces to propagate assignments
> to the implementation:
>
> void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
> closid, u32 rmid, int evtid);
Sure. I can add these in the next version.

A few comments:

AMD does not need closid for assignment. I assume ARM requires closid.

What is mon_id here?
> void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);
We need rmid and evtid for the unassign interface here.
>
> I chose to allow reassigning an assigned monitor without calling
> unassign first. This is important when monitors are unassigned and
> assigned in a single write to mbm_assign_control, as it allows all
> updates to be performed in a single round of parallel IPIs to the
> domains.
Yes. It is not required to call unassign before assign. Hardware (AMD)
supports it.

But we only have 32 counters. We need to know which counter we are going
to use for assignment. If all the counters are already assigned, then we
can't figure out the counter id without calling unassign first. Using a
random counter would overwrite an already assigned counter.
>
>
>>
>> g. Users will have the option to go back to legacy_mbm mode if required.
>> This can be done using the following command.
>>
>> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> abmc
>> [legacy_mbm]
>
> I chose to make this a mount option to simplify the management of the
> monitor tracking data structures. They are simply allocated at mount
> time and deallocated and unmount.
Initially I added it as a mount option.
Based on our earlier discussion, we decided to use the assign feature by
default if hardware supports it. Users don't have to worry about the details.
>
> I called the option "mon_assign": The mount option parser calls
> resctrl_arch_mon_assign_enable() to determine whether the
> implementation supports assignment in some form. If it returns an
> error, the mount fails. When successful, the assignable monitor count
> is made non-zero in the appropriate rdt_resource, triggering the
> behavior change in the FS layer.
>
> I'm still not sure if it's a good idea to enable monitor assignment by
> default. This would be a major disruption in the MBM usage model
> triggered by moving software between AMD CPU models. I thought the
Why will it be a disruption? Why do you think a mount option will solve the
problem? As always, there will be an option to go back to legacy mode, right?
> safest option was to disallow creating more monitoring groups than
> monitors unless the option is selected. Given that nobody else
Current code allows creating more groups, but it will report "Monitor
assignment failed" when it runs out of monitors.
> complained about monitoring HW limitations on the mailing list, I
> assumed few users create enough monitoring groups to be impacted.
>
> Thanks!
> -Peter
--
Thanks
Babu Moger
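The counter-id concern raised in the message above (a fixed pool of 32 counters, and no free id once all are assigned) can be illustrated with a small bitmap allocator. This is only a sketch under assumptions: struct cntr_pool, cntr_alloc() and cntr_free() are hypothetical names and are not the allocator used in the series; the pool size of 32 is taken from the mbm_assign_cntrs example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CNTRS 32	/* value shown by mbm_assign_cntrs in the example */

/* Hypothetical pool: bit N set means counter id N is assigned. */
struct cntr_pool {
	uint32_t used;
};

/* Return the lowest free counter id, or -ENOSPC when all are assigned. */
int cntr_alloc(struct cntr_pool *p)
{
	int id;

	assert(p != NULL);
	for (id = 0; id < NUM_CNTRS; id++) {
		uint32_t bit = UINT32_C(1) << id;

		if (!(p->used & bit)) {
			p->used |= bit;
			return id;
		}
	}
	return -ENOSPC;	/* caller must unassign a counter first */
}

/* Release a counter id so that a later assignment can reuse it. */
void cntr_free(struct cntr_pool *p, int id)
{
	assert(p != NULL && id >= 0 && id < NUM_CNTRS);
	p->used &= ~(UINT32_C(1) << id);
}
```

Once the pool is full, cntr_alloc() can only report failure. Picking an id without consulting the pool would silently take over a counter that another group is still using, which is the overwrite described above.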
Hi Babu,
On Thu, May 2, 2024 at 9:25 AM Moger, Babu <babu.moger@amd.com> wrote:
> On 5/1/24 12:48, Peter Newman wrote:
> > The FS layer is informed by the arch layer (through rdt_resource
> > fields) how many assignable monitors are available and whether a
> > monitor is assigned to an entire group or a single event in a group.
> > Also, the FS layer can assume that monitors are indexed contiguously,
> > allowing it to host the data structures managing FS-level view of
> > monitor usage.
> >
> > I used the following resctrl_arch-interfaces to propagate assignments
> > to the implementation:
> >
> > void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
> > closid, u32 rmid, int evtid);
>
> Sure. I can add these in next version.
>
> Few comments..
>
> AMD does not need closid for assignment. I assume ARM requires closid.
Correct, MPAM needs a CLOSID+RMID (PARTID+PMG) to identify a
monitoring group. The CLOSID parameter is ignored on x86.
>
> What is mon_id here?
On ABMC, the value is programmed into L3_QOS_ABMC_CFG.CtrID
>
> > void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);
>
> We need rmid and evtid for unassign interface here.
From my reading of the ABMC specification, it does not look necessary
to program BwSrc or BwType when changing L3_QOS_ABMC_CFG.CtrEn to 0
for a particular CtrID. This interface only disables a counter, so it
should not need to know about how it was previously used when assign
is able to reassign, as assign will always reset the arch_mbm data.

I do not see any harm in the arch_mbm data being stale while the
counter is unassigned, because the data is not accessed when reading
the hardware counter fails. In general, resctrl_arch_rmid_read()
cannot return any information if the hardware counter is not readable
at the time it is called.
>
>
> >
> > I chose to allow reassigning an assigned monitor without calling
> > unassign first. This is important when monitors are unassigned and
> > assigned in a single write to mbm_assign_control, as it allows all
> > updates to be performed in a single round of parallel IPIs to the
> > domains.
>
> Yes. It is not required to call unassign before assign. Hardware(AMD)
> supports it.
>
> But, we only have 32 counters. We need to know which counter we are going
> to use for assignment. If all the counters already assigned, then we can't
> figure out the counter id without calling unassigm first. Using the random
> counter will overwrite the already assigned counter.
I made the caller of resctrl_arch_assign_monitor() responsible for
selecting which monitor to assign. As long as the user orders the
unassign operations before the assign operations in a write to
mbm_assign_control, the FS code will be able to find an available
monitor ID.
> > I chose to make this a mount option to simplify the management of the
> > monitor tracking data structures. They are simply allocated at mount
> > time and deallocated and unmount.
>
> Initially I added it as an mount option.
> Based on our earlier discussion, we decided to use the assign feature by
> default if hardware supports it. Users don't have to worry about the details.
> >
> > I called the option "mon_assign": The mount option parser calls
> > resctrl_arch_mon_assign_enable() to determine whether the
> > implementation supports assignment in some form. If it returns an
> > error, the mount fails. When successful, the assignable monitor count
> > is made non-zero in the appropriate rdt_resource, triggering the
> > behavior change in the FS layer.
> >
> > I'm still not sure if it's a good idea to enable monitor assignment by
> > default. This would be a major disruption in the MBM usage model
> > triggered by moving software between AMD CPU models. I thought the
>
> Why will it be a disruption? Why do you think mount option will solve the
> problem? As always, there will be option to go back to legacy mode. right?
>
> > safest option was to disallow creating more monitoring groups than
> > monitors unless the option is selected. Given that nobody else
>
> Current code allows to create more groups, but it will report "Monitor
> assignment failed" when it runs out of monitors.
Ok that should be fine then.

However, I don't think it's necessary to support dynamically changing
the usage model of monitoring groups without remounting. I believe it
makes it more difficult for the FS code to generically manage monitor
assignment.
-Peter
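The ordering point in the message above (unassign operations placed before assign operations in one write, so the FS code can find a free monitor ID) can be shown with a toy model. This is a sketch only, assuming one monitor per group and a caller-side table of owners. struct mon_table, struct mon_op and apply_ops() are hypothetical and do not come from the prototype or the series.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MON  32
#define NO_GROUP (-1)	/* marks a free monitor; group ids are assumed >= 0 */

enum op_kind { OP_UNASSIGN, OP_ASSIGN };

/* One operation from a single (hypothetical) batched write. */
struct mon_op {
	enum op_kind kind;
	int group;
};

/* Caller-side view of monitor usage: owner[i] is the group using monitor i. */
struct mon_table {
	int nr;
	int owner[MAX_MON];
};

void mon_table_init(struct mon_table *t, int nr)
{
	int i;

	assert(t != NULL && nr > 0 && nr <= MAX_MON);
	t->nr = nr;
	for (i = 0; i < MAX_MON; i++)
		t->owner[i] = NO_GROUP;
}

/* Find the first monitor owned by 'group' (or a free one for NO_GROUP). */
static int find_monitor(const struct mon_table *t, int group)
{
	int i;

	for (i = 0; i < t->nr; i++)
		if (t->owner[i] == group)
			return i;
	return -1;
}

/* Apply the operations strictly in order; return how many could not be done. */
int apply_ops(struct mon_table *t, const struct mon_op *ops, size_t n)
{
	int failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int assign = (ops[i].kind == OP_ASSIGN);
		int id = find_monitor(t, assign ? NO_GROUP : ops[i].group);

		if (id < 0) {
			failed++;	/* no free monitor, or nothing to unassign */
			continue;
		}
		t->owner[id] = assign ? ops[i].group : NO_GROUP;
	}
	return failed;
}
```

With every monitor in use, a batch of "unassign group 10, assign group 12" succeeds, while the same two operations in the opposite order leave the assign without a free monitor.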
Hi Peter,

On 5/2/24 12:50, Peter Newman wrote:
> Hi Babu,
>
> On Thu, May 2, 2024 at 9:25 AM Moger, Babu <babu.moger@amd.com> wrote:
>> On 5/1/24 12:48, Peter Newman wrote:
>>> The FS layer is informed by the arch layer (through rdt_resource
>>> fields) how many assignable monitors are available and whether a
>>> monitor is assigned to an entire group or a single event in a group.
>>> Also, the FS layer can assume that monitors are indexed contiguously,
>>> allowing it to host the data structures managing FS-level view of
>>> monitor usage.
>>>
>>> I used the following resctrl_arch-interfaces to propagate assignments
>>> to the implementation:
>>>
>>> void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
>>> closid, u32 rmid, int evtid);
>>
>> Sure. I can add these in next version.
>>
>> Few comments..
>>
>> AMD does not need closid for assignment. I assume ARM requires closid.
>
> Correct, MPAM needs a CLOSID+RMID (PARTID+PMG) to identify a
> monitoring group. The CLOSID parameter is ignored on x86.
>
>>
>> What is mon_id here?
>
> On ABMC, the value is programmed into L3_QOS_ABMC_CFG.CtrID

OK.

>
>
>>
>>> void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);
>>
>> We need rmid and evtid for unassign interface here.
>
> From my reading of the ABMC specification, it does not look necessary
> to program BwSrc or BwType when changing L3_QOS_ABMC_CFG.CtrEn to 0
> for a particular CtrID. This interface only disables a counter, so it
> should not need to know about how it was previously used when assign
> is able to reassign, as assign will always reset the arch_mbm data.

Yes, that is correct. We may not need to set BwSrc or BwType for unassign.
But we need evtid to update the monitor state of the rdtgroup.

>
> I do not see any harm in the arch_mbm data being stale while the
> counter is unassigned, because the data is not accessed when reading
> the hardware counter fails. In general, resctrl_arch_rmid_read()
> cannot return any information if the hardware counter is not readable
> at the time it is called.

OK. Let me check whether keeping the stale arch_mbm data after unassign
is a problem. It may be okay.

>
>>
>>
>>>
>>> I chose to allow reassigning an assigned monitor without calling
>>> unassign first. This is important when monitors are unassigned and
>>> assigned in a single write to mbm_assign_control, as it allows all
>>> updates to be performed in a single round of parallel IPIs to the
>>> domains.
>>
>> Yes. It is not required to call unassign before assign. Hardware(AMD)
>> supports it.
>>
>> But, we only have 32 counters. We need to know which counter we are going
>> to use for assignment. If all the counters are already assigned, then we can't
>> figure out the counter id without calling unassign first. Using a random
>> counter will overwrite the already assigned counter.
>
> I made the caller of resctrl_arch_assign_monitor() responsible for
> selecting which monitor to assign. As long as the user orders the
> unassign operations before the assign operations in a write to
> mbm_assign_control, the FS code will be able to find an available
> monitor ID.

How does resctrl_arch_assign_monitor() select the monitor ID (or counter
ID) if all of them are already assigned? In this series the monitor IDs
are allocated using assign_cntrs_alloc(). rdtgroup_assign_abmc() calls
assign_cntrs_alloc() to get a monitor ID. It reports an error if it cannot
get a free monitor ID. The expectation is that the user unassigns an event
from another group (or the same group) before calling assign. Are you
expecting something else here?

>
>
>>> I chose to make this a mount option to simplify the management of the
>>> monitor tracking data structures. They are simply allocated at mount
>>> time and deallocated at unmount.
>>
>> Initially I added it as a mount option.
>> Based on our earlier discussion, we decided to use the assign feature by
>> default if hardware supports it. Users don't have to worry about the details.
>>>
>>> I called the option "mon_assign": The mount option parser calls
>>> resctrl_arch_mon_assign_enable() to determine whether the
>>> implementation supports assignment in some form. If it returns an
>>> error, the mount fails. When successful, the assignable monitor count
>>> is made non-zero in the appropriate rdt_resource, triggering the
>>> behavior change in the FS layer.
>>>
>>> I'm still not sure if it's a good idea to enable monitor assignment by
>>> default. This would be a major disruption in the MBM usage model
>>> triggered by moving software between AMD CPU models. I thought the
>>
>> Why will it be a disruption? Why do you think mount option will solve the
>> problem? As always, there will be an option to go back to legacy mode, right?
>>
>>> safest option was to disallow creating more monitoring groups than
>>> monitors unless the option is selected. Given that nobody else
>>
>> Current code allows creating more groups, but it will report "Monitor
>> assignment failed" when it runs out of monitors.
>
> Ok that should be fine then.
>
> However, I don't think it's necessary to support dynamically changing
> the usage model of monitoring groups without remounting. I believe it
> makes it more difficult for the FS code to generically manage monitor
> assignment.

Are you suggesting to enable ABMC by default when available?

Then provide the mount option to switch back to legacy mode?
I am fine with that if we all agree on that.

-- 
Thanks
Babu Moger
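The allocation question raised above (who picks the counter ID, and what happens once all 32 are in use) can be illustrated with a small userspace sketch. It is not the series' assign_cntrs_alloc() implementation; the names cntr_map, cntr_alloc() and cntr_free() are hypothetical, and the only facts taken from the thread are the count of 32 counters and the rule that allocation fails until something is unassigned.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CNTRS 32u	/* counter count quoted in the thread */

/* Hypothetical FS-level state: bit i set means counter ID i is in use. */
static uint32_t cntr_map;

/* Return the lowest free counter ID, or -1 when every counter is assigned. */
static int cntr_alloc(void)
{
	for (unsigned int id = 0; id < NUM_CNTRS; id++) {
		if (!(cntr_map & (UINT32_C(1) << id))) {
			cntr_map |= UINT32_C(1) << id;
			return (int)id;
		}
	}
	return -1;	/* the user has to unassign an event first */
}

/* Release a counter ID so that a later assign can reuse it. */
static void cntr_free(int id)
{
	assert(id >= 0 && (unsigned int)id < NUM_CNTRS);
	cntr_map &= ~(UINT32_C(1) << id);
}
```

With this shape, ordering unassign before assign within one request is what lets the caller find a free ID, which matches the ordering requirement described for writes to mbm_assign_control.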
Hi Peter and Babu,

On 5/2/2024 1:14 PM, Moger, Babu wrote:
> On 5/2/24 12:50, Peter Newman wrote:
>> On Thu, May 2, 2024 at 9:25 AM Moger, Babu <babu.moger@amd.com> wrote:
>>> On 5/1/24 12:48, Peter Newman wrote:

...

>>>> I chose to make this a mount option to simplify the management of the
>>>> monitor tracking data structures. They are simply allocated at mount
>>>> time and deallocated at unmount.
>>>
>>> Initially I added it as a mount option.
>>> Based on our earlier discussion, we decided to use the assign feature by
>>> default if hardware supports it. Users don't have to worry about the details.
>>>>
>>>> I called the option "mon_assign": The mount option parser calls
>>>> resctrl_arch_mon_assign_enable() to determine whether the
>>>> implementation supports assignment in some form. If it returns an
>>>> error, the mount fails. When successful, the assignable monitor count
>>>> is made non-zero in the appropriate rdt_resource, triggering the
>>>> behavior change in the FS layer.
>>>>
>>>> I'm still not sure if it's a good idea to enable monitor assignment by
>>>> default. This would be a major disruption in the MBM usage model
>>>> triggered by moving software between AMD CPU models. I thought the
>>>
>>> Why will it be a disruption? Why do you think mount option will solve the
>>> problem? As always, there will be an option to go back to legacy mode, right?
>>>
>>>> safest option was to disallow creating more monitoring groups than
>>>> monitors unless the option is selected. Given that nobody else
>>>
>>> Current code allows creating more groups, but it will report "Monitor
>>> assignment failed" when it runs out of monitors.
>>
>> Ok that should be fine then.
>>
>> However, I don't think it's necessary to support dynamically changing
>> the usage model of monitoring groups without remounting. I believe it
>> makes it more difficult for the FS code to generically manage monitor
>> assignment.
>
> Are you suggesting to enable ABMC by default when available?

I do think ABMC should be enabled by default when available and it looks
to be what this series aims to do [1]. The way I reason about this is
that legacy user space gets more reliable monitoring behavior without
needing to change behavior.

I thought there was discussion about communicating to user space
when an attempt is made to read data from an event that does not
have a counter assigned. Something like below but I did not notice this
in this series.

# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
Unassigned

>
> Then provide the mount option to switch back to legacy mode?
> I am fine with that if we all agree on that.

Why is a mount option needed? I think we should avoid requiring a remount
unless required and I would like to understand why it is required here.

Peter: could you please elaborate what you mean with it makes it more
difficult for the FS code to generically manage monitor assignment?

Why would user space be required to recreate all control and monitor
groups if wanting to change how memory bandwidth monitoring is done?

From this implementation it has been difficult to understand the impact
of switching between ABMC and legacy.

Reinette

[1] https://lore.kernel.org/lkml/e898059f3c182886b1c16353be7db76d9b852b02.1711674410.git.babu.moger@amd.com/
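The "Unassigned" read result suggested above could be modelled roughly as follows. This is a userspace sketch only: enum mbm_read_state and mbm_format() are hypothetical names, not resctrl functions. "Unavailable" is the string the cover letter mentions for inactive RMIDs, and "Unassigned" is the proposed new string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of reading one MBM event for one group. */
enum mbm_read_state {
	MBM_OK,			/* a counter is assigned and was readable */
	MBM_UNAVAILABLE,	/* hardware could not report a value */
	MBM_UNASSIGNED,		/* no counter is assigned to this event */
};

/* Format what a read of e.g. mbm_total_bytes would show to user space. */
static int mbm_format(char *buf, size_t len, enum mbm_read_state st,
		      uint64_t bytes)
{
	switch (st) {
	case MBM_UNASSIGNED:
		return snprintf(buf, len, "Unassigned\n");
	case MBM_UNAVAILABLE:
		return snprintf(buf, len, "Unavailable\n");
	default:
		return snprintf(buf, len, "%llu\n", (unsigned long long)bytes);
	}
}
```

Keeping the two strings distinct would let a monitoring tool tell "assign a counter and retry" apart from a transient hardware condition.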
Hi Reinette,

On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter and Babu,
>
> On 5/2/2024 1:14 PM, Moger, Babu wrote:
> > Are you suggesting to enable ABMC by default when available?
>
> I do think ABMC should be enabled by default when available and it looks
> to be what this series aims to do [1]. The way I reason about this is
> that legacy user space gets more reliable monitoring behavior without
> needing to change behavior.

I don't like that for a monitor assignment-aware user, following the
creation of new monitoring groups, there will be fewer monitors
available for assignment. If the user wants precise control over where
monitors are allocated, they would need to manually unassign the
automatically-assigned monitor after creating new groups.

It's an annoyance, but I'm not sure if it would break any realistic
usage model. Maybe if the monitoring agent operates independently of
whoever creates monitoring groups it could result in brief periods
where fewer monitors than expected are available because whoever just
created a new monitoring group hasn't given the automatically-assigned
monitors back yet.

>
> I thought there was discussion about communicating to user space
> when an attempt is made to read data from an event that does not
> have a counter assigned. Something like below but I did not notice this
> in this series.
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> Unassigned
>
> >
> > Then provide the mount option to switch back to legacy mode?
> > I am fine with that if we all agree on that.
>
> Why is a mount option needed? I think we should avoid requiring a remount
> unless required and I would like to understand why it is required here.
>
> Peter: could you please elaborate what you mean with it makes it more
> difficult for the FS code to generically manage monitor assignment?
>
> Why would user space be required to recreate all control and monitor
> groups if wanting to change how memory bandwidth monitoring is done?

I was looking at this more from the perspective of whether it's
necessary to support the live transition of the groups' configuration
back and forth between programming models. I find it very unlikely
for the userspace controller software to change its mind about the
programming model for monitoring in a running system, so I thought
this would be in the same category as choosing at mount time whether
or not to use CDP or the MBA software controller.

Also, in the software implementation of monitor assignment for older
AMD processors, which is based on allocating a subset of RMIDs, I'm
concerned that the context switch handler would want to read the
monitors associated with the incoming thread's current group to
determine whether it should use one of the tracked RMIDs. I believe it
would be cleaner if the lifetime of the generic monitor-tracking
structures would last until the static branches gating
__resctrl_sched_in() could be disabled.

>
> From this implementation it has been difficult to understand the impact
> of switching between ABMC and legacy.

I'll see if there's a good way to share my software monitor assignment
prototype so it's clearer how the user interface would interact with
diverse implementations. Unfortunately, it's difficult to see the
required abstraction boundaries without the fs/resctrl refactoring
changes[1] applied. It would also require my changes[2] for reading a
thread's RMID from the FS structures to prevent monitor assignments
from forcing an update of all task_structs in the system.

-Peter

[1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
[2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
Hi Peter,

On 5/2/2024 5:57 PM, Peter Newman wrote:
> Hi Reinette,
>
> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter and Babu,
>>
>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>> Are you suggesting to enable ABMC by default when available?
>>
>> I do think ABMC should be enabled by default when available and it looks
>> to be what this series aims to do [1]. The way I reason about this is
>> that legacy user space gets more reliable monitoring behavior without
>> needing to change behavior.
>
> I don't like that for a monitor assignment-aware user, following the
> creation of new monitoring groups, there will be fewer monitors
> available for assignment. If the user wants precise control over where
> monitors are allocated, they would need to manually unassign the
> automatically-assigned monitor after creating new groups.
>
> It's an annoyance, but I'm not sure if it would break any realistic
> usage model. Maybe if the monitoring agent operates independently of
> whoever creates monitoring groups it could result in brief periods
> where fewer monitors than expected are available because whoever just
> created a new monitoring group hasn't given the automatically-assigned
> monitors back yet.
>

I will respond in the other thread.

>>
>> I thought there was discussion about communicating to user space
>> when an attempt is made to read data from an event that does not
>> have a counter assigned. Something like below but I did not notice this
>> in this series.
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unassigned
>>
>>>
>>> Then provide the mount option to switch back to legacy mode?
>>> I am fine with that if we all agree on that.
>>
>> Why is a mount option needed? I think we should avoid requiring a remount
>> unless required and I would like to understand why it is required here.
>>
>> Peter: could you please elaborate what you mean with it makes it more
>> difficult for the FS code to generically manage monitor assignment?
>>
>> Why would user space be required to recreate all control and monitor
>> groups if wanting to change how memory bandwidth monitoring is done?
>
> I was looking at this more from the perspective of whether it's
> necessary to support the live transition of the groups' configuration
> back and forth between programming models. I find it very unlikely
> for the userspace controller software to change its mind about the
> programming model for monitoring in a running system, so I thought
> this would be in the same category as choosing at mount time whether
> or not to use CDP or the MBA software controller.

This seems reasonable to me if only considering ABMC and legacy. When
also taking into account soft-RMID it is no longer obvious to me. I do
still have an impression that the soft-RMID solution impacts context
switch duration, so I am considering the scenario where user space may
want to use soft-RMID for portions of time to get an idea of workload
behavior and then dynamically move to less accurate measurements to not
impact the workloads all the time. In this case it is perhaps more like
how user space can dynamically change the power saving mode based on
requirements of responsiveness etc.

> Also, in the software implementation of monitor assignment for older
> AMD processors, which is based on allocating a subset of RMIDs, I'm
> concerned that the context switch handler would want to read the
> monitors associated with the incoming thread's current group to
> determine whether it should use one of the tracked RMIDs. I believe it
> would be cleaner if the lifetime of the generic monitor-tracking
> structures would last until the static branches gating
> __resctrl_sched_in() could be disabled.

Yes, this falls under the umbrella of needing to understand the impact
of switching between mechanisms, which is not obvious to me.

>
>>
>> From this implementation it has been difficult to understand the impact
>> of switching between ABMC and legacy.
>
> I'll see if there's a good way to share my software monitor assignment
> prototype so it's clearer how the user interface would interact with
> diverse implementations. Unfortunately, it's difficult to see the
> required abstraction boundaries without the fs/resctrl refactoring
> changes[1] applied. It would also require my changes[2] for reading a
> thread's RMID from the FS structures to prevent monitor assignments
> from forcing an update of all task_structs in the system.
>
> -Peter
>
> [1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
> [2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
Hi Peter,

On 5/2/2024 7:57 PM, Peter Newman wrote:
> Hi Reinette,
>
> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter and Babu,
>>
>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>> Are you suggesting to enable ABMC by default when available?
>>
>> I do think ABMC should be enabled by default when available and it looks
>> to be what this series aims to do [1]. The way I reason about this is
>> that legacy user space gets more reliable monitoring behavior without
>> needing to change behavior.
>
> I don't like that for a monitor assignment-aware user, following the
> creation of new monitoring groups, there will be fewer monitors
> available for assignment. If the user wants precise control over where
> monitors are allocated, they would need to manually unassign the
> automatically-assigned monitor after creating new groups.
>
> It's an annoyance, but I'm not sure if it would break any realistic
> usage model. Maybe if the monitoring agent operates independently of

Yes. It's an annoyance.

But if you think about it, normal users don't create too many groups.
They won't have to worry about the assign/unassign headache if we enable
monitor assignment automatically. Also, there is the pqos tool which uses
this interface. It does not have to know about assign/unassign stuff.

> whoever creates monitoring groups it could result in brief periods
> where fewer monitors than expected are available because whoever just
> created a new monitoring group hasn't given the automatically-assigned
> monitors back yet.
>
>>
>> I thought there was discussion about communicating to user space
>> when an attempt is made to read data from an event that does not
>> have a counter assigned. Something like below but I did not notice this
>> in this series.
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unassigned
>>
>>>
>>> Then provide the mount option to switch back to legacy mode?
>>> I am fine with that if we all agree on that.
>>
>> Why is a mount option needed? I think we should avoid requiring a remount
>> unless required and I would like to understand why it is required here.
>>
>> Peter: could you please elaborate what you mean with it makes it more
>> difficult for the FS code to generically manage monitor assignment?
>>
>> Why would user space be required to recreate all control and monitor
>> groups if wanting to change how memory bandwidth monitoring is done?
>
> I was looking at this more from the perspective of whether it's
> necessary to support the live transition of the groups' configuration
> back and forth between programming models. I find it very unlikely
> for the userspace controller software to change its mind about the
> programming model for monitoring in a running system, so I thought
> this would be in the same category as choosing at mount time whether
> or not to use CDP or the MBA software controller.

A good point about the mount option is that we don't create extra files
for monitor assignment in /sys/fs/resctrl when we mount with the legacy
option.

>
> Also, in the software implementation of monitor assignment for older
> AMD processors, which is based on allocating a subset of RMIDs, I'm
> concerned that the context switch handler would want to read the
> monitors associated with the incoming thread's current group to
> determine whether it should use one of the tracked RMIDs. I believe it
> would be cleaner if the lifetime of the generic monitor-tracking
> structures would last until the static branches gating
> __resctrl_sched_in() could be disabled.
>
>>
>> From this implementation it has been difficult to understand the impact
>> of switching between ABMC and legacy.
>
> I'll see if there's a good way to share my software monitor assignment
> prototype so it's clearer how the user interface would interact with
> diverse implementations. Unfortunately, it's difficult to see the
> required abstraction boundaries without the fs/resctrl refactoring
> changes[1] applied. It would also require my changes[2] for reading a
> thread's RMID from the FS structures to prevent monitor assignments
> from forcing an update of all task_structs in the system.
>
> -Peter
>
> [1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
> [2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
>

-- 
- Babu Moger
Hi Peter/Reinette,

On 5/3/2024 3:44 PM, Moger, Babu wrote:
> Hi Peter,
>
> On 5/2/2024 7:57 PM, Peter Newman wrote:
>> Hi Reinette,
>>
>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>>
>>> Hi Peter and Babu,
>>>
>>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>>> Are you suggesting to enable ABMC by default when available?
>>>
>>> I do think ABMC should be enabled by default when available and it looks
>>> to be what this series aims to do [1]. The way I reason about this is
>>> that legacy user space gets more reliable monitoring behavior without
>>> needing to change behavior.
>>
>> I don't like that for a monitor assignment-aware user, following the
>> creation of new monitoring groups, there will be fewer monitors
>> available for assignment. If the user wants precise control over where
>> monitors are allocated, they would need to manually unassign the
>> automatically-assigned monitor after creating new groups.
>>
>> It's an annoyance, but I'm not sure if it would break any realistic
>> usage model. Maybe if the monitoring agent operates independently of
>
> Yes. It's an annoyance.
>
> But if you think about it, normal users don't create too many groups.
> They won't have to worry about the assign/unassign headache if we enable
> monitor assignment automatically. Also, there is the pqos tool which uses
> this interface. It does not have to know about assign/unassign stuff.
>
>
>> whoever creates monitoring groups it could result in brief periods
>> where fewer monitors than expected are available because whoever just
>> created a new monitoring group hasn't given the automatically-assigned
>> monitors back yet.
>>
>>>
>>> I thought there was discussion about communicating to user space
>>> when an attempt is made to read data from an event that does not
>>> have a counter assigned. Something like below but I did not notice this
>>> in this series.
>>>
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> Unassigned
>>>
>>>>
>>>> Then provide the mount option to switch back to legacy mode?
>>>> I am fine with that if we all agree on that.
>>>
>>> Why is a mount option needed? I think we should avoid requiring a remount
>>> unless required and I would like to understand why it is required here.
>>>
>>> Peter: could you please elaborate what you mean with it makes it more
>>> difficult for the FS code to generically manage monitor assignment?
>>>
>>> Why would user space be required to recreate all control and monitor
>>> groups if wanting to change how memory bandwidth monitoring is done?
>>
>> I was looking at this more from the perspective of whether it's
>> necessary to support the live transition of the groups' configuration
>> back and forth between programming models. I find it very unlikely
>> for the userspace controller software to change its mind about the
>> programming model for monitoring in a running system, so I thought
>> this would be in the same category as choosing at mount time whether
>> or not to use CDP or the MBA software controller.
>
> A good point about the mount option is that we don't create extra files
> for monitor assignment in /sys/fs/resctrl when we mount with the legacy
> option.

I think we still have not decided about the "mount" option for switching
to legacy monitoring. A mount option seems safe at this point. With a
mount option we don't have to deal with the extra files that dynamic
switching would add to the resctrl filesystem.

>
>>
>> Also, in the software implementation of monitor assignment for older
>> AMD processors, which is based on allocating a subset of RMIDs, I'm
>> concerned that the context switch handler would want to read the
>> monitors associated with the incoming thread's current group to
>> determine whether it should use one of the tracked RMIDs. I believe it
>> would be cleaner if the lifetime of the generic monitor-tracking
>> structures would last until the static branches gating
>> __resctrl_sched_in() could be disabled.
>>
>>>
>>> From this implementation it has been difficult to understand the impact
>>> of switching between ABMC and legacy.
>>
>> I'll see if there's a good way to share my software monitor assignment
>> prototype so it's clearer how the user interface would interact with
>> diverse implementations. Unfortunately, it's difficult to see the
>> required abstraction boundaries without the fs/resctrl refactoring
>> changes[1] applied. It would also require my changes[2] for reading a
>> thread's RMID from the FS structures to prevent monitor assignments
>> from forcing an update of all task_structs in the system.
>>
>> -Peter
>>
>> [1] https://lore.kernel.org/lkml/20240426150537.8094-1-Dave.Martin@arm.com/
>> [2] https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@google.com/
>>
>>
>

-- 
- Babu Moger
On 5/9/2024 5:57 PM, Moger, Babu wrote:
> On 5/3/2024 3:44 PM, Moger, Babu wrote:
>> On 5/2/2024 7:57 PM, Peter Newman wrote:
>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 5/2/2024 1:14 PM, Moger, Babu wrote:
>>>>> Are you suggesting to enable ABMC by default when available?
>>>>
>>>> I do think ABMC should be enabled by default when available and it looks
>>>> to be what this series aims to do [1]. The way I reason about this is
>>>> that legacy user space gets more reliable monitoring behavior without
>>>> needing to change behavior.
>>>
>>> I don't like that for a monitor assignment-aware user, following the
>>> creation of new monitoring groups, there will be fewer monitors
>>> available for assignment. If the user wants precise control over where
>>> monitors are allocated, they would need to manually unassign the
>>> automatically-assigned monitor after creating new groups.
>>>
>>> It's an annoyance, but I'm not sure if it would break any realistic
>>> usage model. Maybe if the monitoring agent operates independently of
>>
>> Yes. It's an annoyance.
>>
>> But if you think about it, normal users don't create too many groups.
>> They won't have to worry about the assign/unassign headache if we enable
>> monitor assignment automatically. Also, there is the pqos tool which uses
>> this interface. It does not have to know about assign/unassign stuff.
>>
>>
>>> whoever creates monitoring groups it could result in brief periods
>>> where fewer monitors than expected are available because whoever just
>>> created a new monitoring group hasn't given the automatically-assigned
>>> monitors back yet.
>>>
>>>>
>>>> I thought there was discussion about communicating to user space
>>>> when an attempt is made to read data from an event that does not
>>>> have a counter assigned. Something like below but I did not notice this
>>>> in this series.
>>>>
>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>> Unassigned
>>>>
>>>>>
>>>>> Then provide the mount option to switch back to legacy mode?
>>>>> I am fine with that if we all agree on that.
>>>>
>>>> Why is a mount option needed? I think we should avoid requiring a remount
>>>> unless required and I would like to understand why it is required here.
>>>>
>>>> Peter: could you please elaborate what you mean with it makes it more
>>>> difficult for the FS code to generically manage monitor assignment?
>>>>
>>>> Why would user space be required to recreate all control and monitor
>>>> groups if wanting to change how memory bandwidth monitoring is done?
>>>
>>> I was looking at this more from the perspective of whether it's
>>> necessary to support the live transition of the groups' configuration
>>> back and forth between programming models. I find it very unlikely
>>> for the userspace controller software to change its mind about the
>>> programming model for monitoring in a running system, so I thought
>>> this would be in the same category as choosing at mount time whether
>>> or not to use CDP or the MBA software controller.
>>
>> A good point about the mount option is that we don't create extra files
>> for monitor assignment in /sys/fs/resctrl when we mount with the legacy
>> option.
>
> I think we still have not decided about the "mount" option for
> switching to legacy monitoring. Mount option seems safe at this
> point.

I have not heard back after sending [1], so I do still believe that
users may want a way to not have soft-RMID running all the time
without impacting monitor and control groups.

> We don't have to deal with extra files in resctrl filesystem
> with dynamic switching.

Reinette

[1] https://lore.kernel.org/lkml/ea56c630-80f4-4564-beb3-2b61e810a558@intel.com/
Hi Babu,

On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
>
> Hi Peter,
>
> On 5/2/2024 7:57 PM, Peter Newman wrote:
> > Hi Reinette,
> >
> > On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> >> I do think ABMC should be enabled by default when available and it looks
> >> to be what this series aims to do [1]. The way I reason about this is
> >> that legacy user space gets more reliable monitoring behavior without
> >> needing to change behavior.
> >
> > I don't like that for a monitor assignment-aware user, following the
> > creation of new monitoring groups, there will be fewer monitors
> > available for assignment. If the user wants precise control over where
> > monitors are allocated, they would need to manually unassign the
> > automatically-assigned monitor after creating new groups.
> >
> > It's an annoyance, but I'm not sure if it would break any realistic
> > usage model. Maybe if the monitoring agent operates independently of
>
> Yes. It's an annoyance.
>
> But if you think about it, normal users don't create too many groups.
> They won't have to worry about the assign/unassign headache if we enable
> monitor assignment automatically. Also, there is the pqos tool which uses
> this interface. It does not have to know about assign/unassign stuff.

Thinking about this again, I don't think it's much of a concern
because the automatic assignment on mongroup creation behavior can be
trivially disabled using a boolean flag.

-Peter
Hi Peter,

On 5/3/2024 2:00 PM, Peter Newman wrote:
> Hi Babu,
>
> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 5/2/2024 7:57 PM, Peter Newman wrote:
>>> Hi Reinette,
>>>
>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
>>>> I do think ABMC should be enabled by default when available and it looks
>>>> to be what this series aims to do [1]. The way I reason about this is
>>>> that legacy user space gets more reliable monitoring behavior without
>>>> needing to change behavior.
>>>
>>> I don't like that for a monitor assignment-aware user, following the
>>> creation of new monitoring groups, there will be fewer monitors
>>> available for assignment. If the user wants precise control over where
>>> monitors are allocated, they would need to manually unassign the
>>> automatically-assigned monitor after creating new groups.
>>>
>>> It's an annoyance, but I'm not sure if it would break any realistic
>>> usage model. Maybe if the monitoring agent operates independently of
>>
>> Yes. It's an annoyance.
>>
>> But if you think about it, normal users don't create too many groups.
>> They won't have to worry about the assign/unassign headache if we enable
>> monitor assignment automatically. Also, there is the pqos tool which uses
>> this interface. It does not have to know about assign/unassign stuff.
>
> Thinking about this again, I don't think it's much of a concern
> because the automatic assignment on mongroup creation behavior can be
> trivially disabled using a boolean flag.

This could be a config option.

Reinette
Hi Reinette, Babu,

On Fri, May 3, 2024 at 2:15 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Peter,
>
> On 5/3/2024 2:00 PM, Peter Newman wrote:
> > Hi Babu,
> >
> > On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 5/2/2024 7:57 PM, Peter Newman wrote:
> >>> Hi Reinette,
> >>>
> >>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre
> >>>> I do think ABMC should be enabled by default when available and it looks
> >>>> to be what this series aims to do [1]. The way I reason about this is
> >>>> that legacy user space gets more reliable monitoring behavior without
> >>>> needing to change behavior.
> >>>
> >>> I don't like that for a monitor assignment-aware user, following the
> >>> creation of new monitoring groups, there will be fewer monitors
> >>> available for assignment. If the user wants precise control over where
> >>> monitors are allocated, they would need to manually unassign the
> >>> automatically-assigned monitor after creating new groups.
> >>>
> >>> It's an annoyance, but I'm not sure if it would break any realistic
> >>> usage model. Maybe if the monitoring agent operates independently of
> >>
> >> Yes. It's an annoyance.
> >>
> >> But if you think about it, normal users don't create too many groups.
> >> They won't have to worry about the assign/unassign headache if we enable
> >> monitor assignment automatically. Also, there is the pqos tool which uses
> >> this interface. It does not have to know about assign/unassign stuff.
> >
> > Thinking about this again, I don't think it's much of a concern
> > because the automatic assignment on mongroup creation behavior can be
> > trivially disabled using a boolean flag.
>
> This could be a config option.

I'd like to work out the details of this option.

info/L3_MON/mbm_assign_on_mkdir?

boolean (parsed with kstrtobool()), defaulting to true?

-Peter
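To make the proposed info/L3_MON/mbm_assign_on_mkdir semantics concrete, here is a minimal userspace sketch of a boolean file that defaults to true. parse_bool() only approximates what the kernel's kstrtobool() accepts (y/n, t/f, 1/0, on/off, decided by the leading characters); it is not the kernel function. mbm_assign_on_mkdir_write() is a hypothetical handler name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace approximation of kstrtobool(): decide on the leading chars. */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

/* Proposed default: assign counters automatically when a group is created. */
static bool mbm_assign_on_mkdir = true;

/* Hypothetical write handler; the flag is left untouched on bad input. */
static int mbm_assign_on_mkdir_write(const char *buf)
{
	bool val;
	int ret = parse_bool(buf, &val);

	if (ret)
		return ret;
	mbm_assign_on_mkdir = val;
	return 0;
}
```

An assignment-aware agent would write 0 once after mount and manage counters itself, while legacy tools keep the default and never see the file.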
Hi Peter, On 5/17/24 16:51, Peter Newman wrote: > Hi Reinette, Babu, > > On Fri, May 3, 2024 at 2:15 PM Reinette Chatre > <reinette.chatre@intel.com> wrote: >> >> Hi Peter, >> >> On 5/3/2024 2:00 PM, Peter Newman wrote: >>> Hi Babu, >>> >>> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote: >>>> >>>> Hi Peter, >>>> >>>> On 5/2/2024 7:57 PM, Peter Newman wrote: >>>>> Hi Reinette, >>>>> >>>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre >>>>>> I do think ABMC should be enabled by default when available and it looks >>>>>> to be what this series aims to do [1]. The way I reason about this is >>>>>> that legacy user space gets more reliable monitoring behavior without >>>>>> needing to change behavior. >>>>> >>>>> I don't like that for a monitor assignment-aware user, following the >>>>> creation of new monitoring groups, there will be less monitors >>>>> available for assignment. If the user wants precise control over where >>>>> monitors are allocated, they would need to manually unassign the >>>>> automatically-assigned monitor after creating new groups. >>>>> >>>>> It's an annoyance, but I'm not sure if it would break any realistic >>>>> usage model. Maybe if the monitoring agent operates independently of >>>> >>>> Yes. Its annoyance. >>>> >>>> But if you think about it, normal users don't create too many groups. >>>> They wont have to worry about assign/unassign headache if we enable >>>> monitor assignment automatically. Also there is pqos tool which uses >>>> this interface. It does not have to know about assign/unassign stuff. >>> >>> Thinking about this again, I don't think it's much of a concern >>> because the automatic assignment on mongroup creation behavior can be >>> trivially disabled using a boolean flag. >> >> This could be a config option. > > I'd like to work out the details of this option. > > info/L3_MON/mbm_assign_on_mkdir? > > boolean (parsed with kstrtobool()), defaulting to true? I am thinking is not a big concern. 
We only have a limited number of counters (32). Automatic monitor assignment
works only for the first 16 groups (2 counters for each group). Once the
counters are exhausted, auto assignment does not work. In your case (with
more than 16 groups), auto assignment does not work. I feel that having a
config option is really not necessary.

--
Thanks
Babu Moger
Hi Babu, On Mon, May 20, 2024 at 7:25 AM Moger, Babu <babu.moger@amd.com> wrote: > > Hi Peter, > > On 5/17/24 16:51, Peter Newman wrote: > > Hi Reinette, Babu, > > > > On Fri, May 3, 2024 at 2:15 PM Reinette Chatre > > <reinette.chatre@intel.com> wrote: > >> > >> Hi Peter, > >> > >> On 5/3/2024 2:00 PM, Peter Newman wrote: > >>> Hi Babu, > >>> > >>> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote: > >>>> > >>>> Hi Peter, > >>>> > >>>> On 5/2/2024 7:57 PM, Peter Newman wrote: > >>>>> Hi Reinette, > >>>>> > >>>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre > >>>>>> I do think ABMC should be enabled by default when available and it looks > >>>>>> to be what this series aims to do [1]. The way I reason about this is > >>>>>> that legacy user space gets more reliable monitoring behavior without > >>>>>> needing to change behavior. > >>>>> > >>>>> I don't like that for a monitor assignment-aware user, following the > >>>>> creation of new monitoring groups, there will be less monitors > >>>>> available for assignment. If the user wants precise control over where > >>>>> monitors are allocated, they would need to manually unassign the > >>>>> automatically-assigned monitor after creating new groups. > >>>>> > >>>>> It's an annoyance, but I'm not sure if it would break any realistic > >>>>> usage model. Maybe if the monitoring agent operates independently of > >>>> > >>>> Yes. Its annoyance. > >>>> > >>>> But if you think about it, normal users don't create too many groups. > >>>> They wont have to worry about assign/unassign headache if we enable > >>>> monitor assignment automatically. Also there is pqos tool which uses > >>>> this interface. It does not have to know about assign/unassign stuff. > >>> > >>> Thinking about this again, I don't think it's much of a concern > >>> because the automatic assignment on mongroup creation behavior can be > >>> trivially disabled using a boolean flag. > >> > >> This could be a config option. 
> >
> > I'd like to work out the details of this option.
> >
> > info/L3_MON/mbm_assign_on_mkdir?
> >
> > boolean (parsed with kstrtobool()), defaulting to true?
>
> I am thinking is not a big concern. We only have limited (32) counters.
> Automatic monitor assignment works only for first 16 groups(2 counters for
> each group). When the counters are exhausted auto assignment does not
> work. In your case(with more than 16 groups) the auto assignment does not
> work. I feel having a config option is really not necessary.

I'm not sure I follow the argument you're trying to make because it
doesn't sound like an argument against adding a config option. What
exactly do you mean by "work" vs "not work"?

Also it doesn't address my original concern about needing to manually
(and non-atomically) undo the auto assignment in order to account for
where the monitors are assigned or ensure that creating a new
monitoring group will succeed.

-Peter
Hi Peter, On 5/20/24 11:00, Peter Newman wrote: > Hi Babu, > > On Mon, May 20, 2024 at 7:25 AM Moger, Babu <babu.moger@amd.com> wrote: >> >> Hi Peter, >> >> On 5/17/24 16:51, Peter Newman wrote: >>> Hi Reinette, Babu, >>> >>> On Fri, May 3, 2024 at 2:15 PM Reinette Chatre >>> <reinette.chatre@intel.com> wrote: >>>> >>>> Hi Peter, >>>> >>>> On 5/3/2024 2:00 PM, Peter Newman wrote: >>>>> Hi Babu, >>>>> >>>>> On Fri, May 3, 2024 at 1:44 PM Moger, Babu <bmoger@amd.com> wrote: >>>>>> >>>>>> Hi Peter, >>>>>> >>>>>> On 5/2/2024 7:57 PM, Peter Newman wrote: >>>>>>> Hi Reinette, >>>>>>> >>>>>>> On Thu, May 2, 2024 at 4:21 PM Reinette Chatre >>>>>>>> I do think ABMC should be enabled by default when available and it looks >>>>>>>> to be what this series aims to do [1]. The way I reason about this is >>>>>>>> that legacy user space gets more reliable monitoring behavior without >>>>>>>> needing to change behavior. >>>>>>> >>>>>>> I don't like that for a monitor assignment-aware user, following the >>>>>>> creation of new monitoring groups, there will be less monitors >>>>>>> available for assignment. If the user wants precise control over where >>>>>>> monitors are allocated, they would need to manually unassign the >>>>>>> automatically-assigned monitor after creating new groups. >>>>>>> >>>>>>> It's an annoyance, but I'm not sure if it would break any realistic >>>>>>> usage model. Maybe if the monitoring agent operates independently of >>>>>> >>>>>> Yes. Its annoyance. >>>>>> >>>>>> But if you think about it, normal users don't create too many groups. >>>>>> They wont have to worry about assign/unassign headache if we enable >>>>>> monitor assignment automatically. Also there is pqos tool which uses >>>>>> this interface. It does not have to know about assign/unassign stuff. >>>>> >>>>> Thinking about this again, I don't think it's much of a concern >>>>> because the automatic assignment on mongroup creation behavior can be >>>>> trivially disabled using a boolean flag. 
>>>>
>>>> This could be a config option.
>>>
>>> I'd like to work out the details of this option.
>>>
>>> info/L3_MON/mbm_assign_on_mkdir?
>>>
>>> boolean (parsed with kstrtobool()), defaulting to true?
>>
>> I am thinking is not a big concern. We only have limited (32) counters.
>> Automatic monitor assignment works only for first 16 groups(2 counters for
>> each group). When the counters are exhausted auto assignment does not
>> work. In your case(with more than 16 groups) the auto assignment does not
>> work. I feel having a config option is really not necessary.
>
> I'm not sure I follow the argument you're trying to make because it
> doesn't sound like an argument against adding a config option. What
> exactly do you mean by "work" vs "not work"?
>
> Also it doesn't address my original concern about needing to manually
> (and non-atomically) undo the auto assignment in order to account for
> where the monitors are assigned or ensure that creating a new
> monitoring group will succeed.
>

Sorry for the confusion. Auto monitor assignment works only for a small
number of groups (15 or fewer). Beyond that point, the user can still create
more groups, but auto assignment will not work because the hardware counters
are all exhausted. You need to manually unassign a counter from another
group and use that counter for the new assignment.

I assume that you are dealing with more than 16 groups. In that case, you
have to manually assign/unassign anyway. Having a config option
"info/L3_MON/mbm_assign_on_mkdir" will not be of much help to you.

--
Thanks
Babu Moger
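
The counter arithmetic being discussed can be modelled with a toy pool. This is only a sketch under the numbers quoted in the thread (32 counters, 2 per group); the class and method names are hypothetical and do not correspond to anything in the patch series.

```python
class CounterPool:
    """Toy model: a fixed pool of counters, auto-assigned when a group is created."""

    def __init__(self, num_counters=32, counters_per_group=2):
        self.free = num_counters
        self.per_group = counters_per_group
        self.assigned = {}  # group name -> number of counters currently held

    def assign(self, group):
        """Give the group its counters; return False if the pool is exhausted."""
        if self.assigned[group] == self.per_group:
            return True
        if self.free < self.per_group:
            return False
        self.free -= self.per_group
        self.assigned[group] = self.per_group
        return True

    def unassign(self, group):
        """Return the group's counters to the pool."""
        self.free += self.assigned[group]
        self.assigned[group] = 0

    def mkdir(self, group):
        """Create a group and attempt automatic assignment."""
        self.assigned[group] = 0
        return self.assign(group)
```

In this model the seventeenth group is created without counters, and it can only be monitored after some other group has been unassigned by hand, which is the manual step both sides of the discussion refer to.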
Hi Babu, On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote: > > This series adds the support for Assignable Bandwidth Monitoring Counters > (ABMC). It is also called QoS RMID Pinning feature > > The feature details are documented in the APM listed below [1]. > [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming > Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth > Monitoring (ABMC). The documentation is available at > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 > > The patches are based on top of commit > cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master) A few very general comments from me here, since I'm not vary familiar with this topic... > # Introduction > > AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring > feature only guarantees that RMIDs currently assigned to a processor will > be tracked by hardware. The counters of any other RMIDs which are no longer > being tracked will be reset to zero. The MBM event counters return > "Unavailable" for the RMIDs that are not active. > > Users can create 256 or more monitor groups. But there can be only limited > number of groups that can give guaranteed monitoring numbers. With ever > changing configurations there is no way to definitely know which of these > groups will be active for certain point of time. Users do not have the > option to monitor a group or set of groups for certain period of time > without worrying about RMID being reset in between. > > The ABMC feature provides an option to the user to assign an RMID to the > hardware counter and monitor the bandwidth for a longer duration. > The assigned RMID will be active until the user unassigns it manually. > There is no need to worry about counters being reset during this period. > Additionally, the user can specify a bitmask identifying the specific > bandwidth types from the given source to track with the counter. 
> > Without ABMC enabled, monitoring will work in current mode without > assignment option. > > # Linux Implementation > > Linux resctrl subsystem provides the interface to count maximum of two > memory bandwidth events per group, from a combination of available total > and local events. Keeping the current interface, users can assign a maximum > of 2 ABMC counters per group. User will also have the option to assign only > one counter to the group. If the system runs out of assignable ABMC > counters, kernel will display an error. Users need to unassign an already > assigned counter to make space for new assignments. > > > # Examples > > a. Check if ABMC support is available > #mount -t resctrl resctrl /sys/fs/resctrl/ > > #cat /sys/fs/resctrl/info/L3_MON/mbm_assign > [abmc] > legacy_mbm > > Linux kernel detected ABMC feature and it is enabled. > > b. Check how many ABMC counters are available. > > #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs > 32 > > c. Create few resctrl groups. > > # mkdir /sys/fs/resctrl/mon_groups/default_mon1 > # mkdir /sys/fs/resctrl/non_defult_group > # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1 > > d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control > to list and modify the group's assignment states. > > The list follows the following format: This section looks possibly inconsistent with (e.) Is (d.) the userspace read format, with (e.) being the format written by userspace? 
> * Default CTRL_MON group: > "//<domain_id>=<assignment_flags>" > > * Non-default CTRL_MON group: > "<CTRL_MON group>//<domain_id>=<assignment_flags>" > > * Child MON group of default CTRL_MON group: > "/<MON group>/<domain_id>=<assignment_flags>" > > * Child MON group of non-default CTRL_MON group: > "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>" > > Assignment flags can be one of the following: > > t MBM total event is assigned With my MPAM hat on this looks a bit weird, although I suppose it follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are already exposed in resctrlfs. From an abstract point of view, "total" and "local" are just event selection criteria, additional to those in mbm_cfg_mask. The different way they are treated in the hardware feels like an x86 implementation detail. For MPAM we don't currently distinguish local from non-local traffic, so I guess this just reduces to a simple on-off (i.e., "t" or nothing), which I guess is tolerable. This might want more thought if there is an expectation that more categories will be added here, though (?) > l MBM local event is assigned > tl Both total and local MBM events are assigned > _ None of the MBM events are assigned This use of '_' seems unusual. Can we not just have the empty string for "nothing assigned"? Since every assignment is terminated by ';' or end-of-line, I don't think that there would be any parsing ambiguity (?) > > Examples: > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > There are four groups and all the groups have local and total event assigned. 
>
> "//" - This is a default CONTROL MON group
>
> "non_defult_group//" - This is non default CONTROL MON group
>
> "/default_mon1/" - This is Child MON group of the defult group
>
> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>
> =tl means both total and local events are assigned.
>
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>
> The write format is similar to the above list format with addition of
> op-code for the assignment operation.

With my resctrl newbie hat on:

It feels a bit complex (for the kernel) to have userspace needing to
write a script into a magic file that we need to parse, specifying
updates to a bunch of controls already visible as objects in resctrlfs
in their own right.

What's the expected use case here?

If userspace really does need to switch lots of events simultaneously
then I guess the overhead of enumerating and poking lots of individual
files might be unacceptable though, and we would still need some global
interfaces for operations such as "unassign everything"...

OTOH, the proposed approach is not so different from the way the
schemata files already work.

>
> * Default CTRL_MON group:
> "//<domain_id><op-code><assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id><op-code><assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>
> Op-code can be one of the following:
>
> = Update the assignment to match the flags
> + Assign a new state
> - Unassign a new state
> _ Unassign all the states

If we adopt "empty string" to mean "no events", then

	<foo>/<bar>/<domain>=

would unassign all events, so '_' would not be needed as a separate syntax.

[...]

Cheers
---Dave
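
To make the list format concrete, here is a minimal userspace sketch that splits one line read from mbm_assign_control into its CTRL_MON group, MON group and per-domain flags. The function name is hypothetical, and the sketch assumes group names never contain '/'. It treats both "_" and an empty flag string as "nothing assigned", so it works under either convention discussed above.

```python
def parse_assign_line(line):
    """Split '<ctrl>/<mon>/<dom>=<flags>;...' into (ctrl, mon, {dom: flags})."""
    ctrl, mon, rest = line.strip().split("/", 2)
    domains = {}
    for item in rest.split(";"):
        if not item:
            continue  # skip the empty piece after a trailing ';'
        dom, flags = item.split("=", 1)
        # '_' (or an empty string) means no events are assigned
        domains[int(dom)] = set() if flags == "_" else set(flags)
    return ctrl, mon, domains
```

An empty `ctrl` string denotes the default CTRL_MON group, and an empty `mon` string denotes the CTRL_MON group itself rather than one of its child MON groups.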
Hi Dave, On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote: > > Hi Babu, > > On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote: > > Assignment flags can be one of the following: > > > > t MBM total event is assigned > > With my MPAM hat on this looks a bit weird, although I suppose it > follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are > already exposed in resctrlfs. > > From an abstract point of view, "total" and "local" are just event > selection criteria, additional to those in mbm_cfg_mask. The different > way they are treated in the hardware feels like an x86 implementation > detail. > > For MPAM we don't currently distinguish local from non-local traffic, so > I guess this just reduces to a simple on-off (i.e., "t" or nothing), > which I guess is tolerable. > > This might want more thought if there is an expectation that more > categories will be added here, though (?) There should be a path forward whenever we start supporting user-configured counter classes. I assume the letters a-z will be enough to cover all the counter classes which could be used at once. > > > l MBM local event is assigned > > tl Both total and local MBM events are assigned > > _ None of the MBM events are assigned > > This use of '_' seems unusual. Can we not just have the empty string > for "nothing assigned"? > > Since every assignment is terminated by ';' or end-of-line, I don't > think that there would be any parsing ambiguity (?) > > > > > Examples: > > > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > > non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > > > There are four groups and all the groups have local and total event assigned. 
> > > > "//" - This is a default CONTROL MON group > > > > "non_defult_group//" - This is non default CONTROL MON group > > > > "/default_mon1/" - This is Child MON group of the defult group > > > > "non_defult_group/non_default_mon1/" - This is child MON group of the non default group > > > > =tl means both total and local events are assigned. > > > > e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control. > > > > The write format is similar to the above list format with addition of > > op-code for the assignment operation. > > With by resctrl newbie hat on: > > It feels a bit complex (for the kernel) to have userspace needing to > write a script into a magic file that we need to parse, specifying > updates to a bunch of controls already visible as objects in resctrlfs > in their own right. > > What's the expected use case here? I went over the use case of iterating a small number of monitors over a much larger number of monitoring groups here: https://lore.kernel.org/lkml/CALPaoCi=PCWr6U5zYtFPmyaFHU_iqZtZL-LaHC2mYxbETXk3ig@mail.gmail.com/ > > If userspace really does need to switch lots of events simultaneously > then I guess the overhead of enumerating and poking lots of individual > files might be unacceptable though, and we would still need some global > interfaces for operations such as "unassign everything"... My main goal is for the number of parallel IPI batches to all the domains (or write syscalls) to be O(num_rmids / num_monitors) rather than O(num_rmids * num_monitors) as I need to know how frequently we can afford to sample the current memory bandwidth of the maximum number of monitoring groups supported. Thanks! -Peter
Hi Peter, On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote: > Hi Dave, > > On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote: > > > > Hi Babu, > > > > On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote: > > > Assignment flags can be one of the following: > > > > > > t MBM total event is assigned > > > > With my MPAM hat on this looks a bit weird, although I suppose it > > follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are > > already exposed in resctrlfs. > > > > From an abstract point of view, "total" and "local" are just event > > selection criteria, additional to those in mbm_cfg_mask. The different > > way they are treated in the hardware feels like an x86 implementation > > detail. > > > > For MPAM we don't currently distinguish local from non-local traffic, so > > I guess this just reduces to a simple on-off (i.e., "t" or nothing), > > which I guess is tolerable. > > > > This might want more thought if there is an expectation that more > > categories will be added here, though (?) > > There should be a path forward whenever we start supporting > user-configured counter classes. I assume the letters a-z will be > enough to cover all the counter classes which could be used at once. Ack, though I'd appreciate a response on the point about "_" below in case people missed it. > > > > > > l MBM local event is assigned > > > tl Both total and local MBM events are assigned > > > _ None of the MBM events are assigned > > > > This use of '_' seems unusual. Can we not just have the empty string > > for "nothing assigned"? > > > > Since every assignment is terminated by ';' or end-of-line, I don't > > think that there would be any parsing ambiguity (?) 
> > > > > > > > Examples: > > > > > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > > > non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > > non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > > /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > > > > > There are four groups and all the groups have local and total event assigned. > > > > > > "//" - This is a default CONTROL MON group > > > > > > "non_defult_group//" - This is non default CONTROL MON group > > > > > > "/default_mon1/" - This is Child MON group of the defult group > > > > > > "non_defult_group/non_default_mon1/" - This is child MON group of the non default group > > > > > > =tl means both total and local events are assigned. > > > > > > e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control. > > > > > > The write format is similar to the above list format with addition of > > > op-code for the assignment operation. > > > > With by resctrl newbie hat on: > > > > It feels a bit complex (for the kernel) to have userspace needing to > > write a script into a magic file that we need to parse, specifying > > updates to a bunch of controls already visible as objects in resctrlfs > > in their own right. > > > > What's the expected use case here? > > I went over the use case of iterating a small number of monitors over > a much larger number of monitoring groups here: > > https://lore.kernel.org/lkml/CALPaoCi=PCWr6U5zYtFPmyaFHU_iqZtZL-LaHC2mYxbETXk3ig@mail.gmail.com/ > > > > > If userspace really does need to switch lots of events simultaneously > > then I guess the overhead of enumerating and poking lots of individual > > files might be unacceptable though, and we would still need some global > > interfaces for operations such as "unassign everything"... 
>
> My main goal is for the number of parallel IPI batches to all the
> domains (or write syscalls) to be O(num_rmids / num_monitors) rather
> than O(num_rmids * num_monitors) as I need to know how frequently we
> can afford to sample the current memory bandwidth of the maximum
> number of monitoring groups supported.

Fair enough; I wasn't fully aware of the background discussions.

Cheers
---Dave
Hi Dave, On 4/23/24 07:38, Dave Martin wrote: > Hi Peter, > > On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote: >> Hi Dave, >> >> On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote: >>> >>> Hi Babu, >>> >>> On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote: >>>> Assignment flags can be one of the following: >>>> >>>> t MBM total event is assigned >>> >>> With my MPAM hat on this looks a bit weird, although I suppose it >>> follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are >>> already exposed in resctrlfs. >>> >>> From an abstract point of view, "total" and "local" are just event >>> selection criteria, additional to those in mbm_cfg_mask. The different >>> way they are treated in the hardware feels like an x86 implementation >>> detail. >>> >>> For MPAM we don't currently distinguish local from non-local traffic, so >>> I guess this just reduces to a simple on-off (i.e., "t" or nothing), >>> which I guess is tolerable. >>> >>> This might want more thought if there is an expectation that more >>> categories will be added here, though (?) >> >> There should be a path forward whenever we start supporting >> user-configured counter classes. I assume the letters a-z will be >> enough to cover all the counter classes which could be used at once. > > Ack, though I'd appreciate a response on the point about "_" below in > case people missed it. It was based on the dynamic debug interface and also Reinette's suggestion as well. https://www.kernel.org/doc/html/v4.10/admin-guide/dynamic-debug-howto.html (Look for "No flags are set"). We tried to use that similar interface. -- Thanks Babu Moger
Hi Babu, On Tue, Apr 23, 2024 at 10:43:25AM -0500, Moger, Babu wrote: > Hi Dave, > > On 4/23/24 07:38, Dave Martin wrote: > > Hi Peter, > > > > On Mon, Apr 22, 2024 at 11:23:50AM -0700, Peter Newman wrote: > >> Hi Dave, > >> > >> On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@arm.com> wrote: > >>> > >>> Hi Babu, > >>> > >>> On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote: > >>>> Assignment flags can be one of the following: > >>>> > >>>> t MBM total event is assigned > >>> > >>> With my MPAM hat on this looks a bit weird, although I suppose it > >>> follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are > >>> already exposed in resctrlfs. > >>> > >>> From an abstract point of view, "total" and "local" are just event > >>> selection criteria, additional to those in mbm_cfg_mask. The different > >>> way they are treated in the hardware feels like an x86 implementation > >>> detail. > >>> > >>> For MPAM we don't currently distinguish local from non-local traffic, so > >>> I guess this just reduces to a simple on-off (i.e., "t" or nothing), > >>> which I guess is tolerable. > >>> > >>> This might want more thought if there is an expectation that more > >>> categories will be added here, though (?) > >> > >> There should be a path forward whenever we start supporting > >> user-configured counter classes. I assume the letters a-z will be > >> enough to cover all the counter classes which could be used at once. > > > > Ack, though I'd appreciate a response on the point about "_" below in > > case people missed it. > > It was based on the dynamic debug interface and also Reinette's suggestion > as well. > https://www.kernel.org/doc/html/v4.10/admin-guide/dynamic-debug-howto.html > (Look for "No flags are set"). > > We tried to use that similar interface. Fair enough; I haven't touched dynamic debug for quite a while and had forgotten about this convention being used there. Apologies for the noise on that! Cheers ---Dave
Hi Babu, On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote: > The list follows the following format: > > * Default CTRL_MON group: > "//<domain_id>=<assignment_flags>" > > * Non-default CTRL_MON group: > "<CTRL_MON group>//<domain_id>=<assignment_flags>" > > * Child MON group of default CTRL_MON group: > "/<MON group>/<domain_id>=<assignment_flags>" > > * Child MON group of non-default CTRL_MON group: > "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>" > > Assignment flags can be one of the following: > > t MBM total event is assigned > l MBM local event is assigned > tl Both total and local MBM events are assigned > _ None of the MBM events are assigned > > Examples: > > # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control > non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; > > There are four groups and all the groups have local and total event assigned. > > "//" - This is a default CONTROL MON group > > "non_defult_group//" - This is non default CONTROL MON group > > "/default_mon1/" - This is Child MON group of the defult group > > "non_defult_group/non_default_mon1/" - This is child MON group of the non default group > > =tl means both total and local events are assigned. I recall there was supposed to be a way to perform the same update on all domains together so that it isn't tedious to not do per-domain customizations. (And also to avoid serializing programming all the domains the same way.) 
> > .../admin-guide/kernel-parameters.txt | 2 +- > Documentation/arch/x86/resctrl.rst | 144 ++++ > arch/x86/include/asm/cpufeatures.h | 1 + > arch/x86/include/asm/msr-index.h | 2 + > arch/x86/kernel/cpu/cpuid-deps.c | 3 + > arch/x86/kernel/cpu/resctrl/core.c | 25 +- > arch/x86/kernel/cpu/resctrl/internal.h | 56 +- > arch/x86/kernel/cpu/resctrl/monitor.c | 24 +- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 714 +++++++++++++++++- > arch/x86/kernel/cpu/scattered.c | 1 + > include/linux/resctrl.h | 12 + > 11 files changed, 964 insertions(+), 20 deletions(-) > > -- > 2.34.1 > This should be fine for me to get started with. I'll see if I can work backwards from the patches adding the parsing code to see how I'll work the software implementation in. Thanks! -Peter
Hi Peter, On 4/4/24 14:08, Peter Newman wrote: > Hi Babu, > > On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote: >> The list follows the following format: >> >> * Default CTRL_MON group: >> "//<domain_id>=<assignment_flags>" >> >> * Non-default CTRL_MON group: >> "<CTRL_MON group>//<domain_id>=<assignment_flags>" >> >> * Child MON group of default CTRL_MON group: >> "/<MON group>/<domain_id>=<assignment_flags>" >> >> * Child MON group of non-default CTRL_MON group: >> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>" >> >> Assignment flags can be one of the following: >> >> t MBM total event is assigned >> l MBM local event is assigned >> tl Both total and local MBM events are assigned >> _ None of the MBM events are assigned >> >> Examples: >> >> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control >> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; >> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; >> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; >> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl; >> >> There are four groups and all the groups have local and total event assigned. >> >> "//" - This is a default CONTROL MON group >> >> "non_defult_group//" - This is non default CONTROL MON group >> >> "/default_mon1/" - This is Child MON group of the defult group >> >> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group >> >> =tl means both total and local events are assigned. > > I recall there was supposed to be a way to perform the same update on > all domains together so that it isn't tedious to not do per-domain Yes. Correct. Reinette suggested to have "no domains" means ALL the domains. Example: Initial list: $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control non_def_ctrl_mon_grep//0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_; //0=_;1=_;2=_;3=_;4=_;5=_;6=_;7=_; Two groups and no events assigned. 
To assign the total event on all the domains, the command will look like this:

$ echo "//=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

Parsing becomes ugly here. I look for the domain number after the name, so
now I have to add some ugly checks there.

I also thought about something like this:

$ echo "//FFFF=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

FFFF means all the domains. But there could also be a domain numbered FFFF,
so I dropped the idea.

> customizations. (And also to avoid serializing programming all the
> domains the same way.)

One more thing with respect to domains:
This series updates all the domains when an assignment is requested, which
makes it easy to implement.

For example:
$ echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

This command will assign the total event on all the domains of the default
group even though the user passed only domain 0. I am looking at supporting
domain-specific assignment right now. If your use case is specific to each
domain, then I can add that support in the next revision.

>
>
>
>>
>> .../admin-guide/kernel-parameters.txt | 2 +-
>> Documentation/arch/x86/resctrl.rst | 144 ++++
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/include/asm/msr-index.h | 2 +
>> arch/x86/kernel/cpu/cpuid-deps.c | 3 +
>> arch/x86/kernel/cpu/resctrl/core.c | 25 +-
>> arch/x86/kernel/cpu/resctrl/internal.h | 56 +-
>> arch/x86/kernel/cpu/resctrl/monitor.c | 24 +-
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 714 +++++++++++++++++-
>> arch/x86/kernel/cpu/scattered.c | 1 +
>> include/linux/resctrl.h | 12 +
>> 11 files changed, 964 insertions(+), 20 deletions(-)
>>
>> --
>> 2.34.1
>>
>
> This should be fine for me to get started with. I'll see if I can work
> backwards from the patches adding the parsing code to see how I'll
> work the software implementation in.
>
> Thanks!
> -Peter

--
Thanks
Babu Moger
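
The "no domain means all domains" convention can be sketched as follows. This is a hypothetical userspace model of a single `<domain_id><op-code><assignment_flags>` entry, not the parser from the series. Unlike the current revision described above, it applies a numbered entry to that domain only, and it assumes the op-codes and flags are exactly the ones listed in the cover letter.

```python
import re

# <domain_id> may be empty (all domains); flags are 't', 'l', both, or '_'
_ENTRY = re.compile(r"^(\d*)([=+\-_])([tl]*|_)$")

def apply_write(state, entry):
    """Apply one write entry to state, a dict of domain id -> set of flags."""
    m = _ENTRY.match(entry)
    if m is None:
        raise ValueError("malformed entry: %r" % entry)
    dom, op, flags = m.groups()
    new = set() if flags == "_" else set(flags)
    targets = list(state) if dom == "" else [int(dom)]
    for d in targets:
        if d not in state:
            raise KeyError("unknown domain %d" % d)
        if op == "=":
            state[d] = set(new)   # replace the assignment
        elif op == "+":
            state[d] |= new       # assign additional events
        elif op == "-":
            state[d] -= new       # unassign the listed events
        else:
            state[d] = set()      # '_' unassigns everything
    return state
```

Treating the empty domain field as a wildcard keeps the grammar unambiguous, since a real domain id always contains at least one digit.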
Hi Babu,

On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
> Hi Peter,
>
>
> On 4/4/24 14:08, Peter Newman wrote:
> > Hi Babu,
> >
> > On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
> >> The list follows the following format:
> >>
> >> * Default CTRL_MON group:
> >> "//<domain_id>=<assignment_flags>"
> >>
> >> * Non-default CTRL_MON group:
> >> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> >>
> >> * Child MON group of default CTRL_MON group:
> >> "/<MON group>/<domain_id>=<assignment_flags>"
> >>
> >> * Child MON group of non-default CTRL_MON group:
> >> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> >>
> >> Assignment flags can be one of the following:
> >>
> >> t MBM total event is assigned
> >> l MBM local event is assigned
> >> tl Both total and local MBM events are assigned
> >> _ None of the MBM events are assigned
> >>
> >> Examples:
> >>
> >> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>
> >> There are four groups and all the groups have local and total event assigned.
> >>
> >> "//" - This is a default CONTROL MON group
> >>
> >> "non_defult_group//" - This is non default CONTROL MON group
> >>
> >> "/default_mon1/" - This is Child MON group of the defult group
> >>
> >> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >>
> >> =tl means both total and local events are assigned.
> >
> > I recall there was supposed to be a way to perform the same update on
> > all domains together so that it isn't tedious to not do per-domain
>
> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.

Would "*" be more intuitive?

Whatever is done here to describe the "wildcard node", would it be worth
having the node field parse the same way in the "schemata" files?

Is there any merit in having range match expressions, e.g. something like

0-3,8-11=foo;4-7,12-*=bar

(The latter is obvious feature creep though, so a real use case for this
would be needed to justify it. I don't have one right now...)

[...]

Cheers
---Dave
Hi Dave,

On 4/22/24 11:34, Dave Martin wrote:
> Hi Babu,
>
> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
>> Hi Peter,
>>
>>
>> On 4/4/24 14:08, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>>> The list follows the following format:
>>>>
>>>> * Default CTRL_MON group:
>>>> "//<domain_id>=<assignment_flags>"
>>>>
>>>> * Non-default CTRL_MON group:
>>>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>>>
>>>> * Child MON group of default CTRL_MON group:
>>>> "/<MON group>/<domain_id>=<assignment_flags>"
>>>>
>>>> * Child MON group of non-default CTRL_MON group:
>>>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>>>
>>>> Assignment flags can be one of the following:
>>>>
>>>> t MBM total event is assigned
>>>> l MBM local event is assigned
>>>> tl Both total and local MBM events are assigned
>>>> _ None of the MBM events are assigned
>>>>
>>>> Examples:
>>>>
>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>
>>>> There are four groups and all the groups have local and total event assigned.
>>>>
>>>> "//" - This is a default CONTROL MON group
>>>>
>>>> "non_defult_group//" - This is non default CONTROL MON group
>>>>
>>>> "/default_mon1/" - This is Child MON group of the defult group
>>>>
>>>> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>>>
>>>> =tl means both total and local events are assigned.
>>>
>>> I recall there was supposed to be a way to perform the same update on
>>> all domains together so that it isn't tedious to not do per-domain
>>
>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
>
> Would "*" be more intuitive?
We could. But I don't see the need for wildcard ("*") or ranges and
complexity that comes with that.

Even in schemata processing we don't use the wildcard or ranges and also
there is no mention of that in documentation.
https://www.kernel.org/doc/Documentation/x86/resctrl.rst

Domains(or nodes) are processed one by one. Some examples.

# cat schemata
SMBA:0=2048;1=2048;2=2048;3=2048
MB:0=2048;1=2048;2=2048;3=2048
L3:0=ffff;1=ffff;2=ffff;3=ffff

# echo "SMBA:1=64" > schemata
# cat schemata
SMBA:0=2048;1= 64;2=2048;3=2048
MB:0=2048;1=2048;2=2048;3=2048
L3:0=ffff;1=ffff;2=ffff;3=ffff
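
The per-domain behaviour in the example above can be modelled in a few lines. This is only a user-space sketch in Python, not the kernel's schemata parser: the function names `parse_schemata_line` and `apply_update` are hypothetical, domain ids are assumed to be decimal, and the column padding seen in the real output ("1= 64") is not reproduced.

```python
def parse_schemata_line(line):
    """Split 'SMBA:0=2048;1=2048' into ('SMBA', {0: '2048', 1: '2048'})."""
    resource, _, body = line.partition(":")
    domains = {}
    for entry in body.split(";"):
        if not entry.strip():
            continue
        dom, _, value = entry.partition("=")
        # Assumption: domain ids are decimal integers.
        domains[int(dom)] = value.strip()
    return resource.strip(), domains


def apply_update(state, update_line):
    """Apply one write such as 'SMBA:1=64'; domains not named keep their value."""
    resource, changes = parse_schemata_line(update_line)
    state[resource].update(changes)
    return state


# Initial contents, copied from the example above.
state = dict(parse_schemata_line(line) for line in [
    "SMBA:0=2048;1=2048;2=2048;3=2048",
    "MB:0=2048;1=2048;2=2048;3=2048",
    "L3:0=ffff;1=ffff;2=ffff;3=ffff",
])

apply_update(state, "SMBA:1=64")
print(state["SMBA"])  # {0: '2048', 1: '64', 2: '2048', 3: '2048'}
```

Only domain 1 of SMBA changes; every other domain and resource is left alone, which is the one-by-one processing described above.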
>
> Whatever is done here to describe the "wildcard node", would it be worth
> having the node field parse the same way in the "schemata" files?
>
> Is there any merit in having range match expressions, e.g. something like
>
> 0-3,8-11=foo;4-7,12-*=bar
>
> (The latter is obvious feature creep though, so a real use case for this
> would be needed to justify it. I don't have one right now...)
>
> [...]
>
> Cheers
> ---Dave
--
Thanks
Babu Moger
On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
> Hi Dave,
>
> On 4/22/24 11:34, Dave Martin wrote:
> > Hi Babu,
> >
> > On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
> >> Hi Peter,
> >>
> >>
> >> On 4/4/24 14:08, Peter Newman wrote:
> >>> Hi Babu,
> >>>
> >>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
> >>>> The list follows the following format:
> >>>>
> >>>> * Default CTRL_MON group:
> >>>> "//<domain_id>=<assignment_flags>"
> >>>>
> >>>> * Non-default CTRL_MON group:
> >>>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> >>>>
> >>>> * Child MON group of default CTRL_MON group:
> >>>> "/<MON group>/<domain_id>=<assignment_flags>"
> >>>>
> >>>> * Child MON group of non-default CTRL_MON group:
> >>>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> >>>>
> >>>> Assignment flags can be one of the following:
> >>>>
> >>>> t MBM total event is assigned
> >>>> l MBM local event is assigned
> >>>> tl Both total and local MBM events are assigned
> >>>> _ None of the MBM events are assigned
> >>>>
> >>>> Examples:
> >>>>
> >>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>
> >>>> There are four groups and all the groups have local and total event assigned.
> >>>>
> >>>> "//" - This is a default CONTROL MON group
> >>>>
> >>>> "non_defult_group//" - This is non default CONTROL MON group
> >>>>
> >>>> "/default_mon1/" - This is Child MON group of the defult group
> >>>>
> >>>> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >>>>
> >>>> =tl means both total and local events are assigned.
> >>>
> >>> I recall there was supposed to be a way to perform the same update on
> >>> all domains together so that it isn't tedious to not do per-domain
> >>
> >> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
> >
> > Would "*" be more intuitive?
>
> We could. But I don't see the need for wildcard ("*") or ranges and
> complexity that comes with that.
For "*", I mean that this would just stand for "all cpus", not a generic
string match; apologies if I didn't make that clear.

I think that an explicit "*" is still a less surprising way to say
"everything" than "" (which if it means anything at all, usually means
"nothing").

I may have misunderstood the intention here: _if_ the intention is to
provide a way to enable/disable an event in all domains without having
to enumerate them all one by one, then I think "*" is preferable syntax
to "". That was my only real suggestion here.
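
To make the suggestion concrete, here is a rough user-space sketch in Python of how an entry in the proposed mbm_assign_control format could be parsed when "*" stands for all domains. It is not the parsing code from this series: `parse_assign_entry`, `VALID_FLAGS` and the `all_domains` argument are hypothetical names, and decimal domain ids are an assumption.

```python
VALID_FLAGS = {"t", "l", "tl", "_"}  # flags listed in the cover letter


def parse_assign_entry(entry, all_domains):
    """Parse '<ctrl>/<mon>/<dom>=<flags>[;<dom>=<flags>...]'.

    Returns (ctrl_group, mon_group, {domain_id: flags}).
    A domain field of '*' expands to every id in all_domains.
    """
    # Raises ValueError if the entry does not contain two '/' separators.
    ctrl, mon, assignments = entry.split("/", 2)
    result = {}
    for item in assignments.split(";"):
        if not item:
            continue  # tolerate the trailing ';' seen when reading the file
        dom, sep, flags = item.partition("=")
        if not sep or flags not in VALID_FLAGS:
            raise ValueError(f"bad assignment: {item!r}")
        if dom == "*":
            targets = list(all_domains)
        elif dom.isdigit():  # assumption: domain ids are decimal
            targets = [int(dom)]
        else:
            raise ValueError(f"bad domain id: {dom!r}")
        for d in targets:
            result[d] = flags
    return ctrl, mon, result


print(parse_assign_entry("//*=t", [0, 1, 2, 3]))
# ('', '', {0: 't', 1: 't', 2: 't', 3: 't'})
```

With an explicit "*", the domain field is always either the wildcard or a number, so the empty form ("//=t") can simply be rejected as malformed instead of needing a special case.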
>
> Even in schemata processing we don't use the wildcard or ranges and also
> there is no mention of that in documentation.
> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
I know, though writing the schemata files can be tedious and annoying,
since their content is often very repetitive, so ...
>
> Domains(or nodes) are processed one by one. Some examples.
>
> # cat schemata
> SMBA:0=2048;1=2048;2=2048;3=2048
> MB:0=2048;1=2048;2=2048;3=2048
> L3:0=ffff;1=ffff;2=ffff;3=ffff
>
> # echo "SMBA:1=64" > schemata
> # cat schemata
> SMBA:0=2048;1= 64;2=2048;3=2048
> MB:0=2048;1=2048;2=2048;3=2048
> L3:0=ffff;1=ffff;2=ffff;3=ffff
... it would be convenient to be able to do something like

# echo "SMBA:*=64" >schemata
# grep SMBA: schemata
SMBA:0= 64;1= 64;2= 64;3= 64

Anyway, this is nothing directly to do with this series; just a
thought.

> > Whatever is done here to describe the "wildcard node", would it be worth
> > having the node field parse the same way in the "schemata" files?
> >
> > Is there any merit in having range match expressions, e.g. something like
> >
> > 0-3,8-11=foo;4-7,12-*=bar
> >
> > (The latter is obvious feature creep though, so a real use case for this
> > would be needed to justify it. I don't have one right now...)
[...]
> Thanks
> Babu Moger

I do agree that unless someone jumps up and down saying this would
help their use case, this is probably a step too far.

Just thinking aloud (and this kind of feature could be added later in a
backwards compatible way if someone really needs it).

Cheers
---Dave
On 4/23/2024 5:37 AM, Dave Martin wrote:
> On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 4/22/24 11:34, Dave Martin wrote:
>>> Hi Babu,
>>>
>>> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
>>>> Hi Peter,
>>>>
>>>>
>>>> On 4/4/24 14:08, Peter Newman wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>> The list follows the following format:
>>>>>>
>>>>>> * Default CTRL_MON group:
>>>>>> "//<domain_id>=<assignment_flags>"
>>>>>>
>>>>>> * Non-default CTRL_MON group:
>>>>>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>>>>>
>>>>>> * Child MON group of default CTRL_MON group:
>>>>>> "/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>
>>>>>> * Child MON group of non-default CTRL_MON group:
>>>>>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>
>>>>>> Assignment flags can be one of the following:
>>>>>>
>>>>>> t MBM total event is assigned
>>>>>> l MBM local event is assigned
>>>>>> tl Both total and local MBM events are assigned
>>>>>> _ None of the MBM events are assigned
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>
>>>>>> There are four groups and all the groups have local and total event assigned.
>>>>>>
>>>>>> "//" - This is a default CONTROL MON group
>>>>>>
>>>>>> "non_defult_group//" - This is non default CONTROL MON group
>>>>>>
>>>>>> "/default_mon1/" - This is Child MON group of the defult group
>>>>>>
>>>>>> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>>>>>
>>>>>> =tl means both total and local events are assigned.
>>>>>
>>>>> I recall there was supposed to be a way to perform the same update on
>>>>> all domains together so that it isn't tedious to not do per-domain
>>>>
>>>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
>>>
>>> Would "*" be more intuitive?
>>
>> We could. But I don't see the need for wildcard ("*") or ranges and
>> complexity that comes with that.
>
> For "*", I mean that this would just stand for "all cpus", not a generic
> string match; apologies if I didn't make that clear.

(reading this by replacing "all cpus" with "all domains")

This sounds reasonable to me. It may indeed make the parsing simpler by
not needing the ugly checks Babu mentioned in [1].

Reinette

[1] https://lore.kernel.org/lkml/7ccd59b8-9fe3-4d1f-82f5-f33d96dbf5ac@amd.com/
On Tue, Apr 23, 2024 at 09:15:07PM -0700, Reinette Chatre wrote:
>
>
> On 4/23/2024 5:37 AM, Dave Martin wrote:
> > On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
> >> Hi Dave,
> >>
> >> On 4/22/24 11:34, Dave Martin wrote:
> >>> Hi Babu,
> >>>
> >>> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
> >>>> Hi Peter,
> >>>>
> >>>>
> >>>> On 4/4/24 14:08, Peter Newman wrote:
> >>>>> Hi Babu,
> >>>>>
> >>>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
> >>>>>> The list follows the following format:
> >>>>>>
> >>>>>> * Default CTRL_MON group:
> >>>>>> "//<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>> * Non-default CTRL_MON group:
> >>>>>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>> * Child MON group of default CTRL_MON group:
> >>>>>> "/<MON group>/<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>> * Child MON group of non-default CTRL_MON group:
> >>>>>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> >>>>>>
> >>>>>> Assignment flags can be one of the following:
> >>>>>>
> >>>>>> t MBM total event is assigned
> >>>>>> l MBM local event is assigned
> >>>>>> tl Both total and local MBM events are assigned
> >>>>>> _ None of the MBM events are assigned
> >>>>>>
> >>>>>> Examples:
> >>>>>>
> >>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >>>>>> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >>>>>>
> >>>>>> There are four groups and all the groups have local and total event assigned.
> >>>>>>
> >>>>>> "//" - This is a default CONTROL MON group
> >>>>>>
> >>>>>> "non_defult_group//" - This is non default CONTROL MON group
> >>>>>>
> >>>>>> "/default_mon1/" - This is Child MON group of the defult group
> >>>>>>
> >>>>>> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >>>>>>
> >>>>>> =tl means both total and local events are assigned.
> >>>>>
> >>>>> I recall there was supposed to be a way to perform the same update on
> >>>>> all domains together so that it isn't tedious to not do per-domain
> >>>>
> >>>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
> >>>
> >>> Would "*" be more intuitive?
> >>
> >> We could. But I don't see the need for wildcard ("*") or ranges and
> >> complexity that comes with that.
> >
> > For "*", I mean that this would just stand for "all cpus", not a generic
> > string match; apologies if I didn't make that clear.
>
> (reading this by replacing "all cpus" with "all domains")
>
> This sounds reasonable to me. It may indeed make the parsing simpler by
> not needing the ugly checks Babu mentioned in [1].
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/7ccd59b8-9fe3-4d1f-82f5-f33d96dbf5ac@amd.com/

Ack, I meant "all domains", sorry!

Note, should we try to detect things like:

<resource>:0=fee;1=fie;*=foe;0=fum

..?

Either we treat conflicting assignments as an error, or we do them all
in the order specified, so that assignments on the right override those
on the left (which is what the schemata parsing in ctrlmondata.c:
parse_line() seems to do today if I understand the code correctly).

In the latter case,

<resource>:*=fee;1=fie

would set all nodes except 1 to "fee", and node 1 to "fie", which might
be useful (or at least, convenient).

If we're worried about that being exposed as ABI and used by userspace,
we might want to disallow it explicitly.
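
The two policies can be compared side by side with a small Python sketch. It is purely illustrative: `resolve` and its `strict` switch are hypothetical names, the input is the part of the line after "<resource>:", and nothing here is taken from parse_line().

```python
def resolve(assignments, all_domains, strict=False):
    """Resolve 'dom=value' items from left to right.

    strict=False: later items override earlier ones.
    strict=True:  a domain named twice (directly or via '*') is an error.
    """
    result = {}
    for item in assignments.split(";"):
        if not item:
            continue
        dom, _, value = item.partition("=")
        # '*' expands to every known domain; anything else is one decimal id.
        targets = list(all_domains) if dom == "*" else [int(dom)]
        for d in targets:
            if strict and d in result:
                raise ValueError(f"conflicting assignment for domain {d}")
            result[d] = value
    return result


print(resolve("*=fee;1=fie", range(4)))
# {0: 'fee', 1: 'fie', 2: 'fee', 3: 'fee'}
print(resolve("0=fee;1=fie;*=foe;0=fum", range(4)))
# {0: 'fum', 1: 'foe', 2: 'foe', 3: 'foe'}
```

With strict=True both inputs raise ValueError, which corresponds to disallowing the override behaviour explicitly rather than letting it become ABI.
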
Cheers
---Dave
Hi Dave,

On 4/24/24 09:16, Dave Martin wrote:
> On Tue, Apr 23, 2024 at 09:15:07PM -0700, Reinette Chatre wrote:
>>
>>
>> On 4/23/2024 5:37 AM, Dave Martin wrote:
>>> On Mon, Apr 22, 2024 at 03:44:26PM -0500, Moger, Babu wrote:
>>>> Hi Dave,
>>>>
>>>> On 4/22/24 11:34, Dave Martin wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On Thu, Apr 04, 2024 at 03:02:45PM -0500, Moger, Babu wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>>
>>>>>> On 4/4/24 14:08, Peter Newman wrote:
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>> The list follows the following format:
>>>>>>>>
>>>>>>>> * Default CTRL_MON group:
>>>>>>>> "//<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>> * Non-default CTRL_MON group:
>>>>>>>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>> * Child MON group of default CTRL_MON group:
>>>>>>>> "/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>> * Child MON group of non-default CTRL_MON group:
>>>>>>>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>>>>>>>
>>>>>>>> Assignment flags can be one of the following:
>>>>>>>>
>>>>>>>> t MBM total event is assigned
>>>>>>>> l MBM local event is assigned
>>>>>>>> tl Both total and local MBM events are assigned
>>>>>>>> _ None of the MBM events are assigned
>>>>>>>>
>>>>>>>> Examples:
>>>>>>>>
>>>>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>> non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>> //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>> /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
>>>>>>>>
>>>>>>>> There are four groups and all the groups have local and total event assigned.
>>>>>>>>
>>>>>>>> "//" - This is a default CONTROL MON group
>>>>>>>>
>>>>>>>> "non_defult_group//" - This is non default CONTROL MON group
>>>>>>>>
>>>>>>>> "/default_mon1/" - This is Child MON group of the defult group
>>>>>>>>
>>>>>>>> "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
>>>>>>>>
>>>>>>>> =tl means both total and local events are assigned.
>>>>>>>
>>>>>>> I recall there was supposed to be a way to perform the same update on
>>>>>>> all domains together so that it isn't tedious to not do per-domain
>>>>>>
>>>>>> Yes. Correct. Reinette suggested to have "no domains" means ALL the domains.
>>>>>
>>>>> Would "*" be more intuitive?
>>>>
>>>> We could. But I don't see the need for wildcard ("*") or ranges and
>>>> complexity that comes with that.
>>>
>>> For "*", I mean that this would just stand for "all cpus", not a generic
>>> string match; apologies if I didn't make that clear.
>>
>> (reading this by replacing "all cpus" with "all domains")
>>
>> This sounds reasonable to me. It may indeed make the parsing simpler by
>> not needing the ugly checks Babu mentioned in [1].
Sure. Will plan to address "all domains" (*) option in next revision.
>>
>> Reinette
>>
>> [1] https://lore.kernel.org/lkml/7ccd59b8-9fe3-4d1f-82f5-f33d96dbf5ac@amd.com/
>
> Ack, I meant "all domains", sorry!
>
> Note, should we try to detect things like:
>
> <resource>:0=fee;1=fie;*=foe;0=fum
>
> ..?
>
> Either we treat conflicting assignments as an error, or we do them all
> in the order specified, so that assignments on the right override those
> on the left (which is what the schemata parsing in ctrlmondata.c:
> parse_line() seems to do today if I understand the code correctly).
>
> In the latter case,
>
> <resource>:*=fee;1=fie
>
> would set all nodes except 1 to "fee", and node 1 to "fie", which might
> be useful (or at least, convenient).
>
> If we're worried about that being exposed as ABI and used by userspace,
> we might want to disallow it explicitly.
>

Sure. Right now we are not planning to support domain-specific
assignments, but we will keep the option open for future support.

--
Thanks
Babu Moger