Users can create as many monitor groups as RMIDs supported by the hardware.
However, bandwidth monitoring feature on AMD system only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware. The
counters of any other RMIDs which are no longer being tracked will be reset
to zero. The MBM event counters return "Unavailable" for the RMIDs that are
not tracked by hardware. So, there can be only limited number of groups
that can give guaranteed monitoring numbers. With ever changing
configurations there is no way to definitely know which of these groups are
being tracked for certain point of time. Users do not have the option to
monitor a group or set of groups for certain period of time without
worrying about RMID being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The assigned RMID will be tracked by the hardware until the user
unassigns it manually. There is no need to worry about counters being reset
during this period. Additionally, the user can specify a bitmask
identifying the specific bandwidth types from the given source to track
with the counter.
Without ABMC enabled, monitoring will work in current mode without
assignment option.
The Linux resctrl subsystem provides an interface that allows monitoring of
up to two memory bandwidth events per group, selected from a combination of
available total and local events. When ABMC is enabled, two events will be
assigned to each group by default, in line with the current interface
design. Users will also have the option to configure which types of memory
transactions are counted by these events.
Due to the limited number of available counters (32), users may quickly
exhaust the available counters. If the system runs out of assignable ABMC
counters, the kernel will report an error. In such cases, users will nee
dto unassign one or more active counters to free up countes for new
assignments. The interface will provide options to assign or unassign
events through the group-specific interface file.
The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
    Bits    Description
    5       ABMC (Assignable Bandwidth Monitoring Counters)
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
Note: Checkpatch checks/warnings are ignored to maintain coding style.
v13: Updated the commit log with Linux interface details.
v12: Removed the dependency on X86_FEATURE_BMEC.
Removed the Reviewed-by tag as patch has changed.
v11: No changes.
v10: No changes.
v9: Took care of a couple of minor merge conflicts. No other changes.
v8: No changes.
v7: Removed "" from feature flags. Not required anymore.
https://lore.kernel.org/lkml/20240817145058.GCZsC40neU4wkPXeVR@fat_crate.local/
v6: Added Reinette's Reviewed-by. Moved the Checkpatch note below ---.
v5: Minor rebase change and subject line update.
v4: Changes because of rebase. Feature word 21 has few more additions now.
Changed the text to "tracked by hardware" instead of active.
v3: Change because of rebase. Actual patch did not change.
v2: Added dependency on X86_FEATURE_BMEC.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 2 ++
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 4 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..d5c14dc678df 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_ABMC (21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a2fbea0be535..2f54831e04e5 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..3b72b72270f1 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
{ X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
{ X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
+ { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
{ X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
{ X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
--
2.34.1
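For readers who want to check the enumeration from user space, the following is a minimal sketch of the detection described in the commit message (CPUID leaf 0x80000020, subleaf 0, EBX bit 5). It is not part of the patch. The helper names ebx_has_abmc() and cpu_has_abmc() are made up for illustration, and the sketch assumes a GCC or Clang toolchain that provides __get_cpuid_count() in <cpuid.h>.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Leaf and bit position taken from the commit message above. */
#define ABMC_CPUID_LEAF 0x80000020u
#define ABMC_EBX_BIT    5

/* Hypothetical helper: test bit 5 of an EBX value read from the leaf. */
int ebx_has_abmc(uint32_t ebx)
{
	return (int)((ebx >> ABMC_EBX_BIT) & 1u);
}

/*
 * Hypothetical helper: returns 1 if the CPU reports ABMC, 0 if it does
 * not, and -1 if the leaf cannot be queried (leaf not implemented, or
 * not an x86 build).
 */
int cpu_has_abmc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when the leaf is out of range. */
	if (!__get_cpuid_count(ABMC_CPUID_LEAF, 0, &eax, &ebx, &ecx, &edx))
		return -1;
	return ebx_has_abmc(ebx);
#else
	return -1;
#endif
}
```

Calling cpu_has_abmc() from a small main() and printing the result is enough to see what the hardware reports. Note that this only reflects the raw CPUID bit; whether the kernel exposes the feature also depends on the dependency table discussed in this thread.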
Hi Babu,
On 5/15/25 3:51 PM, Babu Moger wrote:
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
> RMIDs currently assigned to a processor will be tracked by hardware. The
> counters of any other RMIDs which are no longer being tracked will be reset
> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
> not tracked by hardware. So, there can be only limited number of groups
> that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups are
> being tracked for certain point of time. Users do not have the option to
> monitor a group or set of groups for certain period of time without
> worrying about RMID being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned. The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask
> identifying the specific bandwidth types from the given source to track
> with the counter.
>
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
>
> The Linux resctrl subsystem provides an interface that allows monitoring of
> up to two memory bandwidth events per group, selected from a combination of
> available total and local events. When ABMC is enabled, two events will be
> assigned to each group by default, in line with the current interface
> design. Users will also have the option to configure which types of memory
> transactions are counted by these events.
>
> Due to the limited number of available counters (32), users may quickly
> exhaust the available counters. If the system runs out of assignable ABMC
> counters, the kernel will report an error. In such cases, users will nee
> dto unassign one or more active counters to free up countes for new
"nee dto" -> "need to"
"countes" -> "counters"
> assignments. The interface will provide options to assign or unassign
"The interface will" -> "resctrl will"?
> events through the group-specific interface file.
>
> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
"The feature can be detected" -> "The feature is detected"
> Bits Description
> 5 ABMC (Assignable Bandwidth Monitoring Counters)
>
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
...
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/kernel/cpu/cpuid-deps.c | 2 ++
> arch/x86/kernel/cpu/scattered.c | 1 +
> 3 files changed, 4 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6c2c152d8a67..d5c14dc678df 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -481,6 +481,7 @@
> #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
> #define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
> #define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
> +#define X86_FEATURE_ABMC (21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
>
> /*
> * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index a2fbea0be535..2f54831e04e5 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
Is this dependency still accurate now that the implementation switched to the
"extended event ID" variant of ABMC that no longer uses the event IDs associated
with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
> { X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
> { X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
> { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 16f3ca30626a..3b72b72270f1 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> { X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
> { X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
> { X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
> + { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
> { X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
> { X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
> { X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
Reinette
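To make the dependency question above concrete, here is a simplified, self-contained model of what a table like cpuid_deps[] implies: once a prerequisite feature is cleared, every feature listed as depending on it is cleared as well. This is a sketch only. The toy_* names are hypothetical, the table copies just the entries visible in the quoted hunk, and the real kernel code works on CPU capability bitmaps rather than a bool array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the X86_FEATURE_* flags in the hunk. */
enum toy_feature {
	TOY_CQM_LLC,
	TOY_CQM_MBM_TOTAL,
	TOY_CQM_MBM_LOCAL,
	TOY_BMEC,
	TOY_ABMC,
	TOY_NFEATURES
};

struct toy_dep {
	enum toy_feature feature;	/* the dependent feature */
	enum toy_feature depends;	/* the prerequisite it needs */
};

/* Entries mirror the ones visible in the quoted cpuid-deps.c hunk. */
static const struct toy_dep toy_deps[] = {
	{ TOY_CQM_MBM_LOCAL, TOY_CQM_LLC },
	{ TOY_BMEC,          TOY_CQM_MBM_TOTAL },
	{ TOY_BMEC,          TOY_CQM_MBM_LOCAL },
	{ TOY_ABMC,          TOY_CQM_MBM_TOTAL },
	{ TOY_ABMC,          TOY_CQM_MBM_LOCAL },
};

/*
 * Clear feature f, then keep sweeping the table until no feature
 * remains set while one of its prerequisites is absent.
 */
void toy_clear_feature(bool caps[TOY_NFEATURES], enum toy_feature f)
{
	bool changed;
	size_t i;

	caps[f] = false;
	do {
		changed = false;
		for (i = 0; i < sizeof(toy_deps) / sizeof(toy_deps[0]); i++) {
			if (caps[toy_deps[i].feature] &&
			    !caps[toy_deps[i].depends]) {
				caps[toy_deps[i].feature] = false;
				changed = true;
			}
		}
	} while (changed);
}
```

In this model, clearing TOY_CQM_MBM_TOTAL also clears TOY_ABMC, which is the behaviour being questioned: with the two added entries, ABMC cannot be reported on a system where either legacy MBM feature is absent or disabled.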
Hi Reinette,
On 5/22/25 15:51, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware. The
>> counters of any other RMIDs which are no longer being tracked will be reset
>> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
>> not tracked by hardware. So, there can be only limited number of groups
>> that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups are
>> being tracked for certain point of time. Users do not have the option to
>> monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned. The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask
>> identifying the specific bandwidth types from the given source to track
>> with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> The Linux resctrl subsystem provides an interface that allows monitoring of
>> up to two memory bandwidth events per group, selected from a combination of
>> available total and local events. When ABMC is enabled, two events will be
>> assigned to each group by default, in line with the current interface
>> design. Users will also have the option to configure which types of memory
>> transactions are counted by these events.
>>
>> Due to the limited number of available counters (32), users may quickly
>> exhaust the available counters. If the system runs out of assignable ABMC
>> counters, the kernel will report an error. In such cases, users will nee
>> dto unassign one or more active counters to free up countes for new
>
> "nee dto" -> "need to"
> "countes" -> "counters"
Sure.
>
>> assignments. The interface will provide options to assign or unassign
>
> "The interface will" -> "resctrl will"?
>
Sure.
>> events through the group-specific interface file.
>>
>> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
>
> "The feature can be detected" -> "The feature is detected"
>
Sure.
>> Bits Description
>> 5 ABMC (Assignable Bandwidth Monitoring Counters)
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
> ...
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/kernel/cpu/cpuid-deps.c | 2 ++
>> arch/x86/kernel/cpu/scattered.c | 1 +
>> 3 files changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 6c2c152d8a67..d5c14dc678df 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -481,6 +481,7 @@
>> #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
>> #define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
>> #define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
>> +#define X86_FEATURE_ABMC (21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
>>
>> /*
>> * BUG word(s)
>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>> index a2fbea0be535..2f54831e04e5 100644
>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>> { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
>
> Is this dependency still accurate now that the implementation switched to the
> "extended event ID" variant of ABMC that no longer uses the event IDs associated
> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
That's a good question. Unfortunately, we may need to retain this
dependency for now, as a significant portion of the code relies on
functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
resctrl_arch_is_mbm_total_enabled(), and others.
>
>> { X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
>> { X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
>> { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> index 16f3ca30626a..3b72b72270f1 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>> { X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
>> { X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
>> { X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
>> + { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
>> { X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
>> { X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
>> { X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
>
> Reinette
>
--
Thanks
Babu Moger
Hi Babu,
On 5/27/25 10:23 AM, Moger, Babu wrote:
> On 5/22/25 15:51, Reinette Chatre wrote:
>> On 5/15/25 3:51 PM, Babu Moger wrote:
>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>> index a2fbea0be535..2f54831e04e5 100644
>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>> { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
>>
>> Is this dependency still accurate now that the implementation switched to the
>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>
> That's a good question. Unfortunately, we may need to retain this
> dependency for now, as a significant portion of the code relies on
> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
> resctrl_arch_is_mbm_total_enabled(), and others.
>
Avoiding needing to change code is not a valid reason.
I think that without this dependency the code will
still rely on "functions like resctrl_is_mbm_event(),
resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
and others." though.
The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
X86_FEATURE_CQM_MBM_LOCAL.
I expected that for backwards compatibility ABMC will start by
enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID
as part of its initialization, configuring them with the current
defaults for which memory transactions are expected to be monitored
by each. With these events enabled the existing flows using, for
example, resctrl_is_mbm_event(), will continue to work as expected, no?
This would require more familiarity with L3 monitoring enumeration
on AMD since it will still be required to determine the number of
RMIDs etc. but if ABMC does not actually depend on these CQM features
then the current enumeration would need to be re-worked anyway.
Reinette
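One possible reading of the suggestion above, sketched as a toy model: the "is this MBM event enabled" state is tracked per event ID, and the CPU feature flags are only one of two ways that state gets set at initialization. This is not the posted implementation. All toy_* names and the struct layout are hypothetical, and the event ID values are assumed to mirror the kernel's resctrl event enum.

```c
#include <stdbool.h>

/* Assumed to mirror the QOS_L3_*_EVENT_ID values used by resctrl. */
enum toy_event_id {
	TOY_L3_OCCUP_EVENT_ID     = 1,
	TOY_L3_MBM_TOTAL_EVENT_ID = 2,
	TOY_L3_MBM_LOCAL_EVENT_ID = 3,
	TOY_NUM_EVENT_IDS
};

/* Hypothetical summary of what the CPU enumerates. */
struct toy_cpu {
	bool cqm_mbm_total;
	bool cqm_mbm_local;
	bool abmc;
};

/* Hypothetical per-event state kept by the monitoring code. */
struct toy_mon_state {
	bool enabled[TOY_NUM_EVENT_IDS];
};

/* Classification by event ID only, independent of feature flags. */
bool toy_is_mbm_event(int evt)
{
	return evt >= TOY_L3_MBM_TOTAL_EVENT_ID &&
	       evt <= TOY_L3_MBM_LOCAL_EVENT_ID;
}

bool toy_is_mbm_enabled(const struct toy_mon_state *s)
{
	return s->enabled[TOY_L3_MBM_TOTAL_EVENT_ID] ||
	       s->enabled[TOY_L3_MBM_LOCAL_EVENT_ID];
}

void toy_mon_init(struct toy_mon_state *s, const struct toy_cpu *cpu)
{
	if (cpu->abmc) {
		/* Counter assignment: offer both events by default. */
		s->enabled[TOY_L3_MBM_TOTAL_EVENT_ID] = true;
		s->enabled[TOY_L3_MBM_LOCAL_EVENT_ID] = true;
	} else {
		/* Legacy path: events follow the CQM feature flags. */
		s->enabled[TOY_L3_MBM_TOTAL_EVENT_ID] = cpu->cqm_mbm_total;
		s->enabled[TOY_L3_MBM_LOCAL_EVENT_ID] = cpu->cqm_mbm_local;
	}
}
```

In this model a CPU that reports ABMC but neither legacy MBM feature still ends up with both events enabled, so helpers keyed on the event ID keep working. The default configuration of which memory transactions each event counts, and the RMID enumeration mentioned above, are left out of the sketch.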
Hi Reinette,
On 5/27/25 12:54, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/27/25 10:23 AM, Moger, Babu wrote:
>> On 5/22/25 15:51, Reinette Chatre wrote:
>>> On 5/15/25 3:51 PM, Babu Moger wrote:
>
>>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>>> index a2fbea0be535..2f54831e04e5 100644
>>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>> { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
>>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
>>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
>>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
>>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
>>>
>>> Is this dependency still accurate now that the implementation switched to the
>>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>>
>> That's a good question. Unfortunately, we may need to retain this
>> dependency for now, as a significant portion of the code relies on
>> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
>> resctrl_arch_is_mbm_total_enabled(), and others.
>>
>
> Avoiding needing to change code is not a valid reason.
>
> I think that without this dependency the code will
> still rely on "functions like resctrl_is_mbm_event(),
> resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
> and others." though.
>
> The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
> to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
> thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
> X86_FEATURE_CQM_MBM_LOCAL.
oh. ok.
>
> I expected that for backwards compatibility ABMC will start by
> enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID
> as part of its initialization, configuring them with the current
> defaults for which memory transactions are expected to be monitored
> by each. With these events enabled the existing flows using, for
> example, resctrl_is_mbm_event(), will continue to work as expected, no?
Yes. It will work as it uses event id.
>
> This would require more familiarity with L3 monitoring enumeration
> on AMD since it will still be required to determine the number of
> RMIDs etc. but if ABMC does not actually depend on these CQM features
> then the current enumeration would need to be re-worked anyway.
Are you suggesting to remove the dependency and rework ABMC enumeration in
get_rdt_mon_resources()?
--
Thanks
Babu Moger
Hi Babu,
On 5/27/25 11:40 AM, Moger, Babu wrote:
> On 5/27/25 12:54, Reinette Chatre wrote:
>> On 5/27/25 10:23 AM, Moger, Babu wrote:
>>> On 5/22/25 15:51, Reinette Chatre wrote:
>>>> On 5/15/25 3:51 PM, Babu Moger wrote:
>>
>>>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>> index a2fbea0be535..2f54831e04e5 100644
>>>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>>> { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
>>>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
>>>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
>>>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
>>>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
>>>>
>>>> Is this dependency still accurate now that the implementation switched to the
>>>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>>>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>>>
>>> That's a good question. Unfortunately, we may need to retain this
>>> dependency for now, as a significant portion of the code relies on
>>> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
>>> resctrl_arch_is_mbm_total_enabled(), and others.
>>>
>>
>> Avoiding needing to change code is not a valid reason.
>>
>> I think that without this dependency the code will
>> still rely on "functions like resctrl_is_mbm_event(),
>> resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
>> and others." though.
>>
>> The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
>> to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
>> thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
>> X86_FEATURE_CQM_MBM_LOCAL.
>
> oh. ok.
>
>>
>> I expected that for backwards compatibility ABMC will start by
>> enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID
>> as part of its initialization, configuring them with the current
>> defaults for which memory transactions are expected to be monitored
>> by each. With these events enabled the existing flows using, for
>> example, resctrl_is_mbm_event(), will continue to work as expected, no?
>
> Yes. It will work as it uses event id.
>>
>> This would require more familiarity with L3 monitoring enumeration
>> on AMD since it will still be required to determine the number of
>> RMIDs etc. but if ABMC does not actually depend on these CQM features
>> then the current enumeration would need to be re-worked anyway.
>
> Are you suggesting to remove the dependency and rework ABMC enumeration in
> get_rdt_mon_resources()?
>
If you have an alternative proposal that would accurately reflect the ABMC
and existing L3 MON features then we can surely consider it.
Reinette
Hi Reinette,
On 5/27/2025 6:42 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/27/25 11:40 AM, Moger, Babu wrote:
>> On 5/27/25 12:54, Reinette Chatre wrote:
>>> On 5/27/25 10:23 AM, Moger, Babu wrote:
>>>> On 5/22/25 15:51, Reinette Chatre wrote:
>>>>> On 5/15/25 3:51 PM, Babu Moger wrote:
>>>
>>>>>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>>> index a2fbea0be535..2f54831e04e5 100644
>>>>>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>>>>>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>>>>>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>>>>> { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
>>>>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
>>>>>> { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
>>>>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
>>>>>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
>>>>>
>>>>> Is this dependency still accurate now that the implementation switched to the
>>>>> "extended event ID" variant of ABMC that no longer uses the event IDs associated
>>>>> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?
>>>>
>>>> That's a good question. Unfortunately, we may need to retain this
>>>> dependency for now, as a significant portion of the code relies on
>>>> functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
>>>> resctrl_arch_is_mbm_total_enabled(), and others.
>>>>
>>>
>>> Avoiding needing to change code is not a valid reason.
>>>
>>> I think that without this dependency the code will
>>> still rely on "functions like resctrl_is_mbm_event(),
>>> resctrl_is_mbm_enabled(), resctrl_arch_is_mbm_total_enabled(),
>>> and others." though.
>>>
>>> The core shift is to stop thinking about QOS_L3_MBM_TOTAL_EVENT_ID
>>> to mean the same as X86_FEATURE_CQM_MBM_TOTAL, similarly to stop
>>> thinking about QOS_L3_MBM_LOCAL_EVENT_ID to mean the same as
>>> X86_FEATURE_CQM_MBM_LOCAL.
>>
>> oh. ok.
>>
>>>
>>> I expected that for backwards compatibility ABMC will start by
>>> enabling QOS_L3_MBM_TOTAL_EVENT_ID and QOS_L3_MBM_LOCAL_EVENT_ID
>>> as part of its initialization, configuring them with the current
>>> defaults for which memory transactions are expected to be monitored
>>> by each. With these events enabled the existing flows using, for
>>> example, resctrl_is_mbm_event(), will continue to work as expected, no?
>>
>> Yes. It will work as it uses event id.
>>>
>>> This would require more familiarity with L3 monitoring enumeration
>>> on AMD since it will still be required to determine the number of
>>> RMIDs etc. but if ABMC does not actually depend on these CQM features
>>> then the current enumeration would need to be re-worked anyway.
>>
>> Are you suggesting to remove the dependency and rework ABMC enumeration in
>> get_rdt_mon_resources()?
>>
>
> If you have an alternative proposal that would accurately reflect the ABMC
> and existing L3 MON features then we can surely consider it.
I don't see any other option at this point. Will change it next revision.
Thanks
Babu