[PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon

Xiaochen Shen posted 2 patches 2 weeks, 1 day ago
There is a newer version of this series
[PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Xiaochen Shen 2 weeks, 1 day ago
The memory bandwidth calculation relies on reading the hardware counter
and measuring the delta between samples. To ensure accurate measurement,
the software reads the counter frequently enough to prevent it from
rolling over twice between reads.

The default Memory Bandwidth Monitoring (MBM) counter width is 24 bits.
Hygon CPUs provide a 32-bit width counter, but they do not support the
MBM capability CPUID leaf (0xF.[ECX=1]:EAX) to report the width offset
(from 24 bits).

Consequently, the kernel falls back to the 24-bit default counter width,
which causes incorrect overflow handling on Hygon CPUs.

Fix this by explicitly setting the counter width offset to 8 bits
(resulting in a 32-bit total counter width) for Hygon CPUs.

Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper")
Fixes: 923f3a2b48bd ("x86/resctrl: Query LLC monitoring properties once during boot")
Cc: stable@vger.kernel.org
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 15 +++++++++++++--
 arch/x86/kernel/cpu/resctrl/internal.h |  3 +++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 10de1594d328..6ebff44a3f75 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1021,8 +1021,19 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
 		c->x86_cache_occ_scale = ebx;
 		c->x86_cache_mbm_width_offset = eax & 0xff;
 
-		if (c->x86_vendor == X86_VENDOR_AMD && !c->x86_cache_mbm_width_offset)
-			c->x86_cache_mbm_width_offset = MBM_CNTR_WIDTH_OFFSET_AMD;
+		if (!c->x86_cache_mbm_width_offset) {
+			switch (c->x86_vendor) {
+			case X86_VENDOR_AMD:
+				c->x86_cache_mbm_width_offset = MBM_CNTR_WIDTH_OFFSET_AMD;
+				break;
+			case X86_VENDOR_HYGON:
+				c->x86_cache_mbm_width_offset = MBM_CNTR_WIDTH_OFFSET_HYGON;
+				break;
+			default:
+				/* Leave c->x86_cache_mbm_width_offset as 0 */
+				break;
+			}
+		}
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4a916c84a322..79c18657ede0 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -14,6 +14,9 @@
 
 #define MBM_CNTR_WIDTH_OFFSET_AMD	20
 
+/* Hygon MBM counter width as an offset from MBM_CNTR_WIDTH_BASE */
+#define MBM_CNTR_WIDTH_OFFSET_HYGON	8
+
 #define RMID_VAL_ERROR			BIT_ULL(63)
 
 #define RMID_VAL_UNAVAIL		BIT_ULL(62)
-- 
2.47.3
RE: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Luck, Tony 2 weeks, 1 day ago
> The default Memory Bandwidth Monitoring (MBM) counter width is 24 bits.
> Hygon CPUs provide a 32-bit width counter, but they do not support the
> MBM capability CPUID leaf (0xF.[ECX=1]:EAX) to report the width offset
> (from 24 bits).
>
> Consequently, the kernel falls back to the 24-bit default counter width,
> which causes incorrect overflow handling on Hygon CPUs.

I *think* you'd get the right results if the h/w counter is wider
than s/w expects. You'd just need to keep polling fast enough
(and we never adjusted the MBM polling rate from the original
1 HZ.)

> Fix this by explicitly setting the counter width offset to 8 bits
> (resulting in a 32-bit total counter width) for Hygon CPUs.

But the patch looks good.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Re: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Xiaochen Shen 2 weeks ago
Hi Tony,

On 12/5/2025 1:11 AM, Luck, Tony wrote:
> I *think* you'd get the right results if the h/w counter is wider
> than s/w expects. You'd just need to keep polling fast enough
> (and we never adjusted the MBM polling rate from the original
> 1 HZ.)

Thank you very much for the code review!

We have observed a test case where an incorrect counter width leads to random unexpected memory bandwidth readings:
https://github.com/shenxiaochen/my_documents/blob/main/memory_bandwidth_counter_width_and_overflow_issue_steps_to_reproduce.txt

The issue was resolved by applying this patch.


Best regards,
Xiaochen Shen
Re: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Reinette Chatre 1 week, 6 days ago
Hi Xiaochen,

On 12/4/25 6:38 PM, Xiaochen Shen wrote:
> Hi Tony,
> 
> On 12/5/2025 1:11 AM, Luck, Tony wrote:
>> I *think* you'd get the right results if the h/w counter is wider
>> than s/w expects. You'd just need to keep polling fast enough
>> (and we never adjusted the MBM polling rate from the original
>> 1 HZ.)
> 
> Thank you very much for code review!
> 
> We have observed a test case where an incorrect counter width leads to random unexpected memory bandwidth readings:
> https://github.com/shenxiaochen/my_documents/blob/main/memory_bandwidth_counter_width_and_overflow_issue_steps_to_reproduce.txt

Could this perhaps be related to issue fixed by:
15292f1b4c55 ("x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID")?

Reinette
Re: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Xiaochen Shen 1 week, 6 days ago
Hi Reinette,

On 12/6/2025 5:57 AM, Reinette Chatre wrote:
>> We have observed a test case where an incorrect counter width leads to random unexpected memory bandwidth readings:
>> https://github.com/shenxiaochen/my_documents/blob/main/memory_bandwidth_counter_width_and_overflow_issue_steps_to_reproduce.txt
> Could this perhaps be related to issue fixed by:
> 15292f1b4c55 ("x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID")?

Thank you for the information.
But I don't think this issue is related to commit 15292f1b4c55, which was already part of the code base when I discovered the problem.


Best regards,
Xiaochen Shen
RE: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Luck, Tony 1 week, 6 days ago
> > I *think* you'd get the right results if the h/w counter is wider
> > than s/w expects. You'd just need to keep polling fast enough
> > (and we never adjusted the MBM polling rate from the original
> > 1 HZ.)
>
> Thank you very much for code review!
>
> We have observed a test case where an incorrect counter width leads to random unexpected memory bandwidth readings:
> https://github.com/shenxiaochen/my_documents/blob/main/memory_bandwidth_counter_width_and_overflow_issue_steps_to_reproduce.txt
>
> The issue was resolved by applying this patch.

Clearly something is going wrong, and you sometimes see enormous values for
memory bandwidth. But I'm still puzzled about what is going wrong.

I pasted the resctrl wraparound function into a small user-mode test, and it
seems to be able to ignore bits above the width used.

The test below prints "2" when told the width is either 24 or 32.

Your patch to use width = 32 is good, but if the problem isn't in the mbm_overflow_count()
function, then you might just have made the problem 256X harder to hit.

Question: What is the value of "hw_res->mon_scale" on a Hygon system?

-Tony


#include <stdio.h>

typedef unsigned long long u64;

static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
{
        u64 shift = 64 - width, chunks;

        chunks = (cur_msr << shift) - (prev_msr << shift);
        return chunks >> shift;
}

int main(void)
{
        u64 prev, cur;

        prev = 0xffffff;
        cur = 0x1000001;

        printf("width = 24 %lld\n", mbm_overflow_count(prev, cur, 24));

        printf("width = 32 %lld\n", mbm_overflow_count(prev, cur, 32));

        return 0;
}
Re: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Xiaochen Shen 1 week, 5 days ago
Hi Tony,

(Sorry for the resend. The previous message bounced for some recipients.)


On 12/6/2025 4:33 AM, Luck, Tony wrote:
> Clearly something is going wrong, and you sometime see enormous values for
> memory bandwidth. But I'm still puzzled about what is going wrong.
> 
> I pasted the resctrl wraparound function into a small user-mode test, and it
> seems to be able to ignore bits above the width used.
> 
> The test below prints "2" when told the width is either 24 or 32.
> 
> Your patch to use width = 32 is good, but if the problem isn't in the mbm_overflow_count()
> function, then you might just have made the problem 256X harder to hit.
> 
> #include <stdio.h>
> 
> typedef unsigned long long u64;
> 
> static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
> {
>         u64 shift = 64 - width, chunks;
> 
>         chunks = (cur_msr << shift) - (prev_msr << shift);
>         return chunks >> shift;
> }
> 
> int main(void)
> {
>         u64 prev, cur;
> 
>         prev = 0xffffff;
>         cur = 0x1000001;
> 
>         printf("width = 24 %lld\n", mbm_overflow_count(prev, cur, 24));
> 
>         printf("width = 32 %lld\n", mbm_overflow_count(prev, cur, 32));
> 
>         return 0;
> }

I think this issue can be reproduced with your test program after a minor change (e.g., cur.bits[31:24] > 1):
- cur = 0x1000001;
+ cur = 0xf000001;

// The calculated value 2 is incorrect when a 24-bit counter width is passed:
# ./counter_width
width = 24 2
width = 32 234881026

Thank you!


Best regards,
Xiaochen Shen
Re: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Xiaochen Shen 1 week, 6 days ago
Hi Tony,

On 12/6/2025 4:33 AM, Luck, Tony wrote:
>> We have observed a test case where an incorrect counter width leads to random unexpected memory bandwidth readings:
>> https://github.com/shenxiaochen/my_documents/blob/main/memory_bandwidth_counter_width_and_overflow_issue_steps_to_reproduce.txt
>>
>> The issue was resolved by applying this patch.
> Clearly something is going wrong, and you sometime see enormous values for
> memory bandwidth. But I'm still puzzled about what is going wrong.
> 
> I pasted the resctrl wraparound function into a small user-mode test, and it
> seems to be able to ignore bits above the width used.
> 
> The test below prints "2" when told the width is either 24 or 32.
> 

--The test output:----
# ./counter_width
width = 24 2
width = 32 2
----------------------


> Your patch to use width = 32 is good, but if the problem isn't in the mbm_overflow_count()
> function, then you might just have made the problem 256X harder to hit.
> 

From my understanding, mbm_overflow_count() works as expected if the correct counter width is passed as a parameter.

In my opinion, the root cause is this:
An incorrect counter width (24 bits) is passed to mbm_overflow_count(), which is much smaller than the hardware counter width (32 bits).
As a result, bits 24 through 31 of the counter are *discarded* unexpectedly in the bandwidth delta calculation in this scenario:
(1) The kernel reads the hardware counter.
(2) The kernel reads the hardware counter again one second later.
If the 32-bit hardware counter actually overflows between (1) and (2), mbm_overflow_count() still handles it with a 24-bit counter width as its parameter:

static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
{
        u64 shift = 64 - width, chunks;

        chunks = (cur_msr << shift) - (prev_msr << shift);
        return chunks >> shift;
}

The calculated bandwidth delta is then incorrect, because bits 24 through 31 of the counter are *discarded* unexpectedly.

See the debugging information below.


> Question: What is the value of "hw_res->mon_scale" on a Hygon system?

I have double-checked with a Hygon hardware architect. On the Hygon test system:
hw_res->mon_scale: 64
hw_res->mbm_width: 32

Here is a rough calculation of the theoretical maximum bandwidth the hardware counter can measure:
(1) 32-bit width and mon_scale of 64:
2 ^32 * 64 = 2 ^38 = 256 (B/s)

(2) 24-bit width and mon_scale of 64 (this is highly likely to cause overflow):
2 ^24 * 64 = 2 ^30 = 1G (B/s)


FYI - debugging for the overflow issue:
--------------------------------------------
I ran the test case in [1] again to try to capture more useful information:
[1] https://github.com/shenxiaochen/my_documents/blob/main/memory_bandwidth_counter_width_and_overflow_issue_steps_to_reproduce.txt

# ./mbm_total_verbose.sh
...
// Normal data
total b/w (bytes/s): 31192093760 (77687336665701824 - 77687305473608064)
total b/w (bytes/s): 31205362048 (77687367871063872 - 77687336665701824)
77687336665701824 / 64 = 1213864635401591 (0x45000E2660577)
77687367871063872 / 64 = 1213865122985373 (0x45000FF75F59D)

// Unexpected calculated bandwidth value; the hardware counter should have overflowed here:
total b/w (bytes/s): 1125656235801792 (78813024106865664 - 77687367871063872)

// The data analysis:
// Before the overflow, the low 32 bits of the counter are close to overflowing (0xFF75F59D). The hardware counter is about to overflow!
Read 1: 77687367871063872 / 64 = 1213865122985373 (0x45000FF75F59D)

// 1 second later, after the overflow, the low 32 bits of the counter are 0x1C864190.
Read 2: 78813024106865664 / 64 = 1231453501669776 (0x460001C864190)

// The calculated delta is incorrect (the data between bit 24 and bit 32 of the counter is discarded unexpectedly). We see the unexpected bandwidth value:
total b/w (bytes/s): 1125656235801792 (78813024106865664 - 77687367871063872)


Best regards,
Xiaochen Shen
RE: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Luck, Tony 1 week, 4 days ago
>> Question: What is the value of "hw_res->mon_scale" on a Hygon system?
>
> I have double confirmed with Hygon hardware architect, in the testing Hygon system:
> hw_res->mon_scale: 64
> hw_res->mbm_width: 32
>
> Here is a rough calculation for the theoretical max bandwidth by the hardware counter:
> (1) 32-bits width and mon_scale is 64:
> 2 ^32 * 64 = 2 ^38 = 256 (B/s)
> 
> (2) 24-bits width and mon_scale is 64 (this is highly likely to cause overflow):
> 2 ^24 * 64 = 2 ^30 = 1G (B/s)

I see. Hygon has a much finer grained mon_scale than Intel (e.g. Icelake sets
mon_scale to 72KB). So a 24-bit counter will wrap easily for Hygon, but not for
Intel. The large values you saw must be from where the latest value read from the
MSR is just below the previous value. So resctrl reports chunks ~= 2^24.

Perhaps many of the reported values are wrong because the counter might wrap
24-bits multiple times in 1 second.

The fix to use the full 32-bits is correct and is resolving the problem you saw.

-Tony
Re: [PATCH 2/2] x86/resctrl: Fix memory bandwidth counter width for Hygon
Posted by Xiaochen Shen 1 week, 5 days ago
Hi Tony,

On 12/7/2025 12:14 AM, Xiaochen Shen wrote:
>> Question: What is the value of "hw_res->mon_scale" on a Hygon system?
> I have double confirmed with Hygon hardware architect, in the testing Hygon system:
> hw_res->mon_scale: 64
> hw_res->mbm_width: 32
> 
> Here is a rough calculation for the theoretical max bandwidth by the hardware counter:
> (1) 32-bits width and mon_scale is 64:
> 2 ^32 * 64 = 2 ^38 = 256 (B/s)
> 

Sorry, there is a typo here:
- 2 ^32 * 64 = 2 ^38 = 256 (B/s)
+ 2 ^32 * 64 = 2 ^38 = 256G (B/s)


> (2) 24-bits width and mon_scale is 64 (this is highly likely to cause overflow):
> 2 ^24 * 64 = 2 ^30 = 1G (B/s)


Best regards,
Xiaochen Shen