[PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly

Roger Pau Monne posted 1 patch 3 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20240111090821.67594-1-roger.pau@citrix.com
There is a newer version of this series
xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
[PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monne 3 months, 3 weeks ago
When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
MSR contains per-counter enable bits that is ANDed with the enable bit in the
counter EVNTSEL MSR in order for a PMC counter to be enabled.

So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
bits being set by default, but at least on some Intel Sapphire and Emerald
Rapids this is no longer the case, and Xen reports:

Testing NMI watchdog on all CPUs: 0 40 stuck

The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
relevant enable bit in PERF_GLOBAL_CTRL not being set.

Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
   enable all counters.
---
Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
correct is of any value, hence I haven't added it.
---
 xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index dfee64689ffe..40d3eb0e18a7 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
 	init_intel_cacheinfo(c);
 	if (c->cpuid_level > 9) {
 		unsigned eax = cpuid_eax(10);
+		unsigned int cnt = (uint8_t)(eax >> 8);
+
 		/* Check for version and the number of counters */
-		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
+		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
+			uint64_t global_ctrl;
+			unsigned int cnt_mask = (1UL << cnt) - 1;
+
+			/*
+			 * On (some?) Sapphire/Emerald Rapids platforms each
+			 * package-BSP starts with all the enable bits for the
+			 * general-purpose PMCs cleared.  Adjust so counters
+			 * can be enabled from EVNTSEL.
+			 */
+			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
+			if ((global_ctrl & cnt_mask) != cnt_mask)
+				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
+				       global_ctrl | cnt_mask);
 			__set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
+		}
 	}
 
 	if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
-- 
2.43.0


Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 10:08, Roger Pau Monne wrote:
> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> MSR contains per-counter enable bits that is ANDed with the enable bit in the
> counter EVNTSEL MSR in order for a PMC counter to be enabled.
> 
> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> bits being set by default, but at least on some Intel Sapphire and Emerald
> Rapids this is no longer the case, and Xen reports:
> 
> Testing NMI watchdog on all CPUs: 0 40 stuck
> 
> The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> relevant enable bit in PERF_GLOBAL_CTRL not being set.
> 
> Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
> general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
> PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
>    enable all counters.
> ---
> Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
> correct is of any value, hence I haven't added it.
> ---
>  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
> index dfee64689ffe..40d3eb0e18a7 100644
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
>  	init_intel_cacheinfo(c);
>  	if (c->cpuid_level > 9) {
>  		unsigned eax = cpuid_eax(10);
> +		unsigned int cnt = (uint8_t)(eax >> 8);
> +
>  		/* Check for version and the number of counters */
> -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
> +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {

I may not have looked closely enough, but I didn't find the limit of
32 being stated anywhere.

> +			uint64_t global_ctrl;
> +			unsigned int cnt_mask = (1UL << cnt) - 1;

Bits 2 + 4 * n have an additional qualification as per SDM vol 4.

Jan

> +			/*
> +			 * On (some?) Sapphire/Emerald Rapids platforms each
> +			 * package-BSP starts with all the enable bits for the
> +			 * general-purpose PMCs cleared.  Adjust so counters
> +			 * can be enabled from EVNTSEL.
> +			 */
> +			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> +			if ((global_ctrl & cnt_mask) != cnt_mask)
> +				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
> +				       global_ctrl | cnt_mask);
>  			__set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
> +		}
>  	}
>  
>  	if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )


Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monné 3 months, 3 weeks ago
On Thu, Jan 11, 2024 at 11:11:07AM +0100, Jan Beulich wrote:
> On 11.01.2024 10:08, Roger Pau Monne wrote:
> > When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> > MSR contains per-counter enable bits that is ANDed with the enable bit in the
> > counter EVNTSEL MSR in order for a PMC counter to be enabled.
> > 
> > So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> > bits being set by default, but at least on some Intel Sapphire and Emerald
> > Rapids this is no longer the case, and Xen reports:
> > 
> > Testing NMI watchdog on all CPUs: 0 40 stuck
> > 
> > The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> > doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> > relevant enable bit in PERF_GLOBAL_CTRL not being set.
> > 
> > Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
> > general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
> > PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v1:
> >  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
> >    enable all counters.
> > ---
> > Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
> > correct is of any value, hence I haven't added it.
> > ---
> >  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
> > index dfee64689ffe..40d3eb0e18a7 100644
> > --- a/xen/arch/x86/cpu/intel.c
> > +++ b/xen/arch/x86/cpu/intel.c
> > @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
> >  	init_intel_cacheinfo(c);
> >  	if (c->cpuid_level > 9) {
> >  		unsigned eax = cpuid_eax(10);
> > +		unsigned int cnt = (uint8_t)(eax >> 8);
> > +
> >  		/* Check for version and the number of counters */
> > -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
> > +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
> 
> I may not have looked closely enough, but I didn't find the limit of
> 32 being stated anywhere.

Hm, my copy of the SDM vol4 states that bits 31:n are the enable bits,
where n is CPUID.0AH: EAX[15:8].  Bits 32, 33 and 34 control the Fixed
PMCs.

> > +			uint64_t global_ctrl;
> > +			unsigned int cnt_mask = (1UL << cnt) - 1;
> 
> Bits 2 + 4 * n have an additional qualification as per SDM vol 4.

Let me update my copy...

Looking at the version from December 2023, I see:

0 EN_PMC0 If CPUID.0AH: EAX[15:8] > 0
1 EN_PMC1 If CPUID.0AH: EAX[15:8] > 1
2 EN_PMC2 If CPUID.0AH: EAX[15:8] > 2
n EN_PMCn If CPUID.0AH: EAX[15:8] > n
31:n+1 Reserved.

And I'm afraid I'm not able to infer this additional qualification of
bits 2 + 4 * n.

Thanks, Roger.

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 11:40, Roger Pau Monné wrote:
> On Thu, Jan 11, 2024 at 11:11:07AM +0100, Jan Beulich wrote:
>> On 11.01.2024 10:08, Roger Pau Monne wrote:
>>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
>>> MSR contains per-counter enable bits that is ANDed with the enable bit in the
>>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>>>
>>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
>>> bits being set by default, but at least on some Intel Sapphire and Emerald
>>> Rapids this is no longer the case, and Xen reports:
>>>
>>> Testing NMI watchdog on all CPUs: 0 40 stuck
>>>
>>> The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
>>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
>>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>>>
>>> Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
>>> general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
>>> PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Changes since v1:
>>>  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
>>>    enable all counters.
>>> ---
>>> Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
>>> correct is of any value, hence I haven't added it.
>>> ---
>>>  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
>>> index dfee64689ffe..40d3eb0e18a7 100644
>>> --- a/xen/arch/x86/cpu/intel.c
>>> +++ b/xen/arch/x86/cpu/intel.c
>>> @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
>>>  	init_intel_cacheinfo(c);
>>>  	if (c->cpuid_level > 9) {
>>>  		unsigned eax = cpuid_eax(10);
>>> +		unsigned int cnt = (uint8_t)(eax >> 8);
>>> +
>>>  		/* Check for version and the number of counters */
>>> -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
>>> +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
>>
>> I may not have looked closely enough, but I didn't find the limit of
>> 32 being stated anywhere.
> 
> Hm, my copy of the SDM vol4 states that bits 31:n are the enable bits,
> where n is CPUID.0AH: EAX[15:8].  Bits 32, 33 and 34 control the Fixed
> PMCs.
> 
>>> +			uint64_t global_ctrl;
>>> +			unsigned int cnt_mask = (1UL << cnt) - 1;
>>
>> Bits 2 + 4 * n have an additional qualification as per SDM vol 4.
> 
> Let me update my copy...
> 
> Looking at the version from December 2023, I see:
> 
> 0 EN_PMC0 If CPUID.0AH: EAX[15:8] > 0
> 1 EN_PMC1 If CPUID.0AH: EAX[15:8] > 1
> 2 EN_PMC2 If CPUID.0AH: EAX[15:8] > 2
> n EN_PMCn If CPUID.0AH: EAX[15:8] > n
> 31:n+1 Reserved.
> 
> And I'm afraid I'm not able to infer this additional qualification of
> bits 2 + 4 * n.

I'm sorry, both earlier questions were my fault, due to looking at the
table entry for the first match of PERF_GLOBAL_CTRL (i.e.
IA32_FIXED_CTR_CTRL). Still need to get used to the new table layout,
it seems.

Looking at the correct entry raises a question on cnt > 1 though, as
EN_PMC0 is defined for cnt > 0.

Jan

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monné 3 months, 3 weeks ago
On Thu, Jan 11, 2024 at 12:34:45PM +0100, Jan Beulich wrote:
> On 11.01.2024 11:40, Roger Pau Monné wrote:
> > On Thu, Jan 11, 2024 at 11:11:07AM +0100, Jan Beulich wrote:
> >> On 11.01.2024 10:08, Roger Pau Monne wrote:
> >>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> >>> MSR contains per-counter enable bits that is ANDed with the enable bit in the
> >>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
> >>>
> >>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> >>> bits being set by default, but at least on some Intel Sapphire and Emerald
> >>> Rapids this is no longer the case, and Xen reports:
> >>>
> >>> Testing NMI watchdog on all CPUs: 0 40 stuck
> >>>
> >>> The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> >>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> >>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
> >>>
> >>> Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
> >>> general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
> >>> PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
> >>>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> Changes since v1:
> >>>  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
> >>>    enable all counters.
> >>> ---
> >>> Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
> >>> correct is of any value, hence I haven't added it.
> >>> ---
> >>>  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
> >>>  1 file changed, 17 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
> >>> index dfee64689ffe..40d3eb0e18a7 100644
> >>> --- a/xen/arch/x86/cpu/intel.c
> >>> +++ b/xen/arch/x86/cpu/intel.c
> >>> @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
> >>>  	init_intel_cacheinfo(c);
> >>>  	if (c->cpuid_level > 9) {
> >>>  		unsigned eax = cpuid_eax(10);
> >>> +		unsigned int cnt = (uint8_t)(eax >> 8);
> >>> +
> >>>  		/* Check for version and the number of counters */
> >>> -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
> >>> +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
> >>
> >> I may not have looked closely enough, but I didn't find the limit of
> >> 32 being stated anywhere.
> > 
> > Hm, my copy of the SDM vol4 states that bits 31:n are the enable bits,
> > where n is CPUID.0AH: EAX[15:8].  Bits 32, 33 and 34 control the Fixed
> > PMCs.
> > 
> >>> +			uint64_t global_ctrl;
> >>> +			unsigned int cnt_mask = (1UL << cnt) - 1;
> >>
> >> Bits 2 + 4 * n have an additional qualification as per SDM vol 4.
> > 
> > Let me update my copy...
> > 
> > Looking at the version from December 2023, I see:
> > 
> > 0 EN_PMC0 If CPUID.0AH: EAX[15:8] > 0
> > 1 EN_PMC1 If CPUID.0AH: EAX[15:8] > 1
> > 2 EN_PMC2 If CPUID.0AH: EAX[15:8] > 2
> > n EN_PMCn If CPUID.0AH: EAX[15:8] > n
> > 31:n+1 Reserved.
> > 
> > And I'm afraid I'm not able to infer this additional qualification of
> > bits 2 + 4 * n.
> 
> I'm sorry, both earlier questions were my fault, due to looking at the
> table entry for the first match of PERF_GLOBAL_CTRL (i.e.
> IA32_FIXED_CTR_CTRL). Still need to get used to the new table layout,
> it seems.
> 
> Looking at the correct entry raises a question on cnt > 1 though, as
> EN_PMC0 is defined for cnt > 0.

Oh, indeed, can adjust on this same patch if that's OK (seeing as the
issue was already there previous to my change).

Thanks, Roger.

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 13:22, Roger Pau Monné wrote:
> Oh, indeed, can adjust on this same patch if that's OK (seeing as the
> issue was already there previous to my change).

Well, I'm getting the impression that it was deliberate there, i.e. set
setting of the feature flag may want to remain thus constrained.

Jan

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monné 3 months, 3 weeks ago
On Thu, Jan 11, 2024 at 03:01:01PM +0100, Jan Beulich wrote:
> On 11.01.2024 13:22, Roger Pau Monné wrote:
> > Oh, indeed, can adjust on this same patch if that's OK (seeing as the
> > issue was already there previous to my change).
> 
> Well, I'm getting the impression that it was deliberate there, i.e. set
> setting of the feature flag may want to remain thus constrained.

Hm, I find it weird, but the original commit message doesn't help at
all.  Xen itself only uses PMC0, and I don't find any other
justification in the current code to require at least 2 counters in
order to expose arch performance monitoring to be present.

Looking at the SDM vol3, the figures there only contain PMC0 and PMC1,
so someone only reading that manual might assume there must always be
2 global PMCs?

(vol4 clarifies the that the number of global PMCs is variable).

Thanks, Roger.

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 15:15, Roger Pau Monné wrote:
> On Thu, Jan 11, 2024 at 03:01:01PM +0100, Jan Beulich wrote:
>> On 11.01.2024 13:22, Roger Pau Monné wrote:
>>> Oh, indeed, can adjust on this same patch if that's OK (seeing as the
>>> issue was already there previous to my change).
>>
>> Well, I'm getting the impression that it was deliberate there, i.e. set
>> setting of the feature flag may want to remain thus constrained.
> 
> Hm, I find it weird, but the original commit message doesn't help at
> all.  Xen itself only uses PMC0, and I don't find any other
> justification in the current code to require at least 2 counters in
> order to expose arch performance monitoring to be present.
> 
> Looking at the SDM vol3, the figures there only contain PMC0 and PMC1,
> so someone only reading that manual might assume there must always be
> 2 global PMCs?

That may have been the impression at the time. It may have been wrong
already back then, or ...

> (vol4 clarifies the that the number of global PMCs is variable).

... it may have been clarified in the SDM later on. My vague guess is
that the > 1 check was to skip what may have been "obviously buggy"
back at the time.

Jan

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monné 3 months, 3 weeks ago
On Thu, Jan 11, 2024 at 04:52:04PM +0100, Jan Beulich wrote:
> On 11.01.2024 15:15, Roger Pau Monné wrote:
> > On Thu, Jan 11, 2024 at 03:01:01PM +0100, Jan Beulich wrote:
> >> On 11.01.2024 13:22, Roger Pau Monné wrote:
> >>> Oh, indeed, can adjust on this same patch if that's OK (seeing as the
> >>> issue was already there previous to my change).
> >>
> >> Well, I'm getting the impression that it was deliberate there, i.e. set
> >> setting of the feature flag may want to remain thus constrained.
> > 
> > Hm, I find it weird, but the original commit message doesn't help at
> > all.  Xen itself only uses PMC0, and I don't find any other
> > justification in the current code to require at least 2 counters in
> > order to expose arch performance monitoring to be present.
> > 
> > Looking at the SDM vol3, the figures there only contain PMC0 and PMC1,
> > so someone only reading that manual might assume there must always be
> > 2 global PMCs?
> 
> That may have been the impression at the time. It may have been wrong
> already back then, or ...
> 
> > (vol4 clarifies the that the number of global PMCs is variable).
> 
> ... it may have been clarified in the SDM later on. My vague guess is
> that the > 1 check was to skip what may have been "obviously buggy"
> back at the time.

Let me know if you are OK with the adjustment in v3, or whether you
would rather leave the > 1 check as-is (or maybe adjust in a different
patch).

Thanks, Roger.

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 17:53, Roger Pau Monné wrote:
> On Thu, Jan 11, 2024 at 04:52:04PM +0100, Jan Beulich wrote:
>> On 11.01.2024 15:15, Roger Pau Monné wrote:
>>> On Thu, Jan 11, 2024 at 03:01:01PM +0100, Jan Beulich wrote:
>>>> On 11.01.2024 13:22, Roger Pau Monné wrote:
>>>>> Oh, indeed, can adjust on this same patch if that's OK (seeing as the
>>>>> issue was already there previous to my change).
>>>>
>>>> Well, I'm getting the impression that it was deliberate there, i.e. set
>>>> setting of the feature flag may want to remain thus constrained.
>>>
>>> Hm, I find it weird, but the original commit message doesn't help at
>>> all.  Xen itself only uses PMC0, and I don't find any other
>>> justification in the current code to require at least 2 counters in
>>> order to expose arch performance monitoring to be present.
>>>
>>> Looking at the SDM vol3, the figures there only contain PMC0 and PMC1,
>>> so someone only reading that manual might assume there must always be
>>> 2 global PMCs?
>>
>> That may have been the impression at the time. It may have been wrong
>> already back then, or ...
>>
>>> (vol4 clarifies the that the number of global PMCs is variable).
>>
>> ... it may have been clarified in the SDM later on. My vague guess is
>> that the > 1 check was to skip what may have been "obviously buggy"
>> back at the time.
> 
> Let me know if you are OK with the adjustment in v3, or whether you
> would rather leave the > 1 check as-is (or maybe adjust in a different
> patch).

Well, I haven't been able to make up my mind as to whether the original
check was wrong. Without clear indication, I think we should retain the
original behavior by having the __set_bit() gated by an additional if().
Then, since the line needs touching anyway, a further question would be
whether to properly switch to setup_force_cpu_cap() at the same time.

Jan

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monné 3 months, 3 weeks ago
On Fri, Jan 12, 2024 at 08:42:27AM +0100, Jan Beulich wrote:
> On 11.01.2024 17:53, Roger Pau Monné wrote:
> > On Thu, Jan 11, 2024 at 04:52:04PM +0100, Jan Beulich wrote:
> >> On 11.01.2024 15:15, Roger Pau Monné wrote:
> >>> On Thu, Jan 11, 2024 at 03:01:01PM +0100, Jan Beulich wrote:
> >>>> On 11.01.2024 13:22, Roger Pau Monné wrote:
> >>>>> Oh, indeed, can adjust on this same patch if that's OK (seeing as the
> >>>>> issue was already there previous to my change).
> >>>>
> >>>> Well, I'm getting the impression that it was deliberate there, i.e. set
> >>>> setting of the feature flag may want to remain thus constrained.
> >>>
> >>> Hm, I find it weird, but the original commit message doesn't help at
> >>> all.  Xen itself only uses PMC0, and I don't find any other
> >>> justification in the current code to require at least 2 counters in
> >>> order to expose arch performance monitoring to be present.
> >>>
> >>> Looking at the SDM vol3, the figures there only contain PMC0 and PMC1,
> >>> so someone only reading that manual might assume there must always be
> >>> 2 global PMCs?
> >>
> >> That may have been the impression at the time. It may have been wrong
> >> already back then, or ...
> >>
> >>> (vol4 clarifies the that the number of global PMCs is variable).
> >>
> >> ... it may have been clarified in the SDM later on. My vague guess is
> >> that the > 1 check was to skip what may have been "obviously buggy"
> >> back at the time.
> > 
> > Let me know if you are OK with the adjustment in v3, or whether you
> > would rather leave the > 1 check as-is (or maybe adjust in a different
> > patch).
> 
> Well, I haven't been able to make up my mind as to whether the original
> check was wrong. Without clear indication, I think we should retain the
> original behavior by having the __set_bit() gated by an additional if().
> Then, since the line needs touching anyway, a further question would be
> whether to properly switch to setup_force_cpu_cap() at the same time.

Having looked at Linux, it has exactly the same check for > 1, which I
guess is to be expected since the code in Xen is quite likely adapted
from the code in Linux.

Overall, it might be best to leave the check as > 1.  It's possible (as
I think you also mention in a previous email) that there's simply no
hardware with 1 counter.  This might no longer be true when
virtualized, but given the current checks in both Xen and Linux any
virtualization environment that attempts to expose arch perf support
would need to expose at least 2 PMCs.

My suggestion is to leave the cnt > 1 check as it is in v2.

I can send a v4 with that check fixed if there's nothing else in v3
that needs fixing.

IMO doing the adjustment to PERF_GLOBAL_CTRL without setting
ARCH_PERFMON would be contradictory.  Either we set ARCH_PERFMON
support and consequently adjust PERF_GLOBAL_CTRL, or we don't.

Thanks, Roger.

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 12.01.2024 11:08, Roger Pau Monné wrote:
> On Fri, Jan 12, 2024 at 08:42:27AM +0100, Jan Beulich wrote:
>> On 11.01.2024 17:53, Roger Pau Monné wrote:
>>> On Thu, Jan 11, 2024 at 04:52:04PM +0100, Jan Beulich wrote:
>>>> On 11.01.2024 15:15, Roger Pau Monné wrote:
>>>>> On Thu, Jan 11, 2024 at 03:01:01PM +0100, Jan Beulich wrote:
>>>>>> On 11.01.2024 13:22, Roger Pau Monné wrote:
>>>>>>> Oh, indeed, can adjust on this same patch if that's OK (seeing as the
>>>>>>> issue was already there previous to my change).
>>>>>>
>>>>>> Well, I'm getting the impression that it was deliberate there, i.e. set
>>>>>> setting of the feature flag may want to remain thus constrained.
>>>>>
>>>>> Hm, I find it weird, but the original commit message doesn't help at
>>>>> all.  Xen itself only uses PMC0, and I don't find any other
>>>>> justification in the current code to require at least 2 counters in
>>>>> order to expose arch performance monitoring to be present.
>>>>>
>>>>> Looking at the SDM vol3, the figures there only contain PMC0 and PMC1,
>>>>> so someone only reading that manual might assume there must always be
>>>>> 2 global PMCs?
>>>>
>>>> That may have been the impression at the time. It may have been wrong
>>>> already back then, or ...
>>>>
>>>>> (vol4 clarifies the that the number of global PMCs is variable).
>>>>
>>>> ... it may have been clarified in the SDM later on. My vague guess is
>>>> that the > 1 check was to skip what may have been "obviously buggy"
>>>> back at the time.
>>>
>>> Let me know if you are OK with the adjustment in v3, or whether you
>>> would rather leave the > 1 check as-is (or maybe adjust in a different
>>> patch).
>>
>> Well, I haven't been able to make up my mind as to whether the original
>> check was wrong. Without clear indication, I think we should retain the
>> original behavior by having the __set_bit() gated by an additional if().
>> Then, since the line needs touching anyway, a further question would be
>> whether to properly switch to setup_force_cpu_cap() at the same time.
> 
> Having looked at Linux, it has exactly the same check for > 1, which I
> guess is to be expected since the code in Xen is quite likely adapted
> from the code in Linux.
> 
> Overall, it might be best to leave the check as > 1.  It's possible (as
> I think you also mention in a previous email) that there's simply no
> hardware with 1 counter.  This might no longer be true when
> virtualized, but given the current checks in both Xen and Linux any
> virtualization environment that attempts to expose arch perf support
> would need to expose at least 2 PMCs.
> 
> My suggestion is to leave the cnt > 1 check as it is in v2.
> 
> I can send a v4 with that check fixed if there's nothing else in v3
> that needs fixing.
> 
> IMO doing the adjustment to PERF_GLOBAL_CTRL without setting
> ARCH_PERFMON would be contradictory.  Either we set ARCH_PERFMON
> support and consequently adjust PERF_GLOBAL_CTRL, or we don't.

Probably fair enough.

Jan

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 10:08, Roger Pau Monne wrote:
> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> MSR contains per-counter enable bits that is ANDed with the enable bit in the
> counter EVNTSEL MSR in order for a PMC counter to be enabled.
> 
> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> bits being set by default, but at least on some Intel Sapphire and Emerald
> Rapids this is no longer the case, and Xen reports:
> 
> Testing NMI watchdog on all CPUs: 0 40 stuck
> 
> The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> relevant enable bit in PERF_GLOBAL_CTRL not being set.
> 
> Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
> general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
> PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
>    enable all counters.
> ---
> Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
> correct is of any value, hence I haven't added it.
> ---
>  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
> index dfee64689ffe..40d3eb0e18a7 100644
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
>  	init_intel_cacheinfo(c);
>  	if (c->cpuid_level > 9) {
>  		unsigned eax = cpuid_eax(10);
> +		unsigned int cnt = (uint8_t)(eax >> 8);

(Not just) since ./CODING_STYLE wants us to avoid fixed width types where
possible, personally I'd prefer "& 0xff" here.

>  		/* Check for version and the number of counters */
> -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
> +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
> +			uint64_t global_ctrl;
> +			unsigned int cnt_mask = (1UL << cnt) - 1;
> +
> +			/*
> +			 * On (some?) Sapphire/Emerald Rapids platforms each
> +			 * package-BSP starts with all the enable bits for the
> +			 * general-purpose PMCs cleared.  Adjust so counters
> +			 * can be enabled from EVNTSEL.
> +			 */
> +			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> +			if ((global_ctrl & cnt_mask) != cnt_mask)
> +				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
> +				       global_ctrl | cnt_mask);

Should there perhaps be a log message?

Jan

>  			__set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
> +		}
>  	}
>  
>  	if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )


Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Roger Pau Monné 3 months, 3 weeks ago
On Thu, Jan 11, 2024 at 11:00:28AM +0100, Jan Beulich wrote:
> On 11.01.2024 10:08, Roger Pau Monne wrote:
> > When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> > MSR contains per-counter enable bits that is ANDed with the enable bit in the
> > counter EVNTSEL MSR in order for a PMC counter to be enabled.
> > 
> > So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> > bits being set by default, but at least on some Intel Sapphire and Emerald
> > Rapids this is no longer the case, and Xen reports:
> > 
> > Testing NMI watchdog on all CPUs: 0 40 stuck
> > 
> > The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> > doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> > relevant enable bit in PERF_GLOBAL_CTRL not being set.
> > 
> > Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
> > general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
> > PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v1:
> >  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
> >    enable all counters.
> > ---
> > Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
> > correct is of any value, hence I haven't added it.
> > ---
> >  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
> > index dfee64689ffe..40d3eb0e18a7 100644
> > --- a/xen/arch/x86/cpu/intel.c
> > +++ b/xen/arch/x86/cpu/intel.c
> > @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
> >  	init_intel_cacheinfo(c);
> >  	if (c->cpuid_level > 9) {
> >  		unsigned eax = cpuid_eax(10);
> > +		unsigned int cnt = (uint8_t)(eax >> 8);
> 
> (Not just) since ./CODING_STYLE wants us to avoid fixed width types where
> possible, personally I'd prefer "& 0xff" here.

Hm, OK.  I got confused and somehow was under the impression we prefer
to truncate using fixed types rather than masks.

> >  		/* Check for version and the number of counters */
> > -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
> > +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
> > +			uint64_t global_ctrl;
> > +			unsigned int cnt_mask = (1UL << cnt) - 1;
> > +
> > +			/*
> > +			 * On (some?) Sapphire/Emerald Rapids platforms each
> > +			 * package-BSP starts with all the enable bits for the
> > +			 * general-purpose PMCs cleared.  Adjust so counters
> > +			 * can be enabled from EVNTSEL.
> > +			 */
> > +			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> > +			if ((global_ctrl & cnt_mask) != cnt_mask)
> > +				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
> > +				       global_ctrl | cnt_mask);
> 
> Should there perhaps be a log message?

I had a post-commit remark about that exactly.  I can add one.

Thanks, Roger.

Re: [PATCH v2] x86/intel: ensure Global Performance Counter Control is setup correctly
Posted by Jan Beulich 3 months, 3 weeks ago
On 11.01.2024 11:32, Roger Pau Monné wrote:
> On Thu, Jan 11, 2024 at 11:00:28AM +0100, Jan Beulich wrote:
>> On 11.01.2024 10:08, Roger Pau Monne wrote:
>>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
>>> MSR contains per-counter enable bits that is ANDed with the enable bit in the
>>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>>>
>>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
>>> bits being set by default, but at least on some Intel Sapphire and Emerald
>>> Rapids this is no longer the case, and Xen reports:
>>>
>>> Testing NMI watchdog on all CPUs: 0 40 stuck
>>>
>>> The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
>>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
>>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>>>
>>> Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
>>> general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
>>> PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Changes since v1:
>>>  - Do the adjustment of PERF_GLOBAL_CTRL even if the watchdog is not used, and
>>>    enable all counters.
>>> ---
>>> Unsure whether printing a warning if the value of PERF_GLOBAL_CTRL is not
>>> correct is of any value, hence I haven't added it.
>>> ---
>>>  xen/arch/x86/cpu/intel.c | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
>>> index dfee64689ffe..40d3eb0e18a7 100644
>>> --- a/xen/arch/x86/cpu/intel.c
>>> +++ b/xen/arch/x86/cpu/intel.c
>>> @@ -533,9 +533,25 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
>>>  	init_intel_cacheinfo(c);
>>>  	if (c->cpuid_level > 9) {
>>>  		unsigned eax = cpuid_eax(10);
>>> +		unsigned int cnt = (uint8_t)(eax >> 8);
>>
>> (Not just) since ./CODING_STYLE wants us to avoid fixed width types where
>> possible, personally I'd prefer "& 0xff" here.
> 
> Hm, OK.  I got confused and somehow was under the impression we prefer
> to truncate using fixed types rather than masks.

It's on the edge I admit, and iirc e.g. Andrew would rather see
./CODING_STYLE be relaxed some again.

>>>  		/* Check for version and the number of counters */
>>> -		if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
>>> +		if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
>>> +			uint64_t global_ctrl;
>>> +			unsigned int cnt_mask = (1UL << cnt) - 1;
>>> +
>>> +			/*
>>> +			 * On (some?) Sapphire/Emerald Rapids platforms each
>>> +			 * package-BSP starts with all the enable bits for the
>>> +			 * general-purpose PMCs cleared.  Adjust so counters
>>> +			 * can be enabled from EVNTSEL.
>>> +			 */
>>> +			rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
>>> +			if ((global_ctrl & cnt_mask) != cnt_mask)
>>> +				wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
>>> +				       global_ctrl | cnt_mask);
>>
>> Should there perhaps be a log message?
> 
> I had a post-commit remark about that exactly.  I can add one.

Oh, sorry, missed that. Imo for firmware bugs it's useful to record.
Further, if such a messages is emitted first, in the (hopefully)
unlikely event of the WRMSR causing an issue, there'll be an immediate
reference of what's going on.

Jan