[XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters

Simon Gaiser posted 1 patch 9 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/946e5494801866c93332cc5d9ec0fa03a4df00d7.1689686046.git.simon@invisiblethingslab.com
xen/arch/x86/include/asm/msr-index.h |  9 +++++++++
xen/arch/x86/pv/emul-priv-op.c       | 14 ++++++++++++++
2 files changed, 23 insertions(+)
[XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Simon Gaiser 9 months, 2 weeks ago
Since it's limited to the hardware domain it should be safe and it's
very useful to have access to this directly in dom0 when debugging power
related things for example S0ix.
---
 xen/arch/x86/include/asm/msr-index.h |  9 +++++++++
 xen/arch/x86/pv/emul-priv-op.c       | 14 ++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 4f861c0bb4..7e7255383d 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -704,4 +704,13 @@
 #define MSR_PKGC9_IRTL			0x00000634
 #define MSR_PKGC10_IRTL			0x00000635
 
+/* Package C-state residency counters */
+#define MSR_PKG_C2_RESIDENCY            0x0000060d
+#define MSR_PKG_C3_RESIDENCY            0x000003f8
+#define MSR_PKG_C6_RESIDENCY            0x000003f9
+#define MSR_PKG_C7_RESIDENCY            0x000003fa
+#define MSR_PKG_C8_RESIDENCY            0x00000630
+#define MSR_PKG_C9_RESIDENCY            0x00000631
+#define MSR_PKG_C10_RESIDENCY           0x00000632
+
 #endif /* __ASM_MSR_INDEX_H */
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 142bc4818c..4593295ee2 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -965,6 +965,20 @@ static int cf_check read_msr(
         *val = 0;
         return X86EMUL_OKAY;
 
+    case MSR_PKG_C2_RESIDENCY:
+    case MSR_PKG_C3_RESIDENCY:
+    case MSR_PKG_C6_RESIDENCY:
+    case MSR_PKG_C7_RESIDENCY:
+    case MSR_PKG_C8_RESIDENCY:
+    case MSR_PKG_C9_RESIDENCY:
+    case MSR_PKG_C10_RESIDENCY:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( is_hardware_domain(currd) )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
     case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
     case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
-- 
2.40.1
Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Jan Beulich 9 months, 2 weeks ago
On 18.07.2023 15:17, Simon Gaiser wrote:
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -965,6 +965,20 @@ static int cf_check read_msr(
>          *val = 0;
>          return X86EMUL_OKAY;
>  
> +    case MSR_PKG_C2_RESIDENCY:
> +    case MSR_PKG_C3_RESIDENCY:
> +    case MSR_PKG_C6_RESIDENCY:
> +    case MSR_PKG_C7_RESIDENCY:
> +    case MSR_PKG_C8_RESIDENCY:
> +    case MSR_PKG_C9_RESIDENCY:
> +    case MSR_PKG_C10_RESIDENCY:
> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
> +            break;
> +        if ( is_hardware_domain(currd) )
> +            goto normal;
> +        *val = 0;
> +        return X86EMUL_OKAY;

In addition to what Andrew said: Why would we suddenly allow these
reads to succeed for DomU-s?

Jan
Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Simon Gaiser 9 months, 1 week ago
Jan Beulich:
> On 18.07.2023 15:17, Simon Gaiser wrote:
>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -965,6 +965,20 @@ static int cf_check read_msr(
>>          *val = 0;
>>          return X86EMUL_OKAY;
>>  
>> +    case MSR_PKG_C2_RESIDENCY:
>> +    case MSR_PKG_C3_RESIDENCY:
>> +    case MSR_PKG_C6_RESIDENCY:
>> +    case MSR_PKG_C7_RESIDENCY:
>> +    case MSR_PKG_C8_RESIDENCY:
>> +    case MSR_PKG_C9_RESIDENCY:
>> +    case MSR_PKG_C10_RESIDENCY:
>> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>> +            break;
>> +        if ( is_hardware_domain(currd) )
>> +            goto normal;
>> +        *val = 0;
>> +        return X86EMUL_OKAY;
> 
> In addition to what Andrew said: Why would we suddenly allow these
> reads to succeed for DomU-s?

That patch wouldn't actually allow those reads, but fake a 0 response,
or do I miss something. If you mean that behavior: I just mirrored what
is done there in some of the other cases. If you prefer something else,
for example treating it as unimplemented, I can change that.

Simon
Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Marek Marczykowski-Górecki 9 months, 1 week ago
On Wed, Jul 19, 2023 at 12:38:39AM +0200, Simon Gaiser wrote:
> Jan Beulich:
> > On 18.07.2023 15:17, Simon Gaiser wrote:
> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -965,6 +965,20 @@ static int cf_check read_msr(
> >>          *val = 0;
> >>          return X86EMUL_OKAY;
> >>  
> >> +    case MSR_PKG_C2_RESIDENCY:
> >> +    case MSR_PKG_C3_RESIDENCY:
> >> +    case MSR_PKG_C6_RESIDENCY:
> >> +    case MSR_PKG_C7_RESIDENCY:
> >> +    case MSR_PKG_C8_RESIDENCY:
> >> +    case MSR_PKG_C9_RESIDENCY:
> >> +    case MSR_PKG_C10_RESIDENCY:
> >> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
> >> +            break;
> >> +        if ( is_hardware_domain(currd) )
> >> +            goto normal;
> >> +        *val = 0;
> >> +        return X86EMUL_OKAY;
> > 
> > In addition to what Andrew said: Why would we suddenly allow these
> > reads to succeed for DomU-s?
> 
> That patch wouldn't actually allow those reads, but fake a 0 response,
> or do I miss something. If you mean that behavior: I just mirrored what
> is done there in some of the other cases. If you prefer something else,
> for example treating it as unimplemented, I can change that.

I think Jan meant exactly this difference - faking 0, instead of
failing the read.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Jan Beulich 9 months, 1 week ago
On 19.07.2023 03:17, Marek Marczykowski-Górecki wrote:
> On Wed, Jul 19, 2023 at 12:38:39AM +0200, Simon Gaiser wrote:
>> Jan Beulich:
>>> On 18.07.2023 15:17, Simon Gaiser wrote:
>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>> @@ -965,6 +965,20 @@ static int cf_check read_msr(
>>>>          *val = 0;
>>>>          return X86EMUL_OKAY;
>>>>  
>>>> +    case MSR_PKG_C2_RESIDENCY:
>>>> +    case MSR_PKG_C3_RESIDENCY:
>>>> +    case MSR_PKG_C6_RESIDENCY:
>>>> +    case MSR_PKG_C7_RESIDENCY:
>>>> +    case MSR_PKG_C8_RESIDENCY:
>>>> +    case MSR_PKG_C9_RESIDENCY:
>>>> +    case MSR_PKG_C10_RESIDENCY:
>>>> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>>>> +            break;
>>>> +        if ( is_hardware_domain(currd) )
>>>> +            goto normal;
>>>> +        *val = 0;
>>>> +        return X86EMUL_OKAY;
>>>
>>> In addition to what Andrew said: Why would we suddenly allow these
>>> reads to succeed for DomU-s?
>>
>> That patch wouldn't actually allow those reads, but fake a 0 response,
>> or do I miss something. If you mean that behavior: I just mirrored what
>> is done there in some of the other cases. If you prefer something else,
>> for example treating it as unimplemented, I can change that.
> 
> I think Jan meant exactly this difference - faking 0, instead of
> failing the read.

Indeed.

Jan


Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Andrew Cooper 9 months, 2 weeks ago
On 18/07/2023 2:17 pm, Simon Gaiser wrote:
> Since it's limited to the hardware domain it should be safe and it's
> very useful to have access to this directly in dom0 when debugging power
> related things for example S0ix.

You need a SoB.

But, this is an area there things are subtly broken.  For package-scope
MSRs on single socket systems (which does include client systems), then
this happens to function.

It does not function for core-scoped MSRs, or at all in a multi-socket
system.  In such scenarios, dom0 can be rescheduled to a CPU in a
different scope while it thinks it is sampling a single scope.

This is one of the areas where dom0 and Xen end up fighting over the system.

I agree that we want some way for dom0 to get this information, but I'm
afraid it's not as simple as just permitting access to the MSRs like this.

~Andrew

Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Simon Gaiser 9 months, 1 week ago
Andrew Cooper:
> On 18/07/2023 2:17 pm, Simon Gaiser wrote:
>> Since it's limited to the hardware domain it should be safe and it's
>> very useful to have access to this directly in dom0 when debugging power
>> related things for example S0ix.
> 
> You need a SoB.

Yeah, sorry.

> But, this is an area there things are subtly broken.  For package-scope
> MSRs on single socket systems (which does include client systems), then
> this happens to function.
> 
> It does not function for core-scoped MSRs, or at all in a multi-socket
> system.  In such scenarios, dom0 can be rescheduled to a CPU in a
> different scope while it thinks it is sampling a single scope.
> 
> This is one of the areas where dom0 and Xen end up fighting over the system.
> 
> I agree that we want some way for dom0 to get this information, but I'm
> afraid it's not as simple as just permitting access to the MSRs like this.

I see. So a generic solution is not so easy. Also even if there would be
an interface for dom0, my main motivation was to be able to just use
existing code like /sys/kernel/debug/pmc_core/package_cstate_show and
turbostat. You can already read those PC-states via Xen's debug
interface, but that's less convenient.

For those package-scoped MSRs, how about limiting them to single-socket
systems?

Simon
Re: [XEN PATCH] x86/msr: Allow hardware domain to read package C-state residency counters
Posted by Jan Beulich 9 months, 1 week ago
On 19.07.2023 01:27, Simon Gaiser wrote:
> Andrew Cooper:
>> On 18/07/2023 2:17 pm, Simon Gaiser wrote:
>>> Since it's limited to the hardware domain it should be safe and it's
>>> very useful to have access to this directly in dom0 when debugging power
>>> related things for example S0ix.
>>
>> You need a SoB.
> 
> Yeah, sorry.
> 
>> But, this is an area there things are subtly broken.  For package-scope
>> MSRs on single socket systems (which does include client systems), then
>> this happens to function.
>>
>> It does not function for core-scoped MSRs, or at all in a multi-socket
>> system.  In such scenarios, dom0 can be rescheduled to a CPU in a
>> different scope while it thinks it is sampling a single scope.
>>
>> This is one of the areas where dom0 and Xen end up fighting over the system.
>>
>> I agree that we want some way for dom0 to get this information, but I'm
>> afraid it's not as simple as just permitting access to the MSRs like this.
> 
> I see. So a generic solution is not so easy. Also even if there would be
> an interface for dom0, my main motivation was to be able to just use
> existing code like /sys/kernel/debug/pmc_core/package_cstate_show and
> turbostat. You can already read those PC-states via Xen's debug
> interface, but that's less convenient.
> 
> For those package-scoped MSRs, how about limiting them to single-socket
> systems?

No, that's all hackery that in the extreme case may be okay to have in
debug builds, but never in release ones. Even having such in debug
builds is problematic, because then we won't routinely test code paths
as used by release builds. When virtualized and trying to access real
MSRs (rather than properly virtualized ones), you need some sort of PV
solution. If you want stuff like that to show up in "standard
interfaces", you'll have to teach the code providing those to use
whatever the (perhaps to be added) PV interface is.

Jan