[PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Roger Pau Monne posted 1 patch 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20201006162327.93055-1-roger.pau@citrix.com
xen/arch/x86/msr.c             | 20 ++++++++++++++++++++
xen/arch/x86/pv/emul-priv-op.c | 14 --------------
xen/include/xen/sched.h        |  6 ++++++
3 files changed, 26 insertions(+), 14 deletions(-)

[PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Posted by Roger Pau Monne 3 weeks ago
Currently a PV hardware domain can also be given control over the CPU
frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
However since commit 322ec7c89f6 the default behavior has been changed
to reject accesses to not explicitly handled MSRs, preventing PV
guests that manage CPU frequency from reading
MSR_IA32_PERF_{STATUS/CTL}.

Additionally some HVM guests (Windows at least) will attempt to read
MSR_IA32_PERF_CTL and will panic if given back a #GP fault:

vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0

Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
handling shared between HVM and PV guests, and add an explicit case
for reads to MSR_IA32_PERF_{STATUS/CTL}.

Restore previous behavior and allow PV guests with the required
permissions to read the contents of the mentioned MSRs. Non privileged
guests will get 0 when trying to read those registers, as writes to
MSR_IA32_PERF_CTL by such guest will already be silently dropped.

Fixes: 322ec7c89f6 ('x86/pv: disallow access to unknown MSRs')
Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/msr.c             | 20 ++++++++++++++++++++
 xen/arch/x86/pv/emul-priv-op.c | 14 --------------
 xen/include/xen/sched.h        |  6 ++++++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 81b34fb212..e4c4fa6127 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -242,6 +242,17 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
             goto gp_fault;
         break;
 
+    case MSR_IA32_PERF_STATUS:
+    case MSR_IA32_PERF_CTL:
+        if ( cp->x86_vendor != X86_VENDOR_INTEL )
+            goto gp_fault;
+        *val = 0;
+        if ( likely(!is_cpufreq_controller(d)) ||
+             boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(msr, *val) == 0 )
+            break;
+        goto gp_fault;
+
     case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST:
         if ( !is_hvm_domain(d) || v != curr )
             goto gp_fault;
@@ -442,6 +453,15 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
             goto gp_fault;
         break;
 
+    case MSR_IA32_PERF_CTL:
+        if ( cp->x86_vendor != X86_VENDOR_INTEL )
+            goto gp_fault;
+        if ( likely(!is_cpufreq_controller(d)) ||
+             boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             wrmsr_safe(msr, val) == 0 )
+            break;
+        goto gp_fault;
+
     case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST:
         if ( !is_hvm_domain(d) || v != curr )
             goto gp_fault;
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 7cc16d6eda..dbceed8a05 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -849,12 +849,6 @@ static inline uint64_t guest_misc_enable(uint64_t val)
     return val;
 }
 
-static inline bool is_cpufreq_controller(const struct domain *d)
-{
-    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
-            is_hardware_domain(d));
-}
-
 static uint64_t guest_efer(const struct domain *d)
 {
     uint64_t val;
@@ -1121,14 +1115,6 @@ static int write_msr(unsigned int reg, uint64_t val,
             return X86EMUL_OKAY;
         break;
 
-    case MSR_IA32_PERF_CTL:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
     case MSR_IA32_THERM_CONTROL:
     case MSR_IA32_ENERGY_PERF_BIAS:
         if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d8ed83f869..41baa3b7a1 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
     FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
 } cpufreq_controller;
 
+static inline bool is_cpufreq_controller(const struct domain *d)
+{
+    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
+            is_hardware_domain(d));
+}
+
 int cpupool_move_domain(struct domain *d, struct cpupool *c);
 int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op);
 int cpupool_get_id(const struct domain *d);
-- 
2.28.0


Re: [PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Posted by Jan Beulich 2 weeks ago
On 06.10.2020 18:23, Roger Pau Monne wrote:
> Currently a PV hardware domain can also be given control over the CPU
> frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
> However since commit 322ec7c89f6 the default behavior has been changed
> to reject accesses to not explicitly handled MSRs, preventing PV
> guests that manage CPU frequency from reading
> MSR_IA32_PERF_{STATUS/CTL}.
> 
> Additionally some HVM guests (Windows at least) will attempt to read
> MSR_IA32_PERF_CTL and will panic if given back a #GP fault:
> 
> vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
> d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0
> 
> Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
> handling shared between HVM and PV guests, and add an explicit case
> for reads to MSR_IA32_PERF_{STATUS/CTL}.
> 
> Restore previous behavior and allow PV guests with the required
> permissions to read the contents of the mentioned MSRs. Non privileged
> guests will get 0 when trying to read those registers, as writes to
> MSR_IA32_PERF_CTL by such guest will already be silently dropped.
> 
> Fixes: 322ec7c89f6 ('x86/pv: disallow access to unknown MSRs')
> Fixes: 84e848fd7a1 ('x86/hvm: disallow access to unknown MSRs')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I would have given this my R-b, but Andrew's "straight up broken"
comment needs resolving first, one way or another.

Jan

Re: [PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Posted by Andrew Cooper 3 weeks ago
On 06/10/2020 17:23, Roger Pau Monne wrote:
> Currently a PV hardware domain can also be given control over the CPU
> frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.

This might be how the current logic "works", but its straight up broken.

PERF_CTL is thread scope, so unless dom0 is identity pinned and has one
vcpu for every pcpu, it cannot use the interface correctly.

> However since commit 322ec7c89f6 the default behavior has been changed
> to reject accesses to not explicitly handled MSRs, preventing PV
> guests that manage CPU frequency from reading
> MSR_IA32_PERF_{STATUS/CTL}.
>
> Additionally some HVM guests (Windows at least) will attempt to read
> MSR_IA32_PERF_CTL and will panic if given back a #GP fault:
>
> vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
> d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0
>
> Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
> handling shared between HVM and PV guests, and add an explicit case
> for reads to MSR_IA32_PERF_{STATUS/CTL}.

OTOH, PERF_CTL does have a seemingly architectural "please disable turbo
for me" bit, which is supposed to be for calibration loops.  I wonder if
anyone uses this, and whether we ought to honour it (probably not).

> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index d8ed83f869..41baa3b7a1 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
>      FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
>  } cpufreq_controller;
>  
> +static inline bool is_cpufreq_controller(const struct domain *d)
> +{
> +    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
> +            is_hardware_domain(d));

This won't compile on !CONFIG_X86, due to CONFIG_HAS_CPUFREQ

Honestly - I don't see any point to this code.  Its opt-in via the
command line only, and doesn't provide adequate checks for enablement. 
(It's not as if we're lacking complexity or moving parts when it comes
to power/frequency management).

~Andrew

> +}
> +
>  int cpupool_move_domain(struct domain *d, struct cpupool *c);
>  int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op);
>  int cpupool_get_id(const struct domain *d);


Re: [PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Posted by Roger Pau Monné 3 weeks ago
On Wed, Oct 07, 2020 at 01:06:08PM +0100, Andrew Cooper wrote:
> On 06/10/2020 17:23, Roger Pau Monne wrote:
> > Currently a PV hardware domain can also be given control over the CPU
> > frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
> 
> This might be how the current logic "works", but its straight up broken.
> 
> PERF_CTL is thread scope, so unless dom0 is identity pinned and has one
> vcpu for every pcpu, it cannot use the interface correctly.

Selecting cpufreq=dom0-kernel will force vCPU pinning. I'm not able
however to see anywhere that would force dom0 vCPUs == pCPUs.

> > However since commit 322ec7c89f6 the default behavior has been changed
> > to reject accesses to not explicitly handled MSRs, preventing PV
> > guests that manage CPU frequency from reading
> > MSR_IA32_PERF_{STATUS/CTL}.
> >
> > Additionally some HVM guests (Windows at least) will attempt to read
> > MSR_IA32_PERF_CTL and will panic if given back a #GP fault:
> >
> > vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
> > d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0
> >
> > Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
> > handling shared between HVM and PV guests, and add an explicit case
> > for reads to MSR_IA32_PERF_{STATUS/CTL}.
> 
> OTOH, PERF_CTL does have a seemingly architectural "please disable turbo
> for me" bit, which is supposed to be for calibration loops.  I wonder if
> anyone uses this, and whether we ought to honour it (probably not).

If we let guests play with this we would have to save/restore the
guest value on context switch. Unless there's a strong case for this,
I would say no.

> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index d8ed83f869..41baa3b7a1 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
> >      FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
> >  } cpufreq_controller;
> >  
> > +static inline bool is_cpufreq_controller(const struct domain *d)
> > +{
> > +    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
> > +            is_hardware_domain(d));
> 
> This won't compile on !CONFIG_X86, due to CONFIG_HAS_CPUFREQ

It does seem to build on Arm, because this is only used in x86 code:

https://gitlab.com/xen-project/people/royger/xen/-/jobs/778207412

The extern declaration of cpufreq_controller is just above, so if you
tried to use is_cpufreq_controller on Arm you would get a link time
error, otherwise it builds fine. The compiler removes the function on
Arm as it has the inline attribute and it's not used.

Alternatively I could look into moving cpufreq_controller (and
is_cpufreq_controller) out of sched.h into somewhere else, I haven't
looked at why it needs to live there.

> Honestly - I don't see any point to this code.  Its opt-in via the
> command line only, and doesn't provide adequate checks for enablement. 
> (It's not as if we're lacking complexity or moving parts when it comes
> to power/frequency management).

Right, I could do a pre-patch to remove this, but I also don't think
we should block this fix on removing FREQCTL_dom0_kernel, so I would
rather fix the regression and then remove the feature if we agree it
can be removed.

Thanks, Roger.

Re: [PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Posted by Roger Pau Monné 2 weeks ago
On Wed, Oct 07, 2020 at 06:41:17PM +0200, Roger Pau Monné wrote:
> On Wed, Oct 07, 2020 at 01:06:08PM +0100, Andrew Cooper wrote:
> > On 06/10/2020 17:23, Roger Pau Monne wrote:
> > > Currently a PV hardware domain can also be given control over the CPU
> > > frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
> > 
> > This might be how the current logic "works", but its straight up broken.
> > 
> > PERF_CTL is thread scope, so unless dom0 is identity pinned and has one
> > vcpu for every pcpu, it cannot use the interface correctly.
> 
> Selecting cpufreq=dom0-kernel will force vCPU pinning. I'm not able
> however to see anywhere that would force dom0 vCPUs == pCPUs.
> 
> > > However since commit 322ec7c89f6 the default behavior has been changed
> > > to reject accesses to not explicitly handled MSRs, preventing PV
> > > guests that manage CPU frequency from reading
> > > MSR_IA32_PERF_{STATUS/CTL}.
> > >
> > > Additionally some HVM guests (Windows at least) will attempt to read
> > > MSR_IA32_PERF_CTL and will panic if given back a #GP fault:
> > >
> > > vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
> > > d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0
> > >
> > > Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
> > > handling shared between HVM and PV guests, and add an explicit case
> > > for reads to MSR_IA32_PERF_{STATUS/CTL}.
> > 
> > OTOH, PERF_CTL does have a seemingly architectural "please disable turbo
> > for me" bit, which is supposed to be for calibration loops.  I wonder if
> > anyone uses this, and whether we ought to honour it (probably not).
> 
> If we let guests play with this we would have to save/restore the
> guest value on context switch. Unless there's a strong case for this,
> I would say no.
> 
> > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > > index d8ed83f869..41baa3b7a1 100644
> > > --- a/xen/include/xen/sched.h
> > > +++ b/xen/include/xen/sched.h
> > > @@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
> > >      FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
> > >  } cpufreq_controller;
> > >  
> > > +static inline bool is_cpufreq_controller(const struct domain *d)
> > > +{
> > > +    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
> > > +            is_hardware_domain(d));
> > 
> > This won't compile on !CONFIG_X86, due to CONFIG_HAS_CPUFREQ
> 
> It does seem to build on Arm, because this is only used in x86 code:
> 
> https://gitlab.com/xen-project/people/royger/xen/-/jobs/778207412
> 
> The extern declaration of cpufreq_controller is just above, so if you
> tried to use is_cpufreq_controller on Arm you would get a link time
> error, otherwise it builds fine. The compiler removes the function on
> Arm as it has the inline attribute and it's not used.
> 
> Alternatively I could look into moving cpufreq_controller (and
> is_cpufreq_controller) out of sched.h into somewhere else, I haven't
> looked at why it needs to live there.
> 
> > Honestly - I don't see any point to this code.  Its opt-in via the
> > command line only, and doesn't provide adequate checks for enablement. 
> > (It's not as if we're lacking complexity or moving parts when it comes
> > to power/frequency management).
> 
> Right, I could do a pre-patch to remove this, but I also don't think
> we should block this fix on removing FREQCTL_dom0_kernel, so I would
> rather fix the regression and then remove the feature if we agree it
> can be removed.

Can we get some consensus on what to do next?

I think I've provided replies to all the points above, and I'm not sure
what do to next in order to proceed with this patch.

Thanks, Roger.

Re: [PATCH] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}

Posted by Jan Beulich 2 weeks ago
On 07.10.2020 18:41, Roger Pau Monné wrote:
> On Wed, Oct 07, 2020 at 01:06:08PM +0100, Andrew Cooper wrote:
>> On 06/10/2020 17:23, Roger Pau Monne wrote:
>>> Currently a PV hardware domain can also be given control over the CPU
>>> frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
>>
>> This might be how the current logic "works", but its straight up broken.
>>
>> PERF_CTL is thread scope, so unless dom0 is identity pinned and has one
>> vcpu for every pcpu, it cannot use the interface correctly.
> 
> Selecting cpufreq=dom0-kernel will force vCPU pinning. I'm not able
> however to see anywhere that would force dom0 vCPUs == pCPUs.

Unless there are other overriding command line options, doesn't the
way sched_select_initial_cpu() works guarantee this?

>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index d8ed83f869..41baa3b7a1 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
>>>      FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
>>>  } cpufreq_controller;
>>>  
>>> +static inline bool is_cpufreq_controller(const struct domain *d)
>>> +{
>>> +    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
>>> +            is_hardware_domain(d));
>>
>> This won't compile on !CONFIG_X86, due to CONFIG_HAS_CPUFREQ
> 
> It does seem to build on Arm, because this is only used in x86 code:
> 
> https://gitlab.com/xen-project/people/royger/xen/-/jobs/778207412
> 
> The extern declaration of cpufreq_controller is just above, so if you
> tried to use is_cpufreq_controller on Arm you would get a link time
> error, otherwise it builds fine. The compiler removes the function on
> Arm as it has the inline attribute and it's not used.
> 
> Alternatively I could look into moving cpufreq_controller (and
> is_cpufreq_controller) out of sched.h into somewhere else, I haven't
> looked at why it needs to live there.
> 
>> Honestly - I don't see any point to this code.  Its opt-in via the
>> command line only, and doesn't provide adequate checks for enablement. 
>> (It's not as if we're lacking complexity or moving parts when it comes
>> to power/frequency management).
> 
> Right, I could do a pre-patch to remove this, but I also don't think
> we should block this fix on removing FREQCTL_dom0_kernel, so I would
> rather fix the regression and then remove the feature if we agree it
> can be removed.

I agree.

Jan