[PATCH] x86/amd: do not expose HWCR.TscFreqSel to guests

Roger Pau Monne posted 1 patch 7 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20230912162305.34339-1-roger.pau@citrix.com
There is a newer version of this series
xen/arch/x86/msr.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
[PATCH] x86/amd: do not expose HWCR.TscFreqSel to guests
Posted by Roger Pau Monne 7 months, 2 weeks ago
OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is
set, and will also attempt to unconditionally access HWCR if the TSC is
reported as Invariant.

The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a
(bogus) warning message, but doing so at the cost of OpenBSD not booting is not
a suitable solution.

In order to fix expose an empty HWCR.

Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Not sure whether we want to expose something when is_cpufreq_controller() is
true, seeing as there's a special wrmsr handler for the same MSR in that case.
Likely should be done for PV only, but also likely quite bogus.

Missing reported by as the issue came from the QubesOS tracker.
---
 xen/arch/x86/msr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 3f0450259cdf..964d500ff8a1 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
     case MSR_K8_HWCR:
         if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
             goto gp_fault;
-        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
-               ? K8_HWCR_TSC_FREQ_SEL : 0;
+        /*
+         * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as
+         * Invariant.  Do not set TSC_FREQ_SEL as that would trigger OpenBSD to
+         * also poke at PSTATE0.
+         */
+        *val = 0;
         break;
 
     case MSR_VIRT_SPEC_CTRL:
-- 
2.42.0


Re: [PATCH] x86/amd: do not expose HWCR.TscFreqSel to guests
Posted by Andrew Cooper 7 months, 2 weeks ago
On 12/09/2023 5:23 pm, Roger Pau Monne wrote:
> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is
> set, and will also attempt to unconditionally access HWCR if the TSC is
> reported as Invariant.
>
> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a
> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not
> a suitable solution.
>
> In order to fix expose an empty HWCR.

At first I was thinking a straight up revert, but AMD's CPUID Faulting
is an architectural bit in here so it's worth keeping the register around.

>
> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Not sure whether we want to expose something when is_cpufreq_controller() is
> true, seeing as there's a special wrmsr handler for the same MSR in that case.
> Likely should be done for PV only, but also likely quite bogus.
>
> Missing reported by as the issue came from the QubesOS tracker.

Well - we can at least have a:

Link: https://github.com/QubesOS/qubes-issues/issues/8502

in the commit message, and it's probably worth asking Solène / Marek
(both CC'd) if they want a Reported-by tag.

> ---
>  xen/arch/x86/msr.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index 3f0450259cdf..964d500ff8a1 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
>      case MSR_K8_HWCR:
>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>              goto gp_fault;
> -        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
> -               ? K8_HWCR_TSC_FREQ_SEL : 0;
> +        /*
> +         * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as
> +         * Invariant.  Do not set TSC_FREQ_SEL as that would trigger OpenBSD to
> +         * also poke at PSTATE0.
> +         */

While this is true, the justification for removing this is because
TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR.

Also because it's addition without writing into the migration stream was
bogus irrespective of the specifics of the bit.

I'm still of the opinion that it's buggy for OpenBSD to be looking at
model specific bits when virtualised, but given my latest reading of the
AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it
can see TSC_FREQ_SEL.

In some theoretical future where the toolstack better understands MSRs
and (non)migratable VMs (which is the QubesOS usecase), then it would in
principle be fine to construct a VM which can see the host TSC_FREQ_SEL
and PSTATE* values.

Preferably with an adjusted comment, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

~Andrew

Re: [PATCH] x86/amd: do not expose HWCR.TscFreqSel to guests
Posted by Andrew Cooper 7 months, 2 weeks ago
On 12/09/2023 5:35 pm, Andrew Cooper wrote:
> On 12/09/2023 5:23 pm, Roger Pau Monne wrote:
>> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is
>> set, and will also attempt to unconditionally access HWCR if the TSC is
>> reported as Invariant.
>>
>> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a
>> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not
>> a suitable solution.
>>
>> In order to fix expose an empty HWCR.
> At first I was thinking a straight up revert, but AMD's CPUID Faulting
> is an architectural bit in here so it's worth keeping the register around.
>
>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> Not sure whether we want to expose something when is_cpufreq_controller() is
>> true, seeing as there's a special wrmsr handler for the same MSR in that case.
>> Likely should be done for PV only, but also likely quite bogus.
>>
>> Missing reported by as the issue came from the QubesOS tracker.
> Well - we can at least have a:
>
> Link: https://github.com/QubesOS/qubes-issues/issues/8502
>
> in the commit message, and it's probably worth asking Solène / Marek
> (both CC'd) if they want a Reported-by tag.
>
>> ---
>>  xen/arch/x86/msr.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
>> index 3f0450259cdf..964d500ff8a1 100644
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
>>      case MSR_K8_HWCR:
>>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>>              goto gp_fault;
>> -        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
>> -               ? K8_HWCR_TSC_FREQ_SEL : 0;
>> +        /*
>> +         * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as
>> +         * Invariant.  Do not set TSC_FREQ_SEL as that would trigger OpenBSD to
>> +         * also poke at PSTATE0.
>> +         */
> While this is true, the justification for removing this is because
> TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR.
>
> Also because it's addition without writing into the migration stream was
> bogus irrespective of the specifics of the bit.
>
> I'm still of the opinion that it's buggy for OpenBSD to be looking at
> model specific bits when virtualised, but given my latest reading of the
> AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it
> can see TSC_FREQ_SEL.
>
> In some theoretical future where the toolstack better understands MSRs
> and (non)migratable VMs (which is the QubesOS usecase), then it would in
> principle be fine to construct a VM which can see the host TSC_FREQ_SEL
> and PSTATE* values.
>
> Preferably with an adjusted comment, Reviewed-by: Andrew Cooper
> <andrew.cooper3@citrix.com>

Sorry - I meant to be clearer here.  I'd suggest just deleting the
comment and leaving an unconditional return of 0 (which will become
conditional when we wire up CPUID Faulting).

MSR_HWCR *is* an architectural MSR on any 64bit AMD system, so shouldn't
fault.

~Andrew

Re: [PATCH] x86/amd: do not expose HWCR.TscFreqSel to guests
Posted by Roger Pau Monné 7 months, 2 weeks ago
On Tue, Sep 12, 2023 at 05:36:53PM +0100, Andrew Cooper wrote:
> On 12/09/2023 5:35 pm, Andrew Cooper wrote:
> > On 12/09/2023 5:23 pm, Roger Pau Monne wrote:
> >> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is
> >> set, and will also attempt to unconditionally access HWCR if the TSC is
> >> reported as Invariant.
> >>
> >> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a
> >> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not
> >> a suitable solution.
> >>
> >> In order to fix expose an empty HWCR.
> > At first I was thinking a straight up revert, but AMD's CPUID Faulting
> > is an architectural bit in here so it's worth keeping the register around.
> >
> >> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> ---
> >> Not sure whether we want to expose something when is_cpufreq_controller() is
> >> true, seeing as there's a special wrmsr handler for the same MSR in that case.
> >> Likely should be done for PV only, but also likely quite bogus.
> >>
> >> Missing reported by as the issue came from the QubesOS tracker.
> > Well - we can at least have a:
> >
> > Link: https://github.com/QubesOS/qubes-issues/issues/8502
> >
> > in the commit message, and it's probably worth asking Solène / Marek
> > (both CC'd) if they want a Reported-by tag.
> >
> >> ---
> >>  xen/arch/x86/msr.c | 8 ++++++--
> >>  1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> >> index 3f0450259cdf..964d500ff8a1 100644
> >> --- a/xen/arch/x86/msr.c
> >> +++ b/xen/arch/x86/msr.c
> >> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
> >>      case MSR_K8_HWCR:
> >>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> >>              goto gp_fault;
> >> -        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
> >> -               ? K8_HWCR_TSC_FREQ_SEL : 0;
> >> +        /*
> >> +         * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as
> >> +         * Invariant.  Do not set TSC_FREQ_SEL as that would trigger OpenBSD to
> >> +         * also poke at PSTATE0.
> >> +         */
> > While this is true, the justification for removing this is because
> > TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR.
> >
> > Also because it's addition without writing into the migration stream was
> > bogus irrespective of the specifics of the bit.
> >
> > I'm still of the opinion that it's buggy for OpenBSD to be looking at
> > model specific bits when virtualised, but given my latest reading of the
> > AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it
> > can see TSC_FREQ_SEL.
> >
> > In some theoretical future where the toolstack better understands MSRs
> > and (non)migratable VMs (which is the QubesOS usecase), then it would in
> > principle be fine to construct a VM which can see the host TSC_FREQ_SEL
> > and PSTATE* values.
> >
> > Preferably with an adjusted comment, Reviewed-by: Andrew Cooper
> > <andrew.cooper3@citrix.com>
> 
> Sorry - I meant to be clearer here.  I'd suggest just deleting the
> comment and leaving an unconditional return of 0 (which will become
> conditional when we wire up CPUID Faulting).
> 
> MSR_HWCR *is* an architectural MSR on any 64bit AMD system, so shouldn't
> fault.

Hm, I think it's worth to at least keep a note that if TSC_FREQ_SEL is
exposed PSTATE0 must also be exposed to prevent OpenBSD 7.3 from
panicking.

Thanks, Roger.

Re: [PATCH] x86/amd: do not expose HWCR.TscFreqSel to guests
Posted by Andrew Cooper 7 months, 2 weeks ago
On 13/09/2023 9:08 am, Roger Pau Monné wrote:
> On Tue, Sep 12, 2023 at 05:36:53PM +0100, Andrew Cooper wrote:
>> On 12/09/2023 5:35 pm, Andrew Cooper wrote:
>>> On 12/09/2023 5:23 pm, Roger Pau Monne wrote:
>>>> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is
>>>> set, and will also attempt to unconditionally access HWCR if the TSC is
>>>> reported as Invariant.
>>>>
>>>> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a
>>>> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not
>>>> a suitable solution.
>>>>
>>>> In order to fix expose an empty HWCR.
>>> At first I was thinking a straight up revert, but AMD's CPUID Faulting
>>> is an architectural bit in here so it's worth keeping the register around.
>>>
>>>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> ---
>>>> Not sure whether we want to expose something when is_cpufreq_controller() is
>>>> true, seeing as there's a special wrmsr handler for the same MSR in that case.
>>>> Likely should be done for PV only, but also likely quite bogus.
>>>>
>>>> Missing reported by as the issue came from the QubesOS tracker.
>>> Well - we can at least have a:
>>>
>>> Link: https://github.com/QubesOS/qubes-issues/issues/8502
>>>
>>> in the commit message, and it's probably worth asking Solène / Marek
>>> (both CC'd) if they want a Reported-by tag.
>>>
>>>> ---
>>>>  xen/arch/x86/msr.c | 8 ++++++--
>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
>>>> index 3f0450259cdf..964d500ff8a1 100644
>>>> --- a/xen/arch/x86/msr.c
>>>> +++ b/xen/arch/x86/msr.c
>>>> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
>>>>      case MSR_K8_HWCR:
>>>>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>>>>              goto gp_fault;
>>>> -        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
>>>> -               ? K8_HWCR_TSC_FREQ_SEL : 0;
>>>> +        /*
>>>> +         * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as
>>>> +         * Invariant.  Do not set TSC_FREQ_SEL as that would trigger OpenBSD to
>>>> +         * also poke at PSTATE0.
>>>> +         */
>>> While this is true, the justification for removing this is because
>>> TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR.
>>>
>>> Also because it's addition without writing into the migration stream was
>>> bogus irrespective of the specifics of the bit.
>>>
>>> I'm still of the opinion that it's buggy for OpenBSD to be looking at
>>> model specific bits when virtualised, but given my latest reading of the
>>> AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it
>>> can see TSC_FREQ_SEL.
>>>
>>> In some theoretical future where the toolstack better understands MSRs
>>> and (non)migratable VMs (which is the QubesOS usecase), then it would in
>>> principle be fine to construct a VM which can see the host TSC_FREQ_SEL
>>> and PSTATE* values.
>>>
>>> Preferably with an adjusted comment, Reviewed-by: Andrew Cooper
>>> <andrew.cooper3@citrix.com>
>> Sorry - I meant to be clearer here.  I'd suggest just deleting the
>> comment and leaving an unconditional return of 0 (which will become
>> conditional when we wire up CPUID Faulting).
>>
>> MSR_HWCR *is* an architectural MSR on any 64bit AMD system, so shouldn't
>> fault.
> Hm, I think it's worth to at least keep a note that if TSC_FREQ_SEL is
> exposed PSTATE0 must also be exposed to prevent OpenBSD 7.3 from
> panicking.

But there's nothing OpenBSD 7.3 specific about it.

Any software which sees this bit is permitted (expected even) to edit
the pstate registers.


You know why it's called frequency select?  Because firmware is supposed
to program the systemwide frequency/voltage to the user's request for
whether the system wants to run cooler, or be overclocked.


Putting OpenBSD 7.3 in the commit message is absolutely relevant to why
we're making this change now, but it's not relevant to why we have
concluded that TSC_FREQ_SEL is bogus to expose to guests.

~Andrew