[PATCH] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1

Andrew Cooper posted 1 patch 1 year ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20230412183519.2996902-1-andrew.cooper3@citrix.com
There is a newer version of this series
xen/arch/x86/hvm/hvm.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
[PATCH] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Andrew Cooper 1 year ago
The Long Mode consistency checks exist to "ensure that the processor does not
enter an undefined mode or state that results in unpredictable behavior".  APM
Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
preventing the OS from trying to exit Long mode while in 64bit mode.  This
could leave the CPU in Protected Mode with an %rip above the 4G boundary.

Experimentally, AMD CPUs really do permit this state transition.  An OS which
tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
to be going on behind the scenes ought to result in sane continued execution.

Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
section instructs peoples to switch to a compatibility mode segment first
before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
Architecture".

Either way, this appears to have been a genuine oversight in the AMD64 spec.

Intel, on the other hand, rejects this state transition with #GP.

Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
4.1.2 "Paging-Mode Enable" was altered from:

  If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
  exception (#GP); software should clear CR4.PCIDE before attempting to
  disable paging.

to

  If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
  clear CR0.PG causes a general-protection exception (#GP). Software should
  transition to compatibility mode and clear CR4.PCIDE before attempting to
  disable paging.

which acknowledges this corner case, but there doesn't appear to be any other
discussion even in the relevant Long Mode sections.

So it appears that Intel spotted and addressed the corner case in IA-32e mode,
but where 15 years late to document it.

Xen was written to the AMD spec, and misses the check.  Follow the Intel
behaviour, because it is more sensible and avoids hitting a VMEntry failure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/hvm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1454c1732d95..3c8d28a2d8be 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2340,6 +2340,21 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
     }
     else if ( !(value & X86_CR0_PG) && (old_value & X86_CR0_PG) )
     {
+        struct segment_register cs;
+
+        hvm_get_segment_register(v, x86_seg_cs, &cs);
+
+        /*
+         * Intel documents a #GP fault in this case, and VMEntry checks reject
+         * it as a valid state.  AMD permits the state transition, and hits
+         * SHUTDOWN immediately thereafter.  Follow the Intel behaviour.
+         */
+        if ( cs.l )
+        {
+            HVM_DBG_LOG(DBG_LEVEL_1, "Guest attempts to clear CR0.PG while CS.L=1");
+            return X86EMUL_EXCEPTION;
+        }
+
         if ( hvm_pcid_enabled(v) )
         {
             HVM_DBG_LOG(DBG_LEVEL_1, "Guest attempts to clear CR0.PG "
-- 
2.30.2


Re: [PATCH] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Andrew Cooper 1 year ago
On 12/04/2023 7:35 pm, Andrew Cooper wrote:
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 1454c1732d95..3c8d28a2d8be 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2340,6 +2340,21 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
>      }
>      else if ( !(value & X86_CR0_PG) && (old_value & X86_CR0_PG) )
>      {
> +        struct segment_register cs;
> +
> +        hvm_get_segment_register(v, x86_seg_cs, &cs);
> +
> +        /*
> +         * Intel documents a #GP fault in this case, and VMEntry checks reject
> +         * it as a valid state.  AMD permits the state transition, and hits
> +         * SHUTDOWN immediately thereafter.  Follow the Intel behaviour.
> +         */
> +        if ( cs.l )

It occurs to me that this needs to be qualified with LME first, because
cs.l is software-available outside of long mode.

~Andrew

> +        {
> +            HVM_DBG_LOG(DBG_LEVEL_1, "Guest attempts to clear CR0.PG while CS.L=1");
> +            return X86EMUL_EXCEPTION;
> +        }
> +
>          if ( hvm_pcid_enabled(v) )
>          {
>              HVM_DBG_LOG(DBG_LEVEL_1, "Guest attempts to clear CR0.PG "
[PATCH v2] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Andrew Cooper 1 year ago
The Long Mode consistency checks exist to "ensure that the processor does not
enter an undefined mode or state that results in unpredictable behavior".  APM
Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
preventing the OS from trying to exit Long mode while in 64bit mode.  This
could leave the CPU in Protected Mode with an %rip above the 4G boundary.

Experimentally, AMD CPUs really do permit this state transition.  An OS which
tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
to be going on behind the scenes ought to result in sane continued execution.

Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
section instructs peoples to switch to a compatibility mode segment first
before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
Architecture".

Either way, this appears to have been a genuine oversight in the AMD64 spec.

Intel, on the other hand, rejects this state transition with #GP.

Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
4.1.2 "Paging-Mode Enable" was altered from:

  If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
  exception (#GP); software should clear CR4.PCIDE before attempting to
  disable paging.

to

  If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
  clear CR0.PG causes a general-protection exception (#GP). Software should
  transition to compatibility mode and clear CR4.PCIDE before attempting to
  disable paging.

which acknowledges this corner case, but there doesn't appear to be any other
discussion even in the relevant Long Mode sections.

So it appears that Intel spotted and addressed the corner case in IA-32e mode,
but were 15 years late to document it.

Xen was written to the AMD spec, and misses the check.  Follow the Intel
behaviour, because it is more sensible and avoids hitting a VMEntry failure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

v2:
 * Restrict to when Long Mode is enabled.
---
 xen/arch/x86/hvm/hvm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1454c1732d95..c63c7d4d6bcf 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2340,6 +2340,21 @@ int hvm_set_cr0(unsigned long value, bool may_defer)
     }
     else if ( !(value & X86_CR0_PG) && (old_value & X86_CR0_PG) )
     {
+        struct segment_register cs;
+
+        hvm_get_segment_register(v, x86_seg_cs, &cs);
+
+        /*
+         * Intel documents a #GP fault in this case, and VMEntry checks reject
+         * it as a valid state.  AMD permits the state transition, and hits
+         * SHUTDOWN immediately thereafter.  Follow the Intel behaviour.
+         */
+        if ( (v->arch.hvm.guest_efer & EFER_LME) && cs.l )
+        {
+            HVM_DBG_LOG(DBG_LEVEL_1, "Guest attempts to clear CR0.PG while CS.L=1");
+            return X86EMUL_EXCEPTION;
+        }
+
         if ( hvm_pcid_enabled(v) )
         {
             HVM_DBG_LOG(DBG_LEVEL_1, "Guest attempts to clear CR0.PG "
-- 
2.30.2


Re: [PATCH v2] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Jan Beulich 1 year ago
On 13.04.2023 17:00, Andrew Cooper wrote:
> The Long Mode consistency checks exist to "ensure that the processor does not
> enter an undefined mode or state that results in unpredictable behavior".  APM
> Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
> preventing the OS from trying to exit Long mode while in 64bit mode.  This
> could leave the CPU in Protected Mode with an %rip above the 4G boundary.
> 
> Experimentally, AMD CPUs really do permit this state transition.  An OS which
> tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
> to be going on behind the scenes ought to result in sane continued execution.

For my own understanding, which truncation are you referring to here?
As you're in 1:1 mapped code, %rip can't really be meant. Clearly IDT
and GDT would need to be (re)loaded to point to 32-bit-style tables, so
the only thing left would seem to be %rsp. It's not clear to me whether
after such an illegal mode switch its upper bits would be cleared or
ignored ...

Jan
Re: [PATCH v2] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Andrew Cooper 1 year ago
On 17/04/2023 9:05 am, Jan Beulich wrote:
> On 13.04.2023 17:00, Andrew Cooper wrote:
>> The Long Mode consistency checks exist to "ensure that the processor does not
>> enter an undefined mode or state that results in unpredictable behavior".  APM
>> Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
>> preventing the OS from trying to exit Long mode while in 64bit mode.  This
>> could leave the CPU in Protected Mode with an %rip above the 4G boundary.
>>
>> Experimentally, AMD CPUs really do permit this state transition.  An OS which
>> tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
>> to be going on behind the scenes ought to result in sane continued execution.
> For my own understanding, which truncation are you referring to here?
> As you're in 1:1 mapped code, %rip can't really be meant. Clearly IDT
> and GDT would need to be (re)loaded to point to 32-bit-style tables, so
> the only thing left would seem to be %rsp. It's not clear to me whether
> after such an illegal mode switch its upper bits would be cleared or
> ignored ...

Outside of 64bit mode, all address generation is truncated to 32 bits.

So when %rip happens to be above 2^32, the fetch of the next instruction
ought to be from a truncated %eip, but my attempts to set up such an
experiment still crashed.

I didn't spend too long investigating.  I've got too many other things
to do.

~Andrew

Re: [PATCH v2] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Roger Pau Monné 1 year ago
On Thu, Apr 13, 2023 at 04:00:09PM +0100, Andrew Cooper wrote:
> The Long Mode consistency checks exist to "ensure that the processor does not
> enter an undefined mode or state that results in unpredictable behavior".  APM
> Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
> preventing the OS from trying to exit Long mode while in 64bit mode.  This
> could leave the CPU in Protected Mode with an %rip above the 4G boundary.
> 
> Experimentally, AMD CPUs really do permit this state transition.  An OS which
> tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
> to be going on behind the scenes ought to result in sane continued execution.
> 
> Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
> section instructs peoples to switch to a compatibility mode segment first
> before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
> further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
> Architecture".
> 
> Either way, this appears to have been a genuine oversight in the AMD64 spec.
> 
> Intel, on the other hand, rejects this state transition with #GP.
> 
> Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
> 4.1.2 "Paging-Mode Enable" was altered from:
> 
>   If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
>   exception (#GP); software should clear CR4.PCIDE before attempting to
>   disable paging.
> 
> to
> 
>   If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
>   clear CR0.PG causes a general-protection exception (#GP). Software should
>   transition to compatibility mode and clear CR4.PCIDE before attempting to
>   disable paging.
> 
> which acknowledges this corner case, but there doesn't appear to be any other
> discussion even in the relevant Long Mode sections.
> 
> So it appears that Intel spotted and addressed the corner case in IA-32e mode,
> but were 15 years late to document it.
> 
> Xen was written to the AMD spec, and misses the check.  Follow the Intel
> behaviour, because it is more sensible and avoids hitting a VMEntry failure.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> 
> v2:
>  * Restrict to when Long Mode is enabled.

Maybe the subject also needs to be slightly edited to mention CS.L=1
and Long Mode enabled?  Or just mention long mode instead of the code
segment status?

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

Re: [PATCH v2] x86/hvm: Disallow CR0.PG 1->0 transitions when CS.L=1
Posted by Andrew Cooper 1 year ago
On 14/04/2023 11:31 am, Roger Pau Monné wrote:
> On Thu, Apr 13, 2023 at 04:00:09PM +0100, Andrew Cooper wrote:
>> The Long Mode consistency checks exist to "ensure that the processor does not
>> enter an undefined mode or state that results in unpredictable behavior".  APM
>> Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
>> preventing the OS from trying to exit Long mode while in 64bit mode.  This
>> could leave the CPU in Protected Mode with an %rip above the 4G boundary.
>>
>> Experimentally, AMD CPUs really do permit this state transition.  An OS which
>> tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
>> to be going on behind the scenes ought to result in sane continued execution.
>>
>> Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
>> section instructs peoples to switch to a compatibility mode segment first
>> before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
>> further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
>> Architecture".
>>
>> Either way, this appears to have been a genuine oversight in the AMD64 spec.
>>
>> Intel, on the other hand, rejects this state transition with #GP.
>>
>> Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
>> 4.1.2 "Paging-Mode Enable" was altered from:
>>
>>   If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
>>   exception (#GP); software should clear CR4.PCIDE before attempting to
>>   disable paging.
>>
>> to
>>
>>   If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
>>   clear CR0.PG causes a general-protection exception (#GP). Software should
>>   transition to compatibility mode and clear CR4.PCIDE before attempting to
>>   disable paging.
>>
>> which acknowledges this corner case, but there doesn't appear to be any other
>> discussion even in the relevant Long Mode sections.
>>
>> So it appears that Intel spotted and addressed the corner case in IA-32e mode,
>> but were 15 years late to document it.
>>
>> Xen was written to the AMD spec, and misses the check.  Follow the Intel
>> behaviour, because it is more sensible and avoids hitting a VMEntry failure.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>>
>> v2:
>>  * Restrict to when Long Mode is enabled.
> Maybe the subject also needs to be slightly edited to mention CS.L=1
> and Long Mode enabled?  Or just mention long mode instead of the code
> segment status?
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.  I think "Disallow CR0.PG 1->0 transitions in 64bit mode" is the
most concise way of tweaking the subject.

~Andrew