x86/hvm: Rationalise CS limit handling in arch_set_info_hvm_guest()

[PATCH] x86/hvm: Rationalise CS limit handling in arch_set_info_hvm_guest()

Posted by Andrew Cooper 5 months, 1 week ago

Ever since it's introduction in commit 192df6f9122d ("x86: allow HVM guests to
use hypercalls to bring up vCPUs"), %cs.g/limit has been handled differently
to all other segments.

The hypercall takes full 32bit, and hvm_set_segment_register() fixes up all
segments .g to match the limit being 2^20 or more.  Therefore, treating %cs
only as having architectural (20-bit) limit field is weird and unexpected.

Remove the custom handling for %cs.  This is a guest ABI change, but all
callers are expected to be setting up flat segmentation, at which point limit
will be ~0U and there will be no change in practice whether .g is set or not.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Slightly RFC as this is an ABI change, but I don't anticipate any breakge from
this change.
---
 xen/arch/x86/hvm/domain.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 048f29ae4911..4e9aaca39fe6 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -120,7 +120,6 @@ int arch_set_info_hvm_guest(struct vcpu *v, const struct vcpu_hvm_context *ctx)
     case VCPU_HVM_MODE_32B:
     {
         const struct vcpu_hvm_x86_32 *regs = &ctx->cpu_regs.x86_32;
-        uint32_t limit;
 
         if ( ctx->cpu_regs.x86_32.pad1 != 0 ||
              ctx->cpu_regs.x86_32.pad2[0] != 0 ||
@@ -147,13 +146,10 @@ int arch_set_info_hvm_guest(struct vcpu *v, const struct vcpu_hvm_context *ctx)
             return rc;
 
         /* Basic sanity checks. */
-        limit = cs.limit;
-        if ( cs.g )
-            limit = (limit << 12) | 0xfff;
-        if ( regs->eip > limit )
+        if ( regs->eip > cs.limit )
         {
             gprintk(XENLOG_ERR, "EIP (%#08x) outside CS limit (%#08x)\n",
-                    regs->eip, limit);
+                    regs->eip, cs.limit);
             return -EINVAL;
         }
 

base-commit: 3999ff0d307a9a901ad1b5ad56e0dde657fec558
-- 
2.39.5

Re: [PATCH] x86/hvm: Rationalise CS limit handling in arch_set_info_hvm_guest()

Posted by Jan Beulich 5 months, 1 week ago

On 01.09.2025 20:08, Andrew Cooper wrote:
> Ever since it's introduction in commit 192df6f9122d ("x86: allow HVM guests to
> use hypercalls to bring up vCPUs"), %cs.g/limit has been handled differently
> to all other segments.
> 
> The hypercall takes full 32bit,

This is an implication from the implementation, but it's not said anywhere in
the public header. Without it saying that .g is ignored when .p is set (due
to ...

> and hvm_set_segment_register() fixes up all
> segments .g to match the limit being 2^20 or more.

... this internal behavior), I'd imply the opposite from what the public
header has right now. IOW I think the public header also needs touching in
the course of consolidating the code.

>  Therefore, treating %cs
> only as having architectural (20-bit) limit field is weird and unexpected.
> 
> Remove the custom handling for %cs.  This is a guest ABI change, but all
> callers are expected to be setting up flat segmentation, at which point limit
> will be ~0U and there will be no change in practice whether .g is set or not.

A legitimate input to achieve flat segmentation would be to pass in a CS
limit of 0xfffff and .g set. Just that people trying to do so would have
learned that this doesn't do what's intended. (This is just to further
emphasize that the public header commentary needs adjusting.)

That said, what hvm_set_segment_register() does isn't quite right for the
purpose here: If we assume the limit to be the full 32-bit value, then
when any of the upper 12 bits is set (meaning we would set .g) really
we'd need to reject values with the lower 12 bits not all ones. (That
could be a separate change to check_segment(), though.)

I'm further puzzled by comments in the header talking of starting a vCPU
in compatibility mode. How would that work sensibly? The guest can't
really set up any of the long mode control structures, unless confining
all of them into the low 4Gb (virtual and, for CR3, physical).

The TR setup for VCPU_HVM_MODE_64B also looks suspicious: What use is it
to set up a TSS at linear address 0, with a limit of 0x67? Wouldn't we
better make the limit as small as possible (0?), such that any implicit
access to it would fault? (I don't recall whether both SVM and VT-x
would permit an entirely invalid TR.) Furthermore to enter a guest in
compat mode, CR0.PG and CR4.PAE would also need to be set, whereas CS.L
would need to be clear.

Finally, why do we check both EFER.LME and EFER.LMA in the 32-bit case,
but only EFER.LME in the 64-bit one?

Jan