[RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection

Jason Andryuk posted 1 patch 6 months, 3 weeks ago
Failed in applying to current master (apply log)
arch/x86/xen/xen-head.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Jason Andryuk 6 months, 3 weeks ago
A Xen PVH dom0 on an AMD processor triple faults early in boot on
6.6.86.  CPU detection appears to fail, as the faulting instruction is
vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().

Detection fails because __xen_hypercall_setfunc() returns the full
kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
xen_hypercall_amd(%rip), which when running from identity mapping, is
only 0x015b93f0.

Replace the rip-relative address with just loading the actual address to
restore the proper comparision.

This only seems to affect PVH dom0 boot.  This is probably because the
XENMEM_memory_map hypercall is issued early on from the identity
mappings.  With a domU, the memory map is provided via hvm_start_info
and the hypercall is skipped.  The domU is probably running from the
kernel high mapping when it issues hypercalls.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
I think this sort of address mismatch would be addresed by
e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")

That could be backported instead, but it depends on a fair number of
patches.

Not sure on how getting a patch just into 6.6 would work.  This patch
could go into upstream Linux though it's not strictly necessary when the
rip-relative address is a high address.
---
 arch/x86/xen/xen-head.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 059f343da76d..71a0eda2da60 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
 	pop %ebx
 	pop %eax
 #else
-	lea xen_hypercall_amd(%rip), %rcx
+	mov $xen_hypercall_amd, %rcx
 	cmp %rax, %rcx
 #ifdef CONFIG_FRAME_POINTER
 	pop %rax	/* Dummy pop. */
-- 
2.49.0
Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Alejandro Vallejo 6 months, 3 weeks ago
On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote:
> A Xen PVH dom0 on an AMD processor triple faults early in boot on
> 6.6.86.  CPU detection appears to fail, as the faulting instruction is
> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>
> Detection fails because __xen_hypercall_setfunc() returns the full
> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
> xen_hypercall_amd(%rip), which when running from identity mapping, is
> only 0x015b93f0.
>
> Replace the rip-relative address with just loading the actual address to
> restore the proper comparision.
>
> This only seems to affect PVH dom0 boot.  This is probably because the
> XENMEM_memory_map hypercall is issued early on from the identity
> mappings.  With a domU, the memory map is provided via hvm_start_info
> and the hypercall is skipped.  The domU is probably running from the
> kernel high mapping when it issues hypercalls.
>
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> I think this sort of address mismatch would be addresed by
> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>
> That could be backported instead, but it depends on a fair number of
> patches.
>
> Not sure on how getting a patch just into 6.6 would work.  This patch
> could go into upstream Linux though it's not strictly necessary when the
> rip-relative address is a high address.
> ---
>  arch/x86/xen/xen-head.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
> index 059f343da76d..71a0eda2da60 100644
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
>  	pop %ebx
>  	pop %eax
>  #else
> -	lea xen_hypercall_amd(%rip), %rcx
> +	mov $xen_hypercall_amd, %rcx

(Now that this is known to be the fix upstream) This probably wants to
be plain lea without RIP-relative addressing, like the x86_32 branch
above?

>  	cmp %rax, %rcx
>  #ifdef CONFIG_FRAME_POINTER
>  	pop %rax	/* Dummy pop. */
Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Jan Beulich 6 months, 3 weeks ago
On 11.04.2025 14:46, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote:
>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
>> 6.6.86.  CPU detection appears to fail, as the faulting instruction is
>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>>
>> Detection fails because __xen_hypercall_setfunc() returns the full
>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
>> e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
>> xen_hypercall_amd(%rip), which when running from identity mapping, is
>> only 0x015b93f0.
>>
>> Replace the rip-relative address with just loading the actual address to
>> restore the proper comparision.
>>
>> This only seems to affect PVH dom0 boot.  This is probably because the
>> XENMEM_memory_map hypercall is issued early on from the identity
>> mappings.  With a domU, the memory map is provided via hvm_start_info
>> and the hypercall is skipped.  The domU is probably running from the
>> kernel high mapping when it issues hypercalls.
>>
>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>> ---
>> I think this sort of address mismatch would be addresed by
>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>>
>> That could be backported instead, but it depends on a fair number of
>> patches.
>>
>> Not sure on how getting a patch just into 6.6 would work.  This patch
>> could go into upstream Linux though it's not strictly necessary when the
>> rip-relative address is a high address.
>> ---
>>  arch/x86/xen/xen-head.S | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
>> index 059f343da76d..71a0eda2da60 100644
>> --- a/arch/x86/xen/xen-head.S
>> +++ b/arch/x86/xen/xen-head.S
>> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
>>  	pop %ebx
>>  	pop %eax
>>  #else
>> -	lea xen_hypercall_amd(%rip), %rcx
>> +	mov $xen_hypercall_amd, %rcx
> 
> (Now that this is known to be the fix upstream) This probably wants to
> be plain lea without RIP-relative addressing, like the x86_32 branch
> above?

Why would you want to use LEA there? It's functionally identical, but the
MOV can be encoded without ModR/M byte.

Jan
Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Alejandro Vallejo 6 months, 3 weeks ago
On Fri Apr 11, 2025 at 2:08 PM BST, Jan Beulich wrote:
> On 11.04.2025 14:46, Alejandro Vallejo wrote:
>> On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote:
>>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
>>> 6.6.86.  CPU detection appears to fail, as the faulting instruction is
>>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>>>
>>> Detection fails because __xen_hypercall_setfunc() returns the full
>>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
>>> e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
>>> xen_hypercall_amd(%rip), which when running from identity mapping, is
>>> only 0x015b93f0.
>>>
>>> Replace the rip-relative address with just loading the actual address to
>>> restore the proper comparision.
>>>
>>> This only seems to affect PVH dom0 boot.  This is probably because the
>>> XENMEM_memory_map hypercall is issued early on from the identity
>>> mappings.  With a domU, the memory map is provided via hvm_start_info
>>> and the hypercall is skipped.  The domU is probably running from the
>>> kernel high mapping when it issues hypercalls.
>>>
>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>> ---
>>> I think this sort of address mismatch would be addresed by
>>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>>>
>>> That could be backported instead, but it depends on a fair number of
>>> patches.
>>>
>>> Not sure on how getting a patch just into 6.6 would work.  This patch
>>> could go into upstream Linux though it's not strictly necessary when the
>>> rip-relative address is a high address.
>>> ---
>>>  arch/x86/xen/xen-head.S | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
>>> index 059f343da76d..71a0eda2da60 100644
>>> --- a/arch/x86/xen/xen-head.S
>>> +++ b/arch/x86/xen/xen-head.S
>>> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
>>>  	pop %ebx
>>>  	pop %eax
>>>  #else
>>> -	lea xen_hypercall_amd(%rip), %rcx
>>> +	mov $xen_hypercall_amd, %rcx
>> 
>> (Now that this is known to be the fix upstream) This probably wants to
>> be plain lea without RIP-relative addressing, like the x86_32 branch
>> above?
>
> Why would you want to use LEA there? It's functionally identical, but the
> MOV can be encoded without ModR/M byte.
>
> Jan

It's not the using of a particular encoding that I meant, but not using
the same on both 32 and 64 bit paths. Surely whatever argument in favour
of either would hold for both 32 and 64 bits.

Cheers,
Alejandro
Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Andrew Cooper 6 months, 3 weeks ago
On 10/04/2025 8:50 pm, Jason Andryuk wrote:
> A Xen PVH dom0 on an AMD processor triple faults early in boot on
> 6.6.86.  CPU detection appears to fail, as the faulting instruction is
> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>
> Detection fails because __xen_hypercall_setfunc() returns the full
> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
> xen_hypercall_amd(%rip), which when running from identity mapping, is
> only 0x015b93f0.
>
> Replace the rip-relative address with just loading the actual address to
> restore the proper comparision.
>
> This only seems to affect PVH dom0 boot.  This is probably because the
> XENMEM_memory_map hypercall is issued early on from the identity
> mappings.  With a domU, the memory map is provided via hvm_start_info
> and the hypercall is skipped.  The domU is probably running from the
> kernel high mapping when it issues hypercalls.
>
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> I think this sort of address mismatch would be addresed by
> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>
> That could be backported instead, but it depends on a fair number of
> patches.

I've just spoken to Ard, and he thinks that it's standalone.  Should be
ok to backport as a fix.

> Not sure on how getting a patch just into 6.6 would work.  This patch
> could go into upstream Linux though it's not strictly necessary when the
> rip-relative address is a high address.

Do we know which other trees are broken?  I only found 6.6 because I was
messing around with other bits of CI that happen to use 6.6.

~Andrew

Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Ard Biesheuvel 6 months, 3 weeks ago
On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 10/04/2025 8:50 pm, Jason Andryuk wrote:
> > A Xen PVH dom0 on an AMD processor triple faults early in boot on
> > 6.6.86.  CPU detection appears to fail, as the faulting instruction is
> > vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
> >
> > Detection fails because __xen_hypercall_setfunc() returns the full
> > kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> > e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
> > xen_hypercall_amd(%rip), which when running from identity mapping, is
> > only 0x015b93f0.
> >
> > Replace the rip-relative address with just loading the actual address to
> > restore the proper comparision.
> >
> > This only seems to affect PVH dom0 boot.  This is probably because the
> > XENMEM_memory_map hypercall is issued early on from the identity
> > mappings.  With a domU, the memory map is provided via hvm_start_info
> > and the hypercall is skipped.  The domU is probably running from the
> > kernel high mapping when it issues hypercalls.
> >
> > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> > ---
> > I think this sort of address mismatch would be addresed by
> > e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
> >
> > That could be backported instead, but it depends on a fair number of
> > patches.
>
> I've just spoken to Ard, and he thinks that it's standalone.  Should be
> ok to backport as a fix.
>

I've tried building and booting 6.6.y with the patch applied - GS will
still be set to the 1:1 mapped address but that shouldn't matter,
given that it is only used for the stack canary, and we don't do
address comparisons on that afaik.

> > Not sure on how getting a patch just into 6.6 would work.  This patch
> > could go into upstream Linux though it's not strictly necessary when the
> > rip-relative address is a high address.
>
> Do we know which other trees are broken?  I only found 6.6 because I was
> messing around with other bits of CI that happen to use 6.6.
>

I'd assume all trees that had the hypercall page removal patch
backported to them will be broken in the same way.
Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Jason Andryuk 6 months, 3 weeks ago

On 2025-04-11 07:35, Ard Biesheuvel wrote:
> On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 10/04/2025 8:50 pm, Jason Andryuk wrote:
>>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
>>> 6.6.86.  CPU detection appears to fail, as the faulting instruction is
>>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>>>
>>> Detection fails because __xen_hypercall_setfunc() returns the full
>>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
>>> e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
>>> xen_hypercall_amd(%rip), which when running from identity mapping, is
>>> only 0x015b93f0.
>>>
>>> Replace the rip-relative address with just loading the actual address to
>>> restore the proper comparision.
>>>
>>> This only seems to affect PVH dom0 boot.  This is probably because the
>>> XENMEM_memory_map hypercall is issued early on from the identity
>>> mappings.  With a domU, the memory map is provided via hvm_start_info
>>> and the hypercall is skipped.  The domU is probably running from the
>>> kernel high mapping when it issues hypercalls.
>>>
>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>> ---
>>> I think this sort of address mismatch would be addresed by
>>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>>>
>>> That could be backported instead, but it depends on a fair number of
>>> patches.
>>
>> I've just spoken to Ard, and he thinks that it's standalone.  Should be
>> ok to backport as a fix.
>>
> 
> I've tried building and booting 6.6.y with the patch applied - GS will
> still be set to the 1:1 mapped address but that shouldn't matter,
> given that it is only used for the stack canary, and we don't do
> address comparisons on that afaik.

Yes, it seems to work - I tested with dom0 and it booted.  I removed the 
use of phys_base - the diff is included below.  Does that match what you 
did?

>>> Not sure on how getting a patch just into 6.6 would work.  This patch
>>> could go into upstream Linux though it's not strictly necessary when the
>>> rip-relative address is a high address.
>>
>> Do we know which other trees are broken?  I only found 6.6 because I was
>> messing around with other bits of CI that happen to use 6.6.
>>
> 
> I'd assume all trees that had the hypercall page removal patch
> backported to them will be broken in the same way.

Yes, I think so.  Looks like it went back to 5.10 but not to 5.4.

Ard, I can submit the stable request unless you want to.

Regards,
Jason

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index c4365a05ab83..9bf4cc04f079 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -100,7 +100,11 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
         xor %edx, %edx
         wrmsr

-       call xen_prepare_pvh
+       /* Call xen_prepare_pvh() via the kernel virtual mapping */
+       leaq xen_prepare_pvh(%rip), %rax
+       addq $__START_KERNEL_map, %rax
+       ANNOTATE_RETPOLINE_SAFE
+       call *%rax

         /* startup_64 expects boot_params in %rsi. */
         mov $_pa(pvh_bootparams), %rsi
Re: [RFC PATCH] x86/xen: Fix PVH dom0 xen_hypercall detection
Posted by Ard Biesheuvel 6 months, 3 weeks ago
On Fri, 11 Apr 2025 at 16:28, Jason Andryuk <jason.andryuk@amd.com> wrote:
>
>
>
> On 2025-04-11 07:35, Ard Biesheuvel wrote:
> > On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>
> >> On 10/04/2025 8:50 pm, Jason Andryuk wrote:
> >>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
> >>> 6.6.86.  CPU detection appears to fail, as the faulting instruction is
> >>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
> >>>
> >>> Detection fails because __xen_hypercall_setfunc() returns the full
> >>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> >>> e.g. 0xffffffff815b93f0.  But this is compared against the rip-relative
> >>> xen_hypercall_amd(%rip), which when running from identity mapping, is
> >>> only 0x015b93f0.
> >>>
> >>> Replace the rip-relative address with just loading the actual address to
> >>> restore the proper comparision.
> >>>
> >>> This only seems to affect PVH dom0 boot.  This is probably because the
> >>> XENMEM_memory_map hypercall is issued early on from the identity
> >>> mappings.  With a domU, the memory map is provided via hvm_start_info
> >>> and the hypercall is skipped.  The domU is probably running from the
> >>> kernel high mapping when it issues hypercalls.
> >>>
> >>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> >>> ---
> >>> I think this sort of address mismatch would be addresed by
> >>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
> >>>
> >>> That could be backported instead, but it depends on a fair number of
> >>> patches.
> >>
> >> I've just spoken to Ard, and he thinks that it's standalone.  Should be
> >> ok to backport as a fix.
> >>
> >
> > I've tried building and booting 6.6.y with the patch applied - GS will
> > still be set to the 1:1 mapped address but that shouldn't matter,
> > given that it is only used for the stack canary, and we don't do
> > address comparisons on that afaik.
>
> Yes, it seems to work - I tested with dom0 and it booted.  I removed the
> use of phys_base - the diff is included below.  Does that match what you
> did?
>

The stable tree maintainers generally prefer the backports to be as
close to the originals as possible, and given that phys_base is
guaranteed to be 0x0, you might as well keep the subtraction.

> >>> Not sure on how getting a patch just into 6.6 would work.  This patch
> >>> could go into upstream Linux though it's not strictly necessary when the
> >>> rip-relative address is a high address.
> >>
> >> Do we know which other trees are broken?  I only found 6.6 because I was
> >> messing around with other bits of CI that happen to use 6.6.
> >>
> >
> > I'd assume all trees that had the hypercall page removal patch
> > backported to them will be broken in the same way.
>
> Yes, I think so.  Looks like it went back to 5.10 but not to 5.4.
>
> Ard, I can submit the stable request unless you want to.
>

Please go ahead.