Various bits of cleanup, and support for arm64 Linux builds.
Run using the new Linux 6.6.86 on (most) x86, and ARM64:
https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
Still to go:
- Merge argo into the linux build (as it builds a module), strip devel artefacts
- Rootfs generation, both x86 and ARM64
The argo kernel module and userspace should be a CPIO fragment too, as it's
embedded into both dom0 and domU in the relevant test.
Switching from tar to cpio can happen when the artefact name changes, which
fixes the backwards compatibility concerns.  In hindsight, domU shouldn't be
automatically embedded in dom0, as several tests further customise it; the
test job can adjust, then wrap the whole lot in a CPIO and append it to
dom0's, as sketched below.
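
A minimal sketch of that test-job step (the archive names here are
illustrative, not the real artefact names; the kernel's initramfs loader
accepts concatenated compressed cpio archives):

# Customise the domU tree, wrap it as a cpio fragment, and append it
# to dom0's initrd.
(cd domU-rootfs && find . | cpio -o -H newc | gzip) > domU.cpio.gz
cat dom0.cpio.gz domU.cpio.gz > test-initrd.cpio.gz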
Xen's main build jobs should either build with --prefix=/usr, or the common
rootfs should set up /usr/local/; right now every job does that for itself.
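
Either shape would do; for illustration (the paths and invocations are
hypothetical, not taken from the current jobs):

# Option 1: configure Xen to install straight into /usr
./configure --prefix=/usr

# Option 2: have the common rootfs create /usr/local once
mkdir -p rootfs/usr/local/{bin,sbin,lib}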
Andrew Cooper (7):
Port containerise
Fix container user setup
Clean up Gitlab yaml
Adjust Linux build script to work with other major versions
Factor out x86-isms in the linux build script
Infrastructure for arm64 linux builds
Linux 6.6.86 for x86 and arm64
Marek Marczykowski-Górecki (1):
Consistently use DOCKER_CMD in makefiles
.gitlab-ci.yml | 43 ++++++---
containerize | 95 +++++++++++++++++++
images/Makefile | 5 +-
...dockerfile => 3.18-arm64-build.dockerfile} | 22 ++---
images/alpine/x86_64-build.dockerfile | 7 +-
scripts/build-linux.sh | 54 +++++++++++
scripts/x86_64-kernel-linux.sh | 31 ------
7 files changed, 197 insertions(+), 60 deletions(-)
create mode 100755 containerize
copy images/alpine/{x86_64-build.dockerfile => 3.18-arm64-build.dockerfile} (55%)
create mode 100755 scripts/build-linux.sh
delete mode 100755 scripts/x86_64-kernel-linux.sh
--
2.39.5
On 09/04/2025 5:36 pm, Andrew Cooper wrote:
> Various bits of cleanup, and support for arm64 Linux builds.
>
> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411

Lovely, Linux 6.6.86 is broken for x86 PVH. It triple faults very early on.

Sample log:
https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450

I guess we'll have to stay on 6.6.56 for now. (Only affects the final
patch.)

~Andrew
On 2025-04-09 13:01, Andrew Cooper wrote:
> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>> Various bits of cleanup, and support for arm64 Linux builds.
>>
>> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
>
> Lovely, Linux 6.6.86 is broken for x86 PVH. It triple faults very early on.
>
> Sample log:
> https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450
>
> I guess we'll have to stay on 6.6.56 for now. (Only affects the final
> patch.)
This is an AMD system:
(XEN) [ 2.577549] d0v0 Triple fault - invoking HVM shutdown action 1
(XEN) [ 2.577557] RIP: 0008:[<0000000001f851d4>]
The instruction:
ffffffff81f851d4: 0f 01 c1 vmcall
vmcall is the Intel instruction, and vmmcall is the AMD one, so CPU
detection is malfunctioning.
(Early PVH is running identity mapped, so it's offset from ffffffff80000000)
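
The two addresses above differ by exactly that offset:

  ffffffff81f851d4   (the vmcall in the vmlinux high mapping)
- ffffffff80000000   (__START_KERNEL_map)
  ----------------
  0000000001f851d4   (RIP reported by Xen above)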
There are no debug symbols in the vmlinux I extracted from the bzImage
from gitlab, but I can repro locally on 6.6.86. It's unclear to me
why it's failing.
Trying:
diff --git i/arch/x86/xen/enlighten.c w/arch/x86/xen/enlighten.c
index 0219f1c90202..fb4ad7fe3e34 100644
--- i/arch/x86/xen/enlighten.c
+++ w/arch/x86/xen/enlighten.c
@@ -123,11 +123,10 @@ noinstr void *__xen_hypercall_setfunc(void)
         if (!boot_cpu_has(X86_FEATURE_CPUID))
                 xen_get_vendor();
 
-        if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-             boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
-                func = xen_hypercall_amd;
-        else
+        if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
                 func = xen_hypercall_intel;
+        else
+                func = xen_hypercall_amd;
 
         static_call_update_early(xen_hypercall, func);
But it still calls xen_hypercall_intel(). So maybe x86_vendor isn't
getting set and ends up as 0 (X86_VENDOR_INTEL)?
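
For reference, the vendor constants (abbreviated, from
arch/x86/include/asm/processor.h) do make a zeroed x86_vendor decode as
Intel:

#define X86_VENDOR_INTEL  0   /* what a zero-initialised field reads as */
#define X86_VENDOR_AMD    2
#define X86_VENDOR_HYGON  9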
That's as far as I got here.
Different but related, on mainline master, I also get a fail in vmcall.
There, I see in the disassembly that __xen_hypercall_setfunc()'s call
to xen_get_vendor() is gone. xen_get_vendor() seems to have been
DCE-ed. There is some new code that hardcodes features -
"x86/cpufeatures: Add {REQUIRED,DISABLED} feature configs" - which may
be responsible.
Regards,
Jason
On 10.04.25 02:09, Jason Andryuk wrote:
> On 2025-04-09 13:01, Andrew Cooper wrote:
>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>> Various bits of cleanup, and support for arm64 Linux builds.
>>>
>>> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
>>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
>>
>> Lovely, Linux 6.6.86 is broken for x86 PVH. It triple faults very early on.
>>
>> Sample log:
>> https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450
>>
>> I guess we'll have to stay on 6.6.56 for now. (Only affects the final
>> patch.)
>
> This is an AMD system:
>
> (XEN) [ 2.577549] d0v0 Triple fault - invoking HVM shutdown action 1
> (XEN) [ 2.577557] RIP: 0008:[<0000000001f851d4>]
>
> The instruction:
> ffffffff81f851d4: 0f 01 c1 vmcall
>
> vmcall is the Intel instruction, and vmmcall is the AMD one, so CPU detection is
> malfunctioning.
>
> (Early PVH is running identity mapped, so it's offset from ffffffff80000000)
>
> There are no debug symbols in the vmlinux I extracted from the bzImage from
> gitlab, but I can repro locally on 6.6.86. It's unclear to me why it's
> failing.
>
> Trying:
> diff --git i/arch/x86/xen/enlighten.c w/arch/x86/xen/enlighten.c
> index 0219f1c90202..fb4ad7fe3e34 100644
> --- i/arch/x86/xen/enlighten.c
> +++ w/arch/x86/xen/enlighten.c
> @@ -123,11 +123,10 @@ noinstr void *__xen_hypercall_setfunc(void)
> if (!boot_cpu_has(X86_FEATURE_CPUID))
> xen_get_vendor();
>
> - if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> - boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
> - func = xen_hypercall_amd;
> - else
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> func = xen_hypercall_intel;
> + else
> + func = xen_hypercall_amd;
>
> static_call_update_early(xen_hypercall, func);
>
> But it still calls xen_hypercall_intel(). So maybe x86_vendor isn't getting set
> and ends up as 0 (X86_VENDOR_INTEL)?
>
> That's as far as I got here.
>
> Different but related, on mainline master, I also get a fail in vmcall. There, I
> see in the disassembly that __xen_hypercall_setfunc()'s call to
> xen_get_vendor() is gone. xen_get_vendor() seems to have been DCE-ed. There is
> some new code that hardcodes features - "x86/cpufeatures: Add
> {REQUIRED,DISABLED} feature configs" - which may be responsible.
The test for !X86_FEATURE_CPUID will probably never be true now.
I guess the simplest fix will be to just call xen_get_vendor()
unconditionally.
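
i.e. something like this (an untested sketch of that suggestion, against
the hunk quoted above):

-        if (!boot_cpu_has(X86_FEATURE_CPUID))
-                xen_get_vendor();
+        xen_get_vendor();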
Juergen
On 10/04/2025 1:09 am, Jason Andryuk wrote:
> On 2025-04-09 13:01, Andrew Cooper wrote:
>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>> Various bits of cleanup, and support for arm64 Linux builds.
>>>
>>> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
>>>
>>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
>>
>> Lovely, Linux 6.6.86 is broken for x86 PVH. It triple faults very
>> early on.
>>
>> Sample log:
>> https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450
>>
>> I guess we'll have to stay on 6.6.56 for now. (Only affects the final
>> patch.)
>
> This is an AMD system:
>
> (XEN) [ 2.577549] d0v0 Triple fault - invoking HVM shutdown action 1
> (XEN) [ 2.577557] RIP: 0008:[<0000000001f851d4>]
>
> The instruction:
> ffffffff81f851d4: 0f 01 c1 vmcall
>
> vmcall is the Intel instruction, and vmmcall is the AMD one, so CPU
> detection is malfunctioning.
>
> (Early PVH is running identity mapped, so it's offset from
> ffffffff80000000)
>
> There are no debug symbols in the vmlinux I extracted from the bzImage
> from gitlab, but I can repro locally on 6.6.86. It's unclear to
> me why it's failing.
>
> Trying:
> diff --git i/arch/x86/xen/enlighten.c w/arch/x86/xen/enlighten.c
> index 0219f1c90202..fb4ad7fe3e34 100644
> --- i/arch/x86/xen/enlighten.c
> +++ w/arch/x86/xen/enlighten.c
> @@ -123,11 +123,10 @@ noinstr void *__xen_hypercall_setfunc(void)
> if (!boot_cpu_has(X86_FEATURE_CPUID))
> xen_get_vendor();
>
> - if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> - boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
> - func = xen_hypercall_amd;
> - else
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> func = xen_hypercall_intel;
> + else
> + func = xen_hypercall_amd;
>
> static_call_update_early(xen_hypercall, func);
>
> But it still calls xen_hypercall_intel(). So maybe x86_vendor isn't
> getting set and ends up as 0 (X86_VENDOR_INTEL)?
>
> That's as far as I got here.
>
> Different but related, on mainline master, I also get a fail in
> vmcall. There, I see in the disassembly that
> __xen_hypercall_setfunc()'s call to xen_get_vendor() is gone.
> xen_get_vendor() seems to have been DCE-ed. There is some new code
> that hardcodes features - "x86/cpufeatures: Add {REQUIRED,DISABLED}
> feature configs" - which may be responsible.
6.6.74 is broken too. (That's the revision that the ARM tests want).
So it broke somewhere between .56 and .74 which narrows the bisect a little.
https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
In Gitlab, both AMD and Intel are failing in roughly the same way.
~Andrew
On 2025-04-10 05:17, Andrew Cooper wrote:
> On 10/04/2025 1:09 am, Jason Andryuk wrote:
>> On 2025-04-09 13:01, Andrew Cooper wrote:
>>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>
> 6.6.74 is broken too. (That's the revision that the ARM tests want).
> So it broke somewhere between .56 and .74 which narrows the bisect a little.
>
> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
>
> In Gitlab, both AMD and Intel are failing in roughly the same way.

Something else goes wrong in QEMU even with my patch for the hypercall,
and Linux eventually crashes. Lots of unhandled memory read/write in
0x1bfffe000 - 0x1bfffeff8, which is marked unusable for dom0. I trimmed
lots of the consecutive "unhandled memory" from the attached log
(313KB->22KB)

Regards,
Jason
On 2025-04-10 17:16, Jason Andryuk wrote:
> On 2025-04-10 05:17, Andrew Cooper wrote:
>> On 10/04/2025 1:09 am, Jason Andryuk wrote:
>>> On 2025-04-09 13:01, Andrew Cooper wrote:
>>>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>
>> 6.6.74 is broken too. (That's the revision that the ARM tests want).
>> So it broke somewhere between .56 and .74 which narrows the bisect a
>> little.
>>
>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
>>
>> In Gitlab, both AMD and Intel are failing in roughly the same way.
>
> Something else goes wrong in QEMU even with my patch for the hypercall,
> and Linux eventually crashes. Lots of unhandled memory read/write in
> 0x1bfffe000 - 0x1bfffeff8, which is marked unusable for dom0. I trimmed
> lots of the consecutive "unhandled memory" from the attached log
> (313KB->22KB)

Seems like Roger's patches need backporting too:

x86/xen: fix memblock_reserve() usage on PVH
x86/xen: move xen_reserve_extra_memory()

Regards,
Jason
On Thu Apr 10, 2025 at 10:50 PM BST, Jason Andryuk wrote:
> On 2025-04-10 17:16, Jason Andryuk wrote:
>> On 2025-04-10 05:17, Andrew Cooper wrote:
>>> On 10/04/2025 1:09 am, Jason Andryuk wrote:
>>>> On 2025-04-09 13:01, Andrew Cooper wrote:
>>>>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>>
>>> 6.6.74 is broken too. (That's the revision that the ARM tests want).
>>> So it broke somewhere between .56 and .74 which narrows the bisect a
>>> little.
>>>
>>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
>>>
>>> In Gitlab, both AMD and Intel are failing in roughly the same way.
>>
>> Something else goes wrong in QEMU even with my patch for the hypercall,
>> and Linux eventually crashes. Lots of unhandled memory read/write in
>> 0x1bfffe000 - 0x1bfffeff8, which is marked unusable for dom0. I trimmed
>> lots of the consecutive "unhandled memory" from the attached log
>> (313KB->22KB)
>
> Seems like Roger's patches need backporting too:
>
> x86/xen: fix memblock_reserve() usage on PVH
> x86/xen: move xen_reserve_extra_memory()
>
> Regards,
> Jason

I just tested this with your RFC change + those 2 backports on top of
stable/v6.6.y and Linux does boot afterwards. Well found.

Cheers,
Alejandro
On Thu Apr 10, 2025 at 10:17 AM BST, Andrew Cooper wrote:
> On 10/04/2025 1:09 am, Jason Andryuk wrote:
>> On 2025-04-09 13:01, Andrew Cooper wrote:
>>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>>> Various bits of cleanup, and support for arm64 Linux builds.
>>>>
>>>> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
>>>>
>>>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
>>>
>>> Lovely, Linux 6.6.86 is broken for x86 PVH. It triple faults very
>>> early on.
>>>
>>> Sample log:
>>> https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450
>>>
>>> I guess we'll have to stay on 6.6.56 for now. (Only affects the final
>>> patch.)
>>
>> This is an AMD system:
>>
>> (XEN) [ 2.577549] d0v0 Triple fault - invoking HVM shutdown action 1
>> (XEN) [ 2.577557] RIP: 0008:[<0000000001f851d4>]
>>
>> The instruction:
>> ffffffff81f851d4: 0f 01 c1 vmcall
>>
>> vmcall is the Intel instruction, and vmmcall is the AMD one, so CPU
>> detection is malfunctioning.
>>
>> (Early PVH is running identity mapped, so it's offset from
>> ffffffff80000000)
>>
>> There are no debug symbols in the vmlinux I extracted from the bzImage
>> from gitlab, but I can repro locally on 6.6.86. It's unclear to
>> me why it's failing.
>>
>> Trying:
>> diff --git i/arch/x86/xen/enlighten.c w/arch/x86/xen/enlighten.c
>> index 0219f1c90202..fb4ad7fe3e34 100644
>> --- i/arch/x86/xen/enlighten.c
>> +++ w/arch/x86/xen/enlighten.c
>> @@ -123,11 +123,10 @@ noinstr void *__xen_hypercall_setfunc(void)
>> if (!boot_cpu_has(X86_FEATURE_CPUID))
>> xen_get_vendor();
>>
>> - if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
>> - boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
>> - func = xen_hypercall_amd;
>> - else
>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
>> func = xen_hypercall_intel;
>> + else
>> + func = xen_hypercall_amd;
>>
>> static_call_update_early(xen_hypercall, func);
>>
>> But it still calls xen_hypercall_intel(). So maybe x86_vendor isn't
>> getting set and ends up as 0 (X86_VENDOR_INTEL)?
>>
>> That's as far as I got here.
>>
>> Different but related, on mainline master, I also get a fail in
>> vmcall. There, I see in the disassembly that
>> __xen_hypercall_setfunc()'s call to xen_get_vendor() is gone.
>> xen_get_vendor() seems to have been DCE-ed. There is some new code
>> that hardcodes features - "x86/cpufeatures: Add {REQUIRED,DISABLED}
>> feature configs" - which may be responsible.
>
> 6.6.74 is broken too. (That's the revision that the ARM tests want).
> So it broke somewhere between .56 and .74 which narrows the bisect a little.
>
> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
>
> In Gitlab, both AMD and Intel are failing in roughly the same way.
>
> ~Andrew
I've bisected the tags and it was introduced somewhere between the
v6.6.66 and the v6.6.67 tags.
The hypercall page was removed very shortly before v6.6.67 was tagged,
so I have a nagging suspicion...
Cheers,
Alejandro
On Thu Apr 10, 2025 at 7:20 PM BST, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 10:17 AM BST, Andrew Cooper wrote:
>> On 10/04/2025 1:09 am, Jason Andryuk wrote:
>>> On 2025-04-09 13:01, Andrew Cooper wrote:
>>>> On 09/04/2025 5:36 pm, Andrew Cooper wrote:
>>>>> Various bits of cleanup, and support for arm64 Linux builds.
>>>>>
>>>>> Run using the new Linux 6.6.86 on (most) x86, and ARM64:
>>>>>
>>>>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1760667411
>>>>
>>>> Lovely, Linux 6.6.86 is broken for x86 PVH. It triple faults very
>>>> early on.
>>>>
>>>> Sample log:
>>>> https://gitlab.com/xen-project/hardware/xen-staging/-/jobs/9673797450
>>>>
>>>> I guess we'll have to stay on 6.6.56 for now. (Only affects the final
>>>> patch.)
>>>
>>> This is an AMD system:
>>>
>>> (XEN) [ 2.577549] d0v0 Triple fault - invoking HVM shutdown action 1
>>> (XEN) [ 2.577557] RIP: 0008:[<0000000001f851d4>]
>>>
>>> The instruction:
>>> ffffffff81f851d4: 0f 01 c1 vmcall
>>>
>>> vmcall is the Intel instruction, and vmmcall is the AMD one, so CPU
>>> detection is malfunctioning.
>>>
>>> (Early PVH is running identity mapped, so it's offset from
>>> ffffffff80000000)
>>>
>>> There are no debug symbols in the vmlinux I extracted from the bzImage
>>> from gitlab, but I can repro locally on 6.6.86. It's unclear to
>>> me why it's failing.
>>>
>>> Trying:
>>> diff --git i/arch/x86/xen/enlighten.c w/arch/x86/xen/enlighten.c
>>> index 0219f1c90202..fb4ad7fe3e34 100644
>>> --- i/arch/x86/xen/enlighten.c
>>> +++ w/arch/x86/xen/enlighten.c
>>> @@ -123,11 +123,10 @@ noinstr void *__xen_hypercall_setfunc(void)
>>> if (!boot_cpu_has(X86_FEATURE_CPUID))
>>> xen_get_vendor();
>>>
>>> - if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
>>> - boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
>>> - func = xen_hypercall_amd;
>>> - else
>>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
>>> func = xen_hypercall_intel;
>>> + else
>>> + func = xen_hypercall_amd;
>>>
>>> static_call_update_early(xen_hypercall, func);
>>>
>>> But it still calls xen_hypercall_intel(). So maybe x86_vendor isn't
>>> getting set and ends up as 0 (X86_VENDOR_INTEL)?
>>>
>>> That's as far as I got here.
>>>
>>> Different but related, on mainline master, I also get a fail in
>>> vmcall. There, I see in the disassembly that
>>> __xen_hypercall_setfunc()'s call to xen_get_vendor() is gone.
>>> xen_get_vendor() seems to have been DCE-ed. There is some new code
>>> that hardcodes features - "x86/cpufeatures: Add {REQUIRED,DISABLED}
>>> feature configs" - which may be responsible.
>>
>> 6.6.74 is broken too. (That's the revision that the ARM tests want).
>> So it broke somewhere between .56 and .74 which narrows the bisect a little.
>>
>> https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1761323774
>>
>> In Gitlab, both AMD and Intel are failing in roughly the same way.
>>
>> ~Andrew
>
> I've bisected the tags and it was introduced somewhere between the
> v6.6.66 and the v6.6.67 tags.
>
> The hypercall page was removed very shortly before v6.6.67 was tagged,
> so I have a nagging suspicion...
>
> Cheers,
> Alejandro
The cutoff point is bcf0e2fda80c6 ("x86/xen: remove hypercall page").
Together with Jason's observation it would seem that Linux doesn't guess
the correct instruction (or not early enough) when running as PVH dom0.
On PV it's just "syscall", but on PVH it's a tad more complicated.
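
For context, the detection stubs that replaced the hypercall page boil
down to this (a heavily simplified sketch; the real implementations live
in arch/x86/xen/xen-head.S and carry frame-pointer and objtool
annotations):

xen_hypercall_amd:
        vmmcall
        ret

xen_hypercall_intel:
        vmcall
        ret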
Cheers,
Alejandro
On Thu, 2025-04-10 at 19:28 +0100, Alejandro Vallejo wrote:
> The cutoff point is bcf0e2fda80c6 ("x86/xen: remove hypercall page").
>
> Together with Jason's observation it would seem that Linux doesn't guess
> the correct instruction (or not early enough) when running as PVH dom0.
> On PV it's just "syscall", but on PVH it's a tad more complicated.
>
I never understood why we did it this way anyway.
All this bogus complexity to do early detection of AMD vs. Intel and
use the right trap instruction, when we could have just continued to
use the proper Xen hypercall page at early boot.
After all, if you don't know what kind of CPU you're on, you *also*
haven't enabled CET or any other fancy return-tracking stuff yet. Just
fill in your own hypercall page *then*, instead. And then you can free
the original one.
Much simpler, much less fragile, and less prone to other potential
breakage from the 64-bit latching side-effect that we forgot Xen does when
the guest sets the hypercall page.
A Xen PVH dom0 on an AMD processor triple faults early in boot on
6.6.86. CPU detection appears to fail, as the faulting instruction is
vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
Detection fails because __xen_hypercall_setfunc() returns the full
kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
xen_hypercall_amd(%rip), which when running from identity mapping, is
only 0x015b93f0.
Replace the rip-relative address with just loading the actual address to
restore the proper comparison.
This only seems to affect PVH dom0 boot. This is probably because the
XENMEM_memory_map hypercall is issued early on from the identity
mappings. With a domU, the memory map is provided via hvm_start_info
and the hypercall is skipped. The domU is probably running from the
kernel high mapping when it issues hypercalls.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
I think this sort of address mismatch would be addressed by
e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
That could be backported instead, but it depends on a fair number of
patches.
Not sure on how getting a patch just into 6.6 would work. This patch
could go into upstream Linux though it's not strictly necessary when the
rip-relative address is a high address.
---
arch/x86/xen/xen-head.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 059f343da76d..71a0eda2da60 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
         pop %ebx
         pop %eax
 #else
-        lea xen_hypercall_amd(%rip), %rcx
+        mov $xen_hypercall_amd, %rcx
         cmp %rax, %rcx
 #ifdef CONFIG_FRAME_POINTER
         pop %rax /* Dummy pop. */
--
2.49.0
On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote:
> A Xen PVH dom0 on an AMD processor triple faults early in boot on
> 6.6.86. CPU detection appears to fail, as the faulting instruction is
> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>
> Detection fails because __xen_hypercall_setfunc() returns the full
> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
> xen_hypercall_amd(%rip), which when running from identity mapping, is
> only 0x015b93f0.
>
> Replace the rip-relative address with just loading the actual address to
> restore the proper comparison.
>
> This only seems to affect PVH dom0 boot. This is probably because the
> XENMEM_memory_map hypercall is issued early on from the identity
> mappings. With a domU, the memory map is provided via hvm_start_info
> and the hypercall is skipped. The domU is probably running from the
> kernel high mapping when it issues hypercalls.
>
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> I think this sort of address mismatch would be addressed by
> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>
> That could be backported instead, but it depends on a fair number of
> patches.
>
> Not sure on how getting a patch just into 6.6 would work. This patch
> could go into upstream Linux though it's not strictly necessary when the
> rip-relative address is a high address.
> ---
> arch/x86/xen/xen-head.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
> index 059f343da76d..71a0eda2da60 100644
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
> pop %ebx
> pop %eax
> #else
> - lea xen_hypercall_amd(%rip), %rcx
> + mov $xen_hypercall_amd, %rcx
(Now that this is known to be the fix upstream) This probably wants to
be plain lea without RIP-relative addressing, like the x86_32 branch
above?
> cmp %rax, %rcx
> #ifdef CONFIG_FRAME_POINTER
> pop %rax /* Dummy pop. */
On 11.04.2025 14:46, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote:
>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
>> 6.6.86. CPU detection appears to fail, as the faulting instruction is
>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>>
>> Detection fails because __xen_hypercall_setfunc() returns the full
>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
>> xen_hypercall_amd(%rip), which when running from identity mapping, is
>> only 0x015b93f0.
>>
>> Replace the rip-relative address with just loading the actual address to
>> restore the proper comparison.
>>
>> This only seems to affect PVH dom0 boot. This is probably because the
>> XENMEM_memory_map hypercall is issued early on from the identity
>> mappings. With a domU, the memory map is provided via hvm_start_info
>> and the hypercall is skipped. The domU is probably running from the
>> kernel high mapping when it issues hypercalls.
>>
>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>> ---
>> I think this sort of address mismatch would be addressed by
>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>>
>> That could be backported instead, but it depends on a fair number of
>> patches.
>>
>> Not sure on how getting a patch just into 6.6 would work. This patch
>> could go into upstream Linux though it's not strictly necessary when the
>> rip-relative address is a high address.
>> ---
>> arch/x86/xen/xen-head.S | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
>> index 059f343da76d..71a0eda2da60 100644
>> --- a/arch/x86/xen/xen-head.S
>> +++ b/arch/x86/xen/xen-head.S
>> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
>> pop %ebx
>> pop %eax
>> #else
>> - lea xen_hypercall_amd(%rip), %rcx
>> + mov $xen_hypercall_amd, %rcx
>
> (Now that this is known to be the fix upstream) This probably wants to
> be plain lea without RIP-relative addressing, like the x86_32 branch
> above?
Why would you want to use LEA there? It's functionally identical, but the
MOV can be encoded without ModR/M byte.
Jan
On Fri Apr 11, 2025 at 2:08 PM BST, Jan Beulich wrote:
> On 11.04.2025 14:46, Alejandro Vallejo wrote:
>> On Thu Apr 10, 2025 at 8:50 PM BST, Jason Andryuk wrote:
>>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
>>> 6.6.86. CPU detection appears to fail, as the faulting instruction is
>>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>>>
>>> Detection fails because __xen_hypercall_setfunc() returns the full
>>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
>>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
>>> xen_hypercall_amd(%rip), which when running from identity mapping, is
>>> only 0x015b93f0.
>>>
>>> Replace the rip-relative address with just loading the actual address to
>>> restore the proper comparison.
>>>
>>> This only seems to affect PVH dom0 boot. This is probably because the
>>> XENMEM_memory_map hypercall is issued early on from the identity
>>> mappings. With a domU, the memory map is provided via hvm_start_info
>>> and the hypercall is skipped. The domU is probably running from the
>>> kernel high mapping when it issues hypercalls.
>>>
>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>> ---
>>> I think this sort of address mismatch would be addressed by
>>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>>>
>>> That could be backported instead, but it depends on a fair number of
>>> patches.
>>>
>>> Not sure on how getting a patch just into 6.6 would work. This patch
>>> could go into upstream Linux though it's not strictly necessary when the
>>> rip-relative address is a high address.
>>> ---
>>> arch/x86/xen/xen-head.S | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
>>> index 059f343da76d..71a0eda2da60 100644
>>> --- a/arch/x86/xen/xen-head.S
>>> +++ b/arch/x86/xen/xen-head.S
>>> @@ -117,7 +117,7 @@ SYM_FUNC_START(xen_hypercall_hvm)
>>> pop %ebx
>>> pop %eax
>>> #else
>>> - lea xen_hypercall_amd(%rip), %rcx
>>> + mov $xen_hypercall_amd, %rcx
>>
>> (Now that this is known to be the fix upstream) This probably wants to
>> be plain lea without RIP-relative addressing, like the x86_32 branch
>> above?
>
> Why would you want to use LEA there? It's functionally identical, but the
> MOV can be encoded without ModR/M byte.
>
> Jan
It's not the use of a particular encoding that I meant, but the
inconsistency between the 32-bit and 64-bit paths. Surely whatever
argument favours one form would hold for both.
Cheers,
Alejandro
On 10/04/2025 8:50 pm, Jason Andryuk wrote:
> A Xen PVH dom0 on an AMD processor triple faults early in boot on
> 6.6.86. CPU detection appears to fail, as the faulting instruction is
> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>
> Detection fails because __xen_hypercall_setfunc() returns the full
> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
> xen_hypercall_amd(%rip), which when running from identity mapping, is
> only 0x015b93f0.
>
> Replace the rip-relative address with just loading the actual address to
> restore the proper comparison.
>
> This only seems to affect PVH dom0 boot. This is probably because the
> XENMEM_memory_map hypercall is issued early on from the identity
> mappings. With a domU, the memory map is provided via hvm_start_info
> and the hypercall is skipped. The domU is probably running from the
> kernel high mapping when it issues hypercalls.
>
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> I think this sort of address mismatch would be addressed by
> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>
> That could be backported instead, but it depends on a fair number of
> patches.
I've just spoken to Ard, and he thinks that it's standalone. Should be
ok to backport as a fix.
> Not sure on how getting a patch just into 6.6 would work. This patch
> could go into upstream Linux though it's not strictly necessary when the
> rip-relative address is a high address.
Do we know which other trees are broken? I only found 6.6 because I was
messing around with other bits of CI that happen to use 6.6.
~Andrew
On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 10/04/2025 8:50 pm, Jason Andryuk wrote:
> > A Xen PVH dom0 on an AMD processor triple faults early in boot on
> > 6.6.86. CPU detection appears to fail, as the faulting instruction is
> > vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
> >
> > Detection fails because __xen_hypercall_setfunc() returns the full
> > kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> > e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
> > xen_hypercall_amd(%rip), which when running from identity mapping, is
> > only 0x015b93f0.
> >
> > Replace the rip-relative address with just loading the actual address to
> > restore the proper comparison.
> >
> > This only seems to affect PVH dom0 boot. This is probably because the
> > XENMEM_memory_map hypercall is issued early on from the identity
> > mappings. With a domU, the memory map is provided via hvm_start_info
> > and the hypercall is skipped. The domU is probably running from the
> > kernel high mapping when it issues hypercalls.
> >
> > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> > ---
> > I think this sort of address mismatch would be addressed by
> > e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
> >
> > That could be backported instead, but it depends on a fair number of
> > patches.
>
> I've just spoken to Ard, and he thinks that it's standalone. Should be
> ok to backport as a fix.
>
I've tried building and booting 6.6.y with the patch applied - GS will
still be set to the 1:1 mapped address but that shouldn't matter,
given that it is only used for the stack canary, and we don't do
address comparisons on that afaik.
> > Not sure on how getting a patch just into 6.6 would work. This patch
> > could go into upstream Linux though it's not strictly necessary when the
> > rip-relative address is a high address.
>
> Do we know which other trees are broken? I only found 6.6 because I was
> messing around with other bits of CI that happen to use 6.6.
>
I'd assume all trees that had the hypercall page removal patch
backported to them will be broken in the same way.
On 2025-04-11 07:35, Ard Biesheuvel wrote:
> On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 10/04/2025 8:50 pm, Jason Andryuk wrote:
>>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
>>> 6.6.86. CPU detection appears to fail, as the faulting instruction is
>>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
>>>
>>> Detection fails because __xen_hypercall_setfunc() returns the full
>>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
>>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
>>> xen_hypercall_amd(%rip), which when running from identity mapping, is
>>> only 0x015b93f0.
>>>
>>> Replace the rip-relative address with just loading the actual address to
>>> restore the proper comparison.
>>>
>>> This only seems to affect PVH dom0 boot. This is probably because the
>>> XENMEM_memory_map hypercall is issued early on from the identity
>>> mappings. With a domU, the memory map is provided via hvm_start_info
>>> and the hypercall is skipped. The domU is probably running from the
>>> kernel high mapping when it issues hypercalls.
>>>
>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>> ---
>>> I think this sort of address mismatch would be addressed by
>>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
>>>
>>> That could be backported instead, but it depends on a fair number of
>>> patches.
>>
>> I've just spoken to Ard, and he thinks that it's standalone. Should be
>> ok to backport as a fix.
>>
>
> I've tried building and booting 6.6.y with the patch applied - GS will
> still be set to the 1:1 mapped address but that shouldn't matter,
> given that it is only used for the stack canary, and we don't do
> address comparisons on that afaik.
Yes, it seems to work - I tested with dom0 and it booted. I removed the
use of phys_base - the diff is included below. Does that match what you
did?
>>> Not sure on how getting a patch just into 6.6 would work. This patch
>>> could go into upstream Linux though it's not strictly necessary when the
>>> rip-relative address is a high address.
>>
>> Do we know which other trees are broken? I only found 6.6 because I was
>> messing around with other bits of CI that happen to use 6.6.
>>
>
> I'd assume all trees that had the hypercall page removal patch
> backported to them will be broken in the same way.
Yes, I think so. Looks like it went back to 5.10 but not to 5.4.
Ard, I can submit the stable request unless you want to.
Regards,
Jason
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index c4365a05ab83..9bf4cc04f079 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -100,7 +100,11 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
         xor %edx, %edx
         wrmsr
 
-        call xen_prepare_pvh
+        /* Call xen_prepare_pvh() via the kernel virtual mapping */
+        leaq xen_prepare_pvh(%rip), %rax
+        addq $__START_KERNEL_map, %rax
+        ANNOTATE_RETPOLINE_SAFE
+        call *%rax
 
         /* startup_64 expects boot_params in %rsi. */
         mov $_pa(pvh_bootparams), %rsi
On Fri, 11 Apr 2025 at 16:28, Jason Andryuk <jason.andryuk@amd.com> wrote:
>
>
>
> On 2025-04-11 07:35, Ard Biesheuvel wrote:
> > On Thu, 10 Apr 2025 at 23:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>
> >> On 10/04/2025 8:50 pm, Jason Andryuk wrote:
> >>> A Xen PVH dom0 on an AMD processor triple faults early in boot on
> >>> 6.6.86. CPU detection appears to fail, as the faulting instruction is
> >>> vmcall in xen_hypercall_intel() and not vmmcall in xen_hypercall_amd().
> >>>
> >>> Detection fails because __xen_hypercall_setfunc() returns the full
> >>> kernel mapped address of xen_hypercall_amd() or xen_hypercall_intel() -
> >>> e.g. 0xffffffff815b93f0. But this is compared against the rip-relative
> >>> xen_hypercall_amd(%rip), which when running from identity mapping, is
> >>> only 0x015b93f0.
> >>>
> >>> Replace the rip-relative address with just loading the actual address to
> >>> restore the proper comparison.
> >>>
> >>> This only seems to affect PVH dom0 boot. This is probably because the
> >>> XENMEM_memory_map hypercall is issued early on from the identity
> >>> mappings. With a domU, the memory map is provided via hvm_start_info
> >>> and the hypercall is skipped. The domU is probably running from the
> >>> kernel high mapping when it issues hypercalls.
> >>>
> >>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> >>> ---
> >>> I think this sort of address mismatch would be addressed by
> >>> e8fbc0d9cab6 ("x86/pvh: Call C code via the kernel virtual mapping")
> >>>
> >>> That could be backported instead, but it depends on a fair number of
> >>> patches.
> >>
> >> I've just spoken to Ard, and he thinks that it's standalone. Should be
> >> ok to backport as a fix.
> >>
> >
> > I've tried building and booting 6.6.y with the patch applied - GS will
> > still be set to the 1:1 mapped address but that shouldn't matter,
> > given that it is only used for the stack canary, and we don't do
> > address comparisons on that afaik.
>
> Yes, it seems to work - I tested with dom0 and it booted. I removed the
> use of phys_base - the diff is included below. Does that match what you
> did?
>
The stable tree maintainers generally prefer the backports to be as
close to the originals as possible, and given that phys_base is
guaranteed to be 0x0, you might as well keep the subtraction.
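
(For reference, the corresponding additions in the upstream commit read
roughly as follows, quoted from memory of e8fbc0d9cab6; note the extra
phys_base subtraction:)

+        /* Call xen_prepare_pvh() via the kernel virtual mapping */
+        leaq xen_prepare_pvh(%rip), %rax
+        subq phys_base(%rip), %rax
+        addq $__START_KERNEL_map, %rax
+        ANNOTATE_RETPOLINE_SAFE
+        call *%rax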
> >>> Not sure on how getting a patch just into 6.6 would work. This patch
> >>> could go into upstream Linux though it's not strictly necessary when the
> >>> rip-relative address is a high address.
> >>
> >> Do we know which other trees are broken? I only found 6.6 because I was
> >> messing around with other bits of CI that happen to use 6.6.
> >>
> >
> > I'd assume all trees that had the hypercall page removal patch
> > backported to them will be broken in the same way.
>
> Yes, I think so. Looks like it went back to 5.10 but not to 5.4.
>
> Ard, I can submit the stable request unless you want to.
>
Please go ahead.