[PATCH] x86/boot: Fix PVH boot during boot_info transition period

Andrew Cooper posted 1 patch 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20241022124114.84498-1-andrew.cooper3@citrix.com
xen/arch/x86/setup.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
[PATCH] x86/boot: Fix PVH boot during boot_info transition period
Posted by Andrew Cooper 1 month ago
multiboot_fill_boot_info() taking the physical address of the multiboot_info
structure leads to a subtle use-after-free on the PVH path, with rather less
subtle fallout.

The pointers used by __start_xen(), mbi and mod, are either:

 - MB:  Directmap pointers into the trampoline, or
 - PVH: Xen pointers into .initdata, or
 - EFI: Directmap pointers into Xen.

Critically, these either remain valid across move_xen() (MB, PVH), or rely on
move_xen() being inhibited (EFI).

The conversion to multiboot_fill_boot_info(), taking only mbi_p, makes the PVH
path use directmap pointers into Xen, as well as move_xen() which invalidates
said pointers.

Switch multiboot_fill_boot_info() to consume the same virtual addresses that
__start_xen() currently uses.  This keeps all the pointers valid for the
duration of __start_xen(), for all boot protocols.

It can be safely untangled once multiboot_fill_boot_info() takes a full copy
the multiboot info data, and __start_xen() has been moved over to using the
new boot_info consistently.

Right now, bi->{loader,cmdline,mods} are problematic.  Nothing uses
bi->mods[], and nothing uses bi->cmdline after move_xen().

bi->loader is used after move_xen(), although only for cmdline_cook() of
dom0's cmdline, where it happens to be benign because PVH boot skips the
inspection of the bootloader name.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Daniel P. Smith <dpsmith@apertussolutions.com>

This is more proof that Xen only boots by accident.  It certainly isn't by any
kind of design.

https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1506947472
including the pending work to add a PVShim test

This is the least-invasive fix given the rest of the Hyperlaunch series.

A different option would to introduce EFI and PVH forms of
multiboot_fill_boot_info(), but that would involve juggling even more moving
parts during the transition period.
---
 xen/arch/x86/setup.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index db670258d650..e43b56d4e80f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -283,11 +283,10 @@ struct boot_info __initdata xen_boot_info = {
     .cmdline = "",
 };
 
-static struct boot_info *__init multiboot_fill_boot_info(unsigned long mbi_p)
+static struct boot_info *__init multiboot_fill_boot_info(
+    multiboot_info_t *mbi, module_t *mods)
 {
     struct boot_info *bi = &xen_boot_info;
-    const multiboot_info_t *mbi = __va(mbi_p);
-    module_t *mods = __va(mbi->mods_addr);
     unsigned int i;
 
     if ( mbi->flags & MBI_MODULES )
@@ -1065,15 +1064,31 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
     {
         ASSERT(mbi_p == 0);
         pvh_init(&mbi, &mod);
-        mbi_p = __pa(mbi);
+        /*
+         * mbi and mod are regular pointers to .initdata.  These remain valid
+         * across move_xen().
+         */
     }
     else
     {
         mbi = __va(mbi_p);
         mod = __va(mbi->mods_addr);
+
+        /*
+         * For MB1/2, mbi and mod are directmap pointers into the trampoline.
+         * These remain valid across move_xen().
+         *
+         * For EFI, these are directmap pointers into the Xen image.  They do
+         * not remain valid across move_xen().  EFI boot only functions
+         * because a non-zero xen_phys_start inhibits move_xen().
+         *
+         * Don't be fooled by efi_arch_post_exit_boot() passing "D" (&mbi).
+         * This is a EFI physical-mode (i.e. identity map) pointer.
+         */
+        ASSERT(mbi_p < MB(1) || xen_phys_start);
     }
 
-    bi = multiboot_fill_boot_info(mbi_p);
+    bi = multiboot_fill_boot_info(mbi, mod);
 
     /* Parse the command-line options. */
     if ( (kextra = strstr(bi->cmdline, " -- ")) != NULL )

base-commit: 49a068471d77820af5dac5ad062cde7129e3faae
prerequisite-patch-id: 4ada23fb7ca505d30d9c41e23583d5db5ed95bec
prerequisite-patch-id: 2427c5681fce868938a85f8d70de7adb31f731a7
prerequisite-patch-id: 99b7107cd0d8a7675ebd30cf788e550fda4e9cfb
prerequisite-patch-id: 795f6e9425cc6a953166b530ae68df466a7a3c2b
-- 
2.39.5


Re: [PATCH] x86/boot: Fix PVH boot during boot_info transition period
Posted by Roger Pau Monné 1 month ago
On Tue, Oct 22, 2024 at 01:41:14PM +0100, Andrew Cooper wrote:
> multiboot_fill_boot_info() taking the physical address of the multiboot_info
> structure leads to a subtle use-after-free on the PVH path, with rather less
> subtle fallout.
> 
> The pointers used by __start_xen(), mbi and mod, are either:
> 
>  - MB:  Directmap pointers into the trampoline, or
>  - PVH: Xen pointers into .initdata, or
>  - EFI: Directmap pointers into Xen.
> 
> Critically, these either remain valid across move_xen() (MB, PVH), or rely on
> move_xen() being inhibited (EFI).
> 
> The conversion to multiboot_fill_boot_info(), taking only mbi_p, makes the PVH
> path use directmap pointers into Xen, as well as move_xen() which invalidates
> said pointers.
> 
> Switch multiboot_fill_boot_info() to consume the same virtual addresses that
> __start_xen() currently uses.  This keeps all the pointers valid for the
> duration of __start_xen(), for all boot protocols.
> 
> It can be safely untangled once multiboot_fill_boot_info() takes a full copy
> the multiboot info data, and __start_xen() has been moved over to using the
> new boot_info consistently.
> 
> Right now, bi->{loader,cmdline,mods} are problematic.  Nothing uses
> bi->mods[], and nothing uses bi->cmdline after move_xen().
> 
> bi->loader is used after move_xen(), although only for cmdline_cook() of
> dom0's cmdline, where it happens to be benign because PVH boot skips the
> inspection of the bootloader name.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

One nit below about dropping a const keyword.

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Daniel P. Smith <dpsmith@apertussolutions.com>
> 
> This is more proof that Xen only boots by accident.  It certainly isn't by any
> kind of design.
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1506947472
> including the pending work to add a PVShim test
> 
> This is the least-invasive fix given the rest of the Hyperlaunch series.
> 
> A different option would to introduce EFI and PVH forms of
> multiboot_fill_boot_info(), but that would involve juggling even more moving
> parts during the transition period.
> ---
>  xen/arch/x86/setup.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index db670258d650..e43b56d4e80f 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -283,11 +283,10 @@ struct boot_info __initdata xen_boot_info = {
>      .cmdline = "",
>  };
>  
> -static struct boot_info *__init multiboot_fill_boot_info(unsigned long mbi_p)
> +static struct boot_info *__init multiboot_fill_boot_info(
> +    multiboot_info_t *mbi, module_t *mods)

Shouldn't mbi keep the const keyword?  Given there are no changes on
how it's used in multiboot_fill_boot_info().

Thanks, Roger.

Re: [PATCH] x86/boot: Fix PVH boot during boot_info transition period
Posted by Andrew Cooper 1 month ago
On 22/10/2024 3:12 pm, Roger Pau Monné wrote:
> On Tue, Oct 22, 2024 at 01:41:14PM +0100, Andrew Cooper wrote:
>> multiboot_fill_boot_info() taking the physical address of the multiboot_info
>> structure leads to a subtle use-after-free on the PVH path, with rather less
>> subtle fallout.
>>
>> The pointers used by __start_xen(), mbi and mod, are either:
>>
>>  - MB:  Directmap pointers into the trampoline, or
>>  - PVH: Xen pointers into .initdata, or
>>  - EFI: Directmap pointers into Xen.
>>
>> Critically, these either remain valid across move_xen() (MB, PVH), or rely on
>> move_xen() being inhibited (EFI).
>>
>> The conversion to multiboot_fill_boot_info(), taking only mbi_p, makes the PVH
>> path use directmap pointers into Xen, as well as move_xen() which invalidates
>> said pointers.
>>
>> Switch multiboot_fill_boot_info() to consume the same virtual addresses that
>> __start_xen() currently uses.  This keeps all the pointers valid for the
>> duration of __start_xen(), for all boot protocols.
>>
>> It can be safely untangled once multiboot_fill_boot_info() takes a full copy
>> the multiboot info data, and __start_xen() has been moved over to using the
>> new boot_info consistently.
>>
>> Right now, bi->{loader,cmdline,mods} are problematic.  Nothing uses
>> bi->mods[], and nothing uses bi->cmdline after move_xen().
>>
>> bi->loader is used after move_xen(), although only for cmdline_cook() of
>> dom0's cmdline, where it happens to be benign because PVH boot skips the
>> inspection of the bootloader name.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>
> One nit below about dropping a const keyword.
>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Daniel P. Smith <dpsmith@apertussolutions.com>
>>
>> This is more proof that Xen only boots by accident.  It certainly isn't by any
>> kind of design.
>>
>> https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1506947472
>> including the pending work to add a PVShim test
>>
>> This is the least-invasive fix given the rest of the Hyperlaunch series.
>>
>> A different option would to introduce EFI and PVH forms of
>> multiboot_fill_boot_info(), but that would involve juggling even more moving
>> parts during the transition period.
>> ---
>>  xen/arch/x86/setup.c | 25 ++++++++++++++++++++-----
>>  1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
>> index db670258d650..e43b56d4e80f 100644
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -283,11 +283,10 @@ struct boot_info __initdata xen_boot_info = {
>>      .cmdline = "",
>>  };
>>  
>> -static struct boot_info *__init multiboot_fill_boot_info(unsigned long mbi_p)
>> +static struct boot_info *__init multiboot_fill_boot_info(
>> +    multiboot_info_t *mbi, module_t *mods)
> Shouldn't mbi keep the const keyword?  Given there are no changes on
> how it's used in multiboot_fill_boot_info().

Fine.

FWIW,
https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1507153249
is previously-crashing consider_modules() change, with this fix in
place, shown now to be functioning.

~Andrew