The format of the buildid is a property of the binary, not a property of how
it was loaded.  This fixes buildid recognition when starting xen.efi from it's
MB2 entrypoint.
Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
CC: Daniel P. Smith <dpsmith@apertussolutions.com>
I don't like this patch and tried hard to do it in a better way, but the EFI
aspects of the build system are too intractable.
While on x86 I can in principle pull the same common-stubs.o trick, split on
XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
consequence of that being the only reason we use PE32+ binaries.
Binutils 2.25 is now the minimum, and the makefiles can be cleaned up
somewhat, but I need to backport this patch, internally at least.
---
 xen/common/version.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/xen/common/version.c b/xen/common/version.c
index 5474b8e385be..56b51c81d2fc 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -203,8 +203,11 @@ void __init xen_build_init(void)
     rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
 
 #ifdef CONFIG_X86
-    /* Alternatively we may have a CodeView record from an EFI build. */
-    if ( rc && efi_enabled(EFI_LOADER) )
+    /*
+     * xen.efi built with a new enough toolchain will have a CodeView record,
+     * not an ELF note.
+     */
+    if ( rc )
     {
         const struct pe_external_debug_directory *dir = (const void *)n;
 
-- 
2.39.5
---- On Thu, 05 Jun 2025 07:16:36 -0400 Andrew Cooper <andrew.cooper3@citrix.com> wrote ---
 > The format of the buildid is a property of the binary, not a property of how 
 > it was loaded.  This fixes buildid recognition when starting xen.efi from it's 
 > MB2 entrypoint. 
 >  
 > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> 
 > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> 
 > --- 
 > CC: Jan Beulich <JBeulich@suse.com> 
 > CC: Roger Pau Monné <roger.pau@citrix.com> 
 > CC: Ross Lagerwall <ross.lagerwall@citrix.com> 
 > CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> 
 > CC: Daniel P. Smith <dpsmith@apertussolutions.com> 
 >  
 > I don't like this patch and tried hard to do it in a better way, but the EFI 
 > aspects of the build system are too intractable. 
 >  
 > While on x86 I can in principle pull the same common-stubs.o trick, split on 
 > XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which 
 > hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a 
 > consequence of that being the only reason we use PE32+ binaries. 
 >  
 > Binutils 2.25 is now the minimum, and the makefiles can be cleaned up 
 > somewhat, but I need to backport this patch, internally at least. 
 > --- 
 >  xen/common/version.c | 7 +++++-- 
 >  1 file changed, 5 insertions(+), 2 deletions(-) 
 >  
 > diff --git a/xen/common/version.c b/xen/common/version.c 
 > index 5474b8e385be..56b51c81d2fc 100644 
 > --- a/xen/common/version.c 
 > +++ b/xen/common/version.c 
 > @@ -203,8 +203,11 @@ void __init xen_build_init(void) 
 >  rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); 
 >  
 >  #ifdef CONFIG_X86 
 > -    /* Alternatively we may have a CodeView record from an EFI build. */ 
 > -    if ( rc && efi_enabled(EFI_LOADER) ) 
 > +    /* 
 > +     * xen.efi built with a new enough toolchain will have a CodeView record, 
 > +     * not an ELF note. 
 > +     */ 
 > +    if ( rc ) 
 >  { 
 >  const struct pe_external_debug_directory *dir = (const void *)n; 
 >  
 > -- 
 > 2.39.5 
 >  
 > 
From what I can see, thre are no longer objections and two Rb's ackowledging this is the best path forward. I do not see an issue with it myself.
Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
                
            On Thu, Jun 05, 2025 at 12:16:36PM +0100, Andrew Cooper wrote: > The format of the buildid is a property of the binary, not a property of how > it was loaded. This fixes buildid recognition when starting xen.efi from it's > MB2 entrypoint. > > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Some possibly useless rants below. > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Ross Lagerwall <ross.lagerwall@citrix.com> > CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > CC: Daniel P. Smith <dpsmith@apertussolutions.com> > > I don't like this patch and tried hard to do it in a better way, but the EFI > aspects of the build system are too intractable. > > While on x86 I can in principle pull the same common-stubs.o trick, split on > XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which > hand-codes it's PE-ness. Also, it's really not EFI related, other than as a > consequence of that being the only reason we use PE32+ binaries. Since this is already gated on CONFIG_X86 you could pass XEN_BUILD_PE as a define in CFLAGS, and use it together with the CONFIG_X86 check? That however would still put the code in the .gz binary, as the object file would be the same for both builds. Otherwise we would need to compile this twice and use the different object files for the PE vs ELF images, which seems extremely cumbersome for a little benefit, as this is init-only code that would be gone once booted. Thanks, Roger.
On Thu, Jun 5, 2025 at 12:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> The format of the buildid is a property of the binary, not a property of how
> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> MB2 entrypoint.
>
> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> CC: Daniel P. Smith <dpsmith@apertussolutions.com>
>
> I don't like this patch and tried hard to do it in a better way, but the EFI
> aspects of the build system are too intractable.
>
> While on x86 I can in principle pull the same common-stubs.o trick, split on
> XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
> hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
> consequence of that being the only reason we use PE32+ binaries.
>
> Binutils 2.25 is now the minimum, and the makefiles can be cleaned up
> somewhat, but I need to backport this patch, internally at least.
> ---
>  xen/common/version.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/version.c b/xen/common/version.c
> index 5474b8e385be..56b51c81d2fc 100644
> --- a/xen/common/version.c
> +++ b/xen/common/version.c
> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>
>  #ifdef CONFIG_X86
> -    /* Alternatively we may have a CodeView record from an EFI build. */
> -    if ( rc && efi_enabled(EFI_LOADER) )
> +    /*
> +     * xen.efi built with a new enough toolchain will have a CodeView record,
> +     * not an ELF note.
> +     */
> +    if ( rc )
>      {
>          const struct pe_external_debug_directory *dir = (const void *)n;
>
Reviewed-by: Frediano Ziglio <frediano.ziglio@cloud.com>
I cannot see a better proposal in the discussion, but I prefer this
fixed than keep it broken.
Frediano
                
            On Fri, Jun 27, 2025 at 02:08:00PM +0100, Frediano Ziglio wrote:
> On Thu, Jun 5, 2025 at 12:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >
> > The format of the buildid is a property of the binary, not a property of how
> > it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> > MB2 entrypoint.
> >
> > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> > CC: Jan Beulich <JBeulich@suse.com>
> > CC: Roger Pau Monné <roger.pau@citrix.com>
> > CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> > CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > CC: Daniel P. Smith <dpsmith@apertussolutions.com>
> >
> > I don't like this patch and tried hard to do it in a better way, but the EFI
> > aspects of the build system are too intractable.
> >
> > While on x86 I can in principle pull the same common-stubs.o trick, split on
> > XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
> > hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
> > consequence of that being the only reason we use PE32+ binaries.
Besides the ARM issue, it would also make it harder to follow different
boot paths.
> >
> > Binutils 2.25 is now the minimum, and the makefiles can be cleaned up
> > somewhat, but I need to backport this patch, internally at least.
> > ---
> >  xen/common/version.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/common/version.c b/xen/common/version.c
> > index 5474b8e385be..56b51c81d2fc 100644
> > --- a/xen/common/version.c
> > +++ b/xen/common/version.c
> > @@ -203,8 +203,11 @@ void __init xen_build_init(void)
> >      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
> >
> >  #ifdef CONFIG_X86
> > -    /* Alternatively we may have a CodeView record from an EFI build. */
> > -    if ( rc && efi_enabled(EFI_LOADER) )
> > +    /*
> > +     * xen.efi built with a new enough toolchain will have a CodeView record,
> > +     * not an ELF note.
> > +     */
> > +    if ( rc )
> >      {
> >          const struct pe_external_debug_directory *dir = (const void *)n;
> >
> 
> Reviewed-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> 
> I cannot see a better proposal in the discussion, but I prefer this
> fixed than keep it broken.
Yeah, I agree.
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
                
            On 05.06.2025 13:16, Andrew Cooper wrote: > The format of the buildid is a property of the binary, not a property of how > it was loaded. This fixes buildid recognition when starting xen.efi from it's > MB2 entrypoint. > > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I didn't check whether MB2 or build-id support came later, hence introducing the issue.) > --- a/xen/common/version.c > +++ b/xen/common/version.c > @@ -203,8 +203,11 @@ void __init xen_build_init(void) > rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); > > #ifdef CONFIG_X86 > - /* Alternatively we may have a CodeView record from an EFI build. */ > - if ( rc && efi_enabled(EFI_LOADER) ) > + /* > + * xen.efi built with a new enough toolchain will have a CodeView record, > + * not an ELF note. > + */ > + if ( rc ) Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER by EFI_BOOT. Jan
On Thu, Jun 05, 2025 at 02:02:21PM +0200, Jan Beulich wrote: > On 05.06.2025 13:16, Andrew Cooper wrote: > > The format of the buildid is a property of the binary, not a property of how > > it was loaded. This fixes buildid recognition when starting xen.efi from it's > > MB2 entrypoint. > > > > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > > I'll pick this up without a Fixes: tag, but I think it ought to have one. (I > didn't check whether MB2 or build-id support came later, hence introducing the > issue.) > > > --- a/xen/common/version.c > > +++ b/xen/common/version.c > > @@ -203,8 +203,11 @@ void __init xen_build_init(void) > > rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); > > > > #ifdef CONFIG_X86 > > - /* Alternatively we may have a CodeView record from an EFI build. */ > > - if ( rc && efi_enabled(EFI_LOADER) ) > > + /* > > + * xen.efi built with a new enough toolchain will have a CodeView record, > > + * not an ELF note. > > + */ > > + if ( rc ) > > Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER > by EFI_BOOT. Part of the motivation for MB2 entry point in xen.efi is using the same binary in all boot modes, including legacy boot - in which case none of EFI_* checks would be appropriate here. The grub series adds support for loading PE binaries, and IIUC it isn't tied to booting via EFI. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab
On 05.06.2025 14:20, Marek Marczykowski-Górecki wrote: > On Thu, Jun 05, 2025 at 02:02:21PM +0200, Jan Beulich wrote: >> On 05.06.2025 13:16, Andrew Cooper wrote: >>> The format of the buildid is a property of the binary, not a property of how >>> it was loaded. This fixes buildid recognition when starting xen.efi from it's >>> MB2 entrypoint. >>> >>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> >> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I >> didn't check whether MB2 or build-id support came later, hence introducing the >> issue.) >> >>> --- a/xen/common/version.c >>> +++ b/xen/common/version.c >>> @@ -203,8 +203,11 @@ void __init xen_build_init(void) >>> rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); >>> >>> #ifdef CONFIG_X86 >>> - /* Alternatively we may have a CodeView record from an EFI build. */ >>> - if ( rc && efi_enabled(EFI_LOADER) ) >>> + /* >>> + * xen.efi built with a new enough toolchain will have a CodeView record, >>> + * not an ELF note. >>> + */ >>> + if ( rc ) >> >> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER >> by EFI_BOOT. > > Part of the motivation for MB2 entry point in xen.efi is using the > same binary in all boot modes, including legacy boot - in which case > none of EFI_* checks would be appropriate here. The grub series adds > support for loading PE binaries, and IIUC it isn't tied to booting via > EFI. "The grub series" being which one? I didn't know xen.efi's non-EFI MB2 entry point could be used; instead I was under the impression that someone was having (half?) a patch eliminating the MB data from xen.efi, as being dead code. Jan
On Thu, Jun 05, 2025 at 03:08:07PM +0200, Jan Beulich wrote: > On 05.06.2025 14:20, Marek Marczykowski-Górecki wrote: > > On Thu, Jun 05, 2025 at 02:02:21PM +0200, Jan Beulich wrote: > >> On 05.06.2025 13:16, Andrew Cooper wrote: > >>> The format of the buildid is a property of the binary, not a property of how > >>> it was loaded. This fixes buildid recognition when starting xen.efi from it's > >>> MB2 entrypoint. > >>> > >>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> > >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > >> > >> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I > >> didn't check whether MB2 or build-id support came later, hence introducing the > >> issue.) > >> > >>> --- a/xen/common/version.c > >>> +++ b/xen/common/version.c > >>> @@ -203,8 +203,11 @@ void __init xen_build_init(void) > >>> rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); > >>> > >>> #ifdef CONFIG_X86 > >>> - /* Alternatively we may have a CodeView record from an EFI build. */ > >>> - if ( rc && efi_enabled(EFI_LOADER) ) > >>> + /* > >>> + * xen.efi built with a new enough toolchain will have a CodeView record, > >>> + * not an ELF note. > >>> + */ > >>> + if ( rc ) > >> > >> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER > >> by EFI_BOOT. > > > > Part of the motivation for MB2 entry point in xen.efi is using the > > same binary in all boot modes, including legacy boot - in which case > > none of EFI_* checks would be appropriate here. The grub series adds > > support for loading PE binaries, and IIUC it isn't tied to booting via > > EFI. > > "The grub series" being which one? https://lists.gnu.org/archive/html/grub-devel/2024-03/msg00080.html | This patch series implements support for loading and verifying a signed | xen binary. This would allow the same xen binary to be used for BIOS | boot, UEFI boot, and UEFI boot with Secure Boot verification. | There is an accompanying Xen patch series. > I didn't know xen.efi's non-EFI MB2 entry > point could be used; instead I was under the impression that someone was > having (half?) a patch eliminating the MB data from xen.efi, as being dead > code. See also https://lore.kernel.org/xen-devel/CAG7k0EroeA=cRRDWnJqzH8esoaSmtg8-xjTwc-01og5R9JwPzg@mail.gmail.com/ for some extra context. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab
On 05/06/2025 1:02 pm, Jan Beulich wrote: > On 05.06.2025 13:16, Andrew Cooper wrote: >> The format of the buildid is a property of the binary, not a property of how >> it was loaded. This fixes buildid recognition when starting xen.efi from it's >> MB2 entrypoint. >> >> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > I'll pick this up without a Fixes: tag, but I think it ought to have one. (I > didn't check whether MB2 or build-id support came later, hence introducing the > issue.) MB2+EFI came long before any buildid support. If you want a fixes tag, it's eee5909e9d1 > >> --- a/xen/common/version.c >> +++ b/xen/common/version.c >> @@ -203,8 +203,11 @@ void __init xen_build_init(void) >> rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); >> >> #ifdef CONFIG_X86 >> - /* Alternatively we may have a CodeView record from an EFI build. */ >> - if ( rc && efi_enabled(EFI_LOADER) ) >> + /* >> + * xen.efi built with a new enough toolchain will have a CodeView record, >> + * not an ELF note. >> + */ >> + if ( rc ) > Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER > by EFI_BOOT. No - that's differently buggy. I suppose the commit message wasn't clear enough? We'd still have a CodeView record if we booted xen.efi using it's MB2 entrypoint without the EFI extensions. This really is a property of being a PE32+ binary, and nothing to do with EFI. ~Andrew
On 05.06.2025 14:14, Andrew Cooper wrote: > On 05/06/2025 1:02 pm, Jan Beulich wrote: >> On 05.06.2025 13:16, Andrew Cooper wrote: >>> The format of the buildid is a property of the binary, not a property of how >>> it was loaded. This fixes buildid recognition when starting xen.efi from it's >>> MB2 entrypoint. >>> >>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I >> didn't check whether MB2 or build-id support came later, hence introducing the >> issue.) > > MB2+EFI came long before any buildid support. If you want a fixes tag, > it's eee5909e9d1 That commit talks of an earlier hack, though. And no, MB2 work came later, albeit still in the 4.9 dev window (commit 9180f53655245). >>> --- a/xen/common/version.c >>> +++ b/xen/common/version.c >>> @@ -203,8 +203,11 @@ void __init xen_build_init(void) >>> rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); >>> >>> #ifdef CONFIG_X86 >>> - /* Alternatively we may have a CodeView record from an EFI build. */ >>> - if ( rc && efi_enabled(EFI_LOADER) ) >>> + /* >>> + * xen.efi built with a new enough toolchain will have a CodeView record, >>> + * not an ELF note. >>> + */ >>> + if ( rc ) >> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER >> by EFI_BOOT. > > No - that's differently buggy. I suppose the commit message wasn't > clear enough? > > We'd still have a CodeView record if we booted xen.efi using it's MB2 > entrypoint without the EFI extensions. Hmm, if that's a possible mode of operation (as said in reply to Marek, I wasn't aware of that) - yes. > This really is a property of being a PE32+ binary, and nothing to do > with EFI. Which still can be checked for without having this code path being taken for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an image property (yet I expect you're going to sigh about yet another hack). Jan
On 05/06/2025 2:24 pm, Jan Beulich wrote: > On 05.06.2025 14:14, Andrew Cooper wrote: >> On 05/06/2025 1:02 pm, Jan Beulich wrote: >>> On 05.06.2025 13:16, Andrew Cooper wrote: >>>> The format of the buildid is a property of the binary, not a property of how >>>> it was loaded. This fixes buildid recognition when starting xen.efi from it's >>>> MB2 entrypoint. >>>> >>>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> >>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I >>> didn't check whether MB2 or build-id support came later, hence introducing the >>> issue.) >> MB2+EFI came long before any buildid support. If you want a fixes tag, >> it's eee5909e9d1 > That commit talks of an earlier hack, though. And no, MB2 work came later, > albeit still in the 4.9 dev window (commit 9180f53655245). The "previous hack" was embedding note.o (from the livepatch test infrastructure) back in the main xen binary. That's still present. For xen.gz you get an elf note. For xen.efi, it may be an elf note or a CodeView region, depending on the toolchain. > >>>> --- a/xen/common/version.c >>>> +++ b/xen/common/version.c >>>> @@ -203,8 +203,11 @@ void __init xen_build_init(void) >>>> rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); >>>> >>>> #ifdef CONFIG_X86 >>>> - /* Alternatively we may have a CodeView record from an EFI build. */ >>>> - if ( rc && efi_enabled(EFI_LOADER) ) >>>> + /* >>>> + * xen.efi built with a new enough toolchain will have a CodeView record, >>>> + * not an ELF note. >>>> + */ >>>> + if ( rc ) >>> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER >>> by EFI_BOOT. >> No - that's differently buggy. I suppose the commit message wasn't >> clear enough? >> >> We'd still have a CodeView record if we booted xen.efi using it's MB2 >> entrypoint without the EFI extensions. > Hmm, if that's a possible mode of operation (as said in reply to Marek, I > wasn't aware of that) - yes. It's how I found the bug. There's an awful lot of hackery in the patch-queue holding it together, but in this case it's not really about MB2 or anything else; it's xen_build_init() being incorrect in how it determines whether there's a CodeView region or not. > >> This really is a property of being a PE32+ binary, and nothing to do >> with EFI. > Which still can be checked for without having this code path being taken > for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an > image property (yet I expect you're going to sigh about yet another hack). It's all hacks, but no. I'm amazed MISRA hasn't spotted that we've got a global `struct efi efi;` and a label named efi, creating an alias for the object with it out of bounds in the compiled image. But even then, it's based on XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property that matters. But the argument I'm going to make this this: Why do you want a check, even if you can find a correct one (and as said before, I cannot)? This function is run exactly once. We've excluded "nothing given by the toolchain", and excluded "what the toolchain gave us was not the expected ELF note". The only thing left (modulo toolchain bugs) is the CodeView region, and if it's not a valid CodeView region then we've wasted a handful of cycles. ~Andrew
On 05.06.2025 19:01, Andrew Cooper wrote: > On 05/06/2025 2:24 pm, Jan Beulich wrote: >> On 05.06.2025 14:14, Andrew Cooper wrote: >>> On 05/06/2025 1:02 pm, Jan Beulich wrote: >>>> On 05.06.2025 13:16, Andrew Cooper wrote: >>> This really is a property of being a PE32+ binary, and nothing to do >>> with EFI. >> Which still can be checked for without having this code path being taken >> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an >> image property (yet I expect you're going to sigh about yet another hack). > > It's all hacks, but no. > > I'm amazed MISRA hasn't spotted that we've got a global `struct efi > efi;` and a label named efi, creating an alias for the object with it > out of bounds in the compiled image. But even then, it's based on > XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property > that matters. The use of XEN_BUILD_EFI in the linker script should have been switched to XEN_BUILD_PE when the split was introduced. > But the argument I'm going to make this this: Why do you want a check, > even if you can find a correct one (and as said before, I cannot)? > > This function is run exactly once. We've excluded "nothing given by the > toolchain", and excluded "what the toolchain gave us was not the > expected ELF note". The only thing left (modulo toolchain bugs) is the > CodeView region, and if it's not a valid CodeView region then we've > wasted a handful of cycles. Two reasons: Having code which cannot possibly do anything useful isn't good. Misra calls the latest the body of the inner if() "unreachable code" and objects to the presence of such in a build. (I'm pretty sure Eclair wouldn't spot it, but that doesn't eliminate this being a violation of the respective rule.) And then, based on your reasoning above, why don't you also drop the #ifdef CONFIG_X86? Jan
On Fri, 6 Jun 2025, Jan Beulich wrote: > > But the argument I'm going to make this this: Why do you want a check, > > even if you can find a correct one (and as said before, I cannot)? > > > > This function is run exactly once. We've excluded "nothing given by the > > toolchain", and excluded "what the toolchain gave us was not the > > expected ELF note". The only thing left (modulo toolchain bugs) is the > > CodeView region, and if it's not a valid CodeView region then we've > > wasted a handful of cycles. > > Two reasons: Having code which cannot possibly do anything useful isn't > good. Misra calls the latest the body of the inner if() "unreachable code" > and objects to the presence of such in a build. (I'm pretty sure Eclair > wouldn't spot it, but that doesn't eliminate this being a violation of > the respective rule.) As far as I can tell this code can be reachable simply by having an appropriate test with Xen booted via a non-EFI method and a CodeView record being present. No code changes to Xen needed. > And then, based on your reasoning above, why don't you also drop the > #ifdef CONFIG_X86? Because while theoretically possible, today there is not a single test that would allow us to trigger this code on ARM. We would need to change Xen to do it first.
On 06/06/2025 8:22 am, Jan Beulich wrote: > On 05.06.2025 19:01, Andrew Cooper wrote: >> On 05/06/2025 2:24 pm, Jan Beulich wrote: >>> On 05.06.2025 14:14, Andrew Cooper wrote: >>>> On 05/06/2025 1:02 pm, Jan Beulich wrote: >>>>> On 05.06.2025 13:16, Andrew Cooper wrote: >>>> This really is a property of being a PE32+ binary, and nothing to do >>>> with EFI. >>> Which still can be checked for without having this code path being taken >>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an >>> image property (yet I expect you're going to sigh about yet another hack). >> It's all hacks, but no. >> >> I'm amazed MISRA hasn't spotted that we've got a global `struct efi >> efi;` and a label named efi, creating an alias for the object with it >> out of bounds in the compiled image. But even then, it's based on >> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property >> that matters. > The use of XEN_BUILD_EFI in the linker script should have been switched > to XEN_BUILD_PE when the split was introduced. That doesn't build. As I already explained, the stubs aren't split in a way that allows that. >> But the argument I'm going to make this this: Why do you want a check, >> even if you can find a correct one (and as said before, I cannot)? >> >> This function is run exactly once. We've excluded "nothing given by the >> toolchain", and excluded "what the toolchain gave us was not the >> expected ELF note". The only thing left (modulo toolchain bugs) is the >> CodeView region, and if it's not a valid CodeView region then we've >> wasted a handful of cycles. > Two reasons: Having code which cannot possibly do anything useful isn't > good. Misra calls the latest the body of the inner if() "unreachable code" > and objects to the presence of such in a build. It's not unreachable code, not even theoretically. *If* there was a suitable check, I'd be using it, but everything you've proposed has been buggy or doesn't even compile. > And then, based on your reasoning above, why don't you also drop the > #ifdef CONFIG_X86? Because that's the one non-buggy way of excluding an impossible case. x86 is the only architecture possibly linking with pep emulation, and therefore the only architecture to possibly have a CodeView record. ~Andrew
On 06.06.2025 17:01, Andrew Cooper wrote: > On 06/06/2025 8:22 am, Jan Beulich wrote: >> On 05.06.2025 19:01, Andrew Cooper wrote: >>> On 05/06/2025 2:24 pm, Jan Beulich wrote: >>>> On 05.06.2025 14:14, Andrew Cooper wrote: >>>>> On 05/06/2025 1:02 pm, Jan Beulich wrote: >>>>>> On 05.06.2025 13:16, Andrew Cooper wrote: >>>>> This really is a property of being a PE32+ binary, and nothing to do >>>>> with EFI. >>>> Which still can be checked for without having this code path being taken >>>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an >>>> image property (yet I expect you're going to sigh about yet another hack). >>> It's all hacks, but no. >>> >>> I'm amazed MISRA hasn't spotted that we've got a global `struct efi >>> efi;` and a label named efi, creating an alias for the object with it >>> out of bounds in the compiled image. But even then, it's based on >>> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property >>> that matters. >> The use of XEN_BUILD_EFI in the linker script should have been switched >> to XEN_BUILD_PE when the split was introduced. > > That doesn't build. As I already explained, the stubs aren't split in a > way that allows that. Which then is a pretty clear indication that the split was wrong to do in the first place, don't you agree? >>> But the argument I'm going to make this this: Why do you want a check, >>> even if you can find a correct one (and as said before, I cannot)? >>> >>> This function is run exactly once. We've excluded "nothing given by the >>> toolchain", and excluded "what the toolchain gave us was not the >>> expected ELF note". The only thing left (modulo toolchain bugs) is the >>> CodeView region, and if it's not a valid CodeView region then we've >>> wasted a handful of cycles. >> Two reasons: Having code which cannot possibly do anything useful isn't >> good. Misra calls the latest the body of the inner if() "unreachable code" >> and objects to the presence of such in a build. > > It's not unreachable code, not even theoretically. How is it not? If we build without this CodeView record, it very much is unreachable. > *If* there was a suitable check, I'd be using it, but everything you've > proposed has been buggy or doesn't even compile. Okay, but we draw different conclusions: You want to do it in a way that, as per above, imo introduces unreachable code. Whereas I keep wanting to find a suitable check (or if necessary introduce whatever is needed to have one). >> And then, based on your reasoning above, why don't you also drop the >> #ifdef CONFIG_X86? > > Because that's the one non-buggy way of excluding an impossible case. > > x86 is the only architecture possibly linking with pep emulation, and > therefore the only architecture to possibly have a CodeView record. And how's the, say, Arm case different from the x86 case with no such record built in? Either it's unreachable code in both cases, or it's not. Jan
On 10/06/2025 9:01 am, Jan Beulich wrote: > On 06.06.2025 17:01, Andrew Cooper wrote: >> On 06/06/2025 8:22 am, Jan Beulich wrote: >>> On 05.06.2025 19:01, Andrew Cooper wrote: >>>> On 05/06/2025 2:24 pm, Jan Beulich wrote: >>>>> On 05.06.2025 14:14, Andrew Cooper wrote: >>>>>> On 05/06/2025 1:02 pm, Jan Beulich wrote: >>>>>>> On 05.06.2025 13:16, Andrew Cooper wrote: >>>>>> This really is a property of being a PE32+ binary, and nothing to do >>>>>> with EFI. >>>>> Which still can be checked for without having this code path being taken >>>>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an >>>>> image property (yet I expect you're going to sigh about yet another hack). >>>> It's all hacks, but no. >>>> >>>> I'm amazed MISRA hasn't spotted that we've got a global `struct efi >>>> efi;` and a label named efi, creating an alias for the object with it >>>> out of bounds in the compiled image. But even then, it's based on >>>> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property >>>> that matters. >>> The use of XEN_BUILD_EFI in the linker script should have been switched >>> to XEN_BUILD_PE when the split was introduced. >> That doesn't build. As I already explained, the stubs aren't split in a >> way that allows that. > Which then is a pretty clear indication that the split was wrong to do in > the first place, don't you agree? I think my feelings on how xen.efi was done are quite well known, but so what? I've spent longer than I can afford trying to untangle this, and its an impenetrable mess. >>>> But the argument I'm going to make this this: Why do you want a check, >>>> even if you can find a correct one (and as said before, I cannot)? >>>> >>>> This function is run exactly once. We've excluded "nothing given by the >>>> toolchain", and excluded "what the toolchain gave us was not the >>>> expected ELF note". The only thing left (modulo toolchain bugs) is the >>>> CodeView region, and if it's not a valid CodeView region then we've >>>> wasted a handful of cycles. >>> Two reasons: Having code which cannot possibly do anything useful isn't >>> good. Misra calls the latest the body of the inner if() "unreachable code" >>> and objects to the presence of such in a build. >> It's not unreachable code, not even theoretically. > How is it not? If we build without this CodeView record, it very much is > unreachable. Compiling without a CodeView record doesn't magically cause the prior logic to guarantee to succeed. The compiler is forced to emit real conditional logic, and there's almost 2^96 bit-pattens the toolchain could put into memory which will very literally reach the CodeView check. "The toolchain shouldn't cause this path to be executed" is not the same as genuinely unreachable. What safety certification is liable to complain about is the inability to construct a test that demonstrates coverage, but I'm not changing that property with this patch. >>> And then, based on your reasoning above, why don't you also drop the >>> #ifdef CONFIG_X86? >> Because that's the one non-buggy way of excluding an impossible case. >> >> x86 is the only architecture possibly linking with pep emulation, and >> therefore the only architecture to possibly have a CodeView record. > And how's the, say, Arm case different from the x86 case with no such > record built in? Because its currently impossible for ARM to have a codeview record. Remember that ARM writes a MZ/PE header by hand in a flat binary. It does not use a PEP linker. ~Andrew
On 13.06.2025 23:28, Andrew Cooper wrote:
> On 10/06/2025 9:01 am, Jan Beulich wrote:
>> On 06.06.2025 17:01, Andrew Cooper wrote:
>>> On 06/06/2025 8:22 am, Jan Beulich wrote:
>>>> And then, based on your reasoning above, why don't you also drop the
>>>> #ifdef CONFIG_X86?
>>> Because that's the one non-buggy way of excluding an impossible case.
>>>
>>> x86 is the only architecture possibly linking with pep emulation, and
>>> therefore the only architecture to possibly have a CodeView record.
>> And how's the, say, Arm case different from the x86 case with no such
>> record built in? 
> 
> Because its currently impossible for ARM to have a codeview record.
There must be some pretty fundamental misunderstanding then: I can only
ask again - how's this different from xen.gz? There's not going to be
CodeView record there, yet still you enable the logic there. Hence the
body of the conditional is unreachable in that case.
To expand on my earlier suggestion (ab)using the "efi" global: With
the linker script having this
#ifdef EFI
  .reloc ALIGN(4) : {
    __base_relocs_start = .;
    *(.reloc)
    __base_relocs_end = .;
  }
#elif defined(XEN_BUILD_EFI)
  /*
   * Due to the way EFI support is currently implemented, these two symbols
   * need to be defined.  Their precise values shouldn't matter (the consuming
   * function doesn't get called), but to be on the safe side both values would
   * better match.  Of course the need to be reachable by the relocations
   * referencing them.
   */
  PROVIDE(__base_relocs_start = .);
  PROVIDE(__base_relocs_end = .);
#else
  efi = .;
#endif
where only the #if applies to xen.efi, can't we (ab)use the combination of the
other two symbols here to decide between xen.efi vs xen.gz?
__base_relocs_{start,efi} won't possibly be equal for xen.efi, except in an
extremely theoretical situation (and we could cover for that case by an ASSERT
in the linker script). Pseudo code:
#ifdef XEN_BUILD_EFI
    if ( __base_relocs_start != __base_relocs_end )
    {
        ...
    }
#endif
IOW that #if could simply replace the CONFIG_X86 one that's there right now.
> Remember that ARM writes a MZ/PE header by hand in a flat binary.  It
> does not use a PEP linker.
Of course. Much like for xen.gz there's no PEP linking involved.
Jan
                
            On 16/06/2025 7:27 am, Jan Beulich wrote:
> To expand on my earlier suggestion (ab)using the "efi" global: With
> the linker script having this
>
> #ifdef EFI
>   .reloc ALIGN(4) : {
>     __base_relocs_start = .;
>     *(.reloc)
>     __base_relocs_end = .;
>   }
> #elif defined(XEN_BUILD_EFI)
>   /*
>    * Due to the way EFI support is currently implemented, these two symbols
>    * need to be defined.  Their precise values shouldn't matter (the consuming
>    * function doesn't get called), but to be on the safe side both values would
>    * better match.  Of course the need to be reachable by the relocations
>    * referencing them.
>    */
>   PROVIDE(__base_relocs_start = .);
>   PROVIDE(__base_relocs_end = .);
> #else
>   efi = .;
> #endif
>
> where only the #if applies to xen.efi, can't we (ab)use the combination of the
> other two symbols here to decide between xen.efi vs xen.gz?
> __base_relocs_{start,efi} won't possibly be equal for xen.efi, except in an
> extremely theoretical situation (and we could cover for that case by an ASSERT
> in the linker script). Pseudo code:
>
> #ifdef XEN_BUILD_EFI
>     if ( __base_relocs_start != __base_relocs_end )
>     {
>         ...
>     }
> #endif
>
> IOW that #if could simply replace the CONFIG_X86 one that's there right now.
That's horrifying.  Also you can't include efi-boot.h to get the
declarations.
But given that you are adamant that the (...) in there containing a
CodeView check is unacceptable to have, why does wrapping it in yet
another conditional make it ok?
~Andrew
                
            On 23.06.2025 22:18, Andrew Cooper wrote:
> On 16/06/2025 7:27 am, Jan Beulich wrote:
>> To expand on my earlier suggestion (ab)using the "efi" global: With
>> the linker script having this
>>
>> #ifdef EFI
>>   .reloc ALIGN(4) : {
>>     __base_relocs_start = .;
>>     *(.reloc)
>>     __base_relocs_end = .;
>>   }
>> #elif defined(XEN_BUILD_EFI)
>>   /*
>>    * Due to the way EFI support is currently implemented, these two symbols
>>    * need to be defined.  Their precise values shouldn't matter (the consuming
>>    * function doesn't get called), but to be on the safe side both values would
>>    * better match.  Of course the need to be reachable by the relocations
>>    * referencing them.
>>    */
>>   PROVIDE(__base_relocs_start = .);
>>   PROVIDE(__base_relocs_end = .);
>> #else
>>   efi = .;
>> #endif
>>
>> where only the #if applies to xen.efi, can't we (ab)use the combination of the
>> other two symbols here to decide between xen.efi vs xen.gz?
>> __base_relocs_{start,efi} won't possibly be equal for xen.efi, except in an
>> extremely theoretical situation (and we could cover for that case by an ASSERT
>> in the linker script). Pseudo code:
>>
>> #ifdef XEN_BUILD_EFI
>>     if ( __base_relocs_start != __base_relocs_end )
>>     {
>>         ...
>>     }
>> #endif
>>
>> IOW that #if could simply replace the CONFIG_X86 one that's there right now.
> 
> That's horrifying.  Also you can't include efi-boot.h to get the
> declarations.
The declarations could be moved, but ...
> But given that you are adamant that the (...) in there containing a
> CodeView check is unacceptable to have, why does wrapping it in yet
> another conditional make it ok?
... hmm, yes, there'll still be unreachable code there. I guess I'll shut
up here and leave this to the EFI maintainers. Just as long as there's not
going to be any Eclair / Misra fallout.
Jan
                
            On 06.06.2025 09:22, Jan Beulich wrote: > On 05.06.2025 19:01, Andrew Cooper wrote: >> On 05/06/2025 2:24 pm, Jan Beulich wrote: >>> On 05.06.2025 14:14, Andrew Cooper wrote: >>>> On 05/06/2025 1:02 pm, Jan Beulich wrote: >>>>> On 05.06.2025 13:16, Andrew Cooper wrote: >>>> This really is a property of being a PE32+ binary, and nothing to do >>>> with EFI. >>> Which still can be checked for without having this code path being taken >>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an >>> image property (yet I expect you're going to sigh about yet another hack). >> >> It's all hacks, but no. >> >> I'm amazed MISRA hasn't spotted that we've got a global `struct efi >> efi;` and a label named efi, creating an alias for the object with it >> out of bounds in the compiled image. But even then, it's based on >> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property >> that matters. > > The use of XEN_BUILD_EFI in the linker script should have been switched > to XEN_BUILD_PE when the split was introduced. > >> But the argument I'm going to make this this: Why do you want a check, >> even if you can find a correct one (and as said before, I cannot)? >> >> This function is run exactly once. We've excluded "nothing given by the >> toolchain", and excluded "what the toolchain gave us was not the >> expected ELF note". The only thing left (modulo toolchain bugs) is the >> CodeView region, and if it's not a valid CodeView region then we've >> wasted a handful of cycles. > > Two reasons: Having code which cannot possibly do anything useful isn't > good. Misra calls the latest the body of the inner if() "unreachable code" > and objects to the presence of such in a build. (I'm pretty sure Eclair > wouldn't spot it, but that doesn't eliminate this being a violation of > the respective rule.) > > And then, based on your reasoning above, why don't you also drop the > #ifdef CONFIG_X86? ..., saying in the description "we can as well check for this uniformly" Jan
© 2016 - 2025 Red Hat, Inc.