x86/EFI: Fix detection of buildid

[PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Andrew Cooper 4 months, 3 weeks ago

The format of the buildid is a property of the binary, not a property of how
it was loaded.  This fixes buildid recognition when starting xen.efi from it's
MB2 entrypoint.

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
CC: Daniel P. Smith <dpsmith@apertussolutions.com>

I don't like this patch and tried hard to do it in a better way, but the EFI
aspects of the build system are too intractable.

While on x86 I can in principle pull the same common-stubs.o trick, split on
XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
consequence of that being the only reason we use PE32+ binaries.

Binutils 2.25 is now the minimum, and the makefiles can be cleaned up
somewhat, but I need to backport this patch, internally at least.
---
 xen/common/version.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/common/version.c b/xen/common/version.c
index 5474b8e385be..56b51c81d2fc 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -203,8 +203,11 @@ void __init xen_build_init(void)
     rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
 
 #ifdef CONFIG_X86
-    /* Alternatively we may have a CodeView record from an EFI build. */
-    if ( rc && efi_enabled(EFI_LOADER) )
+    /*
+     * xen.efi built with a new enough toolchain will have a CodeView record,
+     * not an ELF note.
+     */
+    if ( rc )
     {
         const struct pe_external_debug_directory *dir = (const void *)n;
 
-- 
2.39.5

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Daniel Smith 3 months ago

---- On Thu, 05 Jun 2025 07:16:36 -0400 Andrew Cooper <andrew.cooper3@citrix.com> wrote ---

 > The format of the buildid is a property of the binary, not a property of how 
 > it was loaded.  This fixes buildid recognition when starting xen.efi from it's 
 > MB2 entrypoint. 
 >  
 > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> 
 > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> 
 > --- 
 > CC: Jan Beulich <JBeulich@suse.com> 
 > CC: Roger Pau Monné <roger.pau@citrix.com> 
 > CC: Ross Lagerwall <ross.lagerwall@citrix.com> 
 > CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> 
 > CC: Daniel P. Smith <dpsmith@apertussolutions.com> 
 >  
 > I don't like this patch and tried hard to do it in a better way, but the EFI 
 > aspects of the build system are too intractable. 
 >  
 > While on x86 I can in principle pull the same common-stubs.o trick, split on 
 > XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which 
 > hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a 
 > consequence of that being the only reason we use PE32+ binaries. 
 >  
 > Binutils 2.25 is now the minimum, and the makefiles can be cleaned up 
 > somewhat, but I need to backport this patch, internally at least. 
 > --- 
 >  xen/common/version.c | 7 +++++-- 
 >  1 file changed, 5 insertions(+), 2 deletions(-) 
 >  
 > diff --git a/xen/common/version.c b/xen/common/version.c 
 > index 5474b8e385be..56b51c81d2fc 100644 
 > --- a/xen/common/version.c 
 > +++ b/xen/common/version.c 
 > @@ -203,8 +203,11 @@ void __init xen_build_init(void) 
 >  rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len); 
 >  
 >  #ifdef CONFIG_X86 
 > -    /* Alternatively we may have a CodeView record from an EFI build. */ 
 > -    if ( rc && efi_enabled(EFI_LOADER) ) 
 > +    /* 
 > +     * xen.efi built with a new enough toolchain will have a CodeView record, 
 > +     * not an ELF note. 
 > +     */ 
 > +    if ( rc ) 
 >  { 
 >  const struct pe_external_debug_directory *dir = (const void *)n; 
 >  
 > -- 
 > 2.39.5 
 >  
 > 

From what I can see, thre are no longer objections and two Rb's ackowledging this is the best path forward. I do not see an issue with it myself.

Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Roger Pau Monné 4 months ago

On Thu, Jun 05, 2025 at 12:16:36PM +0100, Andrew Cooper wrote:
> The format of the buildid is a property of the binary, not a property of how
> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> MB2 entrypoint.
> 
> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Some possibly useless rants below.

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> CC: Daniel P. Smith <dpsmith@apertussolutions.com>
> 
> I don't like this patch and tried hard to do it in a better way, but the EFI
> aspects of the build system are too intractable.
> 
> While on x86 I can in principle pull the same common-stubs.o trick, split on
> XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
> hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
> consequence of that being the only reason we use PE32+ binaries.

Since this is already gated on CONFIG_X86 you could pass XEN_BUILD_PE
as a define in CFLAGS, and use it together with the CONFIG_X86 check?

That however would still put the code in the .gz binary, as the object
file would be the same for both builds.  Otherwise we would need to
compile this twice and use the different object files for the PE vs
ELF images, which seems extremely cumbersome for a little benefit, as
this is init-only code that would be gone once booted.

Thanks, Roger.

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Frediano Ziglio 4 months ago

On Thu, Jun 5, 2025 at 12:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> The format of the buildid is a property of the binary, not a property of how
> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> MB2 entrypoint.
>
> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> CC: Daniel P. Smith <dpsmith@apertussolutions.com>
>
> I don't like this patch and tried hard to do it in a better way, but the EFI
> aspects of the build system are too intractable.
>
> While on x86 I can in principle pull the same common-stubs.o trick, split on
> XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
> hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
> consequence of that being the only reason we use PE32+ binaries.
>
> Binutils 2.25 is now the minimum, and the makefiles can be cleaned up
> somewhat, but I need to backport this patch, internally at least.
> ---
>  xen/common/version.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/version.c b/xen/common/version.c
> index 5474b8e385be..56b51c81d2fc 100644
> --- a/xen/common/version.c
> +++ b/xen/common/version.c
> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>
>  #ifdef CONFIG_X86
> -    /* Alternatively we may have a CodeView record from an EFI build. */
> -    if ( rc && efi_enabled(EFI_LOADER) )
> +    /*
> +     * xen.efi built with a new enough toolchain will have a CodeView record,
> +     * not an ELF note.
> +     */
> +    if ( rc )
>      {
>          const struct pe_external_debug_directory *dir = (const void *)n;
>

Reviewed-by: Frediano Ziglio <frediano.ziglio@cloud.com>

I cannot see a better proposal in the discussion, but I prefer this
fixed than keep it broken.

Frediano

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Marek Marczykowski-Górecki 3 months ago

On Fri, Jun 27, 2025 at 02:08:00PM +0100, Frediano Ziglio wrote:
> On Thu, Jun 5, 2025 at 12:17 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >
> > The format of the buildid is a property of the binary, not a property of how
> > it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> > MB2 entrypoint.
> >
> > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> > CC: Jan Beulich <JBeulich@suse.com>
> > CC: Roger Pau Monné <roger.pau@citrix.com>
> > CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> > CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > CC: Daniel P. Smith <dpsmith@apertussolutions.com>
> >
> > I don't like this patch and tried hard to do it in a better way, but the EFI
> > aspects of the build system are too intractable.
> >
> > While on x86 I can in principle pull the same common-stubs.o trick, split on
> > XEN_BUILD_PE rather than XEN_BUILD_EFI, that doesn't work on ARM which
> > hand-codes it's PE-ness.  Also, it's really not EFI related, other than as a
> > consequence of that being the only reason we use PE32+ binaries.

Besides the ARM issue, it would also make it harder to follow different
boot paths.

> >
> > Binutils 2.25 is now the minimum, and the makefiles can be cleaned up
> > somewhat, but I need to backport this patch, internally at least.
> > ---
> >  xen/common/version.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/common/version.c b/xen/common/version.c
> > index 5474b8e385be..56b51c81d2fc 100644
> > --- a/xen/common/version.c
> > +++ b/xen/common/version.c
> > @@ -203,8 +203,11 @@ void __init xen_build_init(void)
> >      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
> >
> >  #ifdef CONFIG_X86
> > -    /* Alternatively we may have a CodeView record from an EFI build. */
> > -    if ( rc && efi_enabled(EFI_LOADER) )
> > +    /*
> > +     * xen.efi built with a new enough toolchain will have a CodeView record,
> > +     * not an ELF note.
> > +     */
> > +    if ( rc )
> >      {
> >          const struct pe_external_debug_directory *dir = (const void *)n;
> >
> 
> Reviewed-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> 
> I cannot see a better proposal in the discussion, but I prefer this
> fixed than keep it broken.

Yeah, I agree.

Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 3 weeks ago

On 05.06.2025 13:16, Andrew Cooper wrote:
> The format of the buildid is a property of the binary, not a property of how
> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> MB2 entrypoint.
> 
> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
didn't check whether MB2 or build-id support came later, hence introducing the
issue.)

> --- a/xen/common/version.c
> +++ b/xen/common/version.c
> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>  
>  #ifdef CONFIG_X86
> -    /* Alternatively we may have a CodeView record from an EFI build. */
> -    if ( rc && efi_enabled(EFI_LOADER) )
> +    /*
> +     * xen.efi built with a new enough toolchain will have a CodeView record,
> +     * not an ELF note.
> +     */
> +    if ( rc )

Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
by EFI_BOOT.

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Marek Marczykowski-Górecki 4 months, 3 weeks ago

On Thu, Jun 05, 2025 at 02:02:21PM +0200, Jan Beulich wrote:
> On 05.06.2025 13:16, Andrew Cooper wrote:
> > The format of the buildid is a property of the binary, not a property of how
> > it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> > MB2 entrypoint.
> > 
> > Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
> didn't check whether MB2 or build-id support came later, hence introducing the
> issue.)
> 
> > --- a/xen/common/version.c
> > +++ b/xen/common/version.c
> > @@ -203,8 +203,11 @@ void __init xen_build_init(void)
> >      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
> >  
> >  #ifdef CONFIG_X86
> > -    /* Alternatively we may have a CodeView record from an EFI build. */
> > -    if ( rc && efi_enabled(EFI_LOADER) )
> > +    /*
> > +     * xen.efi built with a new enough toolchain will have a CodeView record,
> > +     * not an ELF note.
> > +     */
> > +    if ( rc )
> 
> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
> by EFI_BOOT.

Part of the motivation for MB2 entry point in xen.efi is using the
same binary in all boot modes, including legacy boot - in which case
none of EFI_* checks would be appropriate here. The grub series adds
support for loading PE binaries, and IIUC it isn't tied to booting via
EFI.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 3 weeks ago

On 05.06.2025 14:20, Marek Marczykowski-Górecki wrote:
> On Thu, Jun 05, 2025 at 02:02:21PM +0200, Jan Beulich wrote:
>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>> The format of the buildid is a property of the binary, not a property of how
>>> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
>>> MB2 entrypoint.
>>>
>>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
>> didn't check whether MB2 or build-id support came later, hence introducing the
>> issue.)
>>
>>> --- a/xen/common/version.c
>>> +++ b/xen/common/version.c
>>> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>>>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>>>  
>>>  #ifdef CONFIG_X86
>>> -    /* Alternatively we may have a CodeView record from an EFI build. */
>>> -    if ( rc && efi_enabled(EFI_LOADER) )
>>> +    /*
>>> +     * xen.efi built with a new enough toolchain will have a CodeView record,
>>> +     * not an ELF note.
>>> +     */
>>> +    if ( rc )
>>
>> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
>> by EFI_BOOT.
> 
> Part of the motivation for MB2 entry point in xen.efi is using the
> same binary in all boot modes, including legacy boot - in which case
> none of EFI_* checks would be appropriate here. The grub series adds
> support for loading PE binaries, and IIUC it isn't tied to booting via
> EFI.

"The grub series" being which one? I didn't know xen.efi's non-EFI MB2 entry
point could be used; instead I was under the impression that someone was
having (half?) a patch eliminating the MB data from xen.efi, as being dead
code.

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Marek Marczykowski-Górecki 4 months, 3 weeks ago

On Thu, Jun 05, 2025 at 03:08:07PM +0200, Jan Beulich wrote:
> On 05.06.2025 14:20, Marek Marczykowski-Górecki wrote:
> > On Thu, Jun 05, 2025 at 02:02:21PM +0200, Jan Beulich wrote:
> >> On 05.06.2025 13:16, Andrew Cooper wrote:
> >>> The format of the buildid is a property of the binary, not a property of how
> >>> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
> >>> MB2 entrypoint.
> >>>
> >>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>
> >> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
> >> didn't check whether MB2 or build-id support came later, hence introducing the
> >> issue.)
> >>
> >>> --- a/xen/common/version.c
> >>> +++ b/xen/common/version.c
> >>> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
> >>>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
> >>>  
> >>>  #ifdef CONFIG_X86
> >>> -    /* Alternatively we may have a CodeView record from an EFI build. */
> >>> -    if ( rc && efi_enabled(EFI_LOADER) )
> >>> +    /*
> >>> +     * xen.efi built with a new enough toolchain will have a CodeView record,
> >>> +     * not an ELF note.
> >>> +     */
> >>> +    if ( rc )
> >>
> >> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
> >> by EFI_BOOT.
> > 
> > Part of the motivation for MB2 entry point in xen.efi is using the
> > same binary in all boot modes, including legacy boot - in which case
> > none of EFI_* checks would be appropriate here. The grub series adds
> > support for loading PE binaries, and IIUC it isn't tied to booting via
> > EFI.
> 
> "The grub series" being which one? 

https://lists.gnu.org/archive/html/grub-devel/2024-03/msg00080.html
| This patch series implements support for loading and verifying a signed
| xen binary. This would allow the same xen binary to be used for BIOS
| boot, UEFI boot, and UEFI boot with Secure Boot verification.
| There is an accompanying Xen patch series.

> I didn't know xen.efi's non-EFI MB2 entry
> point could be used; instead I was under the impression that someone was
> having (half?) a patch eliminating the MB data from xen.efi, as being dead
> code.

See also
https://lore.kernel.org/xen-devel/CAG7k0EroeA=cRRDWnJqzH8esoaSmtg8-xjTwc-01og5R9JwPzg@mail.gmail.com/
for some extra context.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Andrew Cooper 4 months, 3 weeks ago

On 05/06/2025 1:02 pm, Jan Beulich wrote:
> On 05.06.2025 13:16, Andrew Cooper wrote:
>> The format of the buildid is a property of the binary, not a property of how
>> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
>> MB2 entrypoint.
>>
>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
> didn't check whether MB2 or build-id support came later, hence introducing the
> issue.)

MB2+EFI came long before any buildid support.  If you want a fixes tag,
it's eee5909e9d1

>
>> --- a/xen/common/version.c
>> +++ b/xen/common/version.c
>> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>>  
>>  #ifdef CONFIG_X86
>> -    /* Alternatively we may have a CodeView record from an EFI build. */
>> -    if ( rc && efi_enabled(EFI_LOADER) )
>> +    /*
>> +     * xen.efi built with a new enough toolchain will have a CodeView record,
>> +     * not an ELF note.
>> +     */
>> +    if ( rc )
> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
> by EFI_BOOT.

No - that's differently buggy.  I suppose the commit message wasn't
clear enough?

We'd still have a CodeView record if we booted xen.efi using it's MB2
entrypoint without the EFI extensions.

This really is a property of being a PE32+ binary, and nothing to do
with EFI.

~Andrew

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 3 weeks ago

On 05.06.2025 14:14, Andrew Cooper wrote:
> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>> The format of the buildid is a property of the binary, not a property of how
>>> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
>>> MB2 entrypoint.
>>>
>>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
>> didn't check whether MB2 or build-id support came later, hence introducing the
>> issue.)
> 
> MB2+EFI came long before any buildid support.  If you want a fixes tag,
> it's eee5909e9d1

That commit talks of an earlier hack, though. And no, MB2 work came later,
albeit still in the 4.9 dev window (commit 9180f53655245).

>>> --- a/xen/common/version.c
>>> +++ b/xen/common/version.c
>>> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>>>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>>>  
>>>  #ifdef CONFIG_X86
>>> -    /* Alternatively we may have a CodeView record from an EFI build. */
>>> -    if ( rc && efi_enabled(EFI_LOADER) )
>>> +    /*
>>> +     * xen.efi built with a new enough toolchain will have a CodeView record,
>>> +     * not an ELF note.
>>> +     */
>>> +    if ( rc )
>> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
>> by EFI_BOOT.
> 
> No - that's differently buggy.  I suppose the commit message wasn't
> clear enough?
> 
> We'd still have a CodeView record if we booted xen.efi using it's MB2
> entrypoint without the EFI extensions.

Hmm, if that's a possible mode of operation (as said in reply to Marek, I
wasn't aware of that) - yes.

> This really is a property of being a PE32+ binary, and nothing to do
> with EFI.

Which still can be checked for without having this code path being taken
for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
image property (yet I expect you're going to sigh about yet another hack).

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Andrew Cooper 4 months, 3 weeks ago

On 05/06/2025 2:24 pm, Jan Beulich wrote:
> On 05.06.2025 14:14, Andrew Cooper wrote:
>> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>>> The format of the buildid is a property of the binary, not a property of how
>>>> it was loaded.  This fixes buildid recognition when starting xen.efi from it's
>>>> MB2 entrypoint.
>>>>
>>>> Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> I'll pick this up without a Fixes: tag, but I think it ought to have one. (I
>>> didn't check whether MB2 or build-id support came later, hence introducing the
>>> issue.)
>> MB2+EFI came long before any buildid support.  If you want a fixes tag,
>> it's eee5909e9d1
> That commit talks of an earlier hack, though. And no, MB2 work came later,
> albeit still in the 4.9 dev window (commit 9180f53655245).

The "previous hack" was embedding note.o (from the livepatch test
infrastructure) back in the main xen binary.

That's still present.  For xen.gz you get an elf note.  For xen.efi, it
may be an elf note or a CodeView region, depending on the toolchain.

>
>>>> --- a/xen/common/version.c
>>>> +++ b/xen/common/version.c
>>>> @@ -203,8 +203,11 @@ void __init xen_build_init(void)
>>>>      rc = xen_build_id_check(n, sz, &build_id_p, &build_id_len);
>>>>  
>>>>  #ifdef CONFIG_X86
>>>> -    /* Alternatively we may have a CodeView record from an EFI build. */
>>>> -    if ( rc && efi_enabled(EFI_LOADER) )
>>>> +    /*
>>>> +     * xen.efi built with a new enough toolchain will have a CodeView record,
>>>> +     * not an ELF note.
>>>> +     */
>>>> +    if ( rc )
>>> Instead of dropping the efi_enabled(), I think you want to replace EFI_LOADER
>>> by EFI_BOOT.
>> No - that's differently buggy.  I suppose the commit message wasn't
>> clear enough?
>>
>> We'd still have a CodeView record if we booted xen.efi using it's MB2
>> entrypoint without the EFI extensions.
> Hmm, if that's a possible mode of operation (as said in reply to Marek, I
> wasn't aware of that) - yes.

It's how I found the bug.

There's an awful lot of hackery in the patch-queue holding it together,
but in this case it's not really about MB2 or anything else; it's
xen_build_init() being incorrect in how it determines whether there's a
CodeView region or not.

>
>> This really is a property of being a PE32+ binary, and nothing to do
>> with EFI.
> Which still can be checked for without having this code path being taken
> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
> image property (yet I expect you're going to sigh about yet another hack).

It's all hacks, but no.

I'm amazed MISRA hasn't spotted that we've got a global `struct efi
efi;` and a label named efi, creating an alias for the object with it
out of bounds in the compiled image.  But even then, it's based on
XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property
that matters.

But the argument I'm going to make this this:  Why do you want a check,
even if you can find a correct one (and as said before, I cannot)?

This function is run exactly once.  We've excluded "nothing given by the
toolchain", and excluded "what the toolchain gave us was not the
expected ELF note".  The only thing left (modulo toolchain bugs) is the
CodeView region, and if it's not a valid CodeView region then we've
wasted a handful of cycles.

~Andrew

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 3 weeks ago

On 05.06.2025 19:01, Andrew Cooper wrote:
> On 05/06/2025 2:24 pm, Jan Beulich wrote:
>> On 05.06.2025 14:14, Andrew Cooper wrote:
>>> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>>>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>> This really is a property of being a PE32+ binary, and nothing to do
>>> with EFI.
>> Which still can be checked for without having this code path being taken
>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
>> image property (yet I expect you're going to sigh about yet another hack).
> 
> It's all hacks, but no.
> 
> I'm amazed MISRA hasn't spotted that we've got a global `struct efi
> efi;` and a label named efi, creating an alias for the object with it
> out of bounds in the compiled image.  But even then, it's based on
> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property
> that matters.

The use of XEN_BUILD_EFI in the linker script should have been switched
to XEN_BUILD_PE when the split was introduced.

> But the argument I'm going to make this this:  Why do you want a check,
> even if you can find a correct one (and as said before, I cannot)?
> 
> This function is run exactly once.  We've excluded "nothing given by the
> toolchain", and excluded "what the toolchain gave us was not the
> expected ELF note".  The only thing left (modulo toolchain bugs) is the
> CodeView region, and if it's not a valid CodeView region then we've
> wasted a handful of cycles.

Two reasons: Having code which cannot possibly do anything useful isn't
good. Misra calls the latest the body of the inner if() "unreachable code"
and objects to the presence of such in a build. (I'm pretty sure Eclair
wouldn't spot it, but that doesn't eliminate this being a violation of
the respective rule.)

And then, based on your reasoning above, why don't you also drop the
#ifdef CONFIG_X86?

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Stefano Stabellini 4 months, 2 weeks ago

On Fri, 6 Jun 2025, Jan Beulich wrote:
> > But the argument I'm going to make this this:  Why do you want a check,
> > even if you can find a correct one (and as said before, I cannot)?
> > 
> > This function is run exactly once.  We've excluded "nothing given by the
> > toolchain", and excluded "what the toolchain gave us was not the
> > expected ELF note".  The only thing left (modulo toolchain bugs) is the
> > CodeView region, and if it's not a valid CodeView region then we've
> > wasted a handful of cycles.
> 
> Two reasons: Having code which cannot possibly do anything useful isn't
> good. Misra calls the latest the body of the inner if() "unreachable code"
> and objects to the presence of such in a build. (I'm pretty sure Eclair
> wouldn't spot it, but that doesn't eliminate this being a violation of
> the respective rule.)

As far as I can tell this code can be reachable simply by having an
appropriate test with Xen booted via a non-EFI method and a CodeView
record being present. No code changes to Xen needed.


> And then, based on your reasoning above, why don't you also drop the
> #ifdef CONFIG_X86?

Because while theoretically possible, today there is not a single test
that would allow us to trigger this code on ARM. We would need to change
Xen to do it first.

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Andrew Cooper 4 months, 3 weeks ago

On 06/06/2025 8:22 am, Jan Beulich wrote:
> On 05.06.2025 19:01, Andrew Cooper wrote:
>> On 05/06/2025 2:24 pm, Jan Beulich wrote:
>>> On 05.06.2025 14:14, Andrew Cooper wrote:
>>>> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>>>>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>>> This really is a property of being a PE32+ binary, and nothing to do
>>>> with EFI.
>>> Which still can be checked for without having this code path being taken
>>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
>>> image property (yet I expect you're going to sigh about yet another hack).
>> It's all hacks, but no.
>>
>> I'm amazed MISRA hasn't spotted that we've got a global `struct efi
>> efi;` and a label named efi, creating an alias for the object with it
>> out of bounds in the compiled image.  But even then, it's based on
>> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property
>> that matters.
> The use of XEN_BUILD_EFI in the linker script should have been switched
> to XEN_BUILD_PE when the split was introduced.

That doesn't build.  As I already explained, the stubs aren't split in a
way that allows that.
>> But the argument I'm going to make this this:  Why do you want a check,
>> even if you can find a correct one (and as said before, I cannot)?
>>
>> This function is run exactly once.  We've excluded "nothing given by the
>> toolchain", and excluded "what the toolchain gave us was not the
>> expected ELF note".  The only thing left (modulo toolchain bugs) is the
>> CodeView region, and if it's not a valid CodeView region then we've
>> wasted a handful of cycles.
> Two reasons: Having code which cannot possibly do anything useful isn't
> good. Misra calls the latest the body of the inner if() "unreachable code"
> and objects to the presence of such in a build.

It's not unreachable code, not even theoretically.

*If* there was a suitable check, I'd be using it, but everything you've
proposed has been buggy or doesn't even compile.

> And then, based on your reasoning above, why don't you also drop the
> #ifdef CONFIG_X86?

Because that's the one non-buggy way of excluding an impossible case.

x86 is the only architecture possibly linking with pep emulation, and
therefore the only architecture to possibly have a CodeView record.

~Andrew

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 3 weeks ago

On 06.06.2025 17:01, Andrew Cooper wrote:
> On 06/06/2025 8:22 am, Jan Beulich wrote:
>> On 05.06.2025 19:01, Andrew Cooper wrote:
>>> On 05/06/2025 2:24 pm, Jan Beulich wrote:
>>>> On 05.06.2025 14:14, Andrew Cooper wrote:
>>>>> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>>>>>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>>>> This really is a property of being a PE32+ binary, and nothing to do
>>>>> with EFI.
>>>> Which still can be checked for without having this code path being taken
>>>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
>>>> image property (yet I expect you're going to sigh about yet another hack).
>>> It's all hacks, but no.
>>>
>>> I'm amazed MISRA hasn't spotted that we've got a global `struct efi
>>> efi;` and a label named efi, creating an alias for the object with it
>>> out of bounds in the compiled image.  But even then, it's based on
>>> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property
>>> that matters.
>> The use of XEN_BUILD_EFI in the linker script should have been switched
>> to XEN_BUILD_PE when the split was introduced.
> 
> That doesn't build.  As I already explained, the stubs aren't split in a
> way that allows that.

Which then is a pretty clear indication that the split was wrong to do in
the first place, don't you agree?

>>> But the argument I'm going to make this this:  Why do you want a check,
>>> even if you can find a correct one (and as said before, I cannot)?
>>>
>>> This function is run exactly once.  We've excluded "nothing given by the
>>> toolchain", and excluded "what the toolchain gave us was not the
>>> expected ELF note".  The only thing left (modulo toolchain bugs) is the
>>> CodeView region, and if it's not a valid CodeView region then we've
>>> wasted a handful of cycles.
>> Two reasons: Having code which cannot possibly do anything useful isn't
>> good. Misra calls the latest the body of the inner if() "unreachable code"
>> and objects to the presence of such in a build.
> 
> It's not unreachable code, not even theoretically.

How is it not? If we build without this CodeView record, it very much is
unreachable.

> *If* there was a suitable check, I'd be using it, but everything you've
> proposed has been buggy or doesn't even compile.

Okay, but we draw different conclusions: You want to do it in a way that,
as per above, imo introduces unreachable code. Whereas I keep wanting to
find a suitable check (or if necessary introduce whatever is needed to
have one).

>> And then, based on your reasoning above, why don't you also drop the
>> #ifdef CONFIG_X86?
> 
> Because that's the one non-buggy way of excluding an impossible case.
> 
> x86 is the only architecture possibly linking with pep emulation, and
> therefore the only architecture to possibly have a CodeView record.

And how's the, say, Arm case different from the x86 case with no such
record built in? Either it's unreachable code in both cases, or it's
not.

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Andrew Cooper 4 months, 2 weeks ago

On 10/06/2025 9:01 am, Jan Beulich wrote:
> On 06.06.2025 17:01, Andrew Cooper wrote:
>> On 06/06/2025 8:22 am, Jan Beulich wrote:
>>> On 05.06.2025 19:01, Andrew Cooper wrote:
>>>> On 05/06/2025 2:24 pm, Jan Beulich wrote:
>>>>> On 05.06.2025 14:14, Andrew Cooper wrote:
>>>>>> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>>>>>>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>>>>> This really is a property of being a PE32+ binary, and nothing to do
>>>>>> with EFI.
>>>>> Which still can be checked for without having this code path being taken
>>>>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
>>>>> image property (yet I expect you're going to sigh about yet another hack).
>>>> It's all hacks, but no.
>>>>
>>>> I'm amazed MISRA hasn't spotted that we've got a global `struct efi
>>>> efi;` and a label named efi, creating an alias for the object with it
>>>> out of bounds in the compiled image.  But even then, it's based on
>>>> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property
>>>> that matters.
>>> The use of XEN_BUILD_EFI in the linker script should have been switched
>>> to XEN_BUILD_PE when the split was introduced.
>> That doesn't build.  As I already explained, the stubs aren't split in a
>> way that allows that.
> Which then is a pretty clear indication that the split was wrong to do in
> the first place, don't you agree?

I think my feelings on how xen.efi was done are quite well known, but so
what?

I've spent longer than I can afford trying to untangle this, and its an
impenetrable mess.

>>>> But the argument I'm going to make this this:  Why do you want a check,
>>>> even if you can find a correct one (and as said before, I cannot)?
>>>>
>>>> This function is run exactly once.  We've excluded "nothing given by the
>>>> toolchain", and excluded "what the toolchain gave us was not the
>>>> expected ELF note".  The only thing left (modulo toolchain bugs) is the
>>>> CodeView region, and if it's not a valid CodeView region then we've
>>>> wasted a handful of cycles.
>>> Two reasons: Having code which cannot possibly do anything useful isn't
>>> good. Misra calls the latest the body of the inner if() "unreachable code"
>>> and objects to the presence of such in a build.
>> It's not unreachable code, not even theoretically.
> How is it not? If we build without this CodeView record, it very much is
> unreachable.

Compiling without a CodeView record doesn't magically cause the prior
logic to guarantee to succeed.

The compiler is forced to emit real conditional logic, and there's
almost 2^96 bit-pattens the toolchain could put into memory which will
very literally reach the CodeView check.

"The toolchain shouldn't cause this path to be executed" is not the same
as genuinely unreachable.

What safety certification is liable to complain about is the inability
to construct a test that demonstrates coverage, but I'm not changing
that property with this patch.
>>> And then, based on your reasoning above, why don't you also drop the
>>> #ifdef CONFIG_X86?
>> Because that's the one non-buggy way of excluding an impossible case.
>>
>> x86 is the only architecture possibly linking with pep emulation, and
>> therefore the only architecture to possibly have a CodeView record.
> And how's the, say, Arm case different from the x86 case with no such
> record built in? 

Because its currently impossible for ARM to have a codeview record.

Remember that ARM writes a MZ/PE header by hand in a flat binary.  It
does not use a PEP linker.

~Andrew

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 2 weeks ago

On 13.06.2025 23:28, Andrew Cooper wrote:
> On 10/06/2025 9:01 am, Jan Beulich wrote:
>> On 06.06.2025 17:01, Andrew Cooper wrote:
>>> On 06/06/2025 8:22 am, Jan Beulich wrote:
>>>> And then, based on your reasoning above, why don't you also drop the
>>>> #ifdef CONFIG_X86?
>>> Because that's the one non-buggy way of excluding an impossible case.
>>>
>>> x86 is the only architecture possibly linking with pep emulation, and
>>> therefore the only architecture to possibly have a CodeView record.
>> And how's the, say, Arm case different from the x86 case with no such
>> record built in? 
> 
> Because its currently impossible for ARM to have a codeview record.

There must be some pretty fundamental misunderstanding then: I can only
ask again - how's this different from xen.gz? There's not going to be
CodeView record there, yet still you enable the logic there. Hence the
body of the conditional is unreachable in that case.

To expand on my earlier suggestion (ab)using the "efi" global: With
the linker script having this

#ifdef EFI
  .reloc ALIGN(4) : {
    __base_relocs_start = .;
    *(.reloc)
    __base_relocs_end = .;
  }
#elif defined(XEN_BUILD_EFI)
  /*
   * Due to the way EFI support is currently implemented, these two symbols
   * need to be defined.  Their precise values shouldn't matter (the consuming
   * function doesn't get called), but to be on the safe side both values would
   * better match.  Of course the need to be reachable by the relocations
   * referencing them.
   */
  PROVIDE(__base_relocs_start = .);
  PROVIDE(__base_relocs_end = .);
#else
  efi = .;
#endif

where only the #if applies to xen.efi, can't we (ab)use the combination of the
other two symbols here to decide between xen.efi vs xen.gz?
__base_relocs_{start,efi} won't possibly be equal for xen.efi, except in an
extremely theoretical situation (and we could cover for that case by an ASSERT
in the linker script). Pseudo code:

#ifdef XEN_BUILD_EFI
    if ( __base_relocs_start != __base_relocs_end )
    {
        ...
    }
#endif

IOW that #if could simply replace the CONFIG_X86 one that's there right now.

> Remember that ARM writes a MZ/PE header by hand in a flat binary.  It
> does not use a PEP linker.

Of course. Much like for xen.gz there's no PEP linking involved.

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Andrew Cooper 4 months, 1 week ago

On 16/06/2025 7:27 am, Jan Beulich wrote:
> To expand on my earlier suggestion (ab)using the "efi" global: With
> the linker script having this
>
> #ifdef EFI
>   .reloc ALIGN(4) : {
>     __base_relocs_start = .;
>     *(.reloc)
>     __base_relocs_end = .;
>   }
> #elif defined(XEN_BUILD_EFI)
>   /*
>    * Due to the way EFI support is currently implemented, these two symbols
>    * need to be defined.  Their precise values shouldn't matter (the consuming
>    * function doesn't get called), but to be on the safe side both values would
>    * better match.  Of course the need to be reachable by the relocations
>    * referencing them.
>    */
>   PROVIDE(__base_relocs_start = .);
>   PROVIDE(__base_relocs_end = .);
> #else
>   efi = .;
> #endif
>
> where only the #if applies to xen.efi, can't we (ab)use the combination of the
> other two symbols here to decide between xen.efi vs xen.gz?
> __base_relocs_{start,efi} won't possibly be equal for xen.efi, except in an
> extremely theoretical situation (and we could cover for that case by an ASSERT
> in the linker script). Pseudo code:
>
> #ifdef XEN_BUILD_EFI
>     if ( __base_relocs_start != __base_relocs_end )
>     {
>         ...
>     }
> #endif
>
> IOW that #if could simply replace the CONFIG_X86 one that's there right now.

That's horrifying.  Also you can't include efi-boot.h to get the
declarations.

But given that you are adamant that the (...) in there containing a
CodeView check is unacceptable to have, why does wrapping it in yet
another conditional make it ok?

~Andrew

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 1 week ago

On 23.06.2025 22:18, Andrew Cooper wrote:
> On 16/06/2025 7:27 am, Jan Beulich wrote:
>> To expand on my earlier suggestion (ab)using the "efi" global: With
>> the linker script having this
>>
>> #ifdef EFI
>>   .reloc ALIGN(4) : {
>>     __base_relocs_start = .;
>>     *(.reloc)
>>     __base_relocs_end = .;
>>   }
>> #elif defined(XEN_BUILD_EFI)
>>   /*
>>    * Due to the way EFI support is currently implemented, these two symbols
>>    * need to be defined.  Their precise values shouldn't matter (the consuming
>>    * function doesn't get called), but to be on the safe side both values would
>>    * better match.  Of course the need to be reachable by the relocations
>>    * referencing them.
>>    */
>>   PROVIDE(__base_relocs_start = .);
>>   PROVIDE(__base_relocs_end = .);
>> #else
>>   efi = .;
>> #endif
>>
>> where only the #if applies to xen.efi, can't we (ab)use the combination of the
>> other two symbols here to decide between xen.efi vs xen.gz?
>> __base_relocs_{start,efi} won't possibly be equal for xen.efi, except in an
>> extremely theoretical situation (and we could cover for that case by an ASSERT
>> in the linker script). Pseudo code:
>>
>> #ifdef XEN_BUILD_EFI
>>     if ( __base_relocs_start != __base_relocs_end )
>>     {
>>         ...
>>     }
>> #endif
>>
>> IOW that #if could simply replace the CONFIG_X86 one that's there right now.
> 
> That's horrifying.  Also you can't include efi-boot.h to get the
> declarations.

The declarations could be moved, but ...

> But given that you are adamant that the (...) in there containing a
> CodeView check is unacceptable to have, why does wrapping it in yet
> another conditional make it ok?

... hmm, yes, there'll still be unreachable code there. I guess I'll shut
up here and leave this to the EFI maintainers. Just as long as there's not
going to be any Eclair / Misra fallout.

Jan

Re: [PATCH 1/3] x86/EFI: Fix detection of buildid

Posted by Jan Beulich 4 months, 3 weeks ago

On 06.06.2025 09:22, Jan Beulich wrote:
> On 05.06.2025 19:01, Andrew Cooper wrote:
>> On 05/06/2025 2:24 pm, Jan Beulich wrote:
>>> On 05.06.2025 14:14, Andrew Cooper wrote:
>>>> On 05/06/2025 1:02 pm, Jan Beulich wrote:
>>>>> On 05.06.2025 13:16, Andrew Cooper wrote:
>>>> This really is a property of being a PE32+ binary, and nothing to do
>>>> with EFI.
>>> Which still can be checked for without having this code path being taken
>>> for xen.gz, too: You could e.g. check for &efi > &_end. That's firmly an
>>> image property (yet I expect you're going to sigh about yet another hack).
>>
>> It's all hacks, but no.
>>
>> I'm amazed MISRA hasn't spotted that we've got a global `struct efi
>> efi;` and a label named efi, creating an alias for the object with it
>> out of bounds in the compiled image.  But even then, it's based on
>> XEN_BUILD_EFI not XEN_BUILD_PE and does not distinguish the property
>> that matters.
> 
> The use of XEN_BUILD_EFI in the linker script should have been switched
> to XEN_BUILD_PE when the split was introduced.
> 
>> But the argument I'm going to make this this:  Why do you want a check,
>> even if you can find a correct one (and as said before, I cannot)?
>>
>> This function is run exactly once.  We've excluded "nothing given by the
>> toolchain", and excluded "what the toolchain gave us was not the
>> expected ELF note".  The only thing left (modulo toolchain bugs) is the
>> CodeView region, and if it's not a valid CodeView region then we've
>> wasted a handful of cycles.
> 
> Two reasons: Having code which cannot possibly do anything useful isn't
> good. Misra calls the latest the body of the inner if() "unreachable code"
> and objects to the presence of such in a build. (I'm pretty sure Eclair
> wouldn't spot it, but that doesn't eliminate this being a violation of
> the respective rule.)
> 
> And then, based on your reasoning above, why don't you also drop the
> #ifdef CONFIG_X86?

..., saying in the description "we can as well check for this uniformly"

Jan