[PATCH v5 0/5] Reuse 32 bit C code more safely

Frediano Ziglio posted 5 patches 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20241016092154.1493035-1-frediano.ziglio@cloud.com
There is a newer version of this series
xen/arch/x86/boot/.gitignore                  |   5 +-
xen/arch/x86/boot/Makefile                    |  51 +++-
.../x86/boot/{build32.lds => build32.lds.S}   |  41 +++-
xen/arch/x86/boot/cmdline.c                   |  12 -
xen/arch/x86/boot/head.S                      |  49 +---
xen/arch/x86/boot/reloc-trampoline.c          |  36 +++
xen/arch/x86/boot/reloc.c                     |  35 +--
xen/arch/x86/efi/efi-boot.h                   |  15 +-
xen/tools/combine_two_binaries.py             | 218 ++++++++++++++++++
9 files changed, 348 insertions(+), 114 deletions(-)
rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (63%)
create mode 100644 xen/arch/x86/boot/reloc-trampoline.c
create mode 100755 xen/tools/combine_two_binaries.py
[PATCH v5 0/5] Reuse 32 bit C code more safely
Posted by Frediano Ziglio 1 month ago
This series attempt to:
- use more C code, that is replace some assembly code with C;
- avoid some code duplication between C and assembly;
- prevent some issues having relocations in C code.

The idea is extending the current C to binary code conversion
done for 32 bit C code called from head.S making sure relocations
are safe and allowing external symbols usage from C code.

More details of the implementation are in commit message 1/5,
which is the largest patch.
Patch 2/5 reuses code to relocate the trampoline between 32 and 64 bit.
Patches 3/5 and 4/5 move some code from assembly to C.

Other RFC commits were excluded, I didn't manage to entangle the issues
sharing headers between 32 and 64 bits.
Since RFC code was more tested, also with CI and some incompatibility
were fixed. On that it's weird that we need Python 3.8 for Qemu but we
still use Python 2 for Xen. Shouldn't we bump requirement to Python 3
even for Xen?

Code boots successfully using:
- BIOS boot;
- EFI boot with Grub2 and ELF file;
- direct EFI boot without Grub.

Code is currently based on "master" branch, currently commit
3f6ee3db2e878398cfcde725399b4d1b04e92269.

Changes since v1:
- 2 preliminary commits for adjust .gitignore;
- last commit split (one variable at a time);
- lot of style and names changes;
- first commit, now 3/6 had some build changes, more details on
  commit message.

Changes since v2:
- removed merged commits;
- reverted order of 2 commits;
- remove some useless casts;
- added change to comment.

Changes since v3:
- added a preparation commit for Makefiles (mainly written by Andrew Cooper);
- added a comment improvement commit;
- allows also data;
- other minor style changes;
- added some Reviewed-by.

Changes since v4:
- add build32.final.lds build32.other.lds to targets macro;
- place some comments over a rule, not inside;
- simplified linking and producing binary rule;
- renamed built_in_32 to built-in-32, coding style;
- fix minor indentation;
- put magic numbers in Makefile and propagate them;
- minor variable clanups in Python script;
- add dependency to Python script.

Frediano Ziglio (5):
  x86/boot: create a C bundle for 32 bit boot code and use it
  x86/boot: Reuse code to relocate trampoline
  x86/boot: Use boot_vid_info variable directly from C code
  x86/boot: Use trampoline_phys variable directly from C code
  x86/boot: Clarify comment

 xen/arch/x86/boot/.gitignore                  |   5 +-
 xen/arch/x86/boot/Makefile                    |  51 +++-
 .../x86/boot/{build32.lds => build32.lds.S}   |  41 +++-
 xen/arch/x86/boot/cmdline.c                   |  12 -
 xen/arch/x86/boot/head.S                      |  49 +---
 xen/arch/x86/boot/reloc-trampoline.c          |  36 +++
 xen/arch/x86/boot/reloc.c                     |  35 +--
 xen/arch/x86/efi/efi-boot.h                   |  15 +-
 xen/tools/combine_two_binaries.py             | 218 ++++++++++++++++++
 9 files changed, 348 insertions(+), 114 deletions(-)
 rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (63%)
 create mode 100644 xen/arch/x86/boot/reloc-trampoline.c
 create mode 100755 xen/tools/combine_two_binaries.py

-- 
2.34.1
lib32 followup (visibility issue and/or toolchain bug?)
Posted by Andrew Cooper 1 month ago
Hello,

Preempting some future work I'm expecting to arrive, I had a go at using
__builtin_*() in obj32.

This is formed of 2 patches on top of this series:
https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32

Patch 1 introduces lib32 beside obj32, with strlen() being the first
broken-out function, and patch 2 swaps to __builtin_strlen().

Both compile, but the difference that patch 2 introduces was unexpected.

With just lib32, and taking strsubcmp() as an example, we get:

00000000 <strsubcmp>:
   0:        83 ec 0c                 sub    $0xc,%esp
   3:        89 5c 24 04              mov    %ebx,0x4(%esp)
   7:        89 74 24 08              mov    %esi,0x8(%esp)
   b:        89 c6                    mov    %eax,%esi
   d:        89 d3                    mov    %edx,%ebx
   f:        89 d0                    mov    %edx,%eax
  11:    /-- e8 fc ff ff ff           call   12 <strsubcmp+0x12>
            12: R_386_PC32    strlen
  16:        89 c1                    mov    %eax,%ecx
  18:        89 da                    mov    %ebx,%edx
  1a:        89 f0                    mov    %esi,%eax
  1c:    /-- e8 fc ff ff ff           call   1d <strsubcmp+0x1d>
            1d: R_386_PC32    .text.strncmp
  21:        8b 5c 24 04              mov    0x4(%esp),%ebx
  25:        8b 74 24 08              mov    0x8(%esp),%esi
  29:        83 c4 0c                 add    $0xc,%esp
  2c:        c3                       ret

which all seems fine.  We get a plain PC32 relocation to strlen (which
is now in the separate library).

However, with patch 2 in place (simply swapping the plain extern for
__builtin_strlen(), we now get:

00000000 <strsubcmp>:
   0:        83 ec 0c                 sub    $0xc,%esp
   3:        89 1c 24                 mov    %ebx,(%esp)
   6:        89 74 24 04              mov    %esi,0x4(%esp)
   a:        89 7c 24 08              mov    %edi,0x8(%esp)
   e:    /-- e8 fc ff ff ff           call   f <strsubcmp+0xf>
            f: R_386_PC32    __x86.get_pc_thunk.bx
  13:        81 c3 02 00 00 00        add    $0x2,%ebx
            15: R_386_GOTPC    _GLOBAL_OFFSET_TABLE_
  19:        89 c7                    mov    %eax,%edi
  1b:        89 d6                    mov    %edx,%esi
  1d:        89 d0                    mov    %edx,%eax
  1f:    /-- e8 fc ff ff ff           call   20 <strsubcmp+0x20>
            20: R_386_PLT32    strlen
  24:        89 c1                    mov    %eax,%ecx
  26:        89 f2                    mov    %esi,%edx
  28:        89 f8                    mov    %edi,%eax
  2a:    /-- e8 fc ff ff ff           call   2b <strsubcmp+0x2b>
            2b: R_386_PC32    .text.strncmp
  2f:        8b 1c 24                 mov    (%esp),%ebx
  32:        8b 74 24 04              mov    0x4(%esp),%esi
  36:        8b 7c 24 08              mov    0x8(%esp),%edi
  3a:        83 c4 0c                 add    $0xc,%esp
  3d:        c3                       ret


The builtin hasn't managed to optimise away the call to strlen (that's
fine).  But, we've ended up spilling %ebx to the stack, calculating the
location of the GOT and not using it.

So, as it stands, trying to use __builtin_strlen() results in worse code
generation.  One thing I noticed was that we're not passing
-fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
either.  We do have the pragma from compiler.h, so I'm out of visibility
ideas.

Anything else I've missed?

~Andrew

Re: lib32 followup (visibility issue and/or toolchain bug?)
Posted by Frediano Ziglio 1 month ago
On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> Hello,
>
> Preempting some future work I'm expecting to arrive, I had a go at using
> __builtin_*() in obj32.
>
> This is formed of 2 patches on top of this series:
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32
>

You are confident we'll have a lot of shared code to need an
additional "lib32" in the Makefile!
I would personally stick with obj32.
Note that file should be strlen.c, not strlen.32.c, otherwise you are
possibly going to pick up the general rule and not the one in the
Makefile (but maybe is what you wanted).

> Patch 1 introduces lib32 beside obj32, with strlen() being the first
> broken-out function, and patch 2 swaps to __builtin_strlen().
>
> Both compile, but the difference that patch 2 introduces was unexpected.
>
> With just lib32, and taking strsubcmp() as an example, we get:
>
> 00000000 <strsubcmp>:
>    0:        83 ec 0c                 sub    $0xc,%esp
>    3:        89 5c 24 04              mov    %ebx,0x4(%esp)
>    7:        89 74 24 08              mov    %esi,0x8(%esp)
>    b:        89 c6                    mov    %eax,%esi
>    d:        89 d3                    mov    %edx,%ebx
>    f:        89 d0                    mov    %edx,%eax
>   11:    /-- e8 fc ff ff ff           call   12 <strsubcmp+0x12>
>             12: R_386_PC32    strlen
>   16:        89 c1                    mov    %eax,%ecx
>   18:        89 da                    mov    %ebx,%edx
>   1a:        89 f0                    mov    %esi,%eax
>   1c:    /-- e8 fc ff ff ff           call   1d <strsubcmp+0x1d>
>             1d: R_386_PC32    .text.strncmp
>   21:        8b 5c 24 04              mov    0x4(%esp),%ebx
>   25:        8b 74 24 08              mov    0x8(%esp),%esi
>   29:        83 c4 0c                 add    $0xc,%esp
>   2c:        c3                       ret
>
> which all seems fine.  We get a plain PC32 relocation to strlen (which
> is now in the separate library).
>
> However, with patch 2 in place (simply swapping the plain extern for
> __builtin_strlen(), we now get:
>
> 00000000 <strsubcmp>:
>    0:        83 ec 0c                 sub    $0xc,%esp
>    3:        89 1c 24                 mov    %ebx,(%esp)
>    6:        89 74 24 04              mov    %esi,0x4(%esp)
>    a:        89 7c 24 08              mov    %edi,0x8(%esp)
>    e:    /-- e8 fc ff ff ff           call   f <strsubcmp+0xf>
>             f: R_386_PC32    __x86.get_pc_thunk.bx
>   13:        81 c3 02 00 00 00        add    $0x2,%ebx
>             15: R_386_GOTPC    _GLOBAL_OFFSET_TABLE_
>   19:        89 c7                    mov    %eax,%edi
>   1b:        89 d6                    mov    %edx,%esi
>   1d:        89 d0                    mov    %edx,%eax
>   1f:    /-- e8 fc ff ff ff           call   20 <strsubcmp+0x20>
>             20: R_386_PLT32    strlen

PLT means it not declared hidden, otherwise it would have used the
relative relocation.
Maybe

size_t strlen(const char *s);
#define strlen(s) __builtin_strlen(s)

xen/compiler.h is included, so all declaration should get the hidden
by default ? Or add __attribute__((visibility("hidden"))) explicitly.

>   24:        89 c1                    mov    %eax,%ecx
>   26:        89 f2                    mov    %esi,%edx
>   28:        89 f8                    mov    %edi,%eax
>   2a:    /-- e8 fc ff ff ff           call   2b <strsubcmp+0x2b>
>             2b: R_386_PC32    .text.strncmp
>   2f:        8b 1c 24                 mov    (%esp),%ebx
>   32:        8b 74 24 04              mov    0x4(%esp),%esi
>   36:        8b 7c 24 08              mov    0x8(%esp),%edi
>   3a:        83 c4 0c                 add    $0xc,%esp
>   3d:        c3                       ret
>
>
> The builtin hasn't managed to optimise away the call to strlen (that's
> fine).  But, we've ended up spilling %ebx to the stack, calculating the
> location of the GOT and not using it.
>

Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it.

> So, as it stands, trying to use __builtin_strlen() results in worse code
> generation.  One thing I noticed was that we're not passing
> -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
> either.  We do have the pragma from compiler.h, so I'm out of visibility
> ideas.
>

The -fvisibility=hidden should be overridden by the xen/compiler.h;
but should be overridden with hidden!
Maybe strlen is defined by default with another visibility?
If you generate the assembly, you should see if the strlen symbol gets
the .hidden bless or not.

> Anything else I've missed?
>

Coffee :-)

Frediano
Re: lib32 followup (visibility issue and/or toolchain bug?)
Posted by Jan Beulich 1 month ago
On 16.10.2024 16:53, Frediano Ziglio wrote:
> On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> Hello,
>>
>> Preempting some future work I'm expecting to arrive, I had a go at using
>> __builtin_*() in obj32.
>>
>> This is formed of 2 patches on top of this series:
>> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32
>>
> 
> You are confident we'll have a lot of shared code to need an
> additional "lib32" in the Makefile!
> I would personally stick with obj32.
> Note that file should be strlen.c, not strlen.32.c, otherwise you are
> possibly going to pick up the general rule and not the one in the
> Makefile (but maybe is what you wanted).
> 
>> Patch 1 introduces lib32 beside obj32, with strlen() being the first
>> broken-out function, and patch 2 swaps to __builtin_strlen().
>>
>> Both compile, but the difference that patch 2 introduces was unexpected.
>>
>> With just lib32, and taking strsubcmp() as an example, we get:
>>
>> 00000000 <strsubcmp>:
>>    0:        83 ec 0c                 sub    $0xc,%esp
>>    3:        89 5c 24 04              mov    %ebx,0x4(%esp)
>>    7:        89 74 24 08              mov    %esi,0x8(%esp)
>>    b:        89 c6                    mov    %eax,%esi
>>    d:        89 d3                    mov    %edx,%ebx
>>    f:        89 d0                    mov    %edx,%eax
>>   11:    /-- e8 fc ff ff ff           call   12 <strsubcmp+0x12>
>>             12: R_386_PC32    strlen
>>   16:        89 c1                    mov    %eax,%ecx
>>   18:        89 da                    mov    %ebx,%edx
>>   1a:        89 f0                    mov    %esi,%eax
>>   1c:    /-- e8 fc ff ff ff           call   1d <strsubcmp+0x1d>
>>             1d: R_386_PC32    .text.strncmp
>>   21:        8b 5c 24 04              mov    0x4(%esp),%ebx
>>   25:        8b 74 24 08              mov    0x8(%esp),%esi
>>   29:        83 c4 0c                 add    $0xc,%esp
>>   2c:        c3                       ret
>>
>> which all seems fine.  We get a plain PC32 relocation to strlen (which
>> is now in the separate library).
>>
>> However, with patch 2 in place (simply swapping the plain extern for
>> __builtin_strlen(), we now get:
>>
>> 00000000 <strsubcmp>:
>>    0:        83 ec 0c                 sub    $0xc,%esp
>>    3:        89 1c 24                 mov    %ebx,(%esp)
>>    6:        89 74 24 04              mov    %esi,0x4(%esp)
>>    a:        89 7c 24 08              mov    %edi,0x8(%esp)
>>    e:    /-- e8 fc ff ff ff           call   f <strsubcmp+0xf>
>>             f: R_386_PC32    __x86.get_pc_thunk.bx
>>   13:        81 c3 02 00 00 00        add    $0x2,%ebx
>>             15: R_386_GOTPC    _GLOBAL_OFFSET_TABLE_
>>   19:        89 c7                    mov    %eax,%edi
>>   1b:        89 d6                    mov    %edx,%esi
>>   1d:        89 d0                    mov    %edx,%eax
>>   1f:    /-- e8 fc ff ff ff           call   20 <strsubcmp+0x20>
>>             20: R_386_PLT32    strlen
> 
> PLT means it not declared hidden, otherwise it would have used the
> relative relocation.
> Maybe
> 
> size_t strlen(const char *s);
> #define strlen(s) __builtin_strlen(s)
> 
> xen/compiler.h is included, so all declaration should get the hidden
> by default ? Or add __attribute__((visibility("hidden"))) explicitly.
> 
>>   24:        89 c1                    mov    %eax,%ecx
>>   26:        89 f2                    mov    %esi,%edx
>>   28:        89 f8                    mov    %edi,%eax
>>   2a:    /-- e8 fc ff ff ff           call   2b <strsubcmp+0x2b>
>>             2b: R_386_PC32    .text.strncmp
>>   2f:        8b 1c 24                 mov    (%esp),%ebx
>>   32:        8b 74 24 04              mov    0x4(%esp),%esi
>>   36:        8b 7c 24 08              mov    0x8(%esp),%edi
>>   3a:        83 c4 0c                 add    $0xc,%esp
>>   3d:        c3                       ret
>>
>>
>> The builtin hasn't managed to optimise away the call to strlen (that's
>> fine).  But, we've ended up spilling %ebx to the stack, calculating the
>> location of the GOT and not using it.
> 
> Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it.

Yes, PIC PLT entries use %ebx.

>> So, as it stands, trying to use __builtin_strlen() results in worse code
>> generation.  One thing I noticed was that we're not passing
>> -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
>> either.  We do have the pragma from compiler.h, so I'm out of visibility
>> ideas.
> 
> The -fvisibility=hidden should be overridden by the xen/compiler.h;
> but should be overridden with hidden!
> Maybe strlen is defined by default with another visibility?
> If you generate the assembly, you should see if the strlen symbol gets
> the .hidden bless or not.

Only defined symbols have .hidden emitted for them, afaics. But that also
doesn't matter much, as it's the emission of the @plt on the calls which
requires the setting up of %ebx prior to those calls. (Arguably in 32-bit
code, where all addresses are reachable, "hidden" could serve as a hint
that calls don't need to go through the PLT. This maybe considered a
missed optimization opportunity.) There's no difference there between
calling strlen() or e.g. a custom (extern) StrLen().

And btw, reloc.c and cmdline.c currently also compile to code using
__x86.get_pc_thunk.* (and @gotoff relocations for data references).

Jan

Re: lib32 followup (visibility issue and/or toolchain bug?)
Posted by Frediano Ziglio 1 month ago
On Wed, Oct 16, 2024 at 3:53 PM Frediano Ziglio
<frediano.ziglio@cloud.com> wrote:
>
> On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >
> > Hello,
> >
> > Preempting some future work I'm expecting to arrive, I had a go at using
> > __builtin_*() in obj32.
> >
> > This is formed of 2 patches on top of this series:
> > https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32
> >
>
> You are confident we'll have a lot of shared code to need an
> additional "lib32" in the Makefile!
> I would personally stick with obj32.
> Note that file should be strlen.c, not strlen.32.c, otherwise you are
> possibly going to pick up the general rule and not the one in the
> Makefile (but maybe is what you wanted).
>
> > Patch 1 introduces lib32 beside obj32, with strlen() being the first
> > broken-out function, and patch 2 swaps to __builtin_strlen().
> >
> > Both compile, but the difference that patch 2 introduces was unexpected.
> >
> > With just lib32, and taking strsubcmp() as an example, we get:
> >
> > 00000000 <strsubcmp>:
> >    0:        83 ec 0c                 sub    $0xc,%esp
> >    3:        89 5c 24 04              mov    %ebx,0x4(%esp)
> >    7:        89 74 24 08              mov    %esi,0x8(%esp)
> >    b:        89 c6                    mov    %eax,%esi
> >    d:        89 d3                    mov    %edx,%ebx
> >    f:        89 d0                    mov    %edx,%eax
> >   11:    /-- e8 fc ff ff ff           call   12 <strsubcmp+0x12>
> >             12: R_386_PC32    strlen
> >   16:        89 c1                    mov    %eax,%ecx
> >   18:        89 da                    mov    %ebx,%edx
> >   1a:        89 f0                    mov    %esi,%eax
> >   1c:    /-- e8 fc ff ff ff           call   1d <strsubcmp+0x1d>
> >             1d: R_386_PC32    .text.strncmp
> >   21:        8b 5c 24 04              mov    0x4(%esp),%ebx
> >   25:        8b 74 24 08              mov    0x8(%esp),%esi
> >   29:        83 c4 0c                 add    $0xc,%esp
> >   2c:        c3                       ret
> >
> > which all seems fine.  We get a plain PC32 relocation to strlen (which
> > is now in the separate library).
> >
> > However, with patch 2 in place (simply swapping the plain extern for
> > __builtin_strlen(), we now get:
> >
> > 00000000 <strsubcmp>:
> >    0:        83 ec 0c                 sub    $0xc,%esp
> >    3:        89 1c 24                 mov    %ebx,(%esp)
> >    6:        89 74 24 04              mov    %esi,0x4(%esp)
> >    a:        89 7c 24 08              mov    %edi,0x8(%esp)
> >    e:    /-- e8 fc ff ff ff           call   f <strsubcmp+0xf>
> >             f: R_386_PC32    __x86.get_pc_thunk.bx
> >   13:        81 c3 02 00 00 00        add    $0x2,%ebx
> >             15: R_386_GOTPC    _GLOBAL_OFFSET_TABLE_
> >   19:        89 c7                    mov    %eax,%edi
> >   1b:        89 d6                    mov    %edx,%esi
> >   1d:        89 d0                    mov    %edx,%eax
> >   1f:    /-- e8 fc ff ff ff           call   20 <strsubcmp+0x20>
> >             20: R_386_PLT32    strlen
>
> PLT means it not declared hidden, otherwise it would have used the
> relative relocation.
> Maybe
>
> size_t strlen(const char *s);
> #define strlen(s) __builtin_strlen(s)
>
> xen/compiler.h is included, so all declaration should get the hidden
> by default ? Or add __attribute__((visibility("hidden"))) explicitly.
>
> >   24:        89 c1                    mov    %eax,%ecx
> >   26:        89 f2                    mov    %esi,%edx
> >   28:        89 f8                    mov    %edi,%eax
> >   2a:    /-- e8 fc ff ff ff           call   2b <strsubcmp+0x2b>
> >             2b: R_386_PC32    .text.strncmp
> >   2f:        8b 1c 24                 mov    (%esp),%ebx
> >   32:        8b 74 24 04              mov    0x4(%esp),%esi
> >   36:        8b 7c 24 08              mov    0x8(%esp),%edi
> >   3a:        83 c4 0c                 add    $0xc,%esp
> >   3d:        c3                       ret
> >
> >
> > The builtin hasn't managed to optimise away the call to strlen (that's
> > fine).  But, we've ended up spilling %ebx to the stack, calculating the
> > location of the GOT and not using it.
> >
>
> Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it.
>
> > So, as it stands, trying to use __builtin_strlen() results in worse code
> > generation.  One thing I noticed was that we're not passing
> > -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help
> > either.  We do have the pragma from compiler.h, so I'm out of visibility
> > ideas.
> >
>
> The -fvisibility=hidden should be overridden by the xen/compiler.h;
> but should be overridden with hidden!
> Maybe strlen is defined by default with another visibility?
> If you generate the assembly, you should see if the strlen symbol gets
> the .hidden bless or not.
>

I did some more experiments, but I didn't manage to fix the issue. It
looks like when __builtin_strlen fallback to calling strlen it uses a
private declaration of strlen. -mregparam argument is taken into
account, but not visibility. I tried to declare strlen as hidden
(checking also preprocessor output to see if other declaration were
present, none found), still the @PLT. I tried to add a "static inline
strlen"... and it was not used.

I found this workaround:

diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c
index 80ffd0885e..ac2b0b0a4d 100644
--- a/xen/arch/x86/boot/cmdline.c
+++ b/xen/arch/x86/boot/cmdline.c
@@ -51,7 +51,12 @@ static const char delim_chars_comma[] = ", \n\r\t";

#define delim_chars    (delim_chars_comma + 1)

-#define strlen(s) __builtin_strlen(s)
+size_t strlen(const char *s);
+#define strlen(str) \
+  (__extension__ (__builtin_constant_p(str) \
+                 ? __builtin_strlen(str)    \
+                 : strlen(str)))
+

static int strncmp(const char *cs, const char *ct, size_t count)
{
diff --git a/xen/arch/x86/boot/strlen.32.c b/xen/arch/x86/boot/strlen.c
similarity index 100%
rename from xen/arch/x86/boot/strlen.32.c
rename to xen/arch/x86/boot/strlen.c

Frediano