xen/arch/x86/boot/.gitignore | 5 +- xen/arch/x86/boot/Makefile | 51 +++- .../x86/boot/{build32.lds => build32.lds.S} | 41 +++- xen/arch/x86/boot/cmdline.c | 12 - xen/arch/x86/boot/head.S | 49 +--- xen/arch/x86/boot/reloc-trampoline.c | 36 +++ xen/arch/x86/boot/reloc.c | 35 +-- xen/arch/x86/efi/efi-boot.h | 15 +- xen/tools/combine_two_binaries.py | 218 ++++++++++++++++++ 9 files changed, 348 insertions(+), 114 deletions(-) rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (63%) create mode 100644 xen/arch/x86/boot/reloc-trampoline.c create mode 100755 xen/tools/combine_two_binaries.py
This series attempt to: - use more C code, that is replace some assembly code with C; - avoid some code duplication between C and assembly; - prevent some issues having relocations in C code. The idea is extending the current C to binary code conversion done for 32 bit C code called from head.S making sure relocations are safe and allowing external symbols usage from C code. More details of the implementation are in commit message 1/5, which is the largest patch. Patch 2/5 reuses code to relocate the trampoline between 32 and 64 bit. Patches 3/5 and 4/5 move some code from assembly to C. Other RFC commits were excluded, I didn't manage to entangle the issues sharing headers between 32 and 64 bits. Since RFC code was more tested, also with CI and some incompatibility were fixed. On that it's weird that we need Python 3.8 for Qemu but we still use Python 2 for Xen. Shouldn't we bump requirement to Python 3 even for Xen? Code boots successfully using: - BIOS boot; - EFI boot with Grub2 and ELF file; - direct EFI boot without Grub. Code is currently based on "master" branch, currently commit 3f6ee3db2e878398cfcde725399b4d1b04e92269. Changes since v1: - 2 preliminary commits for adjust .gitignore; - last commit split (one variable at a time); - lot of style and names changes; - first commit, now 3/6 had some build changes, more details on commit message. Changes since v2: - removed merged commits; - reverted order of 2 commits; - remove some useless casts; - added change to comment. Changes since v3: - added a preparation commit for Makefiles (mainly written by Andrew Cooper); - added a comment improvement commit; - allows also data; - other minor style changes; - added some Reviewed-by. Changes since v4: - add build32.final.lds build32.other.lds to targets macro; - place some comments over a rule, not inside; - simplified linking and producing binary rule; - renamed built_in_32 to built-in-32, coding style; - fix minor indentation; - put magic numbers in Makefile and propagate them; - minor variable clanups in Python script; - add dependency to Python script. Frediano Ziglio (5): x86/boot: create a C bundle for 32 bit boot code and use it x86/boot: Reuse code to relocate trampoline x86/boot: Use boot_vid_info variable directly from C code x86/boot: Use trampoline_phys variable directly from C code x86/boot: Clarify comment xen/arch/x86/boot/.gitignore | 5 +- xen/arch/x86/boot/Makefile | 51 +++- .../x86/boot/{build32.lds => build32.lds.S} | 41 +++- xen/arch/x86/boot/cmdline.c | 12 - xen/arch/x86/boot/head.S | 49 +--- xen/arch/x86/boot/reloc-trampoline.c | 36 +++ xen/arch/x86/boot/reloc.c | 35 +-- xen/arch/x86/efi/efi-boot.h | 15 +- xen/tools/combine_two_binaries.py | 218 ++++++++++++++++++ 9 files changed, 348 insertions(+), 114 deletions(-) rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (63%) create mode 100644 xen/arch/x86/boot/reloc-trampoline.c create mode 100755 xen/tools/combine_two_binaries.py -- 2.34.1
Hello, Preempting some future work I'm expecting to arrive, I had a go at using __builtin_*() in obj32. This is formed of 2 patches on top of this series: https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32 Patch 1 introduces lib32 beside obj32, with strlen() being the first broken-out function, and patch 2 swaps to __builtin_strlen(). Both compile, but the difference that patch 2 introduces was unexpected. With just lib32, and taking strsubcmp() as an example, we get: 00000000 <strsubcmp>: 0: 83 ec 0c sub $0xc,%esp 3: 89 5c 24 04 mov %ebx,0x4(%esp) 7: 89 74 24 08 mov %esi,0x8(%esp) b: 89 c6 mov %eax,%esi d: 89 d3 mov %edx,%ebx f: 89 d0 mov %edx,%eax 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12> 12: R_386_PC32 strlen 16: 89 c1 mov %eax,%ecx 18: 89 da mov %ebx,%edx 1a: 89 f0 mov %esi,%eax 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d> 1d: R_386_PC32 .text.strncmp 21: 8b 5c 24 04 mov 0x4(%esp),%ebx 25: 8b 74 24 08 mov 0x8(%esp),%esi 29: 83 c4 0c add $0xc,%esp 2c: c3 ret which all seems fine. We get a plain PC32 relocation to strlen (which is now in the separate library). However, with patch 2 in place (simply swapping the plain extern for __builtin_strlen(), we now get: 00000000 <strsubcmp>: 0: 83 ec 0c sub $0xc,%esp 3: 89 1c 24 mov %ebx,(%esp) 6: 89 74 24 04 mov %esi,0x4(%esp) a: 89 7c 24 08 mov %edi,0x8(%esp) e: /-- e8 fc ff ff ff call f <strsubcmp+0xf> f: R_386_PC32 __x86.get_pc_thunk.bx 13: 81 c3 02 00 00 00 add $0x2,%ebx 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ 19: 89 c7 mov %eax,%edi 1b: 89 d6 mov %edx,%esi 1d: 89 d0 mov %edx,%eax 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20> 20: R_386_PLT32 strlen 24: 89 c1 mov %eax,%ecx 26: 89 f2 mov %esi,%edx 28: 89 f8 mov %edi,%eax 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b> 2b: R_386_PC32 .text.strncmp 2f: 8b 1c 24 mov (%esp),%ebx 32: 8b 74 24 04 mov 0x4(%esp),%esi 36: 8b 7c 24 08 mov 0x8(%esp),%edi 3a: 83 c4 0c add $0xc,%esp 3d: c3 ret The builtin hasn't managed to optimise away the call to strlen (that's fine). But, we've ended up spilling %ebx to the stack, calculating the location of the GOT and not using it. So, as it stands, trying to use __builtin_strlen() results in worse code generation. One thing I noticed was that we're not passing -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help either. We do have the pragma from compiler.h, so I'm out of visibility ideas. Anything else I've missed? ~Andrew
On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > Hello, > > Preempting some future work I'm expecting to arrive, I had a go at using > __builtin_*() in obj32. > > This is formed of 2 patches on top of this series: > https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32 > You are confident we'll have a lot of shared code to need an additional "lib32" in the Makefile! I would personally stick with obj32. Note that file should be strlen.c, not strlen.32.c, otherwise you are possibly going to pick up the general rule and not the one in the Makefile (but maybe is what you wanted). > Patch 1 introduces lib32 beside obj32, with strlen() being the first > broken-out function, and patch 2 swaps to __builtin_strlen(). > > Both compile, but the difference that patch 2 introduces was unexpected. > > With just lib32, and taking strsubcmp() as an example, we get: > > 00000000 <strsubcmp>: > 0: 83 ec 0c sub $0xc,%esp > 3: 89 5c 24 04 mov %ebx,0x4(%esp) > 7: 89 74 24 08 mov %esi,0x8(%esp) > b: 89 c6 mov %eax,%esi > d: 89 d3 mov %edx,%ebx > f: 89 d0 mov %edx,%eax > 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12> > 12: R_386_PC32 strlen > 16: 89 c1 mov %eax,%ecx > 18: 89 da mov %ebx,%edx > 1a: 89 f0 mov %esi,%eax > 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d> > 1d: R_386_PC32 .text.strncmp > 21: 8b 5c 24 04 mov 0x4(%esp),%ebx > 25: 8b 74 24 08 mov 0x8(%esp),%esi > 29: 83 c4 0c add $0xc,%esp > 2c: c3 ret > > which all seems fine. We get a plain PC32 relocation to strlen (which > is now in the separate library). > > However, with patch 2 in place (simply swapping the plain extern for > __builtin_strlen(), we now get: > > 00000000 <strsubcmp>: > 0: 83 ec 0c sub $0xc,%esp > 3: 89 1c 24 mov %ebx,(%esp) > 6: 89 74 24 04 mov %esi,0x4(%esp) > a: 89 7c 24 08 mov %edi,0x8(%esp) > e: /-- e8 fc ff ff ff call f <strsubcmp+0xf> > f: R_386_PC32 __x86.get_pc_thunk.bx > 13: 81 c3 02 00 00 00 add $0x2,%ebx > 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ > 19: 89 c7 mov %eax,%edi > 1b: 89 d6 mov %edx,%esi > 1d: 89 d0 mov %edx,%eax > 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20> > 20: R_386_PLT32 strlen PLT means it not declared hidden, otherwise it would have used the relative relocation. Maybe size_t strlen(const char *s); #define strlen(s) __builtin_strlen(s) xen/compiler.h is included, so all declaration should get the hidden by default ? Or add __attribute__((visibility("hidden"))) explicitly. > 24: 89 c1 mov %eax,%ecx > 26: 89 f2 mov %esi,%edx > 28: 89 f8 mov %edi,%eax > 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b> > 2b: R_386_PC32 .text.strncmp > 2f: 8b 1c 24 mov (%esp),%ebx > 32: 8b 74 24 04 mov 0x4(%esp),%esi > 36: 8b 7c 24 08 mov 0x8(%esp),%edi > 3a: 83 c4 0c add $0xc,%esp > 3d: c3 ret > > > The builtin hasn't managed to optimise away the call to strlen (that's > fine). But, we've ended up spilling %ebx to the stack, calculating the > location of the GOT and not using it. > Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it. > So, as it stands, trying to use __builtin_strlen() results in worse code > generation. One thing I noticed was that we're not passing > -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help > either. We do have the pragma from compiler.h, so I'm out of visibility > ideas. > The -fvisibility=hidden should be overridden by the xen/compiler.h; but should be overridden with hidden! Maybe strlen is defined by default with another visibility? If you generate the assembly, you should see if the strlen symbol gets the .hidden bless or not. > Anything else I've missed? > Coffee :-) Frediano
On 16.10.2024 16:53, Frediano Ziglio wrote: > On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> >> Hello, >> >> Preempting some future work I'm expecting to arrive, I had a go at using >> __builtin_*() in obj32. >> >> This is formed of 2 patches on top of this series: >> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32 >> > > You are confident we'll have a lot of shared code to need an > additional "lib32" in the Makefile! > I would personally stick with obj32. > Note that file should be strlen.c, not strlen.32.c, otherwise you are > possibly going to pick up the general rule and not the one in the > Makefile (but maybe is what you wanted). > >> Patch 1 introduces lib32 beside obj32, with strlen() being the first >> broken-out function, and patch 2 swaps to __builtin_strlen(). >> >> Both compile, but the difference that patch 2 introduces was unexpected. >> >> With just lib32, and taking strsubcmp() as an example, we get: >> >> 00000000 <strsubcmp>: >> 0: 83 ec 0c sub $0xc,%esp >> 3: 89 5c 24 04 mov %ebx,0x4(%esp) >> 7: 89 74 24 08 mov %esi,0x8(%esp) >> b: 89 c6 mov %eax,%esi >> d: 89 d3 mov %edx,%ebx >> f: 89 d0 mov %edx,%eax >> 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12> >> 12: R_386_PC32 strlen >> 16: 89 c1 mov %eax,%ecx >> 18: 89 da mov %ebx,%edx >> 1a: 89 f0 mov %esi,%eax >> 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d> >> 1d: R_386_PC32 .text.strncmp >> 21: 8b 5c 24 04 mov 0x4(%esp),%ebx >> 25: 8b 74 24 08 mov 0x8(%esp),%esi >> 29: 83 c4 0c add $0xc,%esp >> 2c: c3 ret >> >> which all seems fine. We get a plain PC32 relocation to strlen (which >> is now in the separate library). >> >> However, with patch 2 in place (simply swapping the plain extern for >> __builtin_strlen(), we now get: >> >> 00000000 <strsubcmp>: >> 0: 83 ec 0c sub $0xc,%esp >> 3: 89 1c 24 mov %ebx,(%esp) >> 6: 89 74 24 04 mov %esi,0x4(%esp) >> a: 89 7c 24 08 mov %edi,0x8(%esp) >> e: /-- e8 fc ff ff ff call f <strsubcmp+0xf> >> f: R_386_PC32 __x86.get_pc_thunk.bx >> 13: 81 c3 02 00 00 00 add $0x2,%ebx >> 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ >> 19: 89 c7 mov %eax,%edi >> 1b: 89 d6 mov %edx,%esi >> 1d: 89 d0 mov %edx,%eax >> 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20> >> 20: R_386_PLT32 strlen > > PLT means it not declared hidden, otherwise it would have used the > relative relocation. > Maybe > > size_t strlen(const char *s); > #define strlen(s) __builtin_strlen(s) > > xen/compiler.h is included, so all declaration should get the hidden > by default ? Or add __attribute__((visibility("hidden"))) explicitly. > >> 24: 89 c1 mov %eax,%ecx >> 26: 89 f2 mov %esi,%edx >> 28: 89 f8 mov %edi,%eax >> 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b> >> 2b: R_386_PC32 .text.strncmp >> 2f: 8b 1c 24 mov (%esp),%ebx >> 32: 8b 74 24 04 mov 0x4(%esp),%esi >> 36: 8b 7c 24 08 mov 0x8(%esp),%edi >> 3a: 83 c4 0c add $0xc,%esp >> 3d: c3 ret >> >> >> The builtin hasn't managed to optimise away the call to strlen (that's >> fine). But, we've ended up spilling %ebx to the stack, calculating the >> location of the GOT and not using it. > > Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it. Yes, PIC PLT entries use %ebx. >> So, as it stands, trying to use __builtin_strlen() results in worse code >> generation. One thing I noticed was that we're not passing >> -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help >> either. We do have the pragma from compiler.h, so I'm out of visibility >> ideas. > > The -fvisibility=hidden should be overridden by the xen/compiler.h; > but should be overridden with hidden! > Maybe strlen is defined by default with another visibility? > If you generate the assembly, you should see if the strlen symbol gets > the .hidden bless or not. Only defined symbols have .hidden emitted for them, afaics. But that also doesn't matter much, as it's the emission of the @plt on the calls which requires the setting up of %ebx prior to those calls. (Arguably in 32-bit code, where all addresses are reachable, "hidden" could serve as a hint that calls don't need to go through the PLT. This maybe considered a missed optimization opportunity.) There's no difference there between calling strlen() or e.g. a custom (extern) StrLen(). And btw, reloc.c and cmdline.c currently also compile to code using __x86.get_pc_thunk.* (and @gotoff relocations for data references). Jan
On Wed, Oct 16, 2024 at 3:53 PM Frediano Ziglio <frediano.ziglio@cloud.com> wrote: > > On Wed, Oct 16, 2024 at 2:30 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > > > Hello, > > > > Preempting some future work I'm expecting to arrive, I had a go at using > > __builtin_*() in obj32. > > > > This is formed of 2 patches on top of this series: > > https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-lib32 > > > > You are confident we'll have a lot of shared code to need an > additional "lib32" in the Makefile! > I would personally stick with obj32. > Note that file should be strlen.c, not strlen.32.c, otherwise you are > possibly going to pick up the general rule and not the one in the > Makefile (but maybe is what you wanted). > > > Patch 1 introduces lib32 beside obj32, with strlen() being the first > > broken-out function, and patch 2 swaps to __builtin_strlen(). > > > > Both compile, but the difference that patch 2 introduces was unexpected. > > > > With just lib32, and taking strsubcmp() as an example, we get: > > > > 00000000 <strsubcmp>: > > 0: 83 ec 0c sub $0xc,%esp > > 3: 89 5c 24 04 mov %ebx,0x4(%esp) > > 7: 89 74 24 08 mov %esi,0x8(%esp) > > b: 89 c6 mov %eax,%esi > > d: 89 d3 mov %edx,%ebx > > f: 89 d0 mov %edx,%eax > > 11: /-- e8 fc ff ff ff call 12 <strsubcmp+0x12> > > 12: R_386_PC32 strlen > > 16: 89 c1 mov %eax,%ecx > > 18: 89 da mov %ebx,%edx > > 1a: 89 f0 mov %esi,%eax > > 1c: /-- e8 fc ff ff ff call 1d <strsubcmp+0x1d> > > 1d: R_386_PC32 .text.strncmp > > 21: 8b 5c 24 04 mov 0x4(%esp),%ebx > > 25: 8b 74 24 08 mov 0x8(%esp),%esi > > 29: 83 c4 0c add $0xc,%esp > > 2c: c3 ret > > > > which all seems fine. We get a plain PC32 relocation to strlen (which > > is now in the separate library). > > > > However, with patch 2 in place (simply swapping the plain extern for > > __builtin_strlen(), we now get: > > > > 00000000 <strsubcmp>: > > 0: 83 ec 0c sub $0xc,%esp > > 3: 89 1c 24 mov %ebx,(%esp) > > 6: 89 74 24 04 mov %esi,0x4(%esp) > > a: 89 7c 24 08 mov %edi,0x8(%esp) > > e: /-- e8 fc ff ff ff call f <strsubcmp+0xf> > > f: R_386_PC32 __x86.get_pc_thunk.bx > > 13: 81 c3 02 00 00 00 add $0x2,%ebx > > 15: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ > > 19: 89 c7 mov %eax,%edi > > 1b: 89 d6 mov %edx,%esi > > 1d: 89 d0 mov %edx,%eax > > 1f: /-- e8 fc ff ff ff call 20 <strsubcmp+0x20> > > 20: R_386_PLT32 strlen > > PLT means it not declared hidden, otherwise it would have used the > relative relocation. > Maybe > > size_t strlen(const char *s); > #define strlen(s) __builtin_strlen(s) > > xen/compiler.h is included, so all declaration should get the hidden > by default ? Or add __attribute__((visibility("hidden"))) explicitly. > > > 24: 89 c1 mov %eax,%ecx > > 26: 89 f2 mov %esi,%edx > > 28: 89 f8 mov %edi,%eax > > 2a: /-- e8 fc ff ff ff call 2b <strsubcmp+0x2b> > > 2b: R_386_PC32 .text.strncmp > > 2f: 8b 1c 24 mov (%esp),%ebx > > 32: 8b 74 24 04 mov 0x4(%esp),%esi > > 36: 8b 7c 24 08 mov 0x8(%esp),%edi > > 3a: 83 c4 0c add $0xc,%esp > > 3d: c3 ret > > > > > > The builtin hasn't managed to optimise away the call to strlen (that's > > fine). But, we've ended up spilling %ebx to the stack, calculating the > > location of the GOT and not using it. > > > > Maybe the ABI for PLT is to have %ebx set to the GOT ? Not sure about it. > > > So, as it stands, trying to use __builtin_strlen() results in worse code > > generation. One thing I noticed was that we're not passing > > -fvisibility=hidden into CFLAGS_x86_32, but fixing that doesn't help > > either. We do have the pragma from compiler.h, so I'm out of visibility > > ideas. > > > > The -fvisibility=hidden should be overridden by the xen/compiler.h; > but should be overridden with hidden! > Maybe strlen is defined by default with another visibility? > If you generate the assembly, you should see if the strlen symbol gets > the .hidden bless or not. > I did some more experiments, but I didn't manage to fix the issue. It looks like when __builtin_strlen fallback to calling strlen it uses a private declaration of strlen. -mregparam argument is taken into account, but not visibility. I tried to declare strlen as hidden (checking also preprocessor output to see if other declaration were present, none found), still the @PLT. I tried to add a "static inline strlen"... and it was not used. I found this workaround: diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c index 80ffd0885e..ac2b0b0a4d 100644 --- a/xen/arch/x86/boot/cmdline.c +++ b/xen/arch/x86/boot/cmdline.c @@ -51,7 +51,12 @@ static const char delim_chars_comma[] = ", \n\r\t"; #define delim_chars (delim_chars_comma + 1) -#define strlen(s) __builtin_strlen(s) +size_t strlen(const char *s); +#define strlen(str) \ + (__extension__ (__builtin_constant_p(str) \ + ? __builtin_strlen(str) \ + : strlen(str))) + static int strncmp(const char *cs, const char *ct, size_t count) { diff --git a/xen/arch/x86/boot/strlen.32.c b/xen/arch/x86/boot/strlen.c similarity index 100% rename from xen/arch/x86/boot/strlen.32.c rename to xen/arch/x86/boot/strlen.c Frediano
© 2016 - 2024 Red Hat, Inc.