From nobody Thu May 2 00:10:33 2024 Delivered-To: importer@patchew.org Received-SPF: none (zoho.com: 198.145.21.10 is neither permitted nor denied by domain of lists.01.org) client-ip=198.145.21.10; envelope-from=edk2-devel-bounces@lists.01.org; helo=ml01.01.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 198.145.21.10 is neither permitted nor denied by domain of lists.01.org) smtp.mailfrom=edk2-devel-bounces@lists.01.org Return-Path: Received: from ml01.01.org (ml01.01.org [198.145.21.10]) by mx.zohomail.com with SMTPS id 1502411678635604.348876641948; Thu, 10 Aug 2017 17:34:38 -0700 (PDT) Received: from [127.0.0.1] (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 115AC21E2572C; Thu, 10 Aug 2017 17:32:17 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 70E43208F7A18 for ; Thu, 10 Aug 2017 17:32:15 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CE54B8047F; Fri, 11 Aug 2017 00:34:34 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-96.phx2.redhat.com [10.3.116.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1C432600C0; Fri, 11 Aug 2017 00:34:32 +0000 (UTC) X-Original-To: edk2-devel@lists.01.org DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com CE54B8047F Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lersek@redhat.com From: Laszlo Ersek To: edk2-devel-01 Date: Fri, 11 Aug 2017 02:34:26 +0200 Message-Id: <20170811003426.2332-2-lersek@redhat.com> In-Reply-To: <20170811003426.2332-1-lersek@redhat.com> References: <20170811003426.2332-1-lersek@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Fri, 11 Aug 2017 00:34:35 +0000 (UTC) Subject: [edk2] [PATCH 1/1] BaseTools/tools_def.template: revert to large code model for X64/GCC5/LTO X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ard Biesheuvel , Jordan Justen , Liming Gao , Michael Kinney MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Errors-To: edk2-devel-bounces@lists.01.org Sender: "edk2-devel" X-ZohoMail: RSF_4 Z_629925259 SPT_0 Content-Type: text/plain; charset="utf-8" Several users have recently reported boot failures with OVMF. The symptoms are: blank screen and all VCPUs pegged at 100%. Alex Williamson has found the commit (via bisection) that exposes the issue. In this patch I'll attempt to analyze the symptoms and fix the root problem. (1) "UefiCpuPkg/Library/MpInitLib" is an IA32/X64 library instance that provides multiprocessing services. It is built into the CpuMpPei module (for the PEI phase) and the CpuDxe module (for the DXE and BDS phases). When either of these modules starts up during boot, it briefly boots up all of the CPUs, perfoms some counting / synchronization, and then all the APs (application processors) are quiesced until further work arrives. The APs boot in real mode, and need assembly language init code (a "reset vector") that gradually takes them into 64-bit mode, before they can call back into normal C code. The reset vector code is assembled at build time, but it is copied as data under 1MB (for real mode execution by the APs) during boot, whenever CpuMpPei or CpuDxe launches. Part of the reset vector code is FPU initialization. The reset vector in "UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm" calls the assembly language function InitializeFloatingPointUnits(), which is however provided by an external library. Normally, when the object files were linked together, this function call would result in an absolute reference (and matching relocation, performed at module loading), and the instance of the reset vector that was copied under 1MB during boot would refer to the same, unchanged, absolute address of InitializeFloatingPointUnits(), residing in CpuMpPei or CpuDxe. This didn't work with X64 XCODE5 builds however. XCODE5 prefers RIP-relative addressing for X64 (i.e., the assembly language function call depended on the distance between the call site and the callee). When the call site was copied under 1MB but InitializeFloatingPointUnits() would not move, the call broke. This was tracked in TianoCore BZ#565 and fixed in commit 3b2928b46987 ("UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues", 2017-05-17). The solution was to add another function pointer to the MP_CPU_EXCHANGE_INFO communication area. (The function pointer would have size 4 in Ia32 builds, and size 8 in X64 builds.) MpInitLib would assign the absolute address of InitializeFloatingPointUnits() to the new field, and the APs would retrieve it -- matching the pointer size correctly --, and call it. The C assignment can be seen in the following hunk (from commit 3b2928b46987): > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/Mp= InitLib/MpLib.c > index 407c44c95e62..735e099b32f2 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > @@ -751,6 +751,8 @@ FillExchangeInfoData ( > > ExchangeInfo->EnableExecuteDisable =3D IsBspExecuteDisableEnabled (); > > + ExchangeInfo->InitializeFloatingPointUnitsAddress =3D (UINTN)Initializ= eFloatingPointUnits; > + > // > // Get the BSP's data of GDT and IDT > // This moved the relocation of the InitializeFloatingPointUnits() reference (or, with XCODE5, the RIP-relative InitializeFloatingPointUnits() reference) from the assembly code (which is copied around) to the C code (which is not copied around). (2) Unfortunately, the C-language assignment to "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is miscompiled under the following circumstances: - the edk2 build target is DEBUG (or RELEASE), - the edk2 toolchain selection is GCC5, - (from the above two it follows that -Os and LTO are enabled,) - MpInitLib is built for X64, - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.). (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.) Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show the correct assembly code generated by gcc for the assignment, and flip the build target to DEBUG for showing the broken assembly code. (3) Correct code for the assignment: > 5406: 48 8d 15 73 24 00 00 lea 0x2473(%rip),%rdx #= 7880 > 540d: 48 8b 45 f8 mov -0x8(%rbp),%rax > 5411: 48 89 90 84 00 00 00 mov %rdx,0x84(%rax) Here the absolute address of InitializeFloatingPointUnits() is loaded into RDX with RIP-relative addressing. Then RAX is pointed to the ExchangeInfo structure, and finally RDX is stored into ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is 0x84 in the structure). Here the only relocation happens when CpuMpPei is linked. The distance between the call site and the callee is calculated and encoded in the LEA instruction. The same distance is valid when CpuMpPei runs, so the PEI core doesn't have to relocate the LEA instruction when it loads CpuMpPei. (4) Incorrect code for the assignment: > 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx > ... > 2d7b: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp) > ... > 0000000000005860 : > 5860: 9b db e3 finit This is not RIP-relative addressing. Therefore gcc generates a relocation record for the address encoded in the MOV instruction, at offset 0x2d69+0x3 -- initial value being 0x5860: > Sections: > Idx Name Size VMA LMA File off = Algn > 0 .text 00006e66 0000000000000240 0000000000000240 000000c0 = 2**6 > CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE > ... > RELOCATION RECORDS FOR [.text]: > OFFSET TYPE VALUE > ... > 0000000000002b2c R_X86_64_32S InitializeFloatingPointUnits (The offset of the address to relocate is 0x2d69+3 =3D=3D 0x2d6c, which equals the .text section base plus the relocation offset, i.e., 0x240+0x2b2c.) This is the only R_X86_64_32S relocation record in the linked-together CpuMpPei.debug file (still an ELF file). In the rest of the build process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW). The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon first when CpuMpPei.efi is built into the PEIFV volume, since CpuMpPei.efi could theoretically execute from flash, and PEIFV has a nonzero base address. (However, CpuMpPei has a DEPEX on permanent PEI RAM discovery, so the PEI core will only launch it from permanent PEI RAM, in practice.) The EFI_IMAGE_REL_BASED_HIGHLOW PE/COFF relocation is acted upon for the second time when the PEI core loads CpuMpPei into permanent PEI RAM, and performs the relocations. I confirmed (with debug messages) that the relocated address (within the instruction) is correct: > PeiCore:PeCoffLoaderRelocateImage: Fixup32=3DBFF25D6C FixupData=3D0 > PeiCore:PeCoffLoaderRelocateImage: *Fixup32=3D0x8487A0 Adjust=3D0xBF6E00C0 > Loading PEIM at 0x000BFF23000 EntryPoint=3D0x000BFF271BB CpuMpPei.efi The "Fixup32" pointer points to the address constant that should be relocated (0xBFF25D6C - 0xBFF23000 =3D=3D 0x2D6C). The initial value of (*Fixup32) is 0x8487A0, which comes from the composition of PEIFV that I described above (as "first relocation"). After incrementing (*Fixup32) by "Adjust", the final value is 0xBFF28860. This is the absolute address that is patched into the MOV instruction that sets RDX for the assignment. This value is correct; the InitializeFloatingPointUnits() function really exists in CpuMpPei at this address, in permanent PEI RAM. It is at an offset of 0x5860 from the load address 0x000BFF23000 (see third line). So where do things go wrong? Again, the MOV instruction itself is wrong. I modified the MpInitLib C code to print the value of the field right after the assignment, and I got: > AP Loop Mode is 1 > WakeupBufferStart =3D 9F000, WakeupBufferSize =3D 1000 > CpuMpPei:FillExchangeInfoData: InitializeFloatingPointUnits @ 0xFFFFFFFFB= FF28860 The APs are ejected into outer space when they try to init their FPUs -- there is nothing at 0xFFFFFFFF_BFF28860; the assignment should store 0x00000000_BFF28860! (5) So open the Intel SDM, and decode the instruction by hand: > 2d69: 48 c7 c2 60 58 00 00 mov $0x5860,%rdx 0x48 stands for the REX.W prefix (0x40 for REX, plus bit 3 for .W). Looking up REX.W + 0xc7 (the next byte) under the MOV specification, we get: - opcode: REX.W + C7 /0 - instruction: MOV r/m64, imm32 - Description: Move imm32 sign extended to 64-bits to r/m64. "Sign extended". Our imm32 value, patched into the MOV, is 0xBFF28860 -- the i440fx machine types may have up to 3GB of 32-bit RAM -- so the sign bit is set. And MOV sign-extends 0xBFF28860 to 0xFFFFFFFF_BFF28860 in the assignment to RDX. So... why on earth generates gcc this MOV instruction, with sign extension? (6) It does because we told it to. From the gcc manual: > -mcmodel=3Dsmall > Generate code for the small code model: the program and its symbols > must be linked in the lower 2 GB of the address space. Pointers are > 64 bits. Programs can be statically or dynamically linked. This is > the default code model. and please refer to the following commit (message slightly rewrapped here): > commit f49513f666ed25d24bdf3a02a1fdb5d18ae081c0 > Author: Ard Biesheuvel > Date: Sat Jul 16 00:16:11 2016 +0200 > > BaseTools/tools_def: switch GCC/X64 to the PIE small model > > The ordinary small code model for x86_64 cannot be used in UEFI, > since it assumes the executable is loaded in the first 2 GB of > memory. Therefore, we use the large model instead, which can execute > anywhere, but uses absolute 64-bit wide quantities for all symbol > references, which is costly in terms of code size. > > So switch to the PIE small code model, this uses 32-bit relative > references where possible, but does not make any assumptions about > the load address (i.e., all absolute symbol references are 64-bits > wide). Note that, due to the 'protected' visibility pragma > introduced in an earlier patch, there is no need for the EDK2 build > system to deal with GOT related ELF relocation types. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Ard Biesheuvel > Reviewed-by: Jordan Justen > Tested-by: Laszlo Ersek > Tested-By: Liming Gao > Reviewed-by: Liming Gao Apparently, gcc-7.1, with Link Time Optimization enabled, *does* make assumptions about the load address, despite the fact that we didn't just replace "-mcmodel=3Dlarge" with "-mcmodel=3Dsmall", but we also ad= ded "-fpie". (7) Commit f49513f666ed has been safe until the advent of gcc-7.1, so do not revert it whole-sale. Instead, undo it only for the LTO build env where it might cause problems, by appending "-fno-pie -mcmodel=3Dlarge". The assignment is then compiled like this: > 33d7: 48 ba 60 64 00 00 00 movabs $0x6460,%rdx > 33de: 00 00 00 > ... > 33e5: 48 89 95 84 00 00 00 mov %rdx,0x84(%rbp) > ... > 0000000000006460 : And the matching relocation is (note the target offset 0x240+0x3199=3D0x33d9): > Sections: > Idx Name Size VMA LMA File off = Algn > 0 .text 00007c5e 0000000000000240 0000000000000240 000000c0 = 2**6 > CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE > ... > RELOCATION RECORDS FOR [.text]: > OFFSET TYPE VALUE > ... > 0000000000003199 R_X86_64_64 InitializeFloatingPointUnits (8) Size changes by this patch: $ build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \ -t GCC5 -b DEBUG \ -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE Before: > SECFV [10%Full] 212992 total, 21616 used, 191376 free > FVMAIN_COMPACT [43%Full] 3440640 total, 1498200 used, 1942440 free > DXEFV [47%Full] 10485760 total, 4959448 used, 5526312 free > PEIFV [21%Full] 917504 total, 194480 used, 723024 free After: > SECFV [11%Full] 212992 total, 23728 used, 189264 free > FVMAIN_COMPACT [44%Full] 3440640 total, 1539200 used, 1901440 free > DXEFV [54%Full] 10485760 total, 5741528 used, 4744232 free > PEIFV [24%Full] 917504 total, 224304 used, 693200 free $ build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \ -t GCC5 -b DEBUG \ -D NETWORK_IP6_ENABLE -D HTTP_BOOT_ENABLE -D TLS_ENABLE \ -D SECURE_BOOT_ENABLE -D SMM_REQUIRE Before: > SECFV [10%Full] 212992 total, 21536 used, 191456 free > FVMAIN_COMPACT [48%Full] 3440640 total, 1682008 used, 1758632 free > DXEFV [57%Full] 10485760 total, 6056504 used, 4429256 free > PEIFV [22%Full] 917504 total, 202032 used, 715472 free After: > SECFV [10%Full] 212992 total, 21536 used, 191456 free > FVMAIN_COMPACT [50%Full] 3440640 total, 1728096 used, 1712544 free > DXEFV [66%Full] 10485760 total, 6979448 used, 3506312 free > PEIFV [22%Full] 917504 total, 202032 used, 715472 free Cc: Alex Williamson Cc: Ard Biesheuvel Cc: Jordan Justen Cc: Liming Gao Cc: Michael Kinney Cc: Yonghong Zhu Contributed-under: TianoCore Contribution Agreement 1.1 Signed-off-by: Laszlo Ersek Tested-by: Alex Williamson --- BaseTools/Conf/tools_def.template | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/BaseTools/Conf/tools_def.template b/BaseTools/Conf/tools_def.t= emplate index 1fa3ca3ceae9..2ef495540e5f 100755 --- a/BaseTools/Conf/tools_def.template +++ b/BaseTools/Conf/tools_def.template @@ -5335,10 +5335,10 @@ RELEASE_GCC5_IA32_DLINK_FLAGS =3D DEF(GCC5_IA32_= X64_DLINK_FLAGS) -flto -Os -Wl, *_GCC5_X64_OBJCOPY_FLAGS =3D *_GCC5_X64_NASM_FLAGS =3D -f elf64 =20 - DEBUG_GCC5_X64_CC_FLAGS =3D DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_= LTO -Os + DEBUG_GCC5_X64_CC_FLAGS =3D DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmo= del=3Dlarge -flto -DUSING_LTO -Os DEBUG_GCC5_X64_DLINK_FLAGS =3D DEF(GCC5_X64_DLINK_FLAGS) -flto -Os =20 -RELEASE_GCC5_X64_CC_FLAGS =3D DEF(GCC5_X64_CC_FLAGS) -flto -DUSING_= LTO -Os -Wno-unused-but-set-variable +RELEASE_GCC5_X64_CC_FLAGS =3D DEF(GCC5_X64_CC_FLAGS) -fno-pie -mcmo= del=3Dlarge -flto -DUSING_LTO -Os -Wno-unused-but-set-variable RELEASE_GCC5_X64_DLINK_FLAGS =3D DEF(GCC5_X64_DLINK_FLAGS) -flto -Os =20 NOOPT_GCC5_X64_CC_FLAGS =3D DEF(GCC5_X64_CC_FLAGS) -O0 --=20 2.13.1.3.g8be5a757fa67 _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel