From: Joao Martins
To: qemu-devel@nongnu.org
Cc: Igor Mammedov, Eduardo Habkost, "Michael S. Tsirkin", Richard Henderson, Alex Williamson, Paolo Bonzini, Ani Sinha, Marcel Apfelbaum, "Dr. David Alan Gilbert", Suravee Suthikulpanit, Joao Martins
Subject: [PATCH v8 10/11] i386/pc: relocate 4g start to 1T where applicable
Date: Fri, 15 Jul 2022 18:16:27 +0100
Message-Id: <20220715171628.21437-11-joao.m.martins@oracle.com>
In-Reply-To: <20220715171628.21437-1-joao.m.martins@oracle.com>
References: <20220715171628.21437-1-joao.m.martins@oracle.com>

It is assumed that the
whole GPA space is available to be DMA addressable, within a given
address space limit, except for a tiny region before the 4G. Since
Linux v5.4, VFIO validates whether the selected GPA is indeed valid,
i.e. not reserved by the IOMMU on behalf of some specific devices or
platform-defined restrictions, and otherwise fails the
ioctl(VFIO_IOMMU_MAP_DMA) with -EINVAL.

AMD systems with an IOMMU are examples of such platforms and
particularly may only have these ranges as allowed:

        0000000000000000 - 00000000fedfffff (0      .. 3.982G)
        00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
        0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])

We already account for the 4G hole, but if the guest is big enough we
will fail to allocate a guest with >1010G due to the ~12G hole at the
1Tb boundary, reserved for HyperTransport (HT).

[*] there is another reserved region unrelated to HT that exists at
the 256T boundary in Fam 17h according to Errata #1286, documented
also in "Open-Source Register Reference for AMD Family 17h
Processors (PUB)"

When creating the region above 4G, take into account that on AMD
platforms the HyperTransport range is reserved and hence cannot be
used as GPAs either. In those cases, rather than establishing the
start of ram-above-4g to be 4G, relocate it to 1Tb instead. See AMD
IOMMU spec, section 2.1.2 "IOMMU Logical Topology", for more
information on the underlying restriction of IOVAs.

After accounting for the 1Tb hole on AMD hosts, mtree should
look like:

0000000000000000-000000007fffffff (prio 0, i/o):
        alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
0000010000000000-000001ff7fffffff (prio 0, i/o):
        alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff

If the relocation is done or the address space covers it, we also add
the HT range to the e820 map as reserved.

Default phys-bits on QEMU is TCG_PHYS_ADDR_BITS (40), which is enough
to address 1Tb (0xff ffff ffff).
On AMD platforms, if a ram-above-4g relocation is attempted and the CPU
wasn't configured with a big enough phys-bits, an error message will be
printed due to the maxphysaddr vs maxusedaddr check previously added.

Suggested-by: Igor Mammedov
Signed-off-by: Joao Martins
Acked-by: Igor Mammedov
---
 hw/i386/pc.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f30661b7f1a2..a71135930833 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -892,6 +892,40 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
     return pc_pci_hole64_start() + pci_hole64_size - 1;
 }
 
+/*
+ * AMD systems with an IOMMU have an additional hole close to the
+ * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
+ * on kernel version, VFIO may or may not let you DMA map those ranges.
+ * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
+ * with certain memory sizes. It's also wrong to use those IOVA ranges
+ * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
+ * The ranges reserved for Hyper-Transport are:
+ *
+ * FD_0000_0000h - FF_FFFF_FFFFh
+ *
+ * The ranges represent the following:
+ *
+ * Base Address   Top Address  Use
+ *
+ * FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
+ * FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
+ * FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
+ * FD_F910_0000h FD_F91F_FFFFh System Management
+ * FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
+ * FD_FB00_0000h FD_FBFF_FFFFh Address Translation
+ * FD_FC00_0000h FD_FDFF_FFFFh I/O Space
+ * FD_FE00_0000h FD_FFFF_FFFFh Configuration
+ * FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
+ * FE_2000_0000h FF_FFFF_FFFFh Reserved
+ *
+ * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
+ * Table 3: Special Address Controls (GPA) for more information.
+ */
+#define AMD_HT_START         0xfd00000000UL
+#define AMD_HT_END           0xffffffffffUL
+#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
+#define AMD_HT_SIZE          (AMD_ABOVE_1TB_START - AMD_HT_START)
+
 void pc_memory_init(PCMachineState *pcms,
                     MemoryRegion *system_memory,
                     MemoryRegion *rom_memory,
@@ -915,6 +949,26 @@ void pc_memory_init(PCMachineState *pcms,
 
     linux_boot = (machine->kernel_filename != NULL);
 
+    /*
+     * The HyperTransport range close to the 1T boundary is unique to AMD
+     * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
+     * to above 1T to AMD vCPUs only.
+     */
+    if (IS_AMD_CPU(&cpu->env)) {
+        /* Bail out if max possible address does not cross HT range */
+        if (pc_max_used_gpa(pcms, pci_hole64_size) >= AMD_HT_START) {
+            x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;
+        }
+
+        /*
+         * Advertise the HT region if address space covers the reserved
+         * region or if we relocate.
+         */
+        if (cpu->phys_bits >= 40) {
+            e820_add_entry(AMD_HT_START, AMD_HT_SIZE, E820_RESERVED);
+        }
+    }
+
     /*
      * phys-bits is required to be appropriately configured
      * to make sure max used GPA is reachable.
-- 
2.17.2
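
For readers who want to check the arithmetic outside of QEMU, the two checks added to pc_memory_init() can be sketched as a standalone helper. This is only an illustration under stated assumptions: struct ht_layout and amd_ht_layout() are hypothetical names that do not exist in QEMU, max_used_gpa stands in for the result of pc_max_used_gpa(pcms, pci_hole64_size), the 4G default for ram-above-4g is taken from the commit message, and the constants use ULL instead of the patch's UL so the sketch also builds where long is 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the patch; ULL is an assumption made for portability. */
#define AMD_HT_START        0xfd00000000ULL
#define AMD_HT_END          0xffffffffffULL
#define AMD_ABOVE_1TB_START (AMD_HT_END + 1)
#define AMD_HT_SIZE         (AMD_ABOVE_1TB_START - AMD_HT_START)

#define GiB (1ULL << 30)

/* The reserved HT range is the "~12G hole" mentioned in the commit message. */
static_assert(AMD_HT_SIZE == 12 * GiB, "HT hole should span 12 GiB");

/* Hypothetical result type, not a QEMU structure. */
struct ht_layout {
    uint64_t above_4g_mem_start; /* where ram-above-4g begins */
    bool reserve_ht_in_e820;     /* whether the HT range is advertised */
};

/*
 * Hypothetical helper mirroring the logic of the second hunk:
 * relocate ram-above-4g only when the highest used GPA reaches the
 * HT range, and advertise the range whenever phys-bits can address it.
 */
static struct ht_layout amd_ht_layout(bool is_amd_cpu, uint64_t max_used_gpa,
                                      unsigned phys_bits)
{
    struct ht_layout l = {
        .above_4g_mem_start = 4 * GiB, /* default start, per commit message */
        .reserve_ht_in_e820 = false,
    };

    if (!is_amd_cpu) {
        return l; /* non-AMD vCPUs keep the existing layout */
    }
    if (max_used_gpa >= AMD_HT_START) {
        l.above_4g_mem_start = AMD_ABOVE_1TB_START;
    }
    if (phys_bits >= 40) {
        l.reserve_ht_in_e820 = true;
    }
    return l;
}
```

Calling amd_ht_layout(true, 8 * GiB, 40) leaves ram-above-4g at 4G but still reserves the HT range, whereas any max_used_gpa at or above AMD_HT_START moves the start to 1Tb. The two conditions are kept independent, as in the patch, so a small guest with 40 or more phys-bits still sees the reserved e820 entry.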