From nobody Sun Nov 24 09:54:20 2024
From: Jonah Palmer
To: qemu-devel@nongnu.org
Cc: eperezma@redhat.com, mst@redhat.com, leiyang@redhat.com,
    peterx@redhat.com, dtatulea@nvidia.com, jasowang@redhat.com,
    si-wei.liu@oracle.com, boris.ostrovsky@oracle.com,
    jonah.palmer@oracle.com
Subject: [RFC 1/2] vhost-vdpa: Decouple the IOVA allocator
Date: Wed, 21 Aug 2024 08:55:45 -0400
Message-ID: <20240821125548.749143-2-jonah.palmer@oracle.com>
In-Reply-To: <20240821125548.749143-1-jonah.palmer@oracle.com>
References: <20240821125548.749143-1-jonah.palmer@oracle.com>

Decouple the IOVA allocator from the IOVA->HVA tree. Allocated IOVA
ranges are now added to an IOVA-only tree (iova_map) instead. This tree
holds every IOVA range that has been allocated (e.g. for mappings in the
IOVA->HVA tree), and a range is removed from it when it is deallocated.

Also add a new API function, vhost_iova_tree_insert(), which adds an
IOVA->HVA mapping to the IOVA->HVA tree.

Signed-off-by: Jonah Palmer
---
 hw/virtio/vhost-iova-tree.c | 38 ++++++++++++++++++++++++++++++++-----
 hw/virtio/vhost-iova-tree.h |  1 +
 hw/virtio/vhost-vdpa.c      | 31 ++++++++++++++++++++++++------
 net/vhost-vdpa.c            | 13 +++++++++++--
 4 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 3d03395a77..32c03db2f5 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -28,12 +28,17 @@ struct VhostIOVATree {
 
     /* IOVA address to qemu memory maps. */
     IOVATree *iova_taddr_map;
+
+    /* IOVA tree (IOVA allocator) */
+    IOVATree *iova_map;
 };
 
 /**
- * Create a new IOVA tree
+ * Create a new VhostIOVATree with a new set of IOVATree's:
+ * - IOVA allocator (iova_map)
+ * - IOVA->HVA tree (iova_taddr_map)
  *
- * Returns the new IOVA tree
+ * Returns the new VhostIOVATree
  */
 VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 {
@@ -44,6 +49,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
     tree->iova_last = iova_last;
 
     tree->iova_taddr_map = iova_tree_new();
+    tree->iova_map = iova_tree_new();
     return tree;
 }
 
@@ -53,6 +59,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
 {
     iova_tree_destroy(iova_tree->iova_taddr_map);
+    iova_tree_destroy(iova_tree->iova_map);
     g_free(iova_tree);
 }
 
@@ -88,13 +95,12 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
     /* Some vhost devices do not like addr 0. Skip first page */
     hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size();
 
-    if (map->translated_addr + map->size < map->translated_addr ||
-        map->perm == IOMMU_NONE) {
+    if (map->perm == IOMMU_NONE) {
         return IOVA_ERR_INVALID;
     }
 
     /* Allocate a node in IOVA address */
-    return iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first,
+    return iova_tree_alloc_map(tree->iova_map, map, iova_first,
                                tree->iova_last);
 }
 
@@ -107,4 +113,26 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map)
 {
     iova_tree_remove(iova_tree->iova_taddr_map, map);
+    iova_tree_remove(iova_tree->iova_map, map);
+}
+
+/**
+ * Insert a new mapping to the IOVA->HVA tree
+ *
+ * @iova_tree: The VhostIOVATree
+ * @map: The iova map
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_OVERLAP if the IOVA range overlaps with an existing range
+ */
+int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map)
+{
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    return iova_tree_insert(iova_tree->iova_taddr_map, map);
 }
diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 4adfd79ff0..8bf7b64786 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -23,5 +23,6 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
                                         const DMAMap *map);
 int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map);
+int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map);
 
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3cdaa12ed5..6702459065 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -361,10 +361,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
     if (s->shadow_data) {
         int r;
 
-        mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr,
         mem_region.size = int128_get64(llsize) - 1,
         mem_region.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
 
+        /* Allocate IOVA range and add the mapping to the IOVA tree */
         r = vhost_iova_tree_map_alloc(s->iova_tree, &mem_region);
         if (unlikely(r != IOVA_OK)) {
             error_report("Can't allocate a mapping (%d)", r);
@@ -372,6 +372,14 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
         }
 
         iova = mem_region.iova;
+
+        /* Add mapping to the IOVA->HVA tree */
+        mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr;
+        r = vhost_iova_tree_insert(s->iova_tree, &mem_region);
+        if (unlikely(r != IOVA_OK)) {
+            error_report("Can't add listener region mapping (%d)", r);
+            goto fail_map;
+        }
     }
 
     vhost_vdpa_iotlb_batch_begin_once(s);
@@ -1142,19 +1150,30 @@ static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
  *
  * @v: Vhost-vdpa device
  * @needle: The area to search iova
+ * @taddr: The translated address (SVQ HVA)
  * @errorp: Error pointer
  */
 static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
-                                    Error **errp)
+                                    hwaddr taddr, Error **errp)
 {
     int r;
 
+    /* Allocate IOVA range and add the mapping to the IOVA tree */
     r = vhost_iova_tree_map_alloc(v->shared->iova_tree, needle);
     if (unlikely(r != IOVA_OK)) {
         error_setg(errp, "Cannot allocate iova (%d)", r);
         return false;
     }
 
+    /* Add mapping to the IOVA->HVA tree */
+    needle->translated_addr = taddr;
+    r = vhost_iova_tree_insert(v->shared->iova_tree, needle);
+    if (unlikely(r != IOVA_OK)) {
+        error_setg(errp, "Cannot add SVQ vring mapping (%d)", r);
+        vhost_iova_tree_remove(v->shared->iova_tree, *needle);
+        return false;
+    }
+
     r = vhost_vdpa_dma_map(v->shared, v->address_space_id, needle->iova,
                            needle->size + 1,
                            (void *)(uintptr_t)needle->translated_addr,
@@ -1192,11 +1211,11 @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
     vhost_svq_get_vring_addr(svq, &svq_addr);
 
     driver_region = (DMAMap) {
-        .translated_addr = svq_addr.desc_user_addr,
         .size = driver_size - 1,
         .perm = IOMMU_RO,
     };
-    ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);
+    ok = vhost_vdpa_svq_map_ring(v, &driver_region, svq_addr.desc_user_addr,
+                                 errp);
     if (unlikely(!ok)) {
         error_prepend(errp, "Cannot create vq driver region: ");
         return false;
@@ -1206,11 +1225,11 @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
     addr->avail_user_addr = driver_region.iova + avail_offset;
 
     device_region = (DMAMap) {
-        .translated_addr = svq_addr.used_user_addr,
         .size = device_size - 1,
         .perm = IOMMU_RW,
     };
-    ok = vhost_vdpa_svq_map_ring(v, &device_region, errp);
+    ok = vhost_vdpa_svq_map_ring(v, &device_region, svq_addr.used_user_addr,
+                                 errp);
     if (unlikely(!ok)) {
         error_prepend(errp, "Cannot create vq device region: ");
         vhost_vdpa_svq_unmap_ring(v, driver_region.translated_addr);
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 03457ead66..81da956b92 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -512,15 +512,24 @@ static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
     DMAMap map = {};
     int r;
 
-    map.translated_addr = (hwaddr)(uintptr_t)buf;
     map.size = size - 1;
     map.perm = write ? IOMMU_RW : IOMMU_RO,
+
+    /* Allocate IOVA range and add the mapping to the IOVA tree */
     r = vhost_iova_tree_map_alloc(v->shared->iova_tree, &map);
     if (unlikely(r != IOVA_OK)) {
-        error_report("Cannot map injected element");
+        error_report("Cannot allocate IOVA range for injected element");
         return r;
     }
 
+    /* Add mapping to the IOVA->HVA tree */
+    map.translated_addr = (hwaddr)(uintptr_t)buf;
+    r = vhost_iova_tree_insert(v->shared->iova_tree, &map);
+    if (unlikely(r != IOVA_OK)) {
+        error_report("Cannot map injected element into IOVA->HVA tree");
+        goto dma_map_err;
+    }
+
     r = vhost_vdpa_dma_map(v->shared, v->address_space_id, map.iova,
                            vhost_vdpa_net_cvq_cmd_page_len(), buf, !write);
     if (unlikely(r < 0)) {
-- 
2.43.5

From nobody Sun Nov 24 09:54:20 2024
From: Jonah Palmer
To: qemu-devel@nongnu.org
Cc: eperezma@redhat.com, mst@redhat.com, leiyang@redhat.com,
    peterx@redhat.com, dtatulea@nvidia.com, jasowang@redhat.com,
    si-wei.liu@oracle.com, boris.ostrovsky@oracle.com,
    jonah.palmer@oracle.com
Subject: [RFC 2/2] vhost-vdpa: Implement GPA->IOVA & IOVA->SVQ HVA trees
Date: Wed, 21 Aug 2024 08:55:46 -0400
Message-ID: <20240821125548.749143-3-jonah.palmer@oracle.com>
In-Reply-To: <20240821125548.749143-1-jonah.palmer@oracle.com>
References: <20240821125548.749143-1-jonah.palmer@oracle.com>

Implement a GPA->IOVA tree and an IOVA->SVQ HVA tree to handle mapping,
unmapping, and translation of guest memory and host-only memory,
respectively.

Splitting the full IOVA->HVA tree (which contains both guest and
host-only memory mappings) into a GPA->IOVA tree (guest memory mappings
only) and an IOVA->SVQ HVA tree (host-only memory mappings only) avoids
translating to the wrong IOVA when the guest has overlapping memory
regions in which different GPAs lead to the same HVA.

In other words, when the guest has overlapping memory regions, looking
up an HVA in the full IOVA->HVA tree may return an incorrect IOVA,
because one HVA range can be contained in (overlap with) another HVA
range in that tree. Translating a GPA to an IOVA through a GPA->IOVA
tree, rather than relying on an HVA->IOVA translation, ensures that the
IOVA we receive is the correct one.

As a byproduct of creating a GPA->IOVA tree, the full IOVA->HVA tree
becomes a partial IOVA->SVQ HVA tree: since all guest memory mappings
move to the GPA->IOVA tree, host-only memory mappings are the only
mappings still put into the IOVA->HVA tree.

A further consequence of splitting guest and host-only memory mappings
into separate trees is that vhost_svq_translate_addr() needs special
attention when it translates memory buffers from an iovec. These buffers
can be backed by either guest memory or host-only memory, so we must
determine what backs each buffer and then choose the tree to translate
it with.

This patch determines the backing memory by first checking whether a RAM
block can be inferred from the buffer's HVA, using
qemu_ram_block_from_host(). If a valid RAM block is returned, the
buffer's HVA is backed by guest memory; we derive the GPA from the RAM
block and translate the GPA to an IOVA using the GPA->IOVA tree. If no
valid RAM block is returned, the buffer's HVA is likely backed by
host-only memory, and we translate the HVA to an IOVA using the partial
IOVA->SVQ HVA tree.

However, this method is sub-optimal, especially for memory buffers
backed by host-only memory, because it has to iterate over some (if not
all) RAMBlock structures before searching either the GPA->IOVA tree or
the IOVA->SVQ HVA tree.
Optimizations to improve performance in this area should be revisited
at some point.

Signed-off-by: Jonah Palmer
---
 hw/virtio/vhost-iova-tree.c        | 53 +++++++++++++++++++++++++++++-
 hw/virtio/vhost-iova-tree.h        |  5 ++-
 hw/virtio/vhost-shadow-virtqueue.c | 48 +++++++++++++++++++++++----
 hw/virtio/vhost-vdpa.c             | 18 +++++-----
 include/qemu/iova-tree.h           | 22 +++++++++++++
 util/iova-tree.c                   | 46 ++++++++++++++++++++++++++
 6 files changed, 173 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 32c03db2f5..5a3f6b5cd9 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -26,15 +26,19 @@ struct VhostIOVATree {
     /* Last addressable iova address in the device */
     uint64_t iova_last;
 
-    /* IOVA address to qemu memory maps. */
+    /* IOVA address to qemu SVQ memory maps. */
     IOVATree *iova_taddr_map;
 
     /* IOVA tree (IOVA allocator) */
     IOVATree *iova_map;
+
+    /* GPA->IOVA tree */
+    IOVATree *gpa_map;
 };
 
 /**
  * Create a new VhostIOVATree with a new set of IOVATree's:
+ * - GPA->IOVA tree (gpa_map)
  * - IOVA allocator (iova_map)
  * - IOVA->HVA tree (iova_taddr_map)
  *
@@ -50,6 +54,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 
     tree->iova_taddr_map = iova_tree_new();
     tree->iova_map = iova_tree_new();
+    tree->gpa_map = gpa_tree_new();
     return tree;
 }
 
@@ -136,3 +141,49 @@ int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map)
 
     return iova_tree_insert(iova_tree->iova_taddr_map, map);
 }
+
+/**
+ * Insert a new GPA->IOVA mapping to the GPA->IOVA tree
+ *
+ * @iova_tree: The VhostIOVATree
+ * @map: The GPA->IOVA mapping
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_OVERLAP if the GPA range overlaps with an existing range
+ */
+int vhost_gpa_tree_insert(VhostIOVATree *iova_tree, DMAMap *map)
+{
+    if (map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    return gpa_tree_insert(iova_tree->gpa_map, map);
+}
+
+/**
+ * Find the IOVA address stored from a guest memory address (GPA)
+ *
+ * @tree: The VhostIOVATree
+ * @map: The map with the guest memory address
+ *
+ * Return the stored mapping, or NULL if not found.
+ */
+const DMAMap *vhost_gpa_tree_find_iova(const VhostIOVATree *tree,
+                                       const DMAMap *map)
+{
+    return iova_tree_find_iova(tree->gpa_map, map);
+}
+
+/**
+ * Remove existing mappings from the GPA->IOVA tree and IOVA tree
+ *
+ * @iova_tree: The VhostIOVATree
+ * @map: The map to remove
+ */
+void vhost_gpa_tree_remove(VhostIOVATree *iova_tree, DMAMap map)
+{
+    iova_tree_remove(iova_tree->gpa_map, map);
+    iova_tree_remove(iova_tree->iova_map, map);
+}
diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 8bf7b64786..c22941db4f 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -24,5 +24,8 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
 int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map);
 int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map);
-
+int vhost_gpa_tree_insert(VhostIOVATree *iova_tree, DMAMap *map);
+const DMAMap *vhost_gpa_tree_find_iova(const VhostIOVATree *iova_tree,
+                                       const DMAMap *map);
+void vhost_gpa_tree_remove(VhostIOVATree *iova_tree, DMAMap map);
 #endif
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408f77..12eabddaa6 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -16,6 +16,7 @@
 #include "qemu/log.h"
 #include "qemu/memalign.h"
 #include "linux-headers/linux/vhost.h"
+#include "exec/ramblock.h"
 
 /**
  * Validate the transport device features that both guests can use with the SVQ
@@ -88,14 +89,45 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
     }
 
     for (size_t i = 0; i < num; ++i) {
-        DMAMap needle = {
-            .translated_addr = (hwaddr)(uintptr_t)iovec[i].iov_base,
-            .size = iovec[i].iov_len,
-        };
-        Int128 needle_last, map_last;
-        size_t off;
+        RAMBlock *rb;
+        hwaddr gpa;
+        ram_addr_t offset;
+        const DMAMap *map;
+        DMAMap needle;
+
+        /*
+         * Determine if this HVA is backed by guest memory by attempting to
+         * infer a RAM block from it. If a valid RAM block is returned, the
+         * HVA is backed by guest memory and we can derive the GPA from it.
+         * Then search the GPA->IOVA tree for the corresponding IOVA.
+         *
+         * If the RAM block is invalid, the HVA is likely backed by host-only
+         * memory. Use the HVA to search the IOVA->HVA tree for the
+         * corresponding IOVA.
+         *
+         * TODO: This additional second lookup is sub-optimal when the HVA
+         *       is backed by host-only memory. Find optimizations for this
+         *       (e.g. using an HVA->IOVA tree).
+         */
+        rb = qemu_ram_block_from_host(iovec[i].iov_base, false, &offset);
+        if (rb) {
+            gpa = rb->offset + offset;
+
+            /* Search the GPA->IOVA tree */
+            needle = (DMAMap) {
+                .translated_addr = gpa,
+                .size = iovec[i].iov_len,
+            };
+            map = vhost_gpa_tree_find_iova(svq->iova_tree, &needle);
+        } else {
+            /* Search the IOVA->HVA tree */
+            needle = (DMAMap) {
+                .translated_addr = (hwaddr)(uintptr_t)iovec[i].iov_base,
+                .size = iovec[i].iov_len,
+            };
+            map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
+        }
 
-        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
         /*
          * Map cannot be NULL since iova map contains all guest space and
          * qemu already has a physical address mapped
@@ -106,6 +138,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
                           needle.translated_addr);
             return false;
         }
+        Int128 needle_last, map_last;
+        size_t off;
 
         off = needle.translated_addr - map->translated_addr;
         addrs[i] = map->iova + off;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 6702459065..0da0a117dc 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -373,9 +373,9 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
 
         iova = mem_region.iova;
 
-        /* Add mapping to the IOVA->HVA tree */
-        mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr;
-        r = vhost_iova_tree_insert(s->iova_tree, &mem_region);
+        /* Add mapping to the GPA->IOVA tree */
+        mem_region.translated_addr = section->offset_within_address_space;
+        r = vhost_gpa_tree_insert(s->iova_tree, &mem_region);
         if (unlikely(r != IOVA_OK)) {
             error_report("Can't add listener region mapping (%d)", r);
             goto fail_map;
@@ -394,7 +394,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
 
 fail_map:
     if (s->shadow_data) {
-        vhost_iova_tree_remove(s->iova_tree, mem_region);
+        vhost_gpa_tree_remove(s->iova_tree, mem_region);
     }
 
 fail:
@@ -448,21 +448,19 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
 
     if (s->shadow_data) {
         const DMAMap *result;
-        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
         DMAMap mem_region = {
-            .translated_addr = (hwaddr)(uintptr_t)vaddr,
+            .translated_addr = section->offset_within_address_space,
             .size = int128_get64(llsize) - 1,
         };
 
-        result = vhost_iova_tree_find_iova(s->iova_tree, &mem_region);
+        /* Search the GPA->IOVA tree */
+        result = vhost_gpa_tree_find_iova(s->iova_tree, &mem_region);
         if (!result) {
             /* The memory listener map wasn't mapped */
             return;
         }
         iova = result->iova;
-        vhost_iova_tree_remove(s->iova_tree, *result);
+        vhost_gpa_tree_remove(s->iova_tree, *result);
     }
     vhost_vdpa_iotlb_batch_begin_once(s);
     /*
diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 2a10a7052e..57cfc63d33 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -40,6 +40,15 @@ typedef struct DMAMap {
 } QEMU_PACKED DMAMap;
 typedef gboolean (*iova_tree_iterator)(DMAMap *map);
 
+/**
+ * gpa_tree_new:
+ *
+ * Create a new GPA->IOVA tree.
+ *
+ * Returns: the tree pointer on success, or NULL otherwise.
+ */
+IOVATree *gpa_tree_new(void);
+
 /**
  * iova_tree_new:
  *
@@ -49,6 +58,19 @@ typedef gboolean (*iova_tree_iterator)(DMAMap *map);
  */
 IOVATree *iova_tree_new(void);
 
+/**
+ * gpa_tree_insert:
+ *
+ * @tree: The GPA->IOVA tree we're inserting the mapping to
+ * @map: The GPA->IOVA mapping to insert
+ *
+ * Insert a GPA range to the GPA->IOVA tree. If there are overlapped
+ * ranges, IOVA_ERR_OVERLAP will be returned.
+ *
+ * Return: 0 if success, or < 0 if error.
+ */
+int gpa_tree_insert(IOVATree *tree, const DMAMap *map);
+
 /**
  * iova_tree_insert:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 536789797e..e3f50fbf5c 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -71,6 +71,22 @@ static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
     return 0;
 }
 
+static int gpa_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
+{
+    const DMAMap *m1 = a, *m2 = b;
+
+    if (m1->translated_addr > m2->translated_addr + m2->size) {
+        return 1;
+    }
+
+    if (m1->translated_addr + m1->size < m2->translated_addr) {
+        return -1;
+    }
+
+    /* Overlapped */
+    return 0;
+}
+
 IOVATree *iova_tree_new(void)
 {
     IOVATree *iova_tree = g_new0(IOVATree, 1);
@@ -81,6 +97,15 @@ IOVATree *iova_tree_new(void)
     return iova_tree;
 }
 
+IOVATree *gpa_tree_new(void)
+{
+    IOVATree *gpa_tree = g_new0(IOVATree, 1);
+
+    gpa_tree->tree = g_tree_new_full(gpa_tree_compare, NULL, g_free, NULL);
+
+    return gpa_tree;
+}
+
 const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map)
 {
     return g_tree_lookup(tree->tree, map);
@@ -128,6 +153,27 @@ static inline void iova_tree_insert_internal(GTree *gtree, DMAMap *range)
     g_tree_insert(gtree, range, range);
 }
 
+int gpa_tree_insert(IOVATree *tree, const DMAMap *map)
+{
+    DMAMap *new;
+
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* We don't allow inserting ranges that overlap with existing ones */
+    if (iova_tree_find(tree, map)) {
+        return IOVA_ERR_OVERLAP;
+    }
+
+    new = g_new0(DMAMap, 1);
+    memcpy(new, map, sizeof(*new));
+    iova_tree_insert_internal(tree->tree, new);
+
+    return IOVA_OK;
+}
+
 int iova_tree_insert(IOVATree *tree, const DMAMap *map)
 {
     DMAMap *new;
-- 
2.43.5
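
Illustrative note (not part of the series): the ambiguity that motivates
the GPA->IOVA tree can be reproduced with a toy model. The sketch below is
a minimal, standalone approximation and does not use QEMU's IOVATree or
DMAMap. The ToyMap type, the toy_* functions, and every address are
hypothetical and chosen only for illustration. It assumes two guest
regions that alias the same host memory, so a lookup keyed by HVA cannot
tell them apart, while a lookup keyed by GPA can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a mapping entry (size is length - 1). */
typedef struct {
    uint64_t iova; /* device-visible address */
    uint64_t gpa;  /* guest physical address */
    uint64_t hva;  /* host virtual address */
    uint64_t size; /* inclusive size, i.e. length - 1 */
} ToyMap;

/* Two guest regions that alias the same host memory (made-up addresses). */
static const ToyMap toy_maps[] = {
    { .iova = 0x1000, .gpa = 0x00000, .hva = 0x70000000, .size = 0xfff }, /* A */
    { .iova = 0x2000, .gpa = 0x10000, .hva = 0x70000000, .size = 0xfff }, /* B */
};
#define TOY_NMAPS (sizeof(toy_maps) / sizeof(toy_maps[0]))

/* True if addr lies in [start, start + size], without overflowing. */
static int in_range(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr >= start && addr - start <= size;
}

/* HVA-keyed lookup: uses the first region whose HVA range contains hva. */
uint64_t toy_translate_hva(uint64_t hva)
{
    for (size_t i = 0; i < TOY_NMAPS; i++) {
        if (in_range(hva, toy_maps[i].hva, toy_maps[i].size)) {
            return toy_maps[i].iova + (hva - toy_maps[i].hva);
        }
    }
    return UINT64_MAX; /* not mapped */
}

/* GPA-keyed lookup: GPA ranges do not overlap, so the match is unique. */
uint64_t toy_translate_gpa(uint64_t gpa)
{
    for (size_t i = 0; i < TOY_NMAPS; i++) {
        if (in_range(gpa, toy_maps[i].gpa, toy_maps[i].size)) {
            return toy_maps[i].iova + (gpa - toy_maps[i].gpa);
        }
    }
    return UINT64_MAX; /* not mapped */
}

/* Usage: a buffer at GPA 0x10010 (region B) has HVA 0x70000010. */
void toy_demo(void)
{
    assert(toy_translate_hva(0x70000010) == 0x1010); /* region A's IOVA */
    assert(toy_translate_gpa(0x10010) == 0x2010);    /* region B's IOVA */
}
```

In this toy setup the HVA-keyed lookup returns an IOVA that belongs to
region A even though the buffer lives in region B, which mirrors the
"wrong IOVA" case described in the commit message. A linear scan stands
in for the tree search here purely to keep the sketch short.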