From: Jonathon Jongsma
To: libvir-list@redhat.com
Cc: lstump@redhat.com
Subject: [libvirt PATCH] qemu: adjust memlock for multiple vfio/vdpa devices
Date: Mon, 22 Aug 2022 09:25:52 -0500
Message-Id: <20220822142552.3110201-1-jjongsma@redhat.com>

When multiple VFIO or VDPA devices are assigned to a guest, the guest can
fail to start because it cannot map enough memory. For example, the case
mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1994893 results in
this failure:

    2021-08-05T09:51:47.692578Z qemu-kvm: failed to write, fd=24, errno=14 (Bad address)
    2021-08-05T09:51:47.692590Z qemu-kvm: vhost vdpa map fail!
    2021-08-05T09:51:47.692594Z qemu-kvm: vhost-vdpa: DMA mapping failed, unable to continue

The current memlock limit calculation does not work for scenarios where
multiple such devices are assigned to a guest. The root causes are a
little bit different between VFIO and VDPA devices.

For VFIO devices, the issue only occurs when a vIOMMU is present. In this
scenario, each VFIO device is assigned a separate AddressSpace that fully
maps guest RAM. When there is no vIOMMU, the devices all share the same
AddressSpace, so no additional memory limit is needed.

For VDPA devices, each device requires the full guest memory to be mapped
regardless of whether there is a vIOMMU or not.

In order to enable these scenarios, we need to multiply the memlock limit
by the number of VDPA devices plus, for guests with a vIOMMU, the number
of VFIO devices. This has the potential to push the memlock limit above
the host physical memory, negating any protection that these locked
memory limits provide, but there is no other short-term solution.

In the future, there should be a revised userspace IOMMU interface
(iommufd) that the VFIO and VDPA backends can make use of. It will be
able to share locked memory limits between vfio and vdpa use cases and
address spaces, and then these short-term hacks can be dropped. But that
work is still in development upstream.
Signed-off-by: Jonathon Jongsma
Reviewed-by: Laine Stump
---
 src/qemu/qemu_domain.c | 56 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 45f00e162d..a1e91ef48f 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -9233,6 +9233,40 @@ getPPC64MemLockLimitBytes(virDomainDef *def,
 }
 
 
+static int
+qemuDomainGetNumVFIODevices(const virDomainDef *def)
+{
+    int i;
+    int n = 0;
+
+    for (i = 0; i < def->nhostdevs; i++) {
+        if (virHostdevIsVFIODevice(def->hostdevs[i]) ||
+            virHostdevIsMdevDevice(def->hostdevs[i]))
+            n++;
+    }
+    for (i = 0; i < def->ndisks; i++) {
+        if (virStorageSourceChainHasNVMe(def->disks[i]->src))
+            n++;
+    }
+    return n;
+}
+
+
+static int
+qemuDomainGetNumVDPANetDevices(const virDomainDef *def)
+{
+    int i;
+    int n = 0;
+
+    for (i = 0; i < def->nnets; i++) {
+        if (virDomainNetGetActualType(def->nets[i]) == VIR_DOMAIN_NET_TYPE_VDPA)
+            n++;
+    }
+
+    return n;
+}
+
+
 /**
  * qemuDomainGetMemLockLimitBytes:
  * @def: domain definition
@@ -9252,6 +9286,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
                                bool forceVFIO)
 {
     unsigned long long memKB = 0;
+    int nvfio;
+    int nvdpa;
 
     /* prefer the hard limit */
     if (virMemoryLimitIsSet(def->mem.hard_limit)) {
@@ -9270,6 +9306,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
     if (ARCH_IS_PPC64(def->os.arch) && def->virtType == VIR_DOMAIN_VIRT_KVM)
         return getPPC64MemLockLimitBytes(def, forceVFIO);
 
+    nvfio = qemuDomainGetNumVFIODevices(def);
+    nvdpa = qemuDomainGetNumVDPANetDevices(def);
     /* For device passthrough using VFIO the guest memory and MMIO memory
      * regions need to be locked persistent in order to allow DMA.
      *
@@ -9288,8 +9326,22 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
      *
      * Note that this may not be valid for all platforms.
      */
-    if (forceVFIO || qemuDomainNeedsVFIO(def) || virDomainDefHasVDPANet(def))
-        memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024;
+    if (forceVFIO || nvfio || nvdpa) {
+        /* At present, the full memory needs to be locked for each VFIO / VDPA
+         * device. For VFIO devices, this only applies when there is a vIOMMU
+         * present. Yes, this may result in a memory limit that is greater than
+         * the host physical memory, which is not ideal. The long-term solution
+         * is a new userspace iommu interface (iommufd) which should eliminate
+         * this duplicate memory accounting. But for now this is the only way
+         * to enable configurations with e.g. multiple vdpa devices.
+         */
+        int factor = nvdpa;
+
+        if (def->iommu)
+            factor += nvfio;
+
+        memKB = MAX(factor, 1) * virDomainDefGetMemoryTotal(def) + 1024 * 1024;
+    }
 
     return memKB << 10;
 }
-- 
2.37.1
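
For illustration only (not part of the patch, and not libvirt API): below is a
minimal standalone C sketch of the revised calculation, with a hypothetical
helper name and made-up device counts, showing how the limit now scales with
the number of vDPA devices plus, when a vIOMMU is present, the number of VFIO
devices. It ignores the hard-limit, PPC64 and forceVFIO special cases handled
elsewhere in qemuDomainGetMemLockLimitBytes().

/* Standalone sketch, not libvirt code: helper name and inputs are
 * hypothetical; only the arithmetic mirrors the patch. */
#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

static unsigned long long
sketchMemLockLimitBytes(unsigned long long totalMemKB,
                        int nvfio, int nvdpa, int hasViommu)
{
    /* Each vDPA device, and each VFIO device behind a vIOMMU, maps all of
     * guest RAM, so scale the limit by that count (at least 1), then add
     * the usual 1 GiB headroom for MMIO regions. */
    int factor = nvdpa + (hasViommu ? nvfio : 0);
    unsigned long long memKB = MAX(factor, 1) * totalMemKB + 1024 * 1024;

    return memKB << 10;    /* KiB -> bytes */
}

int main(void)
{
    /* Hypothetical guest: 16 GiB of RAM, 1 VFIO device, 2 vDPA devices,
     * vIOMMU enabled. */
    unsigned long long totalMemKB = 16ULL * 1024 * 1024;
    unsigned long long limit = sketchMemLockLimitBytes(totalMemKB, 1, 2, 1);

    printf("memlock limit: %llu bytes (%.0f GiB)\n",
           limit, limit / (1024.0 * 1024 * 1024));
    return 0;
}

With these made-up numbers the result is (2 + 1) * 16 GiB + 1 GiB = 49 GiB,
which is why the commit message warns that the limit can end up above host
physical memory.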