From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: libvir-list@redhat.com
Cc: Erik Skultety, Alexey Kardashevskiy, Daniel Henrique Barboza, Piotr Jaroszynski, Leonardo Augusto Guimaraes Garcia
Date: Tue, 5 Mar 2019 09:46:09 -0300
Subject: [libvirt] [PATCH v3 4/4] PPC64 support for NVIDIA V100 GPU with NVLink2 passthrough
Message-Id: <20190305124609.873-5-danielhb413@gmail.com>
In-Reply-To: <20190305124609.873-1-danielhb413@gmail.com>
References: <20190305124609.873-1-danielhb413@gmail.com>

The NVIDIA V100 GPU has onboard RAM that is mapped into the host memory and
accessible as normal RAM via an NVLink2 bus. When passed through to a guest, QEMU puts the NVIDIA RAM window in a non-contiguous area, above the PCI MMIO area that starts at 32TiB. As a result, the NVIDIA RAM window starts at 64TiB and goes all the way to 128TiB.

This means that the guest might request a 64-bit window, for each PCI Host Bridge, that goes all the way to 128TiB. However, the NVIDIA RAM window isn't counted as regular RAM, thus this window is considered only for the allocation of the Translation Control Entry (TCE) table. This memory layout differs from the existing VFIO case, requiring its own formula.

This patch changes the PPC64 code of qemuDomainGetMemLockLimitBytes to:

- detect if a VFIO PCI device is using NVLink2 capabilities. This is done by using the device tree inspection mechanisms that were implemented in the previous patch;
- if any device is an NVIDIA GPU using an NVLink2 bus, passthroughLimit is calculated in a different way to account for the extra memory the TCE table can allocate. The 64TiB..128TiB window is more than enough to fit all possible GPUs, thus the memLimit is the same regardless of passing through 1 or multiple V100 GPUs.

Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 97de5793e2..c0abd6da9a 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10551,7 +10551,9 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
     unsigned long long baseLimit, memory, maxMemory;
     unsigned long long passthroughLimit = 0;
     size_t i, nPCIHostBridges = 0;
-    bool usesVFIO = false;
+    virPCIDeviceAddressPtr pciAddr;
+    char *pciAddrStr = NULL;
+    bool usesVFIO = false, nvlink2Capable = false;
 
     for (i = 0; i < def->ncontrollers; i++) {
         virDomainControllerDefPtr cont = def->controllers[i];
@@ -10569,7 +10571,15 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
             dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
             dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) {
             usesVFIO = true;
-            break;
+
+            pciAddr = &dev->source.subsys.u.pci.addr;
+            if (virPCIDeviceAddressIsValid(pciAddr, false)) {
+                pciAddrStr = virPCIDeviceAddressAsString(pciAddr);
+                if (device_is_nvlink2_capable(pciAddrStr)) {
+                    nvlink2Capable = true;
+                    break;
+                }
+            }
         }
     }
 
@@ -10596,6 +10606,32 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
                 4096 * nPCIHostBridges +
                 8192;
 
+    /* NVLink2 support in QEMU is a special case of the passthrough
+     * mechanics explained in the usesVFIO case below. The GPU RAM
+     * is placed with a gap after maxMemory. The current QEMU
+     * implementation puts the NVIDIA RAM above the PCI MMIO, which
+     * starts at 32TiB and is the MMIO reserved for the guest main RAM.
+     *
+     * This window ends at 64TiB, and this is where the GPUs are being
+     * placed. The next available window size is at 128TiB, and
+     * 64TiB..128TiB will fit all possible NVIDIA GPUs.
+     *
+     * The same assumption as the most common case applies here:
+     * the guest will request a 64-bit DMA window, per PHB, that is
+     * big enough to map all its RAM, which is now at 128TiB due
+     * to the GPUs.
+     *
+     * Note that the NVIDIA RAM window must be accounted for the TCE
+     * table size, but *not* for the main RAM (maxMemory). This gives
+     * us the following passthroughLimit for the NVLink2 case:
+     *
+     *     passthroughLimit = maxMemory +
+     *                    128TiB/512KiB * #PHBs + 8 MiB */
+    if (nvlink2Capable)
+        passthroughLimit = maxMemory +
+                           128 * (1ULL<<30) / 512 * nPCIHostBridges +
+                           8192;
+
     /* passthroughLimit := max( 2 GiB * #PHBs,                       (c)
      *                          memory                               (d)
      *                          + memory * 1/512 * #PHBs + 8 MiB )   (e)
@@ -10615,7 +10651,7 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
      * kiB pages, less still if the guest is mapped with hugepages (unlike
      * the default 32-bit DMA window, DDW windows can use large IOMMU
      * pages). 8 MiB is for second and further level overheads, like (b) */
-    if (usesVFIO)
+    else if (usesVFIO)
         passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
                                memory +
                                memory / 512 * nPCIHostBridges + 8192);
-- 
2.20.1
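
For readers checking the arithmetic, the two passthroughLimit formulas in the patch can be sketched as standalone C. This is an illustration only, not libvirt code: the function names and parameters below are hypothetical, and all values are assumed to be in KiB, which is what the 8192 (8 MiB) constant and the 128 * (1ULL<<30) (128 TiB) term in the patch imply.

```c
#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helpers mirroring the formulas in the patch.
 * Every quantity is in KiB. */

/* NVLink2 case: guest maxMemory, plus a TCE table sized for a
 * 128 TiB DMA window per PHB (window / 512), plus 8 MiB overhead. */
static unsigned long long
nvlink2_passthrough_limit_kib(unsigned long long max_memory_kib,
                              unsigned long long n_phbs)
{
    return max_memory_kib +
           128 * (1ULL << 30) / 512 * n_phbs +
           8192;
}

/* Plain VFIO case: the larger of 2 GiB per PHB and
 * memory + memory/512 per PHB + 8 MiB overhead. */
static unsigned long long
vfio_passthrough_limit_kib(unsigned long long memory_kib,
                           unsigned long long n_phbs)
{
    return MAX(2ULL * 1024 * 1024 * n_phbs,
               memory_kib + memory_kib / 512 * n_phbs + 8192);
}

/* Print both limits for one guest configuration. */
static void
print_limits(unsigned long long memory_kib, unsigned long long n_phbs)
{
    printf("NVLink2: %llu KiB, VFIO: %llu KiB\n",
           nvlink2_passthrough_limit_kib(memory_kib, n_phbs),
           vfio_passthrough_limit_kib(memory_kib, n_phbs));
}
```

Under these assumptions, a 32 GiB guest (33554432 KiB) with one PHB gives 301998080 KiB (about 288 GiB) in the NVLink2 case against 33628160 KiB in the plain VFIO case. The 128 TiB window contributes a fixed 256 GiB of TCE table per PHB, independent of how many GPUs are passed through.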