From: Daniel Henrique Barboza
To: libvir-list@redhat.com
Cc: eskultet@redhat.com, aik@ozlabs.ru, Daniel Henrique Barboza, pjaroszynski@nvidia.com, lagarcia@br.ibm.com
Date: Tue, 12 Mar 2019 18:55:50 -0300
Message-Id: <20190312215550.28625-3-danielhb413@gmail.com>
In-Reply-To: <20190312215550.28625-1-danielhb413@gmail.com>
Subject: [libvirt] [PATCH v4 2/2] PPC64 support for NVIDIA V100 GPU with NVLink2 passthrough

The NVIDIA V100 GPU has an onboard
RAM that is mapped into the host memory and accessible as normal RAM
via an NVLink2 bridge. When the GPU is passed through to a guest, QEMU
puts the NVIDIA RAM window in a non-contiguous area, above the PCI MMIO
area that starts at 32TiB and ends at 64TiB. This means that the NVIDIA
RAM window starts at 64TiB and goes all the way to 128TiB. As a
consequence, the guest might request, for each PCI Host Bridge (PHB), a
64-bit DMA window that also goes all the way to 128TiB. However, the
NVIDIA RAM window isn't counted as regular RAM, so it is only taken into
account when sizing the Translation and Control Entry (TCE) table. This
memory layout differs from the existing VFIO case and requires its own
formula.

This patch changes the PPC64 code of @qemuDomainGetMemLockLimitBytes to:

- detect whether an NVLink2 bridge is being passed through to the guest.
  This is done using the @ppc64VFIODeviceIsNV2Bridge function added in
  the previous patch. The presence of an NVLink2 bridge in the guest
  means that we are dealing with the NVLink2 memory layout;

- if an IBM NVLink2 bridge exists, calculate passthroughLimit differently
  to account for the extra memory the TCE table can allocate. The
  64TiB..128TiB window is more than enough to fit all possible GPUs, so
  memLimit is the same regardless of whether one or multiple V100 GPUs
  are passed through.
Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index dcc92d253c..6d1a69491d 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10443,7 +10443,10 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
     unsigned long long maxMemory = 0;
     unsigned long long passthroughLimit = 0;
     size_t i, nPCIHostBridges = 0;
+    virPCIDeviceAddressPtr pciAddr;
+    char *pciAddrStr = NULL;
     bool usesVFIO = false;
+    bool nvlink2Capable = false;
 
     for (i = 0; i < def->ncontrollers; i++) {
         virDomainControllerDefPtr cont = def->controllers[i];
@@ -10461,7 +10464,16 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
             dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
             dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) {
             usesVFIO = true;
-            break;
+
+            pciAddr = &dev->source.subsys.u.pci.addr;
+            if (virPCIDeviceAddressIsValid(pciAddr, false)) {
+                pciAddrStr = virPCIDeviceAddressAsString(pciAddr);
+                if (ppc64VFIODeviceIsNV2Bridge(pciAddrStr)) {
+                    nvlink2Capable = true;
+                    break;
+                }
+            }
+
         }
     }
 
@@ -10488,6 +10500,32 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
                 4096 * nPCIHostBridges +
                 8192;
 
+    /* NVLink2 support in QEMU is a special case of the passthrough
+     * mechanics explained in the usesVFIO case below. The GPU RAM
+     * is placed with a gap after maxMemory. The current QEMU
+     * implementation puts the NVIDIA RAM above the PCI MMIO, which
+     * starts at 32TiB and is the MMIO reserved for the guest main RAM.
+     *
+     * This window ends at 64TiB, and this is where the GPUs are being
+     * placed. The next available window size is at 128TiB, and
+     * 64TiB..128TiB will fit all possible NVIDIA GPUs.
+     *
+     * The same assumption as the most common case applies here:
+     * the guest will request a 64-bit DMA window, per PHB, that is
+     * big enough to map all its RAM, which is now at 128TiB due
+     * to the GPUs.
+     *
+     * Note that the NVIDIA RAM window must be accounted for the TCE
+     * table size, but *not* for the main RAM (maxMemory). This gives
+     * us the following passthroughLimit for the NVLink2 case:
+     *
+     * passthroughLimit = maxMemory +
+     *                    128TiB/512KiB * #PHBs + 8 MiB */
+    if (nvlink2Capable)
+        passthroughLimit = maxMemory +
+                           128 * (1ULL<<30) / 512 * nPCIHostBridges +
+                           8192;
+
     /* passthroughLimit := max( 2 GiB * #PHBs,                       (c)
      *                          memory                               (d)
      *                          + memory * 1/512 * #PHBs + 8 MiB )   (e)
@@ -10507,7 +10545,7 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
      * kiB pages, less still if the guest is mapped with hugepages (unlike
      * the default 32-bit DMA window, DDW windows can use large IOMMU
      * pages). 8 MiB is for second and further level overheads, like (b) */
-    if (usesVFIO)
+    else if (usesVFIO)
         passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
                                memory +
                                memory / 512 * nPCIHostBridges + 8192);
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list