From nobody Thu May 2 08:58:33 2024
From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: libvir-list@redhat.com
Cc: Erik Skultety, Alexey Kardashevskiy, Daniel Henrique Barboza, Piotr Jaroszynski, Leonardo Augusto Guimaraes Garcia
Date: Tue, 5 Mar 2019 09:46:06 -0300
Message-Id: <20190305124609.873-2-danielhb413@gmail.com>
In-Reply-To: <20190305124609.873-1-danielhb413@gmail.com>
Subject: [libvirt] [PATCH v3 1/4] qemu_domain: simplify non-VFIO memLockLimit calc for PPC64

passthroughLimit is calculated even when usesVFIO is false. An if/else
conditional is then used to decide whether to add it to baseLimit.

This patch initializes passthroughLimit to zero and always returns
memKB = baseLimit + passthroughLimit. The conditional is now used only
to calculate passthroughLimit when usesVFIO is true. This spares a few
cycles in the usesVFIO=false scenario, but the real motivation is to
make it simpler to add an alternative passthroughLimit formula for
NVLink2 passthrough.

Signed-off-by: Daniel Henrique Barboza
Reviewed-by: Erik Skultety
---
 src/qemu/qemu_domain.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 1487268a89..099097fe62 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10378,7 +10378,7 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
         unsigned long long maxMemory;
         unsigned long long memory;
         unsigned long long baseLimit;
-        unsigned long long passthroughLimit;
+        unsigned long long passthroughLimit = 0;
         size_t nPCIHostBridges = 0;
         bool usesVFIO = false;
 
@@ -10444,15 +10444,12 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
          * kiB pages, less still if the guest is mapped with hugepages (unlike
          * the default 32-bit DMA window, DDW windows can use large IOMMU
          * pages). 8 MiB is for second and further level overheads, like (b) */
-        passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
-                               memory +
-                               memory / 512 * nPCIHostBridges + 8192);
-
         if (usesVFIO)
-            memKB = baseLimit + passthroughLimit;
-        else
-            memKB = baseLimit;
+            passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
+                                   memory +
+                                   memory / 512 * nPCIHostBridges + 8192);
 
+        memKB = baseLimit + passthroughLimit;
         goto done;
     }
 
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: libvir-list@redhat.com
Cc: Erik Skultety, Alexey Kardashevskiy, Daniel Henrique Barboza, Piotr Jaroszynski, Leonardo Augusto Guimaraes Garcia
Date: Tue, 5 Mar 2019 09:46:07 -0300
Message-Id: <20190305124609.873-3-danielhb413@gmail.com>
In-Reply-To: <20190305124609.873-1-danielhb413@gmail.com>
Subject: [libvirt] [PATCH v3 2/4] qemu_domain: add a PPC64 memLockLimit helper

There is a lot of documentation in the comments about how PPC64 handles
passthrough VFIO devices when calculating the memLockLimit, and more
will be added with the PPC64 NVLink2 support code.

Let's move the PPC64 code out of the qemuDomainGetMemLockLimitBytes
body and into a helper function. This simplifies the flow of
qemuDomainGetMemLockLimitBytes, which handles all other platforms, and
improves the readability of the PPC64 specifics.

Suggested-by: Erik Skultety
Signed-off-by: Daniel Henrique Barboza
Reviewed-by: Erik Skultety
---
 src/qemu/qemu_domain.c | 169 ++++++++++++++++++++++-------------------
 1 file changed, 91 insertions(+), 78 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 099097fe62..77548c224c 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10343,6 +10343,95 @@ qemuDomainUpdateCurrentMemorySize(virDomainObjPtr vm)
 }
 
 
+/**
+ * getPPC64MemLockLimitBytes:
+ * @def: domain definition
+ *
+ * A PPC64 helper that calculates the memory locking limit in order for
+ * the guest to operate properly.
+ */
+static unsigned long long
+getPPC64MemLockLimitBytes(virDomainDefPtr def)
+{
+    unsigned long long memKB = 0;
+    unsigned long long baseLimit, memory, maxMemory;
+    unsigned long long passthroughLimit = 0;
+    size_t i, nPCIHostBridges = 0;
+    bool usesVFIO = false;
+
+    for (i = 0; i < def->ncontrollers; i++) {
+        virDomainControllerDefPtr cont = def->controllers[i];
+
+        if (!virDomainControllerIsPSeriesPHB(cont))
+            continue;
+
+        nPCIHostBridges++;
+    }
+
+    for (i = 0; i < def->nhostdevs; i++) {
+        virDomainHostdevDefPtr dev = def->hostdevs[i];
+
+        if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
+            dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) {
+            usesVFIO = true;
+            break;
+        }
+    }
+
+    memory = virDomainDefGetMemoryTotal(def);
+
+    if (def->mem.max_memory)
+        maxMemory = def->mem.max_memory;
+    else
+        maxMemory = memory;
+
+    /* baseLimit := maxMemory / 128                                  (a)
+     *              + 4 MiB * #PHBs + 8 MiB                          (b)
+     *
+     * (a) is the hash table
+     *
+     * (b) is accounting for the 32-bit DMA window - it could be either the
+     * KVM accelerated TCE tables for emulated devices, or the VFIO
+     * userspace view. The 4 MiB per-PHB (including the default one) covers
+     * a 2GiB DMA window: default is 1GiB, but it's possible it'll be
+     * increased to help performance. The 8 MiB extra should be plenty for
+     * the TCE table index for any reasonable number of PHBs and several
+     * spapr-vlan or spapr-vscsi devices (512kB + a tiny bit each) */
+    baseLimit = maxMemory / 128 +
+                4096 * nPCIHostBridges +
+                8192;
+
+    /* passthroughLimit := max( 2 GiB * #PHBs,                       (c)
+     *                          memory                               (d)
+     *                          + memory * 1/512 * #PHBs + 8 MiB )   (e)
+     *
+     * (c) is the pre-DDW VFIO DMA window accounting. We're allowing 2 GiB
+     * rather than 1 GiB
+     *
+     * (d) is the with-DDW (and memory pre-registration and related
+     * features) DMA window accounting - assuming that we only account RAM
+     * once, even if mapped to multiple PHBs
+     *
+     * (e) is the with-DDW userspace view and overhead for the 64-bit DMA
+     * window. This is based a bit on expected guest behaviour, but there
+     * really isn't a way to completely avoid that. We assume the guest
+     * requests a 64-bit DMA window (per PHB) just big enough to map all
+     * its RAM. 4 kiB page size gives the 1/512; it will be less with 64
+     * kiB pages, less still if the guest is mapped with hugepages (unlike
+     * the default 32-bit DMA window, DDW windows can use large IOMMU
+     * pages). 8 MiB is for second and further level overheads, like (b) */
+    if (usesVFIO)
+        passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
+                               memory +
+                               memory / 512 * nPCIHostBridges + 8192);
+
+    memKB = baseLimit + passthroughLimit;
+
+    return memKB << 10;
+}
+
+
 /**
  * qemuDomainGetMemLockLimitBytes:
  * @def: domain definition
@@ -10374,84 +10463,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
     if (def->mem.locked)
         return VIR_DOMAIN_MEMORY_PARAM_UNLIMITED;
 
-    if (ARCH_IS_PPC64(def->os.arch) && def->virtType == VIR_DOMAIN_VIRT_KVM) {
-        unsigned long long maxMemory;
-        unsigned long long memory;
-        unsigned long long baseLimit;
-        unsigned long long passthroughLimit = 0;
-        size_t nPCIHostBridges = 0;
-        bool usesVFIO = false;
-
-        for (i = 0; i < def->ncontrollers; i++) {
-            virDomainControllerDefPtr cont = def->controllers[i];
-
-            if (!virDomainControllerIsPSeriesPHB(cont))
-                continue;
-
-            nPCIHostBridges++;
-        }
-
-        for (i = 0; i < def->nhostdevs; i++) {
-            virDomainHostdevDefPtr dev = def->hostdevs[i];
-
-            if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
-                dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
-                dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) {
-                usesVFIO = true;
-                break;
-            }
-        }
-
-        memory = virDomainDefGetMemoryTotal(def);
-
-        if (def->mem.max_memory)
-            maxMemory = def->mem.max_memory;
-        else
-            maxMemory = memory;
-
-        /* baseLimit := maxMemory / 128                                  (a)
-         *              + 4 MiB * #PHBs + 8 MiB                          (b)
-         *
-         * (a) is the hash table
-         *
-         * (b) is accounting for the 32-bit DMA window - it could be either the
-         * KVM accelerated TCE tables for emulated devices, or the VFIO
-         * userspace view. The 4 MiB per-PHB (including the default one) covers
-         * a 2GiB DMA window: default is 1GiB, but it's possible it'll be
-         * increased to help performance. The 8 MiB extra should be plenty for
-         * the TCE table index for any reasonable number of PHBs and several
-         * spapr-vlan or spapr-vscsi devices (512kB + a tiny bit each) */
-        baseLimit = maxMemory / 128 +
-                    4096 * nPCIHostBridges +
-                    8192;
-
-        /* passthroughLimit := max( 2 GiB * #PHBs,                       (c)
-         *                          memory                               (d)
-         *                          + memory * 1/512 * #PHBs + 8 MiB )   (e)
-         *
-         * (c) is the pre-DDW VFIO DMA window accounting. We're allowing 2 GiB
-         * rather than 1 GiB
-         *
-         * (d) is the with-DDW (and memory pre-registration and related
-         * features) DMA window accounting - assuming that we only account RAM
-         * once, even if mapped to multiple PHBs
-         *
-         * (e) is the with-DDW userspace view and overhead for the 64-bit DMA
-         * window. This is based a bit on expected guest behaviour, but there
-         * really isn't a way to completely avoid that. We assume the guest
-         * requests a 64-bit DMA window (per PHB) just big enough to map all
-         * its RAM. 4 kiB page size gives the 1/512; it will be less with 64
-         * kiB pages, less still if the guest is mapped with hugepages (unlike
-         * the default 32-bit DMA window, DDW windows can use large IOMMU
-         * pages). 8 MiB is for second and further level overheads, like (b) */
-        if (usesVFIO)
-            passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
-                                   memory +
-                                   memory / 512 * nPCIHostBridges + 8192);
-
-        memKB = baseLimit + passthroughLimit;
-        goto done;
-    }
+    if (ARCH_IS_PPC64(def->os.arch) && def->virtType == VIR_DOMAIN_VIRT_KVM)
+        return getPPC64MemLockLimitBytes(def);
 
     /* For device passthrough using VFIO the guest memory and MMIO memory
      * regions need to be locked persistent in order to allow DMA.
-- 
2.20.1

From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: libvir-list@redhat.com
Cc: Erik Skultety, Alexey Kardashevskiy, Daniel Henrique Barboza, Piotr Jaroszynski, Leonardo Augusto Guimaraes Garcia
Date: Tue, 5 Mar 2019 09:46:08 -0300
Message-Id: <20190305124609.873-4-danielhb413@gmail.com>
In-Reply-To: <20190305124609.873-1-danielhb413@gmail.com>
Subject: [libvirt] [PATCH v3 3/4] qemu_domain: NVLink2 device tree functions for PPC64

The NVLink2 support in QEMU detects NVLink2-capable devices by verifying
the attributes of the VFIO memory region that QEMU allocates for the
NVIDIA GPUs. To allocate an adequate amount of memLock, however, Libvirt
needs this information before a QEMU instance is even created.

This patch presents an alternative. Given a PCI device, we traverse the
device tree at /proc/device-tree to check whether the device has an NPU
bridge, retrieve the node of the NVLink2 bus, find the memory node
related to that bus, and check whether it is an NVLink2 bus by
inspecting its 'reg' value.

This logic is contained in the 'device_is_nvlink2_capable' function,
which uses other new helper functions to navigate and fetch values from
the device tree nodes.

Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 194 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 77548c224c..97de5793e2 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10343,6 +10343,200 @@ qemuDomainUpdateCurrentMemorySize(virDomainObjPtr vm)
 }
 
 
+/**
+ * Reads a phandle file and returns the phandle value.
+ */
+static int
+read_dt_phandle(const char* file)
+{
+    unsigned int buf[1];
+    size_t read;
+    FILE *f;
+
+    f = fopen(file, "r");
+    if (!f)
+        return -1;
+
+    read = fread(buf, sizeof(unsigned int), 1, f);
+
+    if (!read) {
+        VIR_CLOSE(f);
+        return 0;
+    }
+
+    VIR_CLOSE(f);
+    return be32toh(buf[0]);
+}
+
+
+/**
+ * Reads a memory reg file and returns the first 4 int values.
+ *
+ * The caller is responsible for freeing the returned array.
+ */
+static unsigned int *
+read_dt_memory_reg(const char *file)
+{
+    unsigned int *buf;
+    size_t read, i;
+    FILE *f;
+
+    f = fopen(file, "r");
+    if (!f)
+        return NULL;
+
+    if (VIR_ALLOC_N(buf, 4) < 0)
+        return NULL;
+
+    read = fread(buf, sizeof(unsigned int), 4, f);
+
+    if (!read && read < 4)
+        /* shouldn't happen */
+        VIR_FREE(buf);
+    else for (i = 0; i < 4; i++)
+        buf[i] = be32toh(buf[i]);
+
+    VIR_CLOSE(f);
+    return buf;
+}
+
+
+/**
+ * This wrapper function receives arguments to be used in a
+ * 'find' call to retrieve the file names that matches
+ * the criteria inside the /proc/device-tree dir.
+ *
+ * A 'find' call with '-iname phandle' inside /proc/device-tree
+ * provides more than a thousand matches. Adding '-path' to
+ * narrow it down further is necessary to keep the file
+ * listing sane.
+ *
+ * The caller is responsible to free the buffer returned by
+ * this function.
+ */
+static char *
+retrieve_dt_files_pattern(const char *path_pattern, const char *file_pattern)
+{
+    virCommandPtr cmd = NULL;
+    char *output = NULL;
+
+    cmd = virCommandNew("find");
+    virCommandAddArgList(cmd, "/proc/device-tree/", "-path", path_pattern,
+                         "-iname", file_pattern, NULL);
+    virCommandSetOutputBuffer(cmd, &output);
+
+    if (virCommandRun(cmd, NULL) < 0)
+        VIR_FREE(output);
+
+    virCommandFree(cmd);
+    return output;
+}
+
+
+/**
+ * Helper function that receives a listing of file names and
+ * calls read_dt_phandle() on each one finding for a match
+ * with the given phandle argument. Returns the file name if a
+ * match is found, NULL otherwise.
+ */
+static char *
+find_dt_file_with_phandle(char *files, int phandle)
+{
+    char *line, *tmp;
+    int ret;
+
+    line = strtok_r(files, "\n", &tmp);
+    do {
+        ret = read_dt_phandle(line);
+        if (ret == phandle)
+            break;
+    } while ((line = strtok_r(NULL, "\n", &tmp)) != NULL);
+
+    return line;
+}
+
+
+/**
+ * This function receives a string that represents a PCI device,
+ * such as '0004:04:00.0', and tells if the device is NVLink2 capable.
+ *
+ * The logic goes as follows:
+ *
+ * 1 - get the phandle of a nvlink of the device, reading the 'ibm,npu'
+ * attribute;
+ * 2 - find the device tree node of the nvlink bus using the phandle
+ * found in (1)
+ * 3 - get the phandle of the memory region of the nvlink bus
+ * 4 - find the device tree node of the memory region using the
+ * phandle found in (3)
+ * 5 - read the 'reg' value of the memory region. If the value of
+ * the second 64 bit value is 0x02 0x00, the device is attached
+ * to a NVLink2 bus.
+ *
+ * If any of these steps fails, the function returns false.
+ */
+static bool
+device_is_nvlink2_capable(const char *device)
+{
+    char *file, *files, *tmp;
+    unsigned int *reg;
+    int phandle;
+
+    if ((virAsprintf(&file, "/sys/bus/pci/devices/%s/of_node/ibm,npu",
+                    device)) < 0)
+        return false;
+
+    /* Find phandles of nvlinks: */
+    if ((phandle = read_dt_phandle(file)) == -1)
+        return false;
+
+    /* Find a DT node for the phandle found */
+    files = retrieve_dt_files_pattern("*device-tree/pci*", "phandle");
+    if (!files)
+        return false;
+
+    if ((file = find_dt_file_with_phandle(files, phandle)) == NULL)
+        goto fail;
+
+    /* Find a phandle of the GPU memory region of the device. The
+     * file found above ends with '/phandle' - the memory region
+     * of the GPU ends with '/memory-region */
+    tmp = strrchr(file, '/');
+    *tmp = '\0';
+    file = strcat(file, "/memory-region");
+
+    if ((phandle = read_dt_phandle(file)) == -1)
+        goto fail;
+
+    file = NULL;
+    VIR_FREE(files);
+
+    /* Find the memory node for the phandle found above */
+    files = retrieve_dt_files_pattern("*device-tree/memory*", "phandle");
+    if (!files)
+        return false;
+
+    if ((file = find_dt_file_with_phandle(files, phandle)) == NULL)
+        goto fail;
+
+    /* And see its size in the second 64bit value of 'reg'. First,
+     * the end of the file needs to be changed from '/phandle' to
+     * '/reg' */
+    tmp = strrchr(file, '/');
+    *tmp = '\0';
+    file = strcat(file, "/reg");
+
+    reg = read_dt_memory_reg(file);
+    if (reg && reg[2] == 0x20 && reg[3] == 0x00)
+        return true;
+
+ fail:
+    VIR_FREE(files);
+    VIR_FREE(reg);
+    return false;
+}
+
+
 /**
  * getPPC64MemLockLimitBytes:
  * @def: domain definition
-- 
2.20.1
From: Daniel Henrique Barboza
To: libvir-list@redhat.com
Date: Tue, 5 Mar 2019 09:46:09 -0300
Message-Id: <20190305124609.873-5-danielhb413@gmail.com>
In-Reply-To: <20190305124609.873-1-danielhb413@gmail.com>
References: <20190305124609.873-1-danielhb413@gmail.com>
MIME-Version: 1.0
Cc: Erik Skultety, Alexey Kardashevskiy, Daniel Henrique Barboza, Piotr Jaroszynski, Leonardo Augusto Guimaraes Garcia
Subject: [libvirt] [PATCH v3 4/4] PPC64 support for NVIDIA V100 GPU with NVLink2 passthrough

The NVIDIA V100 GPU has onboard RAM that is mapped into host memory and
is accessible as normal RAM via an NVLink2 bus. When the GPU is passed
through to a guest, QEMU places the NVIDIA RAM window in a non-contiguous
area, above the PCI MMIO area that starts at 32TiB. As a result, the
NVIDIA RAM window starts at 64TiB and extends up to 128TiB.

The guest might therefore request a 64-bit DMA window, for each PCI Host
Bridge (PHB), that extends all the way to 128TiB. However, the NVIDIA RAM
window is not counted as regular RAM, so it is taken into account only
when sizing the Translation Control Entry (TCE) table.

This memory layout differs from the existing VFIO case and requires its
own formula. This patch changes the PPC64 code of
qemuDomainGetMemLockLimitBytes to:

- detect whether a VFIO PCI device is using NVLink2 capabilities. This is
  done with the device tree inspection mechanisms implemented in the
  previous patch;

- if any device is an NVIDIA GPU using an NVLink2 bus, calculate
  passthroughLimit differently, to account for the extra memory the TCE
  table may need to allocate. The 64TiB..128TiB window is more than
  enough to fit all possible GPUs, so the resulting memory lock limit is
  the same whether one or several V100 GPUs are passed through.

Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 97de5793e2..c0abd6da9a 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10551,7 +10551,9 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
     unsigned long long baseLimit, memory, maxMemory;
     unsigned long long passthroughLimit = 0;
     size_t i, nPCIHostBridges = 0;
-    bool usesVFIO = false;
+    virPCIDeviceAddressPtr pciAddr;
+    char *pciAddrStr = NULL;
+    bool usesVFIO = false, nvlink2Capable = false;
 
     for (i = 0; i < def->ncontrollers; i++) {
         virDomainControllerDefPtr cont = def->controllers[i];
@@ -10569,7 +10571,15 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
             dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
             dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) {
             usesVFIO = true;
-            break;
+
+            pciAddr = &dev->source.subsys.u.pci.addr;
+            if (virPCIDeviceAddressIsValid(pciAddr, false)) {
+                pciAddrStr = virPCIDeviceAddressAsString(pciAddr);
+                if (device_is_nvlink2_capable(pciAddrStr)) {
+                    nvlink2Capable = true;
+                    break;
+                }
+            }
         }
     }
 
@@ -10596,6 +10606,32 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
                 4096 * nPCIHostBridges +
                 8192;
 
+    /* NVLink2 support in QEMU is a special case of the passthrough
+     * mechanics explained in the usesVFIO case below. The GPU RAM
+     * is placed with a gap after maxMemory. The current QEMU
+     * implementation puts the NVIDIA RAM above the PCI MMIO, which
+     * starts at 32TiB and is the MMIO reserved for the guest main RAM.
+     *
+     * This window ends at 64TiB, and this is where the GPUs are being
+     * placed. The next available window size is at 128TiB, and
+     * 64TiB..128TiB will fit all possible NVIDIA GPUs.
+     *
+     * The same assumption as the most common case applies here:
+     * the guest will request a 64-bit DMA window, per PHB, that is
+     * big enough to map all its RAM, which is now at 128TiB due
+     * to the GPUs.
+     *
+     * Note that the NVIDIA RAM window must be accounted for in the TCE
+     * table size, but *not* for the main RAM (maxMemory). This gives
+     * us the following passthroughLimit for the NVLink2 case:
+     *
+     * passthroughLimit = maxMemory +
+     *                    128TiB/512KiB * #PHBs + 8 MiB */
+    if (nvlink2Capable)
+        passthroughLimit = maxMemory +
+                           128 * (1ULL<<30) / 512 * nPCIHostBridges +
+                           8192;
+
     /* passthroughLimit := max( 2 GiB * #PHBs,                       (c)
      *                          memory                               (d)
      *                          + memory * 1/512 * #PHBs + 8 MiB )   (e)
@@ -10615,7 +10651,7 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def)
      * kiB pages, less still if the guest is mapped with hugepages (unlike
      * the default 32-bit DMA window, DDW windows can use large IOMMU
      * pages). 8 MiB is for second and further level overheads, like (b) */
-    if (usesVFIO)
+    else if (usesVFIO)
         passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
                                memory +
                                memory / 512 * nPCIHostBridges + 8192);
-- 
2.20.1