From nobody Sat May 4 08:46:14 2024
From: Daniel Henrique Barboza
To: libvir-list@redhat.com
Date: Sun, 3 Mar 2019 10:23:12 -0300
Message-Id: <20190303132314.27814-2-danielhb413@gmail.com>
In-Reply-To: <20190303132314.27814-1-danielhb413@gmail.com>
References: <20190303132314.27814-1-danielhb413@gmail.com>
Cc: aik@ozlabs.ru, Daniel Henrique Barboza
Subject: [libvirt] [PATCH v2 1/3] qemu_domain: simplify non-VFIO memLockLimit calc for PPC64

passthroughLimit is being calculated even if usesVFIO is false.
After that, an if/else conditional is used to check if we're going
to sum it up with baseLimit.

This patch initializes passthroughLimit to zero and always returns
memKB = baseLimit + passthroughLimit. The conditional is then used
to calculate passthroughLimit only if usesVFIO is true.

This spares some cycles in the usesVFIO=false scenario, but the
real motivation is to make it simpler to add an alternative
passthroughLimit formula for NVLink2 passthrough.

Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 59fe1eb401..55578f3d19 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10366,7 +10366,7 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
         unsigned long long maxMemory;
         unsigned long long memory;
         unsigned long long baseLimit;
-        unsigned long long passthroughLimit;
+        unsigned long long passthroughLimit = 0;
         size_t nPCIHostBridges = 0;
         bool usesVFIO = false;
 
@@ -10432,15 +10432,12 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
          * kiB pages, less still if the guest is mapped with hugepages (unlike
          * the default 32-bit DMA window, DDW windows can use large IOMMU
          * pages). 8 MiB is for second and further level overheads, like (b) */
-        passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
-                               memory +
-                               memory / 512 * nPCIHostBridges + 8192);
-
         if (usesVFIO)
-            memKB = baseLimit + passthroughLimit;
-        else
-            memKB = baseLimit;
+            passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
+                                   memory +
+                                   memory / 512 * nPCIHostBridges + 8192);
 
+        memKB = baseLimit + passthroughLimit;
         goto done;
     }
 
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

From nobody Sat May 4 08:46:14 2024
From: Daniel Henrique Barboza
To: libvir-list@redhat.com
Date: Sun, 3 Mar 2019 10:23:13 -0300
Message-Id: <20190303132314.27814-3-danielhb413@gmail.com>
In-Reply-To: <20190303132314.27814-1-danielhb413@gmail.com>
References: <20190303132314.27814-1-danielhb413@gmail.com>
Cc: aik@ozlabs.ru, Daniel Henrique Barboza
Subject: [libvirt] [PATCH v2 2/3] qemu_domain: NVLink2 device tree functions for PPC64

The NVLink2 support in QEMU detects NVLink2 capable devices by
verifying the attributes of the VFIO mem region QEMU allocates for
the NVIDIA GPUs. To allocate an adequate amount of memLock, Libvirt
needs this information before a QEMU instance is even created.

An alternative is presented in this patch. Given a PCI device,
we'll traverse the device tree at /proc/device-tree to check if
the device has an NPU bridge, retrieve the node of the NVLink2 bus,
find the memory-node that is related to the bus and see if it's
an NVLink2 bus by inspecting its 'reg' value. This logic is
contained inside the 'device_is_nvlink2_capable' function, which
uses other new helper functions to navigate and fetch values from
the device tree nodes.

Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 188 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 55578f3d19..76e1e4b161 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10331,6 +10331,194 @@ qemuDomainUpdateCurrentMemorySize(virDomainObjPtr vm)
 }
 
 
+/**
+ * Reads a phandle file and returns the phandle value.
+ */
+static int read_dt_phandle(const char* file)
+{
+    unsigned int buf[1];
+    size_t read;
+    FILE *f;
+
+    f = fopen(file, "r");
+    if (!f)
+        return -1;
+
+    read = fread(buf, sizeof(unsigned int), 1, f);
+
+    if (!read) {
+        fclose(f);
+        return 0;
+    }
+
+    fclose(f);
+    return be32toh(buf[0]);
+}
+
+
+/**
+ * Reads a memory reg file and returns the first 4 int values.
+ *
+ * The caller is responsible for freeing the returned array.
+ */
+static unsigned int *read_dt_memory_reg(const char *file)
+{
+    unsigned int *buf;
+    size_t read, i;
+    FILE *f;
+
+    f = fopen(file, "r");
+    if (!f)
+        return NULL;
+
+    buf = calloc(4, sizeof(unsigned int));
+    read = fread(buf, sizeof(unsigned int), 4, f);
+
+    if (read < 4)
+        /* shouldn't happen */
+        VIR_FREE(buf);
+    else for (i = 0; i < 4; i++)
+        buf[i] = be32toh(buf[i]);
+
+    fclose(f);
+    return buf;
+}
+
+
+/**
+ * This wrapper function receives arguments to be used in a
+ * 'find' call to retrieve the file names that match
+ * the criteria inside the /proc/device-tree dir.
+ *
+ * A 'find' call with '-iname phandle' inside /proc/device-tree
+ * provides more than a thousand matches. Adding '-path' to
+ * narrow it down further is necessary to keep the file
+ * listing sane.
+ *
+ * The caller is responsible for freeing the buffer returned by
+ * this function.
+ */
+static char *retrieve_dt_files_pattern(const char *path_pattern,
+                                       const char *file_pattern)
+{
+    virCommandPtr cmd = NULL;
+    char *output = NULL;
+
+    cmd = virCommandNew("find");
+    virCommandAddArgList(cmd, "/proc/device-tree/","-path", path_pattern,
+                         "-iname", file_pattern, NULL);
+    virCommandSetOutputBuffer(cmd, &output);
+
+    if (virCommandRun(cmd, NULL) < 0)
+        VIR_FREE(output);
+
+    virCommandFree(cmd);
+    return output;
+}
+
+
+/**
+ * Helper function that receives a listing of file names and
+ * calls read_dt_phandle() on each one, looking for a match
+ * with the given phandle argument. Returns the file name if a
+ * match is found, NULL otherwise.
+ */
+static char *find_dt_file_with_phandle(char *files, int phandle)
+{
+    char *line, *tmp;
+    int ret;
+
+    line = strtok_r(files, "\n", &tmp);
+    do {
+       ret = read_dt_phandle(line);
+       if (ret == phandle)
+           break;
+    } while ((line = strtok_r(NULL, "\n", &tmp)) != NULL);
+
+    return line;
+}
+
+
+/**
+ * This function receives a string that represents a PCI device,
+ * such as '0004:04:00.0', and tells if the device is NVLink2 capable.
+ *
+ * The logic goes as follows:
+ *
+ * 1 - get the phandle of a nvlink of the device, reading the 'ibm,npu'
+ *     attribute;
+ * 2 - find the device tree node of the nvlink bus using the phandle
+ *     found in (1)
+ * 3 - get the phandle of the memory region of the nvlink bus
+ * 4 - find the device tree node of the memory region using the
+ *     phandle found in (3)
+ * 5 - read the 'reg' value of the memory region. If the value of
+ *     the second 64 bit value is 0x02 0x00, the device is attached
+ *     to a NVLink2 bus.
+ *
+ * If any of these steps fails, the function returns false.
+ */
+static bool device_is_nvlink2_capable(const char *device)
+{
+    char *file, *files, *tmp;
+    unsigned int *reg = NULL;
+    int phandle;
+
+    if ((virAsprintf(&file, "/sys/bus/pci/devices/%s/of_node/ibm,npu",
+                    device)) < 0)
+        return false;
+
+    /* Find phandles of nvlinks:  */
+    if ((phandle = read_dt_phandle(file)) == -1)
+        return false;
+
+    /* Find a DT node for the phandle found */
+    files = retrieve_dt_files_pattern("*device-tree/pci*", "phandle");
+    if (!files)
+        return false;
+
+    if ((file = find_dt_file_with_phandle(files, phandle)) == NULL)
+        goto fail;
+
+    /* Find a phandle of the GPU memory region of the device. The
+     * file found above ends with '/phandle' - the memory region
+     * of the GPU ends with '/memory-region' */
+    tmp = strrchr(file, '/');
+    *tmp = '\0';
+    file = strcat(file, "/memory-region");
+
+    if ((phandle = read_dt_phandle(file)) == -1)
+        goto fail;
+
+    file = NULL;
+    VIR_FREE(files);
+
+    /* Find the memory node for the phandle found above */
+    files = retrieve_dt_files_pattern("*device-tree/memory*", "phandle");
+    if (!files)
+        return false;
+
+    if ((file = find_dt_file_with_phandle(files, phandle)) == NULL)
+        goto fail;
+
+    /* And see its size in the second 64bit value of 'reg'. First,
+     * the end of the file needs to be changed from '/phandle' to
+     * '/reg' */
+    tmp = strrchr(file, '/');
+    *tmp = '\0';
+    file = strcat(file, "/reg");
+
+    reg = read_dt_memory_reg(file);
+    if (reg && reg[2] == 0x20 && reg[3] == 0x00)
+        return true;
+
+fail:
+    VIR_FREE(files);
+    VIR_FREE(reg);
+    return false;
+}
+
+
 /**
  * qemuDomainGetMemLockLimitBytes:
  * @def: domain definition
-- 
2.20.1

From nobody Sat May 4 08:46:14 2024
From: Daniel Henrique Barboza
To: libvir-list@redhat.com
Date: Sun, 3 Mar 2019 10:23:14 -0300
Message-Id: <20190303132314.27814-4-danielhb413@gmail.com>
In-Reply-To: <20190303132314.27814-1-danielhb413@gmail.com>
References: <20190303132314.27814-1-danielhb413@gmail.com>
Cc: aik@ozlabs.ru, Daniel Henrique Barboza
Subject: [libvirt] [PATCH v2 3/3] PPC64 support for NVIDIA V100 GPU with NVLink2 passthrough

The NVIDIA V100 GPU has onboard RAM that is mapped into the
host memory and accessible as normal RAM via an NVLink2 bus. When
passed through in a guest, QEMU puts the NVIDIA RAM window in a
non-contiguous area, above the PCI MMIO area that starts at 32TiB.
This means that the NVIDIA RAM window starts at 64TiB and goes all
the way to 128TiB.

As a result, the guest might request a 64-bit window, for each PCI
Host Bridge, that goes all the way to 128TiB. However, the NVIDIA
RAM window isn't counted as regular RAM, thus this window is
considered only for the allocation of the Translation Control
Entry (TCE) table.

This memory layout differs from the existing VFIO case, requiring its
own formula. This patch changes the PPC64 code of
qemuDomainGetMemLockLimitBytes to:

- detect if a VFIO PCI device is using NVLink2 capabilities. This is
done by using the device tree inspection mechanisms that were
implemented in the previous patch;

- if any device is an NVIDIA GPU using an NVLink2 bus, passthroughLimit
is calculated in a different way to account for the extra memory the
TCE table can allocate. The 64TiB..128TiB window is more than enough
to fit all possible GPUs, thus the memLimit is the same regardless of
passing through 1 or multiple V100 GPUs.

Signed-off-by: Daniel Henrique Barboza
---
 src/qemu/qemu_domain.c | 44 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 76e1e4b161..56b45fcfb7 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10556,7 +10556,9 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
         unsigned long long baseLimit;
         unsigned long long passthroughLimit = 0;
         size_t nPCIHostBridges = 0;
-        bool usesVFIO = false;
+        virPCIDeviceAddressPtr pciAddr;
+        char *pciAddrStr = NULL;
+        bool usesVFIO = false, nvlink2Capable = false;
 
         for (i = 0; i < def->ncontrollers; i++) {
             virDomainControllerDefPtr cont = def->controllers[i];
@@ -10573,8 +10575,18 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
             if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
                 dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
                 dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) {
+
                 usesVFIO = true;
-                break;
+
+                pciAddr = &dev->source.subsys.u.pci.addr;
+                if (virPCIDeviceAddressIsValid(pciAddr, false)) {
+                    pciAddrStr = virPCIDeviceAddressAsString(pciAddr);
+
+                    if (device_is_nvlink2_capable(pciAddrStr)) {
+                        nvlink2Capable = true;
+                        break;
+                    }
+                }
             }
         }
 
@@ -10601,6 +10613,32 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
                     4096 * nPCIHostBridges +
                     8192;
 
+        /* NVLink2 support in QEMU is a special case of the passthrough
+         * mechanics explained in the usesVFIO case below. The GPU RAM
+         * is placed with a gap after maxMemory. The current QEMU
+         * implementation puts the NVIDIA RAM above the PCI MMIO, which
+         * starts at 32TiB and is the MMIO reserved for the guest main RAM.
+         *
+         * This window ends at 64TiB, and this is where the GPUs are being
+         * placed. The next available window size is at 128TiB, and
+         * 64TiB..128TiB will fit all possible NVIDIA GPUs.
+         *
+         * The same assumption as the most common case applies here:
+         * the guest will request a 64-bit DMA window, per PHB, that is
+         * big enough to map all its RAM, which is now at 128TiB due
+         * to the GPUs.
+         *
+         * Note that the NVIDIA RAM window must be accounted for the TCE
+         * table size, but *not* for the main RAM (maxMemory). This gives
+         * us the following passthroughLimit for the NVLink2 case:
+         *
+         * passthroughLimit = maxMemory +
+         *                    128TiB/512KiB * #PHBs + 8 MiB */
+        if (nvlink2Capable)
+            passthroughLimit = maxMemory +
+                               128 * (1ULL<<30) / 512 * nPCIHostBridges +
+                               8192;
+
         /* passthroughLimit := max( 2 GiB * #PHBs,                       (c)
          *                          memory                               (d)
          *                          + memory * 1/512 * #PHBs + 8 MiB )   (e)
@@ -10620,7 +10658,7 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def)
          * kiB pages, less still if the guest is mapped with hugepages (unlike
          * the default 32-bit DMA window, DDW windows can use large IOMMU
          * pages). 8 MiB is for second and further level overheads, like (b) */
-        if (usesVFIO)
+        else if (usesVFIO)
             passthroughLimit = MAX(2 * 1024 * 1024 * nPCIHostBridges,
                                    memory +
                                    memory / 512 * nPCIHostBridges + 8192);
-- 
2.20.1
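
As a sanity check on the arithmetic in patch 3/3, the two passthroughLimit
formulas can be evaluated side by side in a small standalone C sketch. This
is not libvirt code: the names max_ull(), vfio_limit_kib() and
nvlink2_limit_kib() are hypothetical, and the expressions are copied from the
diff on the assumption that memory and maxMemory are expressed in KiB there
(as the memKB result suggests).

```c
/* Sketch only: mirrors the two passthroughLimit expressions from the
 * series, with every quantity in KiB. Function names are hypothetical. */

static unsigned long long
max_ull(unsigned long long a, unsigned long long b)
{
    return a > b ? a : b;
}

/* Regular VFIO case:
 *   max(2 GiB * #PHBs, memory + memory / 512 * #PHBs + 8 MiB) */
static unsigned long long
vfio_limit_kib(unsigned long long memory, unsigned long long nPCIHostBridges)
{
    return max_ull(2ULL * 1024 * 1024 * nPCIHostBridges,
                   memory + memory / 512 * nPCIHostBridges + 8192);
}

/* NVLink2 case:
 *   maxMemory + 128 TiB / 512 * #PHBs + 8 MiB
 * 128 TiB is 128 * 2^30 KiB, so the TCE term is 2^28 KiB (256 GiB) per PHB. */
static unsigned long long
nvlink2_limit_kib(unsigned long long maxMemory,
                  unsigned long long nPCIHostBridges)
{
    return maxMemory + 128 * (1ULL << 30) / 512 * nPCIHostBridges + 8192;
}
```

For a 32 GiB guest (33554432 KiB) with one PHB, the VFIO expression gives
33628160 KiB, while the NVLink2 expression gives 301998080 KiB. The fixed
128TiB window dominates the result, which is why the limit does not depend on
how many V100 GPUs are passed through.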