From: Wim Ten Have <wim.ten.have@oracle.com>
To: Libvirt Development List <libvir-list@redhat.com>
Cc: Wim ten Have
Date: Tue, 25 Sep 2018 12:02:41 +0200
Message-Id: <20180925100242.10678-2-wim.ten.have@oracle.com>
In-Reply-To: <20180925100242.10678-1-wim.ten.have@oracle.com>
References: <20180925100242.10678-1-wim.ten.have@oracle.com>
Subject: [libvirt] [RFC PATCH auto partition NUMA guest domains v1 1/2] domain: auto partition guests providing the host NUMA topology
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
From: Wim ten Have <wim.ten.have@oracle.com>

Add a mechanism to auto partition the host NUMA topology under the
guest domain.

This patch adds a framework that automatically partitions the guest
into a small vNUMA subset of the host when the guest XML requests it
(cpu mode='host-passthrough' check='numa') and the hypervisor
indicates, per the host capabilities, that a physical NUMA topology
is in effect. The mechanism renders the NUMA architecture provided
by the host capabilities, evenly balances the guest's reserved vcpus
and memory amongst the composed vNUMA cells, and pins each cell's
vcpus to the physical cpusets of the matching host NUMA node. This
way the host NUMA topology remains in effect under the partitioned
guest vNUMA domain.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
 docs/formatdomain.html.in |   7 ++
 docs/schemas/cputypes.rng |   1 +
 src/conf/cpu_conf.c       |   3 +-
 src/conf/cpu_conf.h       |   1 +
 src/conf/domain_conf.c    | 166 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b4214..ba073d952545 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1500,6 +1500,13 @@
         The virtual CPU created by the hypervisor will be checked
         against the CPU specification and the domain will not be
         started unless the two CPUs match.</dd>
+
+      <dt><code>numa</code></dt>
+      <dd>When the CPU mode='host-passthrough' check='numa' option
+        combination is set, libvirt auto partitions the guest domain
+        by rendering the host NUMA architecture. Here the virtual
+        CPUs and memory are evenly balanced across the defined NUMA
+        nodes. The vCPUs are also pinned to their physical CPUs.</dd>
 
       Since 0.9.10, an optional <code>mode</code>
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 1f1e0e36d59b..d384d161ee7e 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -29,6 +29,7 @@
         <value>none</value>
         <value>partial</value>
         <value>full</value>
+        <value>numa</value>
diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 863413e75eaa..0d52f6aa4813 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -52,7 +52,8 @@ VIR_ENUM_IMPL(virCPUCheck, VIR_CPU_CHECK_LAST,
               "default",
               "none",
               "partial",
-              "full")
+              "full",
+              "numa")
 
 VIR_ENUM_IMPL(virCPUFallback, VIR_CPU_FALLBACK_LAST,
               "allow",
diff --git a/src/conf/cpu_conf.h b/src/conf/cpu_conf.h
index 9f2e7ee2649d..f2e2f0bef3ae 100644
--- a/src/conf/cpu_conf.h
+++ b/src/conf/cpu_conf.h
@@ -68,6 +68,7 @@ typedef enum {
     VIR_CPU_CHECK_NONE,
     VIR_CPU_CHECK_PARTIAL,
     VIR_CPU_CHECK_FULL,
+    VIR_CPU_CHECK_NUMA,
 
     VIR_CPU_CHECK_LAST
 } virCPUCheck;
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9911d56130a9..c2f9398cfe85 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1759,6 +1759,168 @@ virDomainDefGetVcpusTopology(const virDomainDef *def,
 }
 
 
+/**
+ * virDomainNumaAutoconfig: auto partition guest vNUMA XML definitions,
+ * taking the machine NUMA topology and creating a small guest copy
+ * instance of it.
+ * @def: domain definition
+ * @caps: host capabilities
+ *
+ * Auto partitioning vNUMA guests is requested by the XML configuration
+ * <cpu mode='host-passthrough' check='numa'/>. Here libvirt takes the
+ * host NUMA topology, including maxvcpus, online vcpus and memory,
+ * pins the node cpus, and renders the guest domain vNUMA topology as
+ * an architectural copy of the host.
+ *
+ * Returns 0 on success and -1 on error.
+ */
+static int
+virDomainNumaAutoconfig(virDomainDefPtr def,
+                        virCapsPtr caps)
+{
+    int ret = -1;
+
+    if (caps && def->cpu &&
+        def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
+        def->cpu->check == VIR_CPU_CHECK_NUMA) {
+
+        size_t i, cell;
+        size_t nvcpus = 0;
+        size_t nnumaCell = 0;
+        unsigned long long memsizeCell = 0;
+        virBitmapPtr vnumask = NULL;
+        virCapsHostPtr host = &caps->host;
+
+        nnumaCell = host->nnumaCell;
+        if (!nnumaCell)
+            goto cleanup;
+
+        /* Compute the online vcpus */
+        for (i = 0; i < def->maxvcpus; i++)
+            if (def->vcpus[i]->online)
+                nvcpus++;
+
+        if (nvcpus < nnumaCell) {
+            VIR_WARN("vNUMA disabled: %zu vcpus are insufficient "
+                     "to arrange a vNUMA topology for %zu nodes.",
+                     nvcpus, nnumaCell);
+            goto cleanup;
+        }
+
+        /* Compute the memory size (memsizeCell) per arranged nnumaCell */
+        if ((memsizeCell = def->mem.total_memory / nnumaCell) == 0)
+            goto cleanup;
+
+        /* Correct vNUMA can only be accomplished if the number of maxvcpus
+         * is a multiple of the number of physical nodes. If this is not
+         * possible we set sockets, cores and threads to 0 so libvirt
+         * creates a default topology where all vcpus appear as sockets,
+         * and cores and threads are set to 1.
+         */
+        if (def->maxvcpus % nnumaCell) {
+            VIR_WARN("vNUMA: configured %u vcpus do not meet the host "
+                     "%zu NUMA nodes for an evenly balanced cpu topology.",
+                     def->maxvcpus, nnumaCell);
+            def->cpu->sockets = def->cpu->cores = def->cpu->threads = 0;
+        } else {
+            /* The artificial cpu topology computed below aims for the best
+             * host-matching cores/threads alignment that fits the
+             * configured vcpus.
+             */
+            unsigned int sockets = host->numaCell[nnumaCell - 1]->cpus->socket_id + 1;
+            unsigned int threads = host->cpu->threads;
+
+            if (def->maxvcpus % (sockets * threads))
+                threads = 1;
+
+            def->cpu->cores = def->maxvcpus / (sockets * threads);
+            def->cpu->threads = threads;
+            def->cpu->sockets = sockets;
+        }
+
+        /* Build the vNUMA topology. The host topology may have changed
+         * entirely, growing beyond its former dimensions, so fully free
+         * the current allocations and build the new vNUMA topology from
+         * scratch.
+         */
+        virDomainNumaFree(def->numa);
+        if (!(def->numa = virDomainNumaNew()))
+            goto error;
+
+        if (!virDomainNumaSetNodeCount(def->numa, nnumaCell))
+            goto error;
+
+        for (cell = 0; cell < nnumaCell; cell++) {
+            char *vcpus = NULL;
+            size_t ndistances;
+            virBitmapPtr cpumask = NULL;
+            virCapsHostNUMACell *numaCell = host->numaCell[cell];
+
+            /* per NUMA cell memory size */
+            virDomainNumaSetNodeMemorySize(def->numa, cell, memsizeCell);
+
+            /* per NUMA cell vcpu range to mask */
+            for (i = cell; i < def->maxvcpus; i += nnumaCell) {
+                char *tmp = NULL;
+
+                if ((virAsprintf(&tmp, "%zu%s", i,
+                                 ((def->maxvcpus - i) > nnumaCell) ? "," : "") < 0) ||
+                    (virAsprintf(&vcpus, "%s%s",
+                                 (vcpus ? vcpus : ""), tmp) < 0)) {
+                    VIR_FREE(tmp);
+                    VIR_FREE(vcpus);
+                    goto error;
+                }
+                VIR_FREE(tmp);
+            }
+
+            if ((virBitmapParse(vcpus, &cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) ||
+                (virDomainNumaSetNodeCpumask(def->numa, cell, cpumask) == NULL)) {
+                VIR_FREE(vcpus);
+                goto error;
+            }
+            VIR_FREE(vcpus);
+
+            /* per NUMA cpus sibling vNUMA pinning */
+            if (!(vnumask = virBitmapNew(nnumaCell * numaCell->ncpus)))
+                goto error;
+
+            for (i = 0; i < numaCell->ncpus; i++) {
+                unsigned int id = numaCell->cpus[i].id;
+
+                if (virBitmapSetBit(vnumask, id) < 0) {
+                    virBitmapFree(vnumask);
+                    goto error;
+                }
+            }
+
+            for (i = 0; i < def->maxvcpus; i++) {
+                if (virBitmapIsBitSet(cpumask, i))
+                    def->vcpus[i]->cpumask = virBitmapNewCopy(vnumask);
+            }
+            virBitmapFree(vnumask);
+
+            /* per NUMA cell sibling distances */
+            ndistances = numaCell->nsiblings;
+            if (ndistances &&
+                virDomainNumaSetNodeDistanceCount(def->numa, cell, ndistances) != nnumaCell)
+                goto error;
+
+            for (i = 0; i < ndistances; i++) {
+                unsigned int distance = numaCell->siblings[i].distance;
+
+                if (virDomainNumaSetNodeDistance(def->numa, cell, i, distance) != distance)
+                    goto error;
+            }
+        }
+    }
+ cleanup:
+    ret = 0;
+
+ error:
+    return ret;
+}
+
+
 virDomainDiskDefPtr
 virDomainDiskDefNew(virDomainXMLOptionPtr xmlopt)
 {
@@ -19749,6 +19911,10 @@ virDomainDefParseXML(xmlDocPtr xml,
     if (virDomainNumaDefCPUParseXML(def->numa, ctxt) < 0)
         goto error;
 
+    /* Check and apply auto partition vNUMA topology to the guest if requested */
+    if (virDomainNumaAutoconfig(def, caps))
+        goto error;
+
     if (virDomainNumaGetCPUCountTotal(def->numa) > virDomainDefGetVcpusMax(def)) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                        _("Number of CPUs in <numa> exceeds the"
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
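[Editor's note: a minimal sketch of a guest XML fragment that would trigger the auto-partitioning path this patch adds. The cpu element matches the mode/check combination documented above; the domain name, memory and vcpu values are arbitrary illustration, not from the patch.]

```xml
<!-- Hypothetical guest: 8 vcpus, 4 GiB, auto vNUMA partitioning requested -->
<domain type='kvm'>
  <name>vnuma-guest</name>
  <memory unit='KiB'>4194304</memory>
  <vcpu placement='static'>8</vcpu>
  <cpu mode='host-passthrough' check='numa'/>
</domain>
```

With this configuration, virDomainNumaAutoconfig() would rebuild the domain's numa definition at parse time, so no explicit numa/cell elements are needed.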