From: Michal Privoznik
To: libvir-list@redhat.com
Subject: [PATCH v2 09/10] capabilities: Expose NUMA interconnects
Date: Thu, 10 Jun 2021 15:57:18 +0200

Links between NUMA nodes can have different latencies and
bandwidths. This information was introduced in ACPI 6.2 as the
Heterogeneous Memory Attribute Table (HMAT). The Linux kernel
reports these values under sysfs, so we can expose them in our
capabilities XML. The sysfs interface is documented in the
kernel's Documentation/admin-guide/mm/numaperf.rst.

Long story short, two nodes can be in an initiator-target
relationship. A node can be an initiator if it has a CPU or a
device that's capable of initiating a memory transfer. Therefore
a node that has just memory can only be a target. An
initiator-target link can then have any combination of
{bandwidth, latency} x {access, read, write} attributes (6 in
total). However, the standard says access is applicable iff the
read and write values are the same. Therefore, we really have
just four combinations of attributes: bandwidth-read,
bandwidth-write, latency-read, latency-write. These are the
combinations the kernel reports anyway.

Then, under /sys/devices/system/node/nodeX/accessN/initiators we
find values for those 4 attributes and also symlinks named
"nodeN", which represent initiators to nodeX.
For instance:

  /sys/devices/system/node/node1/access1/initiators/node0 -> ../../node0
  /sys/devices/system/node/node1/access1/initiators/read_bandwidth
  /sys/devices/system/node/node1/access1/initiators/read_latency
  /sys/devices/system/node/node1/access1/initiators/write_bandwidth
  /sys/devices/system/node/node1/access1/initiators/write_latency

This means that node0 is the initiator, node1 is the target, and
the values of the interconnect can be read from these files.

In theory, there can be separate links to memory-side caches too
(e.g. one link from node X to node Y's main memory, another from
node X to node Y's L1 cache, another one to its L2 cache and so
on). But sysfs does not express this relationship just yet.

The "accessN" means either "access0" or "access1". The difference
is that while the former expresses the best interconnect between
two nodes including CPUs and I/O devices (such as GPUs and NICs),
the latter includes only CPUs and thus is what we need.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1786309
Signed-off-by: Michal Privoznik
---
 docs/schemas/capability.rng |   3 +
 src/conf/capabilities.c     | 188 +++++++++++++++++++++++++++++++++++-
 src/conf/capabilities.h     |   1 +
 3 files changed, 191 insertions(+), 1 deletion(-)

diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng
index 5c1fb3607c..66dba829a8 100644
--- a/docs/schemas/capability.rng
+++ b/docs/schemas/capability.rng
@@ -138,6 +138,9 @@
+
+
+
 
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 6be2d0d791..6cb42a12e1 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -191,7 +191,8 @@ virCapabilitiesHostNUMAUnref(virCapsHostNUMA *caps)
 
     if (g_atomic_int_dec_and_test(&caps->refs)) {
         g_ptr_array_unref(caps->cells);
-
+        if (caps->interconnects)
+            g_array_unref(caps->interconnects);
         g_free(caps);
     }
 }
@@ -890,6 +891,13 @@ virCapabilitiesHostNUMAFormat(virBuffer *buf,
     }
     virBufferAdjustIndent(buf, -2);
     virBufferAddLit(buf, "\n");
+
+    if (caps->interconnects) {
+        const virNumaInterconnect *interconnects;
+        interconnects = &g_array_index(caps->interconnects, virNumaInterconnect, 0);
+        virNumaInterconnectFormat(buf, interconnects, caps->interconnects->len);
+    }
+
     virBufferAdjustIndent(buf, -2);
     virBufferAddLit(buf, "\n");
     return 0;
@@ -1735,6 +1743,181 @@ virCapabilitiesHostNUMAInitFake(virCapsHostNUMA *caps)
 }
 
 
+static void
+virCapabilitiesHostInsertHMAT(GArray *interconnects,
+                              unsigned int initiator,
+                              unsigned int target,
+                              unsigned int read_bandwidth,
+                              unsigned int write_bandwidth,
+                              unsigned int read_latency,
+                              unsigned int write_latency)
+{
+    virNumaInterconnect ni;
+
+    ni = (virNumaInterconnect) { VIR_NUMA_INTERCONNECT_TYPE_BANDWIDTH,
+        initiator, target, 0, VIR_MEMORY_LATENCY_READ, read_bandwidth};
+    g_array_append_val(interconnects, ni);
+
+    ni = (virNumaInterconnect) { VIR_NUMA_INTERCONNECT_TYPE_BANDWIDTH,
+        initiator, target, 0, VIR_MEMORY_LATENCY_WRITE, write_bandwidth};
+    g_array_append_val(interconnects, ni);
+
+    ni = (virNumaInterconnect) { VIR_NUMA_INTERCONNECT_TYPE_LATENCY,
+        initiator, target, 0, VIR_MEMORY_LATENCY_READ, read_latency};
+    g_array_append_val(interconnects, ni);
+
+    ni = (virNumaInterconnect) { VIR_NUMA_INTERCONNECT_TYPE_LATENCY,
+        initiator, target, 0, VIR_MEMORY_LATENCY_WRITE, write_latency};
+    g_array_append_val(interconnects, ni);
+}
+
+
+static int
+virCapabilitiesHostNUMAInitInterconnectsNode(GArray *interconnects,
+                                             unsigned int node)
+{
+    g_autofree char *path = NULL;
+    g_autofree char *initPath = NULL;
+    g_autoptr(DIR) dir = NULL;
+    int direrr = 0;
+    struct dirent *entry;
+    unsigned int read_bandwidth;
+    unsigned int write_bandwidth;
+    unsigned int read_latency;
+    unsigned int write_latency;
+
+    /* Unfortunately, kernel does not expose full HMAT table. I mean it does,
+     * in its binary form under /sys/firmware/acpi/tables/HMAT but we don't
+     * want to parse that. But some important info is still exposed, under
+     * "access0" and "access1" directories. The former contains the best
+     * interconnect to given node including CPUs and devices that might do I/O
+     * (such as GPUs and NICs). The latter contains the best interconnect to
+     * given node but only CPUs are considered. Stick with access1 until sysfs
+     * exposes the full table in a sensible way.
+     * NB on most systems access0 and access1 contain the same values. */
+    path = g_strdup_printf(SYSFS_SYSTEM_PATH "/node/node%d/access1", node);
+
+    if (!virFileExists(path))
+        return 0;
+
+    if (virCapabilitiesGetNodeCacheReadFile(path, "initiators",
+                                            "read_bandwidth",
+                                            &read_bandwidth) < 0)
+        return -1;
+    if (virCapabilitiesGetNodeCacheReadFile(path, "initiators",
+                                            "write_bandwidth",
+                                            &write_bandwidth) < 0)
+        return -1;
+
+    /* Bandwidths are read in MiB but stored in KiB */
+    read_bandwidth <<= 10;
+    write_bandwidth <<= 10;
+
+    if (virCapabilitiesGetNodeCacheReadFile(path, "initiators",
+                                            "read_latency",
+                                            &read_latency) < 0)
+        return -1;
+    if (virCapabilitiesGetNodeCacheReadFile(path, "initiators",
+                                            "write_latency",
+                                            &write_latency) < 0)
+        return -1;
+
+    initPath = g_strdup_printf("%s/initiators", path);
+
+    if (virDirOpen(&dir, initPath) < 0)
+        return -1;
+
+    while ((direrr = virDirRead(dir, &entry, path)) > 0) {
+        const char *dname = STRSKIP(entry->d_name, "node");
+        unsigned int initNode;
+
+        if (!dname)
+            continue;
+
+        if (virStrToLong_ui(dname, NULL, 10, &initNode) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("unable to parse %s"),
+                           entry->d_name);
+            return -1;
+        }
+
+        virCapabilitiesHostInsertHMAT(interconnects,
+                                      initNode, node,
+                                      read_bandwidth,
+                                      write_bandwidth,
+                                      read_latency,
+                                      write_latency);
+    }
+
+    return 0;
+}
+
+
+static int
+virCapsHostNUMAInterconnectComparator(const void *a,
+                                      const void *b)
+{
+    const virNumaInterconnect *aa = a;
+    const virNumaInterconnect *bb = b;
+
+    if (aa->type != bb->type)
+        return aa->type - bb->type;
+
+    if (aa->initiator != bb->initiator)
+        return aa->initiator - bb->initiator;
+
+    if (aa->target != bb->target)
+        return aa->target - bb->target;
+
+    if (aa->cache != bb->cache)
+        return aa->cache - bb->cache;
+
+    if (aa->accessType != bb->accessType)
+        return aa->accessType - bb->accessType;
+
+    return aa->value - bb->value;
+}
+
+
+static int
+virCapabilitiesHostNUMAInitInterconnects(virCapsHostNUMA *caps)
+{
+    g_autoptr(DIR) dir = NULL;
+    int direrr = 0;
+    struct dirent *entry;
+    const char *path = SYSFS_SYSTEM_PATH "/node/";
+    g_autoptr(GArray) interconnects = g_array_new(FALSE, FALSE, sizeof(virNumaInterconnect));
+
+    if (virDirOpenIfExists(&dir, path) < 0)
+        return -1;
+
+    while (dir && (direrr = virDirRead(dir, &entry, path)) > 0) {
+        const char *dname = STRSKIP(entry->d_name, "node");
+        unsigned int node;
+
+        if (!dname)
+            continue;
+
+        if (virStrToLong_ui(dname, NULL, 10, &node) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("unable to parse %s"),
+                           entry->d_name);
+            return -1;
+        }
+
+        if (virCapabilitiesHostNUMAInitInterconnectsNode(interconnects, node) < 0)
+            return -1;
+    }
+
+    if (interconnects->len > 0) {
+        g_array_sort(interconnects, virCapsHostNUMAInterconnectComparator);
+        caps->interconnects = g_steal_pointer(&interconnects);
+    }
+
+    return 0;
+}
+
+
 static int
 virCapabilitiesHostNUMAInitReal(virCapsHostNUMA *caps)
 {
@@ -1795,6 +1978,9 @@ virCapabilitiesHostNUMAInitReal(virCapsHostNUMA *caps)
                                        &caches);
     }
 
+    if (virCapabilitiesHostNUMAInitInterconnects(caps) < 0)
+        goto cleanup;
+
     ret = 0;
 
  cleanup:
diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h
index 334b361e7a..1b99202c9b 100644
--- a/src/conf/capabilities.h
+++ b/src/conf/capabilities.h
@@ -114,6 +114,7 @@ struct _virCapsHostNUMACell {
 struct _virCapsHostNUMA {
     gint refs;
     GPtrArray *cells;
+    GArray *interconnects; /* virNumaInterconnect */
 };
 
 struct _virCapsHostSecModelLabel {
-- 
2.31.1
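
The data flow through virCapabilitiesHostInsertHMAT() and
virCapsHostNUMAInterconnectComparator() can be tried outside of
libvirt with a minimal standalone sketch. Everything named below
(struct link_attr, the LINK_* enums, CMP3, link_attr_cmp,
link_expand, link_sort) is hypothetical and only stands in for
virNumaInterconnect and friends; the real field names, enum order
and integer widths may differ. The sketch uses plain arrays and
qsort() instead of GArray, leaves out the cache level, and
compares fields without subtraction, which is one possible way to
stay safe if the fields turn out to be unsigned or wider than int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for virNumaInterconnect and its enums; the real
 * libvirt definitions may differ in field names, order and widths. */
enum link_type { LINK_LATENCY, LINK_BANDWIDTH };
enum link_access { LINK_READ, LINK_WRITE };

struct link_attr {
    enum link_type type;
    unsigned int initiator;
    unsigned int target;
    enum link_access access;
    unsigned long long value;
};

/* Three-way comparison yielding -1, 0 or 1 without subtracting, so it
 * cannot wrap around or be truncated when the operands are unsigned. */
#define CMP3(a, b) (((a) > (b)) - ((a) < (b)))

/* Order by type, initiator, target, access type and finally value. */
static int
link_attr_cmp(const void *a, const void *b)
{
    const struct link_attr *aa = a;
    const struct link_attr *bb = b;
    int rc;

    if ((rc = CMP3(aa->type, bb->type)) != 0)
        return rc;
    if ((rc = CMP3(aa->initiator, bb->initiator)) != 0)
        return rc;
    if ((rc = CMP3(aa->target, bb->target)) != 0)
        return rc;
    if ((rc = CMP3(aa->access, bb->access)) != 0)
        return rc;
    return CMP3(aa->value, bb->value);
}

/* Append the four records describing one initiator->target link to out[],
 * starting at index n, and return the new element count. Bandwidths are
 * passed in MiB and stored in KiB, mirroring the "<<= 10" in the patch.
 * The caller must provide room for four more elements. */
static size_t
link_expand(struct link_attr *out, size_t n,
            unsigned int initiator, unsigned int target,
            unsigned int read_bw_mib, unsigned int write_bw_mib,
            unsigned int read_lat, unsigned int write_lat)
{
    assert(out != NULL);

    out[n++] = (struct link_attr) { LINK_BANDWIDTH, initiator, target,
                                    LINK_READ,
                                    (unsigned long long) read_bw_mib << 10 };
    out[n++] = (struct link_attr) { LINK_BANDWIDTH, initiator, target,
                                    LINK_WRITE,
                                    (unsigned long long) write_bw_mib << 10 };
    out[n++] = (struct link_attr) { LINK_LATENCY, initiator, target,
                                    LINK_READ, read_lat };
    out[n++] = (struct link_attr) { LINK_LATENCY, initiator, target,
                                    LINK_WRITE, write_lat };
    return n;
}

/* Sort the records so that output is stable regardless of readdir order. */
static void
link_sort(struct link_attr *links, size_t n)
{
    qsort(links, n, sizeof(*links), link_attr_cmp);
}
```

Calling link_expand() once per "nodeN" symlink found under
initiators/ and then link_sort() on the result gives a
deterministic order no matter in which order the directory
entries were returned. With the enum order assumed here, latency
records sort before bandwidth records.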