From nobody Mon Feb 9 11:29:52 2026 Delivered-To: importer@patchew.org Received-SPF: none (zohomail.com: 8.43.85.245 is neither permitted nor denied by domain of lists.libvirt.org) client-ip=8.43.85.245; envelope-from=devel-bounces@lists.libvirt.org; helo=lists.libvirt.org; Authentication-Results: mx.zohomail.com; spf=none (zohomail.com: 8.43.85.245 is neither permitted nor denied by domain of lists.libvirt.org) smtp.mailfrom=devel-bounces@lists.libvirt.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.libvirt.org (lists.libvirt.org [8.43.85.245]) by mx.zohomail.com with SMTPS id 1699258377770128.190218805011; Mon, 6 Nov 2023 00:12:57 -0800 (PST) Received: by lists.libvirt.org (Postfix, from userid 996) id 9CEB9197B; Mon, 6 Nov 2023 03:12:56 -0500 (EST) Received: from lists.libvirt.org (localhost [IPv6:::1]) by lists.libvirt.org (Postfix) with ESMTP id 3F24D19AE; Mon, 6 Nov 2023 02:40:56 -0500 (EST) Received: by lists.libvirt.org (Postfix, from userid 996) id 2E36118AF; Mon, 6 Nov 2023 02:39:16 -0500 (EST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.libvirt.org (Postfix) with ESMTPS id 9B72C18BB for ; Mon, 6 Nov 2023 02:39:10 -0500 (EST) Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-259-X50GRq5EPvS4iRKM3brZkA-1; Mon, 06 Nov 2023 02:39:07 -0500 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5B1F285A58A; Mon, 6 Nov 2023 07:39:05 +0000 (UTC) Received: from vhost3.router.laine.org (unknown [10.22.32.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 13035492BE0; Mon, 6 Nov 2023 07:39:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on lists.libvirt.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.4 X-MC-Unique: X50GRq5EPvS4iRKM3brZkA-1 From: Laine Stump To: devel@lists.libvirt.org, libvir-list@redhat.com Subject: [libvirt PATCH v2 12/15] conf: support manually specifying VFIO variant driver in XML Date: Mon, 6 Nov 2023 02:38:57 -0500 Message-ID: <20231106073901.735582-13-laine@redhat.com> In-Reply-To: <20231106073901.735582-1-laine@redhat.com> References: <20231106073901.735582-1-laine@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Message-ID-Hash: NCAJL64RAGY2Z63ZDYLQSMLJMAZ254C2 X-Message-ID-Hash: NCAJL64RAGY2Z63ZDYLQSMLJMAZ254C2 X-MailFrom: laine@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-config-1; header-match-config-2; header-match-config-3; header-match-devel.lists.libvirt.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: Joao Martins , Jason Gunthorpe , Cedric Le Goater X-Mailman-Version: 3.2.2 Precedence: list List-Id: Development discussions about the libvirt library & tools Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Content-Type: text/plain; charset="utf-8"; x-default="true" Content-Transfer-Encoding: quoted-printable X-ZM-MESSAGEID: 1699258379038100001 This patch makes it possible to manually specify which VFIO variant driver to use for PCI hostdev device assignment, so that, e.g. you could force use of the generic vfio-pci driver with when libvirt would have normally (after applying a subsequent patch) found a "better match" for a device in the active kernel's modules.alias file. (The main use of this manual override would be to work around a bug in a new VFIO variant driver by temporarily not using that driver). This sounds like a simple addition to XML parsing/formatting, but of course it's messier than that, since the attribute we want to use was previously used in config for a now non-existent purpose (choosing a type of device assignment that was removed from the kernel nearly a decade ago), and continues to be used *internal to the C code*. Background: Beginning when VFIO device assignment support was added to or ) in libvirt's QEMU driver, it was possible to specify which device assignment backend to use with "". This was only useful for a couple of years in the early 2010's when VFIO device assignment was newly introduced, but "legacy KVM" device assignment was still possible (in reality "" was only intended to be used (1) in the *very* early days of VFIO when "kvm" was still the default, or (2) when the host kernel was too old to have VFIO support (meaning it was e.g. pre-RHEL7) or (3) some bug was encountered with the then-new VFIO code that could have been avoided by switching back to the older style of device assignment (I don't recall anyone ever needing to set it for this reason, but that is one of the main reasons we added the knob). Later, when the libxl (Xen) driver in libvirt began supporting "device passthrough" with , they added to indicate that mode of operation. But in practice this was also never necessary in any config, since Xen only supports one type of device assignment, and so the attribute was anyway set in the C object by the libxl driver code prior to calling any hypervisor-common code (i.e. the stuff in hypervisor/hostdev.c and util/virpci.c) that needs to know which type of device assignment is being used - setting it in the XML config was superfluous (kind of like me saying "I am now describing this patch in a human language", ref: Perd Hapley on "Parks and Recreation"). So the setting was available in the XML, but never needed to be set by the user. Because it was automatically set in the C code though, sometimes libvirt-generated XML would contain the option even though the user hadn't included it in the original input XML (e.g. the libxl driver sets it in the PostParse, so all saved configs with a PCI device will have ; also status XML from the QEMU and libxl drivers will contain - in both cases completely unnecessary and redundant information). When adding support for specifying a variant driver for VFIO device assignment, from the beginning I have wanted to use to force a specific variant driver (e.g. ), with the assumption that the name attribute could be easily repurposed because it had been *completely* unused for so long. What I discovered though, was that 1) the field in the C object corresponding to driver name was an enum value rather than a string (so parser and formatter need to be changed, not just the driver code looking at a string in the C object), and 2) even though the XML attribute was effectively unused (except in some output), the field in the C object was *always* being set internally as a way for the hypervisor driver code to inform the hypervisor-common hostdev manager (in src/hypervisor) which method of device assignment to use. So re-use wasn't as simple as I'd wished. However. What I have hit upon that permits us to use is to do the following: 1) rename the existing driver name attribute to driver *type* - this will now be the enum that is set internally by the hypervisor driver prior to calling the hostdev manager (and also is what will show up in status XML; the libxl driver was modified in a previous patch so that the setting isn't done until domain runtime (rather than during PostParse), so it will no longer show up in regurgitated libxml config) 2) add a new attribute with the now-newly-unused name "name" - this will be a string that can be used to communicate a specific host kernel driver to the hostdev manager. 3) In the parsing code for , if is encountered, set the driver *type* to the appropriate enum instead, and clear our the name string. (since the four places that use the hostdev element now all call a common parser function, this check is only needed in a single place) I could alternately just leave "name" as-is, and create a new attribute with a never-before-used name within driver. I suppose instead of "type" and "name", we could instead have "name and "model" (where "model" would be the string that could be set to "mlx5_vfio_pci"). I just prefer type/name to name/model, and think it's been long enough since has had a useful purpose (and the fixup for backward compatibility is limited to a few lines in one function) that this change is safe. (another alternate would be to add an extra parameter to the C API that calls into the hostdev manager rather than keeping/adding the publicly visible "type" attribute, but it seems at least possible that sometime in the future another alternate "type" of device assignment could be introduced for either Xen or QEMU, and we would then need to add back the XML attribute anyway) I'd be fine with doing it one of the other ways, but decided to post this way first, since it's the one that I find the most aesthetically pleasing :-) Note that a few test cases have been modified - places where an existing "name=3D'vfio'" was changed to "type=3D'vfio'" were in in status XML or only the *output* XML for a test (except the case of the virnetworkportxml2xmltest, which doesn't have a separate directory for the XML result; fortunately the converged parsing of between domain/network/networkport means that the test cases for network and domain XML are already testing the same code that would convert "name" to "type" in thte networkport XML) Signed-off-by: Laine Stump --- docs/formatdomain.rst | 55 +++++++++++++------ src/conf/device_conf.c | 29 +++++++++- src/conf/device_conf.h | 4 ++ src/conf/domain_conf.c | 1 + src/conf/network_conf.c | 2 + src/conf/schemas/basictypes.rng | 21 ++++--- src/conf/virnetworkportdef.c | 1 + tests/networkxml2xmlin/hostdev-pf-old.xml | 8 +++ tests/networkxml2xmlin/hostdev-pf.xml | 2 +- tests/networkxml2xmlout/hostdev-pf-old.xml | 8 +++ tests/networkxml2xmlout/hostdev-pf.xml | 2 +- tests/networkxml2xmltest.c | 6 ++ .../qemuhotplug-hostdev-pci.xml | 1 - .../qemuhotplug-base-live+hostdev-pci.xml | 2 +- ...uhotplug-pseries-base-live+hostdev-pci.xml | 2 +- tests/qemustatusxml2xmldata/modern-in.xml | 2 +- .../hostdev-vfio.x86_64-latest.args | 5 +- tests/qemuxml2argvdata/hostdev-vfio.xml | 18 ++++++ .../hostdev-vfio.x86_64-latest.xml | 23 +++++++- .../plug-hostdev-pci-unmanaged.xml | 2 +- .../plug-hostdev-pci.xml | 2 +- 21 files changed, 161 insertions(+), 35 deletions(-) create mode 100644 tests/networkxml2xmlin/hostdev-pf-old.xml create mode 100644 tests/networkxml2xmlout/hostdev-pf-old.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 459815d2b5..d67421a054 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -4500,24 +4500,47 @@ or: an error. See the `Device Addresses`_ section for more details on the a= ddress element. ``driver`` - PCI devices can have an optional ``driver`` subelement that specifies w= hich - backend driver to use for PCI device assignment. Use the ``name`` attri= bute - to select either "vfio" (for the new VFIO device assignment backend, wh= ich is - compatible with UEFI SecureBoot) or "kvm" (the legacy device assignment - handled directly by the KVM kernel module) :since:`Since 1.0.5 (QEMU an= d KVM - only, requires kernel 3.6 or newer)` . When specified, device assignmen= t will - fail if the requested method of device assignment isn't available on the - host. When not specified, the default is "vfio" on systems where the VF= IO - driver is available and loaded, and "kvm" on older systems, or those wh= ere - the VFIO driver hasn't been loaded :since:`Since 1.1.3` (prior to that = the - default was always "kvm"). + PCI hostdev devices can have an optional ``driver`` subelement that + specifies which host driver to bind to the device when preparing it + for assignment to a guest. :since:`Since 9.10.0 (useful for QEMU and + KVM only)`. This is done by setting the ```` element's ``name`` + attribute, for example:: + + ... + + + ... + + tells libvirt to bind the driver "vfio-pci-igb" to the device on + the host before handing it off to QEMU for assignment to the + guest. Normally libvirt will bind the device to the "best match" + VFIO-type driver that it finds in the kernel's modules.alias file + (based on matching the corresponding fields of the device's + modalias file in sysfs) or to the generic "vfio-pci" driver if no + better match is found (vfio-pci is always used prior to libvirt + 9.10.0), but in cases when the correct driver isn't listed in + modules.alias then the desired device-specific driver can be forced + by setting driver name, or if the device-specific driver that is + found is "problematic" in some way, the generic vfio-pci driver + similarly be forced. + + (Note: in versions of libvirt prior to 9.10.0 (:since:`Since + 1.0.5), the ``name`` attribute was described to be used to select + the type of device assignment ("vfio", "kvm", or "xen"), but those + values have been mostly useless, since the type of device + assignment is actually determined by which hypservisor is in + use. This means that you may occasionally see ```` or ```` in a domain's status + XML, or more rarely in config, but those specific values are + essentially ignored.) + ``readonly`` - Indicates that the device is readonly, only supported by SCSI host devi= ce - now. :since:`Since 1.0.6 (QEMU and KVM only)` + Indicates that the device is readonly, only supported by SCSI host + device now. :since:`Since 1.0.6 (QEMU and KVM only)` ``shareable`` - If present, this indicates the device is expected to be shared between - domains (assuming the hypervisor and OS support this). Only supported b= y SCSI - host device. :since:`Since 1.0.6` + If present, this indicates the device is expected to be shared + between domains (assuming the hypervisor and OS support this). Only + supported by SCSI host device. :since:`Since 1.0.6` =20 Note: Although ``shareable`` was introduced :since:`in 1.0.6` , it did = not work as as expected until :since:`1.2.2` . diff --git a/src/conf/device_conf.c b/src/conf/device_conf.c index e022783816..2ad7232678 100644 --- a/src/conf/device_conf.c +++ b/src/conf/device_conf.c @@ -60,13 +60,29 @@ int virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node, virDeviceHostdevPCIDriverInfo *drive= r) { - if (virXMLPropEnum(node, "name", + if (virXMLPropEnum(node, "type", virDeviceHostdevPCIDriverTypeFromString, VIR_XML_PROP_NONZERO, &driver->type) < 0) { return -1; } =20 + driver->name =3D virXMLPropString(node, "name"); + + /* translate special case of to + * for backward compatibility + */ + if (STREQ_NULLABLE(driver->name, "vfio")) { + driver->type =3D VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_VFIO; + VIR_FREE(driver->name); + } else if (STREQ_NULLABLE(driver->name, "xen")) { + driver->type =3D VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_XEN; + VIR_FREE(driver->name); + } else if (STREQ_NULLABLE(driver->name, "kvm")) { + driver->type =3D VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_KVM; + VIR_FREE(driver->name); + } + return 0; } =20 @@ -88,14 +104,23 @@ virDeviceHostdevPCIDriverInfoFormat(virBuffer *buf, return -1; } =20 - virBufferAsprintf(&driverAttrBuf, " name=3D'%s'", driverType); + virBufferAsprintf(&driverAttrBuf, " type=3D'%s'", driverType); } =20 + virBufferEscapeString(&driverAttrBuf, " name=3D'%s'", driver->name); + virXMLFormatElement(buf, "driver", &driverAttrBuf, NULL); return 0; } =20 =20 +void +virDeviceHostdevPCIDriverInfoClear(virDeviceHostdevPCIDriverInfo *driver) +{ + VIR_FREE(driver->name); +} + + static int virZPCIDeviceAddressParseXML(xmlNodePtr node, virPCIDeviceAddress *addr) diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index 9f4b9f5375..cccc60cb29 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -46,6 +46,7 @@ VIR_ENUM_DECL(virDeviceHostdevPCIDriver); =20 struct _virDeviceHostdevPCIDriverInfo { virDeviceHostdevPCIDriverType type; + char *name; }; =20 typedef enum { @@ -192,6 +193,9 @@ int virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr no= de, int virDeviceHostdevPCIDriverInfoFormat(virBuffer *buf, const virDeviceHostdevPCIDriverInf= o *driver); =20 +void virDeviceHostdevPCIDriverInfoPostParse(virDeviceHostdevPCIDriverInfo = *driver); +void virDeviceHostdevPCIDriverInfoClear(virDeviceHostdevPCIDriverInfo *dri= ver); + void virDomainDeviceInfoClear(virDomainDeviceInfo *info); void virDomainDeviceInfoFree(virDomainDeviceInfo *info); =20 diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 04d9de7f32..3b11b927b5 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2635,6 +2635,7 @@ virDomainHostdevDefClear(virDomainHostdevDef *def) VIR_FREE(def->source.subsys.u.scsi_host.wwpn); break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + virDeviceHostdevPCIDriverInfoClear(&def->source.subsys.u.pci.d= river); g_clear_pointer(&def->source.subsys.u.pci.origstates, virBitma= pFree); break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 2d78d588aa..c8ced95af6 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -229,6 +229,8 @@ virNetworkForwardDefClear(virNetworkForwardDef *def) { size_t i; =20 + virDeviceHostdevPCIDriverInfoClear(&def->driver); + for (i =3D 0; i < def->npfs && def->pfs; i++) virNetworkForwardPfDefClear(&def->pfs[i]); VIR_FREE(def->pfs); diff --git a/src/conf/schemas/basictypes.rng b/src/conf/schemas/basictypes.= rng index 8d5f4475ca..46e8a493f7 100644 --- a/src/conf/schemas/basictypes.rng +++ b/src/conf/schemas/basictypes.rng @@ -658,13 +658,20 @@ =20 - - - kvm - vfio - xen - - + + + + kvm + vfio + xen + + + + + + + + diff --git a/src/conf/virnetworkportdef.c b/src/conf/virnetworkportdef.c index 49d00b2ea6..64db63ae66 100644 --- a/src/conf/virnetworkportdef.c +++ b/src/conf/virnetworkportdef.c @@ -64,6 +64,7 @@ virNetworkPortDefFree(virNetworkPortDef *def) break; =20 case VIR_NETWORK_PORT_PLUG_TYPE_HOSTDEV_PCI: + virDeviceHostdevPCIDriverInfoClear(&def->plug.hostdevpci.driver); break; =20 case VIR_NETWORK_PORT_PLUG_TYPE_LAST: diff --git a/tests/networkxml2xmlin/hostdev-pf-old.xml b/tests/networkxml2x= mlin/hostdev-pf-old.xml new file mode 100644 index 0000000000..5b8f59858c --- /dev/null +++ b/tests/networkxml2xmlin/hostdev-pf-old.xml @@ -0,0 +1,8 @@ + + hostdev + 81ff0d90-c91e-6742-64da-4a736edb9a9b + + + + + diff --git a/tests/networkxml2xmlin/hostdev-pf.xml b/tests/networkxml2xmlin= /hostdev-pf.xml index 5b8f59858c..72a3049800 100644 --- a/tests/networkxml2xmlin/hostdev-pf.xml +++ b/tests/networkxml2xmlin/hostdev-pf.xml @@ -2,7 +2,7 @@ hostdev 81ff0d90-c91e-6742-64da-4a736edb9a9b - + diff --git a/tests/networkxml2xmlout/hostdev-pf-old.xml b/tests/networkxml2= xmlout/hostdev-pf-old.xml new file mode 100644 index 0000000000..72a3049800 --- /dev/null +++ b/tests/networkxml2xmlout/hostdev-pf-old.xml @@ -0,0 +1,8 @@ + + hostdev + 81ff0d90-c91e-6742-64da-4a736edb9a9b + + + + + diff --git a/tests/networkxml2xmlout/hostdev-pf.xml b/tests/networkxml2xmlo= ut/hostdev-pf.xml index 5b8f59858c..72a3049800 100644 --- a/tests/networkxml2xmlout/hostdev-pf.xml +++ b/tests/networkxml2xmlout/hostdev-pf.xml @@ -2,7 +2,7 @@ hostdev 81ff0d90-c91e-6742-64da-4a736edb9a9b - + diff --git a/tests/networkxml2xmltest.c b/tests/networkxml2xmltest.c index b0814c7529..928f28b579 100644 --- a/tests/networkxml2xmltest.c +++ b/tests/networkxml2xmltest.c @@ -146,6 +146,12 @@ mymain(void) DO_TEST_FLAGS("passthrough-pf", VIR_NETWORK_XML_INACTIVE); DO_TEST("hostdev"); DO_TEST_FLAGS("hostdev-pf", VIR_NETWORK_XML_INACTIVE); + + /* libvirt pre-9.9.0 used "name=3D'vfio'" which should be + * automatically translated to "type=3D'vfio'" by new parser + */ + DO_TEST_FLAGS("hostdev-pf-old", VIR_NETWORK_XML_INACTIVE); + DO_TEST("passthrough-address-crash"); DO_TEST("nat-network-explicit-flood"); DO_TEST("host-bridge-no-flood"); diff --git a/tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml b/tes= ts/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml index 6f7c99c943..0d6dec9d26 100644 --- a/tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml +++ b/tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml @@ -1,5 +1,4 @@ -
diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci= .xml b/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml index e49e21c53e..b14f184044 100644 --- a/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml +++ b/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml @@ -49,7 +49,7 @@