From nobody Fri May 17 03:00:54 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of redhat.com designates 170.10.129.124 as permitted sender) client-ip=170.10.129.124; envelope-from=libvir-list-bounces@redhat.com; helo=us-smtp-delivery-124.mimecast.com; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=libvir-list-bounces@redhat.com; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1660194014; cv=none; d=zohomail.com; s=zohoarc; b=bY02mspVLMV4QMeUUfB2cnlRZ8BH8j5kVnDC/J17Khrf9051EzLQKN0Ja6ic5sZYfMOVGmm1L82/3K/10Pv7f7VLCSVWiyeyXzqJYXnfz5Hk/w5NwlfOUApPJo5xhOw7cyTfx5IyNZf5rYakAfQYGvzgvzVN9mtUHZaSJCLuwEg= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1660194014; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:Sender:Subject:To; bh=uHt27zNSHgli2nAVhmHy6M7y+zqA5NVga3SDaEEZeAE=; b=Gsdr6G5u7z3DPNlmbDVcuvbqv83vs8On0H/OPE1c1Jb1oxoaXubnZn7GWXCnwuUPTmReAVms2e+A01dSNazh3HzJzCBQbsKfLBk41WKT241BgnVgGTLr7ZhWTkPiLYAgo853GBzybaAAoZ0prM9gRa/7624/4nYK2SzsSk5CJto= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=libvir-list-bounces@redhat.com; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mx.zohomail.com with SMTPS id 1660194014547954.4285124380896; Wed, 10 Aug 2022 22:00:14 -0700 (PDT) Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-595-ugxcaJB4PbCKDgxhYKDYJg-1; Thu, 11 Aug 2022 01:00:10 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BE8B9804196; Thu, 11 Aug 2022 05:00:07 +0000 (UTC) Received: from mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (unknown [10.30.29.100]) by smtp.corp.redhat.com (Postfix) with ESMTP id AD39C40CFD0A; Thu, 11 Aug 2022 05:00:05 +0000 (UTC) Received: from mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (localhost [IPv6:::1]) by mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (Postfix) with ESMTP id 819381946A53; Thu, 11 Aug 2022 05:00:05 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) by mm-prod-listman-01.mail-001.prod.us-east-1.aws.redhat.com (Postfix) with ESMTP id 62D051946A41 for ; Thu, 11 Aug 2022 05:00:04 +0000 (UTC) Received: by smtp.corp.redhat.com (Postfix) id 49CFE9C07E; Thu, 11 Aug 2022 05:00:04 +0000 (UTC) Received: from vhost3.router.laine.org (unknown [10.2.16.13]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0E86B946B9; Thu, 11 Aug 2022 05:00:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1660194013; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=uHt27zNSHgli2nAVhmHy6M7y+zqA5NVga3SDaEEZeAE=; b=ianad1qyHrGuN2OFwAKww1umBO7KkAQjyKXRQ+ZVWtnlZDeS7l1OPxJsyo5bLJT8B3qNFt Y7xmE5zvf+u9FxD/biNx7tE+Me7iQPHWUMIsUXGfMFYcMMbmHR8MNG4qBitq+xmTcZe40L S/FEFNu2rt62FVn/SQsST8M8Zg+Xuuc= X-MC-Unique: ugxcaJB4PbCKDgxhYKDYJg-1 X-Original-To: libvir-list@listman.corp.redhat.com From: Laine Stump To: libvir-list@redhat.com Subject: [PATCH v2] util: basic support for VFIO variant drivers Date: Thu, 11 Aug 2022 01:00:03 -0400 Message-Id: <20220811050003.154291-1-laine@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-BeenThere: libvir-list@redhat.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Development discussions about the libvirt library & tools List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jason Gunthorpe , Erik Skultety Errors-To: libvir-list-bounces@redhat.com Sender: "libvir-list" X-Scanned-By: MIMEDefang 2.84 on 10.11.54.1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1660194015471100001 Content-Type: text/plain; charset="utf-8"; x-default="true" Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest. There has been kernel work recently to support vendor/device-specific VFIO variant drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality). But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific vfio variant driver to the device (and puts "managed=3D'no'" in the config to prevent libvirt from changing the binding), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't exactly "vfio-pci". This patch loosens that restriction a bit - rather than requiring that the device be bound to "vfio-pci", it also checks if the drivername contains the string "vfio" at all, and in this case allows the operation to continue. If the driver is in fact a VFIO variant, then the assignment will succeed, but if it is not a VFIO variant then QEMU will fail (and report the error back to libvirt). In the near future (possibly by kernel 6.0) there will be a formal method of identifying a VFIO variant driver by looking in sysfs; in the meantime the inexact, but simple, method in this patch will allow users of the few existing VFIO variant drivers (and developers of new VFIO variant drivers) to use their new drivers without needing to remove libvirt from their setup - they can simply pre-bind the device to the new driver, then use "managed=3D'no'" in their libvirt config. NB: this patch does *not* handle automatically determining the proper vendor-specific driver and binding to it in the case of "managed=3D'yes'". This will be implemented later when there is a widely available driver / device combo we can use for testing. Signed-off-by: Laine Stump --- V1 here: https://listman.redhat.com/archives/libvir-list/2022-August/233327= .html Change in V2: V1 used the output of modinfo to look for "vfio_pci" as an alias for a driver to see if it was a VFIO variant driver. As a result of discussion of V1, V2 is much simpler - it just assumes that any driver with "vfio" in the name is a VFIO variant. This is okay because 1) QEMU will still catch it and libvirt will properly log the error if the driver isn't actually a VFIO variant, and 2) it's a temporary situation, just to enable use of VFIO variant drivers with libvirt until a standard method of detecting this is added to sysfs (which, according to the discussion of V1, is coming in the near future). (NB: I did implement checking of /lib/modules/`uname -r`/modules.alias as suggested by Erik, but it turned out that this caused the unit tests to call uname(3) and open the modules.alias file on the test host - for a proper unit test I would have also needed to mock these two functions, and it seemed like too much complexity for a temporary workaround. I've implemented Jason's suggestion here (accept any driver with "vfio" in the name), which is similar to danpb's suggestion (accept specifically the two drivers that are already in the upstream kernel), but will also allow for new drivers that may be under development.) src/hypervisor/virhostdev.c | 26 ++++--------- src/util/virpci.c | 76 ++++++++++++++++++++++++++++++++++--- src/util/virpci.h | 3 ++ 3 files changed, 82 insertions(+), 23 deletions(-) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index c0ce867596..15b35fa75e 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, mgr->inactivePCIHostdevs) < 0) goto reattachdevs; } else { - g_autofree char *driverPath =3D NULL; - g_autofree char *driverName =3D NULL; - int stub; + g_autofree char *drvName =3D NULL; + virPCIStubDriver drvType; =20 /* Unmanaged devices should already have been marked as * inactive: if that's the case, we can simply move on */ @@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mg= r, * information about active / inactive device across * daemon restarts has been implemented */ =20 - if (virPCIDeviceGetDriverPathAndName(pci, - &driverPath, &driverName)= < 0) + if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) = < 0) goto reattachdevs; =20 - stub =3D virPCIStubDriverTypeFromString(driverName); - - if (stub > VIR_PCI_STUB_DRIVER_NONE && - stub < VIR_PCI_STUB_DRIVER_LAST) { + if (drvType > VIR_PCI_STUB_DRIVER_NONE) { =20 /* The device is bound to a known stub driver: store this * information and add a copy to the inactive list */ - virPCIDeviceSetStubDriver(pci, stub); + virPCIDeviceSetStubDriver(pci, drvType); =20 VIR_DEBUG("Adding PCI device %s to inactive list", virPCIDeviceGetName(pci)); @@ -2292,18 +2287,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager *h= ostdev_mgr, /* Let's check if all PCI devices are NVMe disks. */ for (i =3D 0; i < virPCIDeviceListCount(pciDevices); i++) { virPCIDevice *pci =3D virPCIDeviceListGet(pciDevices, i); - g_autofree char *drvPath =3D NULL; g_autofree char *drvName =3D NULL; - int stub =3D VIR_PCI_STUB_DRIVER_NONE; + virPCIStubDriver drvType; =20 - if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0) goto cleanup; =20 - if (drvName) - stub =3D virPCIStubDriverTypeFromString(drvName); - - if (stub =3D=3D VIR_PCI_STUB_DRIVER_VFIO || - STREQ_NULLABLE(drvName, "nvme")) + if (drvType =3D=3D VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvN= ame, "nvme")) continue; =20 VIR_WARN("Suspicious NVMe disk assignment. PCI device " diff --git a/src/util/virpci.c b/src/util/virpci.c index 7800966963..51ccf4d9fd 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -277,6 +277,71 @@ virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, ch= ar **path, char **name) } =20 =20 +/** + * virPCIDeviceGetDriverNameAndType: + * @dev: virPCIDevice object to examine + * @drvName: returns name of driver bound to this device (if any) + * @drvType: returns type of driver if it is a known stub driver type + * + * Find the name of the driver bound to @dev (if any) and the type of + * the driver if it is a known/recognized "stub" driver (based on the + * driver name). + * + * There are vfio "variant" drivers that provide all the basic + * functionality of the standard vfio-pci driver as well as additional + * stuff. There is a plan to add info to sysfs that will allow easily + * determining if a driver is a vfio variant driver, but that sysfs + * entry isn't yet available. In the meantime as a workaround so that + * the few existing vfio variant drivers can be used with libvirt, and + * so that driver developers can test their new vfio variant drivers + * without needing to bypass libvirt, we also check if the driver name + * contains the string "vfio"; if it does, then we consider this drier + * as type VFIO. This can lead to false positives, but that isn't a + * horrible thing, because the problem will still be caught by QEMU as + * soon as libvirt makes the request to attach the device. + * + * Return 0 on success, -1 on failure. If -1 is returned, then an error + * message has been logged. + */ +int +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev, + char **drvName, + virPCIStubDriver *drvType) +{ + g_autofree char *drvPath =3D NULL; + int tmpType; + + if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0) + return -1; + + if (!*drvName) { + *drvType =3D VIR_PCI_STUB_DRIVER_NONE; + return 0; + } + + tmpType =3D virPCIStubDriverTypeFromString(*drvName); + + if (tmpType > VIR_PCI_STUB_DRIVER_NONE) { + *drvType =3D tmpType; + return 0; /* exact match of a known driver name (or no name) */ + } + + /* Check if the drivername contains "vfio" and count as a VFIO + * driver if so - see above for explanation. + */ + + if (strstr(*drvName, "vfio")) { + VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName); + *drvType =3D VIR_PCI_STUB_DRIVER_VFIO; + } else { + VIR_DEBUG("Driver %s is NOT a vfio_pci driver", *drvName); + *drvType =3D VIR_PCI_STUB_DRIVER_NONE; + } + + return 0; +} + + static int virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fata= l) { @@ -1004,8 +1069,8 @@ virPCIDeviceReset(virPCIDevice *dev, virPCIDeviceList *activeDevs, virPCIDeviceList *inactiveDevs) { - g_autofree char *drvPath =3D NULL; g_autofree char *drvName =3D NULL; + virPCIStubDriver drvType; int ret =3D -1; int fd =3D -1; int hdrType =3D -1; @@ -1032,15 +1097,16 @@ virPCIDeviceReset(virPCIDevice *dev, * reset it whenever appropriate, so doing it ourselves would just * be redundant. */ - if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0) + if (virPCIDeviceGetDriverNameAndType(dev, &drvName, &drvType) < 0) goto cleanup; =20 - if (virPCIStubDriverTypeFromString(drvName) =3D=3D VIR_PCI_STUB_DRIVER= _VFIO) { - VIR_DEBUG("Device %s is bound to vfio-pci - skip reset", - dev->name); + if (drvType =3D=3D VIR_PCI_STUB_DRIVER_VFIO) { + + VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvN= ame); ret =3D 0; goto cleanup; } + VIR_DEBUG("Resetting device %s", dev->name); =20 if ((fd =3D virPCIDeviceConfigOpenWrite(dev)) < 0) diff --git a/src/util/virpci.h b/src/util/virpci.h index 4d9193f24e..0532b90f90 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -280,6 +280,9 @@ int virPCIDeviceRebind(virPCIDevice *dev); int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char **path, char **name); +int virPCIDeviceGetDriverNameAndType(virPCIDevice *dev, + char **drvName, + virPCIStubDriver *drvType); =20 int virPCIDeviceIsPCIExpress(virPCIDevice *dev); int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev); --=20 2.37.1