[libvirt PATCH v3 0/8] Support for VFIO variant drivers, Part 1

Laine Stump posted 8 patches 8 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20230821193258.520859-1-laine@redhat.com
src/hypervisor/domain_driver.c |   9 +-
src/hypervisor/domain_driver.h |   2 +
src/hypervisor/virhostdev.c    |  35 +++-----
src/libvirt_private.syms       |   9 +-
src/libxl/libxl_driver.c       |   3 +-
src/qemu/qemu_driver.c         |  37 ++++----
src/util/virnvme.c             |   2 +-
src/util/virpci.c              | 156 +++++++++++++++++++++++++--------
src/util/virpci.h              |  18 ++--
tests/virhostdevtest.c         |   2 +-
tests/virpcitest.c             |  10 +--
11 files changed, 185 insertions(+), 98 deletions(-)
[libvirt PATCH v3 0/8] Support for VFIO variant drivers, Part 1
Posted by Laine Stump 8 months, 2 weeks ago
A "VFIO variant" driver is a kernel driver for a device that supports
all the APIs of the basic vfio-pci driver (which enables assigning a
host PCI device to a QEMU guest) plus "some extra stuff" (e.g. to
enable things like saving/restoring device state in order to support
live migration.)

Way back last year I posted a couple attempts to support VFIO variant
drivers; here is V2 (along with a later followup discussion from a
couple months ago):

https://listman.redhat.com/archives/libvir-list/2022-August/233661.html
https://listman.redhat.com/archives/libvir-list/2023-May/240108.html

The mlx5-vfio-pci driver has now been upstream for quite awhile (and
even in the downstream Fedora 38 kernel, for example), as are the
sysfs bits that allow us to determine whether or not a driver is a
VFIO variant, and I've updated the patch(es) to use this.

I've also been working on auto-binding to the "best-match" VFIO
variant driver based on comparing the device's modalias file in sysfs
to the contents of the kernel's modules.alias file, but that isn't
quite ready (partly code that isn't yet working, but also partly
indecision about exactly where in the XML to put the driver name when
it is specified; I won't take up more space here with that though).

In the meantime, there are people who want to use the mlx5-vfio-pci
driver (and Cedric Le Goater also has written vfio-pci-igbvf and
vfio-pci-e1000e drivers (which area very useful for testing), although
I don't think he has posted them anywhere yet), so I would like to get
the basic patches here merged in upstream now while I continue working
on "Part 2".

These patches provide two improvements that make testing/using VFIO
drivers much more convenient:

1) The specific driver can be given in the virsh nodedev-detach
command (or the virNodeDeviceDetachFlags() API call), e.g.:

    virsh nodedev-detach pci_0000_04_11_5 --driver vfio-pci-igbvf

2) If the <hostdev> (or "<interface> ... <type='hostdev'/>" has
"managed='no'", then libvirt will recognize any VFIO variant driver
(rather than the current behavior of rejecting anything that isn't
exactly "vfio-pci")

With these two capabilities, it's simple and straightforward to bind a
device to a VFIO variant driver, and then start a guest that uses that
device.

Change in V2:

* complete remake, more refactoring

* use existence of "vfio-dev" subdirectory of device directory in
  sysfs to determine whether the currently-bound driver is a vfio
  variant.

* support binding to a user-specified driver during nodedev-detach,
  rather than only supporting vfio-pci.

Laine Stump (8):
  util: use "stubDriverType" instead of just "stubDriver"
  util: add stub driver name to virPCIDevice object
  util: rename virPCIDeviceGetDriverPathAndName
  util: permit existing binding to VFIO variant driver
  util: probe stub driver from within function that binds to stub driver
  util: honor stubDriverName when probing/binding stub driver for a
    device
  node_device: support binding other drivers with
    virNodeDeviceDetachFlags()
  qemu: turn two multiline log messages into single line

 src/hypervisor/domain_driver.c |   9 +-
 src/hypervisor/domain_driver.h |   2 +
 src/hypervisor/virhostdev.c    |  35 +++-----
 src/libvirt_private.syms       |   9 +-
 src/libxl/libxl_driver.c       |   3 +-
 src/qemu/qemu_driver.c         |  37 ++++----
 src/util/virnvme.c             |   2 +-
 src/util/virpci.c              | 156 +++++++++++++++++++++++++--------
 src/util/virpci.h              |  18 ++--
 tests/virhostdevtest.c         |   2 +-
 tests/virpcitest.c             |  10 +--
 11 files changed, 185 insertions(+), 98 deletions(-)

-- 
2.41.0
Re: [libvirt PATCH v3 0/8] Support for VFIO variant drivers, Part 1
Posted by Joao Martins 8 months, 2 weeks ago
On 21/08/2023 20:32, Laine Stump wrote:
> A "VFIO variant" driver is a kernel driver for a device that supports
> all the APIs of the basic vfio-pci driver (which enables assigning a
> host PCI device to a QEMU guest) plus "some extra stuff" (e.g. to
> enable things like saving/restoring device state in order to support
> live migration.)
> 
> Way back last year I posted a couple attempts to support VFIO variant
> drivers; here is V2 (along with a later followup discussion from a
> couple months ago):
> 
> https://listman.redhat.com/archives/libvir-list/2022-August/233661.html
> https://listman.redhat.com/archives/libvir-list/2023-May/240108.html
> 
> The mlx5-vfio-pci driver has now been upstream for quite awhile (and
> even in the downstream Fedora 38 kernel, for example), as are the
> sysfs bits that allow us to determine whether or not a driver is a
> VFIO variant, and I've updated the patch(es) to use this.
> 
> I've also been working on auto-binding to the "best-match" VFIO
> variant driver based on comparing the device's modalias file in sysfs
> to the contents of the kernel's modules.alias file, but that isn't
> quite ready (partly code that isn't yet working, but also partly
> indecision about exactly where in the XML to put the driver name when
> it is specified; I won't take up more space here with that though).
> 
Don't we have a:

<interface type='hostdev' managed='yes'>
   ...
   <driver name='vfio'>
   ...
</interface/

... that could be re-used somehow to specify the VFIO variant driver of choice?
Somewhat similar to the nodedev-detach equivalent? OTOH, it would only cover
networking hostdevs... :/

> In the meantime, there are people who want to use the mlx5-vfio-pci
> driver (and Cedric Le Goater also has written vfio-pci-igbvf and
> vfio-pci-e1000e drivers (which area very useful for testing), although
> I don't think he has posted them anywhere yet), 

Interesting, e1000e and igbvf have quiesce/resume/dirty-tracking commands?
Sounds too old to have :D unless of course this is Qemu-created commands and not
actual IGB/E1000e hardware doing those

> so I would like to get
> the basic patches here merged in upstream now while I continue working
> on "Part 2".
> 
> These patches provide two improvements that make testing/using VFIO
> drivers much more convenient:
> 
> 1) The specific driver can be given in the virsh nodedev-detach
> command (or the virNodeDeviceDetachFlags() API call), e.g.:
> 
>     virsh nodedev-detach pci_0000_04_11_5 --driver vfio-pci-igbvf
> 
> 2) If the <hostdev> (or "<interface> ... <type='hostdev'/>" has
> "managed='no'", then libvirt will recognize any VFIO variant driver
> (rather than the current behavior of rejecting anything that isn't
> exactly "vfio-pci")
> 
> With these two capabilities, it's simple and straightforward to bind a
> device to a VFIO variant driver, and then start a guest that uses that
> device.
> 

Nice

Provided the initial use of VFIO variant drivers have been to support live
migration, should the hostdev blocker be lifted too *just in case
it is simple addition of course* ?

In vfio 8.1 it is enabled by default, so there isn't much else to do other than
unblock migration with hostdevs. Something like below is <interface> specific;
which works but it is likely too specific:

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index d21551ab07d9..6edc8583c539 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1316,13 +1316,15 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def)
                     continue;
                 }

-                /* all other PCI hostdevs can't be migrated */
+                /* some PCI hostdevs can't be migrated */
                 if (hostdev->parentnet) {
-                    virDomainNetType actualType =
virDomainNetGetActualType(hostdev->parentnet);
-
-                    virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
-                                   _("cannot migrate a domain with <interface
type='%1$s'>"),
-                                   virDomainNetTypeToString(actualType));
+                    /*
+                     * Some VFIO network devices can be migrated if vendor
+                     * VFIO driver is used e.g. via mlx5-vfio-pci.
+                     * We leave it up to the live migration blocker in Qemu to
+                     * stop us from proceeding.
+                     */
+                    continue;
                 } else {
                     virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                                    _("cannot migrate a domain with <hostdev
mode='subsystem' type='%1$s'>"),


Or even something like this generic to the hostdev as long as it's assigned to a
variant driver. For any off shot case that isn't a migration vfio driver then
Qemu will fail for us as it will fail to probe the right things e.g. (below is
totally untested, it is just for illustration purposes, we can probably do away
with just VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO and let Qemu block migration for us?)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index d21551ab07d9..1f29493aa0c2 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1301,6 +1301,8 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def)
                 return false;

             case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI:
+                g_autoptr(virPCIDevice) pci = NULL;
+
                 /*
                  * if the device has a <teaming type='transient'>
                  * element, then migration *is* allowed because the
@@ -1316,6 +1318,22 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def)
                     continue;
                 }

+                if ((hostdev->source.subsys.u.pci.backend ==
VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) &&
+                    !virHostdevGetPCIHostDevice(hostdev, &pci) && pci) {
+                    g_autofree char *drvPath = NULL;
+                    g_autofree char *drvName = NULL;
+
+                    /*
+                     * Some VFIO network devices can be migrated if non standard
+                     * VFIO driver is used. We leave it up to the live migration
+                     * blocker in Qemu to stop us from proceeding.
+                     */
+                    if (!virPCIDeviceGetCurrentDriverPathAndName(pci, &drvPath,
+                                                                 &drvName) &&
+                        !STREQ(drvName, "vfio-pci"))
+                            continue;
+                }
+
                 /* all other PCI hostdevs can't be migrated */
                 if (hostdev->parentnet) {
                     virDomainNetType actualType =
virDomainNetGetActualType(hostdev->parentnet);
Re: [libvirt PATCH v3 0/8] Support for VFIO variant drivers, Part 1
Posted by Michal Prívozník 8 months, 2 weeks ago
On 8/21/23 21:32, Laine Stump wrote:
> A "VFIO variant" driver is a kernel driver for a device that supports
> all the APIs of the basic vfio-pci driver (which enables assigning a
> host PCI device to a QEMU guest) plus "some extra stuff" (e.g. to
> enable things like saving/restoring device state in order to support
> live migration.)
> 
> Way back last year I posted a couple attempts to support VFIO variant
> drivers; here is V2 (along with a later followup discussion from a
> couple months ago):
> 
> https://listman.redhat.com/archives/libvir-list/2022-August/233661.html
> https://listman.redhat.com/archives/libvir-list/2023-May/240108.html
> 
> The mlx5-vfio-pci driver has now been upstream for quite awhile (and
> even in the downstream Fedora 38 kernel, for example), as are the
> sysfs bits that allow us to determine whether or not a driver is a
> VFIO variant, and I've updated the patch(es) to use this.
> 
> I've also been working on auto-binding to the "best-match" VFIO
> variant driver based on comparing the device's modalias file in sysfs
> to the contents of the kernel's modules.alias file, but that isn't
> quite ready (partly code that isn't yet working, but also partly
> indecision about exactly where in the XML to put the driver name when
> it is specified; I won't take up more space here with that though).
> 
> In the meantime, there are people who want to use the mlx5-vfio-pci
> driver (and Cedric Le Goater also has written vfio-pci-igbvf and
> vfio-pci-e1000e drivers (which area very useful for testing), although
> I don't think he has posted them anywhere yet), so I would like to get
> the basic patches here merged in upstream now while I continue working
> on "Part 2".
> 
> These patches provide two improvements that make testing/using VFIO
> drivers much more convenient:
> 
> 1) The specific driver can be given in the virsh nodedev-detach
> command (or the virNodeDeviceDetachFlags() API call), e.g.:
> 
>     virsh nodedev-detach pci_0000_04_11_5 --driver vfio-pci-igbvf
> 
> 2) If the <hostdev> (or "<interface> ... <type='hostdev'/>" has
> "managed='no'", then libvirt will recognize any VFIO variant driver
> (rather than the current behavior of rejecting anything that isn't
> exactly "vfio-pci")
> 
> With these two capabilities, it's simple and straightforward to bind a
> device to a VFIO variant driver, and then start a guest that uses that
> device.
> 
> Change in V2:
> 
> * complete remake, more refactoring
> 
> * use existence of "vfio-dev" subdirectory of device directory in
>   sysfs to determine whether the currently-bound driver is a vfio
>   variant.
> 
> * support binding to a user-specified driver during nodedev-detach,
>   rather than only supporting vfio-pci.
> 
> Laine Stump (8):
>   util: use "stubDriverType" instead of just "stubDriver"
>   util: add stub driver name to virPCIDevice object
>   util: rename virPCIDeviceGetDriverPathAndName
>   util: permit existing binding to VFIO variant driver
>   util: probe stub driver from within function that binds to stub driver
>   util: honor stubDriverName when probing/binding stub driver for a
>     device
>   node_device: support binding other drivers with
>     virNodeDeviceDetachFlags()
>   qemu: turn two multiline log messages into single line
> 
>  src/hypervisor/domain_driver.c |   9 +-
>  src/hypervisor/domain_driver.h |   2 +
>  src/hypervisor/virhostdev.c    |  35 +++-----
>  src/libvirt_private.syms       |   9 +-
>  src/libxl/libxl_driver.c       |   3 +-
>  src/qemu/qemu_driver.c         |  37 ++++----
>  src/util/virnvme.c             |   2 +-
>  src/util/virpci.c              | 156 +++++++++++++++++++++++++--------
>  src/util/virpci.h              |  18 ++--
>  tests/virhostdevtest.c         |   2 +-
>  tests/virpcitest.c             |  10 +--
>  11 files changed, 185 insertions(+), 98 deletions(-)
> 

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal