create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml create mode 100644 tests/nodedevmdevctldata/mdevctl-undefine.argv
This patch series follows the previously-merged series which added support for transient mediated devices. This series expands mdev support to include persistent device definitions. Again, it relies on mdevctl as the backend. It follows the common libvirt pattern of APIs by adding the following new APIs for node devices: - virNodeDeviceDefineXML() - defines a persistent device - virNodeDeviceUndefine() - undefines a persistent device - virNodeDeviceCreate() - starts a previously-defined device It also adds virsh commands mapping to these new APIs: nodedev-define, nodedev-undefine, and nodedev-start. Since we rely on mdevctl for the definition of mediated devices, we need a way to stay up-to-date with devices that are defined by mdevctl (outside of libvirt). The method for staying up-to-date is currently a little bit crude due to the fact that mdevctl does not emit any events when new devices are added or removed. As a workaround, we create a file monitor for the mdevctl config directory and re-query mdevctl when we detect changes within that directory. In the future, mdevctl may introduce a more elegant solution. Changes in v4: - rebase to git master - switch to throwaway thread for querying mdevctl - fixed a bug when removing devices because I was accidentally using virHashForEach() instead of virHashForeachSafe() - use DEFINED/UNDEFINED events instead of STARTED/STOPPED events - changes related to merging information about mdev devices from both udev a= nd mdevctl: - Re-used the same function to copy extra data from mdevctl regardless of whether we're processing a udev event or a mdevctl event (recommended by Erik). This results in slightly more complex handling of the object lifetimes (see patch 9), but it consolidates some code. - nodeDeviceDefCopyFromMdevctl() previously only copied the data that was unique to mdevctl and didn't exist in udev. It now copies additional data (possibly overwriting some udev). This solves a problem where a device = is defined but not active (i.e. we have not gotten any data from udev), and then changed (e.g. somebody calls 'mdevctl modify' to change the mdev type), but libvirt was not updating to the new definition. - fix a bug where we were mistakenly emitting 'update' events for devices th= at had not changed - Added the ability to specify a uuid for an mdev via device XML. - split some commits into multiple patches - updated new API version info to 7.1.0 - Fixed a bug reported by Yan Fu which hangs the client when attempting to destroy a nodedev that is in use by an active vm. See https://www.redhat.com/archives/libvir-list/2021-February/msg00116.html for solution suggested by Alex. - numerous smaller fixes from review findings changes in v3: - streamlined tests -- removed some unnecessary duplication - split out patch to factor out node device name generation function - split nodeDeviceParseMdevctlChildDevice() into a separate function - added follow-up patch to remove space-padded alignment in header - refactored the mdevctl update handling significantly: - no longer a separate persistent thread that gets signaled by a timer - now piggybacks onto the existing udev thread and signals the thread in t= he same way that the udev event does. - Daniel suggested spawning a throw-away thread to handle mdevctl updates, but that introduces the complexity of possibly serializing multiple throw-away threads (e.g. if we get an 'created' event followed immediate= ly by a 'deleted' event, two threads may be spawned and we'd need to ensure they are properly ordered) - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked() to simplify removing devices that are removed from mdevctl. - coding style fixes - NOTE: per Erik's request, I experimented with changing the way that mdevctl commands were generated and tested (e.g. introducing something like virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was too invasive and awkward and didn't seem worthwhile Changes in v2: - rebase to latest git master Jonathon Jongsma (25): tests: remove extra trailing semicolon nodedev: introduce concept of 'active' node devices nodedev: Add ability to filter by active state nodedev: expose internal helper for naming devices nodedev: add ability to parse mdevs from mdevctl nodedev: add ability to list defined mdevs nodedev: add persistence to virNodeDeviceObj nodedev: add DEFINED/UNDEFINED lifecycle events nodedev: add mdevctl devices to node device list nodedev: add helper functions to remove node devices nodedev: handle mdevs that disappear from mdevctl nodedev: Refresh mdev devices when changes are detected nodedev: add function to generate mdevctl define command api: add virNodeDeviceDefineXML() virsh: Add --inactive, --all to nodedev-list virsh: add nodedev-define command nodedev: refactor tests to support mdev undefine api: add virNodeDeviceUndefine() virsh: Factor out function to find node device virsh: add nodedev-undefine command api: add virNodeDeviceCreate() virsh: add "nodedev-start" command nodedev: add <uuid> element to mdev caps nodedev: add ability to specify UUID for new mdevs nodedev: fix hang when destroying an mdev in use docs/schemas/nodedev.rng | 43 +- examples/c/misc/event-test.c | 4 + include/libvirt/libvirt-nodedev.h | 19 +- src/access/viraccessperm.c | 2 +- src/access/viraccessperm.h | 6 + src/conf/node_device_conf.c | 15 + src/conf/node_device_conf.h | 8 + src/conf/virnodedeviceobj.c | 149 +++- src/conf/virnodedeviceobj.h | 25 + src/driver-nodedev.h | 14 + src/libvirt-nodedev.c | 115 +++ src/libvirt_private.syms | 6 + src/libvirt_public.syms | 7 + src/node_device/node_device_driver.c | 657 +++++++++++++++++- src/node_device/node_device_driver.h | 41 ++ src/node_device/node_device_udev.c | 214 +++++- src/remote/remote_driver.c | 3 + src/remote/remote_protocol.x | 40 +- src/remote_protocol-structs | 16 + src/rpc/gendispatch.pl | 1 + ...19_36ea_4111_8f0a_8c9a70e21366-define.argv | 2 + ...19_36ea_4111_8f0a_8c9a70e21366-define.json | 1 + ...019_36ea_4111_8f0a_8c9a70e21366-start.argv | 3 +- ...39_495e_4243_ad9f_beb3f14c23d9-define.argv | 1 + ...39_495e_4243_ad9f_beb3f14c23d9-define.json | 1 + ...16_1ca8_49ac_b176_871d16c13076-define.argv | 1 + ...16_1ca8_49ac_b176_871d16c13076-define.json | 1 + tests/nodedevmdevctldata/mdevctl-create.argv | 1 + .../mdevctl-list-defined.argv | 1 + .../mdevctl-list-multiple.json | 59 ++ .../mdevctl-list-multiple.out.xml | 43 ++ .../nodedevmdevctldata/mdevctl-undefine.argv | 1 + tests/nodedevmdevctltest.c | 222 +++++- ...v_d069d019_36ea_4111_8f0a_8c9a70e21366.xml | 1 + tools/virsh-nodedev.c | 268 ++++++- 35 files changed, 1853 insertions(+), 138 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9= a70e21366-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9= a70e21366-define.json create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb= 3f14c23d9-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb= 3f14c23d9-define.json create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871= d16c13076-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871= d16c13076-define.json create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml create mode 100644 tests/nodedevmdevctldata/mdevctl-undefine.argv --=20 2.26.2
On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote: > This patch series follows the previously-merged series which added support for > transient mediated devices. This series expands mdev support to include > persistent device definitions. Again, it relies on mdevctl as the backend. > > It follows the common libvirt pattern of APIs by adding the following new APIs > for node devices: > - virNodeDeviceDefineXML() - defines a persistent device > - virNodeDeviceUndefine() - undefines a persistent device > - virNodeDeviceCreate() - starts a previously-defined device So, I tested the patches on Friday and have a few notes: - all of the driver implementations mentioned ^above need to pass an error buffer to the respective mdevctl wrapper, because the generic error "Unable to XZY mediated device" doesn't help at all. Kind of like with QEMU errors, we need to extract it from an error buffer (you already have input and output, so just add another one) --> this relates to patches 14,18,21 - trying to undefine an active mdev reports an error in libvirt --> this is both inconsistent with the rest of libvirt's subsystems and there's also no reason for it since mdevctl supports it --> we need to support transient devices as well - trying to define the following XML hangs for 5s and then fails: <device> <parent>pci_0000_06_00_0</parent> <driver> <name>vfio_mdev</name> </driver> <capability type='mdev'> <type id='nvidia-11'/> <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid> </capability> </device> -> I debugged this on Friday evening and for some reason driver->devs hash table of devices was NULL going through the nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure out why it was NULL, listing devices worked just fine - I also prepared several adjustments to how we define the mdevctl wrappers + some test adjustments which I wanted to share with you, but I haven't tested those thoroughly since I was debugging why that XML couldn't be defined, I'll share it when we eliminate the underlying problem -> if you're wondering why I didn't just put it into the review, well, I figured sharing the code was more descriptive than if I used words - one thing that I didn't test with your patches are events > > It also adds virsh commands mapping to these new APIs: nodedev-define, > nodedev-undefine, and nodedev-start. > > Since we rely on mdevctl for the definition of mediated devices, we need a way > to stay up-to-date with devices that are defined by mdevctl (outside of > libvirt). The method for staying up-to-date is currently a little bit crude > due to the fact that mdevctl does not emit any events when new devices are > added or removed. As a workaround, we create a file monitor for the mdevctl > config directory and re-query mdevctl when we detect changes within that > directory. In the future, mdevctl may introduce a more elegant solution. > > Changes in v4: > - rebase to git master > - switch to throwaway thread for querying mdevctl > - fixed a bug when removing devices because I was accidentally using > virHashForEach() instead of virHashForeachSafe() > - use DEFINED/UNDEFINED events instead of STARTED/STOPPED events > - changes related to merging information about mdev devices from both udev a= > nd > mdevctl: > - Re-used the same function to copy extra data from mdevctl regardless of > whether we're processing a udev event or a mdevctl event (recommended by > Erik). This results in slightly more complex handling of the object > lifetimes (see patch 9), but it consolidates some code. > - nodeDeviceDefCopyFromMdevctl() previously only copied the data that was > unique to mdevctl and didn't exist in udev. It now copies additional data > (possibly overwriting some udev). This solves a problem where a device = > is > defined but not active (i.e. we have not gotten any data from udev), and > then changed (e.g. somebody calls 'mdevctl modify' to change the mdev > type), but libvirt was not updating to the new definition. > - fix a bug where we were mistakenly emitting 'update' events for devices th= > at > had not changed > - Added the ability to specify a uuid for an mdev via device XML. > - split some commits into multiple patches > - updated new API version info to 7.1.0 > - Fixed a bug reported by Yan Fu which hangs the client when attempting to > destroy a nodedev that is in use by an active vm. See > https://www.redhat.com/archives/libvir-list/2021-February/msg00116.html for > solution suggested by Alex. > - numerous smaller fixes from review findings > > changes in v3: > - streamlined tests -- removed some unnecessary duplication > - split out patch to factor out node device name generation function > - split nodeDeviceParseMdevctlChildDevice() into a separate function > - added follow-up patch to remove space-padded alignment in header > - refactored the mdevctl update handling significantly: > - no longer a separate persistent thread that gets signaled by a timer > - now piggybacks onto the existing udev thread and signals the thread in t= > he > same way that the udev event does. > - Daniel suggested spawning a throw-away thread to handle mdevctl updates, > but that introduces the complexity of possibly serializing multiple > throw-away threads (e.g. if we get an 'created' event followed immediate= > ly > by a 'deleted' event, two threads may be spawned and we'd need to ensure > they are properly ordered) > - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked() > to simplify removing devices that are removed from mdevctl. > - coding style fixes > - NOTE: per Erik's request, I experimented with changing the way that mdevctl > commands were generated and tested (e.g. introducing something like > virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was > too invasive and awkward and didn't seem worthwhile I wrote the code for ^this and to be honest, it didn't seem that bad after all, like I said, I'll share the follow up patches when we know more about the issue I encountered. Erik
On Mon, 22 Feb 2021 11:08:12 +0100 Erik Skultety <eskultet@redhat.com> wrote: > On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote: > > This patch series follows the previously-merged series which added > > support for transient mediated devices. This series expands mdev > > support to include persistent device definitions. Again, it relies > > on mdevctl as the backend. > > > > It follows the common libvirt pattern of APIs by adding the > > following new APIs for node devices: > > - virNodeDeviceDefineXML() - defines a persistent device > > - virNodeDeviceUndefine() - undefines a persistent device > > - virNodeDeviceCreate() - starts a previously-defined device > > So, I tested the patches on Friday and have a few notes: > - all of the driver implementations mentioned ^above need to pass an > error buffer to the respective mdevctl wrapper, because the generic > error "Unable to XZY mediated device" doesn't help at all. Kind of > like with QEMU errors, we need to extract it from an error buffer > (you already have input and output, so just add another one) --> this > relates to patches 14,18,21 > > - trying to undefine an active mdev reports an error in libvirt > --> this is both inconsistent with the rest of libvirt's > subsystems and there's also no reason for it since mdevctl supports it > --> we need to support transient devices as well > > - trying to define the following XML hangs for 5s and then fails: > <device> > <parent>pci_0000_06_00_0</parent> > <driver> > <name>vfio_mdev</name> > </driver> > <capability type='mdev'> > <type id='nvidia-11'/> > <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid> > </capability> > </device> I assume that you have a parent device that supports creating an mdev of this type? In other words, this was expected to succeed, right? > > -> I debugged this on Friday evening and for some reason > driver->devs hash table of devices was NULL going through the > nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure > out why it was NULL, listing devices worked just fine So, I often get a 5s delay when trying to define a new mdev. But it always succeeds after the delay. The reason for the delay is that the device is generally not in our device list the first time we call nodeDeviceFindNewMediatedDevice(). It's usually not in our device list because the device is only added to the driver's device list after we get a notification from mdevctl (via monitoring the mdevctl config directory) and then we re-query mdevctl for the list of defined devices. Because mdevctl is not blazingly fast[1], and because we do the querying in a separate thread, the new device has usually not been discovered when we first call nodeDeviceFindNewMediatedDevice(). So we sleep for 5s before trying again. For udev devices, this 5s delay is usually not hit because nodeDeviceFindNewDevice() first calls 'udevadm settle' before checking the device list. This waits just long enough to ensure that pending udev events are handled. Unfortunately we don't have a similar "wait just long enough" command for mdevctl, so we almost always hit the 5s retry timeout. There are a couple of potential ways to avoid hitting this 5s timeout: 1. directly add a placeholder device to the driver's device list immediately after calling 'mdevctl define' so that it is guaranteed to be in the device list when we call nodeDeviceFindNewMediatedDevice(). We can then update that device definition with additional info when mdevctl is eventually queried. 2. don't wait 5s every time a device is not found. Instead, start with a small sleep timeout and increase it gradually until we hit the max timeout. In other words, instead of check; sleep(5); check; sleep(5); check; sleep(5); we could instead do something like: check; sleep(1); check; sleep(2); check; sleep(4); check; sleep(8)... [1]$ time mdevctl list --defined 49bf2346-6747-4ad6-be5a-adc2f0a10b5c 0000:00:02.0 i915-GVTg_V5_8 manual 4fcd3666-e58a-4c63-969c-bd616a158c0d 0000:00:02.0 i915-GVTg_V5_8 manual 5c152919-3a34-4338-b7c9-532f73fa5440 0000:00:02.0 i915-GVTg_V5_4 manual real 0m0.782s user 0m0.735s sys 0m0.056s > > - I also prepared several adjustments to how we define the mdevctl > wrappers + some test adjustments which I wanted to share with you, > but I haven't tested those thoroughly since I was debugging why that > XML couldn't be defined, I'll share it when we eliminate the > underlying problem -> if you're wondering why I didn't just put it > into the review, well, I figured sharing the code was more > descriptive than if I used words Would you like me to wait for this before sending a new series? Thanks, Jonathon
© 2016 - 2024 Red Hat, Inc.