[libvirt PATCH v4 00/25] Add support for persistent mediated devices

Jonathon Jongsma posted 25 patches 3 years, 2 months ago
Test syntax-check failed
Failed in applying to current master (apply log)
There is a newer version of this series
create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
create mode 100644 tests/nodedevmdevctldata/mdevctl-undefine.argv
[libvirt PATCH v4 00/25] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 2 months ago
This patch series follows the previously-merged series which added support for
transient mediated devices. This series expands mdev support to include
persistent device definitions. Again, it relies on mdevctl as the backend.

It follows the common libvirt pattern of APIs by adding the following new APIs
for node devices:
    - virNodeDeviceDefineXML() - defines a persistent device
    - virNodeDeviceUndefine() - undefines a persistent device
    - virNodeDeviceCreate() - starts a previously-defined device

It also adds virsh commands mapping to these new APIs: nodedev-define,
nodedev-undefine, and nodedev-start.

Since we rely on mdevctl for the definition of mediated devices, we need a way
to stay up-to-date with devices that are defined by mdevctl (outside of
libvirt).  The method for staying up-to-date is currently a little bit crude
due to the fact that mdevctl does not emit any events when new devices are
added or removed. As a workaround, we create a file monitor for the mdevctl
config directory and re-query mdevctl when we detect changes within that
directory. In the future, mdevctl may introduce a more elegant solution.

Changes in v4:
 - rebase to git master
 - switch to throwaway thread for querying mdevctl
 - fixed a bug when removing devices because I was accidentally using
   virHashForEach() instead of virHashForeachSafe()
 - use DEFINED/UNDEFINED events instead of STARTED/STOPPED events
 - changes related to merging information about mdev devices from both udev a=
nd
   mdevctl:
   - Re-used the same function to copy extra data from mdevctl regardless of
     whether we're processing a udev event or a mdevctl event (recommended by
     Erik). This results in slightly more complex handling of the object
     lifetimes (see patch 9), but it consolidates some code.
   - nodeDeviceDefCopyFromMdevctl() previously only copied the data that was
     unique to mdevctl and didn't exist in udev. It now copies additional data
     (possibly overwriting some udev).  This solves a problem where a device =
is
     defined but not active (i.e.  we have not gotten any data from udev), and
     then changed (e.g. somebody calls 'mdevctl modify' to change the mdev
     type), but libvirt was not updating to the new definition.
 - fix a bug where we were mistakenly emitting 'update' events for devices th=
at
   had not changed
 - Added the ability to specify a uuid for an mdev via device XML.
 - split some commits into multiple patches
 - updated new API version info to 7.1.0
 - Fixed a bug reported by Yan Fu which hangs the client when attempting to
   destroy a nodedev that is in use by an active vm. See
   https://www.redhat.com/archives/libvir-list/2021-February/msg00116.html for
   solution suggested by Alex.
 - numerous smaller fixes from review findings

changes in v3:
 - streamlined tests -- removed some unnecessary duplication
 - split out patch to factor out node device name generation function
 - split nodeDeviceParseMdevctlChildDevice() into a separate function
 - added follow-up patch to remove space-padded alignment in header
 - refactored the mdevctl update handling significantly:
   - no longer a separate persistent thread that gets signaled by a timer
   - now piggybacks onto the existing udev thread and signals the thread in t=
he
     same way that the udev event does.
   - Daniel suggested spawning a throw-away thread to handle mdevctl updates,
     but that introduces the complexity of possibly serializing multiple
     throw-away threads (e.g. if we get an 'created' event followed immediate=
ly
     by a 'deleted' event, two threads may be spawned and we'd need to ensure
     they are properly ordered)
 - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked()
   to simplify removing devices that are removed from mdevctl.
 - coding style fixes
 - NOTE: per Erik's request, I experimented with changing the way that mdevctl
   commands were generated and tested (e.g. introducing something like
   virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was
   too invasive and awkward and didn't seem worthwhile

Changes in v2:
 - rebase to latest git master

Jonathon Jongsma (25):
  tests: remove extra trailing semicolon
  nodedev: introduce concept of 'active' node devices
  nodedev: Add ability to filter by active state
  nodedev: expose internal helper for naming devices
  nodedev: add ability to parse mdevs from mdevctl
  nodedev: add ability to list defined mdevs
  nodedev: add persistence to virNodeDeviceObj
  nodedev: add DEFINED/UNDEFINED lifecycle events
  nodedev: add mdevctl devices to node device list
  nodedev: add helper functions to remove node devices
  nodedev: handle mdevs that disappear from mdevctl
  nodedev: Refresh mdev devices when changes are detected
  nodedev: add function to generate mdevctl define command
  api: add virNodeDeviceDefineXML()
  virsh: Add --inactive, --all to nodedev-list
  virsh: add nodedev-define command
  nodedev: refactor tests to support mdev undefine
  api: add virNodeDeviceUndefine()
  virsh: Factor out function to find node device
  virsh: add nodedev-undefine command
  api: add virNodeDeviceCreate()
  virsh: add "nodedev-start" command
  nodedev: add <uuid> element to mdev caps
  nodedev: add ability to specify UUID for new mdevs
  nodedev: fix hang when destroying an mdev in use

 docs/schemas/nodedev.rng                      |  43 +-
 examples/c/misc/event-test.c                  |   4 +
 include/libvirt/libvirt-nodedev.h             |  19 +-
 src/access/viraccessperm.c                    |   2 +-
 src/access/viraccessperm.h                    |   6 +
 src/conf/node_device_conf.c                   |  15 +
 src/conf/node_device_conf.h                   |   8 +
 src/conf/virnodedeviceobj.c                   | 149 +++-
 src/conf/virnodedeviceobj.h                   |  25 +
 src/driver-nodedev.h                          |  14 +
 src/libvirt-nodedev.c                         | 115 +++
 src/libvirt_private.syms                      |   6 +
 src/libvirt_public.syms                       |   7 +
 src/node_device/node_device_driver.c          | 657 +++++++++++++++++-
 src/node_device/node_device_driver.h          |  41 ++
 src/node_device/node_device_udev.c            | 214 +++++-
 src/remote/remote_driver.c                    |   3 +
 src/remote/remote_protocol.x                  |  40 +-
 src/remote_protocol-structs                   |  16 +
 src/rpc/gendispatch.pl                        |   1 +
 ...19_36ea_4111_8f0a_8c9a70e21366-define.argv |   2 +
 ...19_36ea_4111_8f0a_8c9a70e21366-define.json |   1 +
 ...019_36ea_4111_8f0a_8c9a70e21366-start.argv |   3 +-
 ...39_495e_4243_ad9f_beb3f14c23d9-define.argv |   1 +
 ...39_495e_4243_ad9f_beb3f14c23d9-define.json |   1 +
 ...16_1ca8_49ac_b176_871d16c13076-define.argv |   1 +
 ...16_1ca8_49ac_b176_871d16c13076-define.json |   1 +
 tests/nodedevmdevctldata/mdevctl-create.argv  |   1 +
 .../mdevctl-list-defined.argv                 |   1 +
 .../mdevctl-list-multiple.json                |  59 ++
 .../mdevctl-list-multiple.out.xml             |  43 ++
 .../nodedevmdevctldata/mdevctl-undefine.argv  |   1 +
 tests/nodedevmdevctltest.c                    | 222 +++++-
 ...v_d069d019_36ea_4111_8f0a_8c9a70e21366.xml |   1 +
 tools/virsh-nodedev.c                         | 268 ++++++-
 35 files changed, 1853 insertions(+), 138 deletions(-)
 create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
a70e21366-define.argv
 create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
a70e21366-define.json
 create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
3f14c23d9-define.argv
 create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
3f14c23d9-define.json
 create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
d16c13076-define.argv
 create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
d16c13076-define.json
 create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
 create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
 create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
 create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
 create mode 100644 tests/nodedevmdevctldata/mdevctl-undefine.argv

--=20
2.26.2


Re: [libvirt PATCH v4 00/25] Add support for persistent mediated devices
Posted by Erik Skultety 3 years, 2 months ago
On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
> This patch series follows the previously-merged series which added support for
> transient mediated devices. This series expands mdev support to include
> persistent device definitions. Again, it relies on mdevctl as the backend.
> 
> It follows the common libvirt pattern of APIs by adding the following new APIs
> for node devices:
>     - virNodeDeviceDefineXML() - defines a persistent device
>     - virNodeDeviceUndefine() - undefines a persistent device
>     - virNodeDeviceCreate() - starts a previously-defined device

So, I tested the patches on Friday and have a few notes:
- all of the driver implementations mentioned ^above need to pass an error
  buffer to the respective mdevctl wrapper, because the generic error "Unable
to XZY mediated device" doesn't help at all. Kind of like with QEMU errors, we
need to extract it from an error buffer (you already have input and output, so
just add another one) --> this relates to patches 14,18,21

- trying to undefine an active mdev reports an error in libvirt
    --> this is both inconsistent with the rest of libvirt's subsystems and
        there's also no reason for it since mdevctl supports it
    --> we need to support transient devices as well

- trying to define the following XML hangs for 5s and then fails:
    <device>
      <parent>pci_0000_06_00_0</parent>
      <driver>
        <name>vfio_mdev</name>
      </driver>
      <capability type='mdev'>
        <type id='nvidia-11'/>
        <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid>
      </capability>
    </device>

    -> I debugged this on Friday evening and for some reason driver->devs hash
       table of devices was NULL going through the nodeDeviceFindNewMediatedDevice
       call, but I haven't managed to figure out why it was NULL, listing
       devices worked just fine

- I also prepared several adjustments to how we define the mdevctl wrappers +
  some test adjustments which I wanted to share with you, but I haven't tested
  those thoroughly since I was debugging why that XML couldn't be defined, I'll
  share it when we eliminate the underlying problem
    -> if you're wondering why I didn't just put it into the review, well, I
       figured sharing the code was more descriptive than if I used words

- one thing that I didn't test with your patches are events



> 
> It also adds virsh commands mapping to these new APIs: nodedev-define,
> nodedev-undefine, and nodedev-start.
> 
> Since we rely on mdevctl for the definition of mediated devices, we need a way
> to stay up-to-date with devices that are defined by mdevctl (outside of
> libvirt).  The method for staying up-to-date is currently a little bit crude
> due to the fact that mdevctl does not emit any events when new devices are
> added or removed. As a workaround, we create a file monitor for the mdevctl
> config directory and re-query mdevctl when we detect changes within that
> directory. In the future, mdevctl may introduce a more elegant solution.
> 
> Changes in v4:
>  - rebase to git master
>  - switch to throwaway thread for querying mdevctl
>  - fixed a bug when removing devices because I was accidentally using
>    virHashForEach() instead of virHashForeachSafe()
>  - use DEFINED/UNDEFINED events instead of STARTED/STOPPED events
>  - changes related to merging information about mdev devices from both udev a=
> nd
>    mdevctl:
>    - Re-used the same function to copy extra data from mdevctl regardless of
>      whether we're processing a udev event or a mdevctl event (recommended by
>      Erik). This results in slightly more complex handling of the object
>      lifetimes (see patch 9), but it consolidates some code.
>    - nodeDeviceDefCopyFromMdevctl() previously only copied the data that was
>      unique to mdevctl and didn't exist in udev. It now copies additional data
>      (possibly overwriting some udev).  This solves a problem where a device =
> is
>      defined but not active (i.e.  we have not gotten any data from udev), and
>      then changed (e.g. somebody calls 'mdevctl modify' to change the mdev
>      type), but libvirt was not updating to the new definition.
>  - fix a bug where we were mistakenly emitting 'update' events for devices th=
> at
>    had not changed
>  - Added the ability to specify a uuid for an mdev via device XML.
>  - split some commits into multiple patches
>  - updated new API version info to 7.1.0
>  - Fixed a bug reported by Yan Fu which hangs the client when attempting to
>    destroy a nodedev that is in use by an active vm. See
>    https://www.redhat.com/archives/libvir-list/2021-February/msg00116.html for
>    solution suggested by Alex.
>  - numerous smaller fixes from review findings
> 
> changes in v3:
>  - streamlined tests -- removed some unnecessary duplication
>  - split out patch to factor out node device name generation function
>  - split nodeDeviceParseMdevctlChildDevice() into a separate function
>  - added follow-up patch to remove space-padded alignment in header
>  - refactored the mdevctl update handling significantly:
>    - no longer a separate persistent thread that gets signaled by a timer
>    - now piggybacks onto the existing udev thread and signals the thread in t=
> he
>      same way that the udev event does.
>    - Daniel suggested spawning a throw-away thread to handle mdevctl updates,
>      but that introduces the complexity of possibly serializing multiple
>      throw-away threads (e.g. if we get an 'created' event followed immediate=
> ly
>      by a 'deleted' event, two threads may be spawned and we'd need to ensure
>      they are properly ordered)
>  - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked()
>    to simplify removing devices that are removed from mdevctl.
>  - coding style fixes
>  - NOTE: per Erik's request, I experimented with changing the way that mdevctl
>    commands were generated and tested (e.g. introducing something like
>    virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was
>    too invasive and awkward and didn't seem worthwhile

    I wrote the code for ^this and to be honest, it didn't seem that bad after
    all, like I said, I'll share the follow up patches when we know more about
    the issue I encountered.

Erik

Re: [libvirt PATCH v4 00/25] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 2 months ago
On Mon, 22 Feb 2021 11:08:12 +0100
Erik Skultety <eskultet@redhat.com> wrote:

> On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
> > This patch series follows the previously-merged series which added
> > support for transient mediated devices. This series expands mdev
> > support to include persistent device definitions. Again, it relies
> > on mdevctl as the backend.
> > 
> > It follows the common libvirt pattern of APIs by adding the
> > following new APIs for node devices:
> >     - virNodeDeviceDefineXML() - defines a persistent device
> >     - virNodeDeviceUndefine() - undefines a persistent device
> >     - virNodeDeviceCreate() - starts a previously-defined device  
> 
> So, I tested the patches on Friday and have a few notes:
> - all of the driver implementations mentioned ^above need to pass an
> error buffer to the respective mdevctl wrapper, because the generic
> error "Unable to XZY mediated device" doesn't help at all. Kind of
> like with QEMU errors, we need to extract it from an error buffer
> (you already have input and output, so just add another one) --> this
> relates to patches 14,18,21
> 
> - trying to undefine an active mdev reports an error in libvirt
>     --> this is both inconsistent with the rest of libvirt's
> subsystems and there's also no reason for it since mdevctl supports it
>     --> we need to support transient devices as well  
> 
> - trying to define the following XML hangs for 5s and then fails:
>     <device>
>       <parent>pci_0000_06_00_0</parent>
>       <driver>
>         <name>vfio_mdev</name>
>       </driver>
>       <capability type='mdev'>
>         <type id='nvidia-11'/>
>         <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid>
>       </capability>
>     </device>

I assume that you have a parent device that supports creating an mdev
of this type? In other words, this was expected to succeed, right?

> 
>     -> I debugged this on Friday evening and for some reason
> driver->devs hash table of devices was NULL going through the
> nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure
> out why it was NULL, listing devices worked just fine

So, I often get a 5s delay when trying to define a new mdev. But it
always succeeds after the delay. 

The reason for the delay is that the device is generally not in our
device list the first time we call nodeDeviceFindNewMediatedDevice().
It's usually not in our device list because the device is only added
to the driver's device list after we get a notification from mdevctl
(via monitoring the mdevctl config directory) and then we re-query
mdevctl for the list of defined devices. Because mdevctl is not
blazingly fast[1], and because we do the querying in a separate thread,
the new device has usually not been discovered when we first call
nodeDeviceFindNewMediatedDevice(). So we sleep for 5s before trying
again.

For udev devices, this 5s delay is usually not hit because
nodeDeviceFindNewDevice() first calls 'udevadm settle' before
checking the device list. This waits just long enough to ensure that
pending udev events are handled. Unfortunately we don't have a similar
"wait just long enough" command for mdevctl, so we almost always hit
the 5s retry timeout. 

There are a couple of potential ways to avoid hitting this 5s timeout:
1. directly add a placeholder device to the driver's device list
immediately after calling 'mdevctl define' so that it is guaranteed to
be in the device list when we call nodeDeviceFindNewMediatedDevice().
We can then update that device definition with additional info
when mdevctl is eventually queried.
2. don't wait 5s every time a device is not found. Instead, start with
a small sleep timeout and increase it gradually until we hit the max
timeout. In other words, instead of check; sleep(5); check; sleep(5);
check; sleep(5); we could instead do something like: check; sleep(1);
check; sleep(2); check; sleep(4); check; sleep(8)...


[1]$ time mdevctl list --defined
49bf2346-6747-4ad6-be5a-adc2f0a10b5c 0000:00:02.0 i915-GVTg_V5_8 manual
4fcd3666-e58a-4c63-969c-bd616a158c0d 0000:00:02.0 i915-GVTg_V5_8 manual
5c152919-3a34-4338-b7c9-532f73fa5440 0000:00:02.0 i915-GVTg_V5_4 manual

real	0m0.782s
user	0m0.735s
sys	0m0.056s



> 
> - I also prepared several adjustments to how we define the mdevctl
> wrappers + some test adjustments which I wanted to share with you,
> but I haven't tested those thoroughly since I was debugging why that
> XML couldn't be defined, I'll share it when we eliminate the
> underlying problem -> if you're wondering why I didn't just put it
> into the review, well, I figured sharing the code was more
> descriptive than if I used words

Would you like me to wait for this before sending a new series?

Thanks,
Jonathon